
AI Can Recognize Your Face Even If You’re Pixelated

Pixelation doesn't work anymore. We can't see what it hides, but a computer can.

20 September 2016

Pixelation has long been a familiar fig leaf to cover our visual media’s most private parts. Blurred chunks of text or obscured faces and license plates show up on the news, in redacted documents, and online. The technique is nothing fancy, but it has worked well enough, because people can’t see or read through the distortion. The problem, however, is that humans aren’t the only image recognition masters around anymore. As computer vision becomes increasingly robust, it’s starting to see things we can’t.

Researchers at the University of Texas at Austin and Cornell Tech say that they’ve trained a piece of software that can undermine the privacy benefits of standard content-masking techniques like blurring and pixelation by learning to read or see what’s meant to be hidden in images—anything from a blurred house number to a pixelated human face in the background of a photo. The team found that mainstream machine learning methods—the process of "training a computer with a set of example data rather than programming it"—lend themselves readily to this type of attack.

The researchers were able to defeat three privacy protection technologies, starting with YouTube’s proprietary blur tool. YouTube allows uploaders to select objects or figures that they want to blur, but the team used their attack to identify obfuscated faces in videos. In another example of their method, the researchers attacked pixelation (also called mosaicing). To generate different levels of pixelation, they used their own implementation of a standard mosaicing technique that the researchers say is found in Photoshop and other common programs. And finally, they attacked a tool called Privacy Preserving Photo Sharing (P3), which encrypts identifying data in JPEG photos so humans can’t see the overall image, while leaving other data components in the clear so computers can still do things with the files like compress them.
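To see what mosaicing actually does to an image, consider the short sketch below. It is not the researchers' implementation; the NumPy-based approach and the block size are illustrative assumptions. Standard mosaicing simply replaces each square tile of pixels with the tile's average value:

    import numpy as np

    def pixelate(image: np.ndarray, block: int = 8) -> np.ndarray:
        """Standard mosaicing: replace each block x block tile with its mean value."""
        out = image.astype(np.float64)
        h, w = image.shape[:2]
        for y in range(0, h, block):
            for x in range(0, w, block):
                tile = out[y:y + block, x:x + block]
                tile[...] = tile.mean()  # every pixel in the tile becomes the tile average
        return out.astype(image.dtype)

    # Example: mosaic a random 64x64 grayscale "image" with 8x8 tiles.
    img = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
    masked = pixelate(img, block=8)

The key point is that the tile averages still preserve the coarse structure of the original, and that residual signal is exactly what a trained model can learn to exploit.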

To execute the attacks, the team trained neural networks to perform image recognition by feeding them data from four large and well-known image sets for analysis. The more words, faces, or objects a neural network “sees,” the better it gets at spotting those targets. Once the neural networks achieved roughly 90 percent accuracy or better on identifying relevant objects in the training sets, the researchers obfuscated the images using the three privacy tools and then further trained their neural networks to interpret blurred and pixelated images based on knowledge of the originals.
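A minimal sketch of that training idea might look like the following, assuming PyTorch and MNIST digits as a stand-in for the handwritten-number set; the network size, mosaic block size, and hyperparameters are illustrative choices, not the researchers' actual setup, and the two-stage process described above is compressed into a single pass so the core idea is visible: the labels come from the original data while the model only ever sees the obfuscated images.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torchvision import datasets, transforms

    def pixelate_batch(x: torch.Tensor, block: int = 7) -> torch.Tensor:
        # Average-pool into coarse tiles, then upsample back: standard mosaicing.
        coarse = F.avg_pool2d(x, block)
        return F.interpolate(coarse, scale_factor=block, mode="nearest")

    # A small classifier that is trained directly on the obfuscated images.
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU(), nn.Linear(256, 10))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    train = datasets.MNIST(".", train=True, download=True, transform=transforms.ToTensor())
    loader = torch.utils.data.DataLoader(train, batch_size=128, shuffle=True)

    for images, labels in loader:               # labels come from the original, unobscured data
        logits = model(pixelate_batch(images))  # the model only ever sees the mosaics
        loss = F.cross_entropy(logits, labels)
        opt.zero_grad()
        loss.backward()
        opt.step()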

Finally, they used obfuscated test images that the neural networks hadn’t yet been exposed to in any form to see whether the image recognition could identify faces, objects, and handwritten numbers. Had the computers been guessing at random, the researchers calculated, the success rates for each test set would have been at most 10 percent and as low as a fifth of a percent, so even relatively modest identification rates were still far better than chance.
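Those random-guessing baselines are simple to reproduce: with k equally likely classes, uniform guessing succeeds 1/k of the time. The class counts below are illustrative assumptions chosen to match the figures quoted in the article, not numbers taken from the paper:

    def random_guess_accuracy(num_classes: int) -> float:
        """Expected accuracy of uniform random guessing over num_classes labels."""
        return 1.0 / num_classes

    print(f"{random_guess_accuracy(10):.1%}")   # 10.0%  -- e.g. ten handwritten digits
    print(f"{random_guess_accuracy(500):.2%}")  # 0.20%  -- "a fifth of a percent", e.g. ~500 identities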

Even if the group’s machine learning method couldn’t always penetrate the effects of redaction on an image, it still represents a serious blow to pixelation and blurring as a privacy tool, says Lawrence Saul, a machine learning researcher at University of California, San Diego. “For the purposes of defeating privacy, you don’t really need to show that 99.9 percent of the time you can reconstruct” an image or string of text, says Saul. “If 40 or 50 percent of the time you can guess the face or figure out what the text is then that’s enough to render that privacy method as something that should be obsolete.”

The researchers’ larger goal is to warn the privacy and security communities that advances in machine learning as a tool for identification and data collection can’t be ignored. There are ways to defend against these types of attacks, as Saul points out, like using black boxes that offer total coverage instead of image distortions that leave traces of the content behind.
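The distinction Saul draws is easy to see in code: a solid box overwrites every pixel in the region, so nothing about the underlying content survives for a model to learn from. A minimal sketch of that kind of total redaction follows; the function name and NumPy approach are illustrative, not a specific tool's API:

    import numpy as np

    def black_box(image: np.ndarray, y0: int, y1: int, x0: int, x1: int) -> np.ndarray:
        """Redact a region by overwriting it entirely, leaving no residual signal."""
        out = image.copy()
        out[y0:y1, x0:x1] = 0  # unlike blurring or mosaicing, no trace of the original remains
        return out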

The original piece was published in Wired.