Are video deepfakes powerful enough to influence political discourse?

An expert in AI video generation talks about the technology’s rapid advances – and its current limitations.

There have already been several high-profile examples of people using deepfakes this presidential cycle to try to influence voters. Deepfakes are images, audio recordings or videos that are created or modified using artificial intelligence (AI) models to depict real or fictional people. Recent deepfake examples include manipulated audio recordings of Joe Biden urging voters to stay home during the primaries and fake images of Taylor Swift endorsing Donald Trump.

Generative artificial intelligence is becoming an increasingly important tool in the misinformation toolbox. Should voters worry about being bombarded with fake videos of politicians created with generative AI? A computer vision and deep learning expert at the University of Rochester says that while the technology is evolving rapidly, the complexity of creating convincing deepfake videos still makes them difficult for malicious actors to exploit.

While OpenAI’s products, including ChatGPT for text generation and DALL-E 3 for image generation, have become increasingly popular, the company has not yet released an equivalent for video generation. According to Chenliang Xu, the Rochester expert, there is a reason for that.

“Generating videos using AI is still an ongoing research topic and a difficult problem because it involves what we call multimodal content,” says Xu. “Producing moving video along with corresponding sound is a difficult task in itself – and synchronizing the two is even more difficult.”

Xu says that in 2017, his research group was among the first to use artificial neural networks to generate multimodal videos. They started with tasks such as taking an image of a violin player and an audio recording of a violin and producing a moving video of the player. From there, they moved on to problems like generating lip movements and, eventually, full talking faces with head gestures from a single image.

“Now we can generate fully animatable talking heads in real time and even render those heads in different styles specified by language descriptions,” says Xu.
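To make the “multimodal” challenge concrete, here is a minimal, purely illustrative sketch of how a talking-head generator might condition a single output frame on a reference image plus a window of audio features. The architecture, layer sizes, and module names are assumptions for illustration, not Xu’s actual models.

```python
# Hypothetical sketch of audio-conditioned talking-head generation,
# in the spirit of the image+audio -> video work described above.
# All shapes and module names are illustrative assumptions.
import torch
import torch.nn as nn

class AudioConditionedFrameGenerator(nn.Module):
    def __init__(self, img_dim=256, audio_dim=128):
        super().__init__()
        # Encode a single reference image of the speaker (identity/appearance).
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, img_dim),
        )
        # Encode a short window of audio features (e.g., 80 mel bins).
        self.audio_encoder = nn.Sequential(
            nn.Linear(80, audio_dim), nn.ReLU(),
            nn.Linear(audio_dim, audio_dim),
        )
        # Fuse both modalities and decode one video frame.
        self.decoder = nn.Sequential(
            nn.Linear(img_dim + audio_dim, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, ref_image, audio_window):
        z_img = self.image_encoder(ref_image)     # who is speaking
        z_aud = self.audio_encoder(audio_window)  # lip/expression cues
        return self.decoder(torch.cat([z_img, z_aud], dim=1))

# One 64x64 reference image plus one 80-bin audio feature vector
# yields one frame; a real system runs this per time step and must
# keep the frames temporally coherent, which is the hard part.
model = AudioConditionedFrameGenerator()
frame = model(torch.randn(1, 3, 64, 64), torch.randn(1, 80))
print(frame.shape)  # torch.Size([1, 3, 32, 32])
```

The sketch shows why the problem is multimodal: two separate encoders must be fused so that the visual output tracks the audio over time, on top of each modality being difficult on its own.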

TALKING HEADS: Computer scientist Chenliang Xu.

Challenges in deepfake detection technology

Xu’s team has also developed deepfake detection technology. He calls it an area that needs much more research and points out that, because of the labeled training data required to build generalized detection models, it is easier to develop technologies that generate deepfakes than ones that detect them.

“If you want to develop a technology that can detect deepfakes, you need to build a dataset that labels which images are fake and which are real,” says Xu. “That labeling requires an extra level of human effort that generation does not.”
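As a rough sketch of the labeling requirement Xu describes, the snippet below trains a tiny binary real-versus-fake classifier. The data/real and data/fake directory layout is hypothetical; the point is that every training image needs a human-assigned label before any detection model can be fit, whereas a generator needs no such labels.

```python
# Minimal sketch of supervised deepfake detection. Assumes a
# hypothetical directory layout: data/real/*.jpg and data/fake/*.jpg,
# where the folder names encode the human-assigned labels.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# ImageFolder turns the two folder names into the 0/1 labels that
# the generation side of the pipeline never has to produce.
dataset = datasets.ImageFolder(
    "data",
    transform=transforms.Compose([
        transforms.Resize((128, 128)),
        transforms.ToTensor(),
    ]),
)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

# A deliberately tiny binary classifier; real detectors use far
# deeper backbones, but the supervision signal is the same.
detector = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(32, 2),  # logits for {real, fake}
)
optimizer = torch.optim.Adam(detector.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for images, labels in loader:   # labels exist only because humans
    logits = detector(images)   # curated and annotated the dataset
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```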

Another concern, he adds, is building a detector that generalizes across different types of deepfake generators. “You can build a model that performs well against the techniques you know, but if someone uses a different model, your detection algorithm will have a hard time picking that up,” he says.
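A simple way to see the generalization gap Xu describes is a leave-one-generator-out evaluation: train a detector on fakes from generators you know, then score it on fakes from one you held out. The sketch below uses synthetic feature vectors and made-up generator names to illustrate the protocol, not real deepfake data.

```python
# Illustrative leave-one-generator-out check. Generator names and
# feature distributions are synthetic stand-ins: each fake source
# gets a slightly different "fingerprint" shift away from real data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
generators = ["gen_A", "gen_B", "gen_C"]  # hypothetical deepfake models

def fake_batch(shift, n=500):
    return rng.normal(loc=shift, scale=1.0, size=(n, 16))

real = rng.normal(loc=0.0, scale=1.0, size=(1500, 16))
fakes = {g: fake_batch(s) for g, s in zip(generators, [0.8, 1.0, 0.3])}

for held_out in generators:
    # Train on fakes from every generator except the held-out one.
    train_fakes = np.vstack([fakes[g] for g in generators if g != held_out])
    X_train = np.vstack([real[:1000], train_fakes])
    y_train = np.r_[np.zeros(1000), np.ones(len(train_fakes))]

    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # Test on fakes from the generator the detector has never seen.
    X_test = np.vstack([real[1000:], fakes[held_out]])
    y_test = np.r_[np.zeros(500), np.ones(500)]
    auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
    print(f"held-out {held_out}: AUC = {auc:.2f}")  # typically lower
```

The score on the held-out generator is the number that matters: a detector can look excellent on generators represented in its training data while degrading sharply on an unfamiliar one.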

The easiest targets for video deepfakes

Access to good training data is critical to building effective generative AI models. As a result, Xu says politicians and celebrities will be the first and easiest targets when video generators become widely available.

“Politicians and celebrities are easier to generate than regular people because there is simply more data about them,” says Xu. “Because there are already so many videos of them, these models can use that footage to learn what expressions they show in different situations, along with their voices, their hair, their movements, and their emotions.”

But he suspects that celebrity deepfakes in particular could be easier to spot, at least initially, because of the training data they are built on.

“Training a model only on high-quality photos will produce similarly polished results,” says Xu. “That can yield an overly slick style, which can be read as an indication that it is a deepfake.”

Other clues can include how natural a person’s reactions appear, whether they can move their head, and even the number of teeth displayed. But image generators have overcome similar early tells, such as hands with six fingers, and Xu says enough training data can mitigate these limitations.

He calls on the research community to invest more effort in developing deepfake detection strategies and in addressing the ethical concerns surrounding the development of these technologies.

“Generative models are a tool that can do good things in the hands of good people, but can do bad things in the hands of bad people,” Xu says. “The technology itself is neither good nor bad, but we need to discuss how we can prevent these powerful tools from falling into the wrong hands and being used maliciously.”