Everything You Need to Know About How to Use Deepfake Technology
Deepfakes are synthetic media created by machine-learning algorithms. The name combines the deep-learning methods used to create them with the fake events they depict.
Deepfake methods intersect disciplines and industries from computer science and programming to visual effects, computer animation, and even neuroscience. They can be convincingly realistic and difficult to detect when done well and with the aid of sophisticated and powerful technologies.
Machine learning is a foundational concept for data scientists, which makes deepfakes, and the predictive models used to create them, an interesting area of study. The training methods, algorithmic structures and synthetic output of these models offer insight into deep learning and data.
A Brief History of Deepfake Technology
In 2017, a Reddit user by the name of “deepfakes” posted pornographic videos created with face-swapping technology that replaced the original subjects’ faces with those of known celebrities.
Although they crop up in a variety of applications, deepfakes have been used in pornography more than in any other setting to date. A 2019 report released by Amsterdam-based cybersecurity firm Sensity (formerly Deeptrace) found that “nonconsensual deepfake pornography accounted for 96% of the total deepfake videos online.”
This, however, is not where the deepfake story begins, ends, or best succeeds.
Deep-learning technology, including rudimentary versions of the models that make deepfakes — also known as synthetic media — has existed for decades, but the limited graphics processing power of computers at that time made most applications cumbersome and impractical.
According to freeCodeCamp contributor Nick McCullum, the cognitive psychologist and computer scientist Geoffrey Hinton contributed significantly to the study of deep learning with his introduction of the artificial neural network.
Hinton’s artificial neural network, an integral component of advanced deepfake techniques used today, was intended to closely resemble the architecture of the human brain, relaying signals through layers of nodes that process large amounts of data to learn and classify information.
Similar to the way neurons in the human brain create meaning as they process the data they receive, artificial neural networks, or ANNs, pass raw data (noise) from their input layers to their middle (hidden) layers and finally to the output layer.
As we’ll see when we get to the section on how to create a deepfake video, image, or audio by way of artificial intelligence deep-learning models, the most accurate synthetic media outputs are those that result from a large volume of high-quality data.
For example, some of the most popular deepfakes have come from visual effects specialist Chris Ume, who offered a peek behind the curtain of his unsettlingly realistic viral deepfakes of Tom Cruise on TikTok.
In an interview with Science Weekly, Ume explained that such sophisticated deepfakes require “as much data as possible — pictures, videos, anything you can find. And then you scrub through them and you clean it up so you only have the best of the best.”
This abundance of available data is a big part of what makes the Tom Cruise videos so uncannily authentic. The actor has been filmed and photographed for nearly 40 years, so the sheer volume of data that could be used for training makes the output — i.e., the deepfake — a stunningly accurate representation.
“The important thing is you cover all angles. You have as much expression as possible and even try to have a lot of different light angles, so the machine knows how Tom’s face reacts in certain scenes,” said Ume.
We’ll get into a discussion of data and machine-learning training techniques in the section on creating a deepfake. For now, the key takeaway is that the deep-learning models on which deepfake technology is based have been around for decades.
Deep learning has its roots in cognitive science and has been advanced over the years by researchers in diverse fields, including computer science, artificial intelligence, neurophysiology, cybernetics and logic.
How to Create a Deepfake Video
There are several ways to make deepfake videos. For a piece of synthetic media to qualify as a true deepfake, it must use deep-learning training techniques to achieve the goal of facial manipulation. This includes altering expressions, swapping the faces of two real people or generating a nonexistent human face from a dataset that includes thousands of images of real people.
Recalling Ume’s explanation of the resources needed to make realistic synthetic media, we can deduce that some methods are more precise and exacting than others.
In addition to massive amounts of pre-training data in the form of random faces and the training data required for Cruise’s face specifically, Ume attributes the authenticity of his deepfakes to the performance of professional actor Miles Fisher.
Fisher has perfected Cruise’s gestures, mannerisms and facial expressions, and his performances provided the destination videos that Ume used in his deepfakes. Ume compared current deepfake technology to Photoshop, telling host Alex Hern that, just as it takes professional-level skills to create superior images with Photoshop, generating undetectable deepfakes requires a high level of skill and experience.
“You can’t do it by just pressing a button,” Ume told The Verge in a recent interview. “That’s important, that’s a message I want to tell people.”
All of this is to say that high-quality synthetic media require impeccable data for both the source and target media to effectively train the models.
It’s also worth noting that for regular people, this amount of data does not exist in the wild.
Tech and Gear
To create a deepfake on the level of the videos found on Ume’s @deeptomcruise TikTok, you would need a high-powered machine and GPU.
You can find several no-code apps, websites and open-source software that allow for facial manipulation in one of two categories: facial expression manipulation and facial identity manipulation. The latter is commonly known as “face swapping.”
Commercial Apps and Websites for Facial Manipulation
- Deepfakes Web
In addition to these tools, we’ve seen a deluge of shallowfakes that employ methods as simple as slowing or accelerating audio and video or mislabeling media with the intent to deceive.
In 2019, human rights campaigner and Witness program manager Sam Gregory raised awareness of the dangers of shallowfakes in a speech he gave to an EmTech Digital audience.
“By these ‘shallowfakes’ I mean the tens of thousands of videos circulated with malicious intent worldwide right now — crafted not with sophisticated AI, but often simply relabeled and re-uploaded, claiming an event in one place has just happened in another,” Gregory said.
From a data science and machine-learning standpoint, shallowfakes are of little value, but it’s important to note their existence and understand the difference between a shallowfake and a deepfake.
The artificial intelligence and deep-learning technology currently used for deepfakes typically involves generative adversarial networks, or GANs, and autoencoders.
The Science Behind Deepfakes
Data scientists interested in the implications of deepfake technology for private enterprise, government entities, cybersecurity, and public safety can learn a lot from studying deepfake methods and the science behind them.
In the face of evolving deep-learning models, it is becoming more crucial for researchers, companies, and world leaders to develop the skills and resources to address the potential threat posed by harmful synthetic media.
Government agencies and large corporations are invested in the advancement of deepfake detection. Thanh Thi Nguyen and colleagues authored a paper titled “Deep Learning for Deepfakes Creation and Detection,” which emphasized the importance of deepfake detection.
“To address the threat of face-swapping technology or deepfakes, the United States Defense Advanced Research Projects Agency (DARPA) initiated a research scheme in media forensics (named Media Forensics or MediFor) to accelerate the development of fake digital visual media detection methods.”
The ability to create and detect deepfakes will become a more valuable skill set as the technology advances and the potential for nefarious uses of deep learning escalates.
The following three machine-learning models are commonly used to create deepfakes.
Autoencoders
An autoencoder is an unsupervised neural network that can reduce the dimensionality of raw data and generate an output that replicates its input.
Autoencoders consist of encoders and decoders. When data is fed through the first layer — the input layer — of the autoencoder’s neural network, the encoder compresses the image and feeds it to the decoder. The decoder then attempts to reconstruct the original data.
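The compress-then-reconstruct idea can be sketched with a deliberately tiny linear autoencoder. This is a minimal illustration, not how production deepfake tools are built: the function name `train_autoencoder`, the toy dataset, the layer sizes and the learning rate are all assumptions chosen for the example, and real systems use deep convolutional networks on images.

```python
import numpy as np

def train_autoencoder(data, code_size, steps=2000, lr=0.1, seed=0):
    """Train a linear autoencoder with plain gradient descent.

    Returns the encoder weights, the decoder weights and the loss history.
    """
    rng = np.random.default_rng(seed)
    n_features = data.shape[1]
    enc = rng.normal(scale=0.1, size=(n_features, code_size))  # encoder weights
    dec = rng.normal(scale=0.1, size=(code_size, n_features))  # decoder weights
    losses = []
    for _ in range(steps):
        code = data @ enc        # encoder: compress to fewer dimensions
        recon = code @ dec       # decoder: attempt to rebuild the input
        err = recon - data
        losses.append(float(np.mean(err ** 2)))
        # Gradients of the mean squared reconstruction error.
        grad_recon = 2.0 * err / err.size
        grad_dec = code.T @ grad_recon
        grad_enc = data.T @ (grad_recon @ dec.T)
        enc -= lr * grad_enc
        dec -= lr * grad_dec
    return enc, dec, losses

# Toy data (hypothetical): 200 samples with 8 features that really
# only vary along 2 hidden directions, so a 2-number code can capture them.
rng = np.random.default_rng(1)
data = rng.normal(size=(200, 2)) @ (0.5 * rng.normal(size=(2, 8)))

enc, dec, losses = train_autoencoder(data, code_size=2)
print("code shape:", (data @ enc).shape)
print("first loss:", losses[0], "last loss:", losses[-1])
```

The encoder here reduces each 8-value sample to a 2-value code, and the falling reconstruction error shows the decoder learning to replicate its input from that compressed form.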
Deepfakes leverage autoencoders by training two network pairs, one encoder-decoder pair for the source-image dataset and another for the target-image dataset. The pairs share the encoder network, which allows the encoder to learn the structure of a human face. When the source image then passes through the decoder that has been trained for the target image, it synthesizes the two images in the reconstruction process.
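The shared-encoder arrangement described above can be outlined structurally as follows. This is a sketch under stated assumptions: the weights are random and untrained, the image and code sizes are hypothetical, and every name (`shared_encoder`, `decoder_source`, `decoder_target`, `swap_face`) is invented for illustration. It shows only the data path, not a working face swap.

```python
import numpy as np

rng = np.random.default_rng(0)

IMAGE_SIZE = 64 * 64   # flattened grayscale face (hypothetical size)
CODE_SIZE = 128        # compressed representation (hypothetical size)

def init_layer(n_in, n_out):
    """Random, untrained weights; a real system would learn these."""
    return rng.normal(scale=0.01, size=(n_in, n_out))

# One encoder shared by both pairs, plus one decoder per dataset.
shared_encoder = init_layer(IMAGE_SIZE, CODE_SIZE)
decoder_source = init_layer(CODE_SIZE, IMAGE_SIZE)
decoder_target = init_layer(CODE_SIZE, IMAGE_SIZE)

def encode(image):
    """Compress a face into the shared code."""
    return np.tanh(image @ shared_encoder)

def decode(code, decoder):
    """Rebuild an image from a code with the given decoder."""
    return code @ decoder

def reconstruct_source(image):
    """Training path for the source pair: encoder + source decoder."""
    return decode(encode(image), decoder_source)

def reconstruct_target(image):
    """Training path for the target pair: encoder + target decoder."""
    return decode(encode(image), decoder_target)

def swap_face(source_image):
    """Swap path: encode a source face, decode with the target decoder."""
    return decode(encode(source_image), decoder_target)

source_face = rng.random(IMAGE_SIZE)   # stand-in for a real source image
print(swap_face(source_face).shape)
```

During training, each pair would only ever reconstruct faces from its own dataset. The swap comes from crossing the paths afterward, which is possible only because both decoders read the same shared code.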
Generative Adversarial Networks — GANs
A generative adversarial network, or GAN, is a machine-learning method in which two neural networks — a generator and a discriminator — compete to boost their levels of accuracy.
In this model, which is often referred to as a zero-sum game, the generator converts random noise into a synthetic image. This image is added to a stream of real images from the training dataset that is then fed to the discriminator. The job of the discriminator is to differentiate the real images from the synthetic images.
The goal of a neural network is to minimize errors. In the case of deepfakes, this means minimizing the difference between the fake image and the real images. To achieve this result, the process is repeated with model-weight adjustments until the output reaches the desired level of accuracy.
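One way to make the zero-sum framing concrete is to compute the standard GAN value function on toy data. In the sketch below, single numbers stand in for images, the generator and discriminator are one-line functions with hand-picked parameters, and all names are assumptions for illustration. The discriminator tries to push the value up, while the generator tries to push it down.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def generator(noise, scale, shift):
    """Turn random noise into fake samples (toy 1-D stand-ins for images)."""
    return scale * noise + shift

def discriminator(samples, weight, bias):
    """Estimate the probability that each sample is real."""
    return sigmoid(weight * samples + bias)

def gan_value(real, fake, weight, bias):
    """V(D, G) = mean(log D(real)) + mean(log(1 - D(fake)))."""
    eps = 1e-12  # avoids log(0)
    d_real = discriminator(real, weight, bias)
    d_fake = discriminator(fake, weight, bias)
    return float(np.mean(np.log(d_real + eps))
                 + np.mean(np.log(1.0 - d_fake + eps)))

rng = np.random.default_rng(0)
real = rng.normal(loc=4.0, scale=0.5, size=1000)   # "real images"
noise = rng.normal(size=1000)

poor_fake = generator(noise, scale=1.0, shift=0.0)    # far from the real data
better_fake = generator(noise, scale=0.5, shift=4.0)  # resembles the real data

# A discriminator that guesses 50/50 versus one that separates the data.
print("guessing D, poor G:  ", gan_value(real, poor_fake, 0.0, 0.0))
print("sharp D, poor G:     ", gan_value(real, poor_fake, 3.0, -6.0))
print("sharp D, better G:   ", gan_value(real, better_fake, 3.0, -6.0))
```

A full GAN would repeat this evaluation inside a training loop, adjusting the discriminator's weights to raise the value and the generator's weights to lower it until the fakes are hard to tell apart from the real samples.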
First Order Motion Model
A first order motion model is an image animation technique: it animates a still source image so that it follows the motion in a driving video. Users can do this with the source code from the paper First Order Motion Model for Image Animation.
According to the authors, the model “is trained to reconstruct the training videos by combining a single frame and a learned latent representation of the motion in the video.”
Dimitris Poulopoulos, a machine learning engineer and contributor to Towards Data Science, summarized the first order motion model and provided an interactive example in which he used the source code to create a shell script, and then applied the model weights, a YAML configuration file, a source image, and a driving video.
How Neural Networks Make Deepfakes Possible
We’ve referenced neural networks throughout this article. That’s because neural networks are the basis for deepfake technology.
Neural networks make machine learning possible through a “feed-forward” structure of interconnected nodes. These nodes mirror the neurons in the human brain. And like the human brain, a computer can learn to perform a task through training.
In fact, researchers at MIT in 2016 reported that their computational model of the human brain’s face-recognition mechanism generated a spontaneous reproduction of “invariant representations” of faces.
They had designed and trained a machine-learning scheme for the model and discovered that “the trained system included an intermediate processing step that represented a face’s degree of rotation — say, 45 degrees from center — but not the direction — left or right.”
According to the researchers, this step, which wasn’t built into the algorithm and appeared to mimic the human brain in its recognition of faces and objects, was “an indication that their system and the brain are doing something similar.”
Christof Koch of the Allen Institute for Brain Science considered the findings significant.
“In this day and age, when everything is dominated by either big data or huge computer simulations, this shows you how a principled understanding of learning can explain some puzzling findings,” said Koch.
Neural networks of half a century ago consisted of fewer than five layers. Today, graphics processing units can power neural networks with depths of up to 50 layers.
For data scientists to fully understand deep learning, they need a solid foundational knowledge of neural networks.
The process is fairly straightforward.
The nodes throughout the layers of a neural network receive input signals and perform calculations that result in output signals that the nodes then feed forward to the next layer. The more layers of nodes, the deeper the network.
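The feed-forward step can be sketched in a few lines. This is a minimal illustration with assumed names (`build_network`, `forward`), random untrained weights and a ReLU activation chosen for simplicity. Real networks often use a different activation on the output layer.

```python
import numpy as np

def relu(x):
    """Activation: pass positive signals through, zero out the rest."""
    return np.maximum(0.0, x)

def build_network(layer_sizes, seed=0):
    """Create one (weights, biases) pair per connection between layers."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

def forward(inputs, layers):
    """Feed the signal forward: each layer's output is the next layer's input."""
    signal = inputs
    for weights, biases in layers:
        signal = relu(signal @ weights + biases)
    return signal

# 4 input nodes, two hidden layers of 8 nodes each, 2 output nodes.
network = build_network([4, 8, 8, 2])
output = forward(np.array([0.5, -1.0, 2.0, 0.1]), network)
print(output.shape)
```

Adding more entries to the `layer_sizes` list adds more hidden layers, which is what makes a network deeper.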
The connection that allows the transmission of these signals, the synapse, is associated with a weight that determines the influence of the node on the final output. During training, the synapse weights are adjusted repeatedly.
According to McCullum, weights “are a very important topic in the field of deep learning because adjusting a model’s weights is the primary way through which deep learning models are trained.”
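Repeated weight adjustment can be illustrated with a single node trained by gradient descent. The sketch below is an assumption-laden toy: the function name `train_weights`, the data and the hidden "true" weights are invented for the example, and deep networks apply the same idea across many layers using backpropagation.

```python
import numpy as np

def train_weights(inputs, targets, steps=200, lr=0.1):
    """Adjust one node's weights repeatedly to shrink its prediction error."""
    weights = np.zeros(inputs.shape[1])
    for _ in range(steps):
        predictions = inputs @ weights
        error = predictions - targets
        # Gradient of the mean squared error with respect to the weights.
        gradient = 2.0 * inputs.T @ error / len(targets)
        weights -= lr * gradient   # the weight adjustment step
    return weights

rng = np.random.default_rng(0)
inputs = rng.normal(size=(100, 2))
true_weights = np.array([2.0, -3.0])   # hidden rule the node should learn
targets = inputs @ true_weights

learned = train_weights(inputs, targets)
print(learned)
```

Each pass nudges the weights in the direction that reduces the error, so the learned values move toward the weights that generated the targets.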
The output layer of the neural network makes predictions based on the calculations of the hidden layers in a deep network. The training process consists of the network determining which input values should be used in the next layer’s calculations.
In order for the computer to make these determinations, the program must be soft coded to enable the computer to interpret the problem and solve it on its own.
Uses: The Good & Bad
The ethical implications of deepfake technology have been debated for the last several years. While there are many beneficial uses for image and video manipulation, researchers will need to stay on top of this evolving branch of artificial intelligence and continue to hone their skills to guard against harmful applications employed by bad actors.
The threat inherent in such powerful technology is real, and it increases with each new development. As deepfakes improve and become less detectable, the risk to our security expands in scope and potential impact.
According to Nguyen and colleagues, deepfakes become a threat to the world when they are used to falsify the speech and actions of world leaders.
“Deepfakes therefore can be abused to cause political or religion tensions between countries, to fool public and affect results in election campaigns, or create chaos in financial markets by creating fake news,” the authors posited.
- Intercultural communication
- Disruption of extremist groups
Laws and policies have been implemented to thwart the exploitation of deepfake methods, but critics say many of these rules don’t go far enough.
After Facebook announced its policy to ban deepfakes, The Guardian reported that “Facebook did not give a reason as to why it limited its policy exclusively to those videos manipulated using AI tools, but it is likely that the company wanted to avoid putting itself in a situation where it had to make subjective decisions about intent or truth.”
And according to IEEE Spectrum, “Identity fraud was the top worry regarding deepfakes for more than three-quarters of respondents to a cybersecurity industry poll by the biometric firm iProov.”
Data scientists who work in the financial services industry will likely be at the forefront of AI for detecting deepfakes and performing digital media forensics.
- Identity fraud
As human rights activist Mark Latonero argued, we need data scientists and technology companies to take a proactive approach to the inevitable erosion of trust that will follow the deepfake evolution. If the notion that we, as mere humans, can’t trust anything we see takes hold, it will undermine democracy.
“Now is really the time for companies, researchers, and others to build these very strong connections to civil society, and the different country offices where your products might launch. … Engage with the people who are closest to the issues in these countries. Build those alliances now,” he said. “When something does go wrong — and it will — we can start to have the foundation for collaboration and knowledge exchange.”