Deepfake Video Detection with Convolutional and Recurrent Networks
Related Work
Prior research has examined a variety of methods for detecting deepfake videos. Güera and Delp [4] propose a convolutional LSTM model, in which a CNN extracts features from each frame of a video and an LSTM processes the resulting sequence of frame features to predict whether the video is real or a deepfake.
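As a concrete illustration (a minimal sketch, not the implementation of [4]), the model can be written as follows, assuming PyTorch with a ResNet-18 backbone as the frame-level feature extractor; the backbone, hidden size, and classification head are illustrative choices:

```python
import torch
import torch.nn as nn
from torchvision import models


class ConvLSTMDetector(nn.Module):
    """A CNN embeds each frame; an LSTM models the frame sequence."""

    def __init__(self, hidden_size=256):
        super().__init__()
        backbone = models.resnet18(weights=None)   # illustrative backbone choice
        feat_dim = backbone.fc.in_features         # 512 for ResNet-18
        backbone.fc = nn.Identity()                # keep features, drop the classifier
        self.cnn = backbone
        self.lstm = nn.LSTM(feat_dim, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)      # single real-vs-fake logit

    def forward(self, clips):                      # clips: (B, T, C, H, W)
        b, t, c, h, w = clips.shape
        feats = self.cnn(clips.view(b * t, c, h, w))   # per-frame features, (B*T, 512)
        _, (h_n, _) = self.lstm(feats.view(b, t, -1))  # final hidden state of the LSTM
        return self.head(h_n[-1])                      # (B, 1) logits


logits = ConvLSTMDetector()(torch.randn(2, 16, 3, 224, 224))  # 2 clips of 16 frames
```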
Lima et al. [5] also use a convolutional LSTM model to detect deepfake videos, but find that a 3D CNN, whose convolutions span the temporal dimension as well as the spatial ones, far outperforms the convolutional LSTM, achieving over 20% higher accuracy.
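A toy 3D CNN in the same spirit is sketched below; the layer configuration is an illustrative assumption and is far smaller than the networks evaluated in [5]:

```python
import torch
import torch.nn as nn


class Toy3DCNN(nn.Module):
    """3D convolutions slide over time as well as space, so temporal
    artifacts are learned jointly with spatial ones rather than by a
    separate recurrent stage."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1),   # input: (B, C, T, H, W)
            nn.ReLU(),
            nn.MaxPool3d(2),                              # halve T, H, and W
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),                      # global spatiotemporal pooling
        )
        self.head = nn.Linear(32, 1)                      # real-vs-fake logit

    def forward(self, clips):
        return self.head(self.features(clips).flatten(1))


logits = Toy3DCNN()(torch.randn(2, 3, 16, 112, 112))  # 2 clips, 16 frames each
```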
Although the ideas and models presented in these papers served as inspiration for the models we developed, other work has focused on different approaches to deepfake video detection.
Mittal et al. [6] extract face and speech features from videos and feed the features of a real video and a fake video of the same subject into two CNNs, one for the face features and another for the speech features.
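A heavily simplified sketch of this two-stream idea follows; the feed-forward encoders standing in for the two CNNs, the feature dimensions, and the distance-based comparison of real and fake embeddings are illustrative assumptions, not the exact design of [6]:

```python
import torch
import torch.nn as nn


def encoder(in_dim, out_dim=128):
    """Stand-in encoder for one modality's precomputed features."""
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, out_dim))


face_net, speech_net = encoder(512), encoder(128)   # one network per modality

def embed(face_feats, speech_feats):
    """Joint embedding of a video's face and speech features."""
    return torch.cat([face_net(face_feats), speech_net(speech_feats)], dim=-1)

# Compare embeddings of a real and a suspected-fake video of the same subject;
# a large distance between the two suggests manipulation.
real = embed(torch.randn(1, 512), torch.randn(1, 128))
fake = embed(torch.randn(1, 512), torch.randn(1, 128))
mismatch = torch.dist(real, fake)
```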
In practice, this could be an effective method of detecting deepfake videos because both the audio and the visual appearance of a deepfake are typically altered. However, in the Facebook Deepfake Detection Challenge dataset [1], only the appearance of people's faces has been altered, not the audio.