GenConVit+: Advanced hybrid framework for deepfake detection for safeguarding digital media integrity

DOI:
https://doi.org/10.62110/sciencein.jist.2024.v12.820
Keywords:
Deep Learning, DeepFake detection, Computer Vision Transformer, 3D CNN, Hybrid models, Convolutional Neural Network (CNN)
Abstract
The propagation of deepfake videos has raised serious concerns, particularly about their potential to circulate misleading information and undermine the integrity of digital media. In response to this challenge, we present the Generative Convolutional Vision Transformer (GenConViT) as a robust solution for deepfake video detection. GenConViT integrates the strengths of the ConvNeXt and Swin Transformer models with a 3D convolutional neural network (CNN) to extract relevant features. It further uses Autoencoders and Variational Autoencoders to discern patterns in the latent data distribution. The model is trained and evaluated on four distinct datasets: DFDC, FF++, DeepFakeTIMIT, and Celeb-DF (v2). On these datasets, GenConViT achieves high classification accuracy, F1 scores, and AUC values. It addresses the challenge of generalizability in deepfake detection by effectively distinguishing a wide spectrum of falsified videos. On average, the GenConViT model attains an accuracy of 95.6% and an AUC of 99.3% across the datasets we examined. These results underscore its capacity to robustly detect deepfake content and help maintain the integrity of digital media.
URN:NBN:sciencein.jist.2024.v12.820
License
Copyright (c) 2024 Mithun B. Patil, Vijay A. Sangolgi, Vipul V. Bag, Abdul Basit Patwegar, Rohini Koli, Aafra Naikwadi, Abdul Gani Shaikh

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.