Dataset Dependency in CNN-Based Copy-Move Forgery Detection: A Multi-Dataset Comparative Analysis

Bibliographic Details
Main Authors: Potito Valle Dell’Olmo, Oleksandr Kuznetsov, Emanuele Frontoni, Marco Arnesano, Christian Napoli, Cristian Randieri
Format: Article
Language: English
Published: MDPI AG 2025-06-01
Series: Machine Learning and Knowledge Extraction
Online Access: https://www.mdpi.com/2504-4990/7/2/54
Description
Summary: Convolutional neural networks (CNNs) have become a fundamental tool in copy-move forgery detection because of their ability to identify and analyze manipulated images. Copy-move forgery nevertheless remains a persistent challenge in digital image forensics, which underlines the importance of ensuring the integrity of digital visual content. In this study, we present a systematic evaluation of a CNN specifically designed for copy-move manipulation detection, applied to three datasets widely used in the digital forensics literature: CoMoFoD, Coverage, and CASIA v2. Our experimental analysis revealed significant variability in the results, with accuracy ranging from 95.90% on CoMoFoD to 27.50% on Coverage. We attribute this variability to structural factors of the datasets, such as sample size, the degree of class imbalance, and the intrinsic complexity of the manipulations. We also investigated regularization techniques and data augmentation strategies to understand their impact on network performance. Adopting the L2 penalty and reducing the learning rate increased accuracy by up to 2.5% on CASIA v2, while on CoMoFoD the impact was much more modest (1.3%). Similarly, data augmentation improved performance on large datasets but was ineffective on smaller ones. Our results challenge the idea that CNN architectures generalize universally in copy-move forgery detection and show instead that performance depends strongly on the intrinsic characteristics of the dataset under consideration. Finally, we propose operational recommendations for optimizing the training process, choosing the dataset, and defining robust evaluation protocols, aimed at guiding the development of more reliable and generalizable detection systems.
ISSN: 2504-4990