Conformal taxonomic validation: A semi-automated validation framework for citizen science records

Bibliographic Details
Main Authors: Matthieu de Castelbajac, Sandra Bringay, Arnaud Sallaberry, Maximilien Servajean, Clémence Epinoux, Juan Carlos Molinero, Delphine Bonnet
Format: Article
Language: English
Published: Elsevier 2025-12-01
Series: Ecological Informatics
Online Access: http://www.sciencedirect.com/science/article/pii/S1574954125002997
Description
Summary: Citizen science records are a valuable source of marine biodiversity data, especially where standardized sampling campaigns are limited in spatial or temporal scope. However, such records often contain biases and errors and typically require expert validation before they can reliably support scientific research. Validating large volumes of citizen science data remains an important challenge. In this paper, we present a semi-automated validation framework that combines a deep learning classifier with conformal prediction to generate sets of plausible taxonomic labels at multiple ranks, while providing rigorous control over prediction confidence. Extensive evaluation was carried out using 25,000 jellyfish records, both with and without prior validation, as well as against 800 expert-validated entries. Our results show that the method frequently produces singleton prediction sets that can be accepted automatically, offering a high-confidence and scalable solution for validating marine citizen science data.
ISSN:1574-9541
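
Illustrative note: the summary describes pairing a classifier with conformal prediction so that each record receives a set of plausible labels, with singleton sets accepted automatically. The sketch below shows generic split conformal prediction for a single taxonomic rank. It is not the authors' implementation: the function names, the nonconformity score (one minus the softmax probability), the toy calibration data, and the miscoverage level `alpha` are all assumptions made for illustration, and the multi-rank aspect of the paper is not covered.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Score threshold from held-out calibration records (hypothetical helper)."""
    n = len(cal_labels)
    # Nonconformity score: 1 - probability the classifier gave the true label.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected quantile level, capped at 1.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores, level, method="higher")

def prediction_set(probs, threshold):
    """Indices of all labels whose score is at or below the threshold."""
    return np.flatnonzero(1.0 - probs <= threshold)

# Toy calibration set: 4 records, 3 candidate taxa (assumed values).
cal_probs = np.array([
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.40, 0.30, 0.30],
    [0.70, 0.20, 0.10],
])
cal_labels = np.array([0, 1, 2, 0])
threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.25)

confident = prediction_set(np.array([0.95, 0.03, 0.02]), threshold)
ambiguous = prediction_set(np.array([0.50, 0.45, 0.05]), threshold)

# Singleton sets could be accepted automatically; larger sets go to an expert.
auto_accept = len(confident) == 1
needs_review = len(ambiguous) > 1
```

Under the usual exchangeability assumption, sets built this way contain the true label with probability at least `1 - alpha`, which is the kind of confidence control the summary refers to. A confident record yields a one-label set, while an ambiguous one yields several candidates for expert review.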