OpenFungi: A Machine Learning Dataset for Fungal Image Recognition Tasks

Bibliographic Details
Main Authors: Anca Cighir, Roland Bolboacă, Teri Lenard
Format: Article
Language: English
Published: MDPI AG, 2025-07-01
Series: Life
Online Access: https://www.mdpi.com/2075-1729/15/7/1132
Description
Summary: A key aspect driving advances in machine learning applications in medicine is the availability of publicly accessible datasets. Many past studies report promising results, yet they are not reproducible because the data used are closed, proprietary, or were never published by the authors. The current study aims to narrow this gap for researchers working on image recognition tasks in microbiology, specifically fungal identification and classification. This work makes available an open database named OpenFungi, containing high-quality images of macroscopic and microscopic fungal genera. The fungal cultures were grown from food products such as green leaf spices and cereals. The quality of the dataset is demonstrated by solving a classification problem with a simple convolutional neural network. A thorough experimental analysis was conducted, in which six performance metrics were measured across three distinct validation scenarios. The results show that, in the fungal species classification task, the model achieved an overall accuracy of 99.79%, a true-positive rate of 99.55%, a true-negative rate of 99.96%, and an F1 score of 99.63% on the macroscopic dataset. On the microscopic dataset, the model reached 97.82% accuracy, a 94.89% true-positive rate, a 99.19% true-negative rate, and a 95.20% F1 score. The results also show that the model maintains promising performance even when trained on smaller datasets, highlighting its robustness and generalization capabilities.
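The abstract reports overall accuracy, true-positive rate, true-negative rate, and F1 score for a multi-class classification task. The snippet below is a minimal sketch of how these metrics are conventionally computed from a multi-class confusion matrix (macro-averaged, one-vs-rest); it is not the authors' evaluation code, and the example confusion matrix is hypothetical.

```python
# Minimal sketch (not the authors' code): standard one-vs-rest metrics
# computed from a multi-class confusion matrix, macro-averaged over classes.
import numpy as np

def classification_metrics(conf: np.ndarray) -> dict:
    """conf[i, j] = number of samples of true class i predicted as class j."""
    total = conf.sum()
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp   # predicted as class k, actually another class
    fn = conf.sum(axis=1) - tp   # actually class k, predicted as another class
    tn = total - tp - fp - fn

    tpr = tp / (tp + fn)                     # sensitivity / recall per class
    tnr = tn / (tn + fp)                     # specificity per class
    precision = tp / (tp + fp)
    f1 = 2 * precision * tpr / (precision + tpr)

    return {
        "accuracy": tp.sum() / total,        # overall accuracy
        "tpr": tpr.mean(),                   # macro-averaged true-positive rate
        "tnr": tnr.mean(),                   # macro-averaged true-negative rate
        "f1": f1.mean(),                     # macro-averaged F1 score
    }

# Hypothetical 3-class confusion matrix for illustration only.
conf = np.array([[48, 1, 1],
                 [2, 47, 1],
                 [0, 1, 49]])
print(classification_metrics(conf))
```

Under these definitions, per-class rates are averaged with equal class weight, which is one common way to report true-positive and true-negative rates for multi-class problems; the paper itself should be consulted for the exact averaging scheme used.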
ISSN: 2075-1729