Advanced deep learning and transfer learning approaches for breast cancer classification using advanced multi-line classifiers and datasets with model optimization and interpretability

This study evaluated machine learning (ML) models on the Wisconsin Breast Cancer Dataset (WBCD), refined to 554 unique instances after addressing 5% missing values via mean imputation, removing 15 duplicates, and normalizing features with Min–Max scaling. Data were split into 80% training and 20% te...

Bibliographic Details
Main Authors: Xiang Zhang, Wei Shao, Ming Qiu, Chenglin Xiao, Liming Ma
Format: Article
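Language: English
Published: PeerJ Inc. 2025-07-01
Series: PeerJ Computer Science
Subjects: Breast cancer diagnosis; Machine learning; Supervised learning; Random forest; XGBoost; Deep neural networks (DNN)
Online Access: https://peerj.com/articles/cs-2951.pdf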
author Xiang Zhang
Wei Shao
Ming Qiu
Chenglin Xiao
Liming Ma
collection DOAJ
description This study evaluated machine learning (ML) models on the Wisconsin Breast Cancer Dataset (WBCD), refined to 554 unique instances after addressing 5% missing values via mean imputation, removing 15 duplicates, and normalizing features with Min–Max scaling. Data were split into 80% training and 20% testing, maintaining a 63% benign and 37% malignant distribution. Using 10-fold cross-validation, the random forest, XGBoost, and deep neural network (DNN) models achieved accuracies of 96.5% (95% CI: [93.1–98.6]), 97.4% (95% CI: [94.2–99.1]), and 98.0% (95% CI: [95.1–99.5]), respectively. The DNN demonstrated a benign precision of 0.97, malignant precision of 1.00, benign recall of 1.00, malignant recall of 0.95, and F1-scores of 0.99 and 0.98, with an ROC-AUC of 0.992 (p < 0.001); its accuracy further improved to 98.9% after Bayesian hyperparameter tuning. Additionally, a convolutional neural network (CNN) using transfer learning (VGG16) achieved 99.3% accuracy, with precision and recall of 99.4% and 99.2%, respectively, although potential domain mismatch issues warrant caution. Optimized DNN and CNN models achieved high accuracy, demonstrating highly reliable diagnostic performance with promising clinical applicability.
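The preprocessing and reporting steps named in the description (mean imputation, duplicate removal, Min–Max scaling, per-class precision/recall/F1, and a 95% confidence interval on accuracy) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the record does not say which libraries or which interval method the study used, so the Wilson score interval, all function names, and the toy rows and counts below are assumptions made for the example.

```python
import math


def impute_mean(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def drop_duplicates(rows):
    """Keep the first occurrence of each row, preserving the original order."""
    seen, unique = set(), []
    for r in rows:
        key = tuple(r)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def min_max_scale(rows):
    """Rescale every column to [0, 1]; a constant column is mapped to 0.0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
         for j, v in enumerate(r)]
        for r in rows
    ]


def precision_recall_f1(tp, fp, fn):
    """Per-class metrics from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (assumed method, z=1.96 for ~95%)."""
    p = correct / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


# Hypothetical toy data: two features, one missing value, one duplicate row.
rows = [[1.0, 10.0], [None, 20.0], [3.0, 30.0], [3.0, 30.0]]
clean = min_max_scale(drop_duplicates(impute_mean(rows)))

# Hypothetical counts for one class, chosen only to exercise the functions.
precision, recall, f1 = precision_recall_f1(tp=19, fp=0, fn=1)
ci_low, ci_high = wilson_interval(correct=98, total=100)
```

The steps are applied in the order the description lists them (impute, deduplicate, scale). In a real pipeline the imputation means and the scaling minima and maxima would normally be computed on the training split only and then reused on the test split, to avoid leaking test information into training.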
format Article
id doaj-art-4d3c4a1ded534a6e9f73bff6a5fe68b0
institution Matheson Library
issn 2376-5992
language English
publishDate 2025-07-01
publisher PeerJ Inc.
record_format Article
series PeerJ Computer Science
spelling doaj-art-4d3c4a1ded534a6e9f73bff6a5fe68b0
2025-07-11T15:05:19Z
eng
PeerJ Inc.
PeerJ Computer Science
2376-5992
2025-07-01
11
e2951
10.7717/peerj-cs.2951
Advanced deep learning and transfer learning approaches for breast cancer classification using advanced multi-line classifiers and datasets with model optimization and interpretability
Xiang Zhang (Department of Information Management Center, Zhongshan Hospital Affiliated to Dalian University, Dalian, Liaoning, China)
Wei Shao (Department of Information Center, The First Hospital of China Medical University, Shenyang, Liaoning, China)
Ming Qiu (Shenzhen Boyi Technology Co., Ltd., Shenzhen, Guangdong, China)
Chenglin Xiao (Research and Development Center, Shenzhen Boyi Technology Co., Ltd., Shenzhen, Guangdong, China)
Liming Ma (Department of Information Center, Foshan Women and Children Hospital, Foshan, Guangdong, China)
This study evaluated machine learning (ML) models on the Wisconsin Breast Cancer Dataset (WBCD), refined to 554 unique instances after addressing 5% missing values via mean imputation, removing 15 duplicates, and normalizing features with Min–Max scaling. Data were split into 80% training and 20% testing, maintaining a 63% benign and 37% malignant distribution. Using 10-fold cross-validation, the random forest, XGBoost, and deep neural network (DNN) models achieved accuracies of 96.5% (95% CI: [93.1–98.6]), 97.4% (95% CI: [94.2–99.1]), and 98.0% (95% CI: [95.1–99.5]), respectively. The DNN demonstrated a benign precision of 0.97, malignant precision of 1.00, benign recall of 1.00, malignant recall of 0.95, and F1-scores of 0.99 and 0.98, with an ROC-AUC of 0.992 (p < 0.001); its accuracy further improved to 98.9% after Bayesian hyperparameter tuning. Additionally, a convolutional neural network (CNN) using transfer learning (VGG16) achieved 99.3% accuracy, with precision and recall of 99.4% and 99.2%, respectively, although potential domain mismatch issues warrant caution. Optimized DNN and CNN models achieved high accuracy, demonstrating highly reliable diagnostic performance with promising clinical applicability.
https://peerj.com/articles/cs-2951.pdf
Breast cancer diagnosis
Machine learning
Supervised learning
Random forest
XGBoost
Deep neural networks (DNN)
title Advanced deep learning and transfer learning approaches for breast cancer classification using advanced multi-line classifiers and datasets with model optimization and interpretability
topic Breast cancer diagnosis
Machine learning
Supervised learning
Random forest
XGBoost
Deep neural networks (DNN)
url https://peerj.com/articles/cs-2951.pdf