Advanced deep learning and transfer learning approaches for breast cancer classification using advanced multi-line classifiers and datasets with model optimization and interpretability
This study evaluated machine learning (ML) models on the Wisconsin Breast Cancer Dataset (WBCD), refined to 554 unique instances after addressing 5% missing values via mean imputation, removing 15 duplicates, and normalizing features with Min–Max scaling. Data were split into 80% training and 20% te...
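The preprocessing steps named in the summary above (mean imputation, duplicate removal, Min–Max scaling, and a class-preserving 80/20 split) can be illustrated with a minimal, standard-library Python sketch. The function names, the toy rows, and the order in which the steps are chained are assumptions made for illustration; the record does not show the authors' code.

```python
import random
from collections import defaultdict


def impute_mean(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def drop_duplicates(rows, labels):
    """Keep only the first occurrence of each (features, label) pair."""
    seen, out_rows, out_labels = set(), [], []
    for r, y in zip(rows, labels):
        key = (tuple(r), y)
        if key not in seen:
            seen.add(key)
            out_rows.append(r)
            out_labels.append(y)
    return out_rows, out_labels


def min_max_scale(rows):
    """Rescale every column to [0, 1]; constant columns map to 0.0."""
    n_cols = len(rows[0])
    lo = [min(r[j] for r in rows) for j in range(n_cols)]
    hi = [max(r[j] for r in rows) for j in range(n_cols)]
    return [
        [(v - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
         for j, v in enumerate(r)]
        for r in rows
    ]


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), sampling test rows separately per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


# Toy data (hypothetical, not taken from WBCD): None marks a missing value.
X = [[1.0, 10.0], [2.0, None], [2.0, None], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
y = ["B", "B", "B", "M", "B", "M"]

X_clean, y_clean = drop_duplicates(impute_mean(X), y)
X_scaled = min_max_scale(X_clean)
train_idx, test_idx = stratified_split(y_clean, test_fraction=0.2, seed=0)
```

In this sketch the scaling bounds are computed on all rows before splitting, purely for brevity. A common alternative is to compute them on the training split only, so that no information from the test rows reaches the model; the record does not say which approach the study used.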
Main Authors: | Xiang Zhang, Wei Shao, Ming Qiu, Chenglin Xiao, Liming Ma |
---|---|
Format: | Article |
Language: | English |
Published: | PeerJ Inc. 2025-07-01 |
Series: | PeerJ Computer Science |
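Subjects: | Breast cancer diagnosis, Machine learning, Supervised learning, Random forest, XGBoost, Deep neural networks (DNN) |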
Online Access: | https://peerj.com/articles/cs-2951.pdf |
_version_ | 1839631468180013056 |
---|---|
author | Xiang Zhang Wei Shao Ming Qiu Chenglin Xiao Liming Ma |
author_facet | Xiang Zhang Wei Shao Ming Qiu Chenglin Xiao Liming Ma |
author_sort | Xiang Zhang |
collection | DOAJ |
description | This study evaluated machine learning (ML) models on the Wisconsin Breast Cancer Dataset (WBCD), refined to 554 unique instances after addressing 5% missing values via mean imputation, removing 15 duplicates, and normalizing features with Min–Max scaling. Data were split into 80% training and 20% testing, maintaining a 63% benign and 37% malignant distribution. Using 10-fold cross-validation, the random forest, XGBoost, and deep neural network (DNN) models achieved accuracies of 96.5% (95% CI: [93.1–98.6]), 97.4% (95% CI: [94.2–99.1]), and 98.0% (95% CI: [95.1–99.5]), respectively. The DNN demonstrated a benign precision of 0.97, malignant precision of 1.00, benign recall of 1.00, malignant recall of 0.95, and F1-scores of 0.99 and 0.98, with an ROC-AUC of 0.992 (p < 0.001); its accuracy further improved to 98.9% after Bayesian hyperparameter tuning. Additionally, a convolutional neural network (CNN) using transfer learning (VGG16) achieved 99.3% accuracy, with precision and recall of 99.4% and 99.2%, respectively, although potential domain mismatch issues warrant caution. Optimized DNN and CNN models achieved high accuracy, demonstrating highly reliable diagnostic performance with promising clinical applicability. |
format | Article |
id | doaj-art-4d3c4a1ded534a6e9f73bff6a5fe68b0 |
institution | Matheson Library |
issn | 2376-5992 |
language | English |
publishDate | 2025-07-01 |
publisher | PeerJ Inc. |
record_format | Article |
series | PeerJ Computer Science |
spelling | doaj-art-4d3c4a1ded534a6e9f73bff6a5fe68b0; 2025-07-11T15:05:19Z; eng; PeerJ Inc.; PeerJ Computer Science; 2376-5992; 2025-07-01; 11; e2951; 10.7717/peerj-cs.2951; Advanced deep learning and transfer learning approaches for breast cancer classification using advanced multi-line classifiers and datasets with model optimization and interpretability; Xiang Zhang (0), Wei Shao (1), Ming Qiu (2), Chenglin Xiao (3), Liming Ma (4); (0) Department of Information Management Center, Zhongshan Hospital Affiliated to Dalian University, Dalian, Liaoning, China; (1) Department of Information Center, The First Hospital of China Medical University, Shenyang, Liaoning, China; (2) Shenzhen Boyi Technology Co., Ltd., Shenzhen, Guangdong, China; (3) Research and Development Center, Shenzhen Boyi Technology Co., Ltd., Shenzhen, Guangdong, China; (4) Department of Information Center, Foshan Women and Children Hospital, Foshan, Guangdong, China; This study evaluated machine learning (ML) models on the Wisconsin Breast Cancer Dataset (WBCD), refined to 554 unique instances after addressing 5% missing values via mean imputation, removing 15 duplicates, and normalizing features with Min–Max scaling. Data were split into 80% training and 20% testing, maintaining a 63% benign and 37% malignant distribution. Using 10-fold cross-validation, the random forest, XGBoost, and deep neural network (DNN) models achieved accuracies of 96.5% (95% CI: [93.1–98.6]), 97.4% (95% CI: [94.2–99.1]), and 98.0% (95% CI: [95.1–99.5]), respectively. The DNN demonstrated a benign precision of 0.97, malignant precision of 1.00, benign recall of 1.00, malignant recall of 0.95, and F1-scores of 0.99 and 0.98, with an ROC-AUC of 0.992 (p < 0.001); its accuracy further improved to 98.9% after Bayesian hyperparameter tuning. Additionally, a convolutional neural network (CNN) using transfer learning (VGG16) achieved 99.3% accuracy, with precision and recall of 99.4% and 99.2%, respectively, although potential domain mismatch issues warrant caution. Optimized DNN and CNN models achieved high accuracy, demonstrating highly reliable diagnostic performance with promising clinical applicability.; https://peerj.com/articles/cs-2951.pdf; Breast cancer diagnosis; Machine learning; Supervised learning; Random forest; XGBoost; Deep neural networks (DNN) |
spellingShingle | Xiang Zhang Wei Shao Ming Qiu Chenglin Xiao Liming Ma Advanced deep learning and transfer learning approaches for breast cancer classification using advanced multi-line classifiers and datasets with model optimization and interpretability PeerJ Computer Science Breast cancer diagnosis Machine learning Supervised learning Random forest XGBoost Deep neural networks (DNN) |
title | Advanced deep learning and transfer learning approaches for breast cancer classification using advanced multi-line classifiers and datasets with model optimization and interpretability |
title_sort | advanced deep learning and transfer learning approaches for breast cancer classification using advanced multi line classifiers and datasets with model optimization and interpretability |
topic | Breast cancer diagnosis Machine learning Supervised learning Random forest XGBoost Deep neural networks (DNN) |
url | https://peerj.com/articles/cs-2951.pdf |
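
The description above reports each accuracy with a 95% confidence interval but does not say how the intervals were computed. As one common possibility, the sketch below computes a Wilson score interval for a binomial proportion. The choice of the Wilson method, the function name `wilson_interval`, and the example counts are assumptions for illustration and are not taken from the article.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion.

    successes: number of correct predictions
    n:         number of evaluated cases
    z:         standard-normal quantile (about 1.96 for a 95% interval)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts: 193 correct predictions out of 200 evaluated cases.
low, high = wilson_interval(193, 200)
print(f"accuracy = {193 / 200:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

Unlike the simple normal-approximation interval, the Wilson interval stays inside [0, 1] and behaves sensibly when the accuracy is close to 100%, which is the regime reported here.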