Explainable machine learning model for predicting internal mammary node metastasis in breast cancer: Multi-method development and cross-cohort validation

Bibliographic Details
Main Authors:	Yirong Xiang, Jian Tie, Siyuan Zhang, Chen Shi, Changkuo Guo, Yushuo Peng, Zhaoqing Fan, Weihu Wang
Format:	Article
Language:	English
Published:	Elsevier 2025-08-01
Series:	Breast
Subjects:	Breast cancer; Internal mammary lymph node metastasis; Machine learning; SHAP
Online Access:	http://www.sciencedirect.com/science/article/pii/S096097762500534X

Description
Summary:	Background: This study developed an explainable machine learning model for predicting baseline internal mammary lymph node metastasis (IMNM) in patients with breast cancer. Materials and methods: This study included three cohorts: a derivation cohort (n = 1997) from Peking University Cancer Hospital, a temporal testing cohort (n = 633) from the same center, and a SEER cohort (n = 51,420). Multiple machine learning strategies were applied: Least Absolute Shrinkage and Selection Operator (LASSO), Boruta, backward stepwise regression, and best subset selection were used for feature selection, and logistic regression (LR), support vector machines (SVM), k-nearest neighbors (KNN), and extreme gradient boosting (XGBoost) were used for model construction. The best-performing model was validated in the internal and temporal testing cohorts. Shapley Additive Explanations (SHAP) analysis was performed to improve interpretability. Results: Six clinical features (clinical N stage, size, stage, classification, grade, and location) were used to construct the final predictive model with SVM. The model achieved robust performance, with areas under the curve (AUCs) of 0·811 (0·790–0·843), 0·806 (0·760–0·857), and 0·864 (0·830–0·926) in the training, internal testing, and temporal testing cohorts, respectively. High-risk patients had significantly worse disease-free survival (DFS; HR 2·776, 95% CI: 1·897–4·064, p < 0·001) and overall survival (OS; HR 1·962, 95% CI: 1·853–2·077, p < 0·001). An online prediction tool was developed that lets users enter key clinical variables and obtain model-predicted probabilities along with SHAP-based explanations. Conclusion: This validated, explainable machine learning model offers a practical tool for early risk stratification, helping clinicians select appropriate baseline imaging and plan adjuvant treatment.
ISSN:	1532-3080
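The summary reports each AUC with a bracketed interval but does not state how those intervals were obtained. A minimal sketch of one common approach, assuming a percentile bootstrap, is shown below in Python. The AUC is computed as the Mann–Whitney probability that a randomly chosen IMNM-positive case receives a higher predicted score than a randomly chosen negative case, and the interval is read from the 2.5th and 97.5th percentiles of bootstrap resamples. The function names `auc_mann_whitney` and `bootstrap_auc_ci` are hypothetical and are not taken from the study or its online tool.

```python
import random


def auc_mann_whitney(labels, scores):
    """Return the AUC: the probability that a randomly chosen positive case
    (label 1) scores higher than a randomly chosen negative case (label 0).
    Tied scores count as half a win. Assumes higher score = higher risk."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap (1 - alpha) interval for the AUC.
    Assumption: the study's actual CI method is not stated in the summary."""
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)  # fixed seed for reproducible intervals
    aucs = []
    while len(aucs) < n_boot:
        # resample patients with replacement
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip single-class resamples (AUC undefined)
            aucs.append(auc_mann_whitney(ys, [scores[i] for i in idx]))
    aucs.sort()
    lo = aucs[int((alpha / 2) * n_boot)]
    hi = aucs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


if __name__ == "__main__":
    # toy labels and predicted probabilities, not study data
    y = [0, 0, 1, 1, 0, 1, 0, 1]
    p = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9]
    print(auc_mann_whitney(y, p), bootstrap_auc_ci(y, p))
```

For example, with labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four positive–negative pairs are ranked correctly, so `auc_mann_whitney` returns 0.75. Resamples containing only one class are skipped because the AUC is undefined for them; a stratified bootstrap or DeLong's method would be reasonable alternatives.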