Explainable machine learning model for predicting internal mammary node metastasis in breast cancer: Multi-method development and cross-cohort validation
Main Authors: | , , , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2025-08-01 |
Series: | Breast |
Online Access: | http://www.sciencedirect.com/science/article/pii/S096097762500534X |
Summary: | Background: This study developed an explainable machine learning model for predicting baseline internal mammary lymph node metastasis (IMNM) in breast cancer patients. Materials and methods: This study included three cohorts: a derivation cohort (n = 1997) from Peking University Cancer Hospital, a temporal testing cohort (n = 633) from the same center, and a SEER cohort (n = 51,420). Several machine learning strategies were applied: Least Absolute Shrinkage and Selection Operator (LASSO), Boruta, backward stepwise regression, and best subset selection for feature selection; and logistic regression (LR), support vector machines (SVM), k-nearest neighbors (KNN), and extreme gradient boosting (XGBoost) for model construction. The best-performing model was validated in the internal and temporal testing cohorts. Shapley Additive Explanations (SHAP) analysis was conducted to improve interpretability. Results: Six clinical features (clinical N stage, size, stage, classification, grade and location) were used to construct the final predictive model with SVM. The model achieved robust performance, with AUCs of 0·811 (0·790–0·843), 0·806 (0·760–0·857) and 0·864 (0·830–0·926) in the training, internal testing and temporal testing cohorts, respectively. High-risk patients had significantly worse disease-free survival (DFS; HR 2·776, 95 % CI: 1·897–4·064, p < 0·001) and overall survival (OS; HR 1·962, 95 % CI: 1·853–2·077, p < 0·001). An online prediction tool was established that allows users to input key clinical variables and obtain model-predicted probabilities along with SHAP-based explanations. Conclusion: This validated and explainable machine learning model offers a practical tool for early risk stratification, aiding clinicians in appropriate baseline imaging selection and adjuvant treatment planning. |
---|---|
ISSN: | 1532-3080 |
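
The Summary reports discrimination as an AUC followed by an interval in parentheses. As a rough illustration of what such a figure measures, the sketch below computes the AUC as the probability that a positive case is scored above a negative one, and attaches a percentile bootstrap interval. The record does not state how the study obtained its intervals, so the bootstrap here is an assumption; the function names `auc` and `bootstrap_auc_ci` and the toy data are hypothetical and not taken from the article.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method)."""
    if len(set(labels)) < 2:
        raise ValueError("need both classes to bootstrap the AUC")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacks one class; AUC undefined, draw again
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data invented for illustration only (not from the study).
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
toy_scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.75, 0.90]
toy_auc = auc(toy_labels, toy_scores)
toy_ci = bootstrap_auc_ci(toy_labels, toy_scores, n_boot=500)
```

On the toy data, 20 of the 24 positive-negative pairs are ranked correctly, so `toy_auc` is 20/24. The pairwise loop is quadratic in cohort size; for cohorts as large as those in the Summary, a rank-based formulation would be the more practical choice.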