Stacked random forest model for colorectal cancer detection using complete blood counts

Bibliographic Details
Main Authors: Junfeng Luo, Weiwei Tan, Shaobo Chen, Yijing Chen, Ya Fu, Xiaojuan Jing, Lingling Kang, Qingyun Li, Zhenjian Ma, Tingji Sun, Peng Xiao, Shigui Xue, Xiaozhi Wang, Houde Zhang
Format: Article
Language: English
Published: SAGE Publishing 2025-07-01
Series: Digital Health
Online Access: https://doi.org/10.1177/20552076251362072
Description
Summary:
Background: In China, adherence to screening colonoscopy among eligible individuals remains suboptimal, primarily due to cost concerns and potential adverse effects. A machine learning model utilizing complete blood count (CBC) data could help prioritize colonoscopy referrals and improve screening participation.
Method: This multicenter study included participants who underwent CBC testing within three months before colonoscopy. CBC data were classified into three types (A, B, and C) based on hematology analyzer capabilities, with Type C excluded from analysis. Using Types A and B, we developed a stacking machine learning model incorporating 24 CBC features and 5 combined CBC components to predict colorectal cancer (CRC). Model performance was evaluated using the area under the curve (AUC), specificity, and sensitivity.
Results: The study included 1795 CRC cases and 26,380 cancer-free individuals with CBC data. On external validation, the model achieved 80.3% specificity and 65.2% sensitivity. Notably, it demonstrated 41% sensitivity for Stage I CRC and 57.6% sensitivity for Stages I–III combined.
Conclusions: CBC testing, combined with electronic medical record data, is a low-cost and widely accessible tool. Our robust CRC risk prediction model can serve as a preliminary screening method, aiding in colonoscopy referral decisions and improving CRC screening efficiency.
ISSN: 2055-2076
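
Note on the evaluation metrics: the summary reports AUC, specificity, and sensitivity. As a minimal sketch of how these three quantities are defined for a binary classifier, the following Python computes them from labels and risk scores. The function names, the 0.5 threshold, and the toy data are assumptions made up for illustration; they are not taken from the study, and this is not the authors' evaluation code.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity); label 1 = CRC, 0 = cancer-free."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # Sensitivity: share of true cases flagged; specificity: share of non-cases cleared.
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case scores above a random
    non-case, counting ties as one half (O(n^2), fine for small examples)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy, made-up labels and scores purely for illustration (not study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
threshold = 0.5  # hypothetical decision threshold
y_pred = [1 if s >= threshold else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={area:.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why a screening model is typically reported with all three.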