Artificial intelligence-augmented ultrasound diagnosis of follicular-patterned thyroid neoplasms: a multicenter retrospective study

Bibliographic Details
Main Authors: Hui Shen, Shufang Pei, Yue Huang, Suqing Wu, Chifa Zhang, Ting Liang, Dan Yang, Xiaoxiao Feng, Shuyi Liu, Yu Wang, Weihan Cao, Ying Cheng, Hongyan Chen, Qiujie Ni, Fei Wang, Jingjing You, Zhe Jin, Wenle He, Jie Sun, Dexing Yang, Lijuan Liu, Boling Cao, Xiao Zhang, Yingjia Li, Shuixing Zhang, Bin Zhang
Format: Article
Language: English
Published: Elsevier 2025-08-01
Series: EClinicalMedicine
Online Access: http://www.sciencedirect.com/science/article/pii/S2589537025002834
Description
Summary: Background: Conventional diagnostic tools, including ultrasound, fine-needle aspiration cytology, and intraoperative frozen section pathology, may fail to reliably distinguish benign from malignant follicular-patterned thyroid neoplasms (FNs), leading to unnecessary or inadequate surgical interventions. We aimed to develop and validate a deep learning (DL) system for the preoperative diagnosis of FNs using routine ultrasound images, with the goal of improving diagnostic accuracy and reducing unnecessary procedures.
Methods: In this multicenter, retrospective study, we included 3817 patients (2877 [75.4%] female) with a definitive diagnosis of FNs from 11 centers across China. All patients underwent preoperative ultrasound examinations. The dataset comprised 9393 ultrasound images collected between 2012 and 2025, covering thyroid follicular adenoma (n = 1787, 4317 images), follicular carcinoma (n = 446, 1593 images), and follicular variant of papillary thyroid carcinoma (n = 1584, 3483 images). A state-of-the-art OverLoCK (Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels) model was developed on a dataset comprising 2728 patients (6625 images) and validated on an internal cohort (n = 683, 1905 images) and an external cohort (n = 406, 863 images). Model performance was evaluated using the area under the curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 score. Model calibration was evaluated using calibration curves, while clinical usefulness was assessed through decision curve analysis (DCA).
Findings: The OverLoCK model performed well in both the internal and external validation sets. In the internal validation cohort, the model achieved an AUC of 0.937 (95% confidence interval [CI]: 0.919–0.954), with accuracy of 90.9% (95% CI: 87.7–92.0), sensitivity of 93.9% (95% CI: 91.5–95.6), specificity of 84.8% (95% CI: 82.6–86.0), PPV of 92.7% (95% CI: 90.7–93.8), NPV of 87.2% (95% CI: 86.0–91.0), and F1 score of 0.911 (95% CI: 0.887–0.932). In the external validation cohort, the model yielded an AUC of 0.853 (95% CI: 0.832–0.876), accuracy of 82.8% (95% CI: 81.7–84.4), sensitivity of 84.5% (95% CI: 82.5–86.2), specificity of 81.1% (95% CI: 79.2–84.5), PPV of 80.4% (95% CI: 79.0–84.0), NPV of 85.1% (95% CI: 83.2–87.7), and F1 score of 0.839 (95% CI: 0.802–0.877). Calibration curves showed good agreement between the predicted and actual probabilities of malignancy. DCA confirmed that the model was clinically useful.
Interpretation: Our study demonstrates that a DL-based system can provide a noninvasive, accurate, and reliable tool for the preoperative diagnosis of FNs. By improving diagnostic precision, this approach has the potential to optimize clinical decision-making and reduce the burden of overtreatment in patients with FNs. Further prospective studies are warranted to validate these findings in real-world clinical settings.
Funding: This work was supported by the National Key Research and Development Program of China (2023YFF1204600), the National Natural Science Foundation of China (82227802 and 82302190), the Clinical Frontier Technology Program of the First Affiliated Hospital of Jinan University (No. JNU1AF-CFTP-2022-a01201), the Science and Technology Projects in Guangzhou (202201020022, 2023A03J1036, 2023A03J1038, 2025A04J7006), the Outstanding Young Talents of Guangdong Special Support Program (Health Commission of Guangdong Province) (0720240213), and the Science and Technology Youth Talent Nurturing Program of Jinan University (21623209).
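For readers less familiar with the threshold-based metrics named in the summary, the sketch below shows how accuracy, sensitivity, specificity, PPV, NPV, and F1 score are commonly derived from a 2×2 confusion matrix, together with the standard net-benefit quantity that decision curve analysis plots against the threshold probability. It is a minimal illustration and not the authors' evaluation code: the names `ConfusionCounts`, `diagnostic_metrics`, and `net_benefit` are hypothetical, malignancy is assumed to be the positive class, and the example counts are invented rather than taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (assumption: malignant = positive class)."""
    tp: int  # malignant nodules predicted malignant
    fp: int  # benign nodules predicted malignant
    tn: int  # benign nodules predicted benign
    fn: int  # malignant nodules predicted benign


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return the threshold-based metrics as proportions in [0, 1]."""
    total = c.tp + c.fp + c.tn + c.fn
    sensitivity = c.tp / (c.tp + c.fn)   # true-positive rate (recall)
    specificity = c.tn / (c.tn + c.fp)   # true-negative rate
    ppv = c.tp / (c.tp + c.fp)           # positive predictive value (precision)
    npv = c.tn / (c.tn + c.fn)           # negative predictive value
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        # F1 is the harmonic mean of PPV and sensitivity.
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
    }


def net_benefit(c: ConfusionCounts, threshold: float) -> float:
    """Standard decision-curve net benefit at a threshold probability in (0, 1)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    total = c.tp + c.fp + c.tn + c.fn
    # False positives are weighted by the odds of the threshold probability.
    return c.tp / total - (c.fp / total) * (threshold / (1.0 - threshold))


# Hypothetical counts for illustration only; they are not study data.
example = ConfusionCounts(tp=90, fp=20, tn=80, fn=10)
for name, value in diagnostic_metrics(example).items():
    print(f"{name}: {value:.3f}")
print(f"net benefit at threshold 0.5: {net_benefit(example, 0.5):.3f}")
```

All six metrics are proportions between 0 and 1, and sensitivity, specificity, PPV, NPV, and accuracy are conventionally reported as percentages while F1 is left on the 0 to 1 scale. AUC and calibration curves need the predicted probabilities themselves rather than counts at a single threshold, so they are outside the scope of this sketch.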
ISSN: 2589-5370