XGBoost Algorithm for Cervical Cancer Risk Prediction: Multi-dimensional Feature Analysis

Bibliographic Details
Main Authors: Sudi Suryadi, Masrizal
Format: Article
Language: English
Published: Ikatan Ahli Informatika Indonesia 2025-06-01
Series: Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
Online Access: https://jurnal.iaii.or.id/index.php/RESTI/article/view/6587
Description
Summary: Cervical cancer remains a significant global health challenge, and early detection is the cornerstone of effective intervention. This study sits at the intersection of clinical oncology and computational intelligence, exploring whether gradient-boosting algorithms can overcome the limitations of conventional screening methods. An XGBoost model incorporating demographic, behavioral, and clinical parameters was developed to predict cervical cancer risk, using data from 858 patients at the Hospital Universitario de Caracas. The preprocessing pipeline addressed the complexities inherent in medical data, including the handling of missing values and the standardization of heterogeneous features. The model achieved an overall accuracy of 96.3%, with a sensitivity of 66.7% and a specificity of 97.6%, a profile that reflects the trade-off between missed diagnoses and unnecessary interventions. Feature importance analysis revealed a multifaceted risk landscape: screening test results contributed the largest share of predictive power (approximately 60%), complemented by demographic and behavioral factors, including age, reproductive history, and contraceptive use. Confusion matrix analysis showed a positive predictive value of 55.0% despite pronounced class imbalance. These findings suggest that ensemble learning approaches can synthesize diverse patient data into meaningful risk assessments, potentially enhancing screening efficiency through personalized stratification. Future research directions include prospective validation across diverse populations, integration of longitudinal data, and further exploration of explainable AI techniques to bridge the gap between algorithmic predictions and clinical implementation.
ISSN: 2580-0760
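
The accuracy, sensitivity, specificity, and positive predictive value (PPV) quoted in the summary all derive from the four cells of a binary confusion matrix. The sketch below shows those standard definitions, along with the Bayes relation that ties PPV to sensitivity, specificity, and prevalence. It is an illustration only, not the authors' code: the class `ConfusionMatrix`, the function `ppv_from_rates`, and the counts in `example` are hypothetical, because the record does not give the study's actual confusion matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts of a binary classifier's outcomes (hypothetical helper)."""
    tp: int  # true positives: diseased, predicted positive
    fp: int  # false positives: healthy, predicted positive
    tn: int  # true negatives: healthy, predicted negative
    fn: int  # false negatives: diseased, predicted negative

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.tn + self.fn

    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.total

    def sensitivity(self) -> float:
        # Share of actual positives that the model catches (recall).
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of actual negatives that the model correctly clears.
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Share of positive predictions that are truly positive (precision).
        return self.tp / (self.tp + self.fp)

    def prevalence(self) -> float:
        return (self.tp + self.fn) / self.total


def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV via Bayes' rule from sensitivity, specificity, and prevalence."""
    true_pos_rate = sensitivity * prevalence
    false_pos_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos_rate / (true_pos_rate + false_pos_rate)


# Illustrative counts only; NOT the study's confusion matrix.
example = ConfusionMatrix(tp=8, fp=6, tn=150, fn=4)

if __name__ == "__main__":
    print(f"accuracy    = {example.accuracy():.3f}")
    print(f"sensitivity = {example.sensitivity():.3f}")
    print(f"specificity = {example.specificity():.3f}")
    print(f"PPV         = {example.ppv():.3f}")
    # PPV at the summary's reported sensitivity and specificity,
    # evaluated at a few assumed prevalence values.
    for p in (0.02, 0.05, 0.10):
        print(f"prevalence {p:.2f} -> PPV {ppv_from_rates(0.667, 0.976, p):.3f}")
```

Because PPV depends on prevalence, a model with high specificity can still produce a modest PPV when positives are rare, which is consistent with the class imbalance noted in the summary. Under this relation, the reported sensitivity and specificity give a PPV near 55% at a prevalence of roughly 4%, though the summary does not state the actual composition of the evaluation set.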