Application of Machine Learning Models for Baseball Outcome Prediction
Data science has become an essential component in professional sports, particularly for predicting team performance and outcomes. This study aims to develop and evaluate machine learning models that accurately predict game outcomes in the Chinese Professional Baseball League (CPBL). Method: A total...
Saved in:
Main Authors: | , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: |
MDPI AG
2025-06-01
|
Series: | Applied Sciences |
Subjects: | |
Online Access: | https://www.mdpi.com/2076-3417/15/13/7081 |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Data science has become an essential component in professional sports, particularly for predicting team performance and outcomes. This study aims to develop and evaluate machine learning models that accurately predict game outcomes in the Chinese Professional Baseball League (CPBL). Method: A total of 859 games from the 2021 to 2023 regular seasons were analyzed, using both traditional baseball statistics and advanced sabermetric indicators such as the Weighted Runs Created Plus (wRC+), Weighted Runs Above Average (wRAA), and Percentage of Leadoff Batters on Base (PLOB%). Five machine learning models—decision tree, logistic regression, Neural Network, Random Forest, and XGBoost—were constructed and assessed through a five-fold cross-validation. Evaluation metrics included accuracy, F1 scores, sensitivity, specificity, and the AUC-ROC. Results: Among the models, logistic regression and XGBoost achieved the highest performance, with an accuracy ranging from 0.89 to 0.93 and an AUC-ROC from 0.97 to 0.98. The feature importance and SHapley Additive exPlanations (SHAP) analysis revealed that the wRC+ and PLOB% were the most influential predictors, reflecting the offensive efficiency and pitching control. Conclusion: The results suggest that combining interpretable machine learning with sabermetrics provides valuable insights for coaches and analysts in professional baseball. Furthermore, incorporating performance weighting based on game context may further enhance model accuracy. This research demonstrates the potential of data-driven strategies in sports analytics and decision-making. |
---|---|
ISSN: | 2076-3417 |