Attention-Enhanced CNN-LSTM Model for Exercise Oxygen Consumption Prediction with Multi-Source Temporal Features

Bibliographic Details
Main Authors:	Zhen Wang, Yingzhe Song, Lei Pang, Shanjun Li, Gang Sun
Format:	Article
Language:	English
Published:	MDPI AG 2025-06-01
Series:	Sensors
Subjects:	oxygen uptake; deep learning; neural network; attention mechanism
Online Access:	https://www.mdpi.com/1424-8220/25/13/4062

Description
Summary:	Dynamic oxygen uptake (VO₂) reflects moment-to-moment changes in oxygen consumption during exercise and underpins training design, performance enhancement, and clinical decision-making. We tackled two key obstacles, the limited fusion of heterogeneous sensor data and inadequate modeling of long-range temporal patterns, by integrating wearable accelerometer and heart-rate streams with a convolutional neural network–LSTM (CNN-LSTM) architecture and optional attention modules. Physiological signals and VO₂ were recorded from 21 adults during a resting assessment and cardiopulmonary exercise testing. Pairing accelerometer with heart-rate inputs improved prediction compared with heart rate alone. The baseline CNN-LSTM reached R² = 0.946, outperforming a plain LSTM (R² = 0.926) thanks to stronger local spatio-temporal feature extraction. Introducing a spatial attention mechanism raised accuracy further (R² = 0.962), whereas temporal attention reduced it (R² = 0.930), indicating that the benefit of attention depends on how well the attended features align with exercise dynamics. Stacking both attentions (spatio-temporal) yielded R² = 0.960, slightly below the value for spatial attention alone, implying that added complexity does not guarantee better performance. Across all models, prediction errors grew during high-intensity bouts, highlighting a bottleneck in capturing non-linear physiological responses under heavy load. These findings inform architecture selection for wearable metabolic monitoring and clarify when attention mechanisms add value.
ISSN:	1424-8220

Similar Items