A Comprehensive Methodological Survey of Human Activity Recognition Across Diverse Data Modalities

Bibliographic Details
Main Authors: Jungpil Shin, Najmul Hassan, Abu Saleh Musa Miah, Satoshi Nishimura
Format: Article
Language: English
Published: MDPI AG 2025-06-01
Series: Sensors
Online Access: https://www.mdpi.com/1424-8220/25/13/4028
Description
Summary: Human Activity Recognition (HAR) systems aim to understand human behavior and assign a label to each action. Because of their wide range of applications, they have attracted significant attention in computer vision. HAR can leverage various data modalities, such as RGB images and video, skeleton, depth, infrared, point cloud, event stream, audio, acceleration, and radar signals. Each modality provides unique and complementary information suited to different application scenarios, and numerous studies have consequently investigated diverse HAR approaches built on these modalities. This paper presents a comprehensive survey of advancements in HAR from 2014 to 2025, focusing on Machine Learning (ML) and Deep Learning (DL) approaches categorized by input data modality. To ensure linguistic consistency and academic integrity, the survey includes only peer-reviewed research papers published in English. We review both single-modality and multi-modality techniques, highlighting fusion-based and co-learning frameworks. Additionally, we cover advancements in hand-crafted action features, methods for recognizing human–object interactions, and activity detection. For each modality, the survey provides a detailed dataset description, a summary of the latest HAR systems, and a mathematical derivation for evaluating the corresponding deep learning models; it also reports comparative results on benchmark datasets. Finally, we provide observations and propose future research directions in HAR.
ISSN: 1424-8220