CAGNet: A Network Combining Multiscale Feature Aggregation and Attention Mechanisms for Intelligent Facial Expression Recognition in Human-Robot Interaction

The development of Facial Expression Recognition (FER) technology has significantly enhanced the naturalness and intuitiveness of human-robot interaction. In the field of service robots, particularly in applications such as production assistance, caregiving, and daily service communication, efficien...

Bibliographic Details
Main Authors: Dengpan Zhang, Wenwen Ma, Zhihao Shen, Qingping Ma
Format: Article
Language: English
Published: MDPI AG 2025-06-01
Series: Sensors
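Subjects: Facial Expression Recognition; Multiscale Feature Aggregation; Convolutional Block Attention Module; Global Average Pooling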
Online Access: https://www.mdpi.com/1424-8220/25/12/3653
collection DOAJ
description Facial Expression Recognition (FER) technology has significantly enhanced the naturalness and intuitiveness of human-robot interaction. For service robots, particularly in applications such as production assistance, caregiving, and daily service communication, efficient FER capabilities are crucial. However, existing Convolutional Neural Network (CNN) models remain limited in feature representation and recognition accuracy for facial expressions. To address these limitations, we propose CAGNet, a novel network that combines multiscale feature aggregation with attention mechanisms. CAGNet uses a hierarchical convolutional architecture in which stacked convolutional layers extract features at multiple scales. The network integrates the Convolutional Block Attention Module (CBAM) and Global Average Pooling (GAP) to capture both local and global features. In addition, Batch Normalization (BN) layers and Dropout are incorporated to improve model stability and generalization. CAGNet was evaluated on two standard datasets, FER2013 and CK+, and the experimental results show accuracies of 71.52% and 97.97%, respectively. These results validate the effectiveness and superiority of our approach and provide a new technical solution for FER. Furthermore, CAGNet offers robust support for the intelligent upgrade of service robots.
id doaj-art-8b3ba2d7bfec41c6aa48d2bc1baf6bec
institution Matheson Library
issn 1424-8220
spelling doaj-art-8b3ba2d7bfec41c6aa48d2bc1baf6bec
record updated 2025-06-25T14:25:23Z
volume 25
issue 12
article number 3653
doi 10.3390/s25123653
author affiliation (all four authors) School of Mechanical and Power Engineering, Henan Polytechnic University, Jiaozuo 454000, China
topic Facial Expression Recognition
Multiscale Feature Aggregation
Convolutional Block Attention Module
Global Average Pooling
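The description in this record names two building blocks, the Convolutional Block Attention Module (CBAM) and Global Average Pooling (GAP), without giving layer sizes or hyperparameters. The following is a minimal pure-Python sketch of how those two operations are commonly defined, not the authors' implementation. It assumes the usual CBAM ordering (channel attention followed by spatial attention) on a small C x H x W feature map stored as nested lists. All function names, the weights `w1`, `w2`, and `kernel`, and the 3 x 3 kernel size are hypothetical choices made for illustration only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def global_avg_pool(fmap):
    """Global Average Pooling: one mean value per channel of a C x H x W map."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def global_max_pool(fmap):
    """One maximum value per channel."""
    return [max(max(row) for row in ch) for ch in fmap]


def shared_mlp(vec, w1, w2):
    """Two-layer MLP with ReLU; w1 has shape hidden x C, w2 has shape C x hidden."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(fmap, w1, w2):
    """Per-channel weights from avg- and max-pooled descriptors through a shared MLP."""
    avg_out = shared_mlp(global_avg_pool(fmap), w1, w2)
    max_out = shared_mlp(global_max_pool(fmap), w1, w2)
    return [sigmoid(a + m) for a, m in zip(avg_out, max_out)]


def spatial_attention(fmap, kernel):
    """Per-pixel weights from a k x k convolution (zero padding) over the
    channel-wise mean and max maps; kernel has shape 2 x k x k."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    mean_map = [[sum(fmap[ch][i][j] for ch in range(c)) / c for j in range(w)]
                for i in range(h)]
    max_map = [[max(fmap[ch][i][j] for ch in range(c)) for j in range(w)]
               for i in range(h)]
    k = len(kernel[0])
    pad = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            acc = 0.0
            for m, plane in enumerate((mean_map, max_map)):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < h and 0 <= x < w:
                            acc += kernel[m][di][dj] * plane[y][x]
            row.append(sigmoid(acc))
        out.append(row)
    return out


def cbam(fmap, w1, w2, kernel):
    """Apply channel attention first, then spatial attention."""
    ca = channel_attention(fmap, w1, w2)
    refined = [[[ca[ch] * v for v in row] for row in fmap[ch]]
               for ch in range(len(fmap))]
    sa = spatial_attention(refined, kernel)
    return [[[sa[i][j] * refined[ch][i][j] for j in range(len(sa[0]))]
             for i in range(len(sa))] for ch in range(len(refined))]


# Toy input: 2 channels, 2 x 2 pixels, hand-picked (hypothetical) weights.
fmap = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 8.0]]]
w1 = [[0.5, 0.5]]        # reduces 2 channels to 1 hidden unit
w2 = [[1.0], [-1.0]]     # expands back to 2 channels
kernel = [[[0.0] * 3 for _ in range(3)] for _ in range(2)]  # zero kernel
descriptor = global_avg_pool(cbam(fmap, w1, w2, kernel))    # one value per channel
```

In a trained network the MLP and kernel weights would be learned, and the pooled descriptor would typically feed a classification layer. The toy weights here only make the data flow easy to trace by hand.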