CNN-ViT: A multi-feature learning based approach for driver drowsiness detection

Bibliographic Details
Main Authors:	Madduri Venkateswarlu, Venkata Rami Reddy Chirra
Format:	Article
Language:	English
Published:	Elsevier 2025-09-01
Series:	Array
Subjects:	Hybrid CNN-ViT model; Vision transformer; ResNet50; DenseNet121; VGG19; VGG16
Online Access:	http://www.sciencedirect.com/science/article/pii/S2590005625000529

collection	DOAJ
description	Driver drowsiness remains a critical contributor to road accidents, frequently resulting in severe injuries and fatalities. To address this issue, the study proposes a drowsiness detection system that combines Convolutional Neural Networks (CNNs), namely DenseNet121, VGG16, VGG19, and ResNet50, with a Vision Transformer (ViT). The hybrid framework is designed to exploit the complementary strengths of the two model families: CNNs capture fine-grained local features, while ViT models global dependencies within images. Input images are processed in parallel by both branches, and the extracted features are merged and used to classify the driver’s state into one of four categories: Closed, Open, no_yawn, or yawn. The system was evaluated on two separate datasets, Dataset-1 and Dataset-2. The ResNet50_ViT hybrid attained an accuracy of 99.76% on Dataset-1, while the VGG19_ViT model attained 98.21% on Dataset-2. Performance was assessed using accuracy, precision, recall, and F1-score. The results, obtained with optimized hyperparameters, indicate that the hybrid model is reliable and effective for real-time driver drowsiness detection.
id	doaj-art-4989bbf5f00f43e5b5cceb9da2fe7cbe
institution	Matheson Library
issn	2590-0056
spelling	2025-06-29T04:52:47Z; Array, volume 27, article 100425. Author affiliations: Madduri Venkateswarlu, School of Computer Science and Engineering, VIT-AP University, Amaravati, 522237, Andhra Pradesh, India; Venkata Rami Reddy Chirra (corresponding author), School of Computer Science and Engineering, VIT-AP University, Amaravati, 522237, Andhra Pradesh, India.
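
The description says that features from the CNN branch and the ViT branch are merged and then classified into four driver states, but it does not say how the merge is done. The sketch below assumes plain concatenation followed by a single linear layer with softmax. The feature sizes, the random placeholder vectors, the untrained weights, and the helper names `fuse` and `classify` are all hypothetical and are not taken from the article.

```python
import math
import random

# Class names as listed in the record's description.
CLASSES = ["Closed", "Open", "no_yawn", "yawn"]


def fuse(cnn_features, vit_features):
    """Merge the two branch outputs by concatenation (assumed fusion rule)."""
    return list(cnn_features) + list(vit_features)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused, weights, biases):
    """Apply a linear layer plus softmax; return (label, probabilities)."""
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs


# Placeholder data: random vectors stand in for real backbone outputs.
rng = random.Random(0)
cnn_features = [rng.random() for _ in range(8)]  # hypothetical CNN embedding
vit_features = [rng.random() for _ in range(6)]  # hypothetical ViT embedding
fused = fuse(cnn_features, vit_features)

# Untrained, randomly initialised head: one weight row per class.
weights = [[rng.uniform(-1, 1) for _ in fused] for _ in CLASSES]
biases = [0.0] * len(CLASSES)

label, probs = classify(fused, weights, biases)
print(label, [round(p, 3) for p in probs])
```

In a real system the two placeholder vectors would come from a pretrained CNN backbone and a ViT encoder, and the head would be trained on labelled eye and mouth images. Concatenation is only one possible reading of "merged".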
