Driver Drowsiness Detection Using Vision Transformer

This work explores the capability of the Vision Transformer (ViT), a relatively new neural network architecture, to address the prevalent problem of road accidents attributed to drowsy driving. The development of the ViT model involves the use of a pre-trained ViT_B_16 model with initial weights from IMAGENET1K...
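The abstract attributes the poor embedded-device results to ViT's computational demand. A little arithmetic makes the model's size concrete. The sketch below counts input tokens and parameters for a ViT-B/16 classifier. It assumes torchvision's `vit_b_16` layout: 224×224 RGB input, 16×16 patches, 12 encoder layers, hidden size 768, and MLP size 3072. It also assumes, hypothetically, that the drowsiness model simply replaces the 1,000-class ImageNet head with a 3-class head for the behaviors named in the full abstract (normal, yawning, nodding). The record does not say how the head was adapted. `ViTConfig`, `num_tokens`, and `num_parameters` are illustrative names, not part of any library.

```python
"""Back-of-the-envelope size of a ViT-B/16 classifier (torchvision layout assumed)."""

from dataclasses import dataclass


@dataclass(frozen=True)
class ViTConfig:
    image_size: int = 224    # input resolution used with the ImageNet weights
    patch_size: int = 16     # the "16" in ViT_B_16
    channels: int = 3        # RGB input
    hidden_dim: int = 768    # ViT-Base embedding width
    mlp_dim: int = 3072      # hidden width of each encoder MLP
    num_layers: int = 12     # number of encoder blocks
    num_classes: int = 1000  # ImageNet head; hypothetically replaced for drowsiness


def num_tokens(cfg: ViTConfig) -> int:
    """Number of patch tokens plus the single class token."""
    return (cfg.image_size // cfg.patch_size) ** 2 + 1


def num_parameters(cfg: ViTConfig) -> int:
    """Count weights and biases, layer by layer, for the assumed layout."""
    d, m = cfg.hidden_dim, cfg.mlp_dim
    patch_embed = cfg.channels * cfg.patch_size ** 2 * d + d  # patch projection + bias
    class_token = d
    pos_embed = num_tokens(cfg) * d
    layer = (
        2 * (2 * d)            # two LayerNorms (weight + bias each)
        + (d * 3 * d + 3 * d)  # fused query/key/value projection
        + (d * d + d)          # attention output projection
        + (d * m + m)          # MLP expansion
        + (m * d + d)          # MLP contraction
    )
    final_norm = 2 * d
    head = d * cfg.num_classes + cfg.num_classes
    return (patch_embed + class_token + pos_embed
            + cfg.num_layers * layer + final_norm + head)


if __name__ == "__main__":
    imagenet = ViTConfig()
    drowsiness = ViTConfig(num_classes=3)  # normal, yawning, nodding
    print("tokens per image:", num_tokens(imagenet))
    print(f"ImageNet head: {num_parameters(imagenet):,} parameters")
    print(f"3-class head: {num_parameters(drowsiness):,} parameters")
```

The ImageNet total, 86,567,656, agrees with the parameter count torchvision publishes for these weights. With a 3-class head the total drops only to 85,800,963. Each encoder layer holds about 7.1 million weights, so swapping the head barely shrinks the model. Nearly all of it is the pre-trained backbone, which must run on every 197-token frame.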


Bibliographic Details
Published in:	14th IEEE Symposium on Computer Applications and Industrial Electronics, ISCAIE 2024
Main Author:	Bin Mohamad Azmi M.M.; Kamaru Zaman F.H.
Format:	Conference paper
Language:	English
Published:	Institute of Electrical and Electronics Engineers Inc. 2024
Online Access:	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85198903652&doi=10.1109%2fISCAIE61308.2024.10576317&partnerID=40&md5=27dee52f1f56e7f6b5fccc84fe69f329

id	2-s2.0-85198903652
author	Bin Mohamad Azmi M.M.; Kamaru Zaman F.H.
title	Driver Drowsiness Detection Using Vision Transformer
publishDate	2024
container_title	14th IEEE Symposium on Computer Applications and Industrial Electronics, ISCAIE 2024
container_volume
container_issue
doi_str_mv	10.1109/ISCAIE61308.2024.10576317
url	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85198903652&doi=10.1109%2fISCAIE61308.2024.10576317&partnerID=40&md5=27dee52f1f56e7f6b5fccc84fe69f329
description	This work explores the capability of the Vision Transformer (ViT), a relatively new neural network architecture, to address the prevalent problem of road accidents attributed to drowsy driving. The ViT model is built from a ViT_B_16 model pre-trained with IMAGENET1K_V1 weights and then trained on our own driver behavior dataset. The dataset passes through a preprocessing pipeline of face extraction, normalization, and data augmentation, resulting in 33,034 training images. Classifying three behaviors (normal, yawning, and nodding), the system reaches 98.07% accuracy in training and 93% in testing. The model is demonstrated through webcam-based inference and deployed on a Raspberry Pi 4, where the frame rate (FPS) of real-time video inference is measured; there it achieves an unfavorable 0.59 fps. On a higher-performance system, however, the model reaches up to 21 fps. Overall, the project contributes to advancing driver monitoring systems, investigates the ViT model's potential for real-time applications, and highlights the challenges of deploying ViT in real-world applications given its computational demands on low-resource embedded systems. © 2024 IEEE.
publisher	Institute of Electrical and Electronics Engineers Inc.
issn
language	English
format	Conference paper
accesstype
record_format	scopus
collection	Scopus
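To put the throughput reported in the description in perspective, the following sketch converts frame rates into per-frame latency. It also shows one simple way to measure a frame rate over a stream of frames. It is an illustration, not the authors' benchmarking code. `frame_latency_ms`, `measure_fps`, and the injectable `clock` argument are hypothetical helpers. `infer` stands for whatever per-frame pipeline is being timed, for example face cropping, normalization, and a ViT forward pass.

```python
import time
from typing import Callable, Iterable


def frame_latency_ms(fps: float) -> float:
    """Average processing time per frame, in milliseconds, at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def measure_fps(infer: Callable[[object], object],
                frames: Iterable[object],
                clock: Callable[[], float] = time.perf_counter) -> float:
    """Run `infer` on every frame and return frames processed per second.

    `infer` is a hypothetical stand-in for the per-frame pipeline;
    `clock` is injectable so the measurement can be tested deterministically.
    """
    count = 0
    start = clock()
    for frame in frames:
        infer(frame)
        count += 1
    elapsed = clock() - start
    if count == 0 or elapsed <= 0:
        raise ValueError("need at least one frame and a positive elapsed time")
    return count / elapsed


if __name__ == "__main__":
    # Frame rates reported in the abstract.
    for label, fps in [("Raspberry Pi 4", 0.59), ("faster system", 21.0)]:
        print(f"{label}: {fps} fps -> {frame_latency_ms(fps):.1f} ms per frame")
    print(f"speed-up: {21.0 / 0.59:.1f}x")
```

At 0.59 fps each frame takes about 1.7 s to process, compared with roughly 48 ms at 21 fps. That is about a 36× gap, which is why the Raspberry Pi 4 result falls well short of real-time monitoring.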


Similar Items