Driver Drowsiness Detection Using Vision Transformer

Bibliographic Details
Published in: 2024 IEEE 14th Symposium on Computer Applications & Industrial Electronics (ISCAIE 2024)
Main Authors: Azmi, Muhammad Muizuddin Bin Mohamad; Zaman, Fadhlan Hafizhelmi Kamaru
Format: Proceedings Paper
Language: English
Published: IEEE 2024
Subjects: Computer Science; Engineering
Online Access: https://www-webofscience-com.uitm.idm.oclc.org/wos/woscc/full-record/WOS:001283898700030

Abstract
This work explores the capability of a relatively new neural network architecture, the Vision Transformer (ViT), to address the prevalent problem of road accidents caused by drowsy driving. The model is a ViT_B_16 network initialized with pre-trained IMAGENET1K_V1 weights and then trained on our own driver behavior dataset. The dataset passes through a preprocessing pipeline of face extraction, normalization, and data augmentation, yielding 33,034 training images. The system classifies three driver behaviors (normal, yawning, and nodding) and reaches 98.07% accuracy on the training set and 93% on the test set. The implementation is demonstrated with webcam-based inference on a Raspberry Pi 4, where throughput was measured as the frame rate (FPS) of real-time video inference: the Pi reaches only 0.59 fps, whereas a higher-performance system reaches up to 21 fps. Overall, the project contributes to driver monitoring systems, investigates the potential of ViT for real-time applications, and highlights the difficulty of deploying ViT on low-resource embedded systems given its computational demand.

ISSN: 2836-4864
DOI: 10.1109/ISCAIE61308.2024.10576317
Web of Science accession number: WOS:001283898700030
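The abstract names the torchvision ViT_B_16 model with IMAGENET1K_V1 weights but does not say how its classifier was adapted. A minimal sketch, assuming the common transfer-learning step of replacing the 1,000-class ImageNet head with a 3-class head (normal, yawning, nodding) and the standard ViT-Base/16 hyperparameters (224×224 input, 16×16 patches, hidden size 768, 12 encoder layers, MLP size 3072), counts the tokens and parameters involved using only the Python standard library. `ViTConfig`, `num_tokens`, and `count_parameters` are hypothetical helpers, not part of torchvision.

```python
"""Token and parameter arithmetic for ViT-B/16 (standard-library sketch).

Hyperparameters follow the standard ViT-Base/16 configuration; the 3-class
head is an assumption about how the model was fine-tuned.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class ViTConfig:
    image_size: int = 224
    patch_size: int = 16
    channels: int = 3
    hidden_dim: int = 768
    num_layers: int = 12
    mlp_dim: int = 3072


def num_tokens(cfg: ViTConfig) -> int:
    """Number of patch tokens plus one class token."""
    patches_per_side = cfg.image_size // cfg.patch_size
    return patches_per_side * patches_per_side + 1


def count_parameters(cfg: ViTConfig, num_classes: int) -> int:
    """Trainable parameters of a ViT with a linear head of `num_classes`."""
    d = cfg.hidden_dim
    # Patch embedding: conv with kernel = stride = patch_size, plus bias.
    patch_embed = cfg.channels * cfg.patch_size ** 2 * d + d
    class_token = d
    pos_embedding = num_tokens(cfg) * d
    layer_norm = 2 * d  # scale and shift
    # Self-attention: fused Q/K/V projection plus output projection.
    attention = (3 * d * d + 3 * d) + (d * d + d)
    # Two-layer MLP: d -> mlp_dim -> d, both with bias.
    mlp = (d * cfg.mlp_dim + cfg.mlp_dim) + (cfg.mlp_dim * d + d)
    encoder_layer = 2 * layer_norm + attention + mlp
    encoder = cfg.num_layers * encoder_layer + layer_norm  # + final LayerNorm
    head = d * num_classes + num_classes
    return patch_embed + class_token + pos_embedding + encoder + head


if __name__ == "__main__":
    cfg = ViTConfig()
    print("tokens per image:", num_tokens(cfg))
    print(f"ImageNet head (1000 classes): {count_parameters(cfg, 1000):,}")
    print(f"drowsiness head (3 classes): {count_parameters(cfg, 3):,}")
```

The 1,000-class total comes to 86,567,656, the parameter count torchvision lists for ViT_B_16. Swapping in a 3-class head removes only about 0.77 million of them, so a fine-tuned model of this shape still carries roughly 85.8 million parameters and processes 197 tokens per frame, which helps explain the low frame rate on a Raspberry Pi 4.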
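The abstract lists face extraction, normalization, and data augmentation without giving details. As an illustration only, the sketch below applies those three steps to a tiny RGB image stored as nested lists. It crops a face bounding box (assumed to come from an unspecified face detector), applies a horizontal flip as an example augmentation, and normalizes each channel with the ImageNet mean and standard deviation commonly paired with IMAGENET1K_V1 weights. The helper names, the flip, and the normalization constants are assumptions rather than the authors' documented choices, and the resize to the 224×224 input that ViT_B_16 expects is omitted for brevity.

```python
"""Per-image preprocessing sketch: crop a face box, flip, normalize.

Images are nested lists of (R, G, B) tuples with values in 0..255.
The face box is assumed to come from some face detector; the ImageNet
statistics are an assumption, not confirmed by the paper.
"""
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]

IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def crop(image: Image, box: Tuple[int, int, int, int]) -> Image:
    """Crop to (left, top, right, bottom); right and bottom are exclusive."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def hflip(image: Image) -> Image:
    """Mirror each row left-to-right (an example augmentation)."""
    return [list(reversed(row)) for row in image]


def normalize(image: Image) -> List[List[Tuple[float, ...]]]:
    """Scale each channel to [0, 1], then standardize with ImageNet stats."""
    return [
        [
            tuple(
                (value / 255.0 - mean) / std
                for value, mean, std in zip(pixel, IMAGENET_MEAN, IMAGENET_STD)
            )
            for pixel in row
        ]
        for row in image
    ]


if __name__ == "__main__":
    frame = [
        [(255, 255, 255), (0, 0, 0), (128, 128, 128)],
        [(10, 20, 30), (40, 50, 60), (70, 80, 90)],
    ]
    face = crop(frame, (1, 0, 3, 2))  # hypothetical detector output
    for sample in (face, hflip(face)):  # original plus augmented copy
        print(normalize(sample))
```

Emitting both the original and the flipped crop is one way an augmentation step can multiply the number of training images; the abstract does not say which augmentations produced its 33,034-image training set.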
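The deployment results are reported as frame rates: 0.59 fps on the Raspberry Pi 4 and up to 21 fps on a faster system. One way to take such a measurement is to time a run of consecutive inferences after a short warm-up, as in the standard-library sketch below. `infer` is a hypothetical stand-in for one forward pass of the model, and the warm-up length is an assumption rather than the authors' protocol.

```python
"""Frame-rate measurement sketch for a video inference loop."""
import time
from typing import Callable, Iterable, TypeVar

Frame = TypeVar("Frame")


def measure_fps(
    infer: Callable[[Frame], object],
    frames: Iterable[Frame],
    warmup: int = 2,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return frames per second over the frames processed after `warmup`.

    The first `warmup` calls run but are not timed, since early calls often
    include one-off setup costs. `infer` is a hypothetical model call.
    """
    start = None
    timed = 0
    for index, frame in enumerate(frames):
        if index == warmup:
            start = clock()  # begin timing at the first non-warm-up frame
        infer(frame)
        if start is not None:
            timed += 1
    if start is None:
        raise ValueError("need more frames than the warm-up count")
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("clock did not advance; cannot compute a frame rate")
    return timed / elapsed


def latency_ms(fps: float) -> float:
    """Per-frame latency in milliseconds implied by a frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    # Frame rates reported in the abstract.
    for label, fps in (("Raspberry Pi 4", 0.59), ("faster system", 21.0)):
        print(f"{label}: {fps} fps is about {latency_ms(fps):.0f} ms per frame")
    print(f"speed ratio: {21.0 / 0.59:.1f}x")
```

At 0.59 fps each frame takes about 1.7 s, while 21 fps corresponds to about 48 ms per frame, a roughly 36-fold difference. Passing the `clock` parameter explicitly lets the function be tested with a fake timer instead of real inference.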