Driver Drowsiness Detection Using Vision Transformer
This work explores the capability of the Vision Transformer (ViT) architecture in addressing the prevalent issue of road accidents attributed to drowsy driving. The model is built from a pre-trained ViT_B_16 initialized with IMAGENET1K_V1 weights and fine-tuned on our own driver behavior dataset. The dataset undergoes a thorough preprocessing pipeline, including face extraction, normalization, and data augmentation, yielding 33,034 training images. Focused on detecting normal, yawning, and nodding behaviors, the system achieves 98.07% training accuracy and 93% testing accuracy. The implementation is demonstrated through webcam-based inference, with the model deployed on a Raspberry Pi 4 and evaluated by measuring the frames per second (FPS) of real-time video inference; on the Raspberry Pi it achieves an unfavorable 0.59 fps, while on a more capable system it reaches up to 21 fps. Overall, the project contributes to advancing driver monitoring systems, investigates the ViT model's potential for real-time applications, and highlights the challenges of deploying ViT in real-world settings given its computational demand on low-resource embedded systems.
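The abstract names the backbone (ViT_B_16), the initial weights (IMAGENET1K_V1), and the three target classes, but no further implementation detail. The following is a minimal sketch of how such a fine-tuning setup could look with torchvision; the dataset path `driver_faces/train`, the augmentations, and the hyperparameters are illustrative assumptions, not details taken from the paper.

```python
# Sketch: fine-tuning torchvision's ViT_B_16 (IMAGENET1K_V1 weights)
# for 3-class driver-behavior classification (normal, yawning, nodding).
import torch
import torch.nn as nn
from torchvision import datasets, transforms
from torchvision.models import vit_b_16, ViT_B_16_Weights

weights = ViT_B_16_Weights.IMAGENET1K_V1
model = vit_b_16(weights=weights)

# Replace the 1000-class ImageNet head with a 3-class head.
model.heads.head = nn.Linear(model.heads.head.in_features, 3)

# Preprocessing loosely mirroring the described pipeline: light augmentation,
# then the weights' own resize/crop/normalization preset.
train_tf = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2),
    weights.transforms(),  # resize, center-crop, ImageNet normalization
])

# Assumed ImageFolder layout: driver_faces/train/<class_name>/*.jpg
train_set = datasets.ImageFolder("driver_faces/train", transform=train_tf)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:  # one illustrative epoch
    images, labels = images.to(device), labels.to(device)
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```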
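The reported 0.59 fps (Raspberry Pi 4) versus 21 fps (more capable hardware) figures come from timing webcam inference. A rough sketch of such a measurement loop is below; the checkpoint name `vit_drowsiness.pth`, the class ordering, and the 100-frame measurement window are assumptions for illustration only.

```python
# Sketch: webcam inference loop with FPS measurement for the fine-tuned ViT.
import time
import cv2
import torch
from torchvision.models import vit_b_16, ViT_B_16_Weights

CLASSES = ["nodding", "normal", "yawning"]  # assumed label order
device = "cuda" if torch.cuda.is_available() else "cpu"
preprocess = ViT_B_16_Weights.IMAGENET1K_V1.transforms()

# Rebuild the 3-class model and load a fine-tuned checkpoint (assumed path).
model = vit_b_16()
model.heads.head = torch.nn.Linear(model.heads.head.in_features, 3)
model.load_state_dict(torch.load("vit_drowsiness.pth", map_location=device))
model.to(device).eval()

cap = cv2.VideoCapture(0)  # default webcam
frames, start = 0, time.time()
with torch.no_grad():
    while frames < 100:  # measure throughput over 100 frames
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        tensor = preprocess(torch.from_numpy(rgb).permute(2, 0, 1)).unsqueeze(0)
        pred = model(tensor.to(device)).argmax(dim=1).item()
        print(CLASSES[pred])
        frames += 1

fps = frames / (time.time() - start)
print(f"throughput: {fps:.2f} fps")
cap.release()
```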
| Published in: | 2024 IEEE 14TH SYMPOSIUM ON COMPUTER APPLICATIONS & INDUSTRIAL ELECTRONICS, ISCAIE 2024 |
|---|---|
| Main Authors: | Azmi, Muhammad Muizuddin Bin Mohamad; Zaman, Fadhlan Hafizhelmi Kamaru |
| Format: | Proceedings Paper |
| Language: | English |
| Published: | IEEE, 2024 |
| ISSN: | 2836-4864 |
| DOI: | 10.1109/ISCAIE61308.2024.10576317 |
| Subjects: | Computer Science; Engineering |
| Collection: | Web of Science (WoS) |
| Online Access: | https://www-webofscience-com.uitm.idm.oclc.org/wos/woscc/full-record/WOS:001283898700030 |