On the use of voice activity detection in speech emotion recognition
Emotion recognition through speech has many potential applications; however, the challenge lies in achieving a high emotion recognition rate with limited resources or in the presence of interference such as noise. In this paper we have explored the possibility of improving speech emotion recognition by utilizing t...
Published in: | Bulletin of Electrical Engineering and Informatics |
---|---|
Main Author: | Alghifari M.F. |
Format: | Article |
Language: | English |
Published: | Institute of Advanced Engineering and Science, 2019 |
Online Access: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85075591444&doi=10.11591%2feei.v8i4.1646&partnerID=40&md5=e0ee362267a15f85350b225970da84e5 |
id |
2-s2.0-85075591444 |
---|---|
author |
Alghifari M.F.; Gunawan T.S.; Wan Nordin M.A.B.; Qadri S.A.A.; Kartiwi M.; Janin Z. |
title |
On the use of voice activity detection in speech emotion recognition |
publishDate |
2019 |
container_title |
Bulletin of Electrical Engineering and Informatics |
container_volume |
8 |
container_issue |
4 |
doi_str_mv |
10.11591/eei.v8i4.1646 |
url |
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85075591444&doi=10.11591%2feei.v8i4.1646&partnerID=40&md5=e0ee362267a15f85350b225970da84e5 |
description |
Emotion recognition through speech has many potential applications; however, the challenge lies in achieving a high emotion recognition rate with limited resources or in the presence of interference such as noise. In this paper we explore the possibility of improving speech emotion recognition by using voice activity detection (VAD). The emotional voice data from the Berlin Emotion Database (EMO-DB) and a custom-made database, the LQ Audio Dataset, are first preprocessed by VAD before feature extraction. The features are then passed to a deep neural network for classification. In this paper, MFCC is the sole feature used. Comparing results with and without VAD, we found that VAD improved the recognition rate for 5 emotions (happy, angry, sad, fear, and neutral) by 3.7% on clean signals, while using VAD when training a network on both clean and noisy signals improved on our previous results by 50%. © 2019 Institute of Advanced Engineering and Science. All rights reserved. |
publisher |
Institute of Advanced Engineering and Science |
issn |
20893191 |
language |
English |
format |
Article |
accesstype |
All Open Access; Gold Open Access; Green Open Access |
record_format |
scopus |
collection |
Scopus |
_version_ |
1820775466935517184 |
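
The description above says the recordings are preprocessed by VAD before feature extraction, but this record does not say which VAD algorithm the authors used. As a rough illustration of the preprocessing idea only, the following minimal sketch applies a simple short-time energy threshold, which is an assumption and not the authors' method. The function name `energy_vad` and the parameters `frame_len`, `hop`, and `threshold_db` are hypothetical, and the MFCC extraction and neural network steps are omitted.

```python
import numpy as np


def energy_vad(signal, frame_len=400, hop=160, threshold_db=-35.0):
    """Keep only samples that fall in frames judged to contain speech.

    Illustrative energy-threshold VAD (an assumption, not the paper's method).
    A frame is treated as active when its energy in dB is within
    `threshold_db` of the loudest frame in the same recording.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < frame_len:
        return signal  # too short to frame; return unchanged

    # Slice the signal into overlapping frames.
    n_frames = 1 + (len(signal) - frame_len) // hop
    starts = np.arange(n_frames) * hop
    frames = np.stack([signal[s:s + frame_len] for s in starts])

    # Short-time energy per frame, in dB (small offset avoids log of zero).
    energy_db = 10.0 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    active = energy_db > (energy_db.max() + threshold_db)

    # Mark every sample covered by at least one active frame.
    keep = np.zeros(len(signal), dtype=bool)
    for s, is_active in zip(starts, active):
        if is_active:
            keep[s:s + frame_len] = True
    return signal[keep]


# Usage: one second of a 220 Hz tone padded with one second of silence each side.
sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 220 * t)
padded = np.concatenate([np.zeros(sr), tone, np.zeros(sr)])
trimmed = energy_vad(padded)
print(len(padded), len(trimmed))
```

Because the threshold is relative to the loudest frame, a recording that contains only silence would be returned unchanged; a real system would likely need an absolute floor or a more robust detector, especially for the noisy signals the description mentions.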