Fuzzy-based voiced-unvoiced segmentation for emotion recognition using spectral feature fusions

Despite abundant growth in studies of automatic emotion recognition systems (ERS) that use various feature-extraction techniques and classifiers, few sources address improving such systems through pre-processing. This paper proposes a smart pre-processing stage using a fuzzy logic inference system...

Bibliographic Details
Published in:	Indonesian Journal of Electrical Engineering and Computer Science
Main Author:	Ali Y.M.; Rahim A.F.A.; Noorsal E.; Yassin Z.M.; Mokhtar N.F.; Ramlan M.H.
Format:	Article
Language:	English
Published:	Institute of Advanced Engineering and Science 2020
Online Access:	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85083092765&doi=10.11591%2fijeecs.v19.i1.pp196-206&partnerID=40&md5=5849faa3152907cbeedf174173968813

id	2-s2.0-85083092765
author	Ali Y.M.; Rahim A.F.A.; Noorsal E.; Yassin Z.M.; Mokhtar N.F.; Ramlan M.H.
title	Fuzzy-based voiced-unvoiced segmentation for emotion recognition using spectral feature fusions
publishDate	2020
container_title	Indonesian Journal of Electrical Engineering and Computer Science
container_volume	19
container_issue	1
doi_str_mv	10.11591/ijeecs.v19.i1.pp196-206
url	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85083092765&doi=10.11591%2fijeecs.v19.i1.pp196-206&partnerID=40&md5=5849faa3152907cbeedf174173968813
description	Despite abundant growth in studies of automatic emotion recognition systems (ERS) that use various feature-extraction techniques and classifiers, few sources address improving such systems through pre-processing. This paper proposes a smart pre-processing stage that uses a fuzzy logic inference system (FIS) based on the Mamdani engine and two simple time-domain features, zero-crossing rate (ZCR) and short-time energy (STE), to first identify each frame as voiced (V) or unvoiced (UV). Mel-frequency cepstral coefficients (MFCC) and linear prediction coefficients (LPC) were tested with K-nearest neighbours (KNN) classifiers to evaluate the proposed FIS V-UV segmentation. Two feature fusions, MFCC with formants and LPC with formants, were also introduced to obtain better performance. The proposed system surpassed the conventional ERS, with accuracy gains ranging from 3.7% to 9.0%. The fusion of LPC and formants, named SFF LPC-fmnt, gave a promising result: an accuracy rate between 1.3% and 5.1% higher than its baseline features in classifying neutral, angry, happy and sad emotions. The best accuracy rates, obtained with the SFF MFCC-fmnt fusion technique, were 79.1% for male speakers and 79.9% for female speakers. Copyright © 2020 Institute of Advanced Engineering and Science. All rights reserved.
publisher	Institute of Advanced Engineering and Science
issn	25024752
language	English
format	Article
accesstype	All Open Access; Gold Open Access; Green Open Access
record_format	scopus
collection	Scopus
_version_	1809677599779061760
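
The abstract in this record describes a voiced/unvoiced decision made per frame from zero-crossing rate (ZCR) and short-time energy (STE) with a Mamdani fuzzy inference system. The Python sketch below illustrates that general idea only. The record does not give the authors' rule base, membership functions, or frame settings, so everything here is an assumption: the two rules, the ramp breakpoints, the 25 ms frame and 10 ms hop, the normalisation of STE by its maximum, and all function names (`frame_signal`, `mamdani_voicing`, `voicing_scores`).

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (trailing samples dropped)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose signs differ, per frame."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def short_time_energy(frames):
    """Mean squared amplitude per frame."""
    return np.mean(frames ** 2, axis=1)

def ramp_up(v, lo, hi):
    """Membership that rises linearly from 0 at lo to 1 at hi."""
    return np.clip((v - lo) / (hi - lo), 0.0, 1.0)

def mamdani_voicing(zcr, ste_norm):
    """Crisp voicing score in [0, 1] for one frame (higher = more voiced)."""
    # Input memberships; the breakpoints are illustrative assumptions.
    zcr_high = ramp_up(zcr, 0.1, 0.3)
    zcr_low = 1.0 - zcr_high
    ste_high = ramp_up(ste_norm, 0.05, 0.3)
    ste_low = 1.0 - ste_high

    # Assumed rule base: min implements AND, max implements OR.
    fire_voiced = min(ste_high, zcr_low)     # IF STE high AND ZCR low THEN voiced
    fire_unvoiced = max(ste_low, zcr_high)   # IF STE low OR ZCR high THEN unvoiced

    # Mamdani implication (clip output sets), max aggregation, centroid.
    y = np.linspace(0.0, 1.0, 101)
    out_voiced = ramp_up(y, 0.3, 0.7)
    out_unvoiced = 1.0 - out_voiced
    agg = np.maximum(np.minimum(fire_voiced, out_voiced),
                     np.minimum(fire_unvoiced, out_unvoiced))
    return float(np.sum(y * agg) / np.sum(agg))

def voicing_scores(x, fs, frame_ms=25, hop_ms=10):
    """Per-frame voicing scores; STE is normalised by its maximum."""
    frames = frame_signal(x, int(fs * frame_ms / 1000), int(fs * hop_ms / 1000))
    zcr = zero_crossing_rate(frames)
    ste = short_time_energy(frames)
    ste_norm = ste / np.max(ste)
    return np.array([mamdani_voicing(z, e) for z, e in zip(zcr, ste_norm)])
```

A frame could then be labelled voiced when its score exceeds 0.5, for example `voiced = voicing_scores(x, fs) > 0.5`. That threshold is also an assumption, and a real system would tune the membership functions on labelled speech.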

Similar Items