Forest Sound Event Detection with Convolutional Recurrent Neural Network-Long Short-Term Memory

Bibliographic Details
Published in: Journal of Advanced Research in Applied Sciences and Engineering Technology
Main Author: Zaini M.N.; Yusoff M.; Sadikin M.A.
Format: Article
Language: English
Published: Penerbit Akademia Baru 2023
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85174960423&doi=10.37934%2fARASET.32.2.242254&partnerID=40&md5=bdcb4261a7403ba5cdc4f93bec86f159
id 2-s2.0-85174960423
spelling 2-s2.0-85174960423
Zaini M.N.; Yusoff M.; Sadikin M.A.
Forest Sound Event Detection with Convolutional Recurrent Neural Network-Long Short-Term Memory
2023
Journal of Advanced Research in Applied Sciences and Engineering Technology
32
2
10.37934/ARASET.32.2.242254
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85174960423&doi=10.37934%2fARASET.32.2.242254&partnerID=40&md5=bdcb4261a7403ba5cdc4f93bec86f159
Sound event detection addresses the problem of analyzing and recognizing sounds in a complex audio environment. The process involves localizing and classifying sounds, mainly to estimate the start and end points of each separate sound and to describe it. Detection capability depends on the type of sound. Although detecting sequences of temporally distinct sounds is straightforward, the task becomes complex when multiple individual sounds overlap, as usually occurs in a forest environment. Therefore, the aim of this paper is to propose a Convolutional Recurrent Neural Network-Long Short-Term Memory algorithm to detect the audio signatures of intruders in a forest environment. Mel-frequency cepstral coefficients are extracted from the audio and fed into the algorithm as input. The six sound categories are chainsaw, machete, car, hatchet, ambiance, and bike. The model was tested with several settings for the number of epochs, the batch size, and the layer filters. With suitable parameter selection, the proposed model achieves an accuracy of 98.52 percent in detecting the audio signatures. In the future, additional types of intruder audio signatures and further evaluation criteria can be added to improve the algorithm's ability to detect intruders in a forest environment. © 2023, Penerbit Akademia Baru. All rights reserved.
Penerbit Akademia Baru
24621943
English
Article
All Open Access; Hybrid Gold Open Access
author Zaini M.N.; Yusoff M.; Sadikin M.A.
title Forest Sound Event Detection with Convolutional Recurrent Neural Network-Long Short-Term Memory
publishDate 2023
container_title Journal of Advanced Research in Applied Sciences and Engineering Technology
container_volume 32
container_issue 2
doi_str_mv 10.37934/ARASET.32.2.242254
url https://www.scopus.com/inward/record.uri?eid=2-s2.0-85174960423&doi=10.37934%2fARASET.32.2.242254&partnerID=40&md5=bdcb4261a7403ba5cdc4f93bec86f159
description Sound event detection addresses the problem of analyzing and recognizing sounds in a complex audio environment. The process involves localizing and classifying sounds, mainly to estimate the start and end points of each separate sound and to describe it. Detection capability depends on the type of sound. Although detecting sequences of temporally distinct sounds is straightforward, the task becomes complex when multiple individual sounds overlap, as usually occurs in a forest environment. Therefore, the aim of this paper is to propose a Convolutional Recurrent Neural Network-Long Short-Term Memory algorithm to detect the audio signatures of intruders in a forest environment. Mel-frequency cepstral coefficients are extracted from the audio and fed into the algorithm as input. The six sound categories are chainsaw, machete, car, hatchet, ambiance, and bike. The model was tested with several settings for the number of epochs, the batch size, and the layer filters. With suitable parameter selection, the proposed model achieves an accuracy of 98.52 percent in detecting the audio signatures. In the future, additional types of intruder audio signatures and further evaluation criteria can be added to improve the algorithm's ability to detect intruders in a forest environment. © 2023, Penerbit Akademia Baru. All rights reserved.
publisher Penerbit Akademia Baru
issn 24621943
language English
format Article
accesstype All Open Access; Hybrid Gold Open Access
record_format scopus
collection Scopus
_version_ 1809677580720144384