Neural Machine Translation Models with Attention-Based Dropout Layer

In bilingual translation, attention-based Neural Machine Translation (NMT) models are used to achieve synchrony between input and output sequences and the notion of alignment. NMT models have achieved state-of-the-art performance for several language pairs. However, there has been little work exploring useful architectures for Urdu-to-English machine translation. We conducted extensive Urdu-to-English translation experiments using Long Short-Term Memory (LSTM), Bidirectional Recurrent Neural Network (Bi-RNN), Statistical Recurrent Unit (SRU), Gated Recurrent Unit (GRU), and Convolutional Neural Network (CNN) architectures, as well as the Transformer. Experimental results show that Bi-RNN and LSTM models with an attention mechanism, trained iteratively on a scalable dataset, make precise predictions on unseen data. The trained models yielded competitive results, achieving 62.6% and 61% accuracy and BLEU scores of 49.67 and 47.14, respectively. From a qualitative perspective, the translations of the test sets were examined manually, and the trained models were observed to produce repetitive output frequently. The attention scores produced by Bi-RNN and LSTM showed clear alignments, while GRU produced incorrect word translations, poor alignment, and a lack of clear structure. Therefore, we refined the attention-based models by defining an additional attention-based dropout layer. Attention dropout fixes alignment errors and minimizes translation errors at the word level. After empirical demonstration and comparison with their counterparts, we found an improvement in the quality of the resulting translation system and a decrease in the perplexity and over-translation score. The ability of the proposed model was also evaluated on Arabic-English and Persian-English datasets. We empirically concluded that adding an attention-based dropout layer helps improve GRU, SRU, and Transformer translation and is considerably more efficient in terms of translation quality and speed. © 2023 Tech Science Press. All rights reserved.
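
The record contains no code from the authors; as a rough illustration of the general idea the abstract describes (applying dropout to the attention weights so the alignment does not over-commit to a few source positions), the following PyTorch snippet sketches one common way attention dropout is implemented. The module name, tensor shapes, and the exact placement of the dropout are illustrative assumptions, not the architecture proposed in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionWithDropout(nn.Module):
    """Dot-product attention with dropout applied to the attention weights.

    Illustrative sketch only: names, sizes, and the placement of dropout are
    assumptions, not the paper's attention-based dropout layer.
    """

    def __init__(self, hidden_size: int, dropout: float = 0.1):
        super().__init__()
        self.scale = hidden_size ** -0.5
        self.attn_dropout = nn.Dropout(dropout)   # dropout over alignment weights
        self.out = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, decoder_state, encoder_outputs):
        # decoder_state:   (batch, hidden)      -- current decoder hidden state
        # encoder_outputs: (batch, src_len, hidden)
        scores = torch.bmm(encoder_outputs, decoder_state.unsqueeze(2)).squeeze(2) * self.scale
        weights = F.softmax(scores, dim=-1)       # (batch, src_len) alignment
        weights = self.attn_dropout(weights)      # randomly zero some alignments during training
        context = torch.bmm(weights.unsqueeze(1), encoder_outputs).squeeze(1)
        output = torch.tanh(self.out(torch.cat([context, decoder_state], dim=-1)))
        return output, weights

# Quick shape check
if __name__ == "__main__":
    attn = AttentionWithDropout(hidden_size=256, dropout=0.2)
    dec = torch.randn(4, 256)       # one decoder step for a batch of 4
    enc = torch.randn(4, 10, 256)   # 10 encoder positions
    out, w = attn(dec, enc)
    print(out.shape, w.shape)       # torch.Size([4, 256]) torch.Size([4, 10])
```

Note that `nn.Dropout` is a no-op in evaluation mode, so in this sketch the dropped-out attention weights act only as a training-time regularizer.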


Bibliographic Details
Published in: Computers, Materials and Continua, Volume 75, Issue 2
Main Authors: Israr H.; Khan S.A.; Tahir M.A.; Shahzad M.K.; Ahmad M.; Zain J.M.
Format: Article
Language: English
Published: Tech Science Press, 2023
DOI: 10.32604/cmc.2023.035814
ISSN: 1546-2218
Access: All Open Access; Gold Open Access
Scopus ID: 2-s2.0-85154579168
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85154579168&doi=10.32604%2fcmc.2023.035814&partnerID=40&md5=49d9b0e72985224dcb6209ce864a271f