Enhancing Pretrained Multilingual Machine Translation Model with Code-Switching: A Study on Chinese, English and Malay Language
In the field of multilingual machine translation, many pretrained language models have achieved inspiring results. However, the results based on pretrained models are not yet satisfactory for low-resource languages. This paper investigates how to leverage code-switching data to fine-tune a pretrained multilingual machine translation model in order to boost the performance of few-shot low-resource machine translation. By utilizing a multilingual mixed corpus, the code-switching method can enhance the cross-linguistic generalization ability of the model and improve its overall understanding of the languages. Using the smaller model of the FLORES-101 benchmark, we apply code-switching data augmentation to match the results of the benchmark's larger model on six translation directions among three languages: Chinese, English and Malay. The paper studies various corpus mixture mechanisms for constructing the code-switching data, and the experimental findings show that the code-switching fine-tuned model improves the spBLEU score by an average of 2 to 3 points over the results without code-switching.
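The abstract does not specify how the code-switched training sentences are constructed. As a rough illustration only, and not the authors' actual procedure, a common recipe replaces a random fraction of source-side tokens with translations drawn from a bilingual lexicon; the `code_switch` helper and the toy `lexicon` below are hypothetical.

```python
import random

def code_switch(src_tokens, bilingual_dict, swap_ratio=0.3, seed=None):
    """Create a code-switched copy of a source sentence by replacing a
    random subset of its tokens with dictionary translations.

    src_tokens     : list of tokens in the source language
    bilingual_dict : toy lexicon mapping a source token to a translation
                     in the other language (hypothetical)
    swap_ratio     : fraction of swappable tokens to replace
    """
    rng = random.Random(seed)
    switched = list(src_tokens)
    # Only tokens that have a dictionary entry can be swapped.
    candidates = [i for i, tok in enumerate(src_tokens) if tok in bilingual_dict]
    n_swaps = max(1, int(len(candidates) * swap_ratio)) if candidates else 0
    for i in rng.sample(candidates, n_swaps):
        switched[i] = bilingual_dict[src_tokens[i]]
    return switched

# Toy example: mix Malay words into an English source sentence.
lexicon = {"language": "bahasa", "model": "model", "translation": "terjemahan"}
print(code_switch("the translation model learns a new language".split(), lexicon, 0.5, seed=0))
```

Swapping at the token level keeps the mixed sentence aligned with its original target-side translation, so the augmented pair can be fed to the same fine-tuning objective as the clean data.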
Published in: | ICCPR 2024 - Proceedings of the 2024 13th International Conference on Computing and Pattern Recognition |
---|---|
Main Author: | Liu H. |
Format: | Conference paper |
Language: | English |
Published: | Association for Computing Machinery, Inc, 2025 |
Online Access: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85218347057&doi=10.1145%2f3704323.3704346&partnerID=40&md5=da2c58dd104d36641e0865c691fbe6df |
id | 2-s2.0-85218347057
---|---
spelling | 2-s2.0-85218347057 Liu H.; Seman N. Enhancing Pretrained Multilingual Machine Translation Model with Code-Switching: A Study on Chinese, English and Malay Language 2025 ICCPR 2024 - Proceedings of the 2024 13th International Conference on Computing and Pattern Recognition 10.1145/3704323.3704346 https://www.scopus.com/inward/record.uri?eid=2-s2.0-85218347057&doi=10.1145%2f3704323.3704346&partnerID=40&md5=da2c58dd104d36641e0865c691fbe6df In the field of multilingual machine translation, many pretrained language models have achieved inspiring results. However, the results based on pretrained models are not yet satisfactory for low-resource languages. This paper investigates how to leverage code-switching data to fine-tune a pretrained multilingual machine translation model in order to boost the performance of few-shot low-resource machine translation. By utilizing a multilingual mixed corpus, the code-switching method can enhance the cross-linguistic generalization ability of the model and improve its overall understanding of the languages. Using the smaller model of the FLORES-101 benchmark, we apply code-switching data augmentation to match the results of the benchmark's larger model on six translation directions among three languages: Chinese, English and Malay. The paper studies various corpus mixture mechanisms for constructing the code-switching data, and the experimental findings show that the code-switching fine-tuned model improves the spBLEU score by an average of 2 to 3 points over the results without code-switching. © 2024 Copyright held by the owner/author(s). Association for Computing Machinery, Inc English Conference paper
author | Liu H.; Seman N.
spellingShingle | Liu H.; Seman N. Enhancing Pretrained Multilingual Machine Translation Model with Code-Switching: A Study on Chinese, English and Malay Language
author_facet | Liu H.; Seman N.
author_sort | Liu H.; Seman N.
title | Enhancing Pretrained Multilingual Machine Translation Model with Code-Switching: A Study on Chinese, English and Malay Language
title_short | Enhancing Pretrained Multilingual Machine Translation Model with Code-Switching: A Study on Chinese, English and Malay Language
title_full | Enhancing Pretrained Multilingual Machine Translation Model with Code-Switching: A Study on Chinese, English and Malay Language
title_fullStr | Enhancing Pretrained Multilingual Machine Translation Model with Code-Switching: A Study on Chinese, English and Malay Language
title_full_unstemmed | Enhancing Pretrained Multilingual Machine Translation Model with Code-Switching: A Study on Chinese, English and Malay Language
title_sort | Enhancing Pretrained Multilingual Machine Translation Model with Code-Switching: A Study on Chinese, English and Malay Language
publishDate | 2025
container_title | ICCPR 2024 - Proceedings of the 2024 13th International Conference on Computing and Pattern Recognition
container_volume |
container_issue |
doi_str_mv | 10.1145/3704323.3704346
url | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85218347057&doi=10.1145%2f3704323.3704346&partnerID=40&md5=da2c58dd104d36641e0865c691fbe6df
description | In the field of multilingual machine translation, many pretrained language models have achieved inspiring results. However, the results based on pretrained models are not yet satisfactory for low-resource languages. This paper investigates how to leverage code-switching data to fine-tune a pretrained multilingual machine translation model in order to boost the performance of few-shot low-resource machine translation. By utilizing a multilingual mixed corpus, the code-switching method can enhance the cross-linguistic generalization ability of the model and improve its overall understanding of the languages. Using the smaller model of the FLORES-101 benchmark, we apply code-switching data augmentation to match the results of the benchmark's larger model on six translation directions among three languages: Chinese, English and Malay. The paper studies various corpus mixture mechanisms for constructing the code-switching data, and the experimental findings show that the code-switching fine-tuned model improves the spBLEU score by an average of 2 to 3 points over the results without code-switching. © 2024 Copyright held by the owner/author(s).
publisher | Association for Computing Machinery, Inc
issn |
language | English
format | Conference paper
accesstype |
record_format | scopus
collection | Scopus
_version_ | 1825722573662453760
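The record above reports gains of 2 to 3 spBLEU points. spBLEU is BLEU computed over SentencePiece-tokenized text rather than word-tokenized text, which keeps scores comparable across languages with very different segmentation. Below is a minimal scoring sketch with sacreBLEU, assuming an installed release that exposes the FLORES-101 SPM tokenizer as `tokenize="flores101"` (older releases name the same tokenizer `"spm"`); the Malay hypothesis and reference sentences are invented placeholders, not data from the paper.

```python
# Minimal sketch of spBLEU scoring with sacreBLEU's SentencePiece tokenizer.
# Assumption: the installed sacrebleu version accepts tokenize="flores101"
# (older versions expose the same SPM tokenizer under the name "spm").
import sacrebleu

# Invented placeholder sentences, not data from the paper.
hypotheses = ["Model terjemahan itu belajar bahasa baharu."]
references = [["Model penterjemahan itu mempelajari bahasa baharu."]]

score = sacrebleu.corpus_bleu(hypotheses, references, tokenize="flores101")
print(f"spBLEU: {score.score:.2f}")
```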