A Cross-lingual part-of-speech tagging for Malay language
Cross-lingual annotation projection methods can draw on rich-resourced languages to improve the performance of Natural Language Processing (NLP) tasks in less-resourced languages. In this research, Malay serves as the less-resourced language and English as the rich-reso...
Published in: | ICAART 2015 - 7th International Conference on Agents and Artificial Intelligence, Proceedings |
---|---|
Main Author: | Zamin N.; Bakar Z.A. |
Format: | Conference paper |
Language: | English |
Published: | SciTePress 2015 |
Online Access: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-84943278386&doi=10.5220%2f0005150602320240&partnerID=40&md5=de92cd6de1750dbe2a4450930a83396d |
id |
2-s2.0-84943278386 |
---|---|
author |
Zamin N.; Bakar Z.A. |
title |
A Cross-lingual part-of-speech tagging for Malay language |
publishDate |
2015 |
container_title |
ICAART 2015 - 7th International Conference on Agents and Artificial Intelligence, Proceedings |
container_volume |
2 |
container_issue |
|
doi_str_mv |
10.5220/0005150602320240 |
url |
https://www.scopus.com/inward/record.uri?eid=2-s2.0-84943278386&doi=10.5220%2f0005150602320240&partnerID=40&md5=de92cd6de1750dbe2a4450930a83396d |
description |
Cross-lingual annotation projection methods can draw on rich-resourced languages to improve the performance of Natural Language Processing (NLP) tasks in less-resourced languages. In this research, Malay serves as the less-resourced language and English as the rich-resourced language. The research aims to break the deadlock in Malay computational linguistics research, caused by the shortage of Malay tools and annotated corpora, by exploiting state-of-the-art English tools. This paper proposes a cross-lingual annotation projection method based on word alignment between two languages with syntactic differences. It introduces a word alignment method known as MEWA (Malay-English Word Aligner), which integrates a Dice Coefficient and a bigram string similarity measure. MEWA is used to induce annotations automatically, using a Malay test collection on terrorism and an identified English tool. In the POS annotation projection experiment, the algorithm achieved an accuracy rate of 79%. |
publisher |
SciTePress |
issn |
|
language |
English |
format |
Conference paper |
accesstype |
All Open Access; Green Open Access; Hybrid Gold Open Access |
record_format |
scopus |
collection |
Scopus |
_version_ |
1809677687447355392 |
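
The abstract describes word alignment driven by a Dice Coefficient and a bigram string similarity measure, followed by projection of English POS tags onto Malay tokens. The record gives no details of how MEWA combines the two scores, so the following Python is only a minimal sketch of the general idea, not the authors' implementation. The function names, the equal weighting (`weight=0.5`), the greedy best-match alignment, and the toy sentences are all assumptions made for illustration.

```python
from collections import Counter


def char_bigrams(word):
    """Return the character bigrams of a lower-cased word."""
    w = word.lower()
    return [w[i:i + 2] for i in range(len(w) - 1)]


def bigram_similarity(a, b):
    """Dice coefficient over the character-bigram multisets of two words."""
    ba, bb = Counter(char_bigrams(a)), Counter(char_bigrams(b))
    total = sum(ba.values()) + sum(bb.values())
    if total == 0:
        return 0.0
    overlap = sum((ba & bb).values())  # multiset intersection
    return 2.0 * overlap / total


def cooccurrence_dice(sentence_pairs):
    """Dice association score for each (Malay, English) word pair that
    co-occurs in at least one aligned sentence pair."""
    ms_count, en_count, joint = Counter(), Counter(), Counter()
    for ms_sent, en_sent in sentence_pairs:
        ms_set, en_set = set(ms_sent), set(en_sent)
        ms_count.update(ms_set)
        en_count.update(en_set)
        joint.update((m, e) for m in ms_set for e in en_set)
    return {
        (m, e): 2.0 * c / (ms_count[m] + en_count[e])
        for (m, e), c in joint.items()
    }


def align(ms_sent, en_sent, dice, weight=0.5):
    """For each Malay token, pick the index of the best-scoring English token.
    The linear mix of the two scores is an assumption, not taken from the paper."""
    alignment = []
    for m in ms_sent:
        scores = [
            weight * dice.get((m, e), 0.0)
            + (1.0 - weight) * bigram_similarity(m, e)
            for e in en_sent
        ]
        alignment.append(max(range(len(en_sent)), key=scores.__getitem__))
    return alignment


def project_tags(ms_sent, en_tags, alignment):
    """Copy the POS tag of each aligned English token onto its Malay token."""
    return [(m, en_tags[j]) for m, j in zip(ms_sent, alignment)]


# Toy parallel data (hypothetical, not from the paper's test collection).
corpus = [
    (["polis", "tangkap", "suspek"], ["police", "arrest", "suspect"]),
    (["polis", "datang"], ["police", "come"]),
    (["suspek", "lari"], ["suspect", "run"]),
]
dice = cooccurrence_dice(corpus)
ms_sent, en_sent = corpus[0]
en_tags = ["NOUN", "VERB", "NOUN"]  # would come from an English tagger
alignment = align(ms_sent, en_sent, dice)
projected = project_tags(ms_sent, en_tags, alignment)
```

In this sketch the string similarity helps with cognate-like pairs such as "polis"/"police", while the co-occurrence score handles pairs with no surface overlap such as "tangkap"/"arrest". A real aligner would also need to handle unaligned words and one-to-many links, which this greedy version ignores.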