Improving document relevancy using integrated language modeling techniques
This paper presents an integrated language model to improve document relevancy for text-queries. To be precise, an integrated stemming-lemmatization (S-L) model was developed and its retrieval performance was compared at three document levels, that is, at top 5, 10 and 15. A prototype search engine...
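The record's description reports mean average precision for cutoffs at the top 5, 10 and 15 documents. As a rough illustration of how such a figure can be computed, the sketch below averages precision over the ranks at which relevant documents appear, then averages across queries. The function names, the document IDs and the relevance judgments are hypothetical and do not come from the paper, and the normalization (dividing by the number of relevant documents retrieved within the cutoff) is an assumption, since the record does not state the exact formula the authors used.

```python
from typing import List, Sequence, Set, Tuple


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Average precision over the top-k results of one query.

    Assumption: normalizes by the number of relevant documents
    found within the top k (other variants divide by min(k, |relevant|)).
    """
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]], k: int) -> float:
    """Mean of average_precision_at_k over all (ranking, relevant set) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision_at_k(r, rel, k) for r, rel in runs) / len(runs)


# Hypothetical rankings and judgments for two queries (not data from the paper).
example_runs = [
    (["d1", "d2", "d3", "d4", "d5"], {"d1", "d3"}),
    (["d9", "d2", "d7", "d4", "d5"], {"d2"}),
]

if __name__ == "__main__":
    for cutoff in (5, 10, 15):
        print(cutoff, round(mean_average_precision(example_runs, cutoff), 4))
```

Running the same rankings through several cutoffs, as in the loop above, mirrors the comparison at three document levels described in the abstract.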
Published in: | Malaysian Journal of Computer Science |
---|---|
Main Author: | Balakrishnan V. |
Format: | Article |
Language: | English |
Published: | Faculty of Computer Science and Information Technology, 2016 |
Online Access: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-84966599713&doi=10.22452%2fmjcs.vol29no1.4&partnerID=40&md5=c6cbb48781ce0da0b37dae5ef6153c9a |
id |
2-s2.0-84966599713 |
---|---|
author |
Balakrishnan V.; Humaidi N.; Lloyd-Yemoh E. |
title |
Improving document relevancy using integrated language modeling techniques |
publishDate |
2016 |
container_title |
Malaysian Journal of Computer Science |
container_volume |
29 |
container_issue |
1 |
doi_str_mv |
10.22452/mjcs.vol29no1.4 |
url |
https://www.scopus.com/inward/record.uri?eid=2-s2.0-84966599713&doi=10.22452%2fmjcs.vol29no1.4&partnerID=40&md5=c6cbb48781ce0da0b37dae5ef6153c9a |
description |
This paper presents an integrated language model to improve document relevancy for text queries. Specifically, an integrated stemming-lemmatization (S-L) model was developed and its retrieval performance was compared at three document levels: the top 5, 10 and 15 documents. A prototype search engine was developed and fifteen queries were executed. The mean average precisions showed that the S-L model outperformed the baseline (i.e., no language processing), the stemming model and the lemmatization model at all three document levels. These results were also supported by the precision histograms, which showed that the integrated model improved document relevancy. However, it should be noted that the precision differences between the models were insignificant. Overall, the study found that when the language processing techniques of stemming and lemmatization are combined, more relevant documents are retrieved. |
publisher |
Faculty of Computer Science and Information Technology |
issn |
01279084 |
language |
English |
format |
Article |
accesstype |
All Open Access; Bronze Open Access |
record_format |
scopus |
collection |
Scopus |