Improving document relevancy using integrated language modeling techniques

This paper presents an integrated language model to improve document relevancy for text queries. Specifically, an integrated stemming-lemmatization (S-L) model was developed, and its retrieval performance was compared at three document levels: the top 5, 10 and 15 documents. A prototype search engine...

Bibliographic Details
Published in: Malaysian Journal of Computer Science
Main Author: Balakrishnan V.; Humaidi N.; Lloyd-Yemoh E.
Format: Article
Language: English
Published: Faculty of Computer Science and Information Technology 2016
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84966599713&doi=10.22452%2fmjcs.vol29no1.4&partnerID=40&md5=c6cbb48781ce0da0b37dae5ef6153c9a
id 2-s2.0-84966599713
author Balakrishnan V.; Humaidi N.; Lloyd-Yemoh E.
title Improving document relevancy using integrated language modeling techniques
publishDate 2016
container_title Malaysian Journal of Computer Science
container_volume 29
container_issue 1
doi_str_mv 10.22452/mjcs.vol29no1.4
url https://www.scopus.com/inward/record.uri?eid=2-s2.0-84966599713&doi=10.22452%2fmjcs.vol29no1.4&partnerID=40&md5=c6cbb48781ce0da0b37dae5ef6153c9a
description This paper presents an integrated language model to improve document relevancy for text queries. Specifically, an integrated stemming-lemmatization (S-L) model was developed, and its retrieval performance was compared at three document levels: the top 5, 10 and 15 documents. A prototype search engine was developed and fifteen queries were executed. The mean average precision values showed that the S-L model outperformed the baseline (i.e. no language processing), the stemming model and the lemmatization model at all three document levels. These results were supported by the precision histograms, which showed that the integrated model improved document relevancy. However, it should be noted that the precision differences between the models were insignificant. Overall, the study found that combining the two language processing techniques, stemming and lemmatization, retrieves more relevant documents.
publisher Faculty of Computer Science and Information Technology
issn 01279084
language English
format Article
accesstype All Open Access; Bronze Open Access
record_format scopus
collection Scopus