A Malay Hadith translated document retrieval using parallel Latent Semantic Indexing (LSI)

Latent Semantic Indexing (LSI) is a well-known search technique that retrieves documents based on content similarity, or meaning. LSI is effective at improving retrieval performance; however, as document collections grow larger, a better technique i...

Bibliographic Details
Published in: 2016 3rd International Conference on Information Retrieval and Knowledge Management, CAMP 2016 - Conference Proceedings
Main Author: Amirah N.N.; Rahim T.M.; Mabni Z.; Hanum H.M.; Rahman N.A.
Format: Conference paper
Language: English
Published: Institute of Electrical and Electronics Engineers Inc. 2017
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85015812907&doi=10.1109%2fINFRKM.2016.7806346&partnerID=40&md5=0a60b7dc337b0e95e27c9f012db9b38d
id 2-s2.0-85015812907

author Amirah N.N.; Rahim T.M.; Mabni Z.; Hanum H.M.; Rahman N.A.
title A Malay Hadith translated document retrieval using parallel Latent Semantic Indexing (LSI)
publishDate 2017
container_title 2016 3rd International Conference on Information Retrieval and Knowledge Management, CAMP 2016 - Conference Proceedings
doi_str_mv 10.1109/INFRKM.2016.7806346
url https://www.scopus.com/inward/record.uri?eid=2-s2.0-85015812907&doi=10.1109%2fINFRKM.2016.7806346&partnerID=40&md5=0a60b7dc337b0e95e27c9f012db9b38d
description Latent Semantic Indexing (LSI) is a well-known search technique that retrieves documents based on content similarity, or meaning. LSI is effective at improving retrieval performance; however, as document collections grow larger, a faster processing technique is needed. In this paper, a new parallel LSI algorithm that runs on a standard multi-core personal computer (PC) is proposed to improve the performance of retrieving relevant documents. The algorithm uses parallel threads to perform the matrix computations automatically using the Fork-Join approach. A total of 2028 text documents extracted from four volumes of the Malay-translated book of Hadith known as Shahih Bukhari were used as the test collection. We compare the time taken to build the LSI space on the sequential and parallel systems. Retrieval of relevant documents is also evaluated for both systems using the Information Retrieval (IR) metrics of recall, precision, and effectiveness. The results show that the parallel system creates the LSI space faster than the sequential system. On the recall, precision, and effectiveness measures, the proposed parallel LSI system is comparable to the sequential LSI system. © 2016 IEEE.
publisher Institute of Electrical and Electronics Engineers Inc.
language English
format Conference paper
record_format scopus
collection Scopus