A Malay Hadith translated document retrieval using parallel Latent Semantic Indexing (LSI)

Bibliographic Details
Published in: 2016 3rd International Conference on Information Retrieval and Knowledge Management, CAMP 2016 - Conference Proceedings
Main Author: Amirah N.N.; Rahim T.M.; Mabni Z.; Hanum H.M.; Rahman N.A.
Format: Conference paper
Language: English
Published: Institute of Electrical and Electronics Engineers Inc. 2017
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85015812907&doi=10.1109%2fINFRKM.2016.7806346&partnerID=40&md5=0a60b7dc337b0e95e27c9f012db9b38d
Description
Summary: Latent Semantic Indexing (LSI) is a well-known search technique in which documents are retrieved based on the similarity of their content or meaning. LSI is an effective method for improving retrieval performance; however, as document collections grow larger, a faster technique is needed to process them. In this paper, a new parallel LSI algorithm that runs on a standard multi-core personal computer (PC) is proposed to improve the performance of retrieving relevant documents. The parallel LSI algorithm uses parallel threads to perform the matrix computations automatically using the Fork-Join approach. The test collection consists of 2028 text documents extracted from four volumes of the Malay-translated book of Hadith known as Shahih Bukhari. We compare the time taken to build the LSI space in the sequential and parallel systems. Retrieval quality is also measured for both systems using the Information Retrieval (IR) metrics of recall, precision, and effectiveness. The results show that the parallel system creates the LSI space faster than the sequential system. In terms of recall, precision, and effectiveness, the proposed parallel LSI system is comparable to the sequential LSI system. © 2016 IEEE.
DOI: 10.1109/INFRKM.2016.7806346
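
Illustrative note: the summary mentions matrix computations split across threads with a Fork-Join approach, and evaluation by recall and precision. The sketch below is not the authors' implementation; it is a minimal Python illustration under stated assumptions. The function names (multiply_rows, fork_join_matmul, precision_recall), the row-block splitting strategy, and the default worker count are all hypothetical, and a thread pool stands in for a Fork-Join framework. In CPython this shows the structure of the pattern rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_rows(a_rows, b):
    """Multiply a block of rows of A by the full matrix B (lists of lists)."""
    cols = list(zip(*b))  # columns of B
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a_rows]


def sequential_matmul(a, b):
    """Baseline: compute A x B in a single task."""
    return multiply_rows(a, b)


def fork_join_matmul(a, b, workers=4):
    """Fork: split A into row blocks, one task per block.
    Join: concatenate the partial products in their original order.
    (Assumed splitting strategy; the source does not specify one.)"""
    block = max(1, -(-len(a) // workers))  # ceiling division
    blocks = [a[i:i + block] for i in range(0, len(a), block)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the join is a simple concatenation
        parts = list(pool.map(lambda rows: multiply_rows(rows, b), blocks))
    return [row for part in parts for row in part]


def precision_recall(retrieved, relevant):
    """Standard IR definitions:
    precision = |retrieved and relevant| / |retrieved|
    recall    = |retrieved and relevant| / |relevant|"""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Both multiplication functions should return identical matrices, which mirrors the summary's claim that the parallel system is comparable to the sequential one in retrieval quality while differing only in processing time. The "effectiveness" measure named in the summary is not defined there, so it is left out of the sketch.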