Research Article Open Access

Matching LSI for Scalable Information Retrieval

Rajagopal Palsonkennedy¹ and T. V. Gopal¹
  • ¹ Anna University, India
Journal of Computer Science
Volume 8 No. 12, 2012, 2083-2097

DOI: https://doi.org/10.3844/jcssp.2012.2083.2097

Submitted On: 9 June 2012
Published On: 2 January 2013

How to Cite: Palsonkennedy, R. & Gopal, T. V. (2012). Matching LSI for Scalable Information Retrieval. Journal of Computer Science, 8(12), 2083-2097. https://doi.org/10.3844/jcssp.2012.2083.2097

Abstract

Latent Semantic Indexing (LSI) is one of the most widely used techniques in information retrieval. Unlike traditional information retrieval techniques, LSI does not rely on keyword matching alone; it uses statistical and algebraic computation. Using Singular Value Decomposition (SVD), the high-dimensional term-document matrix is converted to a lower-dimensional approximation from which noise can be filtered out. The problems of synonymy and polysemy that affect traditional techniques can also be overcome by analyzing the relationships between terms and documents. However, LSI suffers from a scalability problem because of the computational complexity of SVD. This study presents a distributed LSI algorithm, MR-LSI, which addresses the scalability problem using the Hadoop framework, based on the MapReduce distributed computing model. It also addresses the overhead introduced by the clustering step, for which the k-means algorithm is used. The evaluations indicate that MR-LSI achieves a noteworthy improvement over the other scheme when processing large collections of documents. Because Hadoop supports a variety of computing environments, unbalanced load among nodes becomes a prominent issue. Hence, a load balancing algorithm based on a genetic algorithm is proposed for balancing load in a static environment. The results show that it improves the performance of a cluster to varying degrees.
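The rank reduction described in the abstract can be illustrated with a minimal single-machine sketch. It is not the MR-LSI algorithm itself, which distributes the work with MapReduce and k-means. The toy term-document matrix, the choice of k = 2, and the helper names `lsi_fit`, `fold_in_query` and `cosine_scores` are assumptions made for this illustration only. The sketch uses NumPy's `numpy.linalg.svd`, keeps the k largest singular values, maps a query into the latent space, and ranks documents by cosine similarity.

```python
import numpy as np

def lsi_fit(td, k):
    """Rank-k truncated SVD of a term-document matrix (terms x documents)."""
    u, s, vt = np.linalg.svd(td, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]

def fold_in_query(q, u_k, s_k):
    """Map a term-space query into the latent space: S_k^{-1} U_k^T q."""
    return (u_k.T @ q) / s_k

def cosine_scores(q_hat, vt_k):
    """Cosine similarity between the query and each document in latent space."""
    docs = vt_k.T  # one row per document
    norms = np.linalg.norm(docs, axis=1) * np.linalg.norm(q_hat)
    return (docs @ q_hat) / np.where(norms == 0, 1.0, norms)

# Hypothetical toy corpus. Rows: car, automobile, engine, flower, petal.
# Columns: d0 "car engine", d1 "automobile engine",
#          d2 "flower petal", d3 "flower".
td = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
], dtype=float)

u_k, s_k, vt_k = lsi_fit(td, k=2)

# The query contains only the term "car".
query = np.array([1, 0, 0, 0, 0], dtype=float)
scores = cosine_scores(fold_in_query(query, u_k, s_k), vt_k)

for i, score in enumerate(scores):
    print(f"d{i}: {score:.3f}")
```

In this toy corpus, document d1 never contains the word "car", yet it ranks alongside d0 because both share the term "engine" and therefore load on the same latent dimension. This is the synonymy effect the abstract refers to. The full SVD in `lsi_fit` is also the step whose cost grows quickly with the size of the matrix, which is the scalability problem that motivates a distributed approach.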

Keywords

  • SVD
  • K-Means Cluster
  • T-D Matrix
  • MapReduce
  • LSI
  • Information Retrieval