Journal of Computer Science

Matching LSI for Scalable Information Retrieval

Rajagopal Palsonkennedy and T. V. Gopal

DOI : 10.3844/jcssp.2012.2083.2097

Journal of Computer Science

Volume 8, Issue 12

Pages 2083-2097

Abstract

Latent Semantic Indexing (LSI) is one of the well-liked techniques in the information retrieval fields. Different from the traditional information retrieval techniques, LSI is not based on the keyword matching simply. It uses statistics and algebraic computations. Based on Singular Value Decomposition (SVD), the higher dimensional matrix is converted to a lower dimensional approximate matrix, of which the noises could be filtered. And also the issues of synonymy and polysemy in the traditional techniques can be prevail over based on the investigations of the terms related with the documents. However, it is notable that LSI suffers a scalability issue due to the computing complexity of SVD. This study presents a distributed LSI algorithm MR-LSI which can solve the scalability issue using Hadoop framework based on the distributed computing model Map Reduce. It also solves the overhead issue caused by the involved clustering algorithm by k-means algorithm. The evaluations indicate that MR-LSI can gain noteworthy improvement compared to the other scheme on processing large scale of documents. One significant advantage of Hadoop is that it supports various computing environments so that the issue of unbalanced load among nodes is highlighted.Hence, a load balancing algorithm based on genetic algorithm for balancing load in static environment is proposed. The results show that it can advance the performance of a cluster according to different levels.

Copyright

© 2012 Rajagopal Palsonkennedy and T. V. Gopal. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.