Highly Efficient Architecture for Scalable Focused Crawling Using Incremental Parallel Web Crawler
P. Jaganathan and T. Karthikeyan
DOI: 10.3844/jcssp.2015.120.126
Journal of Computer Science
Volume 11, Issue 1
With its growing industrial impact in recent years, data mining has established itself as one of the most important disciplines in computer science. On the fast-growing Web, locating precise and relevant resources in a reasonable amount of time is a major challenge for general-purpose single-process crawlers, which creates demand for improved and more effective algorithms. Large-scale search engines must update their indexes frequently, yet they are often unable to present such information in a timely manner. This study proposes a scalable focused crawling architecture built on an incremental parallel Web crawler, in which Web pages relevant to multiple pre-defined topics are crawled concurrently. To address the problem of URL distribution, a compound decision model based on a multi-objective decision-making method is introduced; it jointly considers multiple factors such as load balance and relevance. The update-frequency problem is handled by a decision made at the local repository. The results show that the proposed system efficiently achieves high quality, relevance and freshness with a significantly low memory requirement.
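The URL distribution step can be illustrated with a minimal sketch. The abstract does not specify the compound decision model, so this example substitutes a plain weighted sum of two factors, topic relevance and spare queue capacity, which is only one simple form of multi-objective decision making. The names `Crawler`, `assign_url`, `w_relevance` and `w_load`, the weight values, and the relevance scores are all assumptions made for illustration, not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Crawler:
    """A hypothetical parallel crawler process dedicated to one topic."""
    name: str
    topic: str
    capacity: int                      # assumed maximum queue length
    queue: list = field(default_factory=list)

    def load(self) -> float:
        """Fraction of queue capacity in use, clamped to [0, 1]."""
        return min(len(self.queue) / self.capacity, 1.0)


def assign_url(url, relevance, crawlers, w_relevance=0.7, w_load=0.3):
    """Assign `url` to the crawler with the highest combined score.

    `relevance` maps topic name -> relevance of the URL to that topic,
    in [0, 1]. The score is a weighted sum (an assumption, not the
    paper's model) of relevance and spare capacity.
    """
    def score(crawler):
        topical = relevance.get(crawler.topic, 0.0)
        spare = 1.0 - crawler.load()
        return w_relevance * topical + w_load * spare

    best = max(crawlers, key=score)
    best.queue.append(url)
    return best
```

With two idle crawlers for hypothetical topics "sports" and "finance", a URL scored `{"sports": 0.9, "finance": 0.2}` goes to the sports crawler. Once that crawler's queue is full, a URL with only a slight preference for sports is routed to the finance crawler instead, which is the load-balance factor outweighing a small relevance gap.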
© 2015 P. Jaganathan and T. Karthikeyan. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.