Web Document Segmentation Using Frequent Term Sets for Summarization

Chitra Pasupathi; Baskaran Ramachandran; Sarukesi Karunakaran

doi:10.3844/jcssp.2012.2053.2061

Research Article Open Access

Chitra Pasupathi¹, Baskaran Ramachandran² and Sarukesi Karunakaran³

¹ RMK Engineering College, India
² Anna University, India
³ Hindustan University, India

Abstract

Query-sensitive summarization aims to extract the query-relevant content from web documents. Web page segmentation reduces the run-time overhead of summarization systems by grouping the related contents of a web page into segments. At query time, the query-relevant segments of the web page are identified, and important sentences from these segments are extracted to compose the summary. The DOM tree structure of each web document is used to segment its contents: leaf nodes of the DOM tree are merged into segments according to a statistical and linguistic similarity measure. The proposed system was evaluated intrinsically using a user satisfaction index, and its performance was compared with summarization that does not use preprocessed segments. Because the most frequent term sets are used to measure relevance, the system's performance is more promising than that of measures such as cosine similarity and the Jaccard measure, which rely on sparse term-frequency vectors. Only the relevant segments need to be processed at run time, which reduces the time complexity of the summarization process.
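The segment-then-select idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the similarity measure, so this sketch substitutes a simple overlap of each block's top-k most frequent terms; that measure, the stopword list, the threshold, and all function names (`frequent_terms`, `segment`, `relevant_segments`) are assumptions made for illustration, not the authors' method. The leaf texts are assumed to have been extracted from the DOM tree already, in document order.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "are",
             "for", "on", "with", "without"}

def terms(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def frequent_terms(text, k=3):
    """Set of the k most frequent terms in a text block."""
    return {t for t, _ in Counter(terms(text)).most_common(k)}

def overlap(a, b):
    """Shared terms as a fraction of the smaller set (stand-in similarity)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def segment(leaves, k=3, threshold=0.5):
    """Merge each leaf into the previous segment when their frequent terms overlap."""
    segments = []
    for leaf in leaves:
        if segments and overlap(frequent_terms(segments[-1], k),
                                frequent_terms(leaf, k)) >= threshold:
            segments[-1] = segments[-1] + " " + leaf
        else:
            segments.append(leaf)
    return segments

def relevant_segments(segments, query, k=3):
    """Keep segments whose frequent terms share at least one term with the query."""
    q = set(terms(query))
    return [s for s in segments if q & frequent_terms(s, k)]

# Hypothetical leaf-node texts, preprocessed once per page.
leaves = [
    "Solar panels convert sunlight. Solar panels need sunlight.",
    "Solar panels lose output without sunlight.",
    "Football season starts in autumn. Football fans travel.",
]
segments = segment(leaves)
print(relevant_segments(segments, "solar energy"))
```

In this sketch, segmentation runs once per page, and only `relevant_segments` runs at query time, which mirrors the run-time saving the abstract describes.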

Journal of Computer Science

Volume 8 No. 12, 2012, 2053-2061

DOI: https://doi.org/10.3844/jcssp.2012.2053.2061

Submitted On: 4 June 2012

Published On: 19 December 2012

How to Cite: Pasupathi, C., Ramachandran, B. & Karunakaran, S. (2012). Web Document Segmentation Using Frequent Term Sets for Summarization. Journal of Computer Science, 8(12), 2053-2061. https://doi.org/10.3844/jcssp.2012.2053.2061

Copyright: © 2012 Chitra Pasupathi, Baskaran Ramachandran and Sarukesi Karunakaran. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Keywords

Search Engine Optimization
Segmentation
Summarization
Pre-Processing
Query Sensitive