TY - JOUR
AU - Pasupathi, Chitra
AU - Ramachandran, Baskaran
AU - Karunakaran, Sarukesi
PY - 2012
TI - Web Document Segmentation Using Frequent Term Sets for Summarization
JF - Journal of Computer Science
VL - 8
IS - 12
DO - 10.3844/jcssp.2012.2053.2061
UR - https://thescipub.com/abstract/jcssp.2012.2053.2061
AB - Query-sensitive summarization aims at extracting the query-relevant content from web documents. Web page segmentation reduces the run-time overhead of summarization systems by grouping the related contents of a web page into segments ahead of time. At query time, the query-relevant segments of the web page are identified, and important sentences from these segments are extracted to compose the summary. The DOM tree structures of the web documents are used to segment the contents: leaf nodes of the DOM trees are merged to form segments according to statistical and linguistic similarity measures. The proposed system has been evaluated intrinsically using a user satisfaction index, and its performance is compared with summarization that does not use preprocessed segments. The results are more promising than those obtained with measures such as cosine similarity and the Jaccard measure, which rely on sparse term-frequency vectors, because relevance is measured using the most frequent term sets. Only the relevant segments need to be processed at run time, which reduces the time complexity of the summarization process.