A Methodology to Segment the Text for Index Terms

Muhammad Shoaib; Abad Ali Shah

doi:10.3844/ajassp.2005.1309.1314

Research Article Open Access

Abstract

The problem of information overload has become pressing with the growth of the World Wide Web. Tools that can absorb this huge volume of information and mitigate the problem are clearly needed, especially for information retrieval (IR) systems. Text is not a simple sequence of words; it carries structure. It is essential to handle complex sentence structures and the grammatical and lexical irrelevance of different units. The main idea for handling these problems is to segment the text into elementary units, which are simpler and less complex than the original text. We use cue phrases and punctuation for this segmentation. We present an algorithm that is efficient and handles more than 500 cue phrases and most punctuation marks. The elementary units it yields can be passed to a Rhetorical Relations Finder to identify the relations among them; these relations can in turn be used by an RST parser to construct an RST tree, which will be used to design an RST-based indexer. In the future, the algorithm can be extended to handle other discourse markers, which will enable it to handle the most complex cases, where cue phrases and punctuation are not applicable.
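The segmentation idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function name `segment`, the five-entry cue-phrase list, and the punctuation set are all assumptions chosen for illustration, whereas the paper reports handling more than 500 cue phrases. The sketch simply starts a new unit before each cue phrase and ends a unit at each punctuation mark.

```python
import re

# Hypothetical, tiny cue-phrase list for illustration only;
# the paper's algorithm is reported to handle more than 500.
CUE_PHRASES = ["because", "although", "however", "but", "in order to"]

# Assumed set of punctuation marks treated as unit boundaries.
BOUNDARY_PUNCTUATION = ".;:!?,"

# Internal marker inserted at every boundary (assumed absent from the input).
_BOUNDARY = "\x00"


def segment(text, cue_phrases=CUE_PHRASES, punctuation=BOUNDARY_PUNCTUATION):
    """Split text into candidate elementary units.

    A boundary is placed before each whole-word cue phrase and
    at each run of boundary punctuation (which is dropped).
    """
    # Try longer cue phrases first so multi-word cues win over shorter ones.
    cues = sorted(cue_phrases, key=len, reverse=True)
    cue_re = re.compile(
        r"\b(?:" + "|".join(re.escape(c) for c in cues) + r")\b",
        flags=re.IGNORECASE,
    )
    punct_re = re.compile("[" + re.escape(punctuation) + "]+")

    # Mark a boundary before every cue phrase, keeping the cue in its unit.
    marked = cue_re.sub(lambda m: _BOUNDARY + m.group(0), text)
    # Replace every run of punctuation with a boundary.
    marked = punct_re.sub(_BOUNDARY, marked)

    units = (u.strip() for u in marked.split(_BOUNDARY))
    return [u for u in units if u]


if __name__ == "__main__":
    sentence = "The system failed because the index was stale, but the logs were clean."
    for unit in segment(sentence):
        print(unit)
    # The system failed
    # because the index was stale
    # but the logs were clean
```

Each cue phrase stays at the head of its unit, so a later stage (such as a relations finder) could still see which marker introduced the unit. A fuller treatment would need to distinguish discourse uses of a word from non-discourse uses, which this sketch does not attempt.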

American Journal of Applied Sciences

Volume 2 No. 9, 2005, 1309-1314

DOI: https://doi.org/10.3844/ajassp.2005.1309.1314

Submitted On: 20 September 2005 Published On: 30 September 2005

How to Cite: Shoaib, M. & Shah, A. A. (2005). A Methodology to Segment the Text for Index Terms. American Journal of Applied Sciences, 2(9), 1309-1314. https://doi.org/10.3844/ajassp.2005.1309.1314

Copyright: © 2005 Muhammad Shoaib and Abad Ali Shah. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Keywords

RST
text segmentation
cue phrases
punctuations