Journal of Computer Science

A Comparative Analysis of the Entropy and Transition Point Approach in Representing Index Terms of Literary Text

Hayati Abd Rahman and Shahrul Azman Noah

DOI: 10.3844/jcssp.2011.1088.1093

Volume 7, Issue 7

Pages 1088-1093

Abstract

Problem statement: A concept hierarchy is a hierarchically organized collection of domain concepts. It is particularly useful in applications such as information retrieval, document browsing and document classification. Approach: One of the important tasks in constructing a concept hierarchy is identifying suitable terms while keeping the domain vocabulary at an appropriate size. Results: One way of achieving such a size is term reduction. The aim of this study is to examine how effectively term selection methods reduce the size of the vocabulary for literary text. The experiment compares the entropy method, the transition point method and a hybrid of the transition point and entropy methods with the Vector Space Model (VSM). Conclusion/Recommendations: The results indicate that the transition point method is more effective than the others at reducing the size of the vocabulary while at the same time preserving the important terms that exist in the literary documents.

Copyright

© 2011 Hayati Abd Rahman and Shahrul Azman Noah. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.