Research Article Open Access

SUPERVISED TERM WEIGHTING METHODS FOR URL CLASSIFICATION

R. Rajalakshmi1
  • 1 , India

Abstract

Many term weighting methods have been proposed in the literature for Information Retrieval and Text Categorization. Term weighting, a part of the feature selection process, has not yet been explored for the URL classification problem. We classify a web page using its URL alone, without fetching its content, and hence URL-based classification is faster than methods that require the page content. In this study, we investigate the use of term weighting methods for selecting relevant URL features and their impact on the performance of URL classification. We propose a New Relevance Factor (NRF) for the supervised term weighting method to compute the URL weights and perform multiclass classification of URLs using a Naive Bayes classifier. To evaluate the proposed method, we conducted experiments on the Open Directory Project (ODP) dataset, and the results show that the proposed supervised term weighting method based on NRF is suitable for URL classification. We achieved an 11% improvement in Precision over existing binary classifier methods and a 22% improvement in F1 over existing multiclass classifiers.
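The abstract does not give the NRF formula, so the following is only a rough sketch of what supervised term weighting over URL tokens can look like, not the method of the paper. As a stand-in it uses the relevance factor known from tf.rf weighting, rf = log2(2 + a / max(1, c)), where a is the number of URLs in the target category that contain the term and c is the number of URLs in all other categories that contain it. The function names, the stop-token list, and the sample URLs are hypothetical and chosen purely for illustration.

```python
import math
import re
from collections import defaultdict

# Hypothetical stop tokens: scheme, host, and file-extension noise.
STOP_TOKENS = {"http", "https", "www", "com", "org", "net", "html", "htm"}


def tokenize_url(url):
    """Split a URL into lowercase alphabetic tokens, dropping stop tokens."""
    tokens = re.findall(r"[a-z]+", url.lower())
    return [t for t in tokens if t not in STOP_TOKENS and len(t) > 1]


def relevance_factors(labeled_urls):
    """Return {(term, category): rf}, with rf = log2(2 + a / max(1, c)).

    This is the tf.rf relevance factor, used here as a stand-in;
    it is NOT the NRF proposed in the article.
    """
    # term -> category -> number of URLs of that category containing the term
    doc_freq = defaultdict(lambda: defaultdict(int))
    categories = set()
    for url, category in labeled_urls:
        categories.add(category)
        for term in set(tokenize_url(url)):
            doc_freq[term][category] += 1

    weights = {}
    for term, per_category in doc_freq.items():
        total = sum(per_category.values())
        for category in categories:
            a = per_category.get(category, 0)  # URLs in the category with the term
            c = total - a                      # URLs outside the category with the term
            weights[(term, category)] = math.log2(2 + a / max(1, c))
    return weights


# Hypothetical toy data; a real experiment would use labeled ODP URLs.
sample = [
    ("http://www.example.com/sports/football-scores.html", "Sports"),
    ("http://www.example.org/football/league", "Sports"),
    ("http://www.example.net/health/nutrition", "Health"),
]
weights = relevance_factors(sample)
```

In this toy data the token "football" occurs only in Sports URLs, so its weight for Sports is higher than its weight for Health. Weights of this kind could then be used to rank or select URL tokens before training a multiclass classifier such as Naive Bayes.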

Journal of Computer Science
Volume 10 No. 10, 2014, 1969-1976

DOI: https://doi.org/10.3844/jcssp.2014.1969.1976

Submitted On: 3 April 2014 Published On: 23 June 2014

How to Cite: Rajalakshmi, R. (2014). SUPERVISED TERM WEIGHTING METHODS FOR URL CLASSIFICATION. Journal of Computer Science, 10(10), 1969-1976. https://doi.org/10.3844/jcssp.2014.1969.1976

Keywords

  • Web Page Classification
  • URL Features
  • Term Weighting Method
  • ODP