Research Article Open Access

Poetry Classification Using Support Vector Machines

Noraini Jamal¹, Masnizah Mohd¹ and Shahrul Azman Noah¹
  • ¹ Universiti Kebangsaan Malaysia, Malaysia
Journal of Computer Science
Volume 8 No. 9, 2012, 1441-1446

DOI: https://doi.org/10.3844/jcssp.2012.1441.1446

Submitted On: 5 March 2012
Published On: 8 August 2012

How to Cite: Jamal, N., Mohd, M. & Noah, S. A. (2012). Poetry Classification Using Support Vector Machines. Journal of Computer Science, 8(9), 1441-1446. https://doi.org/10.3844/jcssp.2012.1441.1446

Abstract

Problem statement: The traditional Malay poetry form called pantun is an art form that expresses ideas, emotions and feelings in rhyming lines. Malay poetry usually carries a broad and deep meaning, which makes it difficult to interpret. Moreover, little work has been done on the automatic classification of literary text such as poetry. Approach: This research concerns the classification of Malay pantun using Support Vector Machines (SVM). SVMs with Radial Basis Function (RBF) and linear kernel functions are used to classify pantun by theme and to distinguish poetry from non-poetry. A total of 1500 pantun divided into 10 themes, together with 214 Malaysian folklore documents, were used as the training and testing datasets. We used TF-IDF features in both classification experiments and added the shape feature only in the poetry versus non-poetry experiment. Results: In each experiment, the linear kernel achieved a higher average accuracy than the RBF kernel. Conclusion: The results show the potential of the SVM technique for classifying poems into various categories, whereas previous approaches focused only on classifying prose.
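
To make the two ingredients named in the abstract concrete, the following is a minimal sketch of TF-IDF weighting and of the linear and RBF kernel functions an SVM would compare. It is an illustration only, not the authors' implementation: the abstract does not state which TF-IDF variant or kernel parameters were used, so the weighting formula, the `gamma` value, the function names and the three toy documents are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenised documents."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Assumed variant: relative term frequency times log(N / df).
        vectors.append(
            [(counts[t] / len(doc)) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


def linear_kernel(x, y):
    """Linear kernel: the plain dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel: exp(-gamma * ||x - y||^2); gamma=1.0 is an assumed value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Hypothetical toy documents (not taken from the paper's dataset):
# two pantun-like lines and one prose-like folklore opening.
docs = [
    "budi baik dikenang juga".split(),
    "budi bahasa dikenang orang".split(),
    "pada zaman dahulu ada seorang raja".split(),
]
vocab, vecs = tfidf_vectors(docs)

sim_linear_poems = linear_kernel(vecs[0], vecs[1])
sim_linear_mixed = linear_kernel(vecs[0], vecs[2])
sim_rbf_poems = rbf_kernel(vecs[0], vecs[1])
```

In this sketch the two pantun-like lines share terms and so have a positive linear-kernel value, while the pantun-like line and the prose line share none and score zero. An SVM trained with either kernel would operate on such pairwise values rather than on the raw text.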

Keywords

  • Text classification
  • Support vector machines
  • Malay poetry
  • Radial Basis Function (RBF)
  • express ideas
  • Malaysian folklore
  • classify pantun