Research Article Open Access

A Comparison between Conditional Random Field and Structured Support Vector Machine for Arabic Named Entity Recognition

Marwa Muhammad¹, Muhammad Rohaim¹, Alaa Hamouda¹ and Salah Abdel-Mageid¹
  • ¹ Al-Azhar University, Egypt
Journal of Computer Science
Volume 16 No. 1, 2020, 117-125

DOI: https://doi.org/10.3844/jcssp.2020.117.125

Submitted On: 2 January 2019; Published On: 30 January 2020

How to Cite: Muhammad, M., Rohaim, M., Hamouda, A. & Abdel-Mageid, S. (2020). A Comparison between Conditional Random Field and Structured Support Vector Machine for Arabic Named Entity Recognition. Journal of Computer Science, 16(1), 117-125. https://doi.org/10.3844/jcssp.2020.117.125

Abstract

Named Entity Recognition (NER) is an integral task in many NLP applications, such as machine translation, information extraction and question answering. Arabic is one of the official languages of the United Nations. A large amount of Arabic content is now available on the internet, so the need for tools that process it has become significant. In this study, we examine the impact of the conditional random field and the structured support vector machine on the task of Arabic NER. This is the first time the structured support vector machine has been applied to Arabic named entity recognition. Our proposed system has three stages: preprocessing, feature extraction and model building. We use simple features, namely the bag of words and the bag of part-of-speech tags in the [-1,1] window, to enable the system to detect multi-word entities. We have also tried to enhance the Stanford part-of-speech tagger to improve its output tags, which enabled our system to differentiate named entities from non-entities. In addition, we employ five binary features: is a person, is a prename, is a pre-location, is a location and is an organization. Our system was trained and tested on part of ANERcorp. The results show that the conditional random field-based Arabic NER system outperforms the structured support vector machine-based Arabic NER system when both use the same feature set.
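
The feature set described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the gazetteer contents, the feature names, the `<PAD>` boundary marker, the transliterated tokens and the part-of-speech tags are all assumptions made for the example, and the abstract does not say whether the binary features are looked up in word lists or derived some other way.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical gazetteers (transliterated for readability). The abstract names
# the five binary features but does not give the lists behind them.
GAZETTEERS: Dict[str, Set[str]] = {
    "is_person": {"muhammad"},
    "is_prename": {"alrais"},        # title that tends to precede a person name
    "is_prelocation": {"madinat"},   # word that tends to precede a location
    "is_location": {"alqahira"},
    "is_organization": {"alazhar"},
}

PAD = "<PAD>"  # assumed marker for positions outside the sentence


def token_features(sentence: List[Tuple[str, str]], i: int) -> Dict[str, object]:
    """Build the feature dict for token i of a (word, pos_tag) sentence."""
    features: Dict[str, object] = {}
    # Words and part-of-speech tags in the [-1, 1] window around the token.
    for offset in (-1, 0, 1):
        j = i + offset
        word, pos = sentence[j] if 0 <= j < len(sentence) else (PAD, PAD)
        features[f"word[{offset}]"] = word
        features[f"pos[{offset}]"] = pos
    # Binary membership features for the current token only.
    current_word = sentence[i][0]
    for name, entries in GAZETTEERS.items():
        features[name] = current_word in entries
    return features


def sentence_features(sentence: List[Tuple[str, str]]) -> List[Dict[str, object]]:
    """Feature dicts for every token, in sentence order."""
    return [token_features(sentence, i) for i in range(len(sentence))]


# Illustrative sentence: "The president Muhammad visited Cairo".
EXAMPLE = [("zara", "VBD"), ("alrais", "DTNN"), ("muhammad", "NNP"), ("alqahira", "NNP")]
EXAMPLE_FEATURES = sentence_features(EXAMPLE)
```

Each token becomes one dictionary, so a sentence is a list of dictionaries aligned with its label sequence. Because the same representation can be passed to either sequence learner, a comparison on an identical feature set, as the study describes, stays straightforward.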

Keywords

  • Natural Language Processing
  • Arabic Named Entity Recognition
  • Stanford Part of Speech Tagger Training
  • Conditional Random Field
  • Structured Support Vector Machine