A Comparison between Conditional Random Field and Structured Support Vector Machine for Arabic Named Entity Recognition
- 1 Al-Azhar University, Egypt
Copyright: © 2020 Marwa Muhammad, Muhammad Rohaim, Alaa Hamouda and Salah Abdel-Mageid. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
The Named Entity Recognition (NER) is an integrated task in many NLP applications such as machine translation, Information extraction and question answering. Arabic is one of the authorised spoken languages in the united nation. Currently, there is much Arabic information on the internet, so, nowadays the need for tools which process this information becomes significant. In this study, we have examined the impact of the conditional random field and the structured support vector machine in the task of Arabic NER. The structured support vector machine is the first time to be applied in the Arabic name entity recognition. Our proposed system has three stages: Preprocessing, extracting features and building model. We have used simple features like the bag of words in the [-1,1] window, the bag of part of speech tag in the [-1,1] window to enable our system to detect the multi-words entities. Also, we have tried to enhance the Stanford part of speech tagger to enhance the tagger output tags, which enabled our system to differentiate between the name entities from the non-entities. In addition, we have employed the binary features of: Is a person, is a prename, is a pre-location, is a location and is an organization. Our system has been trained and tested on part of ANER Crop. The results have proved that the conditional random field-based Arabic NER system outperforms the structured support vector machine-based Arabic NER using the same features set.
- 567 Views
- 226 Downloads
- 0 Citations
- Natural Language Processing
- Arabic Named Entity Recognition
- Stanford Part of Speech Tagger Training
- Conditional Random Field
- Structured Support Vector Machine