Research Article Open Access

Tree Rotations for Dependency Trees: Converting the Head-Directionality of Noun Phrases

Ika Alfina1, Indra Budi1 and Heru Suhartanto1
  • 1 Universitas Indonesia, Indonesia
Journal of Computer Science
Volume 16 No. 11, 2020, 1585-1597

DOI: https://doi.org/10.3844/jcssp.2020.1585.1597

Submitted On: 2 September 2020 Published On: 17 November 2020

How to Cite: Alfina, I., Budi, I. & Suhartanto, H. (2020). Tree Rotations for Dependency Trees: Converting the Head-Directionality of Noun Phrases. Journal of Computer Science, 16(11), 1585-1597. https://doi.org/10.3844/jcssp.2020.1585.1597

Abstract

To overcome the lack of NLP resources for low-resource languages, we can utilize tools that are already available for high-resource languages and then modify their output to conform to the target language. In this study, we proposed an approach to convert an Indonesian constituency treebank to a dependency treebank by utilizing an English NLP tool (Stanford CoreNLP) to create the initial dependency treebank. Some annotations in this initial treebank did not conform to Indonesian grammar, especially the head-directionality of noun phrases. Noun phrases in English are usually head-final, whereas in Indonesian they are the opposite, head-initial. We proposed headSwap, a variant of the tree rotation algorithm for dependency trees. We used this algorithm to convert the head-directionality of noun phrases that were initially labeled as a compound. Moreover, we also proposed a set of rules to rename the dependency relation labels to conform to the recent guidelines. To evaluate our proposed method, we created a manually annotated gold standard of 2,846 tokens. Experimental results showed that our proposed method improved the Unlabeled Attachment Score (UAS) by 32.5 percentage points, from 61.6% to 94.1%, and the Labeled Attachment Score (LAS) by 41 percentage points, from 44.1% to 85.1%. Finally, we used our proposed method to automatically create a new Indonesian dependency treebank of 25,416 tokens. A dependency parser model built using this treebank has a UAS of 75.90% and an LAS of 70.38%.
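
For readers unfamiliar with the two evaluation metrics, the following is a minimal sketch of how UAS and LAS are conventionally computed: UAS is the percentage of tokens whose predicted head matches the gold head, and LAS additionally requires the relation label to match. The function name `attachment_scores`, the `(head, label)` tuple layout, and the toy data are assumptions made for illustration; they are not taken from the article or its evaluation scripts.

```python
from typing import List, Tuple

# Hypothetical representation: one (head_index, relation_label) pair per token,
# with head_index 0 standing for the artificial root.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], predicted: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over aligned token sequences."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of tokens")
    if not gold:
        raise ValueError("cannot score an empty sequence")

    # UAS counts tokens whose head is correct, ignoring the label.
    correct_heads = sum(1 for g, p in zip(gold, predicted) if g[0] == p[0])
    # LAS counts tokens whose head and label are both correct.
    correct_arcs = sum(1 for g, p in zip(gold, predicted) if g == p)

    total = len(gold)
    return 100.0 * correct_heads / total, 100.0 * correct_arcs / total


# Toy four-token sentence (illustrative data only).
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "nmod")]
predicted = [(2, "nsubj"), (0, "root"), (2, "nmod"), (2, "obj")]

uas, las = attachment_scores(gold, predicted)
print(f"UAS = {uas:.1f}%, LAS = {las:.1f}%")
```

In the toy data, the third token has the correct head but the wrong label, and the fourth token is attached to the wrong head, so the two scores differ. A head-directionality error on a noun phrase typically costs both metrics at once, because it attaches the tokens in the phrase to the wrong heads.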


Keywords

  • Dependency Parsing
  • Head-Directionality
  • Indonesian
  • Noun Phrases
  • Tree Rotations