Tree Rotations for Dependency Trees: Converting the Head-Directionality of Noun Phrases
- 1 Universitas Indonesia, Indonesia
Copyright: © 2020 Ika Alfina, Indra Budi and Heru Suhartanto. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
To overcome the lack of NLP resources for low-resource languages, we can use tools that are already available for high-resource languages and then modify their output to conform to the target language. In this study, we proposed an approach to convert an Indonesian constituency treebank to a dependency treebank, using an English NLP tool (Stanford CoreNLP) to create the initial dependency treebank. Some annotations in this initial treebank did not conform to Indonesian grammar, especially the head-directionality of noun phrases: noun phrases in English are usually head-final, whereas in Indonesian they are head-initial. We proposed a variant of the tree rotation algorithm for dependency trees, named headSwap, and used it to convert the head-directionality of noun phrases that were initially labeled as compound. We also proposed a set of rules to rename the dependency relation labels to conform to the recent guidelines. To evaluate our method, we created a manually annotated gold standard of 2,846 tokens. Experimental results showed that our method improved the Unlabeled Attachment Score (UAS) by 32.5 percentage points, from 61.6% to 94.1%, and the Labeled Attachment Score (LAS) by 41 percentage points, from 44.1% to 85.1%. Finally, we created a new Indonesian dependency treebank of 25,416 tokens, converted automatically using our method. A dependency parser model built using this treebank achieved a UAS of 75.90% and a LAS of 70.38%.
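The head swap on a compound relation can be illustrated with a minimal Python sketch. This is not the authors' headSwap implementation, which the abstract does not spell out. The `Token` class, the `head_swap` function, and the `move_siblings` option are hypothetical names introduced here, and the choice to reattach the old head's other dependents to the new head is an assumption. The example sentence "Saya membaca buku sejarah" ("I read a history book") and its initial English-style parse are also illustrative.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int     # 1-based position in the sentence
    form: str
    head: int    # index of the head token, 0 for the root
    deprel: str  # dependency relation label


def head_swap(tokens, dep_idx, move_siblings=True):
    """Rotate the edge between a dependent and its head (hypothetical sketch).

    The dependent takes over the old head's attachment and label, and the
    old head becomes a dependent of the promoted token.
    """
    by_idx = {t.idx: t for t in tokens}
    dep = by_idx[dep_idx]
    if dep.head == 0:
        raise ValueError("the root token has no head to swap with")
    old_head = by_idx[dep.head]

    # Promote the dependent and demote the old head, exchanging their labels.
    dep.head, old_head.head = old_head.head, dep.idx
    dep.deprel, old_head.deprel = old_head.deprel, dep.deprel

    # Assumption: the other dependents of the old head follow the new head.
    if move_siblings:
        for t in tokens:
            if t.head == old_head.idx and t is not dep:
                t.head = dep.idx
    return tokens


# English-style (head-final) parse: "buku" is a compound dependent of "sejarah".
sentence = [
    Token(1, "Saya", 2, "nsubj"),
    Token(2, "membaca", 0, "root"),
    Token(3, "buku", 4, "compound"),
    Token(4, "sejarah", 2, "obj"),
]

# After the swap, "buku" heads the phrase (head-initial) as the object of
# "membaca", and "sejarah" depends on "buku".
head_swap(sentence, dep_idx=3)
```

The swapped dependent keeps the label `compound` in this sketch; renaming labels to match the target guidelines would be a separate rule-based step, as the abstract describes.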
- Dependency Parsing
- Noun Phrases
- Tree Rotations