Distributional Models with Syntactic Contexts for the Measurement of Word Similarity in Brazilian Portuguese
- 1 University of Vale do Rio dos Sinos (UNISINOS), Brazil
The similarity between words constitutes significant support to tasks in natural language processing. Several works use Lexical resources such as WordNet for semantic similarity and synonym identification. Nevertheless, words out-of-vocabulary or missing links between senses are perceived problems of this approach. Distributional-based proposals like word embeddings have successfully been used to meet such problems, but the lack of contextual information can prevent the achievement of even better results. The distributional models that include contextual information can bring advantages to this area, but these models are still scarcely explored. Therefore, this work studies the advantages of incorporating syntactic information in the distributional models, fostering for better results in semantic similarity approaches. For that purpose, the current work explore existing lexical and distributional techniques regarding the measurement of word similarity in Brazilian Portuguese. Experiments were carried out with the lexical database WordNet, using different techniques over a standard dataset. The results indicate that word embeddings can cover words out of vocabulary and have better results in comparison with lexical approaches. The main contribution of this article is a new approach to apply syntactic context in the training process of word embeddings to a Brazilian Portuguese corpus. The comparison of this model with the outcome of the previous experiments shows sound results and presents relevant complementary aspects.
Copyright: © 2019 Eduardo E. Berlitz, Denis A. Araujo, Allan B. Silva, Rodrigo R. Righi and Sandro J. Rigo. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
- 2,686 Views
- 1,345 Downloads
- 0 Citations
- Word Similarity
- Word Embeddings
- Computational Linguistics
- Natural Language Processing