Distributional Models with Syntactic Contexts for the Measurement of Word Similarity in Brazilian Portuguese

Eduardo E. Berlitz; Denis A. Araujo; Allan B. Silva; Rodrigo R. Righi; Sandro J. Rigo

doi:10.3844/jcssp.2019.1378.1389

Research Article Open Access

Eduardo E. Berlitz¹, Denis A. Araujo¹, Allan B. Silva¹, Rodrigo R. Righi¹ and Sandro J. Rigo¹

¹ University of Vale do Rio dos Sinos (UNISINOS), Brazil

Abstract

Word similarity provides significant support for tasks in natural language processing. Several works use lexical resources such as WordNet for semantic similarity and synonym identification. Nevertheless, out-of-vocabulary words and missing links between senses are known problems of this approach. Distributional proposals such as word embeddings have been used successfully to address these problems, but their lack of contextual information can prevent even better results. Distributional models that include contextual information can bring advantages to this area, but they remain scarcely explored. Therefore, this work studies the advantages of incorporating syntactic information into distributional models, aiming at better results in semantic similarity approaches. For that purpose, it explores existing lexical and distributional techniques for measuring word similarity in Brazilian Portuguese. Experiments were carried out with the lexical database WordNet, using different techniques over a standard dataset. The results indicate that word embeddings can cover out-of-vocabulary words and achieve better results than lexical approaches. The main contribution of this article is a new approach that applies syntactic context in the training of word embeddings on a Brazilian Portuguese corpus. The comparison of this model with the outcome of the previous experiments shows sound results and presents relevant complementary aspects.
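
The abstract does not specify how the syntactic contexts are built, so the following is only an illustrative sketch of the general idea, not the authors' method. It contrasts linear window contexts with dependency-based contexts on a single sentence. The sentence, its hand-written parse, the label names, and the helper functions `window_contexts` and `syntactic_contexts` are all assumptions introduced for illustration.

```python
from typing import List, Tuple

# Hypothetical, hand-annotated parse of "o gato preto come peixe".
# Each token is (word, index of its head, dependency label); -1 marks the root.
Token = Tuple[str, int, str]
SENTENCE: List[Token] = [
    ("o", 1, "det"),
    ("gato", 3, "nsubj"),
    ("preto", 1, "amod"),
    ("come", -1, "root"),
    ("peixe", 3, "obj"),
]


def window_contexts(tokens: List[Token], size: int = 1) -> List[Tuple[str, str]]:
    """Pair each word with its neighbours inside a linear window."""
    words = [word for word, _, _ in tokens]
    pairs = []
    for i, word in enumerate(words):
        start, end = max(0, i - size), min(len(words), i + size + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((word, words[j]))
    return pairs


def syntactic_contexts(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Pair each word with the words it is syntactically linked to.

    The head sees "dependent/label"; the dependent sees "head/label-1",
    where the "-1" suffix marks the inverse direction of the relation.
    """
    pairs = []
    for word, head, label in tokens:
        if head < 0:  # the root has no head, so it adds no pair of its own
            continue
        head_word = tokens[head][0]
        pairs.append((head_word, f"{word}/{label}"))
        pairs.append((word, f"{head_word}/{label}-1"))
    return pairs


if __name__ == "__main__":
    print(window_contexts(SENTENCE, size=1))
    print(syntactic_contexts(SENTENCE))
```

With a window of size 1, "come" never sees its subject "gato" because the adjective "preto" sits between them, whereas the dependency-based pairs link the two words directly. Pairs of this kind could then be fed to an embedding trainer that accepts arbitrary (word, context) input.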

Journal of Computer Science

Volume 15 No. 10, 2019, 1378-1389

DOI: https://doi.org/10.3844/jcssp.2019.1378.1389

Submitted On: 23 April 2019 Published On: 31 May 2019

How to Cite: Berlitz, E. E., Araujo, D. A., Silva, A. B., Righi, R. R. & Rigo, S. J. (2019). Distributional Models with Syntactic Contexts for the Measurement of Word Similarity in Brazilian Portuguese. Journal of Computer Science, 15(10), 1378-1389. https://doi.org/10.3844/jcssp.2019.1378.1389

Copyright: © 2019 Eduardo E. Berlitz, Denis A. Araujo, Allan B. Silva, Rodrigo R. Righi and Sandro J. Rigo. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Keywords

Word Similarity
WordNet
Word Embeddings
Computational Linguistics
Natural Language Processing