Journal of Computer Science

Toponym Disambiguation by Arborescent Relationships

Imene Bensalem and Mohamed-Khireddine Kholladi

DOI: 10.3844/jcssp.2010.653.659

Volume 6, Issue 6

Pages 653-659


Problem statement: A place in geographical space can be referred to formally, by its spatial coordinates, or informally, as in natural language, by a toponym (place name). A single toponym can represent several geographical places. This ambiguity makes its conversion to a unique formal representation problematic. Toponym disambiguation (TD) in text is the task of assigning a unique location to an ambiguous place name in a given textual context. Approach: Several toponym disambiguation heuristics assume a geographical proximity between the toponyms of the same context. This proximity can be expressed in terms of spatial distance or in terms of arborescent relationships, i.e., proximity in the hierarchical tree of the world's places. This study presents a new heuristic for toponym disambiguation in text, based on quantifying the arborescent proximity between toponyms. This quantification is done with a new measure of geographical correlation that we call the Geographical Density. Results: Our method was compared to state-of-the-art methods on the GeoSemCor corpus and outperformed them in terms of recall (87.4%) and coverage (99.0%). The results showed that the toponyms of the same context are much closer in terms of arborescent relationships than in terms of spatial relationships. Conclusion: We believe that quantifying the arborescent relationships between toponyms of the same textual context is a good way to improve the recall of the TD task. However, all types of arborescent relationships must be considered, not only meronymy, which is the relation most exploited in existing TD methods.
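To make the notion of arborescent proximity concrete, the following is a minimal sketch of picking a candidate location by its distance in a place hierarchy. It is an illustration only and is not the Geographical Density measure, whose definition is not given in this abstract. The toy hierarchy, the place labels and the function names (ancestors, tree_distance, disambiguate) are all assumptions made for the example.

```python
# Hypothetical toy hierarchy (child -> parent); not taken from the paper.
PARENT = {
    "World": None,
    "Europe": "World",
    "North America": "World",
    "France": "Europe",
    "United States": "North America",
    "Texas": "United States",
    "Paris (France)": "France",
    "Lyon (France)": "France",
    "Paris (Texas)": "Texas",
}


def ancestors(place):
    """Return the path from a place up to the root, the place itself included."""
    path = []
    while place is not None:
        path.append(place)
        place = PARENT[place]
    return path


def tree_distance(a, b):
    """Count the edges between two places via their lowest common ancestor."""
    depth_in_b = {node: i for i, node in enumerate(ancestors(b))}
    for i, node in enumerate(ancestors(a)):
        if node in depth_in_b:
            return i + depth_in_b[node]
    raise ValueError("places do not share a root")


def disambiguate(candidates, context):
    """Pick the candidate with the smallest total tree distance to the context places.

    This is a simple stand-in scoring rule, assumed here for illustration.
    """
    return min(candidates, key=lambda c: sum(tree_distance(c, p) for p in context))


# "Paris" mentioned next to "Lyon": the French candidate is closer in the tree.
best = disambiguate(["Paris (France)", "Paris (Texas)"], ["Lyon (France)"])
print(best)
```

In this toy setting the choice depends only on positions in the hierarchy, so no coordinates are needed. A spatial-distance heuristic would instead compare the coordinates of each candidate with those of the context places.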


© 2010 Imene Bensalem and Mohamed-Khireddine Kholladi. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.