Research Article Open Access

Integrating Correlation Clustering and Agglomerative Hierarchical Clustering for Holistic Schema Matching

Basel Alshaikhdeeb¹ and Kamsuriah Ahmad¹
  • ¹ National University of Malaysia, Malaysia

Abstract

Holistic schema matching is the process of taking several schemas as input and producing the correspondences among them. Processing a large number of schemas can be time-consuming and can yield poor-quality results. Therefore, several clustering approaches have been proposed to reduce the search space by partitioning the data into smaller portions, which facilitates the matching process. However, there is still a need to improve the partitioning mechanism by avoiding the random initial solutions (centroids) resulting from the clustering process. Such random solutions have a significant impact on the matching results. This study aims to integrate correlation clustering and agglomerative hierarchical clustering to improve the effectiveness of holistic schema matching. The proposed integrated method avoids both random initial solutions and a predefined number of centroids. Several preprocessing steps were performed using auxiliary information (a domain dictionary). The experiments were carried out on the Airfare, Auto and Book datasets from the UIUC Web Integration Repository. The proposed method was compared with the K-means and K-medoids clustering methods. As a result, the proposed method outperformed K-means and K-medoids, achieving accuracies of 0.9, 0.93 and 0.9 for Airfare, Auto and Book, respectively.
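
As an illustration of clustering without random initial centroids or a predefined number of clusters, the following minimal Python sketch groups attribute labels bottom-up and stops when no pair of clusters is similar enough. It is not the authors' method: it covers only a simple average-linkage agglomerative step, and the token-level Jaccard similarity, the `threshold` value and the sample attribute labels are all assumptions made for this example.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two attribute labels (assumed measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def average_link(c1, c2) -> float:
    """Mean pairwise similarity between the members of two clusters."""
    return sum(jaccard(x, y) for x in c1 for y in c2) / (len(c1) * len(c2))


def agglomerate(labels, threshold=0.3):
    """Merge the most similar pair of clusters until none reaches `threshold`.

    Every label starts as its own cluster, so there is no random
    initialisation, and the number of clusters follows from the threshold
    (a hypothetical value) rather than being fixed in advance.
    """
    clusters = [[label] for label in labels]
    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), average_link(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical attribute labels, loosely in the spirit of airfare and book forms.
attributes = [
    "departure city", "departure date", "return date",
    "book title", "title", "author name",
]
result = agglomerate(attributes)
for cluster in result:
    print(cluster)
```

Because the stopping rule depends only on the similarity threshold, repeated runs on the same input give the same partition, in contrast to K-means or K-medoids with random initial centroids.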

Journal of Computer Science
Volume 11 No. 3, 2015, 484-489

DOI: https://doi.org/10.3844/jcssp.2015.484.489

Submitted On: 9 April 2014 Published On: 3 April 2015

How to Cite: Alshaikhdeeb, B. & Ahmad, K. (2015). Integrating Correlation Clustering and Agglomerative Hierarchical Clustering for Holistic Schema Matching. Journal of Computer Science, 11(3), 484-489. https://doi.org/10.3844/jcssp.2015.484.489

Keywords

  • Schema Integration
  • Holistic Schema Matching
  • Correlation Clustering
  • Agglomerative Hierarchical Clustering