Research Article Open Access

Proposing the new Algorithm and Technique Development for Integrating Web Table Extraction and Building a Mashup

Rudy A.G. Gultom, Riri Fitri Sari and Bagio Budiardjo

Abstract

Problem statement: Many kinds of data can now be extracted from web tables on the Internet, although not all web tables are relevant to users. Most web pages are written in unstructured HTML, which makes web table extraction time consuming and costly, because HTML describes only presentation and is not organized like a database. Users therefore need a tool that supports the extraction process. Approach: This research proposes an approach for extracting web tables and building a Mashup from HTML web pages using the Xtractorz application. It also discusses how to integrate web table extraction into the stages of building a Mashup, i.e., Data Retrieval, Data Source Modeling, Data Cleaning/Filtering, Data Integration and Data Visualization. The main issue lies in the data modeling stage, in which Xtractorz must automatically render a Document Object Model (DOM) tree that corresponds to the HTML tags of the web page from which the table is extracted. To address this, Xtractorz is equipped with an algorithm and rules that allow it to analyze the HTML tags and extract the data into a new table format. The algorithm uses a recursive technique and is accessed through the user-friendly GUI of Xtractorz. Results: The approach was evaluated in an experiment using Xtractorz and other similar applications, such as RoboMaker and Karma. The experiment showed that Xtractorz is more efficient in completing the experiment tasks, since it requires fewer steps to complete them. Conclusion: Xtractorz contributes an algorithmic technique and a new approach to web table extraction and Mashup building, in which the core algorithm extracts web data tables recursively while rendering the DOM tree model automatically.
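The abstract does not give the Xtractorz algorithm itself, so the following is only a minimal sketch of the general idea it describes: build a DOM-like tree from the HTML tags, then walk it recursively and copy each table into a new row/cell structure. All names here (Node, TreeBuilder, extract_tables, collect_rows, cell_text, tables_from_html) are hypothetical, the sketch relies only on Python's standard html.parser module, and it assumes reasonably well-formed HTML.

```python
from html.parser import HTMLParser


class Node:
    """A minimal DOM-like node (hypothetical, not the Xtractorz model)."""

    def __init__(self, tag, parent=None, text=""):
        self.tag = tag          # element name, or "#text" for text nodes
        self.parent = parent
        self.text = text        # only used by "#text" nodes
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simple tree; assumes non-void tags are explicitly closed."""

    VOID = {"br", "hr", "img", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in self.VOID:
            self.current = node  # descend into the new element

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag and close it.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(
                Node("#text", self.current, data.strip()))


def cell_text(node):
    """Recursively join the text of a node and all of its descendants."""
    if node.tag == "#text":
        return node.text
    return " ".join(filter(None, (cell_text(c) for c in node.children)))


def collect_rows(node):
    """Recursively find <tr> rows below a table (through thead/tbody)."""
    rows = []
    for child in node.children:
        if child.tag == "tr":
            rows.append([cell_text(c) for c in child.children
                         if c.tag in ("td", "th")])
        else:
            rows.extend(collect_rows(child))
    return rows


def extract_tables(node, tables=None):
    """Recursively walk the tree; each <table> becomes a list of rows."""
    if tables is None:
        tables = []
    if node.tag == "table":
        tables.append(collect_rows(node))
    else:
        for child in node.children:
            extract_tables(child, tables)
    return tables


def tables_from_html(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return extract_tables(builder.root)


sample = """
<html><body><h1>Prices</h1>
<table>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Book</td><td><b>12</b> USD</td></tr>
</table>
</body></html>
"""
print(tables_from_html(sample))
```

In this sketch a table nested inside a cell is flattened into that cell's text rather than extracted separately; a real tool would need explicit rules for nested tables, row and column spans, and malformed markup.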

Journal of Computer Science
Volume 7 No. 2, 2011, 129-142

DOI: https://doi.org/10.3844/jcssp.2011.129.142

Submitted On: 2 September 2010 Published On: 25 February 2011

How to Cite: Gultom, R. A., Sari, R. F. & Budiardjo, B. (2011). Proposing the new Algorithm and Technique Development for Integrating Web Table Extraction and Building a Mashup. Journal of Computer Science, 7(2), 129-142. https://doi.org/10.3844/jcssp.2011.129.142

Keywords

  • Web table extraction
  • Mashup stages
  • Recursive algorithm
  • Document Object Model (DOM)
  • HTML format
  • Integrated Development Environment (IDE)
  • Data integration