Research Article Open Access

Optimizing Feature Construction Process for Dynamic Aggregation of Relational Attributes

Rayner Alfred
Journal of Computer Science
Volume 5 No. 11, 2009, 864-877

DOI: https://doi.org/10.3844/jcssp.2009.864.877

Submitted On: 13 February 2009
Published On: 30 November 2009

How to Cite: Alfred, R. (2009). Optimizing Feature Construction Process for Dynamic Aggregation of Relational Attributes. Journal of Computer Science, 5(11), 864-877. https://doi.org/10.3844/jcssp.2009.864.877

Abstract

Problem statement: The importance of input representation is well recognized in machine learning. Feature construction is one of the methods used to generate relevant features for the learning data. This study addressed the question of whether the descriptive accuracy of the data summarization method called Dynamic Aggregation of Relational Attributes (DARA) benefits from the feature construction process. Specifically, it discusses the application of a genetic algorithm to optimize the feature construction process that generates the input data for DARA. Approach: The DARA algorithm was designed to summarize data stored in non-target tables by clustering them into groups, where multiple records stored in non-target tables correspond to a single record stored in a target table. Here, feature construction methods were applied to improve the descriptive accuracy of the DARA algorithm. The task therefore involved constructing a relevant set of features for the DARA algorithm by using a genetic-based algorithm. Results: The experimental results showed that the quality of the summarized data is directly influenced by the methods used to create the patterns that represent records in the (n×p) TF-IDF weighted frequency matrix. The evaluation of the genetic-based feature construction algorithm showed that the data summarization results can be improved by constructing features with the Cluster Entropy (CE) genetic-based feature construction algorithm. Conclusion: This study showed that constructing features with the CE genetic-based feature construction algorithm can improve the data summarization results of DARA.
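
The two quantities named in the abstract, the (n×p) TF-IDF weighted frequency matrix and cluster entropy, can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes that each of the n target records is represented by a bag of pattern tokens, that the weight is the raw term frequency multiplied by log(n/df), and that cluster entropy is the size-weighted average of the class-label entropy within each cluster. The function names `tf_idf_matrix` and `cluster_entropy` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tf_idf_matrix(bags):
    """Build an n x p TF-IDF matrix from n bags of pattern tokens.

    Assumption: weight = term frequency * log(n / document frequency).
    """
    vocab = sorted({token for bag in bags for token in bag})
    n = len(bags)
    # Document frequency: number of bags in which each pattern occurs.
    df = {token: sum(1 for bag in bags if token in bag) for token in vocab}
    matrix = []
    for bag in bags:
        counts = Counter(bag)
        matrix.append([counts[t] * math.log(n / df[t]) for t in vocab])
    return vocab, matrix


def cluster_entropy(cluster_ids, class_labels):
    """Size-weighted average of the class-label entropy within each cluster.

    Assumption: this is the standard definition; lower values mean purer clusters.
    """
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, class_labels):
        members[cluster].append(label)
    total = len(class_labels)
    entropy = 0.0
    for labels in members.values():
        size = len(labels)
        h = -sum((k / size) * math.log2(k / size)
                 for k in Counter(labels).values())
        entropy += (size / total) * h
    return entropy
```

Under these assumptions, a genetic search over candidate constructed features could use `cluster_entropy` as a fitness score to minimize, since a lower value indicates clusters that agree more closely with the class labels of the target records.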

Keywords

  • Feature construction
  • Feature transformation
  • Data summarization
  • Genetic algorithm
  • Clustering