Feature Selection in Data-Mining for Genetics Using Genetic Algorithm

V. N. Rajavarman; S. P. Rajagopalan

doi:10.3844/jcssp.2007.723.725

Research Article Open Access

Abstract

We discovered genetic features and environmental factors that were involved in multifactorial diseases. Exploiting the massive data obtained from the experiments conducted at the General Hospital, Chennai, required data mining tools, and we proposed a two-phase approach using a specific genetic algorithm. This heuristic approach was chosen because the number of features to consider was large (up to 3654 for the biological data under study). The collected data indicated, for pairs of affected individuals of the same family, their similarity at given points (loci) of their chromosomes. The data were represented as a matrix in which each locus was a column and each pair of individuals considered was a row. The objective was first to isolate the most relevant associations of features and then to classify individuals that had the considered disease according to these associations. For the first phase, the feature selection problem, we used a genetic algorithm (GA). To deal with this very specific problem, some advanced mechanisms were introduced into the genetic algorithm, such as sharing, random immigrants, and dedicated genetic operators, and a particular distance operator was defined. The second phase, a clustering based on the features selected during the first phase, will use the k-means clustering algorithm.
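The two-phase scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the fitness function, the dedicated operators, or the distance operator, so everything below is an assumption made for illustration. In particular, the `make_fitness` score (support of the selected loci multiplied by their number) is hypothetical, uniform crossover and bit-flip mutation stand in for the dedicated genetic operators, Hamming distance stands in for the particular distance operator, and random immigrants are simply injected at every generation.

```python
import random

def hamming(a, b):
    """Number of positions where two feature masks differ (assumed distance)."""
    return sum(x != y for x, y in zip(a, b))

def shared_fitness(pop, raw, sigma):
    """Fitness sharing: divide each raw (non-negative) fitness by its niche count."""
    shared = []
    for ind, f in zip(pop, raw):
        # Each individual counts itself, so the niche count is at least 1.
        niche = sum(max(0.0, 1.0 - hamming(ind, other) / sigma) for other in pop)
        shared.append(f / niche)
    return shared

def make_fitness(rows):
    """Hypothetical fitness: support of the selected loci times their number."""
    def fitness(mask):
        chosen = [j for j, g in enumerate(mask) if g]
        if not chosen:
            return 0.0
        support = sum(all(row[j] for j in chosen) for row in rows) / len(rows)
        return support * len(chosen)
    return fitness

def select_features(n_features, fitness, pop_size=30, generations=40,
                    p_mut=0.02, n_immigrants=3, sigma=4.0, seed=0):
    """Phase 1: GA over 0/1 feature masks with sharing and random immigrants."""
    rng = random.Random(seed)

    def rand_ind():
        return [rng.randint(0, 1) for _ in range(n_features)]

    pop = [rand_ind() for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        raw = [fitness(ind) for ind in pop]
        shared = shared_fitness(pop, raw, sigma)

        def pick():
            # Binary tournament on the shared fitness.
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return pop[a] if shared[a] >= shared[b] else pop[b]

        children = []
        while len(children) < pop_size - n_immigrants:
            p1, p2 = pick(), pick()
            child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]  # uniform crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]     # bit-flip mutation
            children.append(child)
        # Random immigrants: fresh random individuals keep diversity up.
        children += [rand_ind() for _ in range(n_immigrants)]
        pop = children
        best = max(pop + [best], key=fitness)  # remember the best mask seen so far
    return best

def kmeans(points, k, iters=20, seed=0):
    """Phase 2: plain Lloyd's k-means, returning one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centers[c])))
                  for pt in points]
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

if __name__ == "__main__":
    # Synthetic stand-in for the pairs-by-loci matrix (rows: pairs, columns: loci).
    rng = random.Random(1)
    rows = [[1 if rng.random() < 0.3 else 0 for _ in range(10)] for _ in range(20)]
    for row in rows[:16]:
        row[2] = row[7] = 1  # loci 2 and 7 are shared by most pairs
    mask = select_features(10, make_fitness(rows))
    reduced = [[float(row[j]) for j, g in enumerate(mask) if g] for row in rows]
    print(mask, kmeans(reduced, 2))
```

Here each individual of the GA is a 0/1 mask over the columns (loci) of the matrix, and the clustering step only sees the columns retained by the best mask. With thousands of loci, as in the study, the pairwise distance computation in `shared_fitness` would likely need a more economical formulation than this quadratic loop.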

Journal of Computer Science

Volume 3 No. 9, 2007, 723-725

DOI: https://doi.org/10.3844/jcssp.2007.723.725

Submitted On: 6 September 2007 Published On: 30 September 2007

How to Cite: Rajavarman, V. N. & Rajagopalan, S. P. (2007). Feature Selection in Data-Mining for Genetics Using Genetic Algorithm. Journal of Computer Science, 3(9), 723-725. https://doi.org/10.3844/jcssp.2007.723.725

Copyright: © 2007 V. N. Rajavarman and S. P. Rajagopalan. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Keywords

Crossover
mutation
selection
fitness function
random immigrant
k-means algorithm