Simultaneous informative gene selection and clustering through multiobjective optimization


Abstract

Clustering methods are used for unsupervised classification of tumor subclasses in microarray gene expression data sets, which are organized so that rows represent tumor samples and columns represent genes. Clustering algorithms can be very sensitive to the set of features (genes) considered in the clustering process, so it is important to select the informative and relevant genes to be used for clustering. In this article, a multiobjective genetic algorithm based technique is proposed for performing gene selection and fuzzy clustering simultaneously. A novel encoding technique is developed for this purpose, and the algorithm searches for the best cluster centers while minimizing the number of selected genes. The number of clusters is evolved automatically. The performance of the proposed technique is illustrated on an artificial data set and compared with that of several related feature selection/clustering approaches. Moreover, its performance is demonstrated on two real-life multi-class gene expression data sets, viz., the Brain tumor and Lung tumor data sets.
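To make the idea of evaluating cluster centers and a gene subset together more concrete, the following is a minimal Python sketch. It is not the encoding or the objective functions of the proposed technique, which the abstract does not specify. It assumes a hypothetical candidate solution made of a variable-length list of cluster centers plus a binary gene mask, and it uses the standard fuzzy c-means membership formula and compactness measure J_m as a stand-in clustering objective. All function and variable names are illustrative.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def masked_sq_dist(x: Vector, c: Vector, mask: Sequence[int]) -> float:
    """Squared Euclidean distance computed only over genes with mask == 1."""
    return sum(g * (a - b) ** 2 for a, b, g in zip(x, c, mask))


def fuzzy_memberships(data: Sequence[Vector], centers: Sequence[Vector],
                      mask: Sequence[int], m: float = 2.0) -> List[List[float]]:
    """Standard fuzzy c-means memberships u[k][i] of sample i in cluster k."""
    eps = 1e-12
    u = [[0.0] * len(data) for _ in centers]
    for i, x in enumerate(data):
        d = [masked_sq_dist(x, c, mask) for c in centers]
        coincident = [k for k, dk in enumerate(d) if dk < eps]
        if coincident:
            # Sample sits on one or more centers: share full membership among them.
            for k in coincident:
                u[k][i] = 1.0 / len(coincident)
            continue
        inv = [dk ** (-1.0 / (m - 1.0)) for dk in d]
        total = sum(inv)
        for k in range(len(centers)):
            u[k][i] = inv[k] / total
    return u


def objectives(data: Sequence[Vector], centers: Sequence[Vector],
               mask: Sequence[int], m: float = 2.0) -> Tuple[float, int]:
    """Return (J_m compactness, number of selected genes); both are minimized.

    Assumed objectives for illustration only.
    """
    u = fuzzy_memberships(data, centers, mask, m)
    jm = sum(u[k][i] ** m * masked_sq_dist(data[i], centers[k], mask)
             for k in range(len(centers)) for i in range(len(data)))
    return jm, sum(mask)


# Toy data: rows are samples, columns are genes; the third gene is pure noise.
data = [[0.0, 0.0, 5.0], [0.0, 1.0, -5.0], [10.0, 10.0, 5.0], [10.0, 11.0, -5.0]]
centers = [[0.0, 0.5, 0.0], [10.0, 10.5, 0.0]]  # the list length is the cluster count
all_genes = objectives(data, centers, [1, 1, 1])
informative_only = objectives(data, centers, [1, 1, 0])
```

In this sketch, a multiobjective genetic algorithm would treat the two returned values as competing objectives and vary both the centers (including how many there are) and the mask. In the toy data, dropping the noisy third gene lowers both values.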