Multiobjective Binary Biogeography Based Optimization for Feature Selection Using Gene Expression Data


Abstract

Gene expression data play an important role in the development of efficient cancer diagnosis and classification. However, gene expression data are usually redundant and noisy, and only a subset of genes presents distinct profiles for different classes of samples. Thus, selecting highly discriminative genes from gene expression data has attracted increasing interest in the field of bioinformatics. In this paper, a multi-objective biogeography based optimization method is proposed to select a small subset of informative genes relevant to classification. In the proposed algorithm, firstly, the Fisher-Markov selector is used to choose the top 60 genes. Secondly, to make biogeography based optimization suitable for discrete problems, binary biogeography based optimization, called BBBO, is proposed based on a binary migration model and a binary mutation model. Then, multi-objective binary biogeography based optimization, which we call MOBBBO, is proposed by integrating the non-dominated sorting method and the crowding distance method into the BBBO framework. Finally, the MOBBBO method is used for gene selection, and a support vector machine (SVM) is used as the classifier, evaluated with leave-one-out cross-validation (LOOCV). To show the effectiveness and efficiency of the algorithm, the proposed algorithm is tested on ten benchmark gene expression datasets. Experimental results demonstrate that, in terms of the quality of the solutions obtained, the proposed method is better than, or at least comparable to, the particle swarm optimization (PSO) and SVM methods previously reported in the literature.
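To illustrate the two ranking components that MOBBBO integrates into BBBO, the following is a minimal sketch of non-dominated sorting and crowding distance in their standard form. It is not the authors' implementation. It assumes, purely for illustration, that each candidate gene subset is scored by two objectives to be minimized (classification error and number of selected genes); these objectives, and the function names `dominates`, `non_dominated_sort`, and `crowding_distance`, are hypothetical and are not taken from the abstract.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(points: Sequence[Sequence[float]]) -> List[List[int]]:
    """Split candidate indices into successive Pareto fronts.
    Front 0 holds the candidates that no other candidate dominates."""
    remaining = set(range(len(points)))
    fronts: List[List[int]] = []
    while remaining:
        # A candidate belongs to the current front if nothing left dominates it.
        front = sorted(
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


def crowding_distance(points: Sequence[Sequence[float]],
                      front: Sequence[int]) -> Dict[int, float]:
    """Crowding distance of each candidate within one front.
    Boundary candidates get infinity so they are always preferred."""
    dist = {i: 0.0 for i in front}
    n_obj = len(points[front[0]])
    for m in range(n_obj):
        ordered = sorted(front, key=lambda i: points[i][m])
        lo, hi = points[ordered[0]][m], points[ordered[-1]][m]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue  # all values equal in this objective: no spread to measure
        for k in range(1, len(ordered) - 1):
            gap = points[ordered[k + 1]][m] - points[ordered[k - 1]][m]
            dist[ordered[k]] += gap / (hi - lo)
    return dist


# Hypothetical scores: (classification error, number of selected genes).
scores = [(0.10, 5), (0.05, 8), (0.20, 3), (0.10, 9), (0.25, 6)]
fronts = non_dominated_sort(scores)
first_front_distances = crowding_distance(scores, fronts[0])
```

In a scheme of this kind, candidates are typically ranked first by front index and then, within a front, by larger crowding distance, which favors a well-spread set of trade-offs between accuracy and subset size.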