Multi-objective Design of Hierarchical Consensus Functions for Clustering Ensembles Via Genetic Programming


Abstract

This paper investigates a genetic programming (GP) approach aimed at the multi-objective design of hierarchical consensus functions for clustering ensembles. By this means, data partitions obtained via different clustering techniques can be continuously refined (via selection and merging) by a population of fusion hierarchies having complementary validation indices as objective functions. To assess the potential of the novel framework in terms of efficiency and effectiveness, a series of systematic experiments, involving eleven variants of the proposed GP-based algorithm and a comparison with basic as well as advanced clustering methods (of which some are clustering ensembles and/or multi-objective in nature), have been conducted on a number of artificial, benchmark and bioinformatics datasets. Overall, the results corroborate the perspective that having fusion hierarchies operating on well-chosen subsets of data partitions is a fine strategy that may yield significant gains in terms of clustering robustness.