Facing-up Challenges of Multiobjective Clustering Based on Evolutionary Algorithms: Representations, Scalability and Retrieval Solutions


Abstract

The era in which we live can be considered the Information Age because it is characterized by a technological revolution centered on digital information and communication technologies. Large amounts of information are collected every day, and this information is the cornerstone of modern society. However, information is not useful unless it is properly managed and transformed into wisdom through the extraction of understandable knowledge. Data Mining is the process of automatically extracting and discovering new, useful and understandable knowledge from huge volumes of data. It allows experts to approach the problems of a specific domain better and to obtain wisdom, for example in melanoma detection or in managing energy demand more efficiently. Data Mining involves four kinds of techniques, one of which is clustering. Clustering groups data according to a set of criteria, summarized in a single objective, to obtain groups whose elements are similar to one another and different from the elements of the other clusters. These groupings provide a possible classification or categorization of the elements. Experts can obtain wisdom only if they properly understand this categorization; for this reason, it is necessary to obtain understandable patterns. Thus, experts may need to define several criteria to be optimized in the clustering process, and the characteristics of these criteria may prevent them from being summarized in a single objective. Conventional clustering algorithms are not useful when more than one objective has to be optimized, so other kinds of methods must be applied. This thesis focuses on multiobjective clustering algorithms, which optimize several objectives simultaneously and return a collection of potential solutions with different trade-offs among the objectives.
Specifically, the goal of the thesis is to design and implement a new multiobjective clustering technique based on evolutionary algorithms that faces three current challenges related to this kind of technique. The first challenge is to successfully define the area of possible solutions that is explored in order to find the best solution, which depends on the knowledge representation. The second challenge is to scale up the system by splitting the original data set into several data subsets so that the clustering process works with less data. The third challenge is to retrieve the most suitable solution, according to the quality and shape of the clusters, from the most interesting region of the collection of solutions returned by the multiobjective clustering algorithm. All the contributions related to these challenges are integrated in a framework called CAOS and successfully tested on a wide range of artificial and real-world data sets.