Algorithms to Identify Pareto Points in Multi-Dimensional Data Sets


Abstract

The focus of this research is on developing a fast, efficient hybrid algorithm to identify the Pareto frontier in multi-dimensional data sets. The hybrid algorithm is a blend of two different base algorithms, the Simple Cull (SC) algorithm that has a low overhead but is of overall high computational complexity, and the Divide & Conquer (DC) algorithm that has a lower computational complexity but has a high overhead. The hybrid algorithm employs aspects of each of the two based algorithms, adapting in response to the properties of the data. Each of the two base algorithms perform better for different classes of data, with the SC algorithm performing best for data sets with few nondominated points, high dimensionality, or fewer total numbers of points, while the DC algorithm performs better otherwise. The general approach to the hybrid algorithm is to execute the following steps in order: 1. Execute one pass of the SC algorithm through the data if merited 2. Execute the DC algorithm, which recursively splits the data into smaller problem sizes 3. Switch to the SC algorithm for problem sizes below a certain limit In order to determine whether Step 1 should be executed, and to determine at what problem size the switch in Step 3 should be made, estimates of both algorithms' run times as a function of the size of the data set, the dimension of the data set, and the expected number of nondominated points are needed. These are developed in the thesis. To aid in increasing the speed and reducing the computational and storage complexity of the algorithm, and to enable the ability for the algorithm to adapt to the data, a canonical transformation of the data to a Lattice Latin Hypercube (LLH) form is developed. The transformation preserves the Pareto property of points but reduces storage spage and algorithm run time. In order to test the three algorithms, three different methods for creating randomized data sets with arbitrary dimensionality and numbers of nondominated points are developed. Each of the methods provides insight into the properties of nondominated sets, along with providing test cases for the algorithms. Additionally, a spacecraft design problem is developed to serve as a source of reald world test data.