### Algorithms to Identify Pareto Points in Multi-Dimensional Data Sets

Abstract

The focus of this research is on developing a fast, efficient
hybrid algorithm to identify the Pareto frontier in multi-dimensional
data sets. The hybrid algorithm is a blend of two different base
algorithms, the Simple Cull (SC) algorithm that has a low overhead
but is of overall high computational complexity, and the Divide &
Conquer (DC) algorithm that has a lower computational complexity
but has a high overhead. The hybrid algorithm employs aspects of
each of the two based algorithms, adapting in response to the
properties of the data.
Each of the two base algorithms perform better for different classes
of data, with the SC algorithm performing best for data sets with
few nondominated points, high dimensionality, or fewer total numbers
of points, while the DC algorithm performs better otherwise. The
general approach to the hybrid algorithm is to execute the following
steps in order:
1. Execute one pass of the SC algorithm through the data if merited
2. Execute the DC algorithm, which recursively splits the data into
smaller problem sizes
3. Switch to the SC algorithm for problem sizes below a certain limit
In order to determine whether Step 1 should be executed, and to
determine at what problem size the switch in Step 3 should be made,
estimates of both algorithms' run times as a function of the size
of the data set, the dimension of the data set, and the expected
number of nondominated points are needed. These are developed in
the thesis.
To aid in increasing the speed and reducing the computational and
storage complexity of the algorithm, and to enable the ability for
the algorithm to adapt to the data, a canonical transformation of
the data to a Lattice Latin Hypercube (LLH) form is developed. The
transformation preserves the Pareto property of points but reduces
storage spage and algorithm run time.
In order to test the three algorithms, three different methods for
creating randomized data sets with arbitrary dimensionality and
numbers of nondominated points are developed. Each of the methods
provides insight into the properties of nondominated sets, along
with providing test cases for the algorithms. Additionally, a
spacecraft design problem is developed to serve as a source of
reald world test data.