### Scientific Discovery using Genetic Programming

Abstract

Genetic Programming is capable of automatically inducing symbolic computer
programs on the basis of a set of examples or their performance in a simulation.
Mathematical expressions are a well-defined subset of symbolic computer programs and
are also suitable for optimization using the genetic programming paradigm. The
induction of mathematical expressions based on data is called symbolic regression.
In this work, genetic programming is extended not just to fit the data (that is, to
get the numbers right) but also to get the dimensions right. Units of measurement
are used for this purpose. The main contribution of this work can be summarized as:
The symbolic expressions produced by genetic programming can be
made suitable for analysis and interpretation by using units of
measurement to guide or restrict the search.
To achieve this, the following has been accomplished:
- A standard genetic programming system is modified so that it can induce
  expressions that more or less abide by type constraints. This system is used to
  implement a preferential bias towards dimensionally correct solutions.
- A novel genetic programming system is introduced that is able to induce
  expressions in languages that need context-sensitive constraints. It is
  demonstrated that this system can be used to implement a declarative bias towards
  1. the exclusion of certain syntactical constructs;
  2. the induction of expressions that use units of measurement;
  3. the induction of expressions that use matrix algebra;
  4. the induction of expressions that are numerically stable and correct.
- A case study using four real-world problems in the induction of dimensionally
  correct empirical equations on data using the two different methods is
  presented to illustrate the use and limitations of these methods in a framework
  of scientific discovery.
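
To make the distinction between a preferential and a declarative bias concrete, the following is a minimal Python sketch, not the system described in this work. It assumes a hypothetical encoding in which units are exponent vectors over (length, mass, time), expressions are nested tuples, and numeric constants are dimensionless. The names `infer_units`, `penalized_error`, and `is_dimensionally_correct` are illustrative only. A preferential bias is modelled here as a penalty added to the fitting error, while a declarative bias is modelled as a hard accept/reject test.

```python
from typing import Dict, Tuple, Union

# Hypothetical encoding: exponents of (length, mass, time).
Units = Tuple[int, ...]
# An expression is a variable name, a numeric constant, or (op, left, right).
Expr = Union[str, float, tuple]

DIMENSIONLESS: Units = (0, 0, 0)


def infer_units(expr: Expr, var_units: Dict[str, Units]) -> Tuple[Units, int]:
    """Return (units of expr, number of dimensional violations inside it)."""
    if isinstance(expr, str):
        return var_units[expr], 0
    if isinstance(expr, (int, float)):
        # Assumption: constants carry no units.
        return DIMENSIONLESS, 0
    op, left, right = expr
    left_units, left_viol = infer_units(left, var_units)
    right_units, right_viol = infer_units(right, var_units)
    violations = left_viol + right_viol
    if op in ("+", "-"):
        # Adding or subtracting quantities with different units is a violation;
        # the left operand's units are propagated so checking can continue.
        if left_units != right_units:
            violations += 1
        return left_units, violations
    if op == "*":
        return tuple(a + b for a, b in zip(left_units, right_units)), violations
    if op == "/":
        return tuple(a - b for a, b in zip(left_units, right_units)), violations
    raise ValueError(f"unknown operator: {op!r}")


def penalized_error(fit_error: float, expr: Expr, var_units: Dict[str, Units],
                    target_units: Units, weight: float = 1.0) -> float:
    """Preferential bias: dimensional violations are penalized, not forbidden."""
    units, violations = infer_units(expr, var_units)
    if units != target_units:
        violations += 1  # the output units do not match the target
    return fit_error + weight * violations


def is_dimensionally_correct(expr: Expr, var_units: Dict[str, Units],
                             target_units: Units) -> bool:
    """Declarative bias: only dimensionally correct expressions are admitted."""
    units, violations = infer_units(expr, var_units)
    return violations == 0 and units == target_units


# Usage: a distance d (length) and a time t, with velocity as the target.
var_units = {"d": (1, 0, 0), "t": (0, 0, 1)}
velocity = (1, 0, -1)
good = ("/", "d", "t")   # d / t has units of velocity
bad = ("+", "d", "t")    # d + t mixes length and time
```

With these definitions, `good` passes the hard test and incurs no penalty, whereas `bad` is rejected by the hard test but can still compete under the penalty scheme if it fits the data well enough. That trade-off between guiding and restricting the search is the one the two approaches above address.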