A Multi-objective Genetic Programming Approach to Uncover Explicit and Implicit Equations from Data


Abstract

Identification of implicit and explicit relationships in a data is a generic problem commonly encountered in many fields of science and engineering. In the case of explicit relations, one is interested in identifying a compact and an accurate predictor function i.e. y = f(x), while in the implicit case, one is interested in identifying an equation of the form f(x) = 0. In both these classes of problems, one would need to search through a space of mathematical expressions, while minimizing some form of error metric. Such expressions are commonly identified using genetic programming (GP). While methods to uncover explicit equations have been studied extensively in the literature, there have been limited attempts to solve implicit cases. Since there are infinite trivial implicit forms that can be generated from a given set of data, the choice of an appropriate error metric is critical in the context of implicit equation mining. In this paper, we introduce a multiobjective genetic programming approach (MOGPA) for the solution of both classes of problems. The maximum depth of a GP-tree is used as the first objective reflecting the complexity/compactness of the expressions, while the mean error, either in the predictor variable or the implicit derivatives is used as the second objective during the course of search. The performance of the approach is illustrated using four examples. The approach delivers expressions of various complexities spanning a range of accuracy levels in a single run, unlike single objective GP formulations. It was able to identify more compact and accurate explicit forms than those from previously reported studies, and the correct, most compact expressions for implicit cases.