GP Under Streaming Data Constraints: A Case for Pareto Archiving


Abstract

Classification as applied to streaming data implies that only a small number of new training instances appear at each generation and are never explicitly reintroduced by the stream. Pareto competitive coevolution provides a potential framework for retaining useful training instances between generations in an archive of finite size. Such a coevolutionary framework is defined for the online evolution of classifiers under genetic programming. Benchmarking is performed on multi-class data sets with class imbalance and training partitions ranging from thousands to hundreds of thousands of instances. The impact of enforcing different constraints on access to the stream is investigated. The role of online adaptation is explicitly documented, and tests are made of the relative impact of label error on the quality of streaming classifier results.