Application of Computational Intelligence for Source Code Classification

Abstract

Multi-language Source Code Management systems have been largely used to collaboratively manage software development projects. These systems represent a fundamental step in order to fully use communication enhancements by producing concrete value on the way people collaborate to produce more reliable computational systems. These systems evaluate results of analyses in order to organise and optimise source code. These analyses are strongly dependent on technologies (i.e. framework, programming language, libraries) each of them with their own characteristics and syntactic structure. To overcome such limitation, source code classification is an essential preprocessing step to identify which analyses should be evaluated. This paper introduces a new approach for generating content-based classifiers by using Evolutionary Algorithms. Experiments were performed on real world source code collected from more than 200 different open source projects. Results show us that our approach can be successfully used for creating more accurate source code classifiers. The resulting classifier is also expansible and flexible to new classification scenarios (opening perspectives for new technologies).