Fuzzy Rules for Describing Subgroups from Influenza A Virus Using a Multi-Objective Evolutionary Algorithm


Abstract

Extracting biologically meaningful knowledge is one of the important and challenging tasks in bioinformatics, in particular in the computational analysis of DNA and protein sequences, where the aim is to identify the biological functions and behaviour of newly extracted sequences. Computational intelligence techniques, in combination with sequence-derived features, have been applied to this problem to help classify the sequences into functional classes. To study this problem, subgroup discovery algorithms are applied together with a signal-processing-based feature extraction method in which the sequences are represented as signals. The applicability of this method has been studied on Neuraminidase genes from four Influenza A subtypes: H1N1, H2N2, H3N2 and H5N1. The results yielded not only higher predictive accuracy across these four protein classes but also an interpretable rule-based representation of the descriptive model, with a significantly reduced feature set derived by means of the signal processing method. Subgroup discovery techniques based on evolutionary fuzzy systems are expected to open new areas of research in bioinformatics and to further help identify and understand more focused therapeutic protein targets.