An Analysis of Clustering Objectives for Feature Selection Applied to Encrypted Traffic Identification


Abstract

This work explores the use of clustering objectives in a Multi-Objective Genetic Algorithm (MOGA) for both feature selection and cluster count optimization, applied to flow-based encrypted traffic identification. We first examine whether a MOGA based on clustering objectives can match the performance of a gold standard model (i.e., one based on classification objectives). We then examine whether applying a logarithmic transformation to the data before running the MOGA yields any performance gain. Results show that a MOGA trained with clustering objectives can closely reproduce the behavior of the gold standard model, not only in terms of the selected features but also in terms of detection rate and false positive rate, which are above 90% and below 1%, respectively. In contrast, no gain was observed from applying the logarithmic transformation to the data.
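
To make the two reported metrics and the preprocessing step concrete, the following is a minimal illustrative sketch, not the implementation used in this work. It assumes a binary labeling in which 1 marks the traffic class of interest, and it assumes a log(1 + x) form for the logarithmic transformation so that zero-valued flow features stay defined; the function names are hypothetical.

```python
import math


def log_transform(flows):
    """Apply log(1 + x) to every feature value of every flow.

    Assumption: feature values are non-negative (e.g. packet counts,
    byte counts, durations), so log1p is defined for all of them.
    """
    return [[math.log1p(value) for value in flow] for flow in flows]


def detection_and_false_positive_rate(y_true, y_pred, positive=1):
    """Return (detection rate, false positive rate) for binary labels.

    Detection rate      = TP / (TP + FN)
    False positive rate = FP / (FP + TN)
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_positive_rate
```

Under these assumptions, the reported result corresponds to a detection rate above 0.90 and a false positive rate below 0.01 for the class of interest.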