This paper presents a feature extraction algorithm for partial discharge (PD) pulse separation based on the S transform (ST) time-frequency representation. First, the ST amplitude (STA) matrices of the PD pulses are compressed by a non-negative matrix factorization (NMF)-based decomposition, which yields a series of base vectors in the frequency domain and location vectors in the time domain. Then, a new group of features, comprising sharpness, sum of derivatives, sparsity, entropy, mean value and standard deviation, is extracted from the base and location vectors, and the resulting feature vectors are separated by a fuzzy C-means (FCM) clustering algorithm. Finally, the non-dominated sorting genetic algorithm II (NSGA-II) is introduced as a feature selection tool to improve the FCM clustering performance and to obtain the corresponding selected feature subsets. A total of 600 PD pulses sampled from four typical defect models are used for testing. A minimum clustering error of 7.67% is achieved with a four-dimensional optimal feature subset selected by NSGA-II when the NMF parameter r = 1. In addition, NSGA-II not only reduces the feature dimension but also dramatically improves the FCM clustering performance compared with the original extracted features. The four selected features are also examined on data recorded with two PD sources simultaneously active. The results demonstrate that it is feasible to apply the proposed algorithm to PD pulse separation.
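As a rough illustration of the decomposition and feature extraction steps summarized above, the following minimal sketch factorizes a small non-negative matrix into one frequency-domain base vector and one time-domain location vector, and then computes several descriptors from a vector. It rests on several assumptions that are not taken from the paper: r is interpreted as the factorization rank and only the case r = 1 is handled (where the standard multiplicative NMF updates reduce to the alternating updates shown); the toy matrix `V` merely stands in for an STA matrix; and the function names `rank1_nmf` and `vector_features`, as well as the specific feature definitions (Shannon entropy of the normalized vector, Hoyer sparsity, sum of absolute first differences), are hypothetical choices. Sharpness is omitted because its definition cannot be inferred from the summary.

```python
import math

def rank1_nmf(V, iters=50, eps=1e-12):
    """Approximate a non-negative matrix V (freq x time) by the outer product w * h^T.

    For rank 1, the multiplicative NMF updates simplify to the alternating
    updates below; both vectors stay non-negative when V is non-negative.
    """
    n_f, n_t = len(V), len(V[0])
    w = [1.0] * n_f  # assumed initial base vector (frequency axis)
    h = [1.0] * n_t  # assumed initial location vector (time axis)
    for _ in range(iters):
        ww = sum(x * x for x in w) + eps
        h = [sum(w[f] * V[f][t] for f in range(n_f)) / ww for t in range(n_t)]
        hh = sum(x * x for x in h) + eps
        w = [sum(V[f][t] * h[t] for t in range(n_t)) / hh for f in range(n_f)]
    return w, h

def vector_features(x):
    """Illustrative descriptors of a non-negative vector with at least two entries."""
    n = len(x)
    total = sum(x)
    mean = total / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    # Shannon entropy of the vector normalized to unit sum (assumed definition).
    if total > 0:
        entropy = -sum((v / total) * math.log(v / total) for v in x if v > 0)
    else:
        entropy = 0.0
    # Hoyer sparsity: 0 for a flat vector, 1 for a single non-zero entry (assumed definition).
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    sparsity = (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1) if l2 > 0 else 0.0
    # Sum of absolute first differences (assumed definition of "sum of derivatives").
    sum_deriv = sum(abs(x[i + 1] - x[i]) for i in range(n - 1))
    return {"mean": mean, "std": std, "entropy": entropy,
            "sparsity": sparsity, "sum_deriv": sum_deriv}

# Toy stand-in for an STA matrix: an exact outer product of two non-negative vectors.
base = [0.0, 1.0, 2.0, 1.0]
loc = [0.0, 0.0, 3.0, 1.0, 0.0]
V = [[b * l for l in loc] for b in base]

w, h = rank1_nmf(V)
print(vector_features(w))
print(vector_features(h))
```

In a full pipeline of the kind described, such descriptors from the base and location vectors of each pulse would be concatenated into one feature vector per pulse before clustering and feature selection; those stages are not sketched here.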