Feature selection optimization with filtering and wrapper methods: two disease classification cases

Authors: SERHAT ATİK, TUĞBA DALYAN

Abstract: Discarding less informative and redundant features reduces the time needed to train a learning algorithm and the storage it requires, and it can improve both learning accuracy and the quality of the results. In this study, we present different feature selection approaches to the problem of disease classification on the Parkinson and Cardiac Arrhythmia datasets. First, we apply three filtering algorithms: the Pearson correlation coefficient, the Spearman correlation coefficient, and Relief. Second, we compare metaheuristic algorithms to find the most informative subset of features and thereby obtain better classification accuracy. Finally, we apply a hybrid model in which filtering algorithms first eliminate half of the features and a metaheuristic algorithm based on a proposed genetic algorithm is then applied to the remaining features. With all three methods, we use three classification algorithms: support vector machine, k-nearest neighbors, and random forest. The results show that the best scores for both datasets are obtained with the metaheuristic algorithm based on the proposed genetic algorithm. This comparative study contributes to the literature by increasing classification accuracy on both datasets and by presenting a hybrid model that combines filtering with a metaheuristic algorithm.

Keywords: Feature selection, optimization algorithms, metaheuristic algorithms, genetic algorithms, filtering methods
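
The hybrid approach summarized in the abstract (a filter that discards half of the features, followed by a genetic search over the rest) can be illustrated with a minimal sketch. This is not the authors' proposed genetic algorithm. The function names, the GA settings (population size, mutation rate, single-point crossover, elitism), the leave-one-out 1-nearest-neighbor fitness, and the synthetic data are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def filter_top_half(X, y):
    """Filter step: keep the half of the features with the largest |r| to the label."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[: max(1, n_features // 2)])


def loo_1nn_accuracy(X, y, features):
    """Wrapper fitness (assumed): leave-one-out accuracy of a 1-NN classifier."""
    if not features:
        return 0.0
    correct = 0
    for i, xi in enumerate(X):
        best_j, best_d = None, float("inf")
        for j, xj in enumerate(X):
            if j == i:
                continue
            d = sum((xi[f] - xj[f]) ** 2 for f in features)
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)


def genetic_select(X, y, candidates, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """Generic GA over bit masks of `candidates`; all settings are illustrative."""
    rng = random.Random(seed)

    def decode(mask):
        return [f for f, bit in zip(candidates, mask) if bit]

    def fitness(mask):
        return loo_1nn_accuracy(X, y, decode(mask))

    pop = [[rng.randint(0, 1) for _ in candidates] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]      # truncation selection
        children = [scored[0][:]]              # elitism: carry over the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(candidates)) if len(candidates) > 1 else 0
            child = a[:cut] + b[cut:]          # single-point crossover
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return decode(best), fitness(best)


def make_toy_data(n=40, n_features=8, seed=1):
    """Synthetic two-class data: only features 0 and 1 carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 3.0 * label
        row[1] -= 3.0 * label
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_data()
kept = filter_top_half(X, y)                    # filter stage: half of the features
selected, accuracy = genetic_select(X, y, kept) # wrapper stage on the remainder
print("kept by filter:", kept)
print("selected by GA:", selected, "LOO accuracy:", round(accuracy, 3))
```

In practice the fitness function would be replaced by cross-validated accuracy of the classifier under study, and the filter score could be swapped for Spearman correlation or Relief without changing the structure of the sketch.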
