Comparison of AIS and fuzzy c-means clustering methods on the classification of breast cancer and diabetes datasets

Authors: SERAL ÖZŞEN, RAHİME CEYLAN

Abstract: Data reduction is an indispensable part of many pattern classification processes. When the number of samples is excessive, sample reduction (data reduction) algorithms can be used to shorten processing time and keep the subsequent results reliable. Many methods have been used for data reduction; fuzzy c-means, a widely used clustering algorithm, is one of them. In this study, we applied a different clustering algorithm, an artificial immune system (AIS), to the data reduction process. We carried out performance evaluation experiments on the standard Chainlink and Iris datasets, while the main application used the Wisconsin Breast Cancer and Pima Indian Diabetes datasets, taken from the University of California, Irvine Machine Learning Repository. For these datasets, the performance of the AIS in data reduction was compared with that of the fuzzy c-means clustering algorithm, with a multilayer perceptron artificial neural network used as the classifier after data reduction. With the AIS, the maximum classification accuracies were 73.96% for the Pima Indian Diabetes dataset and 97.80% for the Wisconsin Breast Cancer dataset, and the corresponding compression rates were 80% and 40%. With fuzzy c-means clustering, by contrast, the maximum accuracies were 63% and 86.69% for the Pima Indian Diabetes and Wisconsin Breast Cancer datasets, respectively, and the corresponding compression rates were 90% and 70%. Averaged over the tested compression rates, the AIS reached a mean classification accuracy of 70.07% for the Pima Indian Diabetes dataset, whereas fuzzy c-means obtained 47.64%. For the Wisconsin Breast Cancer dataset, the mean classification accuracies of the AIS and fuzzy c-means were 94.90% and 75.43%, respectively.
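The fuzzy c-means baseline for data reduction can be illustrated with a minimal sketch. It is not the authors' implementation, and several points are assumptions that the abstract does not state: a compression rate of r is taken to mean that a fraction r of the samples is removed, clustering is run separately within each class so that every cluster center inherits a class label, and the fuzzifier is set to m = 2. The function names `fuzzy_c_means` and `reduce_by_class` are hypothetical, the data are synthetic, and the multilayer perceptron that would then be trained on the reduced set is omitted.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns (centers, memberships)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial memberships; each row is normalized to sum to 1.
    U = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        U.append([u / total for u in row])
    centers = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by membership**m.
        centers = []
        for j in range(n_clusters):
            w = [U[i][j] ** m for i in range(len(points))]
            wsum = sum(w)
            centers.append(
                [sum(wi * p[d] for wi, p in zip(w, points)) / wsum
                 for d in range(dim)]
            )
        # Membership update: u_ij is proportional to d_ij ** (-2 / (m - 1)).
        shift = 0.0
        for i, p in enumerate(points):
            inv = [max(math.dist(p, c), 1e-12) ** (-2.0 / (m - 1.0))
                   for c in centers]
            total = sum(inv)
            new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, U[i])))
            U[i] = new_row
        if shift < tol:  # memberships have stopped changing
            break
    return centers, U


def reduce_by_class(points, labels, compression_rate, **kwargs):
    """Replace each class by cluster centers (assumed reduction scheme)."""
    keep = 1.0 - compression_rate  # assumed: fraction of samples retained
    reduced_points, reduced_labels = [], []
    for label in sorted(set(labels)):
        members = [p for p, l in zip(points, labels) if l == label]
        k = max(1, round(keep * len(members)))
        centers, _ = fuzzy_c_means(members, k, **kwargs)
        reduced_points.extend(centers)
        reduced_labels.extend([label] * k)
    return reduced_points, reduced_labels


# Synthetic two-class data standing in for a real dataset.
data_rng = random.Random(1)
points = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(50)]
points += [[data_rng.gauss(5, 1), data_rng.gauss(5, 1)] for _ in range(50)]
labels = [0] * 50 + [1] * 50

reduced_points, reduced_labels = reduce_by_class(points, labels, compression_rate=0.8)
print(len(points), "->", len(reduced_points))
```

Clustering within each class is only one way to keep labels attached to the reduced samples; a different labeling scheme would change the classifier's training set and therefore the reported accuracies.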

Keywords: Artificial immune systems, artificial neural networks, fuzzy c-means clustering, breast cancer dataset, diabetes dataset
