Authors: PREMA RAMASAMY, PREMALATHA KANDHASAMY
Abstract: Analyzing microarray gene expression data is essential for retrieving the information of interest. Such data generally contain a large number of genes but a small number of samples. The complicated relations among genes make analysis more difficult, and removing irrelevant genes improves the quality of the results. This paper presents two fuzzy preprocessing techniques, based on a fuzzy set (FS) and an intuitionistic fuzzy set (IFS), to normalize datasets. For feature selection, four statistical methods were used. Using three publicly available gene expression datasets, the fuzzy normalization techniques were compared with two standard normalization techniques (min-max and Z-score) as well as with raw gene expression data. Support vector machine, k-nearest-neighbor, and random forest classifiers were used to assess the classification accuracy obtained with the selected features. The experimental results show that genes selected from the FS- and IFS-normalized datasets give high classification accuracy; in addition, IFS normalization outperforms FS normalization.
Keywords: Gene expression data, feature selection, classification, intuitionistic fuzzy normalization
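For readers unfamiliar with the two baseline techniques named in the abstract, the following is a minimal sketch of min-max and Z-score normalization in their textbook form. It is not the authors' code: the function names, the toy matrix, and the choice to normalize each gene (column) across samples (rows) are assumptions made for illustration. The FS and IFS normalizations are not sketched because the abstract does not define them.

```python
import numpy as np

def min_max_normalize(X):
    """Rescale each gene (column) to the range [0, 1].

    Assumption: rows are samples and columns are genes.
    Constant genes are mapped to 0 to avoid division by zero.
    """
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0
    return (X - lo) / span

def z_score_normalize(X):
    """Center each gene (column) to mean 0 and scale to unit variance.

    Uses the population standard deviation (ddof=0).
    Constant genes are mapped to 0 to avoid division by zero.
    """
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0
    return (X - mu) / sigma

# Hypothetical toy expression matrix: 3 samples x 2 genes.
toy = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 30.0]])

mm = min_max_normalize(toy)   # each column becomes 0, 0.5, 1
zs = z_score_normalize(toy)   # each column has mean 0 and unit variance
```

Both functions work column by column, so genes measured on very different scales become comparable before feature selection and classification.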