Authors: BİHTER DAŞ, SUAT TORAMAN, İBRAHİM TÜRKOĞLU
Abstract: Identifying DNA sequences as exons or introns is a common problem in genome analysis. The feature extraction methods and the mapping techniques used to digitize sequences directly affect the solution of this problem. Existing mapping techniques are not sufficient to detect coding and noncoding regions in some genomes, because representing each base of a DNA sequence with an integer does not fully reflect the structure of the original sequence. The entropy-based mapping technique overcomes this problem because it increases the distinction rates of exon regions and better reflects the complexity of DNA sequences. Moreover, in the literature, features are extracted using various statistical techniques, and which statistical features to extract is decided by the system designer's experience. The other approach proposed in this study is to perform feature extraction with transfer learning, in which features are extracted automatically by convolutional neural network models, independently of the data set. In this study, we propose a new method that classifies DNA sequences as exons or introns using these two approaches. In the first approach, the entropy-based numerical technique was used for the numerical representation of DNA sequences. In the second approach, transfer learning was used to extract features. The obtained features were then classified with support vector machine and k-nearest neighbors algorithms. The classification achieved an accuracy of 97.8%. The performance of the proposed method was compared with that of other numerical mapping techniques and feature extraction methods, and the results showed that the developed method was much more successful than the other methods.
Keywords: DNA, genome analysis, convolutional neural network, classification, entropy-based mapping technique
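The abstract does not state how the entropy-based mapping is computed. As an illustration only, the following is a minimal sketch assuming one plausible scheme: each base is replaced by the Shannon entropy of the nucleotide distribution in a small sliding window centred on it, so the numerical signal reflects local sequence complexity rather than a fixed integer per base. The window size and the function names `shannon_entropy` and `entropy_map` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the base frequencies in `window`."""
    counts = Counter(window)
    n = len(window)
    if len(counts) <= 1:
        # A window containing a single base type carries no uncertainty.
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_map(seq: str, window: int = 3) -> list[float]:
    """Hypothetical entropy-based mapping: one value per base.

    Each position is mapped to the entropy of an odd-length window centred
    on it; windows are truncated at the sequence ends.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    seq = seq.upper()
    half = window // 2
    signal = []
    for i in range(len(seq)):
        lo = max(0, i - half)
        hi = min(len(seq), i + half + 1)
        signal.append(shannon_entropy(seq[lo:hi]))
    return signal


if __name__ == "__main__":
    print(entropy_map("ATGCGTAAAA", window=3))
```

Unlike an integer mapping such as A=0, C=1, G=2, T=3, this kind of signal is low in repetitive stretches and high where bases are mixed, which is one way a mapping can "reflect complexity". How the paper turns its numerical representation into input for the convolutional neural network is not described in the abstract and is not modelled here.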