Toxicity prediction of small drug molecules of aryl hydrocarbon receptor using a proposed ensemble model

Authors: VISHAN KUMAR GUPTA, PRASHANT SINGH RANA

Abstract: Quantitative structure-activity relationships and quantitative structure-property relationships have proved useful for predicting the toxicity of drug molecules from their biological activity. In silico toxicity prediction techniques are essential for reducing in vivo testing on rodents and offer a faster, more cost-efficient way to identify toxic effects at an early stage of drug development. We aim to build a prediction model that quickly and efficiently assesses whether a chemical compound has the potential to disrupt processes in the human body in ways that may adversely affect human health. We propose a computational (in silico) method that predicts the toxicity of small drug molecules that can bind to the aryl hydrocarbon receptor, using their physicochemical properties (molecular descriptors). Pharmaceutical Data Exploration Laboratory (PaDEL) software is used to extract the features of the drug molecules. The aryl hydrocarbon receptor dataset contains 9008 drug molecules, of which 1063 are active and 7945 are inactive, and each drug molecule is described by 1444 features. The proposed model is a novel prediction model based on ensemble learning that efficiently classifies the compounds of the dataset as active (binding) or inactive (nonbinding). We first perform feature selection using the Boruta library in R. We then address the class imbalance problem within the ensemble itself by dividing the dataset into seven data frames, each with approximately equal numbers of active and inactive drug molecules. The ensemble model, based on the votes of seven random forest models, achieves an accuracy of 93.76%. K-fold cross-validation is conducted to measure the consistency of the model. Finally, the validity of the proposed ensemble model is demonstrated on some drug molecules from acquired immune deficiency syndrome therapy and the androgen receptor.
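
The data-splitting and voting steps described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the seven data frames are formed, so the sketch assumes each one holds every active molecule plus a disjoint one-seventh share of the inactive molecules (7945 / 7 = 1135, close to the 1063 actives). The function names balanced_partitions and majority_vote are hypothetical, and the sketch is in Python although the abstract mentions R for feature selection. Training the seven random forest models is omitted; only index partitioning and vote counting are shown.

```python
import random
from collections import Counter


def balanced_partitions(labels, n_parts=7, seed=0):
    """Return n_parts index lists (hypothetical helper).

    Assumption: each partition contains every active index (label 1)
    plus a disjoint share of the shuffled inactive indices (label 0).
    """
    active = [i for i, y in enumerate(labels) if y == 1]
    inactive = [i for i, y in enumerate(labels) if y == 0]
    random.Random(seed).shuffle(inactive)
    # Striding deals the shuffled inactive indices out round-robin.
    chunks = [inactive[k::n_parts] for k in range(n_parts)]
    return [active + chunk for chunk in chunks]


def majority_vote(votes):
    """Return the label predicted by most base models (hypothetical helper)."""
    return Counter(votes).most_common(1)[0][0]


# Class counts taken from the abstract: 1063 active, 7945 inactive.
labels = [1] * 1063 + [0] * 7945
parts = balanced_partitions(labels)

# (active count, inactive count) for each of the seven partitions.
sizes = [
    (sum(labels[i] for i in part), sum(1 - labels[i] for i in part))
    for part in parts
]
```

In a full pipeline, one classifier would be trained on each partition and majority_vote would be applied to the seven predictions for each molecule. Seven voters on a binary label cannot tie, so the vote is always decisive.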

Keywords: Aryl hydrocarbon receptor, molecular descriptor, feature selection, class imbalance, toxicity, ensemble model
