Early diagnosis of pancreatic cancer by machine learning methods using urine biomarker combinations

Authors: İREM ACER, FIRAT ORHAN BULUCU, SEMRA İÇER, FATMA LATİFOĞLU

Abstract: Pancreatic ductal adenocarcinoma (PDAC) is the most common type of pancreatic cancer, accounting for the vast majority of cases. Because of late diagnosis, the five-year survival rate for PDAC is 9%. Patients whose PDAC is diagnosed early survive longer than patients diagnosed at a more advanced stage. Biomarkers can play an essential role in the early detection of PDAC and assist health professionals. In recent studies, machine learning and deep learning methods have been applied to biomarkers for diagnostic purposes. To increase the survival rates of PDAC patients, early diagnosis of the disease with a noninvasive test is a critical need. Our study offers a promising approach for the early detection of PDAC using noninvasive urinary biomarkers and carbohydrate antigen 19-9 (CA19-9). The open-access Kaggle Urinary Biomarkers for Pancreatic Cancer (2020) dataset, consisting of 590 participants, was used in this study. Seven machine learning classifiers were used to detect PDAC: support vector machine (SVM), naive Bayes (NB), k-nearest neighbors (kNN), random forest (RF), light gradient boosting machine (LightGBM), AdaBoost, and gradient boosting classifier (GBC). Both binary and multiclass classification were carried out. The models were validated using 5-fold and 10-fold cross-validation (CV-5 and CV-10). This study aimed to determine the best machine learning model by analyzing the performance of the models in distinguishing healthy controls, patients with pancreatic disorders, and patients with PDAC. Notably, ensemble learning models were more successful in all our groups. In classifying healthy controls and patients with PDAC, the most successful configuration was CV-10 with the GBC model (92.99%, AUC = 0.9761). In classifying patients with pancreatic disorders and patients with PDAC, the most successful configuration was CV-10 with the LightGBM model (86.37%, AUC = 0.9348). In the three-class classification of healthy controls, pancreatic disorders, and patients with PDAC, the most successful configuration was CV-5 with the GBC model (72.91%, AUC = 0.8733).
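
The evaluation protocol described in the abstract (k-fold cross-validation with k = 5 and k = 10, scored by AUC) can be illustrated with a minimal sketch for the binary case. This is not the authors' pipeline: the data are synthetic, the single-feature nearest-mean scorer stands in for the seven classifiers, and the names stratified_kfold, auc, and mean_cv_auc are hypothetical helpers written only for this illustration. Stratified splitting is also an assumption, since the abstract does not say how the folds were formed.

```python
import random
from collections import defaultdict
from statistics import mean


def stratified_kfold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs, dealing each class evenly across k folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_cv_auc(features, labels, k, seed=0):
    """Average the per-fold AUC of a toy nearest-mean scorer (a stand-in model)."""
    fold_aucs = []
    for train, test in stratified_kfold(labels, k, seed):
        # "Training": class means of the feature, computed on the training folds only.
        mu0 = mean(features[i] for i in train if labels[i] == 0)
        mu1 = mean(features[i] for i in train if labels[i] == 1)
        # The score is higher when a sample lies closer to the class-1 mean.
        scores = [abs(features[i] - mu0) - abs(features[i] - mu1) for i in test]
        fold_aucs.append(auc([labels[i] for i in test], scores))
    return mean(fold_aucs)


# Synthetic two-class data (assumption: NOT the Kaggle urinary biomarker dataset).
rng = random.Random(42)
labels = [0] * 30 + [1] * 30
features = [rng.gauss(0.0, 1.0) for _ in range(30)] + [rng.gauss(2.0, 1.0) for _ in range(30)]

results = {k: mean_cv_auc(features, labels, k) for k in (5, 10)}
for k, value in results.items():
    print(f"CV-{k}: mean AUC = {value:.4f}")
```

In the sketch, every model quantity (here, the two class means) is estimated from the training folds only, so each held-out fold gives an estimate of performance on unseen participants. Comparing CV-5 with CV-10 then amounts to rerunning the same loop with a different k.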

Keywords: Pancreatic cancer, urine biomarker, machine learning, ensemble learning, classification
