Authors: DEMET ÇANGA, MUSTAFA BOĞA
Abstract: In this study, the multivariate adaptive regression splines (MARS) algorithm was applied to a classification-type problem: determining pregnancy outcomes in Holstein dairy cattle. The data were obtained in 2020 from a private dairy farm in the Konya region of Türkiye. The study shows how to carry out the statistical analysis of classification-type problems with the MARS algorithm and how to use the R packages caret and earth, by means of an R script file. The MARS estimation equation was then obtained. In estimating the probability of pregnancy, lactation period, cow age, number of lactations, insemination number, and total lactation milk yield were important predictors, whereas 7-day mean milk yield and last lactation milk yield were not significant. The train function of the caret package was used to determine the number of terms and the degree of interaction that produce the highest accuracy. Goodness-of-fit criteria were calculated for the optimum model. To evaluate the generalization ability of the model, training and test sets were created, the classification success graph of the MARS algorithm and the model-building phase were summarized, and the generalization ability of the fitted model was measured. With pregnant status taken as the positive reference class, the correct classification rate of pregnant animals (sensitivity) was 0.9574 and the correct classification rate of non-pregnant animals (specificity) was 0.8370. The overall correct classification rate of the training set (accuracy) was 0.8777. The area under the ROC curve (AUC) was 0.947, a value close to 1 that indicates strong discrimination between pregnant and non-pregnant animals.
Keywords: Logistic regression, classification, binary data, train and test set, Holstein breed
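Note on the reported measures: sensitivity, specificity, and accuracy are all derived from the four cells of a binary confusion matrix, with "pregnant" as the positive class. The minimal Python sketch below shows the standard definitions only. It is not the authors' R script, the names `ConfusionCounts` and `classification_metrics` are hypothetical, and the counts are illustrative values chosen for easy arithmetic, not data from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix; 'pregnant' is the positive class."""
    tp: int  # pregnant animals classified as pregnant
    fn: int  # pregnant animals classified as non-pregnant
    tn: int  # non-pregnant animals classified as non-pregnant
    fp: int  # non-pregnant animals classified as pregnant


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity, and accuracy for the given counts."""
    total = c.tp + c.fn + c.tn + c.fp
    return {
        # share of truly pregnant animals that were classified correctly
        "sensitivity": c.tp / (c.tp + c.fn),
        # share of truly non-pregnant animals that were classified correctly
        "specificity": c.tn / (c.tn + c.fp),
        # share of all animals that were classified correctly
        "accuracy": (c.tp + c.tn) / total,
    }


# Illustrative counts only (assumed for this sketch, not taken from the study).
example = ConfusionCounts(tp=45, fn=5, tn=40, fp=10)
metrics = classification_metrics(example)
print(metrics)
```

Accuracy is a prevalence-weighted average of sensitivity and specificity, so it always lies between the two, as it does for the values reported in the abstract.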