Authors: ALEV MUTLU, FURKAN GÖZ, ORHAN AKBULUT
Abstract: Discretization is the process of converting continuous values into discrete values. It is a preprocessing step for several machine learning and data mining algorithms, and the quality of discretization may drastically affect the performance of these algorithms. In this study we propose a discretization algorithm, namely line fitting-based discretization (lFIT), based on the Ramer--Douglas--Peucker algorithm. It is a static, univariate, unsupervised, splitting-based, global, and incremental discretization method in which intervals are determined by the Ramer--Douglas--Peucker algorithm and the quality of partitioning is assessed by the standard error of the estimate. To evaluate the performance of the proposed method, a set of experiments is conducted on ten benchmark datasets and the results are compared to those obtained by eight state-of-the-art discretization methods. Experimental results show that lFIT achieves higher predictive accuracy and produces fewer inconsistencies, although it generates a larger number of intervals. The results are also validated through Friedman's test and Holm's post hoc test, which indicate that lFIT produces discretization schemes that are statistically comparable to those of both supervised and unsupervised discretization methods.
Keywords: Unsupervised discretization, the Ramer--Douglas--Peucker algorithm, polyline simplification, the standard error of the estimate
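For readers unfamiliar with the Ramer--Douglas--Peucker algorithm named above, the following is a minimal Python sketch of the generic polyline simplification step only. It is not the lFIT procedure: the abstract does not say how attribute values are turned into a polyline or how the standard error of the estimate enters the splitting decision. The helper `candidate_cut_points`, the (rank, value) polyline construction, and the `epsilon` tolerance are all assumptions made for illustration.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the straight line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        # a and b coincide, so fall back to point-to-point distance.
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length


def rdp(points, epsilon):
    """Generic Ramer-Douglas-Peucker simplification of a polyline.

    Returns the subset of points that is retained for tolerance epsilon.
    """
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior point farthest from the chord first-last.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        # Every interior point is close enough to the chord: drop them all.
        return [first, last]
    # Otherwise keep the farthest point and simplify both halves recursively.
    left = rdp(points[:index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right


def candidate_cut_points(values, epsilon):
    """Hypothetical helper (an assumption, not taken from the paper).

    Treats the sorted values as a polyline of (rank, value) points and
    returns the values at the interior vertices retained by RDP.
    """
    polyline = [(float(i), v) for i, v in enumerate(sorted(values))]
    kept = rdp(polyline, epsilon)
    return [value for _, value in kept[1:-1]]
```

Calling `candidate_cut_points(values, epsilon)` on a list of numbers returns the values at which the sorted curve bends by more than `epsilon`. In this sketch the rank axis and the value axis are on different scales, so a practical use would likely need some normalization, which is again an assumption rather than something the abstract states.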