Authors: Kanu Goel, Shalini Batra
Abstract: Concept drift is the phenomenon in which the underlying data distribution changes unexpectedly over time. Examining such drifts and gaining insight into the processes executing at a given instant is a major challenge. Prediction models should be capable of handling drifts in scenarios where statistical properties change abruptly. Various strategies exist in the literature to deal with such scenarios, but most of them are limited to identifying a particular kind of drift pattern. The proposed approach combines online drift detection in a diversified adaptive setting with pruning techniques to formulate a concept drift handling approach, named ensemble-based online diversified drift detection (En-ODDD), which aims to identify most types of drift, including abrupt, gradual, recurring and mixed drift, in a single model. En-ODDD is equipped with a dynamically updated ensemble to speed up adaptation to changing distributions. Unlike prevalent approaches, which do not consider correlations between experts, En-ODDD trains its experts on varying randomized subsets of the input data. Different levels of sampling have been applied to generate diversity and promote generalization. The effectiveness of the proposed approach has been evaluated in terms of prediction accuracy using the Massive Online Analysis (MOA) software and compared with ten state-of-the-art algorithms. Experimental results on fifteen benchmark datasets (artificial and real-world) with up to one million instances show that En-ODDD outperforms the existing approaches irrespective of the nature of the drift.
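The combination of diversified experts and online drift detection described above can be illustrated with a minimal Python sketch. It is not the authors' En-ODDD implementation: the base learner (`CentroidExpert`, an online nearest-centroid classifier), the ensemble class `DiversifiedEnsemble`, the per-expert sampling rates `lams`, the random attribute subsets and the rebuild-on-drift reaction are all hypothetical choices. Diversity comes from Oza and Russell's online bagging, in which each expert receives each instance k ~ Poisson(λ) times, with λ varying across experts to mimic different levels of sampling. Drift is flagged by a compact version of the DDM detector of Gama et al. (2004), swapped in here because the abstract does not name a detector. Pruning is not shown.

```python
import math
import random
from collections import Counter


def poisson(lam, rng):
    """Draw k ~ Poisson(lam) with Knuth's multiplication method."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


class DDM:
    """Drift Detection Method (Gama et al., 2004) over a stream of 0/1 errors."""

    def __init__(self, min_instances=30):
        self.min_instances = min_instances
        self.reset()

    def reset(self):
        self.n = self.errors = 0
        self.p_min = self.s_min = float("inf")

    def update(self, error):
        """Feed 1 for a misclassification, 0 otherwise; return the detector state."""
        self.n += 1
        self.errors += error
        p = self.errors / self.n                    # running error rate
        s = math.sqrt(p * (1 - p) / self.n)         # its binomial standard deviation
        if self.n < self.min_instances:
            return "stable"
        if p + s < self.p_min + self.s_min:         # remember the best level seen so far
            self.p_min, self.s_min = p, s
        if p + s >= self.p_min + 3 * self.s_min:    # drift level: forget and restart
            self.reset()
            return "drift"
        return "warning" if p + s >= self.p_min + 2 * self.s_min else "stable"


class CentroidExpert:
    """Hypothetical base learner: online nearest-centroid on a random attribute subset."""

    def __init__(self, features, lam):
        self.features, self.lam = features, lam     # attribute indices, Poisson rate
        self.sums, self.counts = {}, Counter()      # per-label weighted sums and counts

    def learn(self, x, y, weight):
        acc = self.sums.setdefault(y, [0.0] * len(self.features))
        for j, f in enumerate(self.features):
            acc[j] += weight * x[f]
        self.counts[y] += weight

    def predict(self, x):
        if not self.counts:
            return None                             # expert has not been trained yet

        def dist(label):
            n = self.counts[label]
            return sum((x[f] - s / n) ** 2 for f, s in zip(self.features, self.sums[label]))

        return min(self.counts, key=dist)


class DiversifiedEnsemble:
    """Sketch: experts differ in attribute subset and in Poisson sampling rate."""

    def __init__(self, n_features, n_experts=9, subset_size=2, lams=(1.0, 0.5, 0.1), seed=0):
        self.rng = random.Random(seed)
        self.n_features, self.n_experts = n_features, n_experts
        self.subset_size, self.lams = subset_size, lams
        self.detector = DDM()
        self.experts = self._new_experts()

    def _new_experts(self):
        return [CentroidExpert(self.rng.sample(range(self.n_features), self.subset_size),
                               self.lams[i % len(self.lams)])
                for i in range(self.n_experts)]

    def predict_one(self, x):
        votes = Counter(e.predict(x) for e in self.experts)
        votes.pop(None, None)                       # ignore untrained experts
        return votes.most_common(1)[0][0] if votes else None

    def process(self, x, y):
        """Prequential (test-then-train) step; returns the prediction made before learning."""
        y_hat = self.predict_one(x)
        if self.detector.update(int(y_hat != y)) == "drift":
            self.experts = self._new_experts()      # assumed reaction: rebuild every expert
        for expert in self.experts:
            k = poisson(expert.lam, self.rng)       # online bagging weight k ~ Poisson(lam)
            if k > 0:
                expert.learn(x, y, k)
        return y_hat
```

Smaller λ values give an expert a sparser resample of the stream and therefore more disagreement with its peers; the λ values used by En-ODDD are not stated in the abstract.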
Keywords: Concept drift, ensemble learning, classification, diversity, data streams, machine learning