M-FDBSCAN: A multicore density-based uncertain data clustering algorithm

Authors: Atakan Erdem, Taflan İmre Gündem

Abstract: Many data mining applications apply a clustering algorithm to large volumes of uncertain data. In this paper, we adapt an uncertain data clustering algorithm, fast density-based spatial clustering of applications with noise (FDBSCAN), to multicore systems to speed up processing. The new algorithm, which we call multicore FDBSCAN (M-FDBSCAN), splits the data domain into c rectangular regions, where c is the number of cores in the system, and applies FDBSCAN to each region simultaneously. After clustering completes, the semiclusters created by the splitting are detected and merged to construct the final clusters. We evaluate M-FDBSCAN for both correctness and performance. The experiments show a significant performance increase with M-FDBSCAN that cannot be attributed to multicore execution alone.
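The split, cluster, and merge pipeline described in the abstract can be sketched as follows. This is a minimal illustration under several assumptions that are not taken from the paper: points are certain 2-D tuples, the c regions are vertical strips along the x axis widened by `eps` on each side, and each region is clustered with plain DBSCAN as a stand-in for FDBSCAN (which instead works with probabilistic distances between uncertain objects). The semicluster detection rule used here, where two regional clusters are merged when they share a point that is a core point in its home strip, is likewise an assumption of this sketch. The function names `local_dbscan` and `m_dbscan` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def local_dbscan(points, ids, eps, min_pts):
    """Plain DBSCAN restricted to the point indices in `ids`.

    Stand-in for FDBSCAN (assumption): exact Euclidean distance on certain points.
    Returns (labels, cores): labels maps point index -> local cluster id
    (noise is omitted); cores is the set of indices found to be core.
    min_pts counts the point itself.
    """
    labels, cores, visited = {}, set(), set()
    next_label = 0

    def neighbours(i):
        return [j for j in ids if math.dist(points[i], points[j]) <= eps]

    for i in ids:
        if i in visited:
            continue
        visited.add(i)
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            continue  # not core; may still be claimed later as a border point
        cores.add(i)
        labels[i] = next_label
        while seeds:
            j = seeds.pop()
            labels.setdefault(j, next_label)  # first cluster to reach j keeps it
            if j in visited:
                continue
            visited.add(j)
            nj = neighbours(j)
            if len(nj) >= min_pts:
                cores.add(j)
                seeds.extend(nj)
        next_label += 1
    return labels, cores


def m_dbscan(points, c, eps, min_pts):
    """Hypothetical split/cluster/merge driver; returns one label per point, -1 = noise."""
    xs = [p[0] for p in points]
    x0 = min(xs)
    width = (max(xs) - x0) / c or 1.0  # guard against all points sharing one x
    home = [min(int((x - x0) / width), c - 1) for x in xs]

    def region_ids(k):
        # Strip k widened by eps on both sides: each home point's full
        # eps-neighbourhood lies in its own region, and a cluster crossing a
        # split line shows up in both adjacent regions.
        lo, hi = x0 + k * width - eps, x0 + (k + 1) * width + eps
        return [i for i, x in enumerate(xs) if lo <= x <= hi]

    # Cluster all regions concurrently (threads for simplicity; CPython's GIL
    # limits real speedup, so a true multicore version would use processes).
    with ThreadPoolExecutor(max_workers=c) as pool:
        results = list(pool.map(
            lambda k: local_dbscan(points, region_ids(k), eps, min_pts), range(c)))

    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # Merge semiclusters: a point that is core in its home region links every
    # regional cluster that contains it.
    for i in range(len(points)):
        labels_h, cores_h = results[home[i]]
        if i in cores_h:
            for k, (labels_k, _) in enumerate(results):
                if i in labels_k:
                    parent[find((k, labels_k[i]))] = find((home[i], labels_h[i]))

    final, renumber = [], {}
    for i in range(len(points)):
        # Prefer the home region's label; fall back to any region that labelled i.
        owner = next((k for k in [home[i], *range(c)] if i in results[k][0]), None)
        if owner is None:
            final.append(-1)  # noise
        else:
            root = find((owner, results[owner][0][i]))
            final.append(renumber.setdefault(root, len(renumber)))
    return final
```

For example, `m_dbscan(points, c=4, eps=0.15, min_pts=3)` returns one integer label per input point, with `-1` marking noise. Only true core points (core in their home strip, where the full neighbourhood is visible) are allowed to link clusters, so a border point shared by two separate clusters does not cause a spurious merge.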

Keywords: Data mining, uncertain data management, clustering, concurrent execution
