A supervised learning approach for detecting erroneous samples in embeddings

Authors: GÖRKEM SAYGILI

Abstract: Visualizing multidimensional data has become a crucial task in recent years, given the growing amount of data from various sources. Dimensionality reduction algorithms are used to reduce the number of dimensions so that the data can be visualized on a screen. However, these algorithms may fail to faithfully represent high-dimensional data in lower dimensions, which leads to erroneous visualizations. In this work, we propose an error detection algorithm for dimensionality reduction, based on recently developed error prediction algorithms for medical image registration. The proposed algorithm matches the neighborhoods of the high- and low-dimensional data using different similarity measures and predicts the errors with a random forest classifier. Results on three datasets show that the proposed algorithm can detect errors with an accuracy of up to 86% and an area under the curve (AUC) score of 0.81.

Keywords: Dimensionality reduction, error estimation, t-SNE, random forests, matching
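
The neighborhood-matching idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the Jaccard overlap of k-nearest-neighbor sets is an assumed stand-in for the similarity measures the abstract mentions, and the function names and toy data are hypothetical. The classification step is omitted; per-sample features of this kind could then be passed to a classifier such as scikit-learn's RandomForestClassifier.

```python
import math


def knn_indices(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors (Euclidean)."""
    neighbors = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, excluding itself.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append({j for _, j in dists[:k]})
    return neighbors


def neighborhood_overlap(high, low, k):
    """Per-sample Jaccard overlap between k-NN sets in the two spaces.

    Assumed similarity measure for illustration: 1.0 means the neighborhood
    is fully preserved by the embedding, 0.0 means it is not preserved at all.
    """
    nn_high = knn_indices(high, k)
    nn_low = knn_indices(low, k)
    return [len(a & b) / len(a | b) for a, b in zip(nn_high, nn_low)]


# Toy data (made up): four 3-D points and a 1-D embedding in which the
# last sample has been placed far away from its true neighbor.
high = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
low = [(0.0,), (1.0,), (2.5,), (-5.0,)]

scores = neighborhood_overlap(high, low, k=1)
print(scores)  # [1.0, 1.0, 0.0, 0.0]
```

In this toy case the misplaced sample lowers the score of its former neighbor as well as its own, which suggests why neighborhood-based features can flag erroneous regions of an embedding.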
