Authors: APDULLAH YAYIK, VEDAT AYBAR, HASAN HÜSEYİN APIK, SEVCAN İÇÖZ, BEKİR BAKAR, TUNGA GÜNGÖR
Abstract: In Turkey, the Turkish Personal Data Protection Rule (PDPR) No. 6698, in force since 2016, legally protects citizens' personal data. Although the law provides excellent guidance, companies currently face challenges in complying with its regulations on storing, sharing, and monitoring personal data. Because no purpose-built software with wide industrial adoption is available on the market, almost all companies have no choice but to ensure compliance through expensive and error-prone manual processes. In this paper, we present an automated solution to facilitate and accelerate PDPR compliance. In a structured or unstructured document, words or phrases that may contain personal data entities are tagged by a Bi-LSTM-based named entity recognition model together with a rule-based matching component that employs contextual analysis. To find associations among personal data entities and obtain individual personal profiles, these entities are divided into categories according to their confidence levels. Personal profiles are then constructed with a clustering-like approach: it treats personal data categories with high identification levels as separate clusters and finds related personal data entities in their left and/or right context. We evaluated the system on a data set of 70 documents covering different document types and personal data entities, obtaining a micro-averaged F1-measure of 91.76% for personal data entity classification and an accuracy of 72.46% for profile extraction. We also conducted experiments on the performance of the named entity recognition component and on the time complexity of the overall system using a data set of 33K documents.
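The clustering-like profile construction step can be illustrated with a minimal Python sketch. It assumes that entities have already been tagged (by the NER model and the rule-based matcher) and that each one carries a token position. The category labels `PERSON` and `NATIONAL_ID`, the `window` parameter, and the nearest-seed assignment rule are hypothetical choices made for illustration only. They are not the authors' implementation.

```python
from dataclasses import dataclass, field

# Hypothetical split of categories by identification level; the labels and
# the window size below are assumptions, not values from the paper.
HIGH_ID_CATEGORIES = {"PERSON", "NATIONAL_ID"}
WINDOW = 10  # max token distance for attaching an entity to a seed (assumed)


@dataclass
class Entity:
    category: str
    text: str
    position: int  # token index of the entity in the document


@dataclass
class Profile:
    seed: Entity
    attributes: list = field(default_factory=list)


def extract_profiles(entities, window=WINDOW):
    """Clustering-like profile extraction (illustrative sketch).

    Each high-identification entity seeds its own cluster (profile). Every
    other entity is attached to the nearest seed on its left or right,
    provided that seed lies within `window` tokens; otherwise it is dropped.
    """
    seeds = [e for e in entities if e.category in HIGH_ID_CATEGORIES]
    profiles = [Profile(seed=s) for s in seeds]
    for ent in entities:
        if ent.category in HIGH_ID_CATEGORIES:
            continue
        best, best_dist = None, None
        for profile in profiles:
            dist = abs(profile.seed.position - ent.position)
            # Keep the closest seed within the window; ties go to the earlier seed.
            if dist <= window and (best_dist is None or dist < best_dist):
                best, best_dist = profile, dist
        if best is not None:
            best.attributes.append(ent)
    return profiles


# Usage with made-up entities from a single document.
doc = [
    Entity("PERSON", "Ayşe Yılmaz", 0),
    Entity("PHONE", "0555 555 55 55", 4),
    Entity("EMAIL", "ayse@example.com", 7),
    Entity("PERSON", "Mehmet Demir", 20),
    Entity("ADDRESS", "Kadıköy, İstanbul", 23),
]
for p in extract_profiles(doc):
    print(p.seed.text, [a.text for a in p.attributes])
# Ayşe Yılmaz ['0555 555 55 55', 'ayse@example.com']
# Mehmet Demir ['Kadıköy, İstanbul']
```

Attaching each lower-identification entity to its single nearest seed is only one plausible reading of "finds related personal data entities in their left and/or right context"; a real system might instead weight categories by confidence level or allow an entity to belong to several profiles.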
Keywords: Turkish Personal Data Protection Rule, named entity recognition, rule-based matching, personal data associations, relation extraction