Data analysis through social media according to the classified crime

Authors: SERKAN SAVAŞ, NURETTİN TOPALOĞLU

Abstract: As social media sites have come into widespread use, the amount and variety of data generated through them have increased, and so has the rate at which the data are produced. Because these data include personal information, it is important to process them and extract meaningful information. This process can be called intelligence, and the resulting information may serve commercial, academic, or security purposes. In this study, an example application is developed for intelligence on Twitter. Crimes in Turkey are classified according to Turkish Statistical Institute criminal data, and keywords are defined from this classification. A total of 150,000 tweets in Turkish are collected from Twitter between specified dates and processed with Zemberek, a Turkish natural language processing library. The results show that 56 % of the people were talking about terrorist attacks and bombing attacks on the study dates. The words "bomb", "terror", "attack", "organization", and "explode" have percentages of 24 %, 12 %, 8 %, 6 %, and 6 %, respectively. Moreover, associations between words and situations are found. Correlations are useful for forming new subclusters; for example, "terror" and "rape" have a correlation of 0.90 in this study. Expanding the keyword groups would make it possible to reach larger populations and obtain a clearer picture of the real situation.
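As a rough illustration of the kind of analysis summarized in the abstract, the following minimal Python sketch computes keyword shares and a correlation between two keywords over a handful of toy English tweets. It is not the authors' implementation: the function names (`tokenize`, `keyword_counts`, `keyword_shares`, `indicator`, `pearson`), the whitespace tokenizer standing in for Zemberek, the per-tweet presence vectors, and the use of Pearson correlation are all assumptions made for illustration, since the abstract does not state how the percentages or correlations were computed.

```python
from math import sqrt

def tokenize(text):
    # Naive whitespace tokenizer (assumption). It stands in for the Zemberek
    # step; real Turkish text needs locale-aware casing and stemming.
    return [w.strip('.,!?;:"\'()').lower() for w in text.split()]

def keyword_counts(tweets, keywords):
    # Count every occurrence of each keyword across all tweets.
    counts = {k: 0 for k in keywords}
    for tweet in tweets:
        for word in tokenize(tweet):
            if word in counts:
                counts[word] += 1
    return counts

def keyword_shares(counts):
    # Share of each keyword among all keyword occurrences, in percent.
    total = sum(counts.values())
    return {k: (100.0 * c / total if total else 0.0) for k, c in counts.items()}

def indicator(tweets, keyword):
    # 1 if the tweet contains the keyword, else 0 (one entry per tweet).
    return [1 if keyword in tokenize(t) else 0 for t in tweets]

def pearson(xs, ys):
    # Pearson correlation of two equal-length numeric sequences.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # undefined for constant vectors; report 0 by convention
    return cov / sqrt(vx * vy)

# Toy data, invented purely for illustration.
tweets = [
    "bomb attack downtown",
    "terror attack reported",
    "bomb bomb scare",
    "nice weather today",
]
keywords = ["bomb", "terror", "attack"]

counts = keyword_counts(tweets, keywords)
shares = keyword_shares(counts)
corr = pearson(indicator(tweets, "bomb"), indicator(tweets, "attack"))
print(counts, shares, corr)
```

Keyword pairs with a high correlation under a measure like this one would be candidates for merging into a subcluster, which is the use of correlations the abstract describes.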

Keywords: Big data, social media, Twitter stream, Zemberek-NLP, data mining, text mining, commercial intelligence, academic intelligence, security intelligence, cyber intelligence
