Authors: MARYAM JALALI, MORTEZA ZAHEDI, ABDOLALI BASIRI
Abstract: Many text mining methods rely on statistical information; such procedures are independent of text and language, but they are not deterministic. Grammatical structure-based methods, on the other hand, are limited to a particular language and text. We aim to provide an algorithmic, algebraic formulation that is deterministic and nonprobabilistic while retaining the advantage of language independence. We propose a mathematical approach that transforms text and labels into a set of dumb equations. Solving the equations assigns each word a weight that can reflect the semantic information of that word, and we use the proposed algorithm to build a novel sentiment dictionary. The approach is purely mathematical: it aims to remove the preprocessing steps for less informative tokens and to focus attention on specific semantically rich words and content. It does so through automatic weight allocation, which captures each word's meaning in users' notes across various texts. Solving a set of dumb equations is one of the strengths of the proposed algorithm. Finally, we evaluate the proposed algorithm in a sentiment analysis (SA) case study on the Taboada database, demonstrating its capability and efficiency in weight allocation and in the automated creation of dictionaries. The numerical results show an improvement of up to 15% in all parts of the database compared with existing methods.
Keywords: Algebraic approaches to semantics, applications and expert knowledge intensive systems, sentiment analysis, text mining, text processing