Authors: MANOJ MANUJA, DEEPAK GARG
Abstract: Over the last couple of decades, web classification has gradually transitioned from a syntax-centered to a semantics-centered approach that classifies text based on domain ontologies. These ontologies are either built manually or populated automatically using machine learning techniques. A prerequisite for building such systems is the availability of an ontology, which may be either a full-fledged domain ontology or a seed ontology that can be enriched automatically. Every semantics-based text classification system depends on this condition. We share the details of a proof of concept of a web classification system that is self-governed in terms of ontology population and does not require any prebuilt ontology, either full-fledged or seed. It starts from a user query, builds a seed ontology from it, and automatically enriches that ontology by extracting concepts solely from the downloaded documents. The evaluated parameters, namely precision (85%), accuracy (86%), AUC (convex), and MCC (high positive), demonstrate that the proposed system performs better than similar automated text classification systems.
Keywords: Ontology, support vector machine, resource description framework, text classification