A Hybrid Classification Approach using Topic Modeling and Graph Convolution Networks

Image credit: Thomas Kipf

Abstract

Text classification has become a key operation in various natural language processing tasks. The effectiveness of most classification algorithms depends predominantly on the quality of the input features. In this work, we propose a novel multi-class text classification technique that draws features from two distinct feature extraction methods. First, a structured heterogeneous text graph, built from document-word relations and word co-occurrences, is processed by a Graph Convolution Network (GCN). Second, the documents are topic modeled, and the resulting document-topic scores are fed as features into the classification model. The graph is constructed using Point-Wise Mutual Information (PMI) between pairs of co-occurring words and the Term Frequency-Inverse Document Frequency (TF-IDF) score of each word in a document for the document-word relations. Experiments show that our text classification model outperforms existing techniques on five benchmark text classification datasets.
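The edge weighting described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the sliding-window size, and the choice to keep only word pairs with positive PMI are assumptions that the abstract does not specify. Documents are assumed to be non-empty lists of tokens, and the topic-model features and the GCN itself are not shown.

```python
import math
from collections import Counter
from itertools import combinations


def sliding_windows(tokens, size):
    """Return every run of `size` consecutive tokens (the whole document if shorter)."""
    if len(tokens) <= size:
        return [tokens]
    return [tokens[i:i + size] for i in range(len(tokens) - size + 1)]


def pmi_edges(docs, window_size=3):
    """Word-word edge weights: PMI estimated over sliding windows.

    Assumption: only pairs with PMI > 0 become edges.
    """
    windows = [w for doc in docs for w in sliding_windows(doc, window_size)]
    n_windows = len(windows)
    word_count = Counter()  # number of windows containing each word
    pair_count = Counter()  # number of windows containing each word pair
    for window in windows:
        unique = sorted(set(window))
        word_count.update(unique)
        pair_count.update(combinations(unique, 2))
    edges = {}
    for (a, b), count in pair_count.items():
        # PMI(a, b) = log( p(a, b) / (p(a) * p(b)) )
        pmi = math.log(count * n_windows / (word_count[a] * word_count[b]))
        if pmi > 0:
            edges[(a, b)] = pmi
    return edges


def tfidf_edges(docs):
    """Document-word edge weights: term frequency times inverse document frequency."""
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    edges = {}
    for doc_id, doc in enumerate(docs):
        for word, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[word])
            edges[(doc_id, word)] = tf * idf
    return edges
```

Calling `pmi_edges(docs)` and `tfidf_edges(docs)` on a list of tokenized documents returns two dictionaries of weighted edges, which together could populate the adjacency matrix of a heterogeneous graph whose nodes are documents and words.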

Publication
2020 International Conference on Computational Performance Evaluation (ComPE)
