Probabilistic topic models have become a standard in modern machine learning with wide applications in organizing and summarizing `documents' in high-dimensional data such as images, videos, texts, gene expression data, and so on. Representing data by dimensional reduction of mixture proportion extracted from topic models is not only richer in semantics than bag-of-word interpretation, but also more informative for classification tasks. This paper describes the Topic Model Kernel (TMK), a high dimensional mapping for Support Vector Machine classification of data generated from probabilistic topic models. The applicability of our proposed kernel is demonstrated in several classification tasks from real world datasets. We outperform existing kernels on the distributional features and give the comparative results on non-probabilistic data types.
[ bib | .pdf ]