Deakin Research

Contact us

Email: pradasrc@deakin.edu.au
Phone: +61 3 5227 2150
Mail:
Centre for Pattern Recognition and Data Analytics
School of Information Technology
Deakin University
Locked Bag 20000
GEELONG VIC 3220

Publication Details


B. Saha, D. Phung, S. Pham, and S. Venkatesh. Sparse Subspace Representation for Spectral Document Clustering. In Proceedings of the 11th IEEE International Conference on Data Mining, 2012.

We present a novel method for document clustering that combines a sparse representation of documents with spectral clustering. An ℓ1-norm optimization formulation is posed to learn the sparse representation of each document, allowing us to characterize the affinity between documents by considering the overall information instead of traditional pairwise similarities. This document affinity is encoded through a graph on which spectral clustering is performed. The decomposition into multiple subspaces allows documents to be part of a sub-group that shares a smaller set of similar vocabulary, thus allowing for cleaner clusters. Extensive experimental evaluations on three real-world datasets from the Reuters-21578 and 20Newsgroup corpora show that our proposed method consistently outperforms state-of-the-art algorithms. Notably, the performance improvement over other methods is prominent on these datasets.
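The pipeline the abstract describes can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several choices here are assumptions made only for illustration: each document is coded as an ℓ1-penalized combination of the other documents using plain ISTA iterations, the affinity is taken as |C| + |C|ᵀ, the spectral step uses a normalized Laplacian followed by a small k-means, and the data are a synthetic two-topic corpus with disjoint vocabularies. The names sparse_codes, spectral_clusters, kmeans, lam and n_iter are all hypothetical.

```python
import numpy as np

def sparse_codes(X, lam=0.01, n_iter=300):
    """Code each row of X as a sparse combination of the other rows.

    Assumed formulation: min_c 0.5*||x_i - X^T c||^2 + lam*||c||_1, c_i = 0,
    solved with ISTA (proximal gradient). The solver is an illustrative choice.
    """
    n = X.shape[0]
    G = X @ X.T                          # Gram matrix of the documents
    step = 1.0 / np.linalg.norm(G, 2)    # 1 / largest eigenvalue of G
    C = np.zeros((n, n))
    for i in range(n):
        c = np.zeros(n)
        for _ in range(n_iter):
            c = c - step * (G @ c - G[:, i])                          # gradient step
            c = np.sign(c) * np.maximum(np.abs(c) - step * lam, 0.0)  # soft-threshold
            c[i] = 0.0                                                # no self-representation
        C[i] = c
    return C

def kmeans(Z, k, n_iter=50):
    """Small deterministic k-means with farthest-point initialisation."""
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

def spectral_clusters(C, k):
    """Spectral clustering on the graph induced by the sparse codes."""
    W = np.abs(C) + np.abs(C).T                        # symmetric affinity
    d = W.sum(axis=1) + 1e-12                          # node degrees
    L = np.eye(len(W)) - W / np.sqrt(np.outer(d, d))   # normalized Laplacian
    _, vecs = np.linalg.eigh(L)                        # eigenvalues in ascending order
    Z = vecs[:, :k]                                    # k smallest eigenvectors
    Z = Z / (np.linalg.norm(Z, axis=1, keepdims=True) + 1e-12)
    return kmeans(Z, k)

# Synthetic corpus (an assumption): two topics, each with its own vocabulary.
rng = np.random.default_rng(0)
n_per, vocab = 6, 20
X = np.zeros((2 * n_per, 2 * vocab))
for t in range(2):
    base = rng.random(vocab)                           # shared topic profile
    rows = slice(t * n_per, (t + 1) * n_per)
    cols = slice(t * vocab, (t + 1) * vocab)
    X[rows, cols] = base + 0.3 * rng.random((n_per, vocab))
X /= np.linalg.norm(X, axis=1, keepdims=True)          # unit-length documents

C = sparse_codes(X)
labels = spectral_clusters(C, k=2)
print(labels)
```

In this toy setting a document is only ever reconstructed from documents that share its vocabulary, so the affinity graph has no edges between the two topics and the spectral step separates them. Real corpora have overlapping vocabularies, and the penalty weight and solver would need tuning.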

Deakin University acknowledges the traditional land owners of present campus sites.

30th April 2012