CLC number: TP391.1

On-line Access: 2015-06-04

Received: 2014-10-15

Revision Accepted: 2015-03-12

Crosschecked: 2015-05-07


Frontiers of Information Technology & Electronic Engineering  2015 Vol.16 No.6 P.457-465


Topic modeling for large-scale text data

Author(s):  Xi-ming Li, Ji-hong Ouyang, You Lu

Affiliation(s):  College of Computer Science and Technology, Jilin University, Changchun 130012, China

Corresponding email(s):   liximing86@gmail.com, ouyj@jlu.edu.cn

Key Words:  Latent Dirichlet allocation (LDA), Topic modeling, Online learning, Moving average

Xi-ming Li, Ji-hong Ouyang, You Lu. Topic modeling for large-scale text data[J]. Frontiers of Information Technology & Electronic Engineering, 2015, 16(6): 457-465.

This paper develops a novel online algorithm, moving average stochastic variational inference (MASVI), which uses the results of previous iterations to smooth out noisy natural gradients. We analyze the convergence property of the proposed algorithm and conduct a set of experiments on two large-scale collections that contain millions of documents. Experimental results indicate that, compared with the stochastic variational inference (SVI) and SGRLD algorithms, our algorithm achieves a faster convergence rate and better performance.
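
To make the smoothing idea concrete, the following is a minimal sketch and not the paper's actual MASVI update. It assumes a scalar variational parameter, a fixed-size window over the most recent noisy estimates, and a Robbins-Monro step size; the function name `moving_average_updates` and the parameters `window`, `tau`, and `kappa` are hypothetical and chosen only for illustration.

```python
from collections import deque


def moving_average_updates(noisy_estimates, window=3, tau=1.0, kappa=0.7, init=0.0):
    """Illustrative sketch (assumed form, not the paper's exact rule).

    Each noisy per-minibatch estimate is replaced by the mean of the last
    `window` estimates before the usual SVI-style interpolation step.
    """
    lam = init                      # current value of the variational parameter
    recent = deque(maxlen=window)   # keeps only the most recent estimates
    trace = []
    for t, estimate in enumerate(noisy_estimates, start=1):
        recent.append(estimate)
        smoothed = sum(recent) / len(recent)   # moving average over the window
        rho = (t + tau) ** (-kappa)            # decaying step size
        lam = (1.0 - rho) * lam + rho * smoothed
        trace.append(lam)
    return trace


if __name__ == "__main__":
    # Noisy estimates scattered around 5.0; the trace should drift toward 5.0.
    print(moving_average_updates([7.0, 3.0, 6.0, 4.0, 5.5, 4.5]))
```

With `window=1` the sketch reduces to a plain SVI-style update that uses each noisy estimate directly; larger windows trade some responsiveness for lower variance in the update direction.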

Overall, I liked the idea introduced by the paper, as well as the large empirical case study. Scaling up topic models without loss of precision is indeed an important area.




