CLC number: TP391.1
On-line Access: 2015-06-04
Received: 2014-10-15
Revision Accepted: 2015-03-12
Crosschecked: 2015-05-07
Xi-ming Li, Ji-hong Ouyang, You Lu. Topic modeling for large-scale text data[J]. Frontiers of Information Technology & Electronic Engineering, 2015, 16(6): 457-465.
@article{title="Topic modeling for large-scale text data",
author="Xi-ming Li and Ji-hong Ouyang and You Lu",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="16",
number="6",
pages="457-465",
year="2015",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1400352"
}
%0 Journal Article
%T Topic modeling for large-scale text data
%A Xi-ming Li
%A Ji-hong Ouyang
%A You Lu
%J Frontiers of Information Technology & Electronic Engineering
%V 16
%N 6
%P 457-465
%@ 2095-9184
%D 2015
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1400352
TY - JOUR
T1 - Topic modeling for large-scale text data
A1 - Xi-ming Li
A1 - Ji-hong Ouyang
A1 - You Lu
JO - Frontiers of Information Technology & Electronic Engineering
VL - 16
IS - 6
SP - 457
EP - 465
SN - 2095-9184
Y1 - 2015
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.1400352
ER -
Abstract: This paper develops a novel online algorithm, moving average stochastic variational inference (MASVI), which uses the results of previous iterations to smooth out noisy natural gradients. We analyze the convergence property of the proposed algorithm and conduct a set of experiments on two large-scale collections that contain millions of documents. Experimental results indicate that, compared with stochastic variational inference (SVI) and stochastic gradient Riemannian Langevin dynamics (SGRLD), our algorithm achieves a faster convergence rate and better performance.
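The abstract does not give the MASVI update equations, so the following is only a minimal sketch of the general idea it describes: averaging the gradient estimates from recent iterations to damp noise before taking a step. It runs ordinary stochastic gradient descent on a one-dimensional quadratic, not variational inference for a topic model. The function names and the parameters `window`, `kappa`, and `tau` are assumptions made for illustration and are not taken from the paper.

```python
from collections import deque
import random


def moving_average_sgd(noisy_grad, x0, steps=2000, window=10, kappa=0.7, tau=1.0):
    """Stochastic gradient descent that averages the last `window` noisy gradients.

    Illustrative only: `window`, `kappa`, and `tau` are hypothetical names, and
    this is plain gradient descent on a scalar, not the MASVI update itself.
    """
    x = x0
    recent = deque(maxlen=window)  # keeps at most `window` past gradient estimates
    for t in range(1, steps + 1):
        recent.append(noisy_grad(x))
        smoothed = sum(recent) / len(recent)  # moving average of recent gradients
        rho = (t + tau) ** (-kappa)  # decreasing step size
        x -= rho * smoothed
    return x


def make_noisy_grad(target, noise_std, seed):
    """Gradient of 0.5 * (x - target)**2 plus zero-mean Gaussian noise."""
    rng = random.Random(seed)
    return lambda x: (x - target) + rng.gauss(0.0, noise_std)


if __name__ == "__main__":
    estimate = moving_average_sgd(make_noisy_grad(3.0, 1.0, seed=0), x0=0.0)
    print(f"estimate of the minimizer: {estimate:.3f}")
```

Setting `window=1` reduces the sketch to unsmoothed stochastic gradient descent, which gives a simple baseline for comparing how much the averaging damps the noise.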
Overall, I liked the idea introduced by the paper, as well as the large empirical case study. Scaling up topic models without loss of precision indeed is an important area.