CLC number: TP391.1

On-line Access: 2024-08-27

Received: 2023-10-17

Revision Accepted: 2024-05-08

Crosschecked: 2015-05-07

Cited: 3

Clicked: 8172


ORCID: Xi-ming Li, http://orcid.org/0000-0001-8190-5087


Frontiers of Information Technology & Electronic Engineering 

Accepted manuscript available online (unedited version)


Topic modeling for large-scale text data


Author(s):  Xi-ming Li, Ji-hong Ouyang, You Lu

Affiliation(s):  College of Computer Science and Technology, Jilin University, Changchun 130012, China; more

Corresponding email(s):  liximing86@gmail.com, ouyj@jlu.edu.cn

Key Words:  Latent Dirichlet allocation (LDA), Topic modeling, Online learning, Moving average




Abstract: 
This paper develops a novel online algorithm, moving average stochastic variational inference (MASVI), which reuses the results of previous iterations to smooth out noisy natural gradients. We analyze the convergence of the proposed algorithm and conduct experiments on two large-scale collections containing millions of documents. Experimental results show that, compared with stochastic variational inference (SVI) and SGRLD, our algorithm achieves a faster convergence rate and better performance.
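The core idea the abstract describes, averaging gradient estimates from recent iterations to reduce the variance of a noisy stochastic gradient before taking an update step, can be illustrated with a toy sketch. The function name, window size, and the quadratic objective below are illustrative assumptions, not the authors' actual MASVI update for LDA's natural gradients:

```python
import random

def moving_average_update(param, grad_history, new_grad, window=5, step=0.1):
    """Smooth a noisy gradient estimate with a moving average over the
    last `window` iterations, then take one gradient-descent step.
    (Hypothetical helper; MASVI applies this idea to natural gradients
    in stochastic variational inference for topic models.)"""
    grad_history.append(new_grad)
    if len(grad_history) > window:
        grad_history.pop(0)  # keep only the most recent estimates
    smoothed = sum(grad_history) / len(grad_history)
    return param - step * smoothed

# Toy demo: minimize f(x) = x^2, whose true gradient is 2x,
# using noisy gradient estimates (Gaussian noise added).
random.seed(0)
x, history = 5.0, []
for _ in range(200):
    noisy_grad = 2 * x + random.gauss(0, 1.0)  # stochastic gradient estimate
    x = moving_average_update(x, history, noisy_grad)
print(abs(x) < 0.5)  # x has converged close to the optimum at 0
```

Averaging a window of past gradients trades a small amount of bias (the averaged gradients are slightly stale) for a large reduction in variance, which is what lets the smoothed iterates converge faster than updates driven by a single noisy estimate.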


Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - 2025 Journal of Zhejiang University-SCIENCE