CLC number: TP391.3
On-line Access: 2018-12-14
Received: 2016-08-15
Revision Accepted: 2017-07-12
Crosschecked: 2018-11-12
Wei Song, Ying Liu, Li-zhen Liu, Han-shi Wang. Semantic composition of distributed representations for query subtopic mining[J]. Frontiers of Information Technology & Electronic Engineering, 2018, 19(11): 1409-1419.
@article{Song2018,
title="Semantic composition of distributed representations for query subtopic mining",
author="Wei Song and Ying Liu and Li-zhen Liu and Han-shi Wang",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="19",
number="11",
pages="1409-1419",
year="2018",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1601476"
}
%0 Journal Article
%T Semantic composition of distributed representations for query subtopic mining
%A Wei Song
%A Ying Liu
%A Li-zhen Liu
%A Han-shi Wang
%J Frontiers of Information Technology & Electronic Engineering
%V 19
%N 11
%P 1409-1419
%@ 2095-9184
%D 2018
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1601476
TY - JOUR
T1 - Semantic composition of distributed representations for query subtopic mining
A1 - Wei Song
A1 - Ying Liu
A1 - Li-zhen Liu
A1 - Han-shi Wang
JO - Frontiers of Information Technology & Electronic Engineering
VL - 19
IS - 11
SP - 1409
EP - 1419
SN - 2095-9184
Y1 - 2018
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.1601476
ER -
Abstract: Inferring query intent is important in information retrieval tasks. Query subtopic mining aims to find possible subtopics for a given query to represent potential intents. Subtopic mining is challenging because queries are short. Learning distributed representations of words or word sequences has developed rapidly in recent years and has had a great impact on many fields. It is still unclear whether distributed representations are effective in alleviating the challenges of query subtopic mining. In this paper, we explore and compare the main semantic composition methods for distributed representations in query subtopic mining. Specifically, we focus on two types of distributed representations: paragraph vectors, which directly represent word sequences of arbitrary length, and word vector composition. We thoroughly investigate the impact of semantic composition strategies and of the types of data used for learning distributed representations. Experiments were conducted on a public dataset offered by the National Institute of Informatics Testbeds and Community for Information Access Research (NTCIR). The empirical results show that distributed semantic representations can achieve outstanding performance for query subtopic mining compared with traditional semantic representations. Further insights are reported as well.
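To make the idea of word vector composition concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state which composition functions were compared; averaging is used here as one common choice, and the toy vectors, the `compose` and `cosine` helpers, and the candidate strings are all hypothetical. Each candidate subtopic is mapped to a single vector by averaging its word vectors, and candidates are then compared by cosine similarity.

```python
import math

# Toy 3-dimensional word vectors; the values are made up for illustration.
# In practice they would come from a trained embedding model.
WORD_VECTORS = {
    "apple":  [0.9, 0.1, 0.0],
    "iphone": [0.8, 0.2, 0.1],
    "price":  [0.1, 0.9, 0.0],
    "cost":   [0.2, 0.8, 0.1],
    "pie":    [0.0, 0.1, 0.9],
    "recipe": [0.1, 0.0, 0.8],
}


def compose(words, vectors=WORD_VECTORS):
    """Average the vectors of the known words in a word sequence."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        raise ValueError("no known words to compose")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical subtopic candidates for an ambiguous query such as "apple".
candidates = ["iphone price", "iphone cost", "apple pie recipe"]
vecs = {c: compose(c.split()) for c in candidates}

sim_same = cosine(vecs["iphone price"], vecs["iphone cost"])
sim_diff = cosine(vecs["iphone price"], vecs["apple pie recipe"])
print(sim_same > sim_diff)
```

With these toy vectors, the two candidates about the product's price end up closer to each other than to the recipe candidate, which is the property a clustering step over composed vectors would rely on. A paragraph vector model would instead learn one vector per word sequence directly, without an explicit composition step.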