CLC number: TP391.3

On-line Access: 2018-12-14

Received: 2016-08-15

Revision Accepted: 2017-07-12

Crosschecked: 2018-11-12


ORCID: Ying Liu, https://orcid.org/0000-0002-9125-4326


Frontiers of Information Technology & Electronic Engineering  2018 Vol.19 No.11 P.1409-1419

http://doi.org/10.1631/FITEE.1601476


Semantic composition of distributed representations for query subtopic mining


Author(s):  Wei Song, Ying Liu, Li-zhen Liu, Han-shi Wang

Affiliation(s):  Information and Engineering College, Capital Normal University, Beijing 100048, China

Corresponding email(s):   liz_liu7480@cnu.edu.cn

Key Words:  Subtopic mining, Query intent, Distributed representation, Semantic composition


Wei Song, Ying Liu, Li-zhen Liu, Han-shi Wang. Semantic composition of distributed representations for query subtopic mining[J]. Frontiers of Information Technology & Electronic Engineering, 2018, 19(11): 1409-1419.

@article{FITEE1601476,
title="Semantic composition of distributed representations for query subtopic mining",
author="Wei Song and Ying Liu and Li-zhen Liu and Han-shi Wang",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="19",
number="11",
pages="1409-1419",
year="2018",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1601476"
}

%0 Journal Article
%T Semantic composition of distributed representations for query subtopic mining
%A Wei Song
%A Ying Liu
%A Li-zhen Liu
%A Han-shi Wang
%J Frontiers of Information Technology & Electronic Engineering
%V 19
%N 11
%P 1409-1419
%@ 2095-9184
%D 2018
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1601476

TY  - JOUR
T1  - Semantic composition of distributed representations for query subtopic mining
A1  - Wei Song
A1  - Ying Liu
A1  - Li-zhen Liu
A1  - Han-shi Wang
JO  - Frontiers of Information Technology & Electronic Engineering
VL  - 19
IS  - 11
SP  - 1409
EP  - 1419
SN  - 2095-9184
Y1  - 2018
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.1601476
ER  -


Abstract: 
Inferring query intent is important in information retrieval tasks. Query subtopic mining aims to find possible subtopics for a given query to represent its potential intents. Subtopic mining is challenging because queries are short. Learning distributed representations of words or word sequences has developed rapidly in recent years and has had a great impact on many fields. It is still not clear, however, whether distributed representations are effective in alleviating the challenges of query subtopic mining. In this paper, we explore and compare the main semantic composition strategies of distributed representations for query subtopic mining. Specifically, we focus on two types of distributed representations: the paragraph vector, which directly represents word sequences of arbitrary length, and word vector composition. We thoroughly investigate the impact of semantic composition strategies and of the types of data used for learning distributed representations. Experiments were conducted on a public dataset offered by the National Institute of Informatics Testbeds and Community for Information Access Research (NTCIR). The empirical results show that distributed semantic representations achieve better performance for query subtopic mining than traditional semantic representations. Further insights are reported as well.
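
As a rough illustration of what word vector composition means here, the sketch below averages toy word vectors to represent a short query or subtopic candidate and ranks candidates by cosine similarity to a reference string. It is a minimal sketch under stated assumptions, not the authors' method: the abstract does not say which composition functions were compared, so plain averaging is an assumption, and the vectors, the `WORD_VECTORS` table, and the function names are all hypothetical.

```python
import math

# Toy 3-dimensional word vectors. The values are invented for illustration;
# real vectors would be learned from a corpus (e.g. by a word embedding model).
WORD_VECTORS = {
    "apple":  [0.9, 0.1, 0.0],
    "iphone": [0.8, 0.2, 0.1],
    "price":  [0.1, 0.9, 0.0],
    "pie":    [0.0, 0.1, 0.9],
    "recipe": [0.1, 0.0, 0.8],
}


def compose_average(text):
    """Represent a word sequence as the mean of its known word vectors.

    Averaging is one simple composition strategy (an assumption here).
    Words missing from WORD_VECTORS are skipped.
    """
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        raise ValueError("no known words in: %r" % text)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(reference, candidates):
    """Sort candidate strings by similarity to the reference, most similar first."""
    ref_vec = compose_average(reference)
    scored = [(c, cosine(ref_vec, compose_average(c))) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    ranked = rank_by_similarity(
        "apple iphone price", ["iphone price", "apple pie recipe"]
    )
    for candidate, score in ranked:
        print(candidate, round(score, 3))
```

With these toy vectors, "iphone price" ranks above "apple pie recipe" for the reference "apple iphone price". A paragraph vector model would instead learn one vector per word sequence directly, without composing individual word vectors.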

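Query subtopic mining based on semantic composition of distributed representations

Abstract (translated from Chinese): Inferring query intent is important for information retrieval. Query subtopic mining aims to find possible subtopics that represent the potential intents of a given query. Because queries are short, subtopic mining is challenging. Learning distributed representations of words or sentences has driven and influenced progress in many fields. However, there is no clear conclusion as to whether such distributed representations help address the challenges of query subtopic mining. This paper proposes and compares the use of semantic composition of distributed representations for query subtopic mining. Two distributed representation strategies are adopted: the paragraph vector, which can learn distributed representations of text of arbitrary length, and the semantic composition of word vectors. The effects of semantic composition strategies and data types on query representation are explored. Experimental results on a public dataset provided by the National Institute of Informatics Testbeds and Community for Information Access Research (NTCIR) show that, compared with traditional semantic representations, distributed semantic representations achieve better query subtopic mining performance. Further in-depth discussion is provided in the paper.

Key words (translated from Chinese): Query subtopic mining; Query intent; Distributed representation; Semantic composition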



Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - Journal of Zhejiang University-SCIENCE