JZUS - Journal of Zhejiang University SCIENCE

Frontiers of Information Technology & Electronic Engineering 2020 Vol.21 No.3 P.448-464

Efficient keyword search over graph-structured data based on minimal covered r-cliques

Author(s): Asieh Ghanbarpour, Khashayar Niknafs, Hassan Naderi
Affiliation(s): School of Computer Engineering, Iran University of Science and Technology, Tehran 13114-16846, Iran; more
Corresponding email(s): naderi@iust.ac.ir
Key Words: Keyword search, Graph mining, Information retrieval, Database, Clique

Share this article to： More <<< Previous Article \|Next Article >>>

Asieh Ghanbarpour, Khashayar Niknafs, Hassan Naderi. Efficient keyword search over graph-structured data based on minimal covered r-cliques[J]. Frontiers of Information Technology & Electronic Engineering, 2020, 21(3): 448-464.

@article{title="Efficient keyword search over graph-structured data based on minimal covered r-cliques",
author="Asieh Ghanbarpour, Khashayar Niknafs, Hassan Naderi",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="21",
number="3",
pages="448-464",
year="2020",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1800133"
}

%0 Journal Article
%T Efficient keyword search over graph-structured data based on minimal covered r-cliques
%A Asieh Ghanbarpour
%A Khashayar Niknafs
%A Hassan Naderi
%J Frontiers of Information Technology & Electronic Engineering
%V 21
%N 3
%P 448-464
%@ 2095-9184
%D 2020
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1800133

TY - JOUR
T1 - Efficient keyword search over graph-structured data based on minimal covered r-cliques
A1 - Asieh Ghanbarpour
A1 - Khashayar Niknafs
A1 - Hassan Naderi
J0 - Frontiers of Information Technology & Electronic Engineering
VL - 21
IS - 3
SP - 448
EP - 464
%@ 2095-9184
Y1 - 2020
PB - Zhejiang University Press & Springer
ER -
DOI - 10.1631/FITEE.1800133

Abstract
Chinese Summary
Academic Network
Reviewer Comment

Abstract: keyword search is an alternative for structured languages in querying graph-structured data. A result to a keyword query is a connected structure covering all or part of the queried keywords. The textual coverage and structural compactness have been known as the two main properties of a relevant result to a keyword query. Many previous works examined these properties after retrieving all of the candidate results using a ranking function in a comparative manner. However, this needs a time-consuming search process, which is not appropriate for an interactive system in which the user expects results in the least possible time. This problem has been addressed in recent works by confining the shape of results to examine their coverage and compactness during the search. However, these methods still suffer from the existence of redundant nodes in the retrieved results. In this paper, we introduce the semantic of minimal covered r-clique (MCC_r) for the results of a keyword query as an extended model of existing definitions. We propose some efficient algorithms to detect the MCC_rs of a given query. These algorithms can retrieve a comprehensive set of non-duplicate MCC_rs in response to a keyword query. In addition, these algorithms can be executed in a distributive manner, which makes them outstanding in the field of keyword search. We also propose the approximate versions of these algorithms to retrieve the top-k approximate MCCrs in a polynomial delay. It is proved that the approximate algorithms can retrieve results in two-approximation. Extensive experiments on two real-world datasets confirm the efficiency and effectiveness of the proposed algorithms. .

This article has been corrected, see doi:10.1631/FITEE.18e0133

基于r-子团最小覆盖的图结构数据高效关键字搜索

Asieh GHANBARPOUR^1,2, Khashayar NIKNAFS¹, Hassan NADERI¹
¹伊朗科技大学计算机工程学院，伊朗德黑兰，13114-16846
²锡斯坦-俾路支斯坦大学计算机工程系，伊朗扎黑丹，98167-45845

摘要：对图结构数据的查询，关键字搜索是结构化查询语言的一种替代方式。关键字查询的结果是图结构数据上一个连接的结构，其覆盖所有或部分关键字。文本覆盖率和结构紧凑性是评价关键字查询结果是否相关的两个主要属性。以往研究通过排序函数对比所有候选结果的上述属性。但是，该过程耗时长，不符合人们在交互系统中短时得到返回结果的需求。近期研究通过在搜索过程中限制搜素结果的结构形状并考察其覆盖率和紧凑性来解决上述问题。然而，这些方法仍无法解决检索结果中存在冗余节点的问题。本文针对关键字查询结果提出基于r-子团最小覆盖（minimal covered r-clique,MCC_r）的概念，作为现有定义的扩展模型，并给出高效算法以检测给定查询的MCC_r。这些算法的优势在于不仅可以检索出某个关键字查询的全部非重复MCCr，还可以分布式方式执行。此外，提出这些算法的近似版本，以多项式时间复杂度检索最高的k个近似MCC_r。论文表明近似算法可基于成对近似排序检索出结果。基于两个真实数据集的大量实验验证了所提算法的效率和有效性。

关键词：关键字搜索；图挖掘；信息检索；数据库；子团

Darkslateblue:Affiliate; Royal Blue:Author; Turquoise:Article

Reference

[1]Bergamaschi S, Guerra F, Interlandi M, et al., 2013. QUEST: a keyword search system for relational data based on semantic and machine learning techniques. Proc VLDB Endowm, 6(12):1222-1225.

[2]Bergamaschi S, Guerra F, Interlandi M, et al., 2016. Combining user and database perspective for solving keyword queries over relational databases. Inform Syst, 55:1-19.

[3]Bron C, Kerbosch J, 1973. Finding all cliques of an undirected graph. Commun ACM, 16(9):575-577.

[4]Calado P, da Silva AS, Laender AHF, et al., 2004. A Bayesian network approach to searching Web databases through keyword-based queries. Inform Process Manag, 40(5): 773-790.

[5]Dasari NS, Ranjan D, Mohammad Z, 2014. Maximal clique enumeration for large graphs on Hadoop framework. Proc 1^st Workshop on Parallel Programming for Analytics Applications, p.21-30.

[6]de Virgilio R, Cappellari P, Miscione M, 2009. Cluster-based exploration for effective keyword search over semantic datasets. In: Laender AHF, Castano S, Dayal U, et al. (Eds.), Conceptual Modeling - ER 2009. Springer Berlin Heidelberg, p.205-218.

[7]Ding BL,Yu JX, Wang S, et al., 2007. Finding top-k min-cost connected trees in databases. IEEE 23^rd Int Conf on Data Engineering, p.836-845.

[8]Ghanbarpour A, Naderi H, 2018. A model-based keyword search approach for detecting top-k effective answers. Comput J, 62(3):377-393.

[9]Golenberg K, Kimelfeld B, Sagiv Y, 2008. Keyword proximity search in complex data graphs. Proc ACM SIGMOD Int Conf on Management of Data, p.927-940.

[10]Guo L, Shao F, Botev C, et al., 2003. XRANK: ranked keyword search over XML documents. Proc ACM SIGMOD Int Conf on Management of Data, p.16-27.

[11]Hao YF, Cao HP, Qi Y, et al., 2015. Efficient keyword search on graphs using MapReduce. IEEE Int Conf on Big Data, p.2871-2873.

[12]He H, Wang HX, Yang J, et al., 2007. BLINKS: ranked keyword searches on graphs. Proc ACM SIGMOD Int Conf on Management of Data, p.305-316.

[13]Kargar M, An AJ, 2011. Keyword search in graphs: finding r-cliques. Proc VLDB Endowm, 4(10):681-692.

[14]Kargar M, An AJ, 2015. Finding top-k r-cliques for keyword search from graphs in polynomial delay. Knowl Inform Syst, 43(2):249-280.

[15]Kargar M, An AJ, Yu XH, 2014. Efficient duplication free and minimal keyword search in graphs. IEEE Trans Knowl Data Eng, 26(7):1657-1669.

[16]Kim S, Lee W, Arora NR, et al., 2012. Retrieving keyworded subgraphs with graph ranking score. Expert Syst Appl, 39(5):4647-4656.

[17]Kimelfeld B, Sagiv Y, 2006. Finding and approximating top-k answers in keyword proximity search. Proc 25^th ACM SIGMOD-SIGACT-SIGART Symp on Principles of Database Systems, p.173-182.

[18]Lawler EL, 1972. A procedure for computing the K best solutions to discrete optimization problems and its application to the shortest path problem. Manag Sci, 18(7):401-405.

[19]Le TN, Bao FZ, Ling TW, 2015. Exploiting semantics for XML keyword search. Data Knowl Eng, 99:105-125.

[20]Li GL, Ooi BC, Feng JH, et al., 2008. EASE: an effective 3-in-1 keyword search method for unstructured, semi- structured and structured data. Proc ACM SIGMOD Int Conf on Management of Data, p.903-914.

[21]Liu F, Yu C, Meng WY, et al., 2006. Effective keyword search in relational databases. Proc ACM SIGMOD Int Conf on Management of Data, p.563-574.

[22]Mass Y, Sagiv Y, 2012. Language models for keyword search over data graphs. Proc 5^th ACM Int Conf on Web Search and Data Mining, p.363-372.

[23]Mesquita F, da Silva AS, de Moura ES, et al., 2007. LABRADOR: efficiently publishing relational databases on the web by using keyword-based query interfaces. Inform Process Manag, 43(4):983-1004.

[24]Nguyen K, Cao JL, 2012. Top-K data source selection for keyword queries over multiple XML data sources. J Inform Sci, 38(2):156-175.

[25]Ning XM, Jin H, Jia WJ, et al., 2009. Practical and effective IR-style keyword search over semantic web. Inform Process Manag, 45(2):263-271.

[26]Pan YF, Wu YQ, 2013. ROU: advanced keyword search on graph. Proc 22^nd ACM Int Conf on Information & Knowledge Management, p.1625-1630.

[27]Park CS, Lim S, 2015. Efficient processing of keyword queries over graph databases for finding effective answers. Inform Process Manag, 51(1):42-57.

[28]Park J, Lee SG, 2011. Keyword search in relational databases. Knowl Inform Syst, 26(2):175-193.

[29]Qin L, Yu JX, Chang LJ, et al., 2009. Querying communities in relational databases. Proc IEEE 25^th Int Conf on Data Engineering, p.724-735.

[30]Sagharichian M, Naderi H, Haghjoo M, 2015. ExPregel: a new computational model for large‐scale graph processing. Concurr Comput Pract Exp, 27(17):4954-4969.

[31]Segundo PS, Lopez A, Batsyn M, et al., 2016. Improved initial vertex ordering for exact maximum clique search. Appl Intell, 45(3):868-880.

[32]Segundo PS, Artieda J, Strash D, 2018. Efficiently enumerating all maximal cliques with bit-parallelism. Comput Oper Res, 92:37-46.

[33]Tomita E, Tanaka A, Takahashi H, 2006. The worst-case time complexity for generating all maximal cliques and computational experiments. Theor Comput Sci, 363(1):28-42.

[34]Wang D, Zou L, Zhao DY, 2015. Top-k queries on RDF graphs. Inform Sci, 316:201-217.

[35]Xu YW, Guan JH, Li FR, et al., 2013. Scalable continual top-k keyword search in relational databases. Data Knowl Eng, 86:206-223.

[36]Yu T, Liu MC, 2017. A linear time algorithm for maximal clique enumeration in large sparse graphs. Inform Process Lett, 125:35-40.

[37]Yuan Y, Wang GR, Chen L, et al., 2013. Efficient keyword search on uncertain graph data. IEEE Trans Knowl Data Eng, 25(12):2767-2779.

Open peer comments: Debate/Discuss/Question/Opinion

<1>

Similar articles

- Go to

基于r-子团最小覆盖的图结构数据高效关键字搜索

Darkslateblue:Affiliate; Royal Blue:Author; Turquoise:Article

Reference