CLC number: TP311.13
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2017-11-06
Cited: 0
Clicked: 7494
Ji-zhou Luo, Sheng-fei Shi, Hong-zhi Wang, Jian-zhong Li. FrepJoin: an efficient partition-based algorithm for edit similarity join[J]. Frontiers of Information Technology & Electronic Engineering,in press.https://doi.org/10.1631/FITEE.1601347 @article{title="FrepJoin: an efficient partition-based algorithm for edit similarity join", %0 Journal Article TY - JOUR
频率连接:基于数据划分的一种高效字符串相似性连接算法关键词组: Darkslateblue:Affiliate; Royal Blue:Author; Turquoise:Article
Reference[1]Afrati, F.N., Sarma, A.D., Menestrina, D., et al., 2012. Fuzzy joins using MapReduce. Int. Conf. on Data Engineering, p.498-509. ![]() [2]Arasu, A., Ganti, V., Kaushik, R., 2006. Efficient exact set-similarity joins. Int. Conf. on Very Large Data Bases, p.918-929. ![]() [3]Bayardo, R.J., Ma, Y., Srikant, R., 2007. Scaling up all pairs similarity search. Int. World Wide Web Conf., p.131-140. ![]() [4]Chaudhuri, S., Ganjam, K., Ganti, V., et al., 2003. Robust and efficient fuzzy match for online data cleaning. Int. SIGMOD Conf. on Management of Data, p.313-324. ![]() [5]Chaudhuri, S., Ganti, V., Kaushik, R., 2006a. Data debugger: an operator-centric approach for data quality solutions. IEEE Data Eng. Bull., 29(2):60-66. ![]() [6]Chaudhuri, S., Ganti, V., Kaushik, R., 2006b. A primitive operator for similarity joins in data cleaning. Int. Conf. on Data Engineering, p.687-698. ![]() [7]Dong, X., Halevy, A.Y., Yu, C., 2007. Data integration with uncertainty. Int. Conf. on Very Large Data Bases, p.687-698. ![]() [8]Feng, J., Wang, J., Li, G., 2012. Trie-join: a trie-based method for efficient string similarity joins. VLDB J., 21(4):437-461. ![]() [9]Ge, T., Li, Z., 2011. Approximate substring matching over uncertain strings. Proc. VLDB Endow., {bf 4}(11):772-782. ![]() [10]Gravano, L., Ipeirotis, P.G., Jagadish, H.V., et al., 2001. Approximate string joins in a database (almost) for free. Int. Conf. on Very Large Data Bases, p.491-500. ![]() [11]Hadjieleftheriou, M., Srivastava, D., 2010. Weighted Set-Based String Similarity. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering. AT&T Lab-Research. ![]() [12]Henzinger, M.R., 2006. Finding near-duplicate web pages: a large-scale evaluation of algorithms. Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, p.284-291. ![]() [13]Ji, S., Li, G., Li, C., et al., 2009. Efficient interactive fuzzy keyword search. Int. World Wide Web Conf., p.371-380. ![]() [14]Li, C., Lu, J., Lu, Y., 2008. Efficient merging and filtering algorithms for approximate string searches. Int. Conf. on Data Engineering, p.257-266. ![]() [15]Li, G., Deng, D., Wang, J., et al., 2011. Pass-Join: a partition-based method for similarity joins. Proc. VLDB Endow., {bf 5}(3):253-264. ![]() [16]Metwally, A., Agrawal, D., Abbadi, A.E., 2007. Detectives: detecting coalition hit inflation attacks in advertising networks streams. Int. World Wide Web Conf., p.241-250. ![]() [17]Navarro, G., Salmela, L., 2009. Indexing variable length substrings for exact and approximate matching. Int. Symp. on String Processing and Information Retrieval, p.214-221. ![]() [18]Qin, J., Wang, W., Xiao, C., et al., 2013. Vchunkjoin: an efficient algorithm for edit similarity joins. Trans. Knowl. Dat. Eng., {bf 25}(8):1916-1929. ![]() [19]Sarawagi, S., Kirpal, A., 2004. Efficient set joins on similarity predicates. Int. SIGMOD Conf. on Management of Data, p.743-754. ![]() [20]Scott, D.W., 1979. On optimal and data-based histograms. Biometrika, 66:605-610. ![]() [21]Vernica, R., Carey, M.J., Li, C., 2010. Efficient parallel set-similarity joins using MapReduce. Int. SIGMOD Conf. on Management of Data, p.495-506. ![]() [22]Wang, J., Li, G., Feng, J., 2011. Fast-join: an efficient method for fuzzy token matching based string similarity join. Int. Conf. on Data Engineering, p.458-469. ![]() [23]Wang, W., Xiao, C., Lin, X., et al., 2009. Efficient approximate entity extraction with edit distance constraints. Int. SIGMOD Conf. on Management of Data, p.759-770. ![]() [24]Xiao, C., Wang, W., Lin, X., 2008a. Ed-Join: an efficient algorithm for similarity joins with edit distance constraints. Proc. VLDB Endow., {bf 1}(1):933-944. ![]() [25]Xiao, C., Wang, W., Lin, X., et al., 2008b. Efficient similarity joins for near duplicate detection. Int. World Wide Web Conf., p.131-140. ![]() Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou
310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn Copyright © 2000 - 2025 Journal of Zhejiang University-SCIENCE |
Open peer comments: Debate/Discuss/Question/Opinion
<1>