CLC number: TP311
On-line Access: 2022-08-22
Received: 2021-08-07
Revision Accepted: 2022-03-24
Crosschecked: 2022-08-29
Yichao SHAO, Zhiqiu HUANG, Weiwei LI, Yaoshen YU. Fast code recommendation via approximate sub-tree matching[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2100379
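Fast code recommendation via approximate sub-tree matching
1 College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China
2 Key Laboratory of Safety-Critical Software, Ministry of Industry and Information Technology, Nanjing 211100, China
3 Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210016, China
Abstract: Software developers often need to write code whose functionality resembles that of existing code, and code recommendation tools that help them reuse such snippets can significantly improve development efficiency. In recent years many researchers have turned to this area and proposed a variety of code recommendation approaches. Some use sequence matching algorithms to retrieve relevant code; these approaches tend to be inefficient and can exploit only the textual information in the code. Others extract features from the code to form feature vectors, then compute the similarity between code fragments to produce recommendations. However, similar feature vectors often do not imply similar original code, because structural information is lost when an abstract syntax tree is converted into a vector. To address this, we propose a code recommendation approach based on approximate sub-tree matching. Unlike existing approaches based on feature vector matching, it preserves the tree structure of the query code during matching, and so finds the code snippets that are structurally most similar to the current query. In addition, it uses hashing to transform the sub-tree matching problem into matching between a tree and a list, which makes abstract syntax tree information usable in time-critical code recommendation tasks. To evaluate the effectiveness of the approach, we built several code datasets covering different languages and granularities. Experimental results show that the approach achieves higher recall than two baseline approaches, SENSORY and Aroma, on all datasets, and that it can be applied to large datasets.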
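The idea of hashing sub-trees so that a query tree can be compared against a flat list can be illustrated with a minimal sketch. This is not the authors' algorithm: the helper names `subtree_hashes`, `similarity`, and `recommend`, the use of Python's `ast` module, SHA-1 as the hash, and the overlap score are all assumptions chosen for illustration only.

```python
import ast
import hashlib
from collections import Counter


def subtree_hashes(source: str) -> list[str]:
    """Return one structural hash per AST sub-tree of `source` (hypothetical helper)."""
    hashes: list[str] = []

    def visit(node: ast.AST) -> str:
        # Hash the node type together with the hashes of its children, in order.
        # Identifier names and literal values are not child nodes, so they are ignored.
        child_hashes = [visit(child) for child in ast.iter_child_nodes(node)]
        label = type(node).__name__ + "(" + ",".join(child_hashes) + ")"
        digest = hashlib.sha1(label.encode("utf-8")).hexdigest()
        hashes.append(digest)
        return digest

    visit(ast.parse(source))
    return hashes


def similarity(query: str, candidate: str) -> float:
    """Fraction of the query's sub-trees whose hash also occurs in the candidate's list."""
    query_hashes = Counter(subtree_hashes(query))
    candidate_hashes = Counter(subtree_hashes(candidate))
    # Counter intersection keeps the minimum count of each shared hash.
    shared = sum((query_hashes & candidate_hashes).values())
    return shared / sum(query_hashes.values())


def recommend(query: str, corpus: list[str], top_k: int = 1) -> list[str]:
    """Rank corpus snippets by structural overlap with the query (assumed scoring)."""
    ranked = sorted(corpus, key=lambda snippet: similarity(query, snippet), reverse=True)
    return ranked[:top_k]
```

Under these assumptions, `recommend("x = a + b", ["print(a)", "total = left + right"])` ranks the second snippet first, because its sub-tree hashes match those of the query even though the identifiers differ. Each candidate is reduced to a flat collection of hashes once, so comparing a query against it needs only hash lookups rather than a tree-to-tree alignment.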
References
[1] Ai L, Huang ZQ, Li WW, et al., 2019. SENSORY: leveraging code statement sequence information for code snippets recommendation. Proc IEEE 43rd Annual Computer Software and Applications Conf, p.27-36.
[2] Antunes B, Furtado B, Gomes P, 2014. Context-based search, recommendation and browsing in software development. In: Brézillon P, Gonzalez AJ (Eds.), Context in Computing: a Cross-Disciplinary Approach for Modeling the Real World. Springer, New York, USA, p.45-62.
[3] Baxter ID, Yahin A, Moura L, et al., 1998. Clone detection using abstract syntax trees. Proc Int Conf on Software Maintenance, p.368-377.
[4] Chen S, Zhang KZ, 2014. An improved algorithm for tree edit distance with applications for RNA secondary structure comparison. J Comb Optim, 27(4):778-797.
[5] Ďuračík M, Kršák E, Hrkút P, 2020. Searching source code fragments using incremental clustering. Concurr Comput Pract Exp, 32(13):e5416.
[6] Holmes R, Murphy GC, 2005. Using structural context to recommend source code examples. Proc 27th Int Conf on Software Engineering.
[7] Jiang H, Nie LM, Sun ZY, et al., 2019. ROSF: leveraging information retrieval and supervised learning for recommending code snippets. IEEE Trans Serv Comput, 12(1):34-46.
[8] Jiang LX, Misherghi G, Su ZD, et al., 2007. DECKARD: scalable and accurate tree-based detection of code clones. Proc 29th Int Conf on Software Engineering, p.96-105.
[9] Kamalpriya CM, Singh P, 2018. Enhancing program dependency graph based clone detection using approximate subgraph matching. Proc IEEE 11th Int Workshop on Software Clones, p.1-7.
[10] Luan SF, Yang D, Barnaby C, et al., 2018. Aroma: code recommendation via structural code search. Proc ACM Program Lang, 3:152.
[11] Mou LL, Li G, Zhang L, et al., 2016. Convolutional neural networks over tree structures for programming language processing. Proc 31st AAAI Conf on Artificial Intelligence, p.1287-1293.
[12] Rahman MM, Roy CK, 2014. On the use of context in recommending exception handling code examples. Proc IEEE 14th Int Working Conf on Source Code Analysis and Manipulation, p.285-294.
[13] Rahman MM, Roy CK, Lo D, 2016. RACK: automatic API recommendation using crowdsourced knowledge. Proc IEEE 23rd Int Conf on Software Analysis, Evolution, and Reengineering, p.349-359.
[14] Sahavechaphan N, Claypool K, 2006. XSnippet: mining for sample code. Proc 21st Annual ACM SIGPLAN Conf on Object-Oriented Programming Systems, Languages, and Applications, p.413-430.
[15] Saini V, Farmahinifarahani F, Lu YD, et al., 2018. Oreo: detection of clones in the twilight zone. Proc 26th ACM Joint Meeting on European Software Engineering Conf and Symp on the Foundations of Software Engineering, p.354-365.
[16] Shasha D, Wang JTL, Zhang KZ, et al., 1994. Exact and approximate algorithms for unordered tree matching. IEEE Trans Syst Man Cybern, 24(4):668-678.
[17] Smith TF, Waterman MS, 1981. Identification of common molecular subsequences. J Mol Biol, 147(1):195-197.
[18] Svajlenko J, Roy CK, 2021. The mutation and injection framework: evaluating clone detection tools with mutation analysis. IEEE Trans Soft Eng, 47(5):1060-1087.
[19] Yang YM, Ren ZL, Chen X, et al., 2018. Structural function based code clone detection using a new hybrid technique. Proc IEEE 42nd Annual Computer Software and Applications Conf, p.286-291.
[20] Ye YW, Fischer G, 2002. Supporting reuse by delivering task-relevant and personalized information. Proc 24th Int Conf on Software Engineering.
[21] Zhang KZ, Shasha D, 1989. Simple fast algorithms for the editing distance between trees and related problems. SIAM J Comput, 18:1245-1262.

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - 2024 Journal of Zhejiang University-SCIENCE