JZUS - Journal of Zhejiang University SCIENCE

Frontiers of Information Technology & Electronic Engineering 2022 Vol.23 No.8 P.1205-1216

Fast code recommendation via approximate sub-tree matching

Author(s): Yichao SHAO, Zhiqiu HUANG, Weiwei LI, Yaoshen YU
Affiliation(s): School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China
Corresponding email(s): shaoyichao@nuaa.edu.cn, zqhuang@nuaa.edu.cn
Key Words: Code reuse, Code recommendation, Tree similarity, Structure information


Abstract: Software developers often write code whose functionality is similar to that of existing code segments. A code recommendation tool that helps developers reuse these fragments can significantly improve their efficiency. Several methods have been proposed in recent years. Some use sequence matching algorithms to find related recommendations; most of these are time-consuming and can leverage only low-level textual information from code. Others extract features from code and compute similarity between numerical feature vectors. However, similarity between feature vectors often does not reflect similarity between the original code fragments, because structural information is lost when abstract syntax trees are transformed into vectors. To solve this problem, we propose a method based on approximate sub-tree matching. Unlike existing tree-based approaches that match feature vectors, it retains the tree structure of the query code during matching to find the code fragments that best match the current query. It uses a fast approximate sub-tree matching algorithm that transforms the sub-tree matching problem into matching between a tree and a list. In this way, structural information can be used for code recommendation tasks with strict time requirements. To evaluate the effectiveness of our method, we constructed several real-world code databases covering different languages and granularities. The results show that our method outperforms two compared methods, SENSORY and Aroma, in terms of recall on all the datasets, and that it can be applied to large datasets.
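The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way that matching a query tree against a flattened list of sub-trees could work. It is not the authors' implementation. It assumes Python snippets parsed with the standard `ast` module, a structural signature that hashes each node type together with its children's signatures, and a coverage-based score; the names `flatten`, `match_score`, and `recommend` are hypothetical.

```python
import ast


def _signature(node, child_sigs):
    # Structural signature: node type plus the ordered signatures of its children.
    # Identifier names and literal values are ignored (they are not child nodes).
    return hash((type(node).__name__, tuple(child_sigs)))


def flatten(code):
    """Flatten a snippet's AST into the set of signatures of all its sub-trees."""
    sigs = set()

    def visit(node):
        sig = _signature(node, [visit(c) for c in ast.iter_child_nodes(node)])
        sigs.add(sig)
        return sig

    visit(ast.parse(code))
    return sigs


def match_score(query_code, candidate_sigs):
    """Fraction of query AST nodes covered by sub-trees that also occur in the candidate."""

    def visit(node):
        results = [visit(c) for c in ast.iter_child_nodes(node)]
        sig = _signature(node, [r[0] for r in results])
        size = 1 + sum(r[1] for r in results)
        # A whole sub-tree match counts all of its nodes; otherwise fall back
        # to whatever the children managed to match (the approximate part).
        matched = size if sig in candidate_sigs else sum(r[2] for r in results)
        return sig, size, matched

    _, size, matched = visit(ast.parse(query_code))
    return matched / size


def recommend(query_code, database, k=3):
    """Return the k snippets whose flattened sub-tree sets best cover the query tree."""
    index = [(snippet, flatten(snippet)) for snippet in database]
    ranked = sorted(index, key=lambda item: match_score(query_code, item[1]), reverse=True)
    return [snippet for snippet, _ in ranked[:k]]
```

In this sketch the query keeps its tree shape while each candidate is reduced to a flat collection of sub-tree signatures, so one lookup per query node replaces a tree-to-tree comparison. In practice the flattened collections would be built once per database snippet rather than on every call to `recommend`.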

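Fast code recommendation method based on approximate sub-tree matching

Yichao SHAO^1,2,3, Zhiqiu HUANG^1,2,3, Weiwei LI^1,2,3, Yaoshen YU^1,2,3
¹School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China
²Key Laboratory of Safety-Critical Software, Ministry of Industry and Information Technology, Nanjing 211100, China
³Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210016, China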
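Abstract: Software developers often need to write code whose functionality is similar to that of existing code, and code recommendation tools that help developers reuse these code fragments can significantly improve software development efficiency. In recent years many researchers have turned their attention to this field and proposed a variety of code recommendation methods. Some researchers use sequence matching algorithms to obtain related code; these methods tend to be inefficient and can use only the textual information in code. Other researchers extract features from code to form feature vectors, then compute similarity between code fragments to obtain recommendations. However, similar feature vectors often do not imply similar original code, and structural information is lost when abstract syntax trees are converted into vectors. To address this, we propose a code recommendation method based on approximate sub-tree matching. Unlike existing methods based on feature vector matching, this method retains the tree structure of the query code during matching and thus finds the code fragments that are structurally most similar to the current query. In addition, a hashing idea is used to transform the sub-tree matching problem into matching between a tree and a list, so that abstract syntax tree information can be used for code recommendation tasks with strict time requirements. To evaluate the effectiveness of the method, several code datasets covering different languages and granularities were constructed. Experimental results show that the method achieves higher recall than the two compared methods, SENSORY and Aroma, on all datasets, and that it can be applied to large datasets.

Key Words: Code reuse, Code recommendation, Tree similarity, Structure information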

References

[1] Ai L, Huang ZQ, Li WW, et al., 2019. SENSORY: leveraging code statement sequence information for code snippets recommendation. Proc IEEE 43rd Annual Computer Software and Applications Conf, p.27-36.

[2] Antunes B, Furtado B, Gomes P, 2014. Context-based search, recommendation and browsing in software development. In: Brézillon P, Gonzalez AJ (Eds.), Context in Computing: a Cross-Disciplinary Approach for Modeling the Real World. Springer, New York, USA, p.45-62.

[3] Baxter ID, Yahin A, Moura L, et al., 1998. Clone detection using abstract syntax trees. Proc Int Conf on Software Maintenance, p.368-377.

[4] Chen S, Zhang KZ, 2014. An improved algorithm for tree edit distance with applications for RNA secondary structure comparison. J Comb Optim, 27(4):778-797.

[5] Ďuračík M, Kršák E, Hrkút P, 2020. Searching source code fragments using incremental clustering. Concurr Comput Pract Exp, 32(13):e5416.

[6] Holmes R, Murphy GC, 2005. Using structural context to recommend source code examples. Proc 27th Int Conf on Software Engineering.

[7] Jiang H, Nie LM, Sun ZY, et al., 2019. ROSF: leveraging information retrieval and supervised learning for recommending code snippets. IEEE Trans Serv Comput, 12(1):34-46.

[8] Jiang LX, Misherghi G, Su ZD, et al., 2007. DECKARD: scalable and accurate tree-based detection of code clones. Proc 29th Int Conf on Software Engineering, p.96-105.

[9] Kamalpriya CM, Singh P, 2018. Enhancing program dependency graph based clone detection using approximate subgraph matching. Proc IEEE 11th Int Workshop on Software Clones, p.1-7.

[10] Luan SF, Yang D, Barnaby C, et al., 2018. Aroma: code recommendation via structural code search. Proc ACM Program Lang, 3:152.

[11] Mou LL, Li G, Zhang L, et al., 2016. Convolutional neural networks over tree structures for programming language processing. Proc 31st AAAI Conf on Artificial Intelligence, p.1287-1293.

[12] Rahman MM, Roy CK, 2014. On the use of context in recommending exception handling code examples. Proc IEEE 14th Int Working Conf on Source Code Analysis and Manipulation, p.285-294.

[13] Rahman MM, Roy CK, Lo D, 2016. RACK: automatic API recommendation using crowdsourced knowledge. Proc IEEE 23rd Int Conf on Software Analysis, Evolution, and Reengineering, p.349-359.

[14] Sahavechaphan N, Claypool K, 2006. XSnippet: mining for sample code. Proc 21st Annual ACM SIGPLAN Conf on Object-Oriented Programming Systems, Languages, and Applications, p.413-430.

[15] Saini V, Farmahinifarahani F, Lu YD, et al., 2018. Oreo: detection of clones in the twilight zone. Proc 26th ACM Joint Meeting on European Software Engineering Conf and Symp on the Foundations of Software Engineering, p.354-365.

[16] Shasha D, Wang JTL, Zhang KZ, et al., 1994. Exact and approximate algorithms for unordered tree matching. IEEE Trans Syst Man Cybern, 24(4):668-678.

[17] Smith TF, Waterman MS, 1981. Identification of common molecular subsequences. J Mol Biol, 147(1):195-197.

[18] Svajlenko J, Roy CK, 2021. The mutation and injection framework: evaluating clone detection tools with mutation analysis. IEEE Trans Soft Eng, 47(5):1060-1087.

[19] Yang YM, Ren ZL, Chen X, et al., 2018. Structural function based code clone detection using a new hybrid technique. Proc IEEE 42nd Annual Computer Software and Applications Conf, p.286-291.

[20] Ye YW, Fischer G, 2002. Supporting reuse by delivering task-relevant and personalized information. Proc 24th Int Conf on Software Engineering.

[21] Zhang KZ, Shasha D, 1989. Simple fast algorithms for the editing distance between trees and related problems. SIAM J Comput, 18:1245-1262.
