
CLC number: TP311.53;TP183
On-line Access: 2025-11-17
Received: 2025-01-16
Revision Accepted: 2025-11-18
Crosschecked: 2025-04-14
https://orcid.org/0009-0006-3328-4080
Xinlong PAN, Jianhua LI, Zhihong ZHOU, Gaolei LI, Xiuzhen CHEN, Jin MA, Jun WU, Quanhai ZHANG. Large language model-enhanced probabilistic modeling for effective static analysis alarms[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2500038
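Large language model-enhanced probabilistic modeling for effective static analysis alarms
1 Institute of Cyber Science and Technology, School of Computer Science, Shanghai Jiao Tong University, Shanghai 200240, China
2 Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, Shanghai 200240, China
Abstract: Static analysis faces many challenges in alarm handling, and probabilistic models combined with alarm prioritization are key methods for easing the burden of manual inspection on users. These models typically rely on user feedback to rank alarms, which improves processing efficiency. However, existing methods are often limited by low efficiency and insufficient generalization. Learning-based methods have shown some promise, but they usually incur high training costs and are constrained by predefined model structures. Moreover, the integration of large language models (LLMs) into static analysis has not yet realized its full potential, resulting in low vulnerability identification accuracy. To address these problems, this paper proposes a new framework, BinLLM, which exploits the generalization ability of LLMs to improve the performance of alarm probabilistic models through rule learning. Our method introduces LLM-generated abstract rules into the probabilistic model and combines them with the alarm paths and key statements obtained from static analysis, thereby strengthening the model's reasoning, raising the identification rate of true vulnerabilities, and mitigating generalization errors. In an experimental evaluation on a set of C programs, BinLLM reduces the number of inspections required for alarm verification by 40.1% and 9.4% compared with two state-of-the-art baselines, Bingo and BayeSmith, respectively, demonstrating the potential of combining LLMs with static analysis to improve alarm management.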
Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou
310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - 2026 Journal of Zhejiang University-SCIENCE

