JZUS - Journal of Zhejiang University SCIENCE

ENGINEERING Information Technology & Electronic Engineering

Accepted manuscript available online (unedited version)

Large language model-enhanced probabilistic modeling for effective static analysis alarms

Author(s): Xinlong PAN, Jianhua LI, Zhihong ZHOU, Gaolei LI, Xiuzhen CHEN, Jin MA, Jun WU, Quanhai ZHANG
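Affiliation(s): Institute of Cyber Security and Technology, School of Computer Science, Shanghai Jiao Tong University, Shanghai 200240, China; Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, Shanghai 200240, China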
Corresponding email(s): mr.p332@sjtu.edu.cn, Lijh888@sjtu.edu.cn, zhouzhihong@sjtu.edu.cn
Key Words: Static analysis; Bayesian inference; Large language models (LLMs); Alarm ranking


Xinlong PAN, Jianhua LI, Zhihong ZHOU, Gaolei LI, Xiuzhen CHEN, Jin MA, Jun WU, Quanhai ZHANG. Large language model-enhanced probabilistic modeling for effective static analysis alarms[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2500038

@article{title="Large language model-enhanced probabilistic modeling for effective static analysis alarms",
author="Xinlong PAN, Jianhua LI, Zhihong ZHOU, Gaolei LI, Xiuzhen CHEN, Jin MA, Jun WU, Quanhai ZHANG",
journal="Frontiers of Information Technology & Electronic Engineering",
year="in press",
publisher="Zhejiang University Press & Springer",
doi="https://doi.org/10.1631/FITEE.2500038"
}

%0 Journal Article
%T Large language model-enhanced probabilistic modeling for effective static analysis alarms
%A Xinlong PAN
%A Jianhua LI
%A Zhihong ZHOU
%A Gaolei LI
%A Xiuzhen CHEN
%A Jin MA
%A Jun WU
%A Quanhai ZHANG
%J Frontiers of Information Technology & Electronic Engineering
%P 1926-1941
%@ 2095-9184
%D in press
%I Zhejiang University Press & Springer
%R https://doi.org/10.1631/FITEE.2500038

TY - JOUR
T1 - Large language model-enhanced probabilistic modeling for effective static analysis alarms
A1 - Xinlong PAN
A1 - Jianhua LI
A1 - Zhihong ZHOU
A1 - Gaolei LI
A1 - Xiuzhen CHEN
A1 - Jin MA
A1 - Jun WU
A1 - Quanhai ZHANG
JO - Frontiers of Information Technology & Electronic Engineering
SP - 1926
EP - 1941
SN - 2095-9184
Y1 - in press
PB - Zhejiang University Press & Springer
DO - https://doi.org/10.1631/FITEE.2500038
ER -


Abstract: Alarm handling is a significant challenge in static analysis, and probabilistic models that prioritize alarms are an essential way to address it. These models rank alarms based on user feedback, reducing the burden of manual alarm inspection. However, they suffer from limited efficiency and from problems such as false generalization. Learning-based approaches have shown promise, but they typically incur high training costs and are constrained by the predefined structures of existing models. Moreover, large language models (LLMs) have yet to reach their full potential in static analysis and often yield low accuracy in vulnerability identification. To tackle these challenges, we introduce BinLLM, a novel framework that uses the generalization capabilities of LLMs to enhance alarm probability models through rule learning. Our approach integrates LLM-derived abstract rules into the probabilistic model, using alarm paths and critical statements from static analysis. This integration strengthens the model's reasoning, so that it prioritizes genuine bugs more effectively while mitigating false generalization. We evaluated BinLLM on a suite of C programs and observed reductions of 40.1% and 9.4% in the number of checks required for alarm verification compared with two state-of-the-art baselines, Bingo and BayeSmith, respectively. These results underscore the potential of combining LLMs with static analysis to improve alarm management.
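The feedback-driven ranking that the abstract describes can be illustrated with a toy Bayesian model. The sketch below is not BinLLM, Bingo, or BayeSmith; it is a minimal illustration under stated assumptions: three hypothetical alarms (`a1`, `a2`, `a3`), two hypothetical root facts (`r`, `s`), an assumed prior of 0.5 per root fact, and an assumed rule probability of 0.9. Alarms `a1` and `a2` share the derivation fact `r`, so a user verdict on `a1` changes the posterior of `a2` but not of `a3`.

```python
from itertools import product

PRIOR = 0.5  # assumed prior that a root fact holds (illustrative value)
RULE = 0.9   # assumed probability that a derivation rule fires (illustrative value)

# Hypothetical toy program: each alarm is derived from one root fact.
PARENT = {"a1": "r", "a2": "r", "a3": "s"}
ROOTS = ["r", "s"]
ALARMS = list(PARENT)


def bern(p, value):
    """Probability of a Boolean outcome under a Bernoulli(p) variable."""
    return p if value else 1.0 - p


def joint(world):
    """Joint probability of one full assignment to roots and alarms."""
    weight = 1.0
    for root in ROOTS:
        weight *= bern(PRIOR, world[root])
    for alarm, root in PARENT.items():
        # An alarm can only be true if the fact it is derived from is true.
        weight *= bern(RULE if world[root] else 0.0, world[alarm])
    return weight


def posteriors(evidence):
    """P(alarm is true | user feedback), by exhaustive enumeration."""
    names = ROOTS + ALARMS
    total = 0.0
    mass = {alarm: 0.0 for alarm in ALARMS}
    for values in product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        if any(world[name] != label for name, label in evidence.items()):
            continue  # inconsistent with the user's feedback
        weight = joint(world)
        total += weight
        for alarm in ALARMS:
            if world[alarm]:
                mass[alarm] += weight
    return {alarm: mass[alarm] / total for alarm in ALARMS}


def ranking(evidence):
    """Uninspected alarms, most probable first."""
    post = posteriors(evidence)
    pending = [alarm for alarm in ALARMS if alarm not in evidence]
    return sorted(pending, key=lambda alarm: -post[alarm])


if __name__ == "__main__":
    print(posteriors({}))             # before any feedback
    print(posteriors({"a1": False}))  # after the user marks a1 as a false alarm
    print(ranking({"a1": False}))
```

Before any feedback, every alarm has probability 0.45. After `a1` is marked false, `a2` drops to about 0.08 while `a3` stays at 0.45, so `a3` is inspected next. Exhaustive enumeration is used only because the toy model is tiny; practical systems would rely on approximate inference over much larger networks.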

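Large language model-enhanced probabilistic model for static analysis alarms

Xinlong PAN^1,2, Jianhua LI^1,2, Zhihong ZHOU^1,2, Gaolei LI^1,2, Xiuzhen CHEN^1,2, Jin MA^1,2, Jun WU^1,2, Quanhai ZHANG^1,2
¹Institute of Cyber Security and Technology, School of Computer Science, Shanghai Jiao Tong University, Shanghai 200240, China
²Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, Shanghai 200240, China
Abstract: Static analysis faces many challenges in alarm handling, where probabilistic models and alarm prioritization are key methods for easing the burden of manual inspection on users. These models typically rank alarms based on user feedback, thereby improving handling efficiency. However, existing methods are often limited by low efficiency and insufficient generalization. Although learning-based approaches have shown some promise, they usually come with high training costs and are constrained by predefined model structures. In addition, the integration of large language models (LLMs) into static analysis has not yet realized its full potential, resulting in low accuracy in vulnerability identification. To address these problems, this paper proposes a new framework, BinLLM, which uses the generalization capability of LLMs to improve alarm probability models through rule learning. Our method introduces LLM-generated abstract rules into the probabilistic model, combined with alarm paths and critical statements from static analysis, thereby strengthening the model's reasoning, effectively raising the identification rate of real vulnerabilities, and mitigating generalization errors. In experiments on a set of C programs, BinLLM reduced the number of checks required for alarm verification by 40.1% and 9.4% compared with two state-of-the-art baselines, Bingo and BayeSmith, respectively, demonstrating the potential of combining LLMs with static analysis to improve alarm management.

Key words: Static analysis; Bayesian inference; Large language models; Alarm ranking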


