
CLC number: TP311.53;TP183
On-line Access: 2025-11-17
Received: 2025-01-16
Revision Accepted: 2025-11-18
Crosschecked: 2025-04-14
https://orcid.org/0009-0006-3328-4080
Xinlong PAN, Jianhua LI, Zhihong ZHOU, Gaolei LI, Xiuzhen CHEN, Jin MA, Jun WU, Quanhai ZHANG. Large language model-enhanced probabilistic modeling for effective static analysis alarms[J]. Frontiers of Information Technology & Electronic Engineering, 2025, 26(10): 1926-1941.
DOI: 10.1631/FITEE.2500038
ISSN: 2095-9184
Publisher: Zhejiang University Press & Springer
Abstract: Static analysis presents significant challenges in alarm handling; probabilistic models and alarm prioritization are essential methods for addressing them. These models prioritize alarms based on user feedback, thereby reducing the burden on users of manually inspecting alarms. However, they often suffer from limited efficiency and from issues such as false generalization. While learning-based approaches have shown promise, they typically incur high training costs and are constrained by the predefined structures of existing models. Moreover, the integration of large language models (LLMs) into static analysis has yet to reach its full potential, often resulting in lower accuracy in vulnerability identification. To tackle these challenges, we introduce BinLLM, a novel framework that harnesses the generalization capabilities of LLMs to enhance alarm probability models through rule learning. Our approach integrates LLM-derived abstract rules into the probabilistic model, using alarm paths and critical statements from static analysis. This integration strengthens the model's reasoning, improving its effectiveness in prioritizing genuine bugs while mitigating false generalization. We evaluated BinLLM on a suite of C programs and observed reductions of 40.1% and 9.4% in the number of checks required for alarm verification compared with two state-of-the-art baselines, Bingo and BayeSmith, respectively, underscoring the potential of combining LLMs with static analysis to improve alarm management.
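To make the evaluation metric concrete, the following is a minimal sketch of a feedback-driven inspection loop that counts how many alarms a user must check before every genuine bug has been found. It is not the BinLLM, Bingo, or BayeSmith algorithm: the alarm names, scores, and groups are invented for illustration, the metric definition (checks until all true bugs are found) is an assumption about how "checks required" is measured, and the multiplicative `group_feedback` update is a hypothetical stand-in for the Bayesian inference that such systems perform.

```python
from typing import Callable, Dict, Optional, Set

Scores = Dict[str, float]
Update = Callable[[Scores, str, bool], Scores]

# Hypothetical data: alarm -> initial confidence, and alarm -> shared-cause group.
SCORES: Scores = {"a1": 0.9, "a2": 0.8, "a3": 0.5, "a4": 0.2}
GROUP: Dict[str, str] = {"a1": "g1", "a3": "g1", "a2": "g2", "a4": "g2"}
TRUE_BUGS: Set[str] = {"a1", "a3"}


def checks_until_all_bugs_found(
    scores: Scores, true_bugs: Set[str], update: Optional[Update] = None
) -> int:
    """Inspect the top-ranked alarm repeatedly; return how many inspections
    were needed before every true bug had been seen."""
    remaining = dict(scores)
    unfound = set(true_bugs)
    checks = 0
    while unfound and remaining:
        alarm = max(remaining, key=remaining.get)  # highest-ranked alarm
        del remaining[alarm]
        checks += 1
        is_bug = alarm in unfound
        unfound.discard(alarm)
        if update is not None:
            # Re-rank the remaining alarms using the user's label.
            remaining = update(remaining, alarm, is_bug)
    return checks


def group_feedback(remaining: Scores, alarm: str, is_bug: bool) -> Scores:
    """Toy update (an assumption, not Bayesian inference): alarms in the same
    group as the inspected one are boosted if it was a bug, damped otherwise."""
    factor = 2.0 if is_bug else 0.5
    return {
        a: (min(1.0, s * factor) if GROUP[a] == GROUP[alarm] else s)
        for a, s in remaining.items()
    }


def relative_reduction(baseline_checks: int, new_checks: int) -> float:
    """Fraction of checks saved relative to a baseline ranking."""
    return (baseline_checks - new_checks) / baseline_checks


static_checks = checks_until_all_bugs_found(SCORES, TRUE_BUGS)
feedback_checks = checks_until_all_bugs_found(SCORES, TRUE_BUGS, group_feedback)
print(static_checks, feedback_checks, relative_reduction(static_checks, feedback_checks))
```

In this toy setting the static ranking inspects `a1`, `a2`, and `a3`, whereas the feedback loop promotes `a3` once `a1` is confirmed as a bug and finishes after two inspections. The percentages reported in the abstract are relative reductions of this kind, computed against each baseline.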