
CLC number: TP391;TP18
On-line Access: 2025-11-17
Received: 2025-06-07
Revision Accepted: 2025-11-18
Crosschecked: 2025-09-03
Junjie ZHANG, Shuoling LIU, Tongzhe ZHANG, Yuchen SHI. A survey on large language model-based alpha mining[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2500386
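A survey on large language model-based alpha mining
1 College of Computing and Data Science, Nanyang Technological University, Singapore 639798
2 E Fund Management Co., Ltd., Guangzhou 510000, China
3 Department of Industrial Systems Engineering and Management, National University of Singapore, Singapore 119077
Abstract: Alpha mining, the systematic discovery of data-driven signals that predict future cross-sectional returns, is a core task in quantitative research. Recent advances in large language models (LLMs) have given rise to LLM-based alpha mining frameworks, which offer a promising middle path between human-guided and fully automated algorithmic mining, combining efficiency with semantic depth. This paper presents a structured survey of emerging LLM-based alpha mining systems from an agentic perspective and analyzes the functional roles of LLMs as miners, evaluators, and interactive assistants. Despite early progress, key challenges remain: simplified performance evaluation, limited numerical understanding, a lack of diversity and originality, weak exploration dynamics, temporal data leakage, and black-box risks with the associated compliance challenges. Accordingly, we outline future directions, including improving reasoning consistency, extending to new data modalities, rethinking evaluation protocols, and integrating LLMs into more general quantitative systems. Our analysis suggests that LLMs serve as a scalable interface that both amplifies domain expertise and strengthens algorithmic rigor: they reinforce domain expertise by turning qualitative hypotheses into testable factors, and they improve algorithmic rigor by supporting rapid backtesting and semantic reasoning. The result is a complementary paradigm in which intuition, automation, and language-based reasoning converge to reshape the future of quantitative research.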

