JZUS - Journal of Zhejiang University SCIENCE

ENGINEERING Information Technology & Electronic Engineering

Accepted manuscript available online (unedited version)

A survey on large language model-based alpha mining

Author(s): Junjie ZHANG, Shuoling LIU, Tongzhe ZHANG, Yuchen SHI
Affiliation(s): College of Computing and Data Science, Nanyang Technological University, Singapore 639798, Singapore
Corresponding email(s): junjie.zhang@ntu.edu.sg, liushuoling@efunds.com.cn, zhangtongzhe@efunds.com.cn, shiyuchen@efunds.com.cn
Key Words: Alpha mining; Quantitative investment; Large language models (LLMs); LLM agents; Fintech

Junjie ZHANG, Shuoling LIU, Tongzhe ZHANG, Yuchen SHI. A survey on large language model-based alpha mining[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2500386

@article{title="A survey on large language model-based alpha mining",
author="Junjie ZHANG, Shuoling LIU, Tongzhe ZHANG, Yuchen SHI",
journal="Frontiers of Information Technology & Electronic Engineering",
year="in press",
publisher="Zhejiang University Press & Springer",
doi="https://doi.org/10.1631/FITEE.2500386"
}

%0 Journal Article
%T A survey on large language model-based alpha mining
%A Junjie ZHANG
%A Shuoling LIU
%A Tongzhe ZHANG
%A Yuchen SHI
%J Frontiers of Information Technology & Electronic Engineering
%P 1809-1821
%@ 2095-9184
%D in press
%I Zhejiang University Press & Springer
%R https://doi.org/10.1631/FITEE.2500386

TY - JOUR
T1 - A survey on large language model-based alpha mining
A1 - Junjie ZHANG
A1 - Shuoling LIU
A1 - Tongzhe ZHANG
A1 - Yuchen SHI
JO - Frontiers of Information Technology & Electronic Engineering
SP - 1809
EP - 1821
SN - 2095-9184
Y1 - in press
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.2500386
ER -

Abstract: Alpha mining, the systematic discovery of data-driven signals that predict future cross-sectional returns, is a central task in quantitative research. Recent progress in large language models (LLMs) has sparked interest in LLM-based alpha mining frameworks, which offer a promising middle ground between human-guided and fully automated approaches and deliver both speed and semantic depth. This study presents a structured review of emerging LLM-based alpha mining systems from an agentic perspective and analyzes the functional roles of LLMs, which range from miners and evaluators to interactive assistants. Despite early progress, key challenges remain, including simplified performance evaluation, limited numerical understanding, lack of diversity and originality, weak exploration dynamics, temporal data leakage, black-box risks, and compliance challenges. Accordingly, we outline future directions, including improving reasoning alignment, expanding to new data modalities, rethinking evaluation protocols, and integrating LLMs into more general-purpose quantitative systems. Our analysis suggests that LLMs serve as a scalable interface for amplifying both domain expertise and algorithmic rigor: they amplify domain expertise by transforming qualitative hypotheses into testable factors, and they enhance algorithmic rigor by supporting rapid backtesting and semantic reasoning. The result is a complementary paradigm in which intuition, automation, and language-based reasoning converge to redefine the future of quantitative research.

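Chinese summary: A survey on large language model-based alpha mining

Junjie ZHANG¹, Shuoling LIU², Tongzhe ZHANG², Yuchen SHI²,³
¹College of Computing and Data Science, Nanyang Technological University, Singapore 639798, Singapore
²E Fund Asset Management Co., Ltd., Guangzhou 510000, China
³Department of Industrial Systems Engineering and Management, National University of Singapore, Singapore 119077, Singapore
Abstract: Alpha mining is the systematic discovery of data-driven signals that can predict future cross-sectional returns, and it is a core task of quantitative research. In recent years, advances in large language models (LLMs) have given rise to LLM-based alpha mining frameworks, which offer a desirable middle path between human-guided and algorithmically automated mining and combine efficiency with semantic depth. Taking an agent perspective, this paper gives a structured review of emerging LLM-based alpha mining systems and analyzes the functional roles of LLMs as miners, evaluators, and interactive assistants. Despite initial progress, key challenges remain, including simplified performance evaluation, limited numerical understanding, lack of diversity and originality, weak exploration dynamics, temporal data leakage, and black-box risks and compliance challenges. On this basis, we outline future directions, including improving reasoning alignment, expanding to new data modalities, rethinking evaluation protocols, and integrating LLMs into more general-purpose quantitative systems. Our analysis shows that LLMs, as a scalable interface, can both amplify domain expertise and strengthen algorithmic rigor: they reinforce domain expertise by turning qualitative hypotheses into testable factors, and they raise algorithmic rigor by supporting rapid backtesting and semantic reasoning. In the resulting complementary paradigm, intuition, automation, and language-based reasoning merge to reshape the future of quantitative research.

Key words: Alpha mining; Quantitative investment; Large language models (LLMs); LLM agents; Fintech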
