On-line Access: 2025-09-19
Received: 2025-06-25
Revision Accepted: 2025-09-03
Junjie ZHANG, Liyuan CHEN, Shuoling LIU, Tongzhe ZHANG, Yuchen SHI. A survey on large language model-based alpha mining[J]. Frontiers of Information Technology & Electronic Engineering, 2025. https://doi.org/10.1631/FITEE.2500386
@article{FITEE.2500386,
title="A survey on large language model-based alpha mining",
author="Junjie ZHANG and Liyuan CHEN and Shuoling LIU and Tongzhe ZHANG and Yuchen SHI",
journal="Frontiers of Information Technology \& Electronic Engineering",
year="2025",
publisher="Zhejiang University Press \& Springer",
doi="10.1631/FITEE.2500386"
}
%0 Journal Article
%T A survey on large language model-based alpha mining
%A Junjie ZHANG
%A Liyuan CHEN
%A Shuoling LIU
%A Tongzhe ZHANG
%A Yuchen SHI
%J Frontiers of Information Technology & Electronic Engineering
%@ 2095-9184
%D 2025
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.2500386
TY  - JOUR
T1  - A survey on large language model-based alpha mining
A1  - Junjie ZHANG
A1  - Liyuan CHEN
A1  - Shuoling LIU
A1  - Tongzhe ZHANG
A1  - Yuchen SHI
JO  - Frontiers of Information Technology & Electronic Engineering
SN  - 2095-9184
Y1  - 2025
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.2500386
ER  -
Abstract: Alpha mining, the systematic discovery of data-driven signals that predict future cross-sectional returns, is a central task in quantitative research. Recent progress in large language models (LLMs) has sparked interest in LLM-based alpha mining frameworks, which offer a promising middle ground between human-guided and fully automated approaches and deliver both speed and semantic depth. This study presents a structured review of emerging LLM-based alpha mining systems from an agentic perspective and analyzes the functional roles of LLMs, ranging from miners and evaluators to interactive assistants. Despite early progress, key challenges remain, including limited numerical reasoning, weak exploitation mechanisms, low factor diversity, and risks of information leakage. Accordingly, we outline future research directions, including improving reasoning alignment, expanding to new data modalities, rethinking evaluation protocols, and integrating LLMs into more general-purpose quantitative systems. Our analysis suggests that LLMs serve as a scalable interface for amplifying both domain expertise and algorithmic rigor: they amplify domain expertise by transforming qualitative hypotheses into testable factors, and they enhance algorithmic rigor by supporting rapid backtesting and semantic reasoning. The result is a complementary paradigm in which intuition, automation, and language-based reasoning converge to redefine the future of quantitative research.