On-line Access: 2024-04-24

Received: 2023-10-17

Revision Accepted: 2024-03-31

Frontiers of Information Technology & Electronic Engineering

http://doi.org/10.1631/FITEE.2300710


Training large-scale models with limited GPU memory: a survey


Author(s):  Yu TANG, Linbo QIAO, Lujia YIN, Peng LIANG, Ao SHEN, Zhilin YANG, Lizhi ZHANG, Dongsheng LI

Affiliation(s):  National University of Defense Technology, Changsha 410073, China

Corresponding email(s):   qiao.linbo@nudt.edu.cn, dsli@nudt.edu.cn

Key Words:  Training techniques, Memory optimization, Model parameters, Model states, Model activations


Yu TANG, Linbo QIAO, Lujia YIN, Peng LIANG, Ao SHEN, Zhilin YANG, Lizhi ZHANG, Dongsheng LI. Training large-scale models with limited GPU memory: a survey[J]. Frontiers of Information Technology & Electronic Engineering, 2024. https://doi.org/10.1631/FITEE.2300710

@article{Tang2024GPUMemorySurvey,
title="Training large-scale models with limited GPU memory: a survey",
author="Yu TANG and Linbo QIAO and Lujia YIN and Peng LIANG and Ao SHEN and Zhilin YANG and Lizhi ZHANG and Dongsheng LI",
journal="Frontiers of Information Technology & Electronic Engineering",
year="2024",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2300710"
}


Abstract: 
Large-scale models have attracted significant attention in a wide range of fields, such as computer vision and natural language processing, because of their effectiveness across many applications. However, a major obstacle to training these models is the limited memory capacity of GPUs. In this paper, we present a comprehensive survey of techniques for training large-scale models under limited GPU memory. We begin by analyzing the factors that consume GPU memory during training, namely model parameters, model states, and model activations. We then give an in-depth overview of the research that addresses each of these aspects. Finally, we present an outlook on the future of memory optimization for training large-scale models, emphasizing the need for continued research and innovation in this area. This survey serves as a valuable resource for researchers and practitioners seeking to understand the challenges and advances in training large-scale models with limited GPU memory.
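
To make the memory breakdown named in the abstract concrete, the following back-of-the-envelope sketch (not part of the paper) tallies the three contributors for a given parameter count. It assumes the widely used 16-bytes-per-parameter estimate for mixed-precision Adam training (fp16 weights and gradients plus fp32 master weights and Adam moments) and treats activation memory as a user-supplied placeholder, since that term depends on the architecture, sequence length, and batch size.

# Back-of-the-envelope GPU memory estimate for mixed-precision (fp16/fp32) training
# with Adam. Assumption (not from the paper): the common 16-bytes-per-parameter
# breakdown -- fp16 weights (2 B) + fp16 gradients (2 B) + fp32 master weights,
# Adam momentum, and Adam variance (4 B each). Activation memory is a placeholder.

def training_memory_gib(num_params: float, activation_bytes: float = 0.0) -> dict:
    """Return a coarse breakdown of training-time GPU memory in GiB."""
    gib = 1024 ** 3
    weights = 2 * num_params    # model parameters, fp16
    grads = 2 * num_params      # gradients, fp16
    states = 12 * num_params    # fp32 master weights + Adam m and v
    total = weights + grads + states + activation_bytes
    return {
        "parameters (fp16)": weights / gib,
        "gradients (fp16)": grads / gib,
        "optimizer states (fp32)": states / gib,
        "activations (given)": activation_bytes / gib,
        "total": total / gib,
    }

if __name__ == "__main__":
    # Hypothetical example: a 7-billion-parameter model, ignoring activations.
    for item, size in training_memory_gib(7e9).items():
        print(f"{item:>25}: {size:8.1f} GiB")

Under these assumptions, a 7-billion-parameter model already needs roughly 104 GiB for parameters, gradients, and optimizer states alone, before any activations are counted, which illustrates why a single GPU cannot hold the full training state without the memory-optimization techniques surveyed here.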

