CLC number: TP391
On-line Access: 2025-03-07
Received: 2024-06-01
Revision Accepted: 2024-09-13
Crosschecked: 2025-03-07
https://orcid.org/0009-0008-9570-2000
https://orcid.org/0000-0003-4297-5060
Yuxuan CHEN, Rongpeng LI, Xiaoxue YU, Zhifeng ZHAO, Honggang ZHANG. Adaptive layer splitting for wireless large language model inference in edge computing: a model-based reinforcement learning approach[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2400468
Adaptive layer splitting for wireless large language model inference in edge computing: a model-based reinforcement learning approach

1 College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China
2 Zhejiang Lab, Hangzhou 310012, China

Abstract: Optimizing the deployment of large language models (LLMs) in edge computing environments is critical for enhancing privacy protection and computational efficiency. To enable efficient wireless LLM inference, this paper comprehensively analyzes the impact of different splitting points in mainstream open-source LLMs. It introduces a model-based reinforcement learning (MBRL) framework to determine the optimal splitting point between the edge and the user equipment (UE). By incorporating a reward surrogate model, the approach significantly reduces the computational cost of frequent performance evaluations. Extensive simulation results demonstrate that the method effectively balances inference performance and computational load under varying network conditions, providing a robust solution for LLM deployment in decentralized settings.
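To make the abstract's idea concrete, the sketch below pairs a lightweight reward surrogate with an epsilon-greedy policy that picks a transformer split point given an observed (discretized) channel state, so that full inference need not be run to score every candidate split. This is a minimal illustration only: the names (`RewardSurrogate`, `choose_split`, `NUM_LAYERS`) and the tabular running-average surrogate are assumptions for exposition, not the paper's actual MBRL implementation.

```python
import random

NUM_LAYERS = 32  # assumed transformer depth for illustration


class RewardSurrogate:
    """Tabular running-average estimate of reward per (split, channel-bucket) pair."""

    def __init__(self):
        self.sums = {}
        self.counts = {}

    def update(self, split, channel_bucket, reward):
        # Accumulate observed rewards so predict() returns their running mean.
        key = (split, channel_bucket)
        self.sums[key] = self.sums.get(key, 0.0) + reward
        self.counts[key] = self.counts.get(key, 0) + 1

    def predict(self, split, channel_bucket):
        key = (split, channel_bucket)
        if key not in self.counts:
            return 0.0  # neutral prior for pairs never evaluated
        return self.sums[key] / self.counts[key]


def choose_split(surrogate, channel_bucket, eps=0.1):
    """Epsilon-greedy choice of split layer: explore with probability eps,
    otherwise take the split the surrogate currently rates highest."""
    if random.random() < eps:
        return random.randrange(1, NUM_LAYERS)
    return max(range(1, NUM_LAYERS),
               key=lambda s: surrogate.predict(s, channel_bucket))
```

In a full system, `update` would be fed the measured inference reward (e.g., a combination of accuracy, latency, and device load) only occasionally, while `choose_split` is queried cheaply at every decision step; the surrogate thus amortizes the cost of the expensive evaluations, which is the role the abstract assigns to the reward surrogate model.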