CLC number: TP391
On-line Access: 2025-03-07
Received: 2024-06-01
Revision Accepted: 2024-09-13
Crosschecked: 2025-03-07
https://orcid.org/0009-0008-9570-2000
https://orcid.org/0000-0003-4297-5060
Yuxuan CHEN, Rongpeng LI, Xiaoxue YU, Zhifeng ZHAO, Honggang ZHANG. Adaptive layer splitting for wireless large language model inference in edge computing: a model-based reinforcement learning approach[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2400468
Adaptive layer splitting for wireless large language model inference in edge computing: a model-based reinforcement learning approach

1 College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China
2 Zhejiang Lab, Hangzhou 310012, China

Abstract: Optimizing the deployment of large language models (LLMs) in edge computing environments is critical for enhancing privacy protection and computational efficiency. To enable efficient wireless LLM inference, this paper comprehensively analyzes the impact of different splitting points in mainstream open-source LLMs. It introduces a model-based reinforcement learning (MBRL) framework to determine the optimal splitting point between the edge and the user equipment (UE). By incorporating a reward surrogate model, the approach significantly reduces the computational cost of frequent performance evaluations. Extensive simulation results demonstrate that the method effectively balances inference performance and computational load under varying network conditions, providing a robust solution for LLM deployment in decentralized settings.
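To make the abstract's idea concrete, the sketch below pairs a lightweight reward surrogate with an epsilon-greedy policy that picks a transformer split point given an observed (discretized) channel state, so that full inference need not be run to score every candidate split. This is a minimal illustration only: the names (`RewardSurrogate`, `choose_split`, `NUM_LAYERS`) and the tabular running-average surrogate are assumptions for exposition, not the paper's actual MBRL implementation.

```python
import random

NUM_LAYERS = 32  # assumed transformer depth for illustration


class RewardSurrogate:
    """Tabular running-average estimate of reward per (split, channel-bucket) pair."""

    def __init__(self):
        self.sums = {}
        self.counts = {}

    def update(self, split, channel_bucket, reward):
        # Accumulate observed rewards so predict() returns their running mean.
        key = (split, channel_bucket)
        self.sums[key] = self.sums.get(key, 0.0) + reward
        self.counts[key] = self.counts.get(key, 0) + 1

    def predict(self, split, channel_bucket):
        key = (split, channel_bucket)
        if key not in self.counts:
            return 0.0  # neutral prior for pairs never evaluated
        return self.sums[key] / self.counts[key]


def choose_split(surrogate, channel_bucket, eps=0.1):
    """Epsilon-greedy choice of split layer: explore with probability eps,
    otherwise take the split the surrogate currently rates highest."""
    if random.random() < eps:
        return random.randrange(1, NUM_LAYERS)
    return max(range(1, NUM_LAYERS),
               key=lambda s: surrogate.predict(s, channel_bucket))
```

In a full system, `update` would be fed the measured inference reward (e.g., a combination of accuracy, latency, and device load) only occasionally, while `choose_split` is queried cheaply at every decision step; the surrogate thus amortizes the cost of the expensive evaluations, which is the role the abstract assigns to the reward surrogate model.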