
CLC number: TP181
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2023-12-21
Yiyun SUN, Senlin ZHANG, Meiqin LIU, Ronghao ZHENG, Shanling DONG, Xuguang LAN. Multi-agent evaluation for energy management by practically scaling α-rank[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2300438
Multi-agent evaluation for energy management by practically scaling α-rank

1 State Key Laboratory of Industrial Control Technology, Zhejiang University, Hangzhou 310027, China
2 College of Electrical Engineering, Zhejiang University, Hangzhou 310027, China
3 National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Xi'an Jiaotong University, Xi'an 710049, China

Abstract: With the formulation and implementation of carbon-peak and carbon-neutrality policies, integrating renewable energy into the power grid has become the mainstream trend. However, the growing number of photovoltaic installations in distribution networks places enormous active voltage-regulation pressure on distributed distribution systems, making traditional voltage-regulation schemes ill-suited to renewable-dominated grids. Intelligent control strategies based on multi-agent reinforcement learning can mitigate these problems through smart inverters and other intelligent building energy management systems (building microgrids). To obtain optimal energy-management strategies for building microgrids while satisfying occupants' comfort and energy demands, this paper proposes two large-scale multi-agent policy evaluation methods, formulating the energy-management problem as a general-sum game that jointly optimizes the payoffs of both the system and the building users. Although the α-rank algorithm can solve general-sum games and theoretically guarantees reliable policy rankings, its sampling complexity in strategy interactions makes it difficult to apply to real power systems. By extending α-rank with tensor completion, this paper proposes a new evaluation algorithm, TcEval, to reduce the sampling complexity of interactions. Furthermore, considering the noise that is ubiquitous in practical scenarios, a noise-handling model based on domain knowledge is built to compute strategy payoffs, and the TcEval-AS algorithm is proposed for noisy scenarios. Multiple energy-management case studies based on real data show that the two proposed evaluation algorithms substantially reduce the sampling complexity of policy evaluation compared with existing methods (RG-UCB and α-IG). Finally, the effectiveness of the proposed algorithms is verified with real data.

Key words:
References
[1] Brookes DH, Listgarten J, 2018. Design by adaptive sampling. https://arxiv.org/pdf/1810.03714v4
[2] Brookes DH, Park H, Listgarten J, 2019. Conditioning by adaptive sampling for robust design. Proc 36th Int Conf on Machine Learning, p.773-782.
[3] Cai WQ, Kordabad AB, Gros S, 2023. Energy management in residential microgrid using model predictive control-based reinforcement learning and Shapley value. Eng Appl Artif Intell, 119:105793.
[4] Claessens BJ, Vrancx P, Ruelens F, 2018. Convolutional neural networks for automatic state-time feature extraction in reinforcement learning applied to residential load control. IEEE Trans Smart Grid, 9(4):3259-3269.
[5] Czarnecki WM, Gidel G, Tracey B, et al., 2020. Real world games look like spinning tops. Proc 34th Int Conf on Neural Information Processing Systems, Article 1463.
[6] Dong Q, Wu ZY, Lu J, et al., 2022. Existence and practice of gaming: thoughts on the development of multi-agent system gaming. Front Inform Technol Electron Eng, 23(7):995-1001.
[7] Du YL, Yan X, Chen X, et al., 2021. Estimating α-rank from a few entries with low rank matrix completion. Proc 38th Int Conf on Machine Learning, p.2870-2879.
[8] Lowe R, Wu Y, Tamar A, et al., 2017. Multi-agent actor-critic for mixed cooperative-competitive environments. Proc 31st Int Conf on Neural Information Processing Systems, p.6382-6393.
[9] Muller P, Omidshafiei S, Rowland M, et al., 2020. A generalized training approach for multiagent learning. Proc 8th Int Conf on Learning Representations.
[10] Omidshafiei S, Papadimitriou C, Piliouras G, et al., 2019. α-rank: multi-agent evaluation by evolution. Sci Rep, 9(1):9937.
[11] Pigott A, Crozier C, Baker K, et al., 2022. GridLearn: multiagent reinforcement learning for grid-aware building energy management. Electr Power Syst Res, 213:108521.
[12] Rashid T, Zhang C, Ciosek K, 2021. Estimating α-rank by maximizing information gain. Proc AAAI Conf on Artificial Intelligence, p.5673-5681.
[13] Rowland M, Omidshafiei S, Tuyls K, et al., 2019. Multiagent evaluation under incomplete information. Proc 33rd Int Conf on Neural Information Processing Systems, Article 1101.
[14] Shalev-Shwartz S, Ben-David S, 2014. Understanding Machine Learning: from Theory to Algorithms. Cambridge University Press, Cambridge, UK.
[15] Signorino CS, Ritter JM, 1999. Tau-b or not tau-b: measuring the similarity of foreign policy positions. Int Stud Q, 43(1):115-144.
[16] Silver D, Huang A, Maddison CJ, et al., 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484-489.
[17] Su WC, Wang JH, 2012. Energy management systems in microgrid operations. Electr J, 25(8):45-60.
[18] Tong Z, Li N, Zhang HM, et al., 2023. Dynamic user-centric multi-dimensional resource allocation for a wide-area coverage signaling cell based on DQN. Front Inform Technol Electron Eng, 24(1):154-163.
[19] Tuyls K, Perolat J, Lanctot M, et al., 2018. A generalised method for empirical game theoretic analysis. Proc 17th Int Conf on Autonomous Agents and Multiagent Systems, p.77-85.
[20] Vincent R, Ait-Ahmed M, Houari A, et al., 2020. Residential microgrid energy management considering flexibility services opportunities and forecast uncertainties. Int J Electr Power Energy Syst, 120:105981.
[21] Williams CKI, Rasmussen CE, 1995. Gaussian processes for regression. Proc 8th Int Conf on Neural Information Processing Systems, p.514-520.
[22] Xia D, Yuan M, Zhang CH, 2021. Statistically optimal and computationally efficient low rank tensor completion from noisy entries. Ann Stat, 49(1):76-99.
[23] Xu HC, Domínguez-García AD, Sauer PW, 2020. Optimal tap setting of voltage regulation transformers using batch reinforcement learning. IEEE Trans Power Syst, 35(3):1990-2001.
[24] Zhang YY, Rao XP, Liu CY, et al., 2023. A cooperative EV charging scheduling strategy based on double deep Q-network and prioritized experience replay. Eng Appl Artif Intell, 118:105642.
[25] Zhao LY, Yang T, Li W, et al., 2022. Deep reinforcement learning-based joint load scheduling for household multi-energy system. Appl Energy, 324:119346.

