On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2024-03-17
Xiaoyun WANG, Xiaodong DUAN, Kehan YAO, Tao SUN, Peng LIU, Hongwei YANG, Zhiqiang LI. Computing-aware network (CAN): a systematic design of computing and network convergence[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2400098
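Computing-aware network: a systematic design of computing and network convergence
1 China Mobile Communications Group Co., Ltd., Beijing 100032, China
2 China Mobile Research Institute, Beijing 100053, China
Abstract: Network resources now have ever-wider coverage, and computing resources are gradually becoming infrastructure that can provide ubiquitous computing services. However, in wide-area networks the underlying network and computing resources have lacked close joint study or co-design, so problems such as slow computing-service scheduling, inflexible data distribution, and low data-transmission efficiency persist. This paper proposes a system architecture design for the computing-aware network (CAN). Its core contribution is an awareness plane that collects, manages, and integrates computing and network information. The awareness plane, control plane, and data plane thereby form a closed-loop control system that strengthens the awareness, decision-making, and data-forwarding capabilities of the whole system. To enable the CAN system, the paper proposes three key technologies: computing power routing, elastic broadcast, and wide-area high-throughput transmission. Using artificial intelligence (AI) model training, inference, and offline parameter transmission as examples, the paper demonstrates the applicability of CAN and outlines several future research directions.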
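As a rough illustration of the closed loop described in the abstract, the Python sketch below models one iteration: a hypothetical awareness plane reports per-site compute and network metrics, a control plane picks a service site, and a data-plane forwarding entry is updated. The site names, the service identifier, the metric fields, and the cost rule (network RTT plus an M/M/1-style queueing estimate of processing time) are all illustrative assumptions, not the routing algorithm specified in the CAN paper.

```python
from dataclasses import dataclass


@dataclass
class SiteStatus:
    """Snapshot of one computing site, as a hypothetical awareness plane might report it."""
    name: str
    rtt_ms: float      # assumed metric: network round-trip time to the site
    load: float        # assumed metric: compute utilization in [0, 1)
    service_ms: float  # assumed metric: processing time when the site is idle


def estimated_cost_ms(site: SiteStatus) -> float:
    """Assumed cost model: RTT plus M/M/1 mean response time.

    For utilization rho, M/M/1 mean response time is service_time / (1 - rho),
    so the estimate grows sharply as a site approaches saturation.
    """
    if not 0.0 <= site.load < 1.0:
        return float("inf")  # treat saturated or invalid reports as unusable
    return site.rtt_ms + site.service_ms / (1.0 - site.load)


def control_plane_select(sites: list[SiteStatus]) -> str:
    """Control plane (illustrative): choose the site with the lowest estimated cost."""
    return min(sites, key=estimated_cost_ms).name


def closed_loop_step(sites: list[SiteStatus], forwarding_table: dict, service_id: str) -> dict:
    """One awareness -> control -> data iteration: rewrite the forwarding entry."""
    forwarding_table[service_id] = control_plane_select(sites)
    return forwarding_table


# Hypothetical metrics for one AI inference service ("llm-inference" is made up).
sites = [
    SiteStatus("edge-a", rtt_ms=5.0, load=0.9, service_ms=10.0),    # near but busy
    SiteStatus("region-b", rtt_ms=20.0, load=0.3, service_ms=10.0),
    SiteStatus("cloud-c", rtt_ms=60.0, load=0.1, service_ms=10.0),  # idle but far
]
table = closed_loop_step(sites, {}, "llm-inference")
print(table)  # {'llm-inference': 'region-b'}
```

Here the nearby edge site costs about 105 ms because of its load, the distant cloud about 71 ms, and the regional site about 34 ms, so the entry points to region-b. Re-running the step whenever the awareness plane publishes fresh metrics is what closes the loop; the additive latency cost is only one possible policy.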
Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - 2025 Journal of Zhejiang University-SCIENCE