CLC number: TP315
On-line Access: 2022-10-26
Received: 2021-06-16
Revision Accepted: 2022-10-26
Crosschecked: 2021-10-24
https://orcid.org/0000-0003-2368-4946
Mingtian SHAO, Kai LU, Wanqing CHI, Ruibo WANG, Yiqin DAI, Wenzhe ZHANG. TEES: topology-aware execution environment service for fast and agile application deployment in HPC[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2100284
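TEES: a topology-aware execution environment service for fast and agile application deployment in high-performance computing
College of Computer, National University of Defense Technology, Changsha 410073, China
Abstract: High-performance computing (HPC) is about to reach a new height: exascale. Application deployment is becoming an increasingly prominent problem. Container technology solves the problem of packaging and migrating applications together with their execution environments. However, container images are too bulky, and deploying them across a large number of compute nodes is very time-consuming. Although the peer-to-peer (P2P) approach brings higher transmission efficiency, it also introduces a larger network load. All of these issues lead to high startup latency for applications. To address them, we propose the topology-aware execution environment service (TEES) for fast and agile application deployment on HPC systems. TEES creates a more lightweight execution environment for users and uses a more efficient topology-aware P2P approach to reduce deployment time. Combined with split-step transport and launch-in-advance mechanisms, TEES reduces application startup latency. On the Tianhe HPC system, TEES deploys and starts a typical application on 17 560 compute nodes within 3 s. Compared with container-based application deployment, this is a 12-fold speedup, and the network load is reduced by 85%.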