CLC number: TP316
On-line Access: 2017-02-10
Received: 2015-10-21
Revision Accepted: 2016-03-13
Crosschecked: 2016-12-13
Wen-zhe Zhang, Kai Lu, Mikel LUJÁN, Xiao-ping Wang, Xu Zhou. Fine-grained checkpoint based on non-volatile memory[J]. Frontiers of Information Technology & Electronic Engineering, 2017, 18(2): 220-234.
@article{Zhang2017,
title="Fine-grained checkpoint based on non-volatile memory",
author="Wen-zhe Zhang and Kai Lu and Mikel LUJÁN and Xiao-ping Wang and Xu Zhou",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="18",
number="2",
pages="220-234",
year="2017",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1500352"
}
%0 Journal Article
%T Fine-grained checkpoint based on non-volatile memory
%A Wen-zhe Zhang
%A Kai Lu
%A Mikel LUJÁN
%A Xiao-ping Wang
%A Xu Zhou
%J Frontiers of Information Technology & Electronic Engineering
%V 18
%N 2
%P 220-234
%@ 2095-9184
%D 2017
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.1500352
TY  - JOUR
T1  - Fine-grained checkpoint based on non-volatile memory
A1  - Wen-zhe Zhang
A1  - Kai Lu
A1  - Mikel LUJÁN
A1  - Xiao-ping Wang
A1  - Xu Zhou
JO  - Frontiers of Information Technology & Electronic Engineering
VL  - 18
IS  - 2
SP  - 220
EP  - 234
SN  - 2095-9184
Y1  - 2017
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.1500352
ER  -
Abstract: Emerging non-volatile memory (e.g., phase-change memory) offers fast access, large capacity, byte-addressability, and non-volatility. These features, together termed fast-byte-persistency, bring new opportunities for fault tolerance. We propose a fine-grained checkpoint mechanism based on non-volatile memory. We extend the current virtual memory manager to manage non-volatile memory, and design a persistent heap that supports fast allocation and checkpointing of persistent objects. To achieve fine-grained checkpointing, we scatter objects across virtual pages and rely on hardware page protection to monitor modifications. In our system, two objects in different virtual pages may reside on the same physical page, and modifying one object does not interfere with the other. This allows us to monitor and checkpoint objects smaller than 4096 bytes at fine granularity. Compared with previous page-granularity checkpoint mechanisms, our method greatly reduces the data copied at checkpoint time and makes better use of the limited bandwidth of non-volatile memory.
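The aliasing idea in the abstract can be illustrated with a toy model. The sketch below is not the authors' implementation. It assumes a temporary file as a stand-in for non-volatile memory, uses one `mmap` view per object as that object's "virtual page", and replaces the hardware write-protection fault with an explicit dirty set updated in `write`. The class `AliasedHeap` and its methods are hypothetical names introduced only for this illustration.

```python
import mmap
import tempfile

# Mapping offsets must be multiples of this; on Linux it equals the page size.
PAGE = mmap.ALLOCATIONGRANULARITY


class AliasedHeap:
    """Toy heap: one virtual page (mmap view) per object, shared physical pages."""

    def __init__(self, n_phys_pages):
        # Assumption: a temporary file stands in for the non-volatile region.
        self._file = tempfile.TemporaryFile()
        self._limit = n_phys_pages * PAGE
        self._file.truncate(self._limit)
        self._objs = []      # per object: (view, physical page, offset, size)
        self._dirty = set()  # object ids written since the last checkpoint
        self._next = 0       # bump pointer over the physical pages, in bytes

    def alloc(self, size):
        assert 0 < size <= PAGE
        if self._next % PAGE + size > PAGE:  # never straddle a physical page
            self._next = (self._next // PAGE + 1) * PAGE
        assert self._next + size <= self._limit, "heap exhausted"
        phys, off = divmod(self._next, PAGE)
        # Each object gets its own view; views created with the same file
        # offset alias one physical page.
        view = mmap.mmap(self._file.fileno(), PAGE, offset=phys * PAGE)
        self._objs.append((view, phys, off, size))
        self._next += size
        return len(self._objs) - 1

    def write(self, obj, data):
        view, _, off, size = self._objs[obj]
        assert len(data) <= size
        view[off:off + len(data)] = data
        # Stand-in for the write-protection fault handler: only this
        # object's virtual page is marked dirty.
        self._dirty.add(obj)

    def read(self, obj):
        view, _, off, size = self._objs[obj]
        return view[off:off + size]

    def checkpoint(self):
        """Copy only the dirty objects, then reset the dirty set."""
        snap = {obj: self.read(obj) for obj in sorted(self._dirty)}
        self._dirty.clear()
        return snap


heap = AliasedHeap(n_phys_pages=1)
a = heap.alloc(64)   # both objects share physical page 0 ...
b = heap.alloc(64)   # ... but live in different virtual pages
heap.write(a, b"hello")
snap = heap.checkpoint()
copied = sum(len(v) for v in snap.values())
print(f"objects copied: {list(snap)}, bytes copied: {copied}, page size: {PAGE}")
```

In this model, a write to object `a` leaves `b` clean even though both occupy the same physical page, so the checkpoint copies 64 bytes instead of a whole page. A real system would presumably obtain the same per-object dirty information from page-protection faults rather than from an explicit call.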