Date of Award
12-2025
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
Electrical and Computer Engineering (Holcomb Dept. of)
Committee Chair/Advisor
Jon C. Calhoun
Committee Member
Rong Ge
Committee Member
Tao Wei
Abstract
Modern high-performance computing (HPC) applications generate data at petascale and beyond, creating immense storage demands while I/O bandwidth remains limited. Error-bounded lossy compression (EBLC) addresses this challenge by significantly reducing data size while guaranteeing that the errors introduced into the dataset stay within user-defined bounds. SZ, a state-of-the-art EBLC, achieves high compression ratios but is difficult to parallelize because of read-after-write (RAW) dependencies during prediction and quantization. To overcome this limitation, Dube introduced vecSZ, a SIMD-vectorized CPU version of SZ that employs the dual-quantization method from cuSZ, a GPU-based SZ framework. However, vecSZ is restricted to specific CPU architectures and supports only single-precision floating-point data, limiting its portability and applicability. To address these constraints, we propose SZ3 SIMD, an architecture-independent EBLC built on the C++ experimental SIMD library and fully integrated into SZ3’s framework. We evaluated SZ3 SIMD on five CPUs and seven real-world HPC application datasets, comparing it against vecSZ, SZ3.2, and SZ3.3. Experimental results show that SZ3 SIMD improves compression ratio by up to 1.7× over vecSZ. In terms of prediction and quantization (PQ) compression throughput, SZ3 SIMD achieves up to 9.44× speedup over SZ3.2 and 6.71× over SZ3.3. For total compression throughput, SZ3 SIMD achieves up to 4.42× speedup over SZ3.2 and 3.62× over SZ3.3.
Recommended Citation
Zou, Changfeng, "SZ3_SIMD: Accelerating Error-Bounded Lossy Compression With Architecture Independent SIMD" (2025). All Theses. 4628.
https://open.clemson.edu/all_theses/4628
Author ORCID Identifier
https://orcid.org/0009-0004-7739-560X