Date of Award

5-2025

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

School of Computing

Committee Chair/Advisor

Rong Ge

Committee Member

Kai Liu

Committee Member

Feng Luo

Committee Member

Prasanna Balaprakash

Committee Member

Xingfu Wu

Abstract

Continuous increases in high performance computing (HPC) throughput have served as catalysts for industrial and scientific advancement in countless ways that have fundamentally shaped our modern world. Our demands on compute resources continue to scale, but the limitations of Amdahl's law and Dennard scaling have proven increasingly difficult to overcome when approached solely through hardware or software design. Furthermore, many HPC applications fail to utilize the collective performance of the system, even on the most advanced supercomputers.

However, the resurgence of AI in industry has spurred an explosion of hardware and software co-design that has fueled massive improvements in GPU design and novel ASICs. These performance improvements are maximized on myriad heterogeneous systems by tuning applications to the specific system. Replicating these developments across the whole of computing will require similarly holistic approaches that combine specialty hardware, software designed around the hardware's greatest strengths, and fine-tuning on individual systems to maximize performance.

We use three distinct perspectives to address scalable system performance holistically. First, we analyze the impacts of liquid immersion cooling technologies on sustained application performance and energy efficiency. Next, we present a case study in which intentional algorithmic redesign for GPU acceleration yields robust performance improvements that endure through multiple generations of hardware. We find that memory latency forms a primary bottleneck for GPU-accelerated performance and demonstrate how algorithm-specific optimizations can significantly improve performance across multiple architecture generations. Finally, we tie these concepts together through performance optimization techniques that respect both software- and hardware-based performance constraints. We improve the reusability of performance insights with novel transfer learning techniques that make performance optimization costs more predictable and optimization outcomes more successful in the short term. Our insights demonstrate the necessity of systemic approaches to performance tuning in HPC.

Author ORCID Identifier

0000-0002-1213-1011
