Date of Award

5-2026

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Electrical and Computer Engineering

Committee Chair/Advisor

Dr. Tao Wei

Committee Member

Dr. Fatemeh Afghah

Committee Member

Dr. Rong Ge

Committee Member

Dr. Judson Ryckman

Abstract

High-performance computing (HPC) is changing rapidly as scientific simulations and large language model (LLM) workloads demand higher performance under tight power and memory constraints. Conventional platforms such as CPUs and GPUs accelerate computation through instruction-driven parallelism, relying on multithreading, SIMD, and SIMT execution, but increasingly encounter scalability limits imposed by the power and memory walls. In contrast, Field-Programmable Gate Arrays (FPGAs) and Neural Processing Units (NPUs) offer a high-efficiency alternative through dataflow-oriented architectures that exploit deep pipelining and customized memory hierarchies to reduce data movement. However, the performance potential of these spatial accelerators remains largely unrealized when traditional, control-flow-centric algorithms are directly mapped onto them.

This dissertation addresses this gap by developing domain-specific algorithm and dataflow designs tailored for FPGAs and NPUs, demonstrating that hardware–software co-design is essential for achieving high performance and energy efficiency on modern accelerators. Two representative and challenging applications are studied: electromagnetic simulation using the Finite-Difference Time-Domain (FDTD) method and on-device inference for large language models. FDTD simulations are critical for the design of photonic integrated circuits but are computationally intensive, while on-device LLM inference requires low latency and low power consumption.

For FDTD, this work introduces a time-pipelined computation scheme that significantly reduces data movement and enables scalable execution across FPGA networks and reconfigurable accelerators, substantially shortening photonic design cycles. For large language models, it demonstrates how reorganizing computation and dataflow allows NPUs to process long sequences efficiently, achieving lower latency and energy consumption than existing approaches.

Recommended Citation

Yu, Miaoxiang, "Domain-Specific Design on FPGA and NPU for scientific computing and On-Device LLM Inference" (2026). All Dissertations. 4205.
https://open.clemson.edu/all_dissertations/4205

Author ORCID Identifier

0000-0002-4382-9009

Included in

Architectural Engineering Commons, Electromagnetics and Photonics Commons
