Author

Date of Award

5-2026

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Engineering

Committee Chair/Advisor

Fatemeh Afghah

Committee Member

Xiaolong Ma

Committee Member

Yongkai Wu

Committee Member

Tao Wei

Committee Member

Long Cheng

Abstract

Deep learning has achieved remarkable success across domains such as computer vision, natural language processing, and multimodal reasoning. This progress has largely been driven by increasingly large neural networks and foundation models. However, scaling model size and input complexity introduces substantial computational and memory demands across the deep learning lifecycle. Training highly overparameterized models requires extensive computational resources, while deployment, particularly in resource-constrained environments, faces strict latency, memory, and energy limitations. These challenges are further amplified in large-scale vision-language models (VLMs), where hundreds or thousands of visual tokens must be processed for each input, significantly increasing inference cost and limiting scalability.

This dissertation addresses these challenges through a unified perspective that views efficient deep learning as a budget-aware optimization problem across three interacting computational resources: parameter capacity, optimization dynamics, and representation complexity. Under this framework, efficiency is improved by systematically optimizing the parameter budget, optimization budget, and representation budget that govern different stages of the deep learning lifecycle.

To improve deployment-side efficiency in video processing, this dissertation first introduces the Spatial Temporal Data Overfitting (STDO) framework for video super-resolution. STDO leverages spatial and temporal information density to enable high-quality reconstruction using lightweight models, making real-time deployment feasible on resource-constrained devices.

To address efficiency challenges during model training, the dissertation investigates sparse learning and optimization-aware training strategies. Bi-Level Dynamic Sparse Training (BiDST) formulates sparse topology discovery as a structured bi-level optimization problem, enabling stable sparse training while improving model capacity allocation. Single Step Sharpness Aware Minimization (S2-SAM) introduces an efficient approximation of sharpness-aware optimization that preserves generalization benefits without incurring additional training overhead. Zero Order Sharpness Aware Minimization (ZO-SAM) further integrates zeroth-order gradient estimation with sharpness-aware optimization to reduce gradient computation cost while maintaining robustness.
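For readers unfamiliar with the two ingredients named above, the following is a minimal sketch of a generic sharpness-aware update driven by a two-point zeroth-order gradient estimate. It is not the ZO-SAM or S2-SAM algorithm from the dissertation, whose details are not given in this abstract. The function names, the toy quadratic loss, and all hyperparameters (mu, n_dirs, lr, rho) are assumptions chosen for illustration only.

```python
import numpy as np

def zo_grad(f, w, rng, mu=1e-3, n_dirs=16):
    """Two-point random-direction estimate of the gradient of f at w.

    Uses only function evaluations (no backpropagation).
    """
    g = np.zeros_like(w)
    for _ in range(n_dirs):
        u = rng.standard_normal(w.shape)
        g += (f(w + mu * u) - f(w - mu * u)) / (2.0 * mu) * u
    return g / n_dirs

def sam_step(f, w, rng, lr=0.1, rho=0.05):
    """One generic sharpness-aware step using zeroth-order estimates."""
    g = zo_grad(f, w, rng)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent perturbation of radius rho
    g_adv = zo_grad(f, w + eps, rng)             # gradient estimate at perturbed weights
    return w - lr * g_adv                        # descend using the perturbed gradient

def loss(w):
    """Toy quadratic loss (an assumption, stands in for a training loss)."""
    return 0.5 * float(np.sum(w ** 2))

rng = np.random.default_rng(0)
w = np.array([1.0, -2.0, 0.5])
initial_loss = loss(w)
for _ in range(200):
    w = sam_step(loss, w, rng)
final_loss = loss(w)
```

In this generic form each step costs two gradient estimates, one at the current weights and one at the perturbed weights. That doubled cost is the overhead that single-step approximations of sharpness-aware optimization aim to avoid.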

Finally, the dissertation develops representation-efficient techniques for scalable vision-language models. Importance Diversity Disentangled Token Pruning (ID-Pruner) formulates visual token reduction as a structured subset selection problem that separates semantic importance estimation from diversity-aware coverage, enabling effective training-free token pruning under strict token budgets. Importance Aware Token Generation (ITG) further improves token compression by reformulating token reduction as a representation reconstruction problem, generating compact tokens that preserve the information distribution of visual features.
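To make the idea of selecting tokens under a budget concrete, the following is a minimal sketch of a generic greedy selection that trades off a per-token importance score against redundancy with tokens already kept. It is an illustration only and not the ID-Pruner method, whose formulation is not specified in this abstract. The function name select_tokens, the trade_off weight, the random importance scores, and the token count and dimension are all hypothetical.

```python
import numpy as np

def select_tokens(tokens, importance, budget, trade_off=0.5):
    """Greedily pick `budget` token indices, balancing importance and diversity.

    Assumes budget <= number of tokens. Returns the kept indices in sorted order.
    """
    unit = tokens / (np.linalg.norm(tokens, axis=1, keepdims=True) + 1e-12)
    sim = unit @ unit.T                      # pairwise cosine similarity
    selected = [int(np.argmax(importance))]  # start with the most important token
    max_sim = sim[selected[0]].copy()        # similarity of each token to the kept set
    while len(selected) < budget:
        score = importance - trade_off * max_sim  # penalize redundant tokens
        score[selected] = -np.inf                 # never pick a token twice
        nxt = int(np.argmax(score))
        selected.append(nxt)
        max_sim = np.maximum(max_sim, sim[nxt])
    return sorted(selected)

rng = np.random.default_rng(0)
tokens = rng.standard_normal((576, 64))  # hypothetical: 576 visual tokens of dimension 64
importance = rng.random(576)             # hypothetical importance scores
kept = select_tokens(tokens, importance, budget=64)
pruned = tokens[kept]
```

The selection needs no training, only the token features and a score per token. A duplicate of an already kept token receives the full similarity penalty, so a less important but distinct token can be preferred over it.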

Collectively, these contributions establish a unified framework for efficient deep learning across parameter sparsity, optimization dynamics, and representation compression. By systematically addressing efficiency challenges in both training and inference, this work advances the development of scalable, robust, and deployment-ready deep learning systems.

Available for download on Monday, May 31, 2027
