Date of Award

5-2026

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Electrical and Computer Engineering (Holcomb Dept. of)

Committee Chair/Advisor

Fatemeh Afghah

Committee Member

Melissa Crawley Smith

Committee Member

Abolfazl Razi

Committee Member

Xiaolong Ma

Committee Member

Tao Wei

Abstract

Vision-Language Models (VLMs) and Multimodal Large Language Models (MLLMs) have recently emerged as powerful frameworks for learning joint representations across visual and textual modalities. These models enable a wide range of applications, including visual recognition, multimodal reasoning, and visual question answering. However, adapting large pre-trained VLMs to downstream tasks while preserving their strong generalization ability remains a significant challenge, particularly under domain shifts or limited supervision. This dissertation focuses on developing methods for generalizable adaptation of VLMs, aiming to improve robustness, efficiency, and applicability across diverse tasks and environments.

First, this work introduces novel prompt learning strategies for adapting CLIP-style VLMs while maintaining strong out-of-distribution generalization. In particular, we propose two frameworks, Style-Pro and DiSa, which address domain bias and overfitting in prompt learning. Style-Pro incorporates a style-guided prompt learning mechanism that synthesizes diverse style representations to reduce discrepancies between training and unseen domains. DiSa introduces directional saliency-aware regularization that enhances cross-modal alignment and encourages the model to focus on semantically important visual regions, improving robustness in limited-data settings.

Second, this dissertation investigates efficient and generalizable multimodal in-context learning (ICL). To address the high inference cost and instability of demonstration-based prompting in MLLMs, we propose Hyper-ICL, a lightweight framework that reconstructs ICL behavior through attention-level adaptation. Hyper-ICL decomposes the effects of demonstrations within the attention mechanism and introduces query-adaptive modulation together with hyperbolic anchor distillation, enabling compact task representation while preserving multimodal reasoning capability.

Finally, the dissertation demonstrates the practical impact of VLMs in two representative real-world applications. The first is wildfire monitoring, for which a new benchmark dataset, WildFireVQA, is introduced to evaluate multimodal reasoning over synchronized RGB and radiometric thermal aerial imagery. The benchmark enables systematic evaluation of multimodal models on tasks such as fire detection, hotspot localization, and environmental analysis. The second application is industrial visual anomaly detection. To address limitations of existing approaches, the proposed Qwen-AD framework introduces a modular adaptation strategy that uses task-specialized LoRA experts and a dynamic gating mechanism for multi-task anomaly understanding in MLLMs.

Extensive experiments across multiple benchmarks demonstrate that the proposed methods significantly improve generalization, efficiency, and robustness in vision-language learning. Overall, this dissertation advances the development of adaptable multimodal systems capable of operating reliably across diverse domains and real-world scenarios.

Recommended Citation

Alipour Talemi, Niloufar, "Generalizable Adaptation for Vision-Language Models" (2026). All Dissertations. 4236.
https://open.clemson.edu/all_dissertations/4236

Author ORCID Identifier

0009-0000-6881-3671

Included in

Computer Sciences Commons

All Dissertations

Generalizable Adaptation for Vision-Language Models
