Date of Award
5-2026
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Electrical and Computer Engineering (Holcomb Dept. of)
Committee Chair/Advisor
Fatemeh Afghah
Committee Member
Nathan McNeese
Committee Member
Wael Abd Almageed
Committee Member
Tao Wei
Abstract
Despite remarkable progress in deep learning, a major challenge remains: machine learning models often struggle to generalize to unseen domains under distribution shift. In real-world settings, data often differ from the training conditions due to changes in lighting, sensor type, image resolution, and style. These differences can significantly degrade performance, highlighting the need for representations that are both robust and generalizable. This thesis addresses this challenge by developing a set of frameworks for cross-domain generalization in anomaly detection, deepfake detection, and vision-language image recognition.

For anomaly detection, this thesis introduces ROADS, a robust prompt-driven framework for multi-class unified anomaly detection. ROADS addresses two major limitations of existing approaches: interference among anomaly classes and sensitivity to domain shifts. It incorporates hierarchical class-aware prompts to encode class-specific semantics, along with a domain adaptation component that improves robustness under varying conditions. As a result, ROADS improves both anomaly detection and localization performance, especially in out-of-distribution settings. Beyond conventional anomaly detection, this thesis further studies multimodal anomaly understanding with Multimodal Large Language Models (MLLMs). Qwen-AD addresses multi-task industrial anomaly understanding by reducing task interference and improving generalization across multiple anomaly-related tasks. In addition, RADAR improves the robustness of MLLMs for anomaly understanding at inference time, without retraining. It enhances reliability under shifted conditions by estimating uncertainty, strengthening visual grounding, and refining model predictions to reduce language bias and hallucination.
For deepfake detection, this thesis proposes FreqDebias, a framework designed to improve generalization to unseen forgery types. Many deepfake detectors rely on narrow spectral artifacts that do not transfer well across datasets or manipulation methods. FreqDebias reduces this spectral bias through Fo-Mixup, a frequency-based augmentation strategy, and a dual consistency regularization that encourages consistent local and global representations. Together, these yield more robust and generalizable deepfake detection.

For vision-language recognition, this thesis introduces Style-Pro, a style-guided prompt learning framework for vision-language models such as CLIP. It improves generalization by modeling style variation while preserving content and cross-modal alignment. As a result, the model adapts more effectively to unseen domains and classes while maintaining strong zero-shot performance.

Extensive experiments validate the proposed methods across diverse benchmarks. The results consistently demonstrate improved robustness and generalization in anomaly detection, multimodal anomaly understanding, deepfake detection, and vision-language recognition under both in-distribution and out-of-distribution settings. Together, these findings show that robust representation learning, modular adaptation, and inference-time refinement can substantially improve generalization across diverse domains and visual environments.
Recommended Citation
Kashiani, Hossein, "Towards Generalizable Representation Learning Across Domains" (2026). All Dissertations. 4237.
https://open.clemson.edu/all_dissertations/4237
Author ORCID Identifier
0000-0001-8338-9987