Date of Award

12-2025

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Engineering

Committee Chair/Advisor

Dr. Fatemeh Afghah

Committee Member

Dr. Yongkai Wu

Committee Member

Dr. Tao Wei

Abstract

Visual anomaly detection uses computer vision to automatically identify defects or irregularities in images, such as cracks, scratches, or discolorations on manufactured products. Unsupervised visual anomaly detection does this without requiring examples of those defects during training. This unsupervised approach is crucial in industries like manufacturing, automotive, electronics, and pharmaceuticals, where ensuring product quality is essential for safety, reliability, and cost efficiency. For instance, it helps spot flaws in circuit boards, fabrics, or medical pills on production lines, preventing faulty items from reaching consumers. By reducing manual inspection, it saves time and resources: companies see less waste and fewer recalls, workers are relieved of tedious tasks, and end users receive higher-quality products. The field is growing in importance as factories adopt AI-driven automation to handle diverse items at scale, but it still struggles to accurately detect subtle defects that blend into normal appearances. In multi-class settings, where a single system must inspect various product types, as in benchmarks like MVTec-AD (15 categories of objects and textures) and VisA (12 classes with multiple defects per image), challenges include extreme imbalance between normal and defective areas (normal pixels often exceed 95%), defects that camouflage within textures, and difficulty generalizing across categories without losing precision.
This thesis advances a prompt-based image reconstruction system by introducing three key innovations: (1) an enhanced refiner architecture with multi-scale convolutional processing to capture anomalies at different sizes, Transformer attention mechanisms to focus computation on suspicious regions, and diffusion-inspired iterative refinement for progressively sharper boundary detection; (2) Focal Loss integration that dynamically emphasizes sparse hard-to-detect anomalous pixels while downweighting the overwhelming majority of easy normal pixels, addressing extreme pixel-level class imbalance; and (3) cross-modal CLIP prompt enhancement that combines visual examples with textual semantic descriptions to provide class-specific understanding of normal characteristics and common defect types, helping distinguish legitimate variations from actual anomalies. Evaluation on MVTec-AD demonstrates substantial improvements in pixel-level localization precision, with particularly strong gains on texture classes and objects requiring sharp boundary detection. The enhancements prove effective for clean single-defect scenarios while revealing fundamental challenges in multi-instance detection and distribution shift robustness that define important future research directions.
This work advances practical anomaly detection capabilities for diverse manufacturing applications. Quality control systems in electronics manufacturing can benefit from improved detection of subtle PCB defects and component damage. Textile and material inspection gains from better discrimination between natural variations and actual defects in wood, leather, and fabric surfaces. Food processing quality assurance can leverage enhanced localization for detecting contamination and packaging defects. Pharmaceutical manufacturing benefits from precise identification of pill surface anomalies and capsule deformities.
More broadly, any high-throughput visual inspection scenario requiring automated defect detection with minimal false positives stands to gain from the systematic integration of architectural refinement, imbalance-aware training, and semantic guidance demonstrated in this work.
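As a rough illustration of the Focal Loss idea mentioned in the abstract, the sketch below implements the commonly used binary focal loss over a flat list of per-pixel anomaly probabilities. It is not taken from the thesis: the function name `binary_focal_loss`, the defaults `alpha=0.25` and `gamma=2.0`, and the pure-Python form are all assumptions chosen for readability, and the thesis may use a different formulation or weighting.

```python
import math


def binary_focal_loss(probs, labels, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss over per-pixel predictions.

    probs  : predicted probability that each pixel is anomalous (0..1)
    labels : ground truth per pixel, 1 = anomalous, 0 = normal
    alpha  : weight on the anomalous class (assumed default, not from the thesis)
    gamma  : focusing parameter; larger values downweight easy pixels more
    """
    total = 0.0
    for p, y in zip(probs, labels):
        # Clamp to avoid log(0).
        p = min(max(p, eps), 1.0 - eps)
        # p_t is the probability assigned to the true class of this pixel.
        p_t = p if y == 1 else 1.0 - p
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t)^gamma shrinks the loss of confidently correct pixels.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


# A confidently correct normal pixel versus a missed anomalous pixel.
easy_normal = binary_focal_loss([0.05], [0])
hard_anomaly = binary_focal_loss([0.10], [1])
print(easy_normal < hard_anomaly)
```

With `gamma=0` and `alpha=0.5` the expression reduces to half the ordinary binary cross-entropy, so the focusing term is the only thing that changes how the many easy normal pixels are weighted relative to the few hard anomalous ones.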

Recommended Citation

McCain, Duncan F., "Cross-Modal Prompting for Multi-Class Visual Anomaly Localization" (2025). All Theses. 4641.
https://open.clemson.edu/all_theses/4641

Included in

Artificial Intelligence and Robotics Commons
