Date of Award
12-2025
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer Engineering
Committee Chair/Advisor
Dr. Fatemeh Afghah
Committee Member
Dr. Yongkai Wu
Committee Member
Dr. Tao Wei
Abstract
Visual anomaly detection is a technology that uses computer vision to automatically identify defects or irregularities in images, such as cracks, scratches, or discolorations on manufactured products. Unsupervised visual anomaly detection does this without needing examples of those defects during the training process. This "unsupervised" approach is crucial in industries like manufacturing, automotive, electronics, and pharmaceuticals, where ensuring product quality is essential for safety, reliability, and cost efficiency. For instance, it helps spot flaws in circuit boards, fabrics, or medical pills during production lines, preventing faulty items from reaching consumers. By reducing manual inspections, it saves time and resources, benefiting companies by minimizing waste and recalls, workers by automating tedious tasks, and end-users through higher-quality products. This field is growing in importance as factories adopt AI-driven automation to handle diverse items at scale, but it faces hurdles in accurately detecting subtle defects that blend into normal appearances. In multi-class settings, where a single system must inspect various product types, as seen in benchmarks like MVTec-AD (15 categories of objects and textures) and VisA (12 classes with multiple defects per image), challenges include extreme imbalance between normal and defective areas (normal pixels often exceed 95%), defects that camouflage within textures, and difficulty generalizing across categories without losing precision.
This thesis advances a prompt-based image reconstruction system by introducing three key innovations: (1) an enhanced refiner architecture with multi-scale convolutional processing to capture anomalies at different sizes, Transformer attention mechanisms to focus computation on suspicious regions, and diffusion-inspired iterative refinement for progressively sharper boundary detection; (2) Focal Loss integration that dynamically emphasizes sparse hard-to-detect anomalous pixels while downweighting the overwhelming majority of easy normal pixels, addressing extreme pixel-level class imbalance; and (3) cross-modal CLIP prompt enhancement that combines visual examples with textual semantic descriptions to provide class-specific understanding of normal characteristics and common defect types, helping distinguish legitimate variations from actual anomalies. Evaluation on MVTec-AD demonstrates substantial improvements in pixel-level localization precision, with particularly strong gains on texture classes and objects requiring sharp boundary detection. The enhancements prove effective for clean single-defect scenarios while revealing fundamental challenges in multi-instance detection and distribution shift robustness that define important future research directions. This work advances practical anomaly detection capabilities for diverse manufacturing applications. Quality control systems in electronics manufacturing can benefit from improved detection of subtle PCB defects and component damage. Textile and material inspection gains from better discrimination between natural variations and actual defects in wood, leather, and fabric surfaces. Food processing quality assurance can leverage enhanced localization for detecting contamination and packaging defects. Pharmaceutical manufacturing benefits from precise identification of pill surface anomalies and capsule deformities.
More broadly, any high-throughput visual inspection scenario requiring automated defect detection with minimal false positives stands to gain from the systematic integration of architectural refinement, imbalance-aware training, and semantic guidance demonstrated in this work.
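As an illustration of the imbalance-aware training idea mentioned in the abstract, the following is a minimal sketch of a per-pixel binary focal loss in its commonly published form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). It is not taken from the thesis: the function name `binary_focal_loss`, the flattened-list input format, and the default values `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration, and the thesis may use a different variant or different hyperparameters.

```python
import math


def binary_focal_loss(probs, targets, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss over a flattened list of pixels.

    probs   -- predicted anomaly probabilities, one per pixel, in [0, 1]
    targets -- ground-truth labels, 1 for anomalous pixels and 0 for normal
    gamma   -- focusing parameter (assumed value); larger values shrink the
               contribution of well-classified pixels more aggressively
    alpha   -- weight on the anomalous class (assumed value); normal pixels
               receive weight 1 - alpha
    """
    if len(probs) != len(targets):
        raise ValueError("probs and targets must have the same length")
    if not probs:
        raise ValueError("at least one pixel is required")

    total = 0.0
    for p, y in zip(probs, targets):
        # Clamp to avoid log(0).
        p = min(max(p, eps), 1.0 - eps)
        # p_t is the probability the model assigns to the true class.
        p_t = p if y == 1 else 1.0 - p
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t)^gamma is near 0 for easy pixels and near 1 for hard ones.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


if __name__ == "__main__":
    # One confidently correct normal pixel versus one missed anomalous pixel.
    easy_normal = binary_focal_loss([0.05], [0])
    hard_anomaly = binary_focal_loss([0.10], [1])
    print(f"easy normal pixel:  {easy_normal:.6f}")
    print(f"hard anomaly pixel: {hard_anomaly:.6f}")
```

With `gamma=0` and `alpha=0.5` this sketch reduces to half of the ordinary binary cross-entropy, which makes the role of the modulating factor easy to see: raising `gamma` leaves the loss on poorly classified pixels almost unchanged while driving the loss on confidently correct normal pixels toward zero.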
Recommended Citation
McCain, Duncan F., "Cross-Modal Prompting for Multi-Class Visual Anomaly Localization" (2025). All Theses. 4641.
https://open.clemson.edu/all_theses/4641