Date of Award
5-2026
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
Civil Engineering
Committee Chair/Advisor
Mashrur Chowdhury
Committee Member
Yongkai Wu
Committee Member
Chao Fan
Abstract
Autonomous vehicle (AV) systems typically employ modular architectures in which discrete components handle separate tasks such as perception, computation, and path planning. While flexible, this approach allows errors to propagate and compound across the pipeline, and many AI systems offer little transparency into their internal decision-making. These limitations are particularly concerning in safety-critical domains, where failures can be fatal. Vision Language Models (VLMs) have emerged as a promising alternative because they support end-to-end implementations that avoid compounding pipeline errors and can provide natural language explanations of their outputs. Despite these advantages, prior research has shown that both computer vision systems and large language models can exhibit demographic disparities, producing outputs that vary systematically with the characteristics of the individuals represented in the input. Whether VLMs inherit or introduce similar biases in safety-critical pedestrian detection tasks remains largely unexplored. To address this gap, this study benchmarks five state-of-the-art VLMs across two pedestrian detection scenarios using the CityPersons dataset augmented with demographic labels, and evaluates model performance across age and gender groups. The primary contribution is a systematic assessment of the demographic robustness of leading VLMs in pedestrian detection, providing insights critical to the fair and safe deployment of AI-driven AV systems.
Recommended Citation
Thomas, Ostonya K., "Robustness of Vision Language Models for Pedestrian Detection Tasks" (2026). All Theses. 4730.
https://open.clemson.edu/all_theses/4730
Author ORCID Identifier
0009-0007-6893-0852