Date of Award
5-2026
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
Electrical and Computer Engineering (Holcomb Dept. of)
Committee Chair/Advisor
Tao Wei
Committee Member
Xiaoyong Yuan
Committee Member
Fatemeh Afghah
Abstract
This thesis presents a system that allows large AI models to run directly on personal devices instead of relying on cloud servers. Recent advances in artificial intelligence, especially large language models (LLMs), have made it possible to build powerful applications such as chatbots, coding assistants, and intelligent agents. However, most of these systems run in the cloud, which raises concerns about privacy, latency, and cost.
To address these issues, this work develops a local AI serving system that runs efficiently on a specialized hardware component called a Neural Processing Unit (NPU). The system provides a unified interface that supports multiple types of tasks, including text generation, image understanding, speech recognition, and embedding-based search. It also supports advanced features such as streaming responses and tool calling, which are essential for building modern AI agents.
The system is designed to be compatible with widely used APIs, allowing existing applications to use it without modification. Experimental results show that the system works correctly across different tasks and improves efficiency through techniques such as prompt caching.
Overall, this work demonstrates that it is possible to run advanced AI systems locally in a practical and efficient way, enabling faster, more private, and more flexible AI applications.
Recommended Citation
Ni, Zhiheng, "Implementation of a Local LLM Serving System for Agentic AI" (2026). All Theses. 4796.
https://open.clemson.edu/all_theses/4796