Date of Award
5-2024
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
School of Computing
Committee Chair/Advisor
Dr. Amy Apon
Committee Member
Dr. Nina Hubig
Committee Member
Dr. Brian Dean
Abstract
Medical coding is the process by which standardized medical codes are assigned to patient health records. This is a complex and challenging task that typically requires an expert human coder to review health records and assign codes from a classification system based on a standard set of rules. Considering the downstream use of these codes in statistical analysis, billing, and patient care, improving the accuracy and efficiency of the medical coding process through automation could have a far-reaching impact on the healthcare domain. Since health records typically consist of a large proportion of free-text documents, this problem has traditionally been approached as a natural language processing (NLP) task. While machine learning-based methods have recently gained popularity for this task, they tend to struggle with codes that are assigned less frequently, for which little or no training data exists. In this thesis, we use NLP++, an open-source programming language for natural language processing, and its associated integrated development environment to design and build an automated system that assigns International Classification of Diseases (ICD) codes to discharge summaries and functions in the absence of labeled training data. We evaluate our system using the MIMIC-III dataset and compare our results to supervised approaches. Results show that for datasets where labels are sparse, our approach matches state-of-the-art machine learning approaches. It is somewhat less effective for densely labeled datasets, but provides additional support for explainability and adaptability. Overall, our approach presents an effective pathway for code assignment in clinical documents by providing both competitive performance and enhanced explainability.
Recommended Citation
Williamson, Ashton, "Low-Resource ICD Coding of Discharge Summaries" (2024). All Theses. 4310.
https://open.clemson.edu/all_theses/4310