Date of Award
5-2026
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Computer Engineering
Committee Chair/Advisor
Melissa Smith
Committee Member
Alex Feltus
Committee Member
Jon Calhoun
Committee Member
Fatemeh Afghah
Abstract
Breast cancer is one of the most common and deadly cancers in the world. Although doctors can use gene expression data to better understand different types of breast cancer, it is still difficult to identify the most important genes and to track how the disease changes over time. This is partly because gene expression data are very large and complex.
This dissertation develops a computer-based framework to study breast cancer using gene expression data. The work focuses on three goals: creating realistic synthetic gene data, identifying important genes linked to cancer, and modeling how breast cancer changes from normal tissue to more advanced stages. To do this, I developed a new artificial intelligence pipeline called GEMDiff. This system learns patterns in gene expression data and uses them to generate new samples, compare tumor and normal states, and predict disease progression.
The results showed that smaller, carefully selected gene sets were more effective than large random sets at detecting meaningful biological patterns. Based on this finding, the study used a focused set of genes to identify important cancer-related signals. The model also captured stage-by-stage changes in breast cancer and linked these changes to biological processes already known to play important roles in cancer, including immune signaling, cell death, and changes in the tumor microenvironment.
Overall, this research shows that artificial intelligence can help reveal meaningful patterns in breast cancer gene expression data and may support future work in biomarker discovery, disease progression analysis, and precision medicine.
Recommended Citation
Ai, Xusheng, "Generative AI of Breast Cancer Progression in Gene Expression Space" (2026). All Dissertations. 4215.
https://open.clemson.edu/all_dissertations/4215
Author ORCID Identifier
0009-0009-5155-8550