Date of Award
12-2016
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Legacy Department
Genetics
Committee Member
Dr. Liangjiang Wang, Committee Chair
Committee Member
Dr. Julia Frugoli
Committee Member
Dr. Hong Luo
Committee Member
Dr. Anand Srivastava
Abstract
In this study, we sought to leverage knowledge from well-characterized protein-coding genes to characterize the lesser-known long non-coding RNA (lncRNA) genes, using computational methods to find functional annotations and disease associations. Functional genome annotation is an essential step toward a systems-level view of the human genome. With this knowledge, we can gain a deeper understanding of how humans develop and function, and a better understanding of human disease. LncRNAs are transcripts longer than 200 nucleotides that do not code for proteins. LncRNAs have been found to regulate development, tissue and cell differentiation, and organ formation. Their dysregulation has been linked to several diseases, including autism spectrum disorder (ASD) and cancer. While a great deal of research has been dedicated to protein-coding genes, the relatively recently discovered lncRNA genes remain largely uncharacterized. LncRNA function is tied closely to when and where the lncRNAs are expressed. Co-expression network analysis offers a means of functionally annotating uncharacterized genes through a "guilt by association" approach. We constructed two co-expression networks using known disease-associated protein-coding genes and lncRNA genes. Through clustering of the networks, gene set enrichment analysis, and centrality measures, we found enrichment for disease associations and functions and identified high-confidence lncRNA disease gene targets. We present a novel approach to identifying disease state associations by demonstrating that genes associated with the same disease states share patterns that can be discerned from transcriptomes of healthy tissues. Using a machine learning algorithm, we built a model to classify ASD versus non-ASD genes using their expression profiles from healthy developing human brain tissues.
Feature selection during the model-building process also identified critical temporospatial points for the determination of ASD genes. We constructed a webserver tool for the prioritization of genes for ASD association. The webserver tool has a database containing prioritization and co-expression information for nearly every gene in the human genome.
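The "guilt by association" idea behind co-expression network analysis can be illustrated with a minimal sketch. This is not the dissertation's pipeline: the gene names, expression values, the 0.9 correlation threshold, and the neighbour-fraction score are all hypothetical choices made for illustration, and the clustering, gene set enrichment, and centrality steps mentioned in the abstract are omitted. The sketch links genes whose expression profiles are strongly correlated, then scores each unannotated gene by the share of its network neighbours that are known disease genes.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def build_network(profiles, threshold=0.9):
    """Link two genes when |r| meets the threshold (0.9 is an assumed cutoff)."""
    network = {gene: set() for gene in profiles}
    for g1, g2 in combinations(profiles, 2):
        if abs(pearson(profiles[g1], profiles[g2])) >= threshold:
            network[g1].add(g2)
            network[g2].add(g1)
    return network


def rank_candidates(network, known_disease_genes):
    """Score each unannotated gene by the fraction of its neighbours
    that are known disease genes, highest first."""
    scores = {}
    for gene, neighbours in network.items():
        if gene in known_disease_genes or not neighbours:
            continue  # skip already-annotated genes and isolated genes
        scores[gene] = len(neighbours & known_disease_genes) / len(neighbours)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: expression of four genes across five samples.
profiles = {
    "PC_GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "PC_GENE_B": [2.1, 3.9, 6.2, 8.0, 9.8],
    "LNC_X": [0.5, 1.1, 1.4, 2.1, 2.5],
    "LNC_Y": [5.0, 1.0, 4.0, 2.0, 3.0],
}
known = {"PC_GENE_A", "PC_GENE_B"}  # assumed disease-associated coding genes

network = build_network(profiles)
ranking = rank_candidates(network, known)
print(ranking)
```

In this toy example, the hypothetical lncRNA `LNC_X` tracks both known disease genes and is ranked as a candidate, while `LNC_Y` has no strongly correlated partners and receives no score. A real analysis would work on genome-scale expression data and would need multiple-testing control and a principled choice of threshold.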
Recommended Citation
Cogill, Steven, "Functional Analysis of Human Long Non-coding RNAs and Their Associations with Diseases" (2016). All Dissertations. 1850.
https://open.clemson.edu/all_dissertations/1850
File A-1
Additional file A-2.csv (785 kB)
File A-2
Additional file A-3.csv (261 kB)
File A-3
Additional file A-4.csv (11747 kB)
File A-4
Additional file A-5.csv (625 kB)
File A-5