Date of Award
5-2018
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Genetics and Biochemistry
Committee Member
Dr. Liangjiang Wang, Committee Chair
Committee Member
Dr. Hong Luo
Committee Member
Dr. Anand Srivastava
Committee Member
Dr. Rajandeep Sekhon
Abstract
Life may have begun in an RNA world, a hypothesis supported by the increasingly vital roles that RNA has been shown to perform in biological systems. To understand how the genome encodes life, one must look to the transcriptome, the set of all RNA molecules in a cell. The transcriptome reveals which RNA transcripts are expressed and when, and this orchestrated network of gene expression is responsible for multicellular development. In humans, most genes encode noncoding RNAs, meaning that their transcripts are not translated into proteins. The largest class of noncoding genes is the long noncoding RNAs (lncRNAs), RNA transcripts longer than 200 nucleotides that lack protein-coding capacity. Some lncRNAs have been shown to be key regulators; however, most lncRNAs are uncharacterized. Therefore, we developed genomic data mining methodologies for lncRNA functional annotation. Many lncRNAs are brain-specific, and their dysregulation is suspected to be involved in neurodevelopmental disorders. Two prevalent brain disorders are intellectual disability (ID) and autism spectrum disorder (ASD), which are genetically heterogeneous and have unidentified genetic risk factors. In this study, we created brain developmental gene coexpression networks for ID and ASD to identify lncRNAs associated with known disease genes. We found lncRNAs highly coexpressed with ID genes which harbored ID-associated copy number variants (CNVs). To find ASD-associated lncRNAs, we identified lncRNAs differentially expressed in the ASD brain and then refined these candidates by filtering for associations with ASD risk genes in a human brain developmental coexpression network. These candidate ASD-associated lncRNAs were linked to the synaptic transmission and immune response pathways and resided within ASD-associated CNVs at a high frequency. The mechanism by which lncRNAs function is partly determined by functional motifs in the RNA transcript sequence.
To identify lncRNA motifs, we developed a genetic algorithm capable of finding long motifs and used it to find a motif associated with lncRNA nuclear localization. LncRNA functions are compartmentalized within the cell; therefore, knowledge of lncRNA subcellular localization provides insight into their biological function. We developed a deep learning model that predicts lncRNA subcellular localization from lncRNA transcript sequences. This model obtained high prediction accuracy on lncRNAs with known localizations, suggesting that sequence motifs are involved in subcellular localization. In summary, we developed genomic data mining methods for the functional characterization of lncRNAs based on their expression patterns and transcript sequences.
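The coexpression filtering described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's actual pipeline: the function name `coexpressed_lncrnas`, the gene labels, the toy expression values, and the Pearson cutoff of 0.8 are all assumptions chosen for illustration, and only positive correlations are counted here.

```python
import numpy as np


def coexpressed_lncrnas(expr, gene_ids, lncrna_ids, disease_ids, r_min=0.8):
    """Map each lncRNA to the disease genes it is positively coexpressed with.

    expr        : 2-D array, one row per gene, one column per sample
    gene_ids    : row labels for expr, in order
    lncrna_ids  : labels of the lncRNA rows to test
    disease_ids : labels of the known disease-gene rows
    r_min       : hypothetical Pearson correlation cutoff (an assumption)
    """
    row = {gene: i for i, gene in enumerate(gene_ids)}
    corr = np.corrcoef(expr)  # gene-by-gene Pearson correlation matrix
    hits = {}
    for lnc in lncrna_ids:
        partners = [d for d in disease_ids if corr[row[lnc], row[d]] >= r_min]
        if partners:
            hits[lnc] = partners
    return hits


# Toy expression matrix (hypothetical values): 4 genes across 6 samples.
gene_ids = ["GENE_A", "GENE_B", "LNC_1", "LNC_2"]
expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],  # GENE_A: known disease gene
    [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],  # GENE_B: known disease gene, opposite trend
    [1.1, 2.0, 2.9, 4.2, 5.1, 5.8],  # LNC_1: closely tracks GENE_A
    [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],  # LNC_2: no clear relationship
])

hits = coexpressed_lncrnas(
    expr,
    gene_ids,
    lncrna_ids=["LNC_1", "LNC_2"],
    disease_ids=["GENE_A", "GENE_B"],
)
print(hits)
```

In this toy example only `LNC_1` passes the cutoff, and only with `GENE_A`. A real analysis would work on normalized expression data from many samples and would likely add significance testing and multiple-testing correction, none of which is shown here.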
Recommended Citation
Gudenas, Brian L., "Genomic Data Mining for Functional Annotation of Human Long Noncoding RNAs" (2018). All Dissertations. 2146.
https://open.clemson.edu/all_dissertations/2146