Date of Award
12-2016
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Legacy Department
Computer Science
Committee Member
Dr. Brian Dean, Committee Chair
Committee Member
Dr. Wayne Goddard
Committee Member
Dr. Ilya Safro
Committee Member
Dr. Feng Luo
Abstract
Large “correlationâ€-type networks have applications in machine learning, data mining and analysis of large biological networks. These networks contain short length-k (on the order of thousands) feature vectors for every node in the network that describe the characteristic or behavior of that node. Edges between nodes are expressed as correlations, partial correlations or coherence between the corresponding feature vectors. If the number of nodes n is too large (on the order of millions), then the dense correlation network with size n×n is computationally infeasible even to explicitly store in memory, let alone to use for computation of standard network analytics like clustering or centrality. In this dissertation, we provide an algorithmic framework to analyze large correlation networks efficiently in an implicit fashion, without writing down every entry of the network. Our framework allows us to compute various network analytics like clustering and centrality using spectral techniques (based on linear algebra). It can accommodate several more sophisticated edge weight measurements beyond just correlation, including partial correlation and coherence. Since correlation networks can contain negative edge weights, one of the main innovations of our work is a collection of transformations that re-map node feature vectors so as to produce a non-negative network in which edge weights are approximately the squares of the original weights. We also develop a number of other useful algorithmic tools (e.g., implicit calculation of shrinkage estimators) that play an important role along our implicit computational pipeline.
Recommended Citation
Mohan, Raghuveer, "Efficient Analytics for Large Implicit Correlation Networks" (2016). All Dissertations. 1826.
https://open.clemson.edu/all_dissertations/1826