Date of Award
12-2011
Document Type
Thesis
Degree Name
Master of Science (MS)
Legacy Department
School of Computing
Committee Chair/Advisor
Luo, Feng
Committee Member
Apon, Amy
Committee Member
Blenda, Anna
Abstract
Bioinformatics applications often involve many computational components and massive data sets, which makes them difficult to deploy on a single machine. In this thesis, we designed a data-intensive computing platform for bioinformatics applications using virtualization technologies and high-performance computing (HPC) infrastructures. The platform follows a multi-tier architecture that seamlessly integrates the web user interface (presentation tier), the scientific workflow (logic tier), and the computing infrastructure (data/computing tier). We demonstrated the platform on two bioinformatics projects. First, we redesigned and deployed the Cotton Marker Database (CMD) (http://www.cottonmarker.org), a centralized web portal for the cotton research community, using a Xen-based virtualization solution. To achieve high performance and scalability for the CMD web tools, we hosted CMD's large protein databases and computationally intensive applications on the Palmetto HPC cluster at Clemson University. Biologists can use both the bioinformatics applications and the HPC resources through the CMD website without a background in computer science. Second, we developed a web tool, the Glycan Array QSAR Tool (http://bci.clemson.edu/tools/glycan_array), to analyze glycan array data. Its user interface was built on top of the Drupal content management system (CMS), and its computational part was implemented using the MATLAB Compiler Runtime (MCR). Our platform enables the rapid deployment of data-intensive bioinformatics applications in HPC and virtualized environments with a user-friendly web interface, and it bridges the gap between biological scientists and cyberinfrastructure.
Recommended Citation
Xuan, Pengfei, "DATA-INTENSIVE COMPUTING FOR BIOINFORMATICS USING VIRTUALIZATION TECHNOLOGIES AND HPC INFRASTRUCTURES" (2011). All Theses. 1261.
https://open.clemson.edu/all_theses/1261