Fast parallel Markov clustering in bioinformatics using massively parallel graphics processing unit computing

Bustamam, Alhadi, Burrage, Kevin and Hamilton, Nicholas A. (2010). Fast parallel Markov clustering in bioinformatics using massively parallel graphics processing unit computing. In: Joint PDMC+HiBi Workshop 2010: 2010 Ninth International Workshop on Parallel and Distributed Methods in Verification; and Second International Workshop on High Performance Computational Systems Biology. Proceedings. Joint PDMC/HiBi Workshop 2010: International Workshop on High Performance Computational Systems Biology (2nd, HiBi, 2010); International Workshop on Parallel and Distributed Methods in Verification (9th, PDMC, 2010), Enschede, Netherlands, (116-125). 30 September-1 October 2010. doi:10.1109/PDMC-HiBi.2010.23

Attached Files (Some files may be inaccessible until you login with your UQ eSpace credentials)
Name Description MIMEType Size Downloads

Author Bustamam, Alhadi
Burrage, Kevin
Hamilton, Nicholas A.
Title of paper Fast parallel Markov clustering in bioinformatics using massively parallel graphics processing unit computing
Conference name Joint PDMC/HiBi Workshop 2010: International Workshop on High Performance Computational Systems Biology (2nd, HiBi, 2010); International Workshop on Parallel and Distributed Methods in Verification (9th, PDMC, 2010)
Conference location Enschede, Netherlands
Conference dates 30 September-1 October 2010
Proceedings title Joint PDMC+HiBi Workshop 2010: 2010 Ninth International Workshop on Parallel and Distributed Methods in Verification; and Second International Workshop on High Performance Computational Systems Biology. Proceedings
Journal name Proceedings of the 9th Int. Workshop on Parallel and Distributed Methods in Verification, PDMC 2010 - Joint with the 2nd Int. Workshop on High Performance Computational Systems Biology, HiBi 2010
Place of Publication Piscataway, NJ, U.S.A.
Publisher Institute of Electrical and Electronic Engineers (IEEE)
Publication Year 2010
Sub-type Fully published paper
DOI 10.1109/PDMC-HiBi.2010.23
ISBN 9780769542652
0769542654
Start page 116
End page 125
Total pages 10
Collection year 2011
Language eng
Formatted Abstract/Summary
Markov clustering is becoming a key algorithm within bioinformatics for determining clusters in networks. For instance, clustering protein interaction networks is helping find genes implicated in diseases such as cancer. However, with fast sequencing and other technologies generating vast amounts of data on biological networks, performance and scalability issues are becoming a critical limiting factor in applications. Meanwhile, Graphics Processing (GPU) computing, which uses a massively parallel computing environment in the GPU card, is becoming a very powerful, efficient and low cost option to achieve substantial performance gains over CPU approaches. This paper introduces a very fast Markov clustering algorithm (MCL) based on massive parallel computing in GPU. We use the Compute Unified Device Architecture (CUDA) to allow the GPU to perform parallel sparse matrix-matrix computations and parallel sparse Markov matrix normalizations, which are at the heart of the clustering algorithm. The key to optimizing our CUDA Markov Clustering (CUDAMCL) was utilising ELLACK-R sparse data format to allow the effective and fine-grain massively parallel processing to cope with the sparse nature of interaction networks datasets in bioinformatics applications. CUDA also allows us to use on-chip memory on the GPU efficiently, to lower the latency time thus circumventing a major issue in other parallel computing environments, such as Message Passing Interface (MPI).

Here we describe the GPU algorithm and its application to several real world problems as well as to artificial datasets. We find that the principle factor causing variation in performance of the GPU approach is the relative sparseness of networks. Comparing GPU computation times against a modern quad-core CPU on the published (relatively sparse) standard BIOGRID protein interaction networks with 5156 and 23175 nodes, speed factors of 4 times and 9 were obtained, respectively. On the Human Protein Reference Database, the speed of clustering of 19599 proteins was improved by a factor of 7 by the GPU algorithm. However, on artificially generated densely connected networks with 1600 to 4800 nodes, speedups by a factor in the range 40 to 120 times were readily obtained. As the results show, in all cases the GPU implementation is significantly faster than the original MCL running on CPU. Such approaches are allowing large-scale parallel computation on off-the-shelf desktop machines that were previously only possible on super-computing architectures, and have the potential to significantly change the way bioinformaticians and biologists compute and interact with their data.
Keyword Bioinformatics
Parallel Markov clustering
Massively-parallel Computers
Q-Index Code E1
Q-Index Status Confirmed Code
Institutional Status UQ
Additional Notes Presented during the Session "HiBi Parallel Computing in Systems Biology (1)".

 
Available Versions of this Record
 
Versions
Version Filter Type
Citation counts: Scopus Citation Count Cited 3 times in Scopus Article | Citations
Google Scholar Search Google Scholar
Created: Thu, 24 Feb 2011, 10:33:40 EST by Susan Allen on behalf of Institute for Molecular Bioscience