Accelerating the BLAST algorithm via parrallel computing

Calder, Mark. (2005). Accelerating the BLAST algorithm via parrallel computing MPhil Thesis, School of Information Technology and Electrical Engineering, The University of Queensland.

       
Attached Files (Some files may be inaccessible until you login with your UQ eSpace credentials)
Name Description MIMEType Size Downloads
THE18388.pdf Full text application/pdf 9.00MB 7
Author Calder, Mark.
Thesis Title Accelerating the BLAST algorithm via parrallel computing
School, Centre or Institute School of Information Technology and Electrical Engineering
Institution The University of Queensland
Publication date 2005
Thesis type MPhil Thesis
Supervisor Dr Adam Postula
Total pages 50
Language eng
Subjects 08 Information and Computing Sciences
290901 Electrical Engineering
Formatted abstract

BLAST is a fast heuristic method for biological sequence and database comparison. However with databases growing faster than Moore's law a scalable, parallel method is needed to reduce the execution time. The system presented can parallel process BLASTP protein jobs on a cluster in two ways. Either with a subsectioned query and whole database, when the whole database can fit into the memory of each node or with a subsectioned database and query with the single output file reconstructed in parallel. The query and database strings themselves weren't cut up only the file of multiple protein entries was subsectioned. Much like searching in a book by distributing blocks of sentences to many searchers. The system uses the PVM (Parallel Virtual Machine) library and implements a queuing system which schedules the BLAST protein searches and reconstruction jobs.
The system's scalability is better than NCBI BLAST on a SMP machine and the sub sectioning of queries and databases allows the large memory requirements of these databases to be spread across the combined memory of the cluster. The execution time of the Yeast protein database against the Human protein database went from 3 hours, 51 minutes and 6 seconds on one Intel Pentium 700MHz CPU to 5 minutes and 50 seconds on 60 CPUs a speed up of 40. In comparison the scalability of NCBI BLAST is approximately 5 on a SMP machine. A yeast query on a subsectioned human database yielded a scalability of 27 on 50CPUs. The smallest database is the bacteria E.coli and it has a scalability of 14 on 16 CPUs with a whole database and 9 on 16CPUs with segmented database. Bringing execution time down from 16 minutes and 22 seconds to 1 minute and 10 seconds. The next smallest database Drosophila's execution time went from 6 hours and 5 minutes to 25 minutes and 52 seconds on 16 machines for a scalability of 14 with a segmented database and a scalability of 30 for segmented query.
This project was started in 2000 and at that time I found no BLAST systems with large scalability described in the literature except for SGI's HT-BLAST which only runs on their supercomputers and was limited to BLAST version 1.4 when version 2 was in common use. The literature review of chapter 3 contains the latest developments in the field and many of these systems now have overlapping features with each other and my system.

Keyword BLAST (Electronic resource)
Bioinformatics
Genomics -- Data processing

 
Citation counts: Google Scholar Search Google Scholar
Access Statistics: 213 Abstract Views, 7 File Downloads  -  Detailed Statistics
Created: Fri, 02 Mar 2012, 09:58:34 EST by Sharon Bunce on behalf of Marketing, Outreach and Corporate Services