Consensus σ70 promoter prediction using Hadoop

Hogan, James M., Kelly, Wayne A. and Newell, Felicity S. (2013). Consensus σ70 promoter prediction using Hadoop. In: IEEE Ninth International Conference on e-Science: e-Science 2013 Proceedings. IEEE e-Science 2013: 9th IEEE International Conference on e-Science, Beijing, China, (35-44). 22-25 October, 2013. doi:10.1109/eScience.2013.42

Author Hogan, James M.
Kelly, Wayne A.
Newell, Felicity S.
Title of paper Consensus σ70 promoter prediction using Hadoop
Conference name IEEE e-Science 2013: 9th IEEE International Conference on e-Science
Conference location Beijing, China
Conference dates 22-25 October, 2013
Proceedings title IEEE Ninth International Conference on e-Science: e-Science 2013 Proceedings
Journal name Proceedings - IEEE 9th International Conference on e-Science, e-Science 2013
Place of Publication Piscataway, NJ, United States
Publisher IEEE Computer Society
Publication Year 2013
Year available 2013
Sub-type Fully published paper
DOI 10.1109/eScience.2013.42
ISBN 9780768550831
Start page 35
End page 44
Total pages 10
Language eng
Abstract/Summary MapReduce frameworks such as Hadoop are well suited to handling large sets of data which can be processed separately and independently, with canonical applications in information retrieval and sales record analysis. Rapid advances in sequencing technology have ensured an explosion in the availability of genomic data, with a consequent rise in the importance of large-scale comparative genomics, often involving operations and data relationships which deviate from the classical MapReduce structure. This work examines the application of Hadoop to patterns of this nature, using as our focus a well-established workflow for identifying promoters (binding sites for regulatory proteins) across multiple gene regions and organisms, coupled with the unifying step of assembling these results into a consensus sequence. Our approach demonstrates the utility of Hadoop for problems of this nature, showing how the tyranny of the "dominant decomposition" can be at least partially overcome. It also demonstrates how load balance and the granularity of parallelism can be optimized by pre-processing that splits and reorganizes input files, allowing a wide range of related problems to be brought under the same computational umbrella.
Subjects 1710 Information Systems
Q-Index Code E1
Q-Index Status Confirmed Code
Institutional Status UQ

Citation counts Thomson Reuters Web of Science: cited 0 times
Scopus: cited 1 time
Created: Tue, 04 Mar 2014, 11:52:02 EST by System User on behalf of Institute for Molecular Bioscience