Genomics virtual laboratory: a practical bioinformatics workbench for the cloud

Afgan, Enis, Sloggett, Clare, Goonasekera, Nuwan, Makunin, Igor, Benson, Derek, Crowe, Mark, Gladman, Simon, Kowsar, Yousef, Pheasant, Michael, Horst, Ron and Lonie, Andrew (2015) Genomics virtual laboratory: a practical bioinformatics workbench for the cloud. PLoS One, 10(10). doi:10.1371/journal.pone.0140829


Author Afgan, Enis
Sloggett, Clare
Goonasekera, Nuwan
Makunin, Igor
Benson, Derek
Crowe, Mark
Gladman, Simon
Kowsar, Yousef
Pheasant, Michael
Horst, Ron
Lonie, Andrew
Title Genomics virtual laboratory: a practical bioinformatics workbench for the cloud
Journal name PLoS One
ISSN 1932-6203
Publication date 2015-10-26
Sub-type Article (original research)
DOI 10.1371/journal.pone.0140829
Open Access Status DOI
Volume 10
Issue 10
Total pages 20
Place of publication San Francisco, CA, United States
Publisher Public Library of Science
Collection year 2016
Language eng
Formatted abstract
Background: Analyzing high throughput genomics data is a complex and compute intensive task, generally requiring numerous software tools and large reference data sets, tied together in successive stages of data transformation and visualisation. A computational platform enabling best practice genomics analysis ideally meets a number of requirements, including: a wide range of analysis and visualisation tools, closely linked to large user and reference data sets; workflow platform(s) enabling accessible, reproducible, portable analyses, through a flexible set of interfaces; highly available, scalable computational resources; and flexibility and versatility in the use of these resources to meet demands and expertise of a variety of users. Access to an appropriate computational platform can be a significant barrier to researchers, as establishing such a platform requires a large upfront investment in hardware, experience, and expertise.

Results: We designed and implemented the Genomics Virtual Laboratory (GVL) as a middleware layer of machine images, cloud management tools, and online services that enable researchers to build arbitrarily sized compute clusters on demand, pre-populated with fully configured bioinformatics tools, reference datasets and workflow and visualisation options. The platform is flexible in that users can conduct analyses through web-based (Galaxy, RStudio, IPython Notebook) or command-line interfaces, and add/remove compute nodes and data resources as required. Best-practice tutorials and protocols provide a path from introductory training to practice. The GVL is available on the OpenStack-based Australian Research Cloud (http://nectar.org.au) and the Amazon Web Services cloud. The principles, implementation and build process are designed to be cloud-agnostic.

Conclusions: This paper provides a blueprint for the design and implementation of a cloud-based Genomics Virtual Laboratory. We discuss scope, design considerations and technical and logistical constraints, and explore the value added to the research community through the suite of services and resources provided by our implementation.
Keyword Differential expression analysis
RNA-Seq data
Reproducible research
Sequencing analyses
Data management
Q-Index Code C1
Q-Index Status Provisional Code
Institutional Status UQ

Document type: Journal Article
Collections: Office of the Vice-Chancellor
Official 2016 Collection
Institute for Molecular Bioscience - Publications
 
Created: Sun, 29 Nov 2015, 00:21:49 EST by System User on behalf of Scholarly Communication and Digitisation Service