Workflows in bioinformatics: Meta-analysis and prototype implementation of a workflow generator

Garcia, Alexander Garcia, Thoraval, Samuel, Garcia, Leyla J. and Ragan, Mark A. (2005) Workflows in bioinformatics: Meta-analysis and prototype implementation of a workflow generator. BMC Bioinformatics, 6 : 87.1-87.10.

Author Garcia, Alexander Garcia
Thoraval, Samuel
Garcia, Leyla J.
Ragan, Mark A.
Title Workflows in bioinformatics: Meta-analysis and prototype implementation of a workflow generator
Journal name BMC Bioinformatics
ISSN 1471-2105
Publication date 2005-04-01
Sub-type Article (original research)
DOI 10.1186/1471-2105-6-87
Volume 6
Start page 87.1
End page 87.10
Total pages 10
Place of publication London, United Kingdom
Publisher BioMed Central
Collection year 2005
Language eng
Subject 279999 Biological Sciences not elsewhere classified
280103 Information Storage, Retrieval and Management
289999 Other Information, Computing and Communication Sciences
230199 Mathematics not elsewhere classified
780105 Biological sciences
280102 Information Systems Management
C1
Formatted abstract
Background
Computational methods for problem solving need to interleave information access and algorithm execution in a problem-specific workflow. The structures of these workflows are defined by a scaffold of syntactic, semantic and algebraic objects capable of representing them. Despite the proliferation of GUIs (Graphic User Interfaces) in bioinformatics, only some of them provide workflow capabilities; surprisingly, no meta-analysis of workflow operators and components in bioinformatics has been reported.

Results
We present a set of syntactic components and algebraic operators capable of representing analytical workflows in bioinformatics. Iteration, recursion, the use of conditional statements, and management of suspend/resume tasks have traditionally been implemented on an ad hoc basis and hard-coded; by having these operators properly defined it is possible to use and parameterize them as generic re-usable components. To illustrate how these operations can be orchestrated, we present GPIPE, a prototype graphic pipeline generator for PISE that allows the definition of a pipeline, parameterization of its component methods, and storage of metadata in XML formats. This implementation goes beyond the macro capacities currently in PISE. As the entire analysis protocol is defined in XML, a complete bioinformatic experiment (linked sets of methods, parameters and results) can be reproduced or shared among users.
Availability: http://if-web1.imb.uq.edu.au/Pise/5.a/gpipe.html (interactive), ftp://ftp.pasteur.fr/pub/GenSoft/unix/misc/Pise/ (download).
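
As a rough illustration of the idea in the Results (workflow operators defined once and reused as parameterized components, with the whole protocol stored as XML), the following minimal Python sketch may help. It is not GPIPE or PISE code, and it does not reproduce their XML formats: the class names (Step, Sequence, Iterate, Conditional), the element names and the toy string functions are all hypothetical, chosen only for this example. Recursion and suspend/resume are left out to keep it short.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable

# All names below are hypothetical; none are taken from GPIPE or PISE.

@dataclass
class Step:
    """One method invocation together with its parameters."""
    name: str
    func: Callable[..., object]
    params: dict = field(default_factory=dict)

    def run(self, data):
        return self.func(data, **self.params)

    def to_xml(self):
        el = ET.Element("step", name=self.name)
        for key, value in self.params.items():
            ET.SubElement(el, "param", name=key, value=str(value))
        return el

@dataclass
class Sequence:
    """Run children in order, feeding each output to the next child."""
    children: list

    def run(self, data):
        for child in self.children:
            data = child.run(data)
        return data

    def to_xml(self):
        el = ET.Element("sequence")
        for child in self.children:
            el.append(child.to_xml())
        return el

@dataclass
class Iterate:
    """Apply the same body to every item of a list input."""
    body: object

    def run(self, items):
        return [self.body.run(item) for item in items]

    def to_xml(self):
        el = ET.Element("iterate")
        el.append(self.body.to_xml())
        return el

@dataclass
class Conditional:
    """Choose one of two branches based on a named predicate."""
    name: str
    predicate: Callable[[object], bool]
    then: object
    otherwise: object

    def run(self, data):
        branch = self.then if self.predicate(data) else self.otherwise
        return branch.run(data)

    def to_xml(self):
        el = ET.Element("conditional", test=self.name)
        ET.SubElement(el, "then").append(self.then.to_xml())
        ET.SubElement(el, "else").append(self.otherwise.to_xml())
        return el

# Toy "analysis methods" standing in for real bioinformatics tools.
def upper(seq):
    return seq.upper()

def trim(seq, length):
    return seq[:length]

def reverse(seq):
    return seq[::-1]

pipeline = Iterate(Sequence([
    Step("upper", upper),
    Conditional("longer_than_4", lambda s: len(s) > 4,
                Step("trim", trim, {"length": 4}),
                Step("reverse", reverse)),
]))

results = pipeline.run(["acgtac", "ttg"])
xml_text = ET.tostring(pipeline.to_xml(), encoding="unicode")
```

Because each operator carries both a run method and an XML description, the same object tree can be executed and written out as a shareable protocol description, which is the property the abstract attributes to defining the analysis in XML.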

Conclusion

From our meta-analysis we have identified syntactic structures and algebraic operators common to many workflows in bioinformatics. The workflow components and algebraic operators can be assimilated into re-usable software components. GPIPE, a prototype implementation of this framework, provides a GUI builder to facilitate the generation of workflows and integration of heterogeneous analytical tools.
Keyword Workflow capabilities
Workflow operators
Bioinformatics
Syntactic components
Algebraic operators
Analytical workflows
Meta-analysis
Q-Index Code C1
Q-Index Status Provisional Code
Institutional Status UQ
Additional Notes Article number 87.

 
Citation counts: Thomson Reuters Web of Science Citation Count: Cited 16 times
Scopus Citation Count: Cited 22 times
Access Statistics: 407 Abstract Views, 131 File Downloads
Created: Mon, 22 Aug 2005, 10:00:00 EST by Cheong Xin Chan on behalf of School of Engineering