Pinstripe: a suite of programs for integrating transcriptomic and proteomic datasets identifies novel proteins and improves differentiation of protein-coding and non-coding genes

Gascoigne, Dennis K., Cheetham, Seth W., Cattenoz, Pierre B., Clark, Michael B., Amaral, Paulo P., Taft, Ryan J., Wilhelm, Dagmar, Dinger, Marcel E. and Mattick, John S. (2012) Pinstripe: a suite of programs for integrating transcriptomic and proteomic datasets identifies novel proteins and improves differentiation of protein-coding and non-coding genes. Bioinformatics, 28 23: 3042-3050. doi:10.1093/bioinformatics/bts582


Author Gascoigne, Dennis K.
Cheetham, Seth W.
Cattenoz, Pierre B.
Clark, Michael B.
Amaral, Paulo P.
Taft, Ryan J.
Wilhelm, Dagmar
Dinger, Marcel E.
Mattick, John S.
Title Pinstripe: a suite of programs for integrating transcriptomic and proteomic datasets identifies novel proteins and improves differentiation of protein-coding and non-coding genes
Journal name Bioinformatics
ISSN 1367-4803
1367-4811
Publication date 2012-10-07
Sub-type Article (original research)
DOI 10.1093/bioinformatics/bts582
Open Access Status DOI
Volume 28
Issue 23
Start page 3042
End page 3050
Total pages 9
Place of publication Oxford, United Kingdom
Publisher Oxford University Press
Collection year 2013
Language eng
Formatted abstract
Motivation: Comparing transcriptomic data with proteomic data to identify protein-coding sequences is a long-standing challenge in molecular biology, one exacerbated by the increasing size of high-throughput datasets. To address this challenge, and thereby to improve the quality of genome annotation and understanding of genome biology, we have developed an integrated suite of programs, called Pinstripe. We demonstrate its application, utility and discovery power using transcriptomic and proteomic data from publicly available datasets.

Results: To demonstrate the efficacy of Pinstripe for large-scale analysis, we used Pinstripe’s reverse peptide mapping pipeline to map peptides from the EBI PRIDE peptide database onto a transcript library comprising de novo assembled transcriptomes from the human Illumina Body Atlas (IBA2) and Gencode v10 gene annotations. This analysis identified 736 canonical ORFs, positioned outside any known CDS, that are each supported by three or more PRIDE peptide fragments. Because the PRIDE database is unfiltered and the probability of false discovery is therefore high, we further refined this list using independent evidence of translation: the presence of a Kozak sequence or functional domains, synonymous/non-synonymous substitution ratios, and ORF length. Using this integrative approach, we observed evidence of translation from a previously unknown let7e primary transcript, the archetypical lncRNA H19, and a homolog of RD3. Conversely, by excluding transcripts with mapped peptides or significant ORFs (>80 codons), we identified 32,187 loci with RNAs longer than 2000 nt that are unlikely to encode proteins.
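The two filters summarized in the Results (peptide-supported ORFs, and long transcripts with neither mapped peptides nor significant ORFs) can be illustrated with a minimal Python sketch. It is not Pinstripe's implementation, which is written in C#; the function names `find_orfs`, `supported_orfs` and `likely_noncoding` are hypothetical. The sketch assumes forward-strand transcripts, ORFs running from an ATG to the next in-frame stop codon, and exact substring matching of peptides against translated sequence. It omits the known-CDS overlap check and the Kozak, domain and substitution-ratio evidence; only the thresholds (three peptides, 80 codons, 2000 nt) come from the abstract.

```python
# Hypothetical sketch, NOT Pinstripe's implementation (Pinstripe is written in C#).
# Assumes forward-strand transcripts, ATG-to-stop ORFs and exact peptide matches.

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq):
    """Translate the complete codons of seq; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def find_orfs(seq):
    """Return (start, end, protein) for ORFs in the three forward frames.

    Each ORF runs from an ATG to the next in-frame stop codon (end includes
    the stop); ATGs nested inside an open ORF are not reported separately.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and CODON_TABLE.get(codon) == "*":
                orfs.append((start, i + 3, translate(seq[start:i])))
                start = None
    return orfs


def supported_orfs(seq, peptides, min_peptides=3):
    """Keep ORFs whose protein contains at least min_peptides distinct peptides."""
    unique = set(peptides)
    return [
        orf for orf in find_orfs(seq)
        if sum(1 for p in unique if p in orf[2]) >= min_peptides
    ]


def likely_noncoding(seq, peptides, min_nt=2000, max_codons=80):
    """Flag a transcript longer than min_nt that has no ORF with more than
    max_codons sense codons and no peptide matching any forward-frame translation."""
    if len(seq) <= min_nt:
        return False
    if any((end - start) // 3 - 1 > max_codons for start, end, _ in find_orfs(seq)):
        return False
    frames = [translate(seq[f:]) for f in range(3)]
    return not any(p in frame for p in peptides for frame in frames)


if __name__ == "__main__":
    transcript = "ATGGCAAAAGAATAA"  # translates to M A K E, then a stop codon
    print(supported_orfs(transcript, ["MAK", "AKE", "KE", "QQQ"]))  # [(0, 15, 'MAKE')]
    print(likely_noncoding("A" * 2500, ["MAK"]))  # True
```

In the example, three distinct peptides fall within the same short ORF, so it passes the three-peptide threshold, while the 2500-nt poly(A) transcript has no ATG-initiated ORF and no matching peptide. A realistic analysis would also need reverse-strand handling and the independent evidence filters described in the Results.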

Availability and Implementation: Pinstripe (pinstripe.matticklab.com) is freely available as source code or a Mono binary. Pinstripe is written in C# and runs under the Mono framework on Linux or Mac OS X, and under both Mono and .NET on Windows.
Q-Index Code C1
Q-Index Status Confirmed Code
Institutional Status UQ
Additional Notes First published online: October 7, 2012

Document type: Journal Article
Sub-type: Article (original research)
Collections: Official 2013 Collection
Institute for Molecular Bioscience - Publications
UQ Diamantina Institute Publications
 
Citation counts: Cited 21 times in Thomson Reuters Web of Science
Cited 23 times in Scopus
Created: Wed, 07 Nov 2012, 14:32:51 EST by Susan Allen on behalf of Institute for Molecular Bioscience