Improved techniques for the identification of pseudogenes

Coin, L. and Durbin, R. (2004) Improved techniques for the identification of pseudogenes. Bioinformatics, 20 SUPPL. 1: 94-100. doi:10.1093/bioinformatics/bth942

Author Coin, L.
Durbin, R.
Title Improved techniques for the identification of pseudogenes
Journal name Bioinformatics
ISSN 1367-4803
Publication date 2004-01-01
Year available 2004
Sub-type Article (original research)
DOI 10.1093/bioinformatics/bth942
Open Access Status Not Open Access
Volume 20
Issue SUPPL. 1
Start page 94
End page 100
Total pages 7
Place of publication Oxford, United Kingdom
Publisher Oxford University Press
Language eng
Subject 1308 Clinical Biochemistry
1703 Computational Theory and Mathematics
1706 Computer Science Applications
Formatted abstract
Motivation: Pseudogenes are the remnants of genomic sequences of genes which are no longer functional. They are frequent in most eukaryotic genomes, and an important resource for comparative genomics. However, pseudogenes are often mis-annotated as functional genes in sequence databases. Current methods for identifying pseudogenes include methods which rely on the presence of stop codons and frameshifts, as well as methods based on the ratio of non-silent to silent nucleotide substitution rates (dN/dS). A recent survey concluded that 50% of human pseudogenes have no detectable truncation in their pseudo-coding regions, indicating that the former methods lack sensitivity. The latter methods have been used to find sets of genes enriched for pseudogenes, but are not specific enough to accurately separate pseudogenes from expressed genes.

We introduce a program called pseudogene inference from loss of constraint (PSILC) which incorporates novel methods for separating pseudogenes from functional genes. Each method calculates a log-odds score comparing two models of evolution along the final branch of the gene tree leading to the query gene: a neutral nucleotide model compared to a Pfam domain encoding model (PSILCnuc/dom), and a protein coding model compared to a Pfam domain encoding model (PSILCprot/dom). Using the manual annotation of human chromosome 6, we show that both these methods result in a more accurate classification of pseudogenes than dN/dS when a Pfam domain alignment is available.
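
The log-odds comparison described in the abstract can be illustrated with a minimal Python sketch. This is not the PSILC implementation: the function name, the per-site likelihood values, and the convention that the neutral model is the numerator (so that a positive score favours loss of constraint) are all assumptions made for illustration. In the real method the likelihoods would come from evolutionary models evaluated along the final branch of the gene tree, which this toy example does not attempt.

```python
import math
from typing import Sequence


def log_odds_score(p_alt: Sequence[float], p_null: Sequence[float]) -> float:
    """Sum of per-site log2 likelihood ratios, log2(P(site | alt) / P(site | null)).

    Hypothetical helper: assumes sites are independent and that each
    likelihood is strictly positive.
    """
    if len(p_alt) != len(p_null):
        raise ValueError("per-site likelihood lists must have equal length")
    return sum(math.log2(a / n) for a, n in zip(p_alt, p_null))


# Made-up per-site likelihoods for three alignment positions (illustration only).
p_neutral = [0.040, 0.045, 0.030]  # assumed neutral nucleotide model
p_domain = [0.020, 0.015, 0.030]   # assumed Pfam domain encoding model

# Assumed convention: neutral over domain, so score > 0 favours the neutral model.
score = log_odds_score(p_alt=p_neutral, p_null=p_domain)
print(f"log-odds (neutral vs domain): {score:.3f} bits")
```

Under this assumed convention, a score above zero means the toy data are better explained by neutral evolution than by the domain encoding model. Any threshold for calling a pseudogene would have to be calibrated separately.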
Keyword Biochemical Research Methods
Biotechnology & Applied Microbiology
Computer Science, Interdisciplinary Applications
Mathematical & Computational Biology
Statistics & Probability
Biochemistry & Molecular Biology
Computer Science
Q-Index Code E1
Q-Index Status Provisional Code
Institutional Status Non-UQ

Document type: Journal Article
Collection: Institute for Molecular Bioscience - Publications
Citation counts: Cited 12 times in Thomson Reuters Web of Science
Cited 19 times in Scopus
Created: Sat, 25 Jan 2014, 05:06:23 EST by System User on behalf of Institute for Molecular Bioscience