Integrated genome analysis suggests that most conserved non-coding sequences are regulatory factor binding sites

Hemberg, Martin, Gray, Jesse M., Cloonan, Nicole, Kuersten, Scott, Grimmond, Sean, Greenberg, Michael E. and Kreiman, Gabriel (2012) Integrated genome analysis suggests that most conserved non-coding sequences are regulatory factor binding sites. Nucleic Acids Research, 40(16): 7858-7869. doi:10.1093/nar/gks477


Author Hemberg, Martin
Gray, Jesse M.
Cloonan, Nicole
Kuersten, Scott
Grimmond, Sean
Greenberg, Michael E.
Kreiman, Gabriel
Title Integrated genome analysis suggests that most conserved non-coding sequences are regulatory factor binding sites
Journal name Nucleic Acids Research
ISSN 0305-1048
1362-4962
Publication date 2012-09-01
Sub-type Article (original research)
DOI 10.1093/nar/gks477
Open Access Status DOI
Volume 40
Issue 16
Start page 7858
End page 7869
Total pages 12
Place of publication Oxford, United Kingdom
Publisher Oxford University Press
Collection year 2013
Language eng
Formatted abstract
More than 98% of a typical vertebrate genome does not code for proteins. Although non-coding regions are sprinkled with short (<200 bp) islands of evolutionarily conserved sequences, the function of most of these unannotated conserved islands remains unknown. One possibility is that unannotated conserved islands could encode non-coding RNAs (ncRNAs); alternatively, unannotated conserved islands could serve as promoter-distal regulatory factor binding sites (RFBSs) like enhancers. Here we assess these possibilities by comparing unannotated conserved islands in the human and mouse genomes to transcribed regions and to RFBSs, relying on a detailed case study of one human and one mouse cell type. We define transcribed regions by applying a novel transcript-calling algorithm to RNA-Seq data obtained from total cellular RNA, and we define RFBSs using ChIP-Seq and DNAse-hypersensitivity assays. We find that unannotated conserved islands are four times more likely to coincide with RFBSs than with unannotated ncRNAs. Thousands of conserved RFBSs can be categorized as insulators based on the presence of CTCF or as enhancers based on the presence of p300/CBP and H3K4me1. While many unannotated conserved RFBSs are transcriptionally active to some extent, the transcripts produced tend to be unspliced, non-polyadenylated and expressed at levels 10 to 100-fold lower than annotated coding or ncRNAs. Extending these findings across multiple cell types and tissues, we propose that most conserved non-coding genomic DNA in vertebrate genomes corresponds to promoter-distal regulatory elements.
Keyword Transcription Factor Binding
Gene Expression
Bidirectional Promoters
Seq Reveals
Human Cells
Q-Index Code C1
Q-Index Status Confirmed Code
Institutional Status UQ

Document type: Journal Article
Sub-type: Article (original research)
Collections: Official 2013 Collection
Institute for Molecular Bioscience - Publications
 
Citation counts: Cited 11 times in Thomson Reuters Web of Science
Cited 11 times in Scopus
Created: Thu, 15 Nov 2012, 22:04:27 EST by System User on behalf of Institute for Molecular Bioscience