Using ChIP-seq technology to understand transcription factor biology

Thomas Whitington (2010). Using ChIP-seq technology to understand transcription factor biology. PhD Thesis, Institute for Molecular Bioscience, The University of Queensland.

Author Thomas Whitington
Thesis Title Using ChIP-seq technology to understand transcription factor biology
School, Centre or Institute Institute for Molecular Bioscience
Institution The University of Queensland
Publication date 2010-08
Thesis type PhD Thesis
Supervisor Timothy Bailey
Total pages 217
Total colour pages 72
Total black and white pages 145
Subjects 06 Biological Sciences
Abstract/Summary Transcription factors bind to DNA and recruit other proteins to modify the rate at which specific genes are transcribed. This regulation of gene expression by transcription factors comprises a crucial layer of control in various biological processes, including development and cellular response to environmental stimuli. For decades, individual transcription factors (TFs) have been studied in vitro and at individual genomic loci in vivo using various technologies such as electrophoretic mobility shift assays, luciferase reporter gene assays, targeted mutation of binding sites, and chromatin immunoprecipitation (ChIP) followed by quantitative real-time PCR. The recent development of ChIP followed by high-throughput sequencing (ChIP-seq) allows the binding locations of individual TFs to be mapped to the entire mammalian genome with high resolution, for a particular tissue or condition of interest. This technology offers an unprecedented opportunity to understand TFs in vivo, on a global scale. Individual ChIP-seq experiments produce large volumes of data, as individual mammalian TFs typically exhibit thousands of binding locations throughout the genome. As a result, computational approaches are essential to ChIP-seq analysis.

In this thesis, I have analysed novel and published ChIP-seq data using various approaches. My approaches have been motivated by several distinct biological questions. In particular, I have focused on improving our understanding of the parameters controlling where TFs bind in the genome. I have also developed general methods and pipelines for analysing individual ChIP-seq experiments in order to extract maximum biological knowledge from the data.

The structure of chromatin, including nucleosome positioning, mobility, and post-translational modifications, affects binding of TFs to genomic DNA in vivo. Prior research has studied this relationship for specific TFs at individual loci, and recent work established that binding of transcription factor c-Myc is regulated by chromatin structure throughout the human genome. Using ChIP-seq data from prior publications, I have investigated the generality of this relationship across various TFs. I showed that genome-wide estimates of chromatin structure can be harnessed to dramatically improve prediction of in vivo transcription factor binding sites (TFBSs) in a tissue-specific manner. This result was consistent across diverse TFs, establishing a general correlation between TF binding and open chromatin. In addition, my analysis demonstrated the value of incorporating chromatin modification data when attempting to predict TFBSs using in silico approaches.

Many TFs form complexes via protein-protein interactions with other TFs when binding at the DNA interface. This complex formation is critical to TF biology, facilitating dramatic changes in target gene expression in response to small changes in TF concentration, as well as providing a mechanism for increased binding selectivity for a TF. I developed a novel computational methodology for inferring the occurrence of TF complexes from ChIP-seq data on the basis of enrichment in TF motif spacing. By applying my algorithm to 39 TF ChIP-seq datasets from prior publications, I was able to detect known and novel TF complexes with high specificity. These results provide general insights into common features of TF complex formation, in addition to extending our knowledge of which complexes the individual TFs take part in.

The transcription factor KLF1 is a critical regulator of erythropoiesis, the production of red blood cells. To better understand the role of this specific TF, we performed a novel ChIP-seq analysis for KLF1 in mouse fetal liver, which is a rich and homogeneous source of red blood cell precursors. To extract biological knowledge from these data, I performed bioinformatic analyses on the ChIP-seq experiment following sequencing and mapping of tags using the SOLiD sequencing platform. The existing bioinformatic tools and custom analyses that I applied to these ChIP-seq data uncovered novel biological information. These findings included a clear cooperation and putative complex formation between KLF1 and GATA1; the observation that KLF1 is likely to act solely as a transcriptional activator in vivo, without exhibiting repressor activity; and the identification of motifs for additional TFs that are likely to co-regulate erythropoiesis in concert with KLF1.

My analysis of published ChIP-seq data has improved our understanding of how chromatin accessibility and complex formation modulate binding of genomic DNA by TFs. Subsequent work could integrate information about chromatin accessibility and complex formation into a quantitative model, with the goal of accurate prediction of TF binding in silico. In addition, my analysis of novel ChIP-seq data for KLF1 comprises a bioinformatic pipeline that could be applied to future ChIP-seq experiments in order to maximise their utility.
Keyword Transcription factor, ChIP-seq, chromatin, heterodimer, homodimer, complex, KLF1, computational genomics.
Additional Notes Colour pages: 24, 27, 28, 33, 37, 42, 45, 54-65, 72, 76, 78, 79, 93-104, 132, 137, 147, 150, 154-163, 169-171, 178-189, 193, 195-201

Created: Thu, 09 Jun 2011, 02:21:26 EST by Mr Thomas Whitington on behalf of Library - Information Access Service