RNA has traditionally been regarded primarily as the intermediate between a gene and its encoded protein. It has recently become evident, however, that complex eukaryotes also produce large numbers of non-protein-coding RNAs that appear to carry out a range of regulatory functions, from the formation of differentiation-specific subcellular organelles to the site-specific guidance of chromatin-modifying complexes and the dynamic topological organisation of chromatin. Indeed, it is emerging that RNA plays a central role in many processes and lies at the heart of the developmental complexity and phenotypic diversity of animals and plants, yet our knowledge of the details of RNA interactions is still very limited. The rapid development of next-generation sequencing techniques provides new opportunities to investigate the nature and targets of RNA interactions in the cell. The aim of this thesis is to explore the human transcriptome and its RNA interactions through the generation and analysis of next-generation sequencing data.
In the first part of the thesis, I present a meta-analysis of genomic and transcriptomic datasets showing that the proportion of non-protein-coding DNA (ncDNA) per haploid genome is positively correlated with biological complexity, and that these sequences are differentially but pervasively transcribed in higher eukaryotes. This suggests that an increasing proportion of the genome of complex organisms is devoted to the production of regulatory RNAs, and that a large proportion of the human genome is functional.
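The statistic used for this correlation is not specified here. As one illustration, a rank-based (Spearman) correlation between ncDNA proportion and a complexity proxy could be computed as in the following Python sketch. The function names, the toy inputs and the choice of Spearman's rho over other statistics are assumptions, not the thesis's documented procedure.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last index of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    ss_x = sum((a - mean_x) ** 2 for a in rx)
    ss_y = sum((b - mean_y) ** 2 for b in ry)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (ss_x * ss_y) ** 0.5


if __name__ == "__main__":
    # Hypothetical placeholder values for illustration only, NOT measured data.
    ncdna_fraction = [0.1, 0.3, 0.5, 0.9]
    complexity_proxy = [2, 5, 40, 200]
    print(spearman_rho(ncdna_fraction, complexity_proxy))
```

Spearman's rho is chosen in this sketch because it assumes only a monotonic relationship, which suits ordinal complexity proxies such as cell-type counts; a p-value would additionally require a permutation test or a t-approximation.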
Investigating RNA-protein interactions provides a means to understand the regulation of gene expression. However, identifying the RNA targets of different proteins can be challenging. In the second part of this thesis, I present the development of a new method, "digital RNase footprinting", which combines next-generation sequencing with a novel computational algorithm to investigate RNA-protein interactions in the human mitochondrial transcriptome. Combined with gene-specific knockdowns, this method made it possible to map mitochondrial RNA-protein interactions at single-nucleotide resolution and to identify specific RNA-binding sites. In addition, transcriptome-wide changes during stalled translation were identified, and RNA structures were determined at single-nucleotide resolution. This approach can be used to examine the roles of RNA-protein interactions in any cellular transcriptome. I also performed RIP-Seq analysis to identify RNA editing targets associated with adenosine deaminase, RNA-specific B2 (ADARB2, also known as ADAR3) in a human neuroblastoma cell line. Preliminary analysis showed that ADAR3 targets transcripts with roles in neurogenesis and synaptic plasticity, such as transcripts from the imprinted IGF2-H19 locus and the long non-coding RNA MALAT1.
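The footprinting algorithm itself is not described in this abstract. Purely as a hypothetical illustration of the general idea, and not the thesis's method, the Python sketch below compares per-nucleotide RNase cleavage counts between a control and a knockdown library. The reasoning is that positions protected by a bound protein in the control should show increased cleavage once that protein is depleted. The function names, the counts-per-million normalisation, the pseudocount and the thresholds (`min_log2`, `min_len`) are all assumptions.

```python
import math
from typing import Sequence


def cleavage_log2_ratio(control: Sequence[int], knockdown: Sequence[int],
                        pseudocount: float = 1.0) -> list[float]:
    """Per-nucleotide log2(knockdown / control) of library-size-normalised counts."""
    if len(control) != len(knockdown):
        raise ValueError("count vectors must cover the same transcript positions")
    total_c, total_k = sum(control), sum(knockdown)
    if total_c == 0 or total_k == 0:
        raise ValueError("both libraries need at least one read")
    ratios = []
    for c, k in zip(control, knockdown):
        # Counts per million plus a pseudocount, so zero-count positions stay finite.
        c_cpm = c / total_c * 1e6 + pseudocount
        k_cpm = k / total_k * 1e6 + pseudocount
        ratios.append(math.log2(k_cpm / c_cpm))
    return ratios


def call_footprints(ratios: Sequence[float], min_log2: float = 1.0,
                    min_len: int = 3) -> list[tuple[int, int]]:
    """Half-open [start, end) runs of at least min_len positions whose
    cleavage rises by at least min_log2 on knockdown (putative protected sites)."""
    regions: list[tuple[int, int]] = []
    start = None
    for i, r in enumerate(ratios):
        if r >= min_log2 and start is None:
            start = i  # open a candidate run
        elif r < min_log2 and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None  # close the run
    # A run may extend to the last position of the transcript.
    if start is not None and len(ratios) - start >= min_len:
        regions.append((start, len(ratios)))
    return regions
```

Normalising by total library size is the simplest possible choice; real data would likely need replicate-aware statistics and a normalisation that accounts for changes in transcript abundance between conditions.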
In the final chapter, I present a landscape of RNA-DNA hybrids in the human genome. RNA-DNA hybrids, which form the core of R-loops, were immunoprecipitated from the human HeLa-S3 and K562 cell lines using the hybrid-specific monoclonal antibody S9.6, generating ChIP-Seq and matched RIP-Seq data to map their locations. RNA-DNA hybrids were prevalent at distinct genomic loci, many of which overlap transcripts, antisense transcripts and gene regulatory elements such as promoters and enhancers. Interestingly, most of the identified RNA-DNA hybrids contained repetitive sequences. Single-nucleotide polymorphisms (SNPs) were enriched in RNA-DNA hybrid regions, suggesting that R-loop formation exposes DNA to mutagenic processes during transcription. Analysis of a micrococcal nuclease sequencing (MNase-seq) dataset also indicated a sparse nucleosome positioning signal around RNA-DNA hybrid loci. Together, these results provide an overview of RNA-DNA hybrids in the human genome and point to their potential regulatory roles.
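As an illustration of how SNP enrichment in hybrid regions might be quantified, the following Python sketch compares SNP density inside merged hybrid intervals with SNP density in the rest of a chromosome. The function names and the single-chromosome, half-open coordinate model are assumptions, and this simple density ratio is not necessarily the test used in the thesis.

```python
from bisect import bisect_left
from typing import Iterable, Sequence


def merge_intervals(intervals: Iterable[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def snp_fold_enrichment(snp_positions: Sequence[int],
                        hybrid_regions: Iterable[tuple[int, int]],
                        chrom_length: int) -> float:
    """Ratio of SNP density inside hybrid regions to SNP density outside them."""
    snps = sorted(snp_positions)
    if not snps:
        raise ValueError("no SNPs supplied")
    regions = merge_intervals(hybrid_regions)
    # Count SNPs in each [start, end) with two binary searches on sorted positions.
    inside = sum(bisect_left(snps, end) - bisect_left(snps, start)
                 for start, end in regions)
    covered = sum(end - start for start, end in regions)
    outside_len = chrom_length - covered
    if covered == 0 or outside_len <= 0:
        raise ValueError("regions must cover part, but not all, of the chromosome")
    outside = len(snps) - inside
    if outside == 0:
        return float("inf")  # every SNP lies inside a hybrid region
    return (inside / covered) / (outside / outside_len)
```

A fold enrichment above 1 indicates a higher SNP density inside hybrid regions. Assessing significance would typically require a permutation test that shuffles the regions while controlling for confounders such as mappability and repeat content, which matters here because most of the identified hybrids contain repetitive sequence.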