The ability of double-stranded DNA to form triple-helical complexes through hydrogen bonding with a third, single-stranded oligonucleotide is well established, yet the biological functions of these structures remain largely unknown.
There is considerable, albeit circumstantial, evidence for the existence of nucleic acid triplexes in vivo, and a number of studies have investigated their potential participation in a variety of biological processes, including chromatin organization, DNA repair, transcriptional regulation and RNA processing. There is also a range of possible mechanisms for regulating triplex formation: differential expression of triplex-forming RNAs, alteration of chromatin accessibility, sequence unwinding and nucleotide modifications. Recent work has added important pieces to the puzzle, namely new methods for triplex detection, the discovery of new triplex-unwinding helicases, and the increasingly apparent biological roles of non-coding RNAs that suggest the involvement of triple helices.
Triplex formation is governed by sequence-specific binding rules that are conceptually similar to the familiar Watson-Crick base-pairing rules. The third nucleotide strand binds in the major groove of an oligopurine·oligopyrimidine tract in the duplex by forming Hoogsteen or reverse Hoogsteen hydrogen bonds with the purine-rich strand of the duplex. The stability of a triplex is constrained by steric features as well as by the availability of hydrogen donor and acceptor groups to establish these hydrogen bonds. This effectively limits triplex formation to three basic “motifs”, all of which permit two stabilizing hydrogen bonds between the nucleotide of the third strand and the purine of the duplex.
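The binding rules described above lend themselves to a simple computational illustration. The following Python sketch is a minimal example under stated assumptions, not the algorithm developed in this thesis: it scans one strand of a duplex for oligopurine tracts and derives a candidate third strand for two of the commonly described motifs. The motif names, the base pairings (T with A and C with G in the parallel pyrimidine motif; A with A and G with G in the antiparallel purine motif) and the strand orientations are the canonical textbook rules and are assumed here rather than taken from the text above. The function names `purine_tracts` and `third_strand` and the `min_len` parameter are hypothetical, and the mixed purine-pyrimidine motif, mismatches and pH effects are deliberately ignored.

```python
import re

# Assumed canonical rules: duplex purine -> third-strand base, plus the
# orientation of the third strand relative to the purine strand of the duplex.
MOTIFS = {
    "pyrimidine": ({"A": "T", "G": "C"}, "parallel"),
    "purine": ({"A": "A", "G": "G"}, "antiparallel"),
}


def purine_tracts(strand, min_len=10):
    """Return (start, end, tract) for every purine run of at least min_len bases."""
    pattern = re.compile(r"[AG]{%d,}" % min_len)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(strand.upper())]


def third_strand(tract, motif):
    """Return the 5'->3' third-strand sequence predicted to bind a purine tract."""
    mapping, orientation = MOTIFS[motif]
    bases = [mapping[b] for b in tract]
    if orientation == "antiparallel":
        # An antiparallel third strand reads in the opposite direction
        # to the purine strand, so the sequence is reversed.
        bases.reverse()
    return "".join(bases)


if __name__ == "__main__":
    duplex_strand = "CCTAGGAGAAGGAGTTC"  # made-up example sequence
    for start, end, tract in purine_tracts(duplex_strand, min_len=8):
        print(start, end, tract)
        for motif in MOTIFS:
            print(" ", motif, third_strand(tract, motif))
```

A full screen would also search the complementary strand (equivalently, oligopyrimidine runs on the given strand) and would tolerate a limited number of interruptions in the tract; both are omitted here for brevity.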
Over the past decades, a multitude of complementary experimental techniques have been used to scrutinize the rule set underpinning these motifs with respect to various determinants of triplex formation, such as the chemistry of the nucleotides present in each strand (e.g. nucleotide backbone, sugars, bases and modifications), pH, ionic environment, sequence composition and base mismatches. While each of these determinants affects the stability of the triple helix, the studies demonstrate that the rule set can be used to model triple-helix formation and to analyze sequence data for insights into the in vivo role of triplex formation. Genomic and transcriptomic data generated since the advent of high-throughput sequencing technology provide a new, rich resource that, unfortunately, lies idle for lack of tailored in silico methods.
This thesis focuses on developing in silico and in vitro tools to shed light on the existence of triple-helical complexes in vivo and on their possible functional relevance. In addition, it investigates the potential that triplex formation offers for biotechnological and therapeutic applications.
In the first part of this thesis, I develop and implement an efficient algorithm, called Triplexator, that facilitates the screening of nucleotide sequences for triple-helical features and, more importantly, enables the automatic assessment of the compatibility between a duplex and a single-stranded sequence with respect to triplex formation. Triplexator provides a flexible tool for mining the wealth of sequence data emerging from the deep sequencing of genomes and transcriptomes.
In the second part, I develop and implement a computational pipeline, Triplex-Inspector, that allows researchers to home in on their genomic region of interest and obtain detailed information on its suitability for triplex-mediated genomic targeting. The software pipeline offers guidance for the design of third-strand oligonucleotides to maximize specificity and sensitivity for the primary target.
Finally, this thesis aims to develop a high-throughput laboratory protocol based on microarray technology to screen the binding affinity and specificity between oligonucleotides and duplex DNA. Data obtained from such an approach can be used to refine the somewhat crude binding rule set currently used to model nucleic acid triplex formation. Moreover, such a technology makes it possible to query the binding preferences of a specific oligonucleotide and to extend our understanding of triplex formation to nucleotide analogs such as peptide nucleic acids, which have been reported to provide improved binding properties. This may enable scientists to target single genomic loci with high precision, which would be an invaluable instrument for a variety of biotechnological applications and is an essential prerequisite for gene therapy.