The development and application of bioinformatics methods and software tools for computational single nucleotide polymorphism discovery

Lorenc, Michal (2015). The development and application of bioinformatics methods and software tools for computational single nucleotide polymorphism discovery. PhD Thesis, School of Agriculture and Food Sciences, The University of Queensland. doi:10.14264/uql.2015.1070

Attached Files
Name Description MIMEType Size Downloads
s4240132_phd_submission.pdf Thesis (open access) application/pdf 4.40MB 0

Author Lorenc, Michal
Thesis Title The development and application of bioinformatics methods and software tools for computational single nucleotide polymorphism discovery
School, Centre or Institute School of Agriculture and Food Sciences
Institution The University of Queensland
DOI 10.14264/uql.2015.1070
Publication date 2015-11-20
Thesis type PhD Thesis
Supervisor Dave Edwards
Jacqueline Batley
Jiri Stiller
Total pages 182
Language eng
Subjects 0604 Genetics
0607 Plant Biology
0803 Computer Software
Formatted abstract
Brassica and wheat are important crops for agriculture in Australia and worldwide. Their production is challenged by biotic stresses such as diseases and by environmental factors including drought and soil salinity.

In comparison to the model species Arabidopsis thaliana and rice, the genomes of Brassica and wheat are both large and complex. This size and complexity make it more difficult to determine their genome sequences.

The sequence information produced by Second Generation Sequencing (SGS) technologies allows researchers to identify, for example, large numbers of molecular genetic markers, which can be used to study heritable traits and for applied crop improvement.

SGS technologies are speeding up genome sequencing, but they have led to vast increases in the amount of data, resulting in major computational challenges. To manage this data, new computational systems have to be designed to support SGS-based research.

This thesis describes the design, implementation and validation of the SGSautoSNP pipeline, a new approach to call SNPs in large and complex crop genomes using SGS sequences. In our method the reference genome sequence is used only to assemble the reads, and SNPs are then called between these assembled reads. The pipeline includes gene prediction and SNP annotation, and identifies low SNP density regions, which are more conserved than high SNP density regions.
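The idea of calling SNPs between assembled reads rather than against the reference can be illustrated with a minimal sketch. It is not the SGSautoSNP implementation: the function name `call_snps`, the input layout, the `min_reads` threshold and the rule that a cultivar's reads must all agree are assumptions made here for illustration only.

```python
from collections import defaultdict

def call_snps(pileup, min_reads=2):
    """Call SNPs between cultivars at aligned positions (illustrative only).

    pileup: dict mapping a reference position to a list of
            (cultivar, base) observations taken from aligned reads.
    The reference base itself is never consulted; the reference only
    provides the coordinate at which reads are compared.
    """
    snps = {}
    for pos, observations in pileup.items():
        bases_by_cultivar = defaultdict(list)
        for cultivar, base in observations:
            bases_by_cultivar[cultivar].append(base)

        # Assumed rule: keep a cultivar only if it has enough reads
        # and all of its reads agree on one base.
        consensus = {}
        for cultivar, bases in bases_by_cultivar.items():
            if len(bases) >= min_reads and len(set(bases)) == 1:
                consensus[cultivar] = bases[0]

        # Report the position if exactly two alleles separate the cultivars.
        if len(set(consensus.values())) == 2:
            snps[pos] = consensus
    return snps

# Hypothetical data: two cultivars, "A" and "B", at three positions.
example_pileup = {
    101: [("A", "G"), ("A", "G"), ("B", "T"), ("B", "T")],  # cultivars differ
    102: [("A", "C"), ("A", "C"), ("B", "C"), ("B", "C")],  # cultivars agree
    103: [("A", "A"), ("A", "T"), ("B", "T"), ("B", "T")],  # "A" is inconsistent
}
snps = call_snps(example_pileup)
```

In this sketch only position 101 is reported, because it is the only position where both cultivars are internally consistent and carry different bases.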

A total of 638,593 SNPs in the Brassica napus AA genome and 881,289 SNPs in the wheat group 7 chromosome arms were identified using the SGSautoSNP pipeline. Validation of 20 B. napus AA genome SNPs resulted in a SNP prediction accuracy of around 95%. Of the 28 wheat SNPs that were used for validation of the SGSautoSNP pipeline, 26 (93%) produced the expected genotype.

By combining the SGSautoSNP pipeline with SnpEff, it was possible to determine whole-genome SNP trends, transition to transversion ratios and SNP frequencies across chromosomes. Annotation of B. napus AA genome SNPs has revealed that 0.5% of predicted SNPs are classified as “high effect” SNPs, and these could impact the structure of the proteins or the amino acid transcripts.
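For readers unfamiliar with the transition to transversion ratio, a minimal sketch of the calculation follows. A transition is a purine-to-purine (A/G) or pyrimidine-to-pyrimidine (C/T) change; every other substitution is a transversion. The function name `ts_tv_ratio` and the example allele pairs are hypothetical and are not taken from the thesis or from SnpEff.

```python
# Unordered allele pairs that count as transitions.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}
VALID_BASES = set("ACGT")

def ts_tv_ratio(snps):
    """Return transitions / transversions for an iterable of (allele1, allele2)."""
    ts = tv = 0
    for a, b in snps:
        a, b = a.upper(), b.upper()
        if a not in VALID_BASES or b not in VALID_BASES or a == b:
            raise ValueError(f"not a valid SNP allele pair: {a}/{b}")
        if frozenset((a, b)) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return ts / tv

# Hypothetical allele pairs: three transitions and two transversions.
example_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")]
ratio = ts_tv_ratio(example_snps)
```

With the example pairs above the ratio is 3 / 2 = 1.5.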

The molecular markers, genes, genetic and marker annotations and gene ontology discovered by the SGSautoSNP pipeline are stored in a newly developed database called SGSautoSNPdb. This information is linked to other databases so that researchers can access it quickly and in a biologist-friendly manner.

Together, the SGSautoSNP pipeline and SGSautoSNPdb provide tools to help us understand how natural selection has shaped the evolution of crop genomes, as well as SNPs that can be applied to improve crops in order to secure a sufficient food source into the future.
Keyword Development
Application
Bioinformatics
Bioinformatics programming
Bioinformatics analysis
Software
Single nucleotide polymorphisms

Document type: Thesis
Collections: UQ Theses (RHD) - Official
UQ Theses (RHD) - Open Access
 
Created: Wed, 18 Nov 2015, 19:56:19 EST by Mr Michal Lorenc on behalf of Scholarly Communication and Digitisation Service