The Collection and Data-Driven Analyses of Proteins Localized to Nuclear Compartments

Nurul Mohamad (2011). The Collection and Data-Driven Analyses of Proteins Localized to Nuclear Compartments. MPhil Thesis, Institute for Molecular Bioscience, The University of Queensland.

Attached Files
Name Description MIMEType Size Downloads
s41529353_MPhil_Abstract.pdf Abstract application/pdf 61.42KB 0
s4152935_MPhil_finalthesis.pdf Final Thesis application/pdf 2.15MB 6
Author Nurul Mohamad
Thesis Title The Collection and Data-Driven Analyses of Proteins Localized to Nuclear Compartments
School, Centre or Institute Institute for Molecular Bioscience
Institution The University of Queensland
Publication date 2011-02
Thesis type MPhil Thesis
Supervisor Mikael Boden
Rohan Teasdale
Total pages 215
Total colour pages 7
Total black and white pages 201
Subjects 06 Biological Sciences
Abstract/Summary The cell nucleus is a complex yet dynamic organelle with many morphologically distinct nuclear compartments. Some nuclear compartments are hypothesized to play important regulatory roles. For example, chromatin is primarily concerned with DNA replication, the nuclear speckle accumulates pre-messenger RNA (pre-mRNA) splicing factors at actively transcribed genes, and the nucleolus controls ribosome subunit biogenesis. Once the entire complement of proteins for the cell nucleus has been defined, the relationships and interactions between proteins and biological processes within the nucleus can be systematically understood. Nuclear protein data, however, are not readily accessible from a single resource; they are instead documented across multiple papers and repositories. This thesis therefore proposes four objectives: firstly, to collect a high-confidence set of nuclear proteins with known nuclear compartment associations and use it to better understand the organization of the nucleus into nuclear compartments; secondly, to investigate the quality of the dataset; thirdly, to observe the evolutionary relationship between a mammalian and a plant nuclear proteome; and fourthly, to explain the specific relationship between nuclear compartments and nuclear proteins. The nuclear protein data are extracted from different resources, including specific nuclear proteome databases (Nuclear Protein Database (NPD), Nucleolar Proteome Database (NoPdb), and Database of Nuclear Matrix Proteins (NMP-db)) as well as generic databases (Human Protein Reference Database (HPRD) and UniProt KnowledgeBase (UniProt)). These data come from a diverse set of organisms, but for the collection to be representative, at least one mammalian organism needs to be identified.
Furthermore, throughout the data collection stage, Nucprot, a unique mouse nuclear proteome dataset that maps the full nuclear proteome, is used as the reference set. A mouse identifier is therefore chosen as the protein identifier for the collected dataset. Using an orthology mapping approach, nuclear proteins are assigned a mouse protein identifier and then intersected with Nucprot. A BLAST search against Nucprot is performed for any protein that has an assigned nuclear compartment but has not been identified in Nucprot. Some nuclear proteins, such as newly discovered proteins, appear only in the published literature. The nuclear protein set is therefore extended using compartment-specific reviews and proteomic manuscripts, published from 2004 to 2009, that provide experimental evidence of localization. By collating recent data from multiple resources, this thesis identifies 3,567 proteins as belonging to the mouse nuclear proteome. The dataset is of high quality: 88% of its proteins have documented experimental evidence of nuclear localization, and 72% have evidence of nuclear localization. The availability of large-scale yet high-confidence data on nuclear proteins provides a unique opportunity for understanding the organization of the cell nucleus. The first observation concerns the conservation of the nuclear proteome between mouse and plant: only 11% of mouse nuclear proteins are conserved in plant. An exploration of several features, including gene annotations, amino acid sequence, protein-protein interaction data, cell cycle expression, post-translational modifications, and protein motifs, characterizes individual compartments on the basis of their protein complement. Exploring different features is also crucial for discriminating the proteins of one compartment from those of another.
The gene ontology terms of proteins belonging to the same nuclear compartment highlight their regulatory roles. Nuclear localization signals (NLSs) are more common in the nuclear speckle, and SUMOylation (the addition of the small ubiquitin-related modifier, or SUMO) is significantly enriched in the promyelocytic leukemia nuclear body (PML-NB). Support-vector machines (SVMs) are trained to show that nuclear proteins contain discriminative features and are used to identify associations between nuclear compartments. The trained models are also used to estimate the full complement of each nuclear compartment. The discovery of novel compartment-specific motifs supports the existence of compartment-specific targeting signals for directing or retaining proteins. Protein interactions are found primarily to link proteins within the same compartment. Cell cycle expression data support increased activity in the nucleolus, nuclear speckle and PML-NB, especially during synthesis (S) phase. Post-translational modifications are found to occur significantly in proteins associated with most of the nuclear compartments. The construction of a large-scale dataset of nuclear proteins fills an important gap, making such proteins available from a single resource. This dataset therefore provides a starting point for researchers to address a range of important problems in cell nuclear biology. The results presented here will assist in improving overall knowledge and understanding of the organization of the nuclear compartments.
Keyword nuclear compartment
nuclear proteins
data-driven analysis of nuclear proteins
nuclear proteins localization
nuclear compartment localization proteins
nuclear protein dataset
Additional Notes Page numbers that should be printed in color: 37, 48, 56, 66, 83, 87, 92

Created: Mon, 21 Mar 2011, 12:00:26 EST by Ms Nurul Mohamad on behalf of Library - Information Access Service