Finding biologically meaningful patterns in species assemblage data and relating those patterns to the surrounding environmental conditions is an important and common endeavour amongst marine community ecologists. A variety of statistical methodologies are currently available for eliciting relationships between species assemblage and environmental data. The primary objective of this thesis was to evaluate their suitability for detecting species-environmental patterns within marine community data. A secondary objective was to make general recommendations on data pre-processing options and on the presentation of results, with a focus on biological interpretation.
The data available for this thesis were collected by the Commonwealth Scientific & Industrial Research Organisation and the Queensland Department of Primary Industries at 206 sampling locations inside the Far North region of the Great Barrier Reef, Queensland, Australia. The collection comprised 922 species or higher-order taxa and 21 related environmental variables. All statistical investigations were made using these data.
Although the overall aim of this thesis focused on relating environmental and assemblage data, an initial investigation into the best pre-processing options for each particular data type was conducted. Once such decisions were made, the focus shifted to using the broad methodologies of direct and indirect gradient analysis. Current techniques were examined, as well as proposed extensions with the ability to model:
(1) Both linear and non-linear relationships between the descriptors, and
(2) Spatially explicit variables.
The different methodologies and techniques investigated for analysing marine environmental and assemblage data resulted in similar biological conclusions. The analyses consistently identified the same
(1) Longitudinal gradient separating inshore and offshore locations,
(2) Broad species-environmental patterns,
(3) Rare and locally abundant species, and
(4) Unusual environmental characteristics,
irrespective of the methodology used.
The importance of making judicious decisions regarding the pre-processing of the assemblage data was confirmed: more ecologically meaningful patterns were identified using either a log(biomass + 1) or a presence/absence transformation when the assemblage data were analysed at high taxonomic resolutions (species, family). Low taxonomic resolutions (class, phylum) produced more informative results using raw data. Likewise, the number of species retained in the analyses required balancing rare and abundant species. The results from this specific dataset suggested that around 100 species should be retained to maximise the ecological interpretations. Pre-processing of the environmental data was also necessary, and helped identify and accommodate outlying observations, skewed distributions, and redundant or correlated variables.
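As an illustration only, the assemblage pre-processing options described above could be sketched in Python with NumPy as follows. The function names, the toy matrix, and the rule of ranking species by the number of stations at which they occur are assumptions made for this sketch; the criterion actually used to retain species is not stated here.

```python
import numpy as np

def log_transform(biomass):
    """Return log(biomass + 1), element-wise, for a stations x species matrix."""
    return np.log1p(np.asarray(biomass, dtype=float))

def presence_absence(biomass):
    """Return 1 where a species was recorded at a station, 0 otherwise."""
    return (np.asarray(biomass) > 0).astype(int)

def retain_most_frequent(biomass, n_species):
    """Keep the n_species columns recorded at the most stations.

    Ranking by frequency of occurrence is an assumed criterion, chosen
    only to make the sketch concrete.
    """
    biomass = np.asarray(biomass)
    occurrences = (biomass > 0).sum(axis=0)
    # Stable sort so that tied species keep their original column order.
    ranked = np.argsort(-occurrences, kind="stable")[:n_species]
    keep = np.sort(ranked)
    return biomass[:, keep], keep

# Toy 4-station x 3-species biomass matrix (invented values, not thesis data).
toy = np.array([[0.0, 2.5, 10.0],
                [1.0, 0.0, 0.0],
                [0.0, 0.0, 3.0],
                [4.0, 0.0, 7.0]])

logged = log_transform(toy)
binary = presence_absence(toy)
reduced, kept_columns = retain_most_frequent(toy, n_species=2)
```

For the full dataset, the same call with `n_species=100` would correspond to the retention level suggested above, under the assumed ranking rule.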
The subsequent investigations of the environmental data found that models allowing for non-linear relationships between the environmental descriptors provided a better description of the study area. In addition, the model-based clustering procedures applied to the environmental data provided a reasonable grouping of stations in the absence of a priori strata classifications.
The analysis of the assemblage data suggested that additional information on the localities of species obtained using Correspondence Analysis produced a more informative solution in comparison with non-metric Multi-Dimensional Scaling. Correspondence Analysis identified the important environmental gradients related to both the abundant and locally abundant species without having to rely on additional analyses. The results of the cluster analysis based on the assemblage data suggested that in order to obtain the most informative description of species in a particular region, a priori information on both the environmental and species station profiles is necessary.
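To make concrete how Correspondence Analysis yields scores for both stations and species from a single decomposition, a minimal sketch is given below. It uses the standard singular value decomposition of the standardised residuals of the table and is not necessarily the implementation used in the thesis; the function name and the toy table are assumptions, and the table is assumed to have no empty rows or columns.

```python
import numpy as np

def correspondence_analysis(table):
    """Correspondence Analysis of a stations x species table via SVD.

    Returns principal coordinates for rows (stations) and columns
    (species), plus the inertia carried by each axis.
    Assumes every row and column has a positive total.
    """
    X = np.asarray(table, dtype=float)
    P = X / X.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row (station) masses
    c = P.sum(axis=0)                    # column (species) masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardised residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_scores = (U * sv) / np.sqrt(r)[:, None]
    col_scores = (Vt.T * sv) / np.sqrt(c)[:, None]
    return row_scores, col_scores, sv ** 2

# Toy 4-station x 3-species table (invented values, not thesis data).
toy_table = np.array([[10.0, 2.0, 0.0],
                      [3.0, 8.0, 1.0],
                      [0.0, 4.0, 9.0],
                      [5.0, 5.0, 5.0]])

station_scores, species_scores, inertia = correspondence_analysis(toy_table)
```

Because station and species scores share the same axes, both can be drawn on one ordination plot, which is the property that the comparison with non-metric Multi-Dimensional Scaling relies on.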
When relating assemblage and environmental data, indirect gradient analysis provided a more informative solution by superimposing the object scores obtained either through a linear or non-linear Principal Component Analysis onto the objects in a Correspondence Analysis. The alternative generalised Mantel test, used in conjunction with non-metric Multi-Dimensional Scaling, generally resulted in low to moderate correlations between the species pattern and the environmental variables, producing little additional information.
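For orientation, the idea behind a Mantel-type comparison can be sketched as below. This is the simple Mantel permutation test using a Pearson correlation between two distance matrices, not the generalised form referred to above; the function names, the use of Euclidean distances, and the number of permutations are assumptions of this sketch.

```python
import numpy as np

def euclidean_distances(points):
    """Pairwise Euclidean distance matrix for an n x p array of points."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def mantel(dist_a, dist_b, n_perm=999, seed=0):
    """Simple Mantel test between two square, symmetric distance matrices.

    Returns the observed Pearson correlation between the upper triangles
    and a one-sided permutation p-value.
    """
    dist_a = np.asarray(dist_a, dtype=float)
    dist_b = np.asarray(dist_b, dtype=float)
    n = dist_a.shape[0]
    upper = np.triu_indices(n, k=1)
    b_vec = dist_b[upper]
    r_obs = np.corrcoef(dist_a[upper], b_vec)[0, 1]

    rng = np.random.default_rng(seed)
    at_least_as_large = 0
    for _ in range(n_perm):
        # Permute rows and columns of one matrix together.
        perm = rng.permutation(n)
        permuted = dist_a[np.ix_(perm, perm)]
        r_perm = np.corrcoef(permuted[upper], b_vec)[0, 1]
        if r_perm >= r_obs:
            at_least_as_large += 1
    p_value = (at_least_as_large + 1) / (n_perm + 1)
    return r_obs, p_value
```

In use, `dist_a` would be a dissimilarity matrix between stations computed from the assemblage data and `dist_b` one computed from the environmental variables; a low to moderate `r_obs` corresponds to the weak agreement reported above.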
The extensions applied to the direct gradient analysis technique of Canonical Correspondence Analysis, intended to better accommodate some of the particular properties of the data, did improve the amount of variation accounted for by the statistical model. However, for these particular data the improvements did not influence the general biological interpretations.
Overall, direct gradient analysis, in particular the technique of Canonical Correspondence Analysis, produced more informative ordinations than the techniques related to indirect gradient analysis. Simultaneously identifying relationships between species, stations, and environmental conditions helped explain or justify some of the patterns in the data. The greater flexibility of the model used in techniques based on Canonical Correspondence Analysis was an additional advantage over the techniques based on indirect gradient analysis.
As the marine datasets under investigation here are reasonably representative, the above recommendations will be generally applicable to similar multivariate datasets.