A model-based approach to the optimal design of comparative experiments is proposed and implemented in a general computational algorithm. The method is shown to be flexible and effective both in classical design problems and in novel ones for which theoretical results are not available.
The design of experiments in which the observations are correlated has been of interest for several decades, largely motivated by the spatial characteristics of agricultural field experiments. This interest has also been stimulated by statistical advances in the analysis of data arising from these experiments and by developments in computing technologies that have allowed such analyses to become (almost) routine. In the context of plant genetic improvement, the goal of these experiments is to identify elite genotypes for progression (or commercialisation), or superior breeding lines for use in the program. The genotypes in any such experiment are usually related through a shared ancestral hierarchy, and the benefits of including this known correlation structure in the analysis are widely accepted. However, the design of experiments with correlated treatment effects is largely unexplored; this thesis develops a consistent design definition and computing environment that accounts for dependent data and allows the examination of optimal design in the presence of (known) correlated treatment effects.
Despite advances in computing power, the search for efficient designs for correlated data in large-scale experiments (for example, more than 1000 units) remains problematic in real time. In the context of agricultural field trials, an efficient design for pairwise treatment comparisons is A-optimal, and a first-order autoregressive process is considered a plausible correlation structure for the residuals. Under this model, an approximation to A-optimality based on neighbour properties is developed that is computationally efficient for discriminating among competing designs. The approximate criterion is useful both in its own right and for providing the initial state of an optimal search using exact A-optimality, and is implemented in C++ as the package nn in the R statistical environment. The effectiveness of the approximation is illustrated in a multi-stage laboratory experiment from industry.
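As a concrete reading of the exact criterion, consider the simplest case, which is an assumption made here and not necessarily the full model used in the thesis: n plots in a single line, t fixed treatment effects, an overall mean, and first-order autoregressive residuals with known autocorrelation. The symbols below are notation introduced for illustration only.

```latex
\[
  y = X\tau + \mathbf{1}\mu + e, \qquad
  \operatorname{var}(e) = \sigma^{2} R, \qquad
  R_{ij} = \rho^{|i-j|},
\]
\[
  C = X^{\top} R^{-1} X
      - \frac{X^{\top} R^{-1}\mathbf{1}\,\mathbf{1}^{\top} R^{-1} X}
             {\mathbf{1}^{\top} R^{-1}\mathbf{1}},
\]
\[
  \mathcal{A}
    = \frac{2}{t(t-1)} \sum_{i<j}
      \frac{\operatorname{var}(\hat{\tau}_i - \hat{\tau}_j)}{\sigma^{2}}
    = \frac{2}{t-1}\,\operatorname{tr}\!\left(C^{+}\right).
\]
```

Here X is the n × t treatment indicator matrix, C is the treatment information matrix after eliminating the mean, and C^+ is its Moore-Penrose inverse. The final equality holds when C has rank t − 1, because C^+ then annihilates the vector of ones. A design with a smaller value of this criterion gives, on average, more precise pairwise treatment comparisons.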
Existing computational approaches to optimal design search are algebraic and rely on updating formulae for efficiency where expensive tasks such as matrix inversion would otherwise be required. While effective, updating formulae are not readily available for all complex design models, and an alternative computational framework is sought. This thesis brings direct methods for linear algebra to the search for optimal designs, providing a flexible computing environment with the potential to exploit sparse matrix technologies and optimised linear algebra software libraries. The methodology using dense matrix computations is implemented in C++ and is available as the R package od.
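The idea of recomputing the criterion directly at each step, without updating formulae, can be illustrated with a minimal sketch. It is written in Python for brevity (the thesis implementation is in C++) and is not the algorithm or interface of the od package: all function names are hypothetical, and the model is assumed to be a single line of plots with fixed treatment effects, an overall mean and first-order autoregressive residuals. The A-value is obtained from a dense Cholesky factorisation, and a simple random interchange search accepts a swap only when it lowers the A-value.

```python
import math
import random

def ar1_inverse(n, rho):
    """Inverse of the n x n AR(1) correlation matrix (tridiagonal), n >= 2."""
    s = 1.0 / (1.0 - rho * rho)
    rinv = [[0.0] * n for _ in range(n)]
    for i in range(n):
        rinv[i][i] = s * (1.0 if i in (0, n - 1) else 1.0 + rho * rho)
        if i + 1 < n:
            rinv[i][i + 1] = rinv[i + 1][i] = -rho * s
    return rinv

def information_matrix(design, t, rho):
    """Treatment information matrix after eliminating the overall mean."""
    n = len(design)
    rinv = ar1_inverse(n, rho)
    m = [[0.0] * t for _ in range(t)]  # X' R^-1 X
    q = [0.0] * t                      # X' R^-1 1
    for i in range(n):
        for j in range(n):
            m[design[i]][design[j]] += rinv[i][j]
            q[design[i]] += rinv[i][j]
    total = sum(q)                     # 1' R^-1 1
    return [[m[a][b] - q[a] * q[b] / total for b in range(t)] for a in range(t)]

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 1e-12:
                    raise ValueError("matrix is not positive definite")
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low

def trace_of_inverse(a):
    """tr(a^-1) as the squared Frobenius norm of the inverse Cholesky factor."""
    low = cholesky(a)
    n = len(a)
    total = 0.0
    for k in range(n):
        z = [0.0] * n  # solves low z = e_k by forward substitution
        for i in range(k, n):
            rhs = 1.0 if i == k else 0.0
            z[i] = (rhs - sum(low[i][j] * z[j] for j in range(k, i))) / low[i][i]
        total += sum(v * v for v in z)
    return total

def a_value(design, t, rho):
    """Average variance of pairwise treatment differences, in units of sigma^2."""
    c = information_matrix(design, t, rho)
    # Adding J/t makes c invertible; its inverse is the Moore-Penrose inverse plus J/t.
    shifted = [[c[a][b] + 1.0 / t for b in range(t)] for a in range(t)]
    return 2.0 / (t - 1) * (trace_of_inverse(shifted) - 1.0)

def interchange_search(design, t, rho, iters=200, seed=1):
    """Random pairwise interchanges, recomputing the A-value from scratch each time."""
    rng = random.Random(seed)
    best = list(design)
    best_val = a_value(best, t, rho)
    for _ in range(iters):
        i, j = rng.sample(range(len(best)), 2)
        if best[i] == best[j]:
            continue
        best[i], best[j] = best[j], best[i]
        val = a_value(best, t, rho)
        if val < best_val - 1e-12:
            best_val = val                       # keep the improving swap
        else:
            best[i], best[j] = best[j], best[i]  # undo the swap
    return best, best_val
```

For example, `interchange_search([0, 0, 1, 1, 2, 2], t=3, rho=0.5)` starts from a layout with like treatments adjacent and returns a rearrangement whose A-value is no larger. Recomputing from scratch is more expensive per step than an updating formula, but the same code applies unchanged to any model for which the information matrix can be formed.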
Experimental designs are typically specified in software using application-specific jargon in preset user interfaces, and the link to the underlying statistical model is often not made explicit. This thesis introduces a model-based paradigm for design specification using extended symbolic model formulae, with a function-style syntax for specifying variance models and special model terms. This explicitly links the design and analysis steps and provides a flexible environment for extending the linear design model to novel settings such as partial composite sampling, or to traditional, yet problematic, applications such as diallel mating experiments. As an adjunct, prediction from the linear model is adopted as a template for computing the optimality criterion of a design, and a prediction pre-processor to set the design objectives is proposed.
Although developed in the context of agricultural field experiments, the technology is quite general and has wider application. In practice, optimal design generators require an initial configuration; an adaptation of design key methods for this purpose is presented, and design initialisation is flagged as an area for future work. All computational methods developed here are implemented as R packages (od, nn and key.design) and are demonstrated to be effective in different practical settings, including multi-phase experiments. Of particular interest is the use of od for generating optimal designs in genetic studies with known correlated treatment effects, an area much neglected in the literature.