Optimization problems occur widely in various domains and are of great practical importance. Better solutions to optimization problems translate into increased production or efficiency, or less waste or error. As a result, the development of optimization algorithms is a large and active area of research. Although there is a large amount of theory and techniques for solving optimization problems, there are also many open issues and questions in understanding the relationship between optimization algorithms and the problems that they are applied to.
In any optimization problem, the influence of each solution variable on the objective function, as well as the interactions between variables are very important. For example, if the variables in a problem are independent, then it might be efficiently solved by decomposition. If important dependencies exist between variables, an algorithm that is able to capture these dependencies may be able to exploit them to find good solutions. Alternatively, if a variable has a strong influence on the objective function, then an algorithm may be better off focussing its search on this variable, compared with another variable that has very little influence on the objective function.
Estimation of Distribution Algorithms (EDAs) are a class of black-box optimization algorithms that learn and sample from a probabilistic model over the solution variables to carry out the search process. The explicit model in an EDA clearly specifies how the algorithm handles problem variables during the search process. A major focus in EDA research has been the incorporation of dependency modelling using models of varying complexity (e.g. probabilistic graphical models). The intuition behind this work is that if the algorithm model is capable and successful at capturing the structure of the problem, it will produce good performance on that problem. Experimental results have confirmed this intuition, for example EDAs that model dependency information have outperformed EDAs that do not model dependencies on certain problems.
While EDAs have shown good performance results, little work has analysed the dynamics of EDA models in practice. In fact, it is not clear what kind of problem structure EDAs can successfully model, or to what extent it is necessary to successfully model problem structure in order to achieve good performance. To provide insight into these issues, a more detailed analysis of EDAs applied to specific problems is needed.
In this thesis, an experimental methodology is proposed to analyse the features of variables in continuous optimization problems and continuous EDAs. The approach is based on sampling points from the problem fitness landscape and/or the history of points visited by an EDA during the search process. These samples are then analysed in three different ways, to identify key structural variables, variable dependencies and important variables. The techniques are used to analyse a variety of test problems and EDAs optimizing the same problem set. The results confirm that the interaction between variables is complex, varies across problems and gives useful insights into the performance of EDAs on these problems. The results are categorized into different problem/algorithm behaviour types.
In continuous EDAs, a Gaussian distribution over continuous variables is commonly used, with several different covariance matrix structures ranging from diagonal i.e. Univariate Marginal Distribution Algorithm (UMDAc) to full i.e. Estimation of Multivariate Normal Algorithm (EMNAglobal). The modelling of key structural variables and correlations are already captured in standard EDAs. In contrast, so-called important variables are not identified by an EDA. In the final part of the thesis, a modified, screening EDA (sEDA) is presented which identifies important variables and uses this to control the degree of covariance modelling in the Gaussian EDA model. Compared to EMNAglobal, the algorithm provides improved numerical stability and can use a smaller selected population. Experimental results are presented to evaluate and compare the performance of the proposed algorithm to UMDAc and EMNAglobal. In its first formulation, sEDA requires a large number of function evaluations for high dimensional problems. To address this issue, a modified version of (sEDA-lite) is also proposed. Experimental results on a large set of high dimensional artificial and real-world representative problems evaluate the performance of the new algorithm and compare it with sEDA and EDA-MCC (EDA framework with Model Complexity Control), a related, recently proposed algorithm.