Optimization problems are of fundamental practical importance and can be found in almost every aspect of human endeavour. Yet remarkably, we have a very limited understanding of the nature of optimization problems and subsequently of how, why and when different algorithms perform well or poorly. The notion of a problem landscape captures the relationship between the objective function and the problem variables. It is clear that the structure of this landscape is vital in understanding optimization problems, however analysis of this structure presents major challenges. Apart from the often high-dimensionality of problem landscapes, information available in the black-box setting is limited to the solutions in the feasible search space and their respective objective function values. Landscape analysis has received some attention in the optimization literature, mainly in evolutionary computation. However there are some important limitations of this work as well as many open issues around its practical utility.
This thesis proposes a novel framework and practical techniques for the analysis of optimization problems utilizing information available in the black-box setting. The concept of length scale is proposed as a fundamental feature of both combinatorial and continuous optimization problems. Analytical properties of length scale and its distribution over a given problem are established. Techniques from statistics, set theory, visualisation and machine learning are employed to summarise and interpret sampled length scale values. From the length scale distribution, a problem similarity measure is proposed using the entropic Jeffrey divergence. This provides a means of comparing arbitrary black-box optimization problems, between combinatorial or continuous problems of possibly different dimensionality. An alternative realisation of the framework is also developed for quantifying optimization problem similarity via length scale information, based on the notion of Information Distance from Kolmogorov Complexity Theory. Information Distance is a universal distance measure between two arbitrary objects, and in practice can be approximated by Normalised Compression Distance, which relies on binary representations of the problems of interest and a lossless compressor. A novel methodology for calculating the Normalised Compression Distance in the optimization context is developed.
The techniques proposed are implemented and extensively evaluated via experiments on continuous artificial, benchmarking and real-world representative problems, and instances of NP-hard classes of combinatorial problems. The results convincingly show that length scale features are highly effective and robust for comparing and characterizing optimization problems. The calculated Jeffrey divergences and Normalised Compression Distances between length scale distributions are able to identify known similarities among problems, and also provide valuable insights into the relationship between problems. Known phase transitions in the difficulty and structure of the combinatorial problem instances are clearly reflected in the results for both similarity measures. This is remarkable given that only black-box information is used in the analysis. Finally, the theoretical and empirical relationship between the Jeffrey divergence and Normalised Compression Distance is studied. The features are shown to be different but conceptually related, and provide complementary empirical information.