A Methodology for Training XCS Agents to Optimise Team and Individual Performance on Path Planning Problems

Kuang-yuan Chen (2010). A Methodology for Training XCS Agents to Optimise Team and Individual Performance on Path Planning Problems. PhD Thesis, School of Information Technology and Electrical Engineering, The University of Queensland.

Attached Files
Name MIMEType Size
s40882453_phd_finalthesis.pdf application/pdf 1.76MB
s40882453_phd_finalthesisabstract.pdf application/pdf 25.54KB
Author Kuang-yuan Chen
Thesis Title A Methodology for Training XCS Agents to Optimise Team and Individual Performance on Path Planning Problems
School, Centre or Institute School of Information Technology and Electrical Engineering
Institution The University of Queensland
Publication date 2010-11
Thesis type PhD Thesis
Supervisors Prof. Peter A. Lindsay and Prof. Hussein A. Abbass
Total pages 180
Total colour pages 16
Total black and white pages 164
Language eng
Subjects 08 Information and Computing Sciences
Abstract/Summary Path planning for multiple agents is a challenging problem for navigation in dynamic environments, especially for real-world applications such as air transportation. This thesis explores the use of Genetic-Based Machine Learning (GBML) techniques, and the eXtended Classifier System (XCS) approach in particular, for designing general optimal or near-optimal solutions to path planning problems in Partially Observable Markov Decision Process (POMDP) environments. The approach is developed and evaluated on a simplified 3D multi-agent path planning problem from Air Traffic Management (ATM). A decoupled heuristic approach is used, in which XCS agents are first trained to find path planning solutions which are optimal or near-optimal for their particular performance objectives. Then a separate supervisor agent, analogous to an air traffic controller, is used to assign priority dynamically among aircraft agents in order to resolve conflicts. Although the case study is a path planning problem in simplified ATM environments, many challenges still needed to be overcome in order to make this approach systematic. First, the long action chain problem was encountered. As indicated in the literature, the traditional approach to using XCS on multi-step problems has difficulty in exploring all states in an environment with more than 10 steps, because solutions are biased to be accurate for those states explored more often, but inaccurate for unexplored or less-explored states. To overcome this problem, we show that by designing an appropriate local reward function, instead of using the traditional delayed-reward feedback mechanism, XCS can be used to generate optimal or near-optimal heuristic solutions to the ATM path planning problem. Secondly, safety is an important issue in air transportation. The challenge was how to ensure that safety constraints are taken into account as far as possible during learning.
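The local reward idea described above, replacing a single delayed reward at the goal with per-step feedback, can be illustrated in outline. The shaping below (rewarding each step by how much it reduces straight-line distance to the destination, plus a terminal bonus) is an assumed stand-in for illustration, not the thesis's actual hand-crafted function.

```python
import math

# Per-step ("local") reward for grid path planning, used in place of a
# single delayed reward delivered only at the goal. A step is rewarded by
# the reduction in straight-line distance to the destination (an assumed
# shaping, purely illustrative), with a terminal bonus on arrival.

def local_reward(state, next_state, goal):
    d_before = math.dist(state, goal)
    d_after = math.dist(next_state, goal)
    if next_state == goal:
        return 10.0              # terminal bonus on reaching the goal
    return d_before - d_after    # positive when the step makes progress
```

Because every step yields an informative signal, credit no longer has to propagate back along a long action chain, which is the failure mode the abstract attributes to the traditional delayed-reward mechanism.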
A general solution for biasing XCS to avoid undesired actions is developed and shown to significantly improve safety on the 3D path planning problem. The third main challenge was how to generalise from the hand-crafted local reward function solution to a more generalisable approach based simply on path fitness. The traditional approach to multi-step problems in XCS is to use delayed rewards with discounting. But this approach either fails to work or works poorly for many path planning problems. This is known as the state-aliasing problem in path planning. To overcome this problem in our ATM environment, we developed a new means of feeding delayed rewards into XCS. In this approach the user simply needs to provide the global reward function and XCS learns a suitable local reward function. We show that the approach achieves results which are almost as good as the hand-crafted solution. Another practical challenge was to reduce the size of the rule set used by XCS. The size of the rule set used in Learning Classifier Systems has a large effect on the computation time and resources used during training and evaluation. Some new genetic algorithm operators are developed and shown to reduce the rule set size by 20-40%. Finally, a Q-learning system was developed for solving the agent prioritisation problem. The controller agent uses Q-learning to assess how much flexibility agents have in their path planning solutions and to give priority to agents with less flexibility. We show that this leads to significantly improved team performance over fixed priority schemes, without unduly affecting individual agents' performance.
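The supervisor's Q-learning scheme can be sketched as a tabular learner that observes how much flexibility each conflicting agent has and learns to grant priority to the less flexible one. The state encoding (binned flexibility counts), the reward model, and the single-decision episodes below are all illustrative assumptions, not the thesis's actual design.

```python
import random
from collections import defaultdict

# Tabular Q-learning sketch for a supervisor that resolves a conflict
# between two agents. State = (binned flexibility of agent 0, binned
# flexibility of agent 1); action = index of the agent granted priority.
# Assumed reward model: +1 for prioritising the less flexible agent,
# -1 otherwise (a stand-in for the team-performance signal).

ALPHA, EPSILON = 0.1, 0.1   # learning rate, exploration rate
ACTIONS = (0, 1)            # which agent gets priority

Q = defaultdict(float)

def reward(flex, prioritised):
    other = 1 - prioritised
    return 1.0 if flex[prioritised] <= flex[other] else -1.0

def choose(state):
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def train(episodes=5000, seed=0):
    random.seed(seed)
    for _ in range(episodes):
        flex = (random.randint(0, 3), random.randint(0, 3))
        a = choose(flex)
        # One-step Q-update: each conflict is a single-decision episode,
        # so there is no successor state to bootstrap from.
        Q[(flex, a)] += ALPHA * (reward(flex, a) - Q[(flex, a)])

train()
# After training, the greedy policy grants priority to the agent with
# less flexibility, e.g. agent 0 in state (0, 3).
```

In the thesis the flexibility measure is derived from the agents' learned path planning solutions; here it is simply drawn at random to keep the sketch self-contained.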
In summary, the main contributions of this thesis are: (1) developing a novel mechanism to feed back a global reward from the last step to previous steps in XCS, in order to derive a suitable local reward function; (2) developing a way of incorporating domain knowledge into XCS to prevent undesired actions; (3) providing new parent selection and genetic algorithm operators in XCS to reduce the final rule set size, which may also be a contribution to the Genetic Algorithm field more generally; (4) demonstrating a prototype of an off-line path planning system in air transportation.
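Contribution (2), biasing XCS away from undesired actions using domain knowledge, can be read as a safety mask applied during action selection. In XCS this bias would act when the prediction array is formed from the classifier population; in the sketch below a plain dictionary of action-value predictions stands in for that array, and the altitude-floor constraint is an assumed example, not the thesis's actual mechanism.

```python
# Biasing action selection away from undesired actions with a
# domain-knowledge predicate. All names here are illustrative.

def select_action(predictions, state, is_unsafe):
    """Pick the highest-predicted action whose (state, action) pair is
    not flagged unsafe; fall back to the raw argmax if all are flagged."""
    allowed = {a: p for a, p in predictions.items() if not is_unsafe(state, a)}
    pool = allowed or predictions
    return max(pool, key=pool.get)

# Assumed safety constraint for a 3D grid: descending below a floor
# altitude is undesired.
def is_unsafe(state, action):
    x, y, alt = state
    return action == "descend" and alt <= 1

preds = {"climb": 0.2, "hold": 0.5, "descend": 0.9}
a = select_action(preds, (4, 4, 1), is_unsafe)  # "descend" is masked out
```

Masking at selection time ensures unsafe actions are avoided even before the classifiers' predictions have converged, which matches the abstract's goal of respecting safety constraints during learning.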
Keyword path planning, genetic-based machine learning, learning classifier system, conflict resolution, XCS, air traffic management
Additional Notes Colour Pages (PDF Page Numbers): 39, 40, 49, 76-81, 83-86, 139, 142, 143

Created: Thu, 05 Jan 2012, 13:10:18 EST by Kuang-yuan Chen on behalf of Library - Information Access Service