This thesis investigates visual information processing methods that a mobile robot can use to track its location. The methods learn the visual appearance of different places rather than attempting to measure the robot's position relative to landmarks, and they are designed to interface with a biologically inspired algorithm for reasoning about robot location.
Localisation by visual appearance implies a processing scheme which converts camera data to position information without employing a geometric world model. The objective of such an approach is to sidestep the difficulties associated with interpreting, and more significantly building, geometric environment models with vision. The model building problem can become particularly difficult when it occurs simultaneously with localisation (SLAM). Such a situation occurs when a robot is required to explore an environment without becoming lost in the process. Two paradigms of visual processing for this problem are examined: view learning and feature learning.
View learning is investigated in both indoor and outdoor scenarios, showing that simple processing of visual appearance can be used as an external sensor in a robotic SLAM system. A significant mapping and localisation experiment can be successfully performed using methods as simple as comparing very low resolution images. Three different image representations are considered for both indoor and outdoor environments: low resolution greyscales; edge features; and colour histograms. The consistency and usefulness of these representations were investigated by comparing accuracy and recognition rates between two indoor experiments separated by a week. Complex cell edge features are shown to be a sensible choice for indoor environments, and colour histograms are shown to provide a good generic descriptor in outdoor situations.
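The idea of comparing very low resolution images can be illustrated with a minimal sketch. The block-averaging downsample and mean-absolute-difference metric below are assumptions chosen for simplicity; the thesis's actual image pipeline is not reproduced here.

```python
import numpy as np

def downsample(img, factor):
    """Reduce resolution by block-averaging each factor x factor region
    (a stand-in for the low-resolution greyscale representation)."""
    h, w = img.shape
    h2, w2 = h // factor, w // factor
    return img[:h2 * factor, :w2 * factor] \
        .reshape(h2, factor, w2, factor).mean(axis=(1, 3))

def view_distance(a, b):
    """Mean absolute difference between two equally sized greyscale views."""
    return np.mean(np.abs(a - b))

# Synthetic 'camera frames': the second is the first scene seen again with
# mild sensor noise; the third is an unrelated scene.
rng = np.random.default_rng(0)
scene = rng.random((64, 64))
same_place = scene + rng.normal(0, 0.02, scene.shape)
other_place = rng.random((64, 64))

v1 = downsample(scene, 8)        # 8x8 low-resolution view
v2 = downsample(same_place, 8)
v3 = downsample(other_place, 8)

# Even at 8x8, the revisited place is far closer than the unrelated one.
assert view_distance(v1, v2) < view_distance(v1, v3)
```

Downsampling also provides a crude form of spatial generalisation, since small shifts in viewpoint change the block averages only slightly.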
View learning underpins many visual appearance based SLAM methods. It is tested here in both indoor and outdoor environments, and found to be practicable in the short term in both situations. The indoor experiments examine the sensitivity to environmental changes between daytime and night-time, and indicate that, given appropriate pre-processing, view learning is stable enough to construct long term maps. There is a trade-off between recognition and accuracy as the threshold of recognition is varied. View learning does not need to be an exact affair – views can ambiguously code multiple places and places can respond to multiple views. The important part of the process is the conversion of visual information into a sparse vector that is consistent with robot position.
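The recognition threshold and its trade-off can be sketched as follows. This is an illustrative assumption about how a view memory might operate, not the thesis's implementation: a view is matched against stored views and, if nothing falls under the threshold, stored as a new place code.

```python
import numpy as np

def localise(view, memory, threshold):
    """Return the index of the best-matching stored view, or store the view
    as a new place. Lowering the threshold raises accuracy but reduces
    recognition (more revisits are treated as novel); raising it does the
    reverse, at the risk of aliasing distinct places."""
    if memory:
        dists = [np.mean(np.abs(view - m)) for m in memory]
        best = int(np.argmin(dists))
        if dists[best] < threshold:
            return best              # recognised an existing place
    memory.append(view.copy())
    return len(memory) - 1           # novel view: new place code

rng = np.random.default_rng(1)
places = [rng.random(64) for _ in range(3)]
memory = []

# First pass: every view is novel and is assigned a fresh index.
first = [localise(p, memory, threshold=0.1) for p in places]
# Second pass with noisy revisits: the same indices are recognised.
second = [localise(p + rng.normal(0, 0.01, 64), memory, threshold=0.1)
          for p in places]
assert first == second == [0, 1, 2]
```

The output of `localise` is effectively a sparse (one-hot) place code, which is the form of signal the localisation algorithm consumes.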
The implementation details reveal several points about appearance based SLAM. Without panoramic sensors it is unreasonable to try to construct a map that is independent of the robot's heading. Redundancy or ambiguity in the environment (which can be amplified by the need for spatial generalisation) makes spatial filtering a necessity. The proximity of objects to the robot affects its generalisation capabilities, and the robot's movement behaviour will affect its ability to remain localised.
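Why spatial filtering resolves perceptual ambiguity can be shown with a discrete Bayes filter over place indices. The transition matrix and likelihoods below are invented for illustration; the point is that when two places look alike, the motion prior disambiguates them.

```python
import numpy as np

def bayes_step(belief, transition, likelihood):
    """One predict-update cycle of a discrete Bayes filter over places."""
    predicted = transition.T @ belief    # motion model spreads belief forward
    posterior = predicted * likelihood   # weight by view-match likelihood
    return posterior / posterior.sum()

# Three places along a corridor; the robot usually advances to the next place.
T = np.array([[0.2, 0.8, 0.0],
              [0.0, 0.2, 0.8],
              [0.8, 0.0, 0.2]])

belief = np.array([1.0, 0.0, 0.0])       # robot starts at place 0
# Places 0 and 2 look alike, so this observation is perceptually ambiguous.
ambiguous = np.array([0.45, 0.1, 0.45])

belief = bayes_step(belief, T, np.array([0.1, 0.8, 0.1]))  # clearly place 1
belief = bayes_step(belief, T, ambiguous)

# Appearance alone cannot separate places 0 and 2, but having just been at
# place 1, the motion prior concentrates belief on place 2.
assert np.argmax(belief) == 2
```

The same mechanism suppresses transient recognition errors: a single spurious view match cannot move the belief far from places reachable under the motion model.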
Feature learning provides an alternative to view learning, avoiding the possibility of an ever-expanding view memory. The ideas developed in the sparse coding literature appear to provide a principled method for deriving relevant image features for a particular type of environment. In particular, sparse coding allows for a finite feature set that is adapted to the statistics of its environment. Learning the relationship between individual image features and locations in the environment is a more difficult task, as there is more positional ambiguity present in the system. However, the ambiguity introduced can be filtered to produce consistent robot positions.
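A minimal sense of sparse coding can be conveyed with greedy matching pursuit over a fixed dictionary. This is a stand-in sketch, not the learning rule used in the thesis: the dictionary here is random rather than adapted to environment statistics, and only the encoding step is shown.

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_active):
    """Greedily encode the signal as a sparse combination of at most
    n_active dictionary atoms (columns of `dictionary`)."""
    residual = signal.copy()
    code = np.zeros(dictionary.shape[1])
    for _ in range(n_active):
        scores = dictionary.T @ residual     # correlation with each atom
        k = int(np.argmax(np.abs(scores)))
        code[k] += scores[k]
        residual = residual - scores[k] * dictionary[:, k]
    return code

rng = np.random.default_rng(2)
D = rng.normal(size=(32, 16))
D /= np.linalg.norm(D, axis=0)               # unit-norm atoms

# A signal built from two atoms yields a code dominated by those atoms.
signal = 1.5 * D[:, 3] - 0.7 * D[:, 10]
code = matching_pursuit(signal, D, n_active=4)
assert np.argmax(np.abs(code)) == 3
```

The resulting sparse code is what would then be associated with locations; because several features respond at each place, the positional ambiguity mentioned above is higher than for whole-view codes, and filtering becomes correspondingly more important.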
By using rapid learning and recognition of visual appearance, coupled with appropriate methods for spatial generalisation, robot position can be reliably tracked without a geometric model of the environment.