Wireless channel selection with restless bandits

Kuhn, Julia and Nazarathy, Yoni (2017). Wireless channel selection with restless bandits. In Richard J. Boucherie and Nico M. van Dijk (Eds.), Markov decision processes in practice (pp. 463-485). Cham, Switzerland: Springer. doi:10.1007/978-3-319-47766-4_18


Author Kuhn, Julia
Nazarathy, Yoni
Title of chapter Wireless channel selection with restless bandits
Title of book Markov decision processes in practice
Place of Publication Cham, Switzerland
Publisher Springer
Publication Year 2017
Sub-type Research book chapter (original research)
DOI 10.1007/978-3-319-47766-4_18
Open Access Status Not yet assessed
Series International series in operations research and management science
ISBN 9783319477640
9783319477664
ISSN 0884-8289
Editor Richard J. Boucherie
Nico M. van Dijk
Volume number 248
Chapter number 18
Start page 463
End page 485
Total pages 23
Total chapters 21
Collection year 2018
Language eng
Abstract/Summary Wireless devices are often able to communicate on several alternative channels; for example, cellular phones may use several frequency bands and are equipped with base-station communication capability together with WiFi and Bluetooth communication. Automatic decision support systems in such devices need to decide which channels to use at any given time so as to maximize the long-run average throughput. A good decision policy needs to take into account that, due to cost, energy, technical, or performance constraints, the state of a channel is only sensed when it is selected for transmission. Therefore, the greedy strategy of always exploiting those channels assumed to yield the currently highest transmission rate is not necessarily optimal with respect to long-run average throughput. Rather, it may be favourable to give some priority to the exploration of channels of uncertain quality. In this chapter we model such on-line control problems as a special type of Restless Multi-Armed Bandit (RMAB) problem in a partially observable Markov decision process framework. We refer to such models as Reward-Observing Restless Multi-Armed Bandit (RORMAB) problems. These types of optimal control problems were previously considered in the literature in the context of: (i) Gilbert-Elliot (GE) channels (where each channel is modelled as a two-state Markov chain), and (ii) Gaussian autoregressive (AR) channels of order 1. A virtue of this chapter is that we unify the presentation of both types of models under the umbrella of our newly defined RORMAB. Further, since RORMAB is a special type of RMAB, we also present an account of RMAB problems together with a pedagogical development of the Whittle index, which provides an approximately optimal control method. Numerical examples are provided.
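
The exploration-exploitation tension described in the abstract can be made concrete with a minimal simulation sketch. The following Python code is not taken from the chapter; the channel parameters, function names, and the myopic policy shown are illustrative assumptions. It simulates reward-observing Gilbert-Elliot channels: each channel is a hidden two-state Markov chain, its state is observed only when it is selected, the belief of the sensed channel resets according to the observation, and the beliefs of unsensed channels are propagated through the Markov dynamics. The greedy policy below always exploits the highest current belief; the chapter's point is that such a policy is not necessarily optimal for long-run average throughput, which motivates index policies such as the Whittle index.

    # Illustrative sketch only (not the authors' code): myopic channel
    # selection over reward-observing Gilbert-Elliot channels.
    import random

    # Hypothetical per-channel transition probabilities:
    # p01 = P(bad -> good), p10 = P(good -> bad); reward 1 in the good state.
    CHANNELS = [
        {"p01": 0.20, "p10": 0.10},  # slowly varying channel
        {"p01": 0.50, "p10": 0.50},  # fast, nearly memoryless channel
        {"p01": 0.05, "p10": 0.30},  # mostly bad channel
    ]

    def step_state(state, ch):
        """Advance one channel's hidden two-state Markov chain by one slot."""
        if state == 1:
            return 0 if random.random() < ch["p10"] else 1
        return 1 if random.random() < ch["p01"] else 0

    def propagate_belief(b, ch):
        """Belief update for an unobserved channel:
        P(good next) = b*(1 - p10) + (1 - b)*p01."""
        return b * (1.0 - ch["p10"]) + (1.0 - b) * ch["p01"]

    def simulate_myopic(horizon=100_000, seed=0):
        random.seed(seed)
        states = [1 for _ in CHANNELS]                  # hidden channel states
        beliefs = [ch["p01"] / (ch["p01"] + ch["p10"])  # stationary P(good)
                   for ch in CHANNELS]
        total_reward = 0
        for _ in range(horizon):
            # Greedy (myopic) selection: exploit the highest current belief.
            chosen = max(range(len(CHANNELS)), key=lambda i: beliefs[i])
            reward = states[chosen]                     # observe only this channel
            total_reward += reward
            # Sensed channel: belief resets to a known one-step prediction.
            ch = CHANNELS[chosen]
            beliefs[chosen] = (1.0 - ch["p10"]) if reward == 1 else ch["p01"]
            # Unsensed channels: propagate beliefs through the Markov dynamics.
            for i, chi in enumerate(CHANNELS):
                if i != chosen:
                    beliefs[i] = propagate_belief(beliefs[i], chi)
            # All hidden states evolve, whether sensed or not ("restless").
            states = [step_state(s, chi) for s, chi in zip(states, CHANNELS)]
        return total_reward / horizon

    if __name__ == "__main__":
        print("long-run average throughput (myopic):", simulate_myopic())

Because every channel keeps evolving whether or not it is selected, the problem is restless in Whittle's sense; replacing the greedy argmax above with an index computed per channel is the kind of policy the chapter develops.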
Q-Index Code B1
Q-Index Status Provisional Code
Institutional Status UQ

 