A Q-Learning Algorithm with Continuous State Space

Kengy Barty, Pierre Girardeau, Jean-Sébastien Roy and
Cyrille Strugarek

October 2006

Publication type:

Article (peer-reviewed journals)

Journal:

Optimization Online

External link:

http://www.optimization-online.org/DB_FILE/2006/10/1477.pdf

HAL:

hal-00977539

Keywords:

Q-Learning, Continuous state space, kernels

Abstract:

In this paper we study a Markov Decision Problem (MDP) with a continuous state space and discrete decision variables. We propose an extension of the Q-learning algorithm, which Watkins introduced in 1989 for completely discrete MDPs. Our algorithm relies on stochastic approximation and functional estimation, and uses kernels to update the Q-functions locally. We give a convergence proof for this algorithm under the usual assumptions. Finally, we illustrate the algorithm by solving the classical mountain car task with a continuous state space.
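
To make the idea of a kernel-based local update concrete, here is a minimal Python sketch. It is not taken from the paper. It assumes a Gaussian kernel with a fixed bandwidth, a Q-function stored as a weighted sum of kernels centred on visited states, and a step size supplied by the caller. The class name `KernelQ` and all of its parameters are hypothetical.

```python
import math


class KernelQ:
    """Q-function over a continuous state and a finite action set.

    Illustrative assumption: Q(x, a) is a weighted sum of Gaussian
    kernels centred on previously visited states.
    """

    def __init__(self, n_actions, bandwidth, gamma):
        self.n_actions = n_actions
        self.bandwidth = bandwidth  # kernel width (assumed fixed)
        self.gamma = gamma          # discount factor
        self.centers = [[] for _ in range(n_actions)]
        self.weights = [[] for _ in range(n_actions)]

    def kernel(self, x, c):
        # Unnormalised Gaussian kernel, equal to 1 when x == c.
        d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        return math.exp(-0.5 * d2 / self.bandwidth ** 2)

    def value(self, x, a):
        # Current estimate of Q(x, a); zero before any update.
        return sum(w * self.kernel(x, c)
                   for c, w in zip(self.centers[a], self.weights[a]))

    def update(self, x, a, reward, x_next, step):
        # Temporal-difference error of the observed transition.
        target = reward + self.gamma * max(
            self.value(x_next, b) for b in range(self.n_actions))
        delta = target - self.value(x, a)
        # Local update: add a kernel bump centred on x, scaled by
        # step * delta, so only states near x are affected.
        self.centers[a].append(tuple(x))
        self.weights[a].append(step * delta)
        return delta
```

Each call to `update` shifts the estimate for the chosen action in a neighbourhood of the visited state and leaves distant states and other actions essentially unchanged. Storage grows with the number of updates in this sketch, and a practical implementation would need to bound it.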

BibTeX:

@article{Bar-Gir-Roy-Str-2006,
    author={Kengy Barty and Pierre Girardeau and Jean-Sébastien Roy and
            Cyrille Strugarek},
    title={A Q-Learning Algorithm with Continuous State Space},
    journal={Optimization Online},
    year={2006},
    month={10},
}