Q-Learning with Continuous State Spaces and Finite Decision Set

Kengy Barty, Pierre Girardeau, Jean-Sébastien Roy and Cyrille Strugarek
April 2007
Publication type:
International conference with proceedings
Conference:
IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL)
Abstract:
This paper presents an original technique for computing the optimal policy of a Markov decision problem with a continuous state space and discrete decision variables. We propose an extension of the Q-learning algorithm, introduced by Watkins in 1989 for discrete Markov decision problems. Our algorithm relies on stochastic approximation and functional estimation, and uses kernels to locally update the Q-functions. We state, under mild assumptions, a convergence theorem for this algorithm. Finally, we illustrate the algorithm on two classical problems: the mountain car task and the puddle world task.
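Illustrative sketch (Python):
The following is a minimal sketch of the kind of kernel-based Q-learning the abstract describes: a continuous one-dimensional state, a finite action set, and temporal-difference updates spread locally through a Gaussian kernel. The toy environment, the fixed grid of kernel centers, the bandwidth, the reward-maximizing convention, and the step-size schedule are all illustrative assumptions here, not the paper's exact construction.

import numpy as np

rng = np.random.default_rng(0)

# Fixed kernel centers on the 1-D state space [0, 1] (assumed representation);
# the finite decision set is {0: move left, 1: move right}.
centers = np.linspace(0.0, 1.0, 21)
n_actions = 2
weights = np.zeros((len(centers), n_actions))  # Q(x, a) = sum_i w[i, a] * K(x, c_i)
bandwidth = 0.1                                # Gaussian kernel width (assumed)

def kernel(x):
    # Normalized Gaussian kernel weights of state x w.r.t. all centers.
    k = np.exp(-0.5 * ((x - centers) / bandwidth) ** 2)
    return k / k.sum()

def q_values(x):
    # Kernel-smoothed Q-values for every action at state x.
    return kernel(x) @ weights

def step(x, a):
    # Toy dynamics (assumed): noisy left/right moves, reward for reaching x = 1.
    x_next = np.clip(x + (0.1 if a == 1 else -0.1) + rng.normal(0.0, 0.01), 0.0, 1.0)
    reward = 1.0 if x_next >= 1.0 else 0.0
    return x_next, reward, x_next >= 1.0

gamma, epsilon = 0.95, 0.1
for episode in range(500):
    x = rng.uniform(0.0, 0.5)
    for t in range(200):
        # Epsilon-greedy choice over the finite decision set.
        a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(q_values(x)))
        x_next, r, done = step(x, a)
        # Standard Q-learning temporal-difference error.
        td = r + (0.0 if done else gamma * np.max(q_values(x_next))) - q_values(x)[a]
        # Stochastic-approximation step: the kernel spreads the update locally
        # around the visited state, as in the functional update the abstract describes.
        lr = 1.0 / (1.0 + 0.01 * episode)      # decreasing step size (assumed schedule)
        weights[:, a] += lr * td * kernel(x)
        x = x_next
        if done:
            break

print("Greedy action at x = 0.2:", int(np.argmax(q_values(0.2))))  # expect 1 (right)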
BibTeX:
@inproceedings{Bar-Gir-Roy-Str-2007,
    author={Kengy Barty and Pierre Girardeau and Jean-Sébastien Roy and Cyrille Strugarek},
    title={Q-Learning with Continuous State Spaces and Finite Decision Set},
    booktitle={IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL)},
    doi={10.1109/ADPRL.2007.368209},
    year={2007},
    month={apr},
    pages={346--351},
}