Computing Reviews

Model selection in reinforcement learning
Farahmand A., Szepesvári C. Machine Learning 85(3): 299-332, 2011. Type: Article
Date Reviewed: 05/03/12

This paper considers the problem of finding an optimal action-value function, and with it the action to perform, in the setting of batch reinforcement learning. The learning problem is to identify the best action to take in a Markov decision process. The eventual decision is derived from a set of candidate action-value functions, each of which selects an action based on the current state of the process. Thus, given a list of action-value functions and a dataset of sampled transitions from a Markov decision process, the “goal is [to choose] the action-value function with the smallest Bellman error.” The heart of the problem lies in the fact that the Bellman error itself must be estimated from the data.
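
To make the selection criterion concrete, here is a minimal Python sketch of the naive approach: score each candidate by its average squared Bellman residual on the sampled transitions and keep the minimizer. The names (`Q`, `transitions`, `select_q`) are illustrative rather than the paper's notation, and the plain residual average shown here is biased when transitions are stochastic, which is exactly the estimation difficulty the paper addresses.

```python
import numpy as np

def empirical_bellman_error(Q, transitions, actions, gamma=0.99):
    """Average squared one-step Bellman residual of a candidate
    action-value function Q over sampled transitions (s, a, r, s')."""
    residuals = []
    for s, a, r, s_next in transitions:
        # Greedy Bellman target built from the single sampled next state.
        target = r + gamma * max(Q(s_next, b) for b in actions)
        residuals.append((Q(s, a) - target) ** 2)
    # Caution: with stochastic transitions this plain average is a
    # *biased* estimate of the true Bellman error -- the heart of
    # the problem the paper sets out to solve.
    return float(np.mean(residuals))

def select_q(candidates, transitions, actions, gamma=0.99):
    """Return the candidate with the smallest estimated Bellman error."""
    return min(candidates, key=lambda Q: empirical_bellman_error(
        Q, transitions, actions, gamma))
```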

An algorithm, which the authors call BErMin, is described. They show that this algorithm has an oracle-like property, in that the estimator’s error differs from the true “error by only a constant factor and a small remainder term that vanishes” as the sample size increases. The algorithm works by using only part of the input data to generate the action choice function, reserving the rest to estimate the error. The algorithm is explained in some detail, following a brief but clear preliminary overview. The second part of the paper gives a theoretical justification for the procedure, which provides additional insight into the various constants used in setting up the algorithm. Technical proofs related to the estimates used are collected in an appendix.
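
The data-splitting structure the review describes can be sketched as follows. This is only a schematic skeleton under stated assumptions, not the paper's actual BErMin procedure: `error_estimate` and `split_frac` are hypothetical placeholders, and BErMin's bias correction and the constants behind its oracle inequality are omitted.

```python
import numpy as np

def split_and_select(candidates, transitions, actions, error_estimate,
                     split_frac=0.5, seed=0):
    """Skeleton of the two-part scheme: one part of the batch is used
    in constructing the estimator, the rest is used to estimate each
    candidate's error; the minimizer is returned."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(transitions))
    cut = int(split_frac * len(transitions))
    fit_part = [transitions[i] for i in idx[:cut]]    # builds the estimator
    score_part = [transitions[i] for i in idx[cut:]]  # estimates the error
    scores = [error_estimate(Q, fit_part, score_part, actions)
              for Q in candidates]
    return candidates[int(np.argmin(scores))]
```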

Reviewer: J. P. E. Hodgson | Review #: CR140110 (1209-0956)
