Computing Reviews, the leading online review service for computing literature.

Model selection in reinforcement learning
Farahmand A., Szepesvári C. Machine Learning 85(3): 299-332, 2011. Type: Article

Date Reviewed: May 3 2012

This paper considers the problem of choosing an action-value function, and hence the action to perform, in the context of batch reinforcement learning. The learning problem is to identify the best action to take in each state of a Markov decision process. The eventual decision is derived from an action-value function: in each state of the process, the action with the highest value is selected. Thus, given a list of candidate action-value functions and a dataset of sampled transitions from the Markov decision process, the “goal is [to choose] the action-value function with the smallest Bellman error.” The heart of the problem lies in the fact that the Bellman error itself must be estimated from the data.

An algorithm, which the authors call BERMIN, is described. They show that this algorithm has oracle-like properties, in that the estimator’s error differs from the true “error by only a constant factor and a small remainder term that vanishes” as the sample size increases. The algorithm works by using only part of the input data to generate the action choice function, and using the rest of the input to estimate the error.

The algorithm is explained in some detail, with a brief but clear preliminary overview. The second part of the paper gives a theoretical justification for the procedure, which also provides additional insight into the various constants used in setting up the algorithm. Technical proofs related to the estimates used are provided in an appendix.
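To make the idea of selecting by estimated Bellman error concrete, here is a minimal Python sketch. It is not the authors' BERMIN algorithm: it assumes a small tabular problem, estimates the Bellman backup of each candidate by simple per-pair averaging on one part of the data, measures the squared gap on the remaining part, and omits any complexity penalty or constants from the paper's analysis. The function names `estimate_bellman_error` and `select_by_holdout`, the array layout, and the 50/50 split are all hypothetical choices made for illustration.

```python
import numpy as np


def estimate_bellman_error(q, fit_data, eval_data, gamma):
    """Estimate the squared Bellman error of a tabular action-value function.

    q         : float array of shape (n_states, n_actions)
    fit_data  : tuple (s, a, r, s_next) of 1-D arrays, used for the regression step
    eval_data : tuple in the same layout, used only to measure the error
    """
    s, a, r, s_next = fit_data
    # One-step targets r + gamma * max_a' q(s', a') for each sampled transition.
    targets = r + gamma * q[s_next].max(axis=1)
    # "Regression" in the tabular case: average the targets per (state, action).
    sums = np.zeros_like(q, dtype=float)
    counts = np.zeros_like(q, dtype=float)
    np.add.at(sums, (s, a), targets)
    np.add.at(counts, (s, a), 1.0)
    # Assumption: pairs never seen in fit_data fall back to q (zero contribution).
    backup = np.where(counts > 0, sums / np.maximum(counts, 1.0), q)
    s_eval, a_eval, _, _ = eval_data
    return float(np.mean((q[s_eval, a_eval] - backup[s_eval, a_eval]) ** 2))


def select_by_holdout(candidates, data, gamma, fit_fraction=0.5):
    """Return (index of the candidate with the smallest estimated error, all errors)."""
    n = len(data[0])
    k = int(n * fit_fraction)
    fit_data = tuple(x[:k] for x in data)
    eval_data = tuple(x[k:] for x in data)
    errors = [estimate_bellman_error(q, fit_data, eval_data, gamma) for q in candidates]
    return int(np.argmin(errors)), errors
```

Averaging the targets before squaring matters when transitions are stochastic: squaring single-sample targets directly would add the variance of the next state to the estimate, which is one reason the error has to be estimated rather than read off the data.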

Reviewer: J. P. E. Hodgson	Review #: CR140110 (1209-0956)

Learning (I.2.6)

Adaptive And Iterative Quadrature (G.1.4 ... )

Concept Learning (I.2.6 ... )

Model Classification (I.6.1 ... )

Model Development (I.6.5)
