Computing Reviews
Model selection in reinforcement learning
Farahmand A., Szepesvári C. Machine Learning 85(3): 299-332, 2011. Type: Article
Date Reviewed: May 3 2012

This paper addresses model selection in batch reinforcement learning: choosing, from a set of candidate action-value functions, the one that best supports action selection. The underlying learning problem is to identify the best action to take in each state of a Markov decision process. A decision rule is derived from an action-value function by selecting, in each state, the action with the highest value. Thus, given a list of candidate action-value functions and a dataset of sampled transitions from the decision process, the “goal is [to choose] the action-value function with the smallest Bellman error.” The heart of the problem lies in the fact that the Bellman error cannot be computed directly from the samples and must itself be estimated.
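For readers who want the quantity spelled out, the Bellman error can be written as follows. This uses standard textbook notation that is assumed here rather than taken from the review: reward r, discount factor γ, transition kernel P, and a data distribution ν over state-action pairs.

```latex
(T^{*}Q)(x,a) = r(x,a) + \gamma \int \max_{a'} Q(x',a') \, P(dx' \mid x,a),
\qquad
\mathrm{BE}(Q) = \lVert Q - T^{*}Q \rVert_{\nu}^{2} .
```

Because only sampled transitions (x, a, r, x') are available, the backup T*Q is never observed directly. Replacing it with the one-sample target r + γ max Q(x', a') inflates the squared residual by the conditional variance of that target, which is one way to see why the error has to be estimated with some care.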

The authors describe an algorithm, which they call BErMin. They show that it has an oracle-like property: the estimator’s error differs from that of an oracle, which would pick the candidate with the minimum true Bellman error, “by only a constant factor and a small remainder term that vanishes” as the sample size increases. The algorithm uses only part of the input data to generate the action-choice function and uses the rest to estimate the error. The algorithm is explained in some detail, after a brief but clear preliminary overview. The second part of the paper gives a theoretical justification for the procedure, which also provides insight into the various constants used in setting up the algorithm. Technical proofs related to the estimates are provided in an appendix.
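The data-splitting idea can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it is a simplified hold-out procedure for a small tabular problem, with no correction or penalty terms. All names here (`fit_backup`, `estimated_bellman_error`, `select_candidate`) and the two-state toy process are hypothetical. One part of the data fits an estimate of each candidate's Bellman backup, and the other part scores the candidate against that estimate.

```python
import numpy as np

def fit_backup(q, train, gamma):
    """Tabular regression: average the targets r + gamma * max_a' q[s', a'] per (s, a)."""
    sums = np.zeros_like(q, dtype=float)
    counts = np.zeros_like(q, dtype=float)
    for s, a, r, s_next in train:
        sums[s, a] += r + gamma * q[s_next].max()
        counts[s, a] += 1
    # Pairs never seen in the training part fall back to q, so they add zero error.
    return np.where(counts > 0, sums / np.maximum(counts, 1), q)

def estimated_bellman_error(q, train, held_out, gamma):
    """Mean squared gap between q and its fitted backup, measured on held-out data."""
    backup = fit_backup(q, train, gamma)
    return float(np.mean([(q[s, a] - backup[s, a]) ** 2 for s, a, _, _ in held_out]))

def select_candidate(candidates, transitions, gamma, train_fraction=0.5, seed=0):
    """Split the transitions once, then return the index of the lowest estimated error."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(transitions))
    cut = int(train_fraction * len(transitions))
    train = [transitions[i] for i in order[:cut]]
    held_out = [transitions[i] for i in order[cut:]]
    errors = [estimated_bellman_error(q, train, held_out, gamma) for q in candidates]
    return int(np.argmin(errors)), errors

# Hypothetical toy process: action a moves to state a and pays reward a.
gamma = 0.5
transitions = [(s, a, float(a), a) for s in (0, 1) for a in (0, 1)] * 10
q_zero = np.zeros((2, 2))                    # a poor candidate
q_star = np.array([[1.0, 2.0], [1.0, 2.0]])  # fixed point of the backup for this toy process
best, errors = select_candidate([q_zero, q_star], transitions, gamma)
```

In this deterministic toy case the fitted backup of `q_star` coincides with `q_star` itself, so its estimated error is zero and it is selected over `q_zero`. With noisy transitions, the quality of the fitted backup and the size of the candidate set both matter, which is the kind of issue the paper's analysis addresses.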

Reviewer:  J. P. E. Hodgson Review #: CR140110 (1209-0956)
Learning (I.2.6)
Adaptive And Iterative Quadrature (G.1.4 ...)
Concept Learning (I.2.6 ...)
Model Classification (I.6.1 ...)
Model Development (I.6.5)
 
