Computing Reviews
Answer set programming for non-stationary Markov decision processes
Ferreira L., Bianchi R., Santos P., Lopez de Mantaras R. Applied Intelligence 47(4): 993-1007, 2017. Type: Article
Date Reviewed: Mar 13 2018

Problem solving with computers often involves the exploration of paths from an initial state to a goal state. In addition to the size of this search space, there are many factors complicating this approach, especially in realistic environments. In their contribution, the authors combine three main approaches to deal with “non-stationary domains” prone to changes in the states, actions, or reward functions.

At the core are Markov decision processes (MDPs), used to formalize decision-making problems by identifying states, actions, a transition function between states through actions, and a reward function for reaching a state. Finding a solution to a problem then means identifying a sequence of actions that leads from the initial state to the goal state; the sequence that maximizes the overall reward is the best such solution.
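To make the formalism concrete, here is a minimal value-iteration sketch for a toy three-state MDP; this is my own illustrative example, not taken from the paper, and the states, actions, and rewards are invented:

```python
# Value iteration on a toy three-state MDP (illustrative only).
# States: 0, 1, 2 (goal). Actions: "a", "b".
# P[s][action] = list of (next_state, probability); R[s] = reward on entering s.
P = {
    0: {"a": [(1, 1.0)], "b": [(0, 1.0)]},
    1: {"a": [(2, 1.0)], "b": [(0, 1.0)]},
    2: {"a": [(2, 1.0)], "b": [(2, 1.0)]},  # goal is absorbing
}
R = {0: 0.0, 1: 0.0, 2: 1.0}
gamma = 0.9  # discount factor

# Repeatedly back up the best expected discounted reward for each state.
V = {s: 0.0 for s in P}
for _ in range(100):
    V = {s: max(sum(p * (R[s2] + gamma * V[s2]) for s2, p in outcomes)
                for outcomes in P[s].values())
         for s in P}

# The greedy policy picks, in each state, the action with the best backup.
policy = {s: max(P[s], key=lambda a: sum(p * (R[s2] + gamma * V[s2])
                                         for s2, p in P[s][a]))
          for s in P}
```

The resulting policy takes action "a" in states 0 and 1, the action sequence leading to the goal.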

In reinforcement learning (RL), an agent learns to maximize its cumulative reward by taking actions and observing their outcomes. Answer set programming (ASP) is based on logic programming and can be used to reduce large search spaces by identifying the stable models of a program. With ASP identifying a core set of states, RL can be applied to non-stationary problems as well.
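As a rough illustration of the RL side, the sketch below runs tabular Q-learning on a toy two-state chain; this is the generic textbook update rule, not the authors' algorithm, and all names and parameters are assumptions of mine:

```python
import random

random.seed(0)

# Tabular Q-learning on a tiny chain: states 0, 1, goal 2; actions "a", "b".
step = {  # deterministic transitions: (state, action) -> next state
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 2, (1, "b"): 0,
}
def reward(s2):
    return 1.0 if s2 == 2 else 0.0

Q = {(s, a): 0.0 for s in (0, 1) for a in ("a", "b")}
alpha, gamma, eps = 0.5, 0.9, 0.2  # learning rate, discount, exploration

for _ in range(500):  # episodes
    s = 0
    while s != 2:
        # Epsilon-greedy action selection.
        if random.random() < eps:
            a = random.choice(("a", "b"))
        else:
            a = max(("a", "b"), key=lambda x: Q[(s, x)])
        s2 = step[(s, a)]
        # Q-learning backup toward reward plus discounted best next value.
        target = reward(s2) + (0.0 if s2 == 2
                               else gamma * max(Q[(s2, x)] for x in ("a", "b")))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2
```

After training, the greedy action in both states is "a", matching the value-iteration solution without ever being given the transition model explicitly.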

The authors evaluate their approach on a set of problems in a 2D grid world with obstacles, where an agent has to find a path from a starting point to an end point. In situations where obstacles constrain the possible paths, the reduction of the search space can be significant. The combined approach is most beneficial when the configuration of the world changes during the experiment, although its performance depends on the degree of change that the environment undergoes.
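The kind of search-space reduction at play can be illustrated with a simple reachability filter on such a grid. This stand-in for ASP's pruning role is my own sketch (the grid size and obstacle layout are invented), not the paper's encoding:

```python
from collections import deque

# A 5x5 grid with obstacles; only cells reachable from the start need to
# become MDP states (the pruning role ASP plays in the paper).
W, H = 5, 5
obstacles = {(1, 1), (1, 2), (1, 3), (3, 1), (3, 2), (3, 3)}
start, goal = (0, 0), (4, 4)

def neighbors(cell):
    x, y = cell
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= nx < W and 0 <= ny < H and (nx, ny) not in obstacles:
            yield (nx, ny)

# Breadth-first search from the start collects the reachable free cells.
reachable, frontier = {start}, deque([start])
while frontier:
    for n in neighbors(frontier.popleft()):
        if n not in reachable:
            reachable.add(n)
            frontier.append(n)

print(len(reachable), "of", W * H, "cells kept")  # prints "19 of 25 cells kept"
```

Here six of the twenty-five cells are blocked, so the state space shrinks by roughly a quarter before any learning begins; with more constrained layouts the reduction grows accordingly.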

While I believe that further validation is needed to demonstrate the practical benefits of this approach combining ASP, MDPs, and RL, I found the combination of logic-based reasoning with learning very interesting.

Reviewer: Franz Kurfess | Review #: CR145910 (1806-0327)
Markov Processes (G.3 ...)
Learning (I.2.6)