Computing Reviews
Answer set programming for non-stationary Markov decision processes
Ferreira L., Bianchi R., Santos P., Lopez de Mantaras R. Applied Intelligence 47(4): 993-1007, 2017. Type: Article
Date Reviewed: Mar 13 2018

Problem solving with computers often involves the exploration of paths from an initial state to a goal state. In addition to the size of this search space, there are many factors complicating this approach, especially in realistic environments. In their contribution, the authors combine three main approaches to deal with “non-stationary domains” prone to changes in the states, actions, or reward functions.

At the core are Markov decision processes (MDPs), used to formalize decision-making problems by identifying states, actions, a transition function between states through actions, and a reward function for reaching a state. Finding a solution to a problem then means identifying a sequence of actions that leads from the initial state to the goal state; the sequence that maximizes the overall reward is the best such solution.
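To make the formalism concrete, here is a minimal value-iteration sketch for a toy three-state MDP; this is my own illustrative example, not taken from the paper, and the states, actions, and rewards are invented:

```python
# Value iteration on a toy three-state MDP (illustrative only).
# States: 0, 1, 2 (goal). Actions: "a", "b".
# P[s][action] = list of (next_state, probability); R[s] = reward on entering s.
P = {
    0: {"a": [(1, 1.0)], "b": [(0, 1.0)]},
    1: {"a": [(2, 1.0)], "b": [(0, 1.0)]},
    2: {"a": [(2, 1.0)], "b": [(2, 1.0)]},  # goal is absorbing
}
R = {0: 0.0, 1: 0.0, 2: 1.0}
gamma = 0.9  # discount factor

# Repeatedly back up the best expected discounted reward for each state.
V = {s: 0.0 for s in P}
for _ in range(100):
    V = {s: max(sum(p * (R[s2] + gamma * V[s2]) for s2, p in outcomes)
                for outcomes in P[s].values())
         for s in P}

# The greedy policy picks, in each state, the action with the best backup.
policy = {s: max(P[s], key=lambda a: sum(p * (R[s2] + gamma * V[s2])
                                         for s2, p in P[s][a]))
          for s in P}
```

The resulting policy takes action "a" in states 0 and 1, the action sequence leading to the goal.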

In reinforcement learning (RL), an agent learns to maximize its cumulative reward by taking actions and observing their outcomes. Answer set programming (ASP) is based on logic programming and can be used to reduce large search spaces by identifying the stable models of a program. With ASP identifying a core set of states, RL can be applied to non-stationary problems as well.
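As a rough illustration of the RL side, the sketch below runs tabular Q-learning on a toy two-state chain; this is the generic textbook update rule, not the authors' algorithm, and all names and parameters are assumptions of mine:

```python
import random

random.seed(0)

# Tabular Q-learning on a tiny chain: states 0, 1, goal 2; actions "a", "b".
step = {  # deterministic transitions: (state, action) -> next state
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 2, (1, "b"): 0,
}
def reward(s2):
    return 1.0 if s2 == 2 else 0.0

Q = {(s, a): 0.0 for s in (0, 1) for a in ("a", "b")}
alpha, gamma, eps = 0.5, 0.9, 0.2  # learning rate, discount, exploration

for _ in range(500):  # episodes
    s = 0
    while s != 2:
        # Epsilon-greedy action selection.
        if random.random() < eps:
            a = random.choice(("a", "b"))
        else:
            a = max(("a", "b"), key=lambda x: Q[(s, x)])
        s2 = step[(s, a)]
        # Q-learning backup toward reward plus discounted best next value.
        target = reward(s2) + (0.0 if s2 == 2
                               else gamma * max(Q[(s2, x)] for x in ("a", "b")))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2
```

After training, the greedy action in both states is "a", matching the value-iteration solution without ever being given the transition model explicitly.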

The authors evaluate their approach on a set of problems in a 2D grid world with obstacles, where an agent has to find a path from a starting point to an end point. In situations where obstacles constrain the possible paths, the reduction of the search space can be significant. The combined approach is most beneficial when the configuration of the world changes during the experiment, although its performance depends on the degree of change that the environment undergoes.
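The kind of search-space reduction at play can be illustrated with a simple reachability filter on such a grid. This stand-in for ASP's pruning role is my own sketch (the grid size and obstacle layout are invented), not the paper's encoding:

```python
from collections import deque

# A 5x5 grid with obstacles; only cells reachable from the start need to
# become MDP states (the pruning role ASP plays in the paper).
W, H = 5, 5
obstacles = {(1, 1), (1, 2), (1, 3), (3, 1), (3, 2), (3, 3)}
start, goal = (0, 0), (4, 4)

def neighbors(cell):
    x, y = cell
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= nx < W and 0 <= ny < H and (nx, ny) not in obstacles:
            yield (nx, ny)

# Breadth-first search from the start collects the reachable free cells.
reachable, frontier = {start}, deque([start])
while frontier:
    for n in neighbors(frontier.popleft()):
        if n not in reachable:
            reachable.add(n)
            frontier.append(n)

print(len(reachable), "of", W * H, "cells kept")  # prints "19 of 25 cells kept"
```

Here six of the twenty-five cells are blocked, so the state space shrinks by roughly a quarter before any learning begins; with more constrained layouts the reduction grows accordingly.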

While I believe that further validation is needed to demonstrate the practical benefits of this approach combining ASP, MDPs, and RL, I found the combination of logic-based reasoning with learning very interesting.

Reviewer: Franz Kurfess | Review #: CR145910 (1806-0327)
Markov Processes (G.3 ...)
Learning (I.2.6)