Computing Reviews

Answer set programming for non-stationary Markov decision processes
Ferreira L., Bianchi R., Santos P., Lopez de Mantaras R. Applied Intelligence 47(4): 993-1007, 2017. Type: Article
Date Reviewed: 03/13/18

Problem solving with computers often involves the exploration of paths from an initial state to a goal state. In addition to the size of this search space, there are many factors complicating this approach, especially in realistic environments. In their contribution, the authors combine three main approaches to deal with “non-stationary domains” prone to changes in the states, actions, or reward functions.

At the core are Markov decision processes (MDPs), which formalize decision-making problems in terms of states, actions, a transition function that moves between states through actions, and a reward function for reaching a state. Finding a solution to a problem then means identifying a sequence of actions that leads from the initial state to the goal state; the sequence that maximizes the overall reward is the best such solution.
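To make the MDP ingredients concrete, here is a minimal sketch of a small grid problem solved with value iteration, a standard textbook method for a fully known MDP. It is not taken from the paper: the 3x3 grid, the single obstacle, the reward values (10 for reaching the goal, -1 per other step), the discount factor, and all function names are assumptions chosen for illustration.

```python
# Hypothetical deterministic grid MDP; all names and numbers are illustrative.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def make_grid_mdp(rows, cols, obstacles, goal):
    """Return the state set and a transition/reward function for the grid."""
    states = {(r, c) for r in range(rows) for c in range(cols)} - set(obstacles)

    def step(state, action):
        dr, dc = MOVES[action]
        nxt = (state[0] + dr, state[1] + dc)
        if nxt not in states:  # walls and obstacles leave the agent in place
            nxt = state
        reward = 10.0 if nxt == goal else -1.0
        return nxt, reward

    return states, step


def value_iteration(states, step, goal, gamma=0.9, tol=1e-9):
    """Repeat Bellman backups until the state values stop changing."""
    values = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if s == goal:  # the goal is terminal; its value stays 0
                continue
            best = max(r + gamma * values[n]
                       for n, r in (step(s, a) for a in MOVES))
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_path(start, goal, values, step, gamma=0.9, max_steps=50):
    """Follow the action with the best one-step lookahead from start."""
    path, s = [start], start
    while s != goal and len(path) <= max_steps:
        a = max(MOVES, key=lambda m: step(s, m)[1] + gamma * values[step(s, m)[0]])
        s = step(s, a)[0]
        path.append(s)
    return path


states, step = make_grid_mdp(3, 3, obstacles={(1, 1)}, goal=(2, 2))
values = value_iteration(states, step, goal=(2, 2))
print(greedy_path((0, 0), (2, 2), values, step))
```

The printed path is one of the shortest routes around the obstacle. Value iteration needs the transition and reward functions up front, which is exactly what an agent in an unknown or changing environment does not have.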

In reinforcement learning (RL), an agent tries to maximize its accumulated reward by trying actions and observing their outcomes. Answer set programming (ASP) is based on logic programming, and can be used to reduce large search spaces by identifying stable models of a program. With ASP identifying a core set of states, RL can be applied to non-stationary problems as well.
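The division of labor described here can be sketched roughly as follows. This is an illustration under strong assumptions, not the authors' implementation: a plain Python filter (the hypothetical allowed_actions) stands in for the ASP solver, and ordinary tabular Q-learning stands in for the RL component. The grid, rewards, and hyperparameters are likewise invented for the example.

```python
import random

# Hypothetical 3x3 grid world; a Python filter plays the role that ASP plays
# in the reviewed paper. No ASP solver is involved in this sketch.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
ROWS, COLS = 3, 3
START, GOAL = (0, 0), (2, 2)


def free_cells(obstacles):
    """All grid cells that are not blocked."""
    return {(r, c) for r in range(ROWS) for c in range(COLS)} - set(obstacles)


def allowed_actions(state, obstacles):
    """Stand-in for the logic step: keep only moves that land on a free cell."""
    cells = free_cells(obstacles)
    return [a for a, (dr, dc) in MOVES.items()
            if (state[0] + dr, state[1] + dc) in cells]


def q_learning(obstacles, episodes=2000, alpha=0.5, gamma=0.9,
               epsilon=0.2, seed=0):
    """Tabular Q-learning restricted to the pruned state-action pairs."""
    rng = random.Random(seed)
    q = {(s, a): 0.0
         for s in free_cells(obstacles)
         for a in allowed_actions(s, obstacles)}
    for _ in range(episodes):
        s = START
        for _ in range(100):  # cap on episode length
            if s == GOAL:
                break
            acts = allowed_actions(s, obstacles)
            if rng.random() < epsilon:
                a = rng.choice(acts)  # explore
            else:
                a = max(acts, key=lambda x: q[(s, x)])  # exploit
            nxt = (s[0] + MOVES[a][0], s[1] + MOVES[a][1])
            reward = 10.0 if nxt == GOAL else -1.0
            future = 0.0 if nxt == GOAL else max(
                q[(nxt, b)] for b in allowed_actions(nxt, obstacles))
            q[(s, a)] += alpha * (reward + gamma * future - q[(s, a)])
            s = nxt
    return q


def greedy_path(q, obstacles, max_steps=20):
    """Follow the highest-valued allowed action from START."""
    path, s = [START], START
    while s != GOAL and len(path) <= max_steps:
        a = max(allowed_actions(s, obstacles), key=lambda x: q[(s, x)])
        s = (s[0] + MOVES[a][0], s[1] + MOVES[a][1])
        path.append(s)
    return path


obstacles = {(1, 1)}
q = q_learning(obstacles)
print(len(q), "pruned state-action pairs vs", ROWS * COLS * len(MOVES), "unpruned")
print(greedy_path(q, obstacles))
```

In this toy setting the filter cuts the table from 36 state-action pairs to 16. If the obstacle set changed mid-experiment, one could recompute the allowed pairs and continue learning on the new table; whether that matches how the paper handles change is not something this sketch claims.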

The authors examine their approach by solving a set of problems from a 2D grid world with obstacles, where an agent has to find a path from a starting point to an end point. In situations where obstacles constrain the possible paths, the reduction of the search space can be significant. If the configuration of the world changes during the experiment, this combined approach is most beneficial, although its performance depends on the degree of change that the environment undergoes.

While I believe that further validation is needed to demonstrate the practical benefits of this approach combining ASP, MDPs, and RL, I found the combination of logic-based reasoning with learning very interesting.

Reviewer: Franz Kurfess
Review #: CR145910 (1806-0327)
