Similar papers
December 23, 2015
This article presents a concise description of stochastic approximation algorithms for reinforcement learning in Markov decision processes. The algorithms can also be used as a suboptimal method for partially observed Markov decision processes.
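The abstract leaves the algorithms unnamed; as a rough illustration, tabular Q-learning below is the textbook example of a stochastic-approximation iteration for MDPs. The environment sampler, starting state, exploration rate, and step-size schedule are hypothetical placeholders, not details from the paper.

```python
import numpy as np

def q_learning(env_step, n_states, n_actions, episodes=500, gamma=0.95):
    """Tabular Q-learning: a stochastic-approximation iteration for MDPs.

    env_step(s, a) -> (s_next, reward, done) is a hypothetical sampler
    of the unknown transition and reward model.
    """
    Q = np.zeros((n_states, n_actions))
    visits = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = 0, False                        # assume episodes start in state 0
        while not done:
            # epsilon-greedy exploration
            a = np.random.randint(n_actions) if np.random.rand() < 0.1 else int(Q[s].argmax())
            s_next, r, done = env_step(s, a)
            visits[s, a] += 1
            alpha = 1.0 / visits[s, a]            # Robbins-Monro step size
            target = r + gamma * Q[s_next].max()
            Q[s, a] += alpha * (target - Q[s, a])  # stochastic-approximation update
            s = s_next
    return Q
```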
April 1, 2014
In this paper we describe an approach to solving strategic games in which players can assume different types over the course of the game. Our goal is to infer which type the opponent is adopting at each moment so that we can increase the player's odds. To achieve this we combine Markov games with hidden Markov models. We discuss a hypothetical example of a tennis game whose solution can be applied to any game with similar characteristics.
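As a hedged sketch of the combination described above, the code below tracks a belief over hidden opponent types with a standard HMM forward update. The two types, the transition matrix, and the emission probabilities are invented placeholders, not the paper's tennis model.

```python
import numpy as np

# Hypothetical opponent types and model parameters (illustrative only).
types = ["aggressive", "defensive"]
T = np.array([[0.9, 0.1],        # P(next type | current type)
              [0.2, 0.8]])
emission = {"net_rush": np.array([0.7, 0.2]),   # P(observed action | type)
            "baseline": np.array([0.3, 0.8])}

def update_belief(belief, observed_action):
    """One HMM forward step: predict the type transition, then reweight
    by how likely each type is to produce the observed action."""
    predicted = belief @ T
    posterior = predicted * emission[observed_action]
    return posterior / posterior.sum()

belief = np.array([0.5, 0.5])
for obs in ["net_rush", "net_rush", "baseline"]:
    belief = update_belief(belief, obs)
    print(dict(zip(types, belief.round(3))))
```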
April 29, 2022
Our work aims at developing reinforcement learning algorithms that do not rely on the Markov assumption. We consider the class of Non-Markov Decision Processes where histories can be abstracted into a finite set of states while preserving the dynamics. We call it a Markov abstraction since it induces a Markov Decision Process over a set of states that encode the non-Markov dynamics. This phenomenon underlies the recently introduced Regular Decision Processes (as well as POMDP...
June 12, 2020
Recent investigations into sum-product-max networks (SPMN) that generalize sum-product networks (SPN) offer a data-driven alternative for decision making, which has predominantly relied on handcrafted models. SPMNs computationally represent a probabilistic decision-making problem whose solution scales linearly in the size of the network. However, SPMNs are not well suited for sequential decision making over multiple time steps. In this paper, we present recurrent SPMNs (RSPMN...
July 9, 2021
The success of reinforcement learning in typical settings is predicated on Markovian assumptions on the reward signal by which an agent learns optimal policies. In recent years, the use of reward machines has relaxed this assumption by enabling a structured representation of non-Markovian rewards. In particular, such representations can be used to augment the state space of the underlying decision process, thereby facilitating non-Markovian reinforcement learning. However, th...
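A minimal sketch of the state-space augmentation mentioned here, under assumed names: the environment state is paired with the current reward-machine state, which advances on high-level event labels. The two-state machine, the labeler, and the environment stepper are hypothetical toys, not a construction from the paper.

```python
# Toy reward machine: u0 -> u1 on event "goal", emitting reward 1.0.
RM_TRANSITIONS = {("u0", "goal"): ("u1", 1.0)}

def rm_step(u, events):
    """Advance the reward machine on the observed event labels."""
    for e in events:
        if (u, e) in RM_TRANSITIONS:
            return RM_TRANSITIONS[(u, e)]
    return u, 0.0

def augmented_step(env_step, labeler, s, u, a):
    """One step of the product process whose state is (env state, RM state).

    env_step(s, a) -> s_next and labeler(s, a, s_next) -> event labels are
    hypothetical stand-ins for the raw environment and labelling function.
    """
    s_next = env_step(s, a)
    u_next, reward = rm_step(u, labeler(s, a, s_next))
    return (s_next, u_next), reward
```

The point of the product construction is that the non-Markovian reward becomes Markovian in the augmented state (s, u), so standard RL algorithms can be applied to it.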
February 14, 2025
In this article, we are interested in planning problems where the agent is aware of the presence of an observer, and where this observer has only partial observability. The agent has to choose its strategy so as to optimize the information transmitted by observations. Building on observer-aware Markov decision processes (OAMDPs), we propose a framework to handle this type of problem and thus formalize properties such as legibility, explicability and predictability. ...
November 20, 2023
In real-world reinforcement learning problems, the state information is often only partially observable, which breaks the basic assumption of Markov decision processes and thus leads to inferior performance. Partially Observable Markov Decision Processes have been introduced to take this issue explicitly into account for learning, exploration, and planning, but they present significant computational and statistical challenges. To address these difficulties, we exploit the rep...
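For reference, the belief update that a POMDP planner maintains over hidden states (written in generic textbook notation, not the paper's) is

```latex
b'(s') \;=\; \frac{O(o \mid s', a)\,\sum_{s} T(s' \mid s, a)\, b(s)}
             {\sum_{s''} O(o \mid s'', a)\,\sum_{s} T(s'' \mid s, a)\, b(s)},
```

where T is the transition model, O the observation model, and b the current belief; maintaining and planning over such beliefs is a major source of the computational challenges the abstract mentions.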
July 21, 2010
Motivated by the unceasing interest in hidden Markov models (HMMs), this paper re-examines hidden path inference in these models, using primarily a risk-based framework. While the most common maximum a posteriori (MAP), or Viterbi, path estimator and the minimum error, or Posterior Decoder (PD), have long been around, other path estimators, or decoders, have been either only hinted at or applied more recently and in dedicated applications generally unfamiliar to the statistic...
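To anchor the comparison, the MAP (Viterbi) decoder referred to above can be sketched in log space as follows; the model matrices are generic placeholders rather than any model from the paper.

```python
import numpy as np

def viterbi(log_pi, log_A, log_B, obs):
    """MAP (Viterbi) path estimate for an HMM.

    log_pi[k]   : log initial probability of state k
    log_A[i, j] : log transition probability i -> j
    log_B[k, o] : log emission probability of symbol o in state k
    obs         : sequence of observed symbol indices
    """
    n_states, T = log_A.shape[0], len(obs)
    delta = np.full((T, n_states), -np.inf)
    backptr = np.zeros((T, n_states), dtype=int)
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A     # scores[i, j]: best path ending i -> j
        backptr[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]
```

The Posterior Decoder instead picks, at each time step, the state with the largest marginal posterior from the forward-backward recursions, minimizing the expected number of state errors rather than the probability that the whole path is wrong.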
February 23, 2023
This paper investigates MDPs with intermittent state information. We consider a scenario where the controller perceives the state information of the process via an unreliable communication channel. The transmissions of state information over the whole time horizon are modeled as a Bernoulli lossy process. Hence, the problem is finding an optimal policy for selecting actions in the presence of state information losses. We first formulate the problem as a belief MDP to establis...
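A hedged sketch of the belief dynamics behind that formulation: when the transmission succeeds the belief collapses onto the received state, and when it is lost the belief is propagated through the transition model of the chosen action. The transition tensor and the example below are illustrative, not from the paper.

```python
import numpy as np

def belief_step(belief, action, P, observed_state=None):
    """One belief update for an MDP with intermittent state information.

    P[a, s, s']    : transition probabilities of the underlying MDP
    observed_state : the received state index, or None if the Bernoulli
                     channel dropped this transmission
    """
    if observed_state is not None:
        # Successful transmission: the state is known exactly.
        belief = np.zeros(P.shape[1])
        belief[observed_state] = 1.0
        return belief
    # Lost transmission: predict forward through the transition model.
    return belief @ P[action]

# Hypothetical 2-state, 1-action example.
P = np.array([[[0.9, 0.1],
               [0.4, 0.6]]])
b = np.array([1.0, 0.0])
b = belief_step(b, 0, P, observed_state=None)  # loss: b becomes [0.9, 0.1]
b = belief_step(b, 0, P, observed_state=1)     # received state 1: b becomes [0, 1]
```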
December 9, 2011
We introduce a stochastic approximation method for the solution of an ergodic Kullback-Leibler control problem. A Kullback-Leibler control problem is a Markov decision process on a finite state space in which the control cost is proportional to a Kullback-Leibler divergence of the controlled transition probabilities with respect to the uncontrolled transition probabilities. The algorithm discussed in this work allows for a sound theoretical analysis using the ODE method. In a...
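Spelled out in generic notation (which may differ from the paper's), the per-step cost of such a Kullback-Leibler control problem and its ergodic objective are

```latex
c\big(x, u(\cdot\mid x)\big)
  \;=\; q(x) \;+\; \mathrm{KL}\!\left(u(\cdot\mid x)\,\middle\|\,p(\cdot\mid x)\right)
  \;=\; q(x) \;+\; \sum_{y} u(y\mid x)\,\log\frac{u(y\mid x)}{p(y\mid x)},
\qquad
\min_{u}\; \limsup_{T\to\infty}\,\frac{1}{T}\,
  \mathbb{E}\Big[\sum_{t=0}^{T-1} c\big(x_t, u(\cdot\mid x_t)\big)\Big],
```

where p is the uncontrolled transition kernel and u the controlled one; the state cost q is the usual companion of the KL penalty in linearly solvable formulations and is an assumption here, since the abstract only specifies the KL term.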