Similar papers
February 6, 2025
Inverse reinforcement learning is the problem of inferring a reward function from an optimal policy. In this work, it is assumed that the reward is expressed as a reward machine whose transitions depend on atomic propositions associated with the state of a Markov Decision Process (MDP). Our goal is to identify the true reward machine using finite information. To this end, we first introduce the notion of a prefix tree policy, which associates a distribution over actions to each ...
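For a concrete picture of the object being inferred, here is a minimal Python sketch of a toy reward machine whose transitions fire on sets of atomic propositions and which accumulates reward along a labelled trajectory. The machine, its propositions ("key", "goal"), and the trace are invented for illustration; they are not the construction or notation used in the paper.

# Hypothetical sketch: a reward machine driven by the atomic propositions
# observed along an MDP trajectory. The example machine below is illustrative.
from dataclasses import dataclass, field

@dataclass
class RewardMachine:
    initial: str
    # (machine state, frozenset of true propositions) -> next machine state
    delta_u: dict = field(default_factory=dict)
    # (machine state, frozenset of true propositions) -> scalar reward
    delta_r: dict = field(default_factory=dict)

    def run(self, label_trace):
        """Accumulate reward along a sequence of proposition labellings."""
        u, total = self.initial, 0.0
        for props in label_trace:
            key = (u, frozenset(props))
            total += self.delta_r.get(key, 0.0)
            u = self.delta_u.get(key, u)  # stay put if no transition is defined
        return total

# Toy machine: reward 1 for reaching "goal", but only after visiting "key".
rm = RewardMachine(
    initial="u0",
    delta_u={("u0", frozenset({"key"})): "u1",
             ("u1", frozenset({"goal"})): "u2"},
    delta_r={("u1", frozenset({"goal"})): 1.0},
)
print(rm.run([set(), {"key"}, set(), {"goal"}]))  # -> 1.0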
January 31, 2024
In this paper, we study the remote estimation problem of a Markov process over a channel with a cost. We formulate this problem as an infinite-horizon optimization problem with two players, i.e., a sensor and a monitor, that have distinct information, and with a reward function that takes into account both the communication cost and the estimation quality. We show that the main challenge in solving this problem is associated with the consideration of implicit information, i.e...
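As a rough illustration of the trade-off the abstract describes, the following sketch simulates a sensor that tracks a two-state Markov chain and pays a fixed cost each time it transmits, while the monitor otherwise holds its last received value. The chain, the cost, and the naive "transmit on mismatch" policy are all assumptions made for the example; they are not the paper's model or solution.

# Illustrative simulation of the communication-cost / estimation-quality trade-off.
import random

P = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # assumed two-state chain
c = 0.3                                          # assumed per-transmission cost

def step(state):
    return 1 if random.random() < P[state][1] else 0

def simulate(T=10_000, seed=0):
    random.seed(seed)
    state, estimate, total = 0, 0, 0.0
    for _ in range(T):
        state = step(state)
        transmit = (state != estimate)           # naive sensor policy
        if transmit:
            estimate = state
        error = 0 if state == estimate else 1
        total += -(error + c * transmit)         # reward = -(distortion + cost)
    return total / T

print(simulate())  # average per-step reward under this ad hoc policy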
May 7, 2017
We propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDP) based on spectral decomposition methods. While spectral methods have been previously employed for consistent learning of (passive) latent variable models such as hidden Markov models, POMDPs are more challenging since the learner interacts with the environment and possibly changes the future observations in the process. We devise a learning algorithm running through ep...
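To convey the flavour of spectral methods in this setting, the snippet below forms the empirical second-moment (co-occurrence) matrix of consecutive observations and inspects its singular values, whose numerical rank hints at the number of hidden states. This is a generic spectral-estimation sketch, not the episodic POMDP algorithm of the paper, and the toy data are invented.

# Generic spectral sketch: SVD of the empirical co-occurrence matrix of
# consecutive observations. Assumes integer observation symbols in [0, n_obs).
import numpy as np

def cooccurrence_svd(obs_sequence, n_obs):
    M = np.zeros((n_obs, n_obs))
    for o1, o2 in zip(obs_sequence[:-1], obs_sequence[1:]):
        M[o1, o2] += 1.0
    M /= max(len(obs_sequence) - 1, 1)
    return np.linalg.svd(M, compute_uv=False)

# With i.i.d. uniform toy data the matrix is essentially rank one; data from a
# k-hidden-state process would instead show k dominant singular values.
rng = np.random.default_rng(0)
seq = rng.integers(0, 4, size=5000).tolist()
print(cooccurrence_svd(seq, n_obs=4))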
May 27, 2022
We consider probabilistic systems with hidden state and unobservable transitions, an extension of Hidden Markov Models (HMMs) that in particular admits unobservable ε-transitions (also called null transitions), allowing state changes of which the observer is unaware. Due to the presence of ε-loops, this additional feature complicates the theory and requires carefully setting up the corresponding probability space and random variables. In particular we present ...
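One standard way to handle unobservable transitions, sketched below under the assumption that the ε-loops are not absorbing (spectral radius of E below 1), is to fold all finite runs of silent moves into an effective per-observation operator (I - E)^{-1} T_o. The matrices are toy values, and the construction is only meant to illustrate why the ε-closure needs care; it is not the paper's formal setup.

# Illustrative sketch: epsilon-closure of silent transitions in an HMM-like model.
import numpy as np

E = np.array([[0.0, 0.2],
              [0.1, 0.0]])                 # unobservable (epsilon) transitions
T = {"a": np.array([[0.5, 0.1],
                    [0.2, 0.3]]),
     "b": np.array([[0.1, 0.1],
                    [0.2, 0.2]])}          # transitions emitting "a" or "b"

closure = np.linalg.inv(np.eye(2) - E)     # sums over all finite epsilon runs

def sequence_probability(obs, init=np.array([1.0, 0.0])):
    alpha = init
    for o in obs:
        alpha = alpha @ (closure @ T[o])   # silent moves, then an emitting move
    return alpha.sum()

print(sequence_probability(["a", "b", "a"]))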
May 11, 2018
In recent years, reinforcement learning has achieved many remarkable successes due to the growing adoption of deep learning techniques and the rapid growth in computing power. Nevertheless, it is well known that flat reinforcement learning algorithms are often unable to learn well or data-efficiently in tasks with hierarchical structure, e.g., those consisting of multiple subtasks. Hierarchical reinforcement learning is a principled approach that is able to tackle these challeng...
December 25, 2008
General-purpose intelligent learning agents cycle through (complex, non-MDP) sequences of observations, actions, and rewards. On the other hand, reinforcement learning is well-developed for small finite state Markov Decision Processes (MDPs). So far it is an art performed by human designers to extract the right state representation out of the bare observations, i.e., to reduce the agent setup to the MDP framework. Before we can think of mechanizing this search for suitable MDPs...
December 3, 2018
Markov networks are widely used in many Machine Learning applications, including natural language processing, computer vision, and bioinformatics. Learning Markov networks has many complications, ranging from the intractable computations involved to the possibility of learning a model with a huge number of parameters. In this report, we provide a computationally tractable greedy heuristic for learning Markov network structure. The proposed heuristic results in a model with a lim...
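As a stand-in for what such a heuristic can look like (this is a generic sketch, not the report's specific procedure), the code below greedily adds the edge with the largest empirical mutual information, subject to a per-node degree cap that keeps the number of parameters limited. The data, the cap, and the scoring choice are assumptions made for the example.

# Generic greedy structure-learning sketch with a per-node degree cap.
import numpy as np
from itertools import combinations

def mutual_information(x, y):
    """Plug-in mutual information between two discrete columns."""
    n = len(x)
    joint = {}
    for a, b in zip(x, y):
        joint[(a, b)] = joint.get((a, b), 0) + 1
    px = {a: np.mean(x == a) for a in set(x)}
    py = {b: np.mean(y == b) for b in set(y)}
    return sum((c / n) * np.log((c / n) / (px[a] * py[b]))
               for (a, b), c in joint.items())

def greedy_structure(data, max_degree=2):
    n_vars = data.shape[1]
    scores = {(i, j): mutual_information(data[:, i], data[:, j])
              for i, j in combinations(range(n_vars), 2)}
    degree, edges = [0] * n_vars, []
    for (i, j), s in sorted(scores.items(), key=lambda kv: -kv[1]):
        if degree[i] < max_degree and degree[j] < max_degree and s > 0:
            edges.append((i, j))
            degree[i] += 1
            degree[j] += 1
    return edges

rng = np.random.default_rng(0)
x0 = rng.integers(0, 2, 1000)
x1 = np.where(rng.random(1000) < 0.9, x0, 1 - x0)  # strongly tied to x0
x2 = rng.integers(0, 2, 1000)                       # independent of both
data = np.column_stack([x0, x1, x2])
print(greedy_structure(data, max_degree=1))         # expected: [(0, 1)]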
December 7, 2019
There are over 15 distinct communities that work in the general area of sequential decisions and information, often referred to as decisions under uncertainty or stochastic optimization. We focus on two of the most important fields: stochastic optimal control, with its roots in deterministic optimal control, and reinforcement learning, with its roots in Markov decision processes. Building on prior work, we describe a unified framework that covers all 15 different communities,...
June 12, 2023
In the field of decision trees, most previous studies have difficulty ensuring statistically optimal prediction on new data and suffer from overfitting, because trees are usually used only to represent prediction functions constructed from given data. In contrast, some studies, including this paper, used trees to represent stochastic data observation processes behind the given data. Moreover, they derived the statistically optimal prediction, which is robust ag...
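To make the "stochastic data observation process" viewpoint concrete, here is a small sketch in which a fixed tree acts as a generative model: each leaf holds a label distribution, data are observed by sampling at the reached leaf, and prediction reports the leaf's distribution rather than a hard label. The tree shape and leaf parameters are invented, and the sketch does not implement the paper's Bayes-optimal procedure.

# Sketch of a decision tree viewed as a stochastic data observation process.
import random

TREE = {"split": 0,                  # feature index tested at this node
        "left":  {"leaf": 0.9},      # P(y = 1) at this leaf
        "right": {"split": 1,
                  "left":  {"leaf": 0.2},
                  "right": {"leaf": 0.6}}}

def leaf_for(x, node=TREE):
    if "leaf" in node:
        return node
    branch = "left" if x[node["split"]] == 0 else "right"
    return leaf_for(x, node[branch])

def sample_label(x, rng=random):
    """Generative view: observe y by sampling at the reached leaf."""
    return 1 if rng.random() < leaf_for(x)["leaf"] else 0

def predict_proba(x):
    """Predictive view: report the leaf's label distribution, not a hard label."""
    return leaf_for(x)["leaf"]

print(predict_proba([0, 1]), sample_label([0, 1]))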
January 23, 2013
Reactive (memoryless) policies are sufficient in completely observable Markov decision processes (MDPs), but some kind of memory is usually necessary for optimal control of a partially observable MDP. Policies with finite memory can be represented as finite-state automata. In this paper, we extend Baird and Moore's VAPS algorithm to the problem of learning general finite-state automata. Because it performs stochastic gradient descent, this algorithm can be shown to converge t...
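To illustrate what a finite-state-automaton policy looks like computationally, the sketch below parameterises a small stochastic finite-state controller with softmax action and memory-update distributions and applies a plain REINFORCE-style gradient estimate from one sampled trajectory. It is a generic policy-gradient stand-in, not an implementation of VAPS, and the controller sizes, environment stub, and learning rate are assumptions.

# Stochastic finite-state controller updated with a REINFORCE-style gradient.
import numpy as np

rng = np.random.default_rng(0)
N_OBS, N_MEM, N_ACT = 3, 2, 2
theta_act = np.zeros((N_MEM, N_OBS, N_ACT))  # logits for P(action | memory, obs)
theta_mem = np.zeros((N_MEM, N_OBS, N_MEM))  # logits for P(next memory | memory, obs)

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def episode(env_step, horizon=20, lr=0.01):
    """Sample one trajectory and take a REINFORCE step on the controller."""
    m, obs = 0, 0
    grads_a, grads_m, rewards = [], [], []
    for _ in range(horizon):
        pa = softmax(theta_act[m, obs])
        a = rng.choice(N_ACT, p=pa)
        pm = softmax(theta_mem[m, obs])
        m_next = rng.choice(N_MEM, p=pm)
        ga = -pa; ga[a] += 1.0               # grad of log softmax at sampled action
        gm = -pm; gm[m_next] += 1.0          # grad of log softmax at sampled memory
        grads_a.append((m, obs, ga))
        grads_m.append((m, obs, gm))
        obs, r = env_step(obs, a)
        rewards.append(r)
        m = m_next
    G = sum(rewards)                         # undiscounted return
    for mm, oo, g in grads_a:
        theta_act[mm, oo] += lr * G * g
    for mm, oo, g in grads_m:
        theta_mem[mm, oo] += lr * G * g
    return G

def env_step(obs, a):                        # stub environment, invented for the demo
    new_obs = int(rng.integers(N_OBS))
    return new_obs, float(a == new_obs % 2)

print([round(episode(env_step), 2) for _ in range(3)])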