Cross-Entropic Learning of a Machine for the Decision in a Partially Observable Universe

May 18, 2006

97% Match

Frederic Dambreville (DGA/CTA/DT/GIP)

math.OC

cs.AI

cs.LG

cs.NE

cs.RO

math.ST

stat.TH

Revision of the paper previously entitled "Learning a Machine for the Decision in a Partially Observable Markov Universe". In this paper, we are interested in optimal decisions in a partially observable universe. Our approach is to directly approximate an optimal strategic tree depending on the observation. This approximation is made by means of a parameterized probabilistic law. A particular family of hidden Markov models, with input and output, is considered as a mode...

Hidden Markov Model Estimation-Based Q-learning for Partially Observable Markov Decision Process

September 17, 2018

88% Match

Hyung-Jin Yoon, Donghwan Lee, Naira Hovakimyan

Machine Learning

Systems and Control

The objective is to study an online hidden Markov model (HMM) estimation-based Q-learning algorithm for partially observable Markov decision processes (POMDPs) with finite state and action sets. When the full state observation is available, Q-learning finds the optimal action-value function given the current action (Q function). However, Q-learning can perform poorly when the full state observation is not available. In this paper, we formulate the POMDP estimation as an HMM esti...

Active Perception with Initial-State Uncertainty: A Policy Gradient Method

September 24, 2024

87% Match

Chongyang Shi, Shuo Han, ... , Jie Fu

Systems and Control

This paper studies the synthesis of an active perception policy that maximizes the information leakage of the initial state in a stochastic system modeled as a hidden Markov model (HMM). Specifically, the emission function of the HMM is controllable with a set of perception or sensor query actions. Given that the goal is to infer the initial state from partial observations in the HMM, we use Shannon conditional entropy as the planning objective and develop a novel policy gradient ...

On learning parametric-output HMMs

February 25, 2013

87% Match

Aryeh Kontorovich, Boaz Nadler, Roi Weiss

Machine Learning

Statistics Theory

We present a novel approach for learning an HMM whose outputs are distributed according to a parametric family. This is done by decoupling the learning task into two steps: first estimating the output parameters, and then estimating the hidden-state transition probabilities. The first step is accomplished by fitting a mixture model to the output stationary distribution. Given the parameters of this mixture model, the second step is formulated as the solution of an easi...

A Concise Information-Theoretic Derivation of the Baum-Welch algorithm

June 24, 2014

87% Match

Alireza Nejati, Charles Unsworth

Information Theory

Machine Learning

We derive the Baum-Welch algorithm for hidden Markov models (HMMs) through an information-theoretic approach using cross-entropy, instead of the Lagrange multiplier approach, which is universal in the machine learning literature. The proposed approach provides a more concise derivation of the Baum-Welch method and naturally generalizes to multiple observations.

Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review

May 2, 2018

86% Match

Sergey Levine

Machine Learning

Artificial Intelligence

Robotics

The framework of reinforcement learning or optimal control provides a mathematical formalization of intelligent decision making that is powerful and broadly applicable. While the general form of the reinforcement learning problem enables effective reasoning about uncertainty, the connection between reinforcement learning and inference in probabilistic models is not immediately obvious. However, such a connection has considerable value when it comes to algorithm design: formal...

Learning Factored Markov Decision Processes with Unawareness

February 27, 2019

86% Match

Craig Innes, Alex Lascarides

Artificial Intelligence

Methods for learning and planning in sequential decision problems often assume the learner is aware of all possible states and actions in advance. This assumption is sometimes untenable. In this paper, we give a method to learn factored Markov decision problems from both domain exploration and expert assistance, which guarantees convergence to near-optimal behaviour, even when the agent begins unaware of factors critical to success. Our experiments show our agent learns optim...

On Separation Between Learning and Control in Partially Observed Markov Decision Processes

November 28, 2022

86% Match

Andreas A. Malikopoulos

Optimization and Control

Cyber-physical systems (CPS) encounter a large volume of data which is added to the system gradually in real time and not all together in advance. As the volume of data increases, the domain of the control strategies also increases, and thus it becomes challenging to search for an optimal strategy. Even if an optimal control strategy is found, implementing such strategies with increasing domains is burdensome. To derive an optimal control strategy in CPS, we typically assume an...

When Is Partially Observable Reinforcement Learning Not Scary?

April 19, 2022

86% Match

Qinghua Liu, Alan Chung, ... , Chi Jin

Machine Learning

Artificial Intelligence

Systems and Control

Applications of Reinforcement Learning (RL), in which agents learn to make a sequence of decisions despite lacking complete information about the latent states of the controlled system, that is, they act under partial observability of the states, are ubiquitous. Partially observable RL can be notoriously difficult -- well-known information-theoretic results show that learning partially observable Markov decision processes (POMDPs) requires an exponential number of samples in ...

A Spectral Algorithm for Learning Hidden Markov Models

November 26, 2008

86% Match

Daniel Hsu, Sham M. Kakade, Tong Zhang

Machine Learning

Artificial Intelligence

Hidden Markov Models (HMMs) are one of the most fundamental and widely used statistical tools for modeling discrete time series. In general, learning HMMs from data is computationally hard (under cryptographic assumptions), and practitioners typically resort to search heuristics which suffer from the usual local optima issues. We prove that under a natural separation condition (bounds on the smallest singular value of the HMM parameters), there is an efficient and provably co...
