For questions about the transition model of a Markov decision process (MDP) or of other Markov models. In reinforcement learning (RL), the term is often used to distinguish model-based algorithms, which use the transition model, from model-free algorithms, which do not.
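As a rough illustration of that distinction, the sketch below contrasts a backup that reads the transition model directly with an update that only sees a sampled transition. The two-state MDP, the arrays `P` and `R`, the discount `gamma`, and both function names are assumptions made up for this example; they do not come from any question on this page.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP; every number here is an illustrative assumption.
# P[s, a, s_next] = probability of reaching s_next after taking action a in state s.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.0, 1.0], [0.5, 0.5]],
])
# R[s, a] = expected immediate reward for taking action a in state s (assumed).
R = np.array([[0.0, 1.0], [2.0, 0.0]])
gamma = 0.9  # discount factor (assumed)


def model_based_backup(V, s, a):
    """Expected one-step return, computed directly from the transition model P."""
    return R[s, a] + gamma * P[s, a] @ V


def model_free_update(Q, s, a, r, s_next, alpha=0.1):
    """Q-learning update from one sampled transition (s, a, r, s_next); P is never read."""
    target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```

The first function needs `P` to take an expectation over next states, which is what makes it model-based; the second only needs samples from the environment or a simulator.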
Questions tagged [transition-model]
12 questions
                    
8 votes, 1 answer
            
        How to fill in missing transitions when sampling an MDP transition table?
I have a simulator modelling a relatively complex scenario. I extract ~12 discrete features from the simulator state, which form the basis for my MDP state space.
Suppose I am estimating the transition table for an MDP by running a large number of…
         
    
    
        Brendan Hill
        
6 votes, 5 answers
            
        How do I compute the table for $p(s',r|s,a)$ (exercise 3.5 in Sutton & Barto's book)?
I am trying to study the book Reinforcement Learning: An Introduction (Sutton & Barto, 2018). In chapter 3.1, the authors state the following exercise:
Exercise 3.5 Give a table analogous to that in Example 3.3, but for $p(s',r|s,a)$. It should have…
         
    
    
        MrYouMath
        
6 votes, 1 answer
            
        What are the state space and the state transition function in AI?
I'm studying for my AI final exam, and I'm stuck on the state space representation. I understand initial and goal states, but what I don't understand is the state space and the state transition function. Can someone explain what they are with…
         
    
    
        İsmail Uysal
        
6 votes, 1 answer
            
        If the current state is $S_t$ and the actions are chosen according to $\pi$, what is the expectation of $R_{t+1}$ in terms of $\pi$ and $p$?
I'm trying to solve exercise 3.11 from Sutton and Barto's book (2nd edition):
Exercise 3.11 If the current state is $S_t$, and actions are selected according to a stochastic policy $\pi$, then what is the expectation of $R_{t+1}$ in terms…
         
    
    
        tmaric
        
4 votes, 1 answer
            
        What is the difference between a distribution model and a sampling model in Reinforcement Learning?
The book by Sutton and Barto, Reinforcement Learning: An Introduction, defines a model in reinforcement learning as
something that mimics the behavior of the environment, or more generally, that allows inferences to be made about how the…
         
    
    
        A. Pesare
        
4 votes, 1 answer
            
        How should I implement the state transition when it is a Gaussian distribution?
I am reading this paper, "Anxiety, Avoidance and Sequential Evaluation", and am confused about the implementation of a specific lab study. Namely, the authors model what is called the Balloon task using a simple MDP for which the description is…
         
    
    
        dezdichado
        
3 votes, 1 answer
            
        How can we find the value function by solving a system of linear equations without knowing the policy?
An MDP is a Markov Reward Process with decisions; it’s an environment in which all states are Markov. This is what we want to solve. An MDP is a tuple $(S, A, P, R, \gamma)$, where $S$ is our state space, $A$ is a finite set of actions, $P$ is the…
         
    
    
        Abc1729
        
2 votes, 1 answer
            
        Is it appropriate to represent 'total failure' as an absorbing state?
My understanding is that, in Markov decision processes, absorbing states are states which can transition only to themselves and that these transitions generate rewards of 0. I know that absorbing states are commonly used to represent goals, so an…
         
    
    
        K--
        
1 vote, 0 answers
            
        How to generalize finite MDP to general MDP?
Suppose, for simplicity's sake, that we are in a discrete-time domain with the action set being the same for all states $S \in \mathcal{S}$. Thus, in a finite Markov Decision Process, the sets $\mathcal{A}$, $\mathcal{S}$, and $\mathcal{R}$ have a finite…
         
    
    
        gvgramazio
        
1 vote, 1 answer
            
        Can the state transition function be dynamic in reinforcement learning?
In general, there are two types of transition functions in reinforcement learning. Mathematically, they are as follows:
#1: Stochastic state transition function:
$$T : S \times A \times S \rightarrow  [0, 1]$$
#2: Deterministic state transition…
         
    
    
        hanugm
        
1 vote, 1 answer
            
        How do you generate the transition probabilities of a non-trivial MDP?
I understand an MDP (Markov Decision Process) model is a tuple of $\{S, A, P, R \}$ where:
$S$ is a discrete set of states
$A$ is a discrete set of actions
$P$ is the transition matrix, i.e., $P(s' \mid s, a) \rightarrow [0,1]$
$R$ is the reward…
         
    
    
        Brendan Hill
        
1 vote, 2 answers
            
        Does "transition model" alone in an MDP imply it's non-deterministic?
I am looking at a lecture on POMDP, and the context is that, when the quadcopter can't see the landmarks, it has to use reckoning. And then he mentions the transition model is not deterministic, hence the uncertainty grows.
Can transition models in…
         
    
    
        gfdsal
        