For questions related to the concept of an episode in reinforcement learning.
Questions tagged [episodes]
4 questions
                    
2 votes, 0 answers
Why are agents trained in episodes, even in non-episodic tasks?
Let's consider some non-episodic problem, for example a game which can go on forever.
My question is: Why are agents still trained in episodes?
My understanding is that the agent's neural network is updated in batches depending on the batch size (so every…
         
    
    
        Vladimir Belik
        
1 vote, 1 answer
Why could there be "information leak" if we do not use fixed horizons?
On the page "Limitations on horizon length" from the Imitation library, the authors recommend that the user stick to fixed-horizon experiments because there could be an "information leak" otherwise.
I'm having trouble understanding this term. How can…
         
    
    
        aletelecomm
        
0 votes, 0 answers
What are the parameters to consider when I set the length of an episode during the training of an RL model?
I'm working on an RL algorithm that receives a list of orders and needs to find the optimal clusters considering different parameters such as due date, location, etc. I don't know what the length of the episode should be and how it can impact the…
        
    
0 votes, 1 answer
Why is the sliding puzzle problem episodic?
Why is the sliding puzzle problem episodic and not sequential?
From what I understand, an environment is episodic if each episode is independent and doesn't affect past or future episodes. The actions in the next episode don't depend on the actions…
         
    
    
        numq
        
