Questions tagged [environment]
For questions related to the concept of environment in reinforcement learning and other AI sub-fields.
68 questions
16 votes · 3 answers
Is the optimal policy always stochastic if the environment is also stochastic?
Is the optimal policy always stochastic (that is, a map from states to a probability distribution over actions) if the environment is also stochastic?
Intuitively, if the environment is deterministic (that is, if the agent is in a state $s$ and…
nbro · 42,615
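A useful standard result for this question: in a finite, fully observable MDP there is always at least one deterministic optimal policy, obtained by acting greedily with respect to the optimal action-value function,
$$\pi^*(s) \in \arg\max_{a} q_*(s, a),$$
so stochasticity of the environment alone does not force the optimal policy to be stochastic. The picture changes under partial observability or a restrictive function approximation (see the Short Corridor question further down this list).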
13 votes · 2 answers
Is there a fundamental difference between an environment being stochastic and being partially observable?
In the AI literature, being deterministic vs. stochastic and being fully observable vs. partially observable are usually treated as two distinct properties of the environment.
I'm confused about this because what appears random can be described by hidden…
martinkunev · 255
11 votes · 1 answer
How does Q-learning work in stochastic environments?
The Q function uses the (current and future) states to determine the action that gets the highest reward.
However, in a stochastic environment, the current action (at the current state) does not determine the next state.
How does Q-learning handle…
redlum · 111
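Q-learning handles stochastic transitions by updating from sampled transitions: each update nudges $Q(s, a)$ toward a sampled target, and over many visits the estimate approaches the expectation over the transition distribution. A minimal tabular sketch, assuming a classic Gym-style environment with discrete observation and action spaces (the function and parameter names here are illustrative):

```python
import numpy as np

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    # Tabular Q-learning for a discrete Gym-style env (old step/reset API assumed).
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy behaviour policy
            a = env.action_space.sample() if np.random.rand() < epsilon else int(np.argmax(Q[s]))
            # In a stochastic environment s_next is sampled: repeated visits to (s, a)
            # can yield different next states and rewards.
            s_next, r, done, _ = env.step(a)
            # Each update uses one sample; the running average implicitly
            # estimates E[r + gamma * max_a' Q(s', a') | s, a].
            Q[s, a] += alpha * (r + gamma * (not done) * np.max(Q[s_next]) - Q[s, a])
            s = s_next
    return Q
```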
10 votes · 3 answers
What do the different actions of the OpenAI gym's environment of 'Pong-v0' represent?
Printing action_space for Pong-v0 gives Discrete(6) as output, i.e. $0, 1, 2, 3, 4, 5$ are actions defined in the environment as per the documentation. However, the game needs only 2 controls. Why do we have this discrepancy? Further, is that…
cur10us · 211
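The Atari environments bundled with OpenAI Gym expose the mapping from action indices to console controls via the unwrapped ALE environment. A quick check, assuming an older Gym release where Pong-v0 is still registered and the Atari dependencies are installed:

```python
import gym

env = gym.make("Pong-v0")
print(env.action_space)                     # Discrete(6)
print(env.unwrapped.get_action_meanings())  # typically ['NOOP', 'FIRE', 'RIGHT', 'LEFT', 'RIGHTFIRE', 'LEFTFIRE']
```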
7 votes · 1 answer
Are all fully observable environments episodic?
According to the definition of a fully observable environment in Russell & Norvig, AIMA (2nd ed), pages 41-44, an environment is only fully observable if it requires zero memory for an agent to perform optimally, that is, all relevant information is…
Francis M. Bacon · 171
6 votes · 3 answers
What exactly are partially observable environments?
I have trouble understanding the meaning of partially observable environments. Here's my doubt.
According to what I understand, the state of the environment is what precisely determines the next state and reward for any particular action taken. So,…
Chandrasekhara · 63
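For reference, partial observability is usually formalized as a POMDP: the environment still has a Markov state $s$, but the agent only receives an observation correlated with it,
$$\langle \mathcal{S}, \mathcal{A}, T, R, \Omega, O, \gamma \rangle, \qquad s' \sim T(\cdot \mid s, a), \quad o \sim O(\cdot \mid s', a),$$
so the agent must act on observations (or a belief over states) rather than on $s$ itself.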
6 votes · 1 answer
Interesting examples of discrete stochastic games
Stochastic games (SGs) are a generalization of MDPs to multiple agents. As in this previous question on MDPs, are there any interesting examples of zero-sum, discrete SGs, preferably with small state and action spaces? I'm hoping to use such examples as benchmarks, but…
user76284 · 375
6 votes · 1 answer
Benchmarks for reinforcement learning in discrete MDPs
To compare the performance of various algorithms for perfect information games, reasonable benchmarks include reversi and m,n,k-games (generalized tic-tac-toe). For imperfect information games, something like simplified poker is a reasonable…
user76284 · 375
5 votes · 1 answer
Why is it recommended to use a "separate test environment" when evaluating a model?
I am training an agent (a Stable Baselines3 algorithm) on a custom environment. During training, I want a callback so that, for every $N$ steps of the learning process, I take the current model, run it on my environment $M$ times, and log the…
jgklsdjfgkldsfaSDF · 61
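Stable Baselines3 ships an EvalCallback that does essentially this, and evaluating on a separate environment instance keeps the evaluation rollouts from disturbing the training environment's internal state. A minimal sketch, where CustomEnv, N and M are placeholders for the question's custom environment and evaluation schedule:

```python
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback

N, M = 10_000, 5          # placeholders: evaluate every N steps, for M episodes

train_env = CustomEnv()   # environment used for learning (hypothetical class)
eval_env = CustomEnv()    # a separate instance reserved for evaluation

callback = EvalCallback(eval_env, eval_freq=N, n_eval_episodes=M, deterministic=True)

model = PPO("MlpPolicy", train_env)
model.learn(total_timesteps=100_000, callback=callback)
```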
5 votes · 2 answers
How to represent players in a multi-agent environment so each model can distinguish its own player
So I have 2 models trained with the DQN algorithm that I want to train in a multi-agent environment to see how they interact with each other. The models were trained in an environment consisting of 0's and 1's (-1's for the other model), where 1 means…
Milky · 51
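One common convention for symmetric two-player boards encoded with +1/-1 is to always hand each model the board from its own perspective, so both models see their own pieces as +1. A minimal sketch, assuming the observations are NumPy arrays (the helper name is illustrative):

```python
import numpy as np

def observation_for(board: np.ndarray, player: int) -> np.ndarray:
    """Return the board from `player`'s point of view.

    The board uses +1 for one player's pieces, -1 for the other's, 0 for empty.
    Flipping the sign for the second player lets a single trained model
    play either seat without retraining.
    """
    return board if player == +1 else -board
```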
5 votes · 1 answer
How to create a custom environment for reinforcement learning
I am a newbie in reinforcement learning working on a college project. The project is related to optimizing hardware power. I am running proprietary software on a Linux distribution (16.04). The goal is to use reinforcement learning and optimize…
NewToCoding · 151
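The usual route is to subclass gym.Env, declare observation_space and action_space, and implement reset and step. A skeleton under the classic Gym API, with hypothetical spaces, reward, and helper methods standing in for the real power-measurement interface:

```python
import gym
import numpy as np
from gym import spaces

class PowerEnv(gym.Env):
    """Skeleton custom environment (spaces and reward are illustrative)."""

    def __init__(self):
        super().__init__()
        self.action_space = spaces.Discrete(4)  # e.g. one knob with 4 settings
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(1,), dtype=np.float32)

    def reset(self):
        self._state = self._read_sensor()
        return self._state

    def step(self, action):
        self._apply_setting(action)            # talk to the real system here
        self._state = self._read_sensor()
        reward = -float(self._state[0])        # e.g. negative power draw
        done = False                           # continuing task
        return self._state, reward, done, {}

    def _read_sensor(self):
        return np.zeros(1, dtype=np.float32)   # placeholder measurement

    def _apply_setting(self, action):
        pass                                   # placeholder actuator call
```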
4 votes · 2 answers
How to deal with a changing environment in reinforcement learning
I am new to RL and I'm currently working on implementing DQN and DDPG agents for a 2D car-parking environment. I want to train my agent so that it can successfully traverse the environment and park in the designated goal in the middle.
So, my question is:…
ashesofphoenix · 43
4 votes · 2 answers
How can a neural network work with continuous time?
I have an ANN model that receives an input and produces an output. The output is an action that interacts with the environment and changes the input accordingly. The network has a desired environment state which, at each turn, determines the desired…
Emad · 183
4 votes · 1 answer
How should I generate datasets for a SARSA agent when the environment is not simple?
I am currently working on my master's thesis and am going to apply Deep-SARSA as my DRL algorithm. The problem is that there are no datasets available, and I guess that I should generate them somehow. Dataset generation seems a common feature in this…
Shahin · 153
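Worth noting: SARSA is an on-policy method, so its (s, a, r, s', a') tuples are normally generated by letting the agent interact with the environment or a simulator rather than read from a fixed dataset. A minimal sketch of collecting such tuples, assuming a Gym-style environment and some policy function (both names are placeholders):

```python
def collect_sarsa_tuples(env, policy, episodes=10):
    """Generate on-policy (s, a, r, s_next, a_next) transitions by interaction."""
    transitions = []
    for _ in range(episodes):
        s, done = env.reset(), False
        a = policy(s)
        while not done:
            s_next, r, done, _ = env.step(a)
            a_next = policy(s_next)
            transitions.append((s, a, r, s_next, a_next))
            s, a = s_next, a_next
    return transitions
```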
4 votes · 2 answers
Why do all states appear identical under the function approximation in the Short Corridor task?
This is the Short Corridor problem taken from the Sutton & Barto book, where it is written:
"The problem is difficult because all the states appear identical under the function approximation."
But this doesn't make much sense, as we can always choose…
ZERO NULLS · 147
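For context on why this is by design: in Sutton & Barto's Short Corridor example the function approximation assigns every state the same features, e.g.
$$\mathbf{x}(s, \text{right}) = [1, 0]^\top, \qquad \mathbf{x}(s, \text{left}) = [0, 1]^\top \quad \text{for all } s,$$
so any policy representable under this parameterization must pick actions with the same probabilities in every state. The representation is fixed as part of the example precisely to show that, under such a constraint, the best achievable policy is stochastic (it selects right with some intermediate probability, neither 0 nor 1).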