Questions tagged [gym]

For questions about OpenAI's gym library, which provides a set of APIs to access different types of environments to train reinforcement learning agents.

78 questions
10
votes
3 answers

What do the different actions of the OpenAI gym's environment of 'Pong-v0' represent?

Printing action_space for Pong-v0 gives Discrete(6) as output, i.e. $0, 1, 2, 3, 4, 5$ are actions defined in the environment as per the documentation. However, the game needs only 2 controls. Why do we have this discrepancy? Further, is that…
cur10us
  • 211
  • 1
  • 2
  • 4
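For reference, the six actions behind `Discrete(6)` come from ALE's standard Atari action set. The index-to-meaning table below is an assumption based on that standard set, not output printed from a live environment; with gym installed, the authoritative list comes from `env.unwrapped.get_action_meanings()`.

```python
# Hedged sketch: the six discrete actions Pong-v0 exposes.
# The mapping below is an assumption based on ALE's standard Atari
# action set; confirm with env.unwrapped.get_action_meanings().
PONG_ACTIONS = {
    0: "NOOP",       # do nothing
    1: "FIRE",       # serve; no paddle movement in Pong
    2: "RIGHT",      # paddle up
    3: "LEFT",       # paddle down
    4: "RIGHTFIRE",  # up + fire (redundant in Pong)
    5: "LEFTFIRE",   # down + fire (redundant in Pong)
}

# With gym installed, the authoritative list would come from:
#   env = gym.make("Pong-v0")
#   env.unwrapped.get_action_meanings()
print(PONG_ACTIONS[2], PONG_ACTIONS[3])  # the two controls that matter
```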
8
votes
2 answers

Deep Q-Learning "catastrophic drop" reasons?

I am implementing some "classical" papers in model-free RL, such as DQN, Double DQN, and Double DQN with Prioritized Replay. Across the various models I'm running on CartPole-v1 using the same underlying NN, I am noticing that all 3 of the above exhibit a…
Virus
  • 81
  • 1
  • 5
7
votes
1 answer

What are the state-of-the-art results in OpenAI's gym environments?

What are the state-of-the-art results in OpenAI's gym environments? Is there a link to a paper/article that describes them and how these SOTA results were calculated?
6
votes
1 answer

Why did OpenAI's gym website close?

OpenAI's gym website redirects to the GitHub repository. Why did the website close?
Franck Dernoncourt
  • 3,473
  • 2
  • 21
  • 39
6
votes
2 answers

My Deep Q-Learning Network does not learn for OpenAI gym's cartpole problem

I am implementing OpenAI gym's cartpole problem using Deep Q-Learning (DQN). I followed tutorials (video and otherwise) and learned all about it. I implemented the code myself and thought it should work, but the agent is not learning. I will…
SJa
  • 393
  • 3
  • 17
6
votes
1 answer

How to define an action space when an agent can take multiple sub-actions in a step?

I'm attempting to design an action space in OpenAI's gym and hitting the following roadblock. I've looked at this post which is closely related but subtly different. The environment I'm writing needs to allow an agent to make between $1$ and $n$…
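One common pattern for "between $1$ and $n$ sub-actions per step" is a binary mask over the $n$ sub-actions, the plain-Python analogue of `gym.spaces.MultiBinary(n)`. The sketch below is a hedged illustration with no gym dependency; the value `n = 4` and the helper names are assumptions for the example.

```python
# Hedged sketch (plain Python, no gym dependency): encode a subset of n
# sub-actions as a bitmask, the analogue of gym.spaces.MultiBinary(n).
# N_SUB_ACTIONS = 4 and the helper names are illustrative assumptions.
N_SUB_ACTIONS = 4

def decode(mask_int):
    """Turn an integer in [0, 2**n) into the list of chosen sub-actions."""
    return [i for i in range(N_SUB_ACTIONS) if (mask_int >> i) & 1]

def is_valid(mask_int):
    """Enforce 'between 1 and n': at least one sub-action must be set."""
    return len(decode(mask_int)) >= 1

print(decode(0b0101))    # -> [0, 2]
print(is_valid(0b0000))  # -> False
```

An agent can then act in `Discrete(2**n)` while the environment rejects (or renormalizes) the all-zero mask.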
5
votes
1 answer

How powerful is OpenAI's Gym and Universe in board games area?

I'm a big fan of computer board games and would like to make Python chess/go/shogi/mancala programs. Having heard of reinforcement learning, I decided to look at OpenAI Gym. But first of all, I would like to know: is it possible, using OpenAI…
Taissa
  • 63
  • 4
4
votes
1 answer

What is the mapping between actions and numbers in OpenAI's gym?

In a gym environment, the action space is often a discrete space, where each action is labeled by an integer. I cannot find a way to figure out the correspondence between action and number. For example, in frozen lake, the agent can move Up, Down,…
Llewlyn
  • 143
  • 4
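For FrozenLake specifically, the integer-to-action mapping is stated in the environment's documentation and source. The table below reflects that documented mapping; treat it as specific to FrozenLake, since discrete environments in general expose no API that names their actions.

```python
# Hedged sketch: FrozenLake's documented action encoding. This mapping is
# specific to FrozenLake; other environments define their own, usually
# only in their docstrings or source code.
FROZEN_LAKE_ACTIONS = {0: "Left", 1: "Down", 2: "Right", 3: "Up"}

# With gymnasium installed, the encoding can be checked against the
# environment's own docstring:
#   import gymnasium as gym
#   env = gym.make("FrozenLake-v1")
#   help(env.unwrapped)  # docstring describes the action encoding
print(FROZEN_LAKE_ACTIONS[1])  # -> Down
```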
4
votes
2 answers

In the frozen lake environment of Gymnasium, why aren't the holes negatively rewarded?

In this given map, for example, the agent needs to perform the downward action twice to reach the reward. Considering that initially all actions are equally likely, the probability of reaching the reward is really low. If the agent never encounters…
DeadAsDuck
  • 103
  • 6
4
votes
1 answer

Finding the true Q-values in gymnasium

I'm very interested in the true Q-values of state-action pairs in the classic control environments in gymnasium. Contrary to the usual goal, the ordering of the Q-values itself is irrelevant; a highly accurate estimate of the Q-values is…
Mark B
  • 43
  • 3
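One standard way to approximate "true" Q-values for a fixed policy is Monte Carlo rollouts: take the candidate action, then follow the policy and accumulate discounted reward. The sketch below demonstrates this on a tiny deterministic chain MDP invented for the example (it is not a gymnasium environment, and the discount $\gamma = 0.9$ is an assumption).

```python
# Hedged sketch: Monte Carlo estimation of Q(s, a) on a toy deterministic
# chain MDP (states 0..3, reward 1.0 on reaching state 3). The MDP and
# gamma = 0.9 are illustrative assumptions, not a gymnasium environment.
GAMMA = 0.9
TERMINAL = 3

def step(state, action):
    """Deterministic transition: action 1 moves right, action 0 moves left."""
    nxt = min(state + 1, TERMINAL) if action == 1 else max(state - 1, 0)
    reward = 1.0 if nxt == TERMINAL else 0.0
    return nxt, reward, nxt == TERMINAL

def q_value(state, action, policy, max_steps=50):
    """Discounted return of taking `action` in `state`, then following `policy`."""
    total, discount = 0.0, 1.0
    s, a = state, action
    for _ in range(max_steps):
        s, r, done = step(s, a)
        total += discount * r
        if done:
            break
        discount *= GAMMA
        a = policy(s)
    return total

def always_right(s):
    return 1

print(q_value(0, 1, always_right))  # -> 0.81 (reward after 3 steps, discounted twice)
```

In a stochastic environment the same routine would be averaged over many rollouts per state-action pair.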
4
votes
3 answers

How do I get started with multi-agent reinforcement learning?

Is there any tutorial that walks through a multi-agent reinforcement learning implementation (in Python) using libraries such as OpenAI's Gym (for the environment), TF-agents, and stable-baselines-3? I searched a lot, but I was not able to find any…
4
votes
0 answers

Unable to train Coach for Banana-v0 Gym environment

I have just started playing with reinforcement learning and, starting from the basics, I'm trying to figure out how to solve the Banana Gym environment with Coach. Essentially, the Banana-v0 env represents a banana shop that buys a banana for $1 on day 1 and has 3 days…
4
votes
2 answers

What is the difference between A2C and running an agent in an environment vector?

I've implemented A2C. I'm now wondering why we would have multiple actors walk around the environment and gather rewards; why not just have a single agent run in an environment vector? I personally think this would be more efficient, since now all…
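The "environment vector" idea can be sketched in a few lines: step every copy with one call, as gym's `VectorEnv` does. The `ToyEnv` below is a stand-in assumption (episodes end after 3 steps, reward 1.0 per step), not a real gym environment.

```python
# Hedged sketch: a minimal synchronous environment vector, in the spirit
# of gym's VectorEnv. ToyEnv is an illustrative stand-in, not a real
# gym environment.
class ToyEnv:
    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= 3  # episodes end after 3 steps
        return self.t, 1.0, done  # obs, reward, done

class SyncVectorEnv:
    """Step n environment copies with a single call."""
    def __init__(self, n):
        self.envs = [ToyEnv() for _ in range(n)]

    def reset(self):
        return [e.reset() for e in self.envs]

    def step(self, actions):
        out = [e.step(a) for e, a in zip(self.envs, actions)]
        obs, rews, dones = map(list, zip(*out))
        # auto-reset finished copies, mirroring gym's vector API
        for i, done in enumerate(dones):
            if done:
                obs[i] = self.envs[i].reset()
        return obs, rews, dones

venv = SyncVectorEnv(4)
venv.reset()
obs, rews, dones = venv.step([0, 0, 0, 0])
print(rews)  # -> [1.0, 1.0, 1.0, 1.0]
```

Note this is exactly what synchronous A2C implementations do under the hood: "multiple actors" and "one agent in an environment vector" coincide when the actors share weights and are stepped in lockstep.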
3
votes
2 answers

How do you deal with movement inertia in an environment after a step?

I was wondering how we can deal with movement inertia in an environment that is constantly changing. Imagine that you take a step in an environment that moves a ball. When you take the step, you make the ball move, and at some point it returns an…
3
votes
1 answer

Why is training longer not better in reinforcement learning?

I have trained an RL agent (PPO) for 6 million steps to solve OpenAI gym's LunarLander-v2. Surprisingly, the agent performs best after only 320K steps and gets worse after that. In the TensorBoard log, I can see that the mean and min reward…