For questions about Hindsight Experience Replay (HER), proposed in the paper "Hindsight Experience Replay" (2017) by Marcin Andrychowicz et al.
Questions tagged [hindsight-experience-replay]
11 questions
5
votes
1 answer
How does the optimization process in hindsight experience replay exactly work?
I was reading the research paper Hindsight Experience Replay. This is the paper that introduces a concept called Hindsight Experience Replay (HER), which attempts to alleviate the infamous sparse reward problem. It is based on…
vikram71198
- 111
- 3
4
votes
0 answers
Is this a good approach to solving Atari's "Montezuma's Revenge"?
I'm new to Reinforcement Learning. For an internship, I am currently training an agent on Atari's "Montezuma's Revenge" using a double Deep Q-Network with Hindsight Experience Replay (HER) (see also this article).
HER is supposed to alleviate the reward…
vikram71198
- 111
- 3
3
votes
1 answer
How does Hindsight Experience Replay learn from unsuccessful trajectories?
I am confused about how HER learns from unsuccessful trajectories. I understand that from failed trajectories it creates 'fake' goals that it can learn from.
Ignoring HER for now, in the case where the robotic arm reaches the goal correctly, then…
piccolo
- 173
- 6
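The excerpt above touches on HER's core trick: replaying a failed trajectory as if the goal had been the state the agent actually ended up reaching. Below is a minimal sketch of that relabeling idea using the "final" strategy; the names Transition, compute_reward and relabel_with_final_goal are illustrative placeholders rather than the API of any particular library, and the 0/-1 reward is an assumed sparse binary reward.

    # Minimal sketch of HER-style goal relabeling (the "final" strategy).
    # Assumes each transition stores both the achieved goal and the originally desired goal.
    from dataclasses import dataclass, replace
    from typing import List, Tuple

    @dataclass
    class Transition:
        state: Tuple[float, ...]
        action: int
        achieved_goal: Tuple[float, ...]   # what the agent actually reached after the action
        desired_goal: Tuple[float, ...]    # what it was originally asked to reach
        reward: float

    def compute_reward(achieved_goal, desired_goal) -> float:
        # Assumed sparse binary reward: 0 on success, -1 otherwise.
        return 0.0 if achieved_goal == desired_goal else -1.0

    def relabel_with_final_goal(episode: List[Transition]) -> List[Transition]:
        # Pretend the goal was whatever the episode actually achieved at its end,
        # and recompute every reward under that "fake" goal.
        final_goal = episode[-1].achieved_goal
        return [
            replace(t, desired_goal=final_goal,
                    reward=compute_reward(t.achieved_goal, final_goal))
            for t in episode
        ]

    # Both the original and the relabeled transitions go into the replay buffer,
    # so even an unsuccessful episode contributes transitions with reward 0.

The point is that the relabeled copy of a failed episode ends in a "successful" transition, which is what gives the agent a learning signal despite the sparse reward.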
3
votes
0 answers
How can PPO be combined with HER?
I ask because PPO is apparently an on-policy algorithm, and the HER paper says that HER can be combined with any off-policy algorithm. Yet I see GitHub projects that have combined them somehow?
How is this done? And is it reasonable?
profPlum
- 496
- 2
- 10
3
votes
1 answer
What is the difference between success rate and reward when dealing with binary and sparse rewards?
In OpenAI Gym "reward" is defined as:
reward (float): amount of reward achieved by the previous action. The scale varies between environments, but the goal is always to increase your total reward.
I am training Hindsight Experience Replay on Fetch…
rrz0
- 273
- 2
- 7
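Since the question above is about the distinction between episode reward and success rate under a binary, sparse reward, here is a small sketch of how the two are typically computed. The 0/-1 per-step reward and the convention that a final reward of 0 marks success are assumptions for illustration, not the exact definition used by any specific Fetch environment.

    # Hypothetical per-step rewards for three episodes under an assumed 0/-1 sparse reward:
    # -1 while the goal has not been reached, 0 once it has.
    episodes = [
        [-1, -1, -1, 0, 0],    # reached the goal after three steps
        [-1, -1, -1, -1, -1],  # never reached the goal
        [-1, 0, 0, 0, 0],      # reached the goal after one step
    ]

    # Total reward (return) per episode: how long the agent spent away from the goal.
    returns = [sum(ep) for ep in episodes]                       # [-3, -5, -1]

    # Success rate: fraction of episodes that end with the goal satisfied.
    successes = [1.0 if ep[-1] == 0 else 0.0 for ep in episodes]
    success_rate = sum(successes) / len(successes)               # 2/3

    print(returns, success_rate)

So two policies can share the same success rate while accumulating very different total rewards, which is one reason success rate is often reported instead of raw return for these binary-reward tasks.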
2
votes
0 answers
How does Hindsight Experience Replay cope with multiple goals?
What if there are multiple goals? For example, let's consider the bit-flipping environment as described in the HER paper, with one small change: now the goal is not some specific configuration, but let's say for the last $m$ bits (e.g. $m=2$), I do…
Savco
- 61
- 1
1
vote
2 answers
Why does HER not work with on-policy RL algorithms?
I'm wondering because I don't see what is wrong with just applying HER to an otherwise on-policy algorithm. If we do that, will the training stability just fall apart? And if so, why? My understanding is that on-policy is just a category…
profPlum
- 496
- 2
- 10
1
vote
1 answer
What does $r : \mathcal{S} \times \mathcal{A} \rightarrow \mathbb{R}$ mean in the article Hindsight Experience Replay, section 2.1?
Taken from section 2.1 in the article:
We consider the standard reinforcement learning formalism consisting of an agent interacting with an environment. To simplify the exposition we assume that the environment is fully observable. An environment…
WinnieThePooh
- 113
- 3
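The notation in the title above simply says that the reward function $r$ maps a state-action pair to a real number. As a hedged illustration (not a quote from section 2.1 of the paper), a sparse binary reward of the kind HER is designed for can be written as:

$$
r : \mathcal{S} \times \mathcal{A} \rightarrow \mathbb{R}, \qquad
r(s, a) =
\begin{cases}
0 & \text{if taking action } a \text{ in state } s \text{ satisfies the goal}, \\
-1 & \text{otherwise},
\end{cases}
$$

i.e. $r$ assigns a single scalar reward to every combination of a state $s \in \mathcal{S}$ and an action $a \in \mathcal{A}$.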
1
vote
0 answers
Why would DDPG with Hindsight Experience Replay not converge?
I am trying to train a DDPG agent augmented with Hindsight Experience Replay (HER) to solve the KukaGymEnv environment. The actor and critic are simple neural networks with two hidden layers (as in the HER paper).
More precisely, the…
Vedant Shah
- 125
- 1
- 7
1
vote
1 answer
What do the state features of KukaGymEnv represent?
I am trying to use DDPG augmented with Hindsight Experience Replay (HER) on pybullet's KukaGymEnv.
To formulate the feature vector for the goal state, I need to know what the features of the state of the environment represent. To be precise, a typical…
Vedant Shah
- 125
- 1
- 7
0
votes
1 answer
A question about the reward calculation in the Hindsight Experience Replay algorithm
I'm trying to implement the HER algorithm from scratch in order to use it in the PandaReach-v3 environment.
I already developed the same algorithm for the bit-flip environment, and it works as expected.
So, what is the problem now?
The problem is the…
Dave
- 214
- 1
- 11