For questions about Hindsight Experience Replay (HER), proposed in the paper "Hindsight Experience Replay" (2017) by Marcin Andrychowicz et al.
Questions tagged [hindsight-experience-replay]
11 questions
5
votes
1 answer
How does the optimization process in hindsight experience replay exactly work?
I was reading the research paper Hindsight Experience Replay. This is the paper that introduces a concept called Hindsight Experience Replay (HER), which attempts to alleviate the infamous sparse reward problem. It is based on…
vikram71198
- 111
- 3
4
votes
0 answers
Is this a good approach to solving Atari's "Montezuma's Revenge"?
I'm new to Reinforcement Learning. For an internship, I am currently training an agent on Atari's "Montezuma's Revenge" using a double Deep Q-Network with Hindsight Experience Replay (HER) (see also this article).
HER is supposed to alleviate the reward…
vikram71198
- 111
- 3
3
votes
1 answer
How does Hindsight Experience Replay learn from unsuccessful trajectories?
I am confused about how HER learns from unsuccessful trajectories. I understand that from failed trajectories it creates 'fake' goals that it can learn from.
Ignoring HER for now, in the case where the robotic arm reaches the goal correctly, then…
piccolo
- 173
- 6
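The excerpt above touches on HER's core trick: replaying a failed trajectory as if the goal had been the state the agent actually ended up reaching. Below is a minimal sketch of that relabeling idea using the "final" strategy; the names Transition, compute_reward and relabel_with_final_goal are illustrative placeholders rather than the API of any particular library, and the 0/-1 reward is an assumed sparse binary reward.

    # Minimal sketch of HER-style goal relabeling (the "final" strategy).
    # Assumes each transition stores both the achieved goal and the originally desired goal.
    from dataclasses import dataclass, replace
    from typing import List, Tuple

    @dataclass
    class Transition:
        state: Tuple[float, ...]
        action: int
        achieved_goal: Tuple[float, ...]   # what the agent actually reached after the action
        desired_goal: Tuple[float, ...]    # what it was originally asked to reach
        reward: float

    def compute_reward(achieved_goal, desired_goal) -> float:
        # Assumed sparse binary reward: 0 on success, -1 otherwise.
        return 0.0 if achieved_goal == desired_goal else -1.0

    def relabel_with_final_goal(episode: List[Transition]) -> List[Transition]:
        # Pretend the goal was whatever the episode actually achieved at its end,
        # and recompute every reward under that "fake" goal.
        final_goal = episode[-1].achieved_goal
        return [
            replace(t, desired_goal=final_goal,
                    reward=compute_reward(t.achieved_goal, final_goal))
            for t in episode
        ]

    # Both the original and the relabeled transitions go into the replay buffer,
    # so even an unsuccessful episode contributes transitions with reward 0.

The point is that the relabeled copy of a failed episode ends in a "successful" transition, which is what gives the agent a learning signal despite the sparse reward.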
3
votes
0 answers
How can PPO be combined with HER?
I ask because PPO is apparently an on-policy algorithm, and the HER paper says that HER can be combined with any off-policy algorithm. Yet I see GitHub projects that have combined them somehow?
How is this done? And is it reasonable?
profPlum
- 496
- 2
- 10
3
votes
1 answer
What is the difference between success rate and reward when dealing with binary and sparse rewards?
In OpenAI Gym "reward" is defined as:
reward (float): amount of reward achieved by the previous action. The scale varies between environments, but the goal is always to increase your total reward.
I am training Hindsight Experience Replay on Fetch…
rrz0
- 273
- 2
- 7
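Since the question above is about the distinction between episode reward and success rate under a binary, sparse reward, here is a small sketch of how the two are typically computed. The 0/-1 per-step reward and the convention that a final reward of 0 marks success are assumptions for illustration, not the exact definition used by any specific Fetch environment.

    # Hypothetical per-step rewards for three episodes under an assumed 0/-1 sparse reward:
    # -1 while the goal has not been reached, 0 once it has.
    episodes = [
        [-1, -1, -1, 0, 0],    # reached the goal after three steps
        [-1, -1, -1, -1, -1],  # never reached the goal
        [-1, 0, 0, 0, 0],      # reached the goal after one step
    ]

    # Total reward (return) per episode: how long the agent spent away from the goal.
    returns = [sum(ep) for ep in episodes]                       # [-3, -5, -1]

    # Success rate: fraction of episodes that end with the goal satisfied.
    successes = [1.0 if ep[-1] == 0 else 0.0 for ep in episodes]
    success_rate = sum(successes) / len(successes)               # 2/3

    print(returns, success_rate)

So two policies can share the same success rate while accumulating very different total rewards, which is one reason success rate is often reported instead of raw return for these binary-reward tasks.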
2
votes
0 answers
How does Hindsight Experience Replay cope with multiple goals?
What if there are multiple goals? For example, let's consider the bit-flipping environment as described in the HER paper, with one small change: now the goal is not some specific configuration, but let's say for the last $m$ bits (e.g. $m=2$), I do…
Savco
- 61
- 1
1
vote
2 answers
Why does HER not work with on-policy RL algorithms?
I'm wondering because I don't see what is wrong with just applying HER to an otherwise on-policy algorithm. If we do that, will the training stability just fall apart? And if so, why? My understanding is that on-policy is just a category…
profPlum
- 496
- 2
- 10
1
vote
1 answer
What does $r : \mathcal{S} \times \mathcal{A} \rightarrow \mathbb{R}$ mean in the article Hindsight Experience Replay, section 2.1?
Taken from section 2.1 in the article:
We consider the standard reinforcement learning formalism consisting of an agent interacting with an environment. To simplify the exposition we assume that the environment is fully observable. An environment…
WinnieThePooh
- 113
- 3
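The notation in the title above simply says that the reward function $r$ maps a state-action pair to a real number. As a hedged illustration (not a quote from section 2.1 of the paper), a sparse binary reward of the kind HER is designed for can be written as:

$$
r : \mathcal{S} \times \mathcal{A} \rightarrow \mathbb{R}, \qquad
r(s, a) =
\begin{cases}
0 & \text{if taking action } a \text{ in state } s \text{ satisfies the goal}, \\
-1 & \text{otherwise},
\end{cases}
$$

i.e. $r$ assigns a single scalar reward to every combination of a state $s \in \mathcal{S}$ and an action $a \in \mathcal{A}$.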
1
vote
0 answers
Why would DDPG with Hindsight Experience Replay not converge?
I am trying to train a DDPG agent augmented with Hindsight Experience Replay (HER) to solve the KukaGymEnv environment. The actor and critic are simple neural networks with two hidden layers (as in the HER paper).
More precisely, the…
Vedant Shah
- 125
- 1
- 7
1
vote
1 answer
What do the state features of KukaGymEnv represent?
I am trying to use DDPG augmented with Hindsight Experience Replay (HER) on pybullet's KukaGymEnv.
To formulate the feature vector for the goal state, I need to know what the features of the state of the environment represent. To be precise, a typical…
Vedant Shah
- 125
- 1
- 7
0
votes
1 answer
A question about the reward calculation in the Hindsight Experience Replay algorithm
I'm trying to implement the HER algorithm from scratch in order to use it in the PandaReach-v3 environment.
I already developed the same algorithm for the bit-flip environment, and it works as expected.
So, what is the problem now?
The problem is the…
Dave
- 214
- 1
- 11