Questions tagged [sparse-rewards]

For questions about sparse rewards (or sparse reward functions), which can slow down learning. Reward shaping can be used to mitigate this problem.
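As a minimal sketch of one common shaping approach, potential-based shaping (Ng et al., 1999) adds $F(s, s') = \gamma\,\Phi(s') - \Phi(s)$ to the environment reward without changing the optimal policy. The grid maze, goal location, and choice of potential below are illustrative assumptions, not part of any question on this page:

```python
# Hypothetical sketch of potential-based reward shaping for a grid maze.
# Phi(s) = -manhattan_distance(s, goal) is one common choice of potential.

GAMMA = 0.99
GOAL = (9, 9)  # assumed goal cell for illustration

def potential(state):
    """Negative Manhattan distance to the goal: states closer to the goal have higher potential."""
    x, y = state
    gx, gy = GOAL
    return -(abs(gx - x) + abs(gy - y))

def shaped_reward(env_reward, state, next_state):
    """Add the shaping term F(s, s') = gamma * Phi(s') - Phi(s) to the sparse environment reward."""
    return env_reward + GAMMA * potential(next_state) - potential(state)
```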

11 questions
6
votes
1 answer

How to improve the reward signal when the rewards are sparse?

In cases where the reward is delayed, this can negatively impact a model's ability to do proper credit assignment. In the case of a sparse reward, are there ways in which this can be mitigated? In a chess example, there are certain moves that you can…
6
votes
1 answer

What are the pros and cons of sparse and dense rewards in reinforcement learning?

From what I understand, if the rewards are sparse, the agent will have to explore more to get rewards and learn the optimal policy, whereas if the rewards are dense in time, the agent is quickly guided towards its learning goal. Are the above…
5
votes
1 answer

How does the optimization process in hindsight experience replay exactly work?

I was reading the following research paper Hindsight Experience Replay. This is the paper that introduces a concept called Hindsight Experience Replay (HER), which basically attempts to alleviate the infamous sparse reward problem. It is based on…
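A minimal sketch of the relabelling idea at the heart of HER, under the assumption of a simple list-of-dicts transition format (not the paper's actual data structures); the `compute_reward` function and the "future" sampling strategy follow the paper's terminology:

```python
import numpy as np

def her_relabel(episode, compute_reward, k=4):
    """Relabel transitions with 'future' achieved goals, as in HER's 'future' strategy.

    episode: list of dicts with keys 'obs', 'action', 'achieved_goal', 'goal'.
    compute_reward(achieved_goal, goal): 0.0 if the goals match (within tolerance), else -1.0.
    """
    relabeled = []
    for t, tr in enumerate(episode):
        # Original transition, rewarded against the originally desired goal.
        relabeled.append({**tr, "reward": compute_reward(tr["achieved_goal"], tr["goal"])})
        # k additional copies whose goal is an achieved goal from a later step of the same episode.
        future_idx = np.random.randint(t, len(episode), size=k)
        for i in future_idx:
            new_goal = episode[i]["achieved_goal"]
            relabeled.append({**tr,
                              "goal": new_goal,
                              "reward": compute_reward(tr["achieved_goal"], new_goal)})
    return relabeled
```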
4
votes
2 answers

How to apply Q-learning when the reward is only available at the last state?

I have a scheduling problem in which there are $n$ slots and $m$ clients. I am trying to solve the problem using Q-learning, so I have made the following state-action model. A state $s_t$ is given by the current slot $t=1,2,\ldots,n$ and an action…
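For illustration, a hedged sketch of how tabular Q-learning copes with a terminal-only reward: intermediate updates bootstrap from $\max_a Q(s', a)$, so the final reward gradually propagates backwards through the slots. The state and action sizes below are placeholders, not the asker's actual $n$ and $m$:

```python
import numpy as np

N_STATES, N_ACTIONS = 10, 4          # illustrative sizes, e.g. n slots and m clients
ALPHA, GAMMA = 0.1, 0.95
Q = np.zeros((N_STATES, N_ACTIONS))

def q_update(s, a, r, s_next, done):
    """Standard Q-learning update; r is 0 for every step except the terminal one."""
    target = r if done else r + GAMMA * np.max(Q[s_next])
    Q[s, a] += ALPHA * (target - Q[s, a])
```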
4
votes
1 answer

Can reinforcement learning be used for tasks where only one final reward is received?

Is the reinforcement learning framework adaptable to a setting in which there is only one, final, reward? I am aware of problems with sparse and delayed rewards, but what about only a single reward and a rather long path?
3
votes
1 answer

Are there any reliable ways of modifying the reward function to make the rewards less sparse?

If I am training an agent to try and navigate a maze as fast as possible, a simple reward would be something like \begin{align} R(\text{terminal}) &= N - \text{time}, \quad N \gg \text{everything} \\ R(\text{state}) &= 0\ \ \text{if not…
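One common, though not the only, way to densify that scheme is to move the "finish fast" signal into a constant per-step penalty; the exact constants below are arbitrary and only meant as a sketch:
\begin{align}
R(s_t, a_t) &= -1 \ \ \text{for every step taken}, \\
R(\text{exit}) &= N \ \ \text{added on the step that reaches the exit},
\end{align}
so the undiscounted return is still $N - \text{time}$, but part of the signal now arrives at every step instead of only at the end.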
2
votes
0 answers

How does PPO with advantage normalization learn in MountainCar-v0 before first reaching the goal state?

I'm trying to figure out how PPO ever learns anything in a sparse environment like gymnasium's MountainCar-v0 before it first reaches the goal state. Specifically, I was looking at stable_baselines3's implementation of PPO: env =…
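For context, a minimal, hedged reproduction of the kind of setup the question describes; the hyperparameters are stable_baselines3 defaults apart from making advantage normalization explicit, and may not match the asker's actual script:

```python
import gymnasium as gym
from stable_baselines3 import PPO

# MountainCar-v0 gives -1 per step and only terminates early at the goal,
# so the return is identical for every episode until the goal is first reached by chance.
env = gym.make("MountainCar-v0")

model = PPO("MlpPolicy", env, normalize_advantage=True, verbose=1)
model.learn(total_timesteps=200_000)
```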
1
vote
1 answer

How do I compute the value function when the reward is only at the end in the context of actor-critic algorithms?

Consider the actor-critic reinforcement learning setting (actor and critic parameterized by a neural network). The reward is given only at the end of the episode (or, if the episode times out, there is no reward at all). How could we learn the value function?…
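One hedged option when the only reward is terminal is to train the critic on full Monte Carlo returns rather than one-step TD targets: with $r_t = 0$ for $t < T$, the return at step $t$ collapses to $\gamma^{T-1-t} R_T$. A small sketch (the function name and array format are assumptions):

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Monte Carlo targets for the critic.

    With a terminal-only reward, rewards looks like [0, 0, ..., 0, R_final],
    so the target for step t is simply gamma**(T-1-t) * R_final.
    """
    returns = np.zeros(len(rewards))
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns
```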
0
votes
1 answer

A question about the reward calculation in the Hindsight Experience Replay algorithm

I'm trying to implement the HER algorithm from scratch in order to use it in the PandaReach-v3 environment. I have already implemented the same algorithm for the bit-flip environment, and it works as expected. So, what's the problem now? The problem is the…
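For reference, the reward convention typically used in HER-style goal-conditioned environments: 0 on success, -1 otherwise, based on a distance threshold. The threshold value below is an illustrative assumption, not PandaReach-v3's actual setting:

```python
import numpy as np

def sparse_goal_reward(achieved_goal, desired_goal, threshold=0.05):
    """0.0 if the achieved goal is within `threshold` of the desired goal, else -1.0."""
    distance = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(desired_goal))
    return 0.0 if distance < threshold else -1.0
```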
0
votes
1 answer

How do I update Q-values in Q-learning when rewards may only be received after many actions?

I am working on a Q-learning system where the agent may well (and almost always will) have to take many actions before a reward can be given to the agent (or rather, the notion of a reward in my context only becomes defined after many actions). How can…
Hera Sutton
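A common, hedged trick for the situation described above: buffer the episode and sweep the standard Q-learning update backwards once the delayed reward finally arrives, so it propagates through the whole trajectory in a single pass rather than one state per episode. A minimal sketch with an assumed dict-of-dicts Q-table:

```python
def update_episode_backwards(Q, episode, alpha=0.1, gamma=0.99):
    """Q: dict mapping state -> dict mapping action -> value.

    episode: list of (s, a, r, s_next, done) tuples; r is typically 0 until the final step.
    Iterating in reverse lets the terminal reward reach every visited state in one sweep.
    """
    for s, a, r, s_next, done in reversed(episode):
        target = r if done else r + gamma * max(Q[s_next].values())
        Q[s][a] += alpha * (target - Q[s][a])
```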
0
votes
0 answers

How does Proximal Policy Optimization deal with sparse rewards?

In the original paper, the objective of PPO is the clipped surrogate objective. My question is, how does this objective behave in a sparse-reward setting (i.e., the reward is only given after a sequence of actions has been taken)? In this case we don't have $\hat{A}_{t}$…
Sam
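For reference, the clipped surrogate objective from the original paper; $\hat{A}_t$ remains well defined under sparse rewards because it is estimated (e.g. with GAE) from the learned value function, not from immediate rewards alone:
\begin{align}
L^{\text{CLIP}}(\theta) &= \hat{\mathbb{E}}_t\!\left[\min\!\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\!\left(r_t(\theta),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_t\right)\right],
& r_t(\theta) &= \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}.
\end{align}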