Questions tagged [reward-design]

For questions about designing (or defining) reward functions, e.g. for reinforcement learning problems.

49 questions
43
votes
5 answers

How should I handle invalid actions (when using REINFORCE)?

I want to create an AI which can play five-in-a-row/Gomoku. I want to use reinforcement learning for this. I use the policy gradient method, namely REINFORCE, with baseline. For the value and policy function approximation, I use a neural network. It…
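
A common approach for this kind of setup (a sketch under assumptions, not necessarily what the answers propose) is to mask the logits of occupied cells before the softmax, so illegal moves get zero probability and contribute nothing to the REINFORCE gradient. `policy_net` and `legal_mask` below are hypothetical names:

import torch

def masked_move_distribution(logits, legal_mask):
    # logits: raw scores for every board cell from the policy network
    # legal_mask: bool tensor, True where the cell is empty (move is legal)
    logits = logits.masked_fill(~legal_mask, float("-inf"))
    return torch.distributions.Categorical(logits=logits)

# dist = masked_move_distribution(policy_net(state), legal_mask)
# action = dist.sample()                          # only legal moves can be drawn
# loss = -dist.log_prob(action) * (G - baseline)  # usual REINFORCE-with-baseline update
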
14
votes
1 answer

How could I use reinforcement learning to solve a chess-like board game?

I invented a chess-like board game. I built an engine so that it can play autonomously. The engine is basically a decision tree. It's composed of: a search function that at each node finds all possible legal moves; an evaluation function that…
12
votes
4 answers

Counterexamples to the reward hypothesis

In Sutton and Barto's RL book, the reward hypothesis is stated as follows: all of what we mean by goals and purposes can be well thought of as the maximization of the expected value of the cumulative sum of a received scalar signal (called…
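
For reference, the cumulative signal the hypothesis refers to is the return, in the book's notation $G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$, and the hypothesis asserts that any goal can be framed as maximizing $\mathbb{E}[G_t]$.
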
9
votes
2 answers

How do we define the reward function for an environment?

How do you actually decide what reward value to give for each action in a given state for an environment? Is this purely experimental and down to the programmer of the environment? So, is it a heuristic approach of simply trying different reward…
8
votes
2 answers

What are some best practices when trying to design a reward function?

Generally speaking, is there a best-practice procedure to follow when trying to define a reward function for a reinforcement-learning agent? What common pitfalls are there when defining the reward function, and how should you avoid them? What…
8
votes
1 answer

Suitable reward function for trading buy and sell orders

I am working on building a deep reinforcement learning agent which can place orders (i.e. limit buy and limit sell orders). The actions are {"Buy": 0, "Do Nothing": 1, "Sell": 2}. Suppose that all the features are well suited for this task. I wanted…
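
A per-step reward that often comes up for order-placing agents (a sketch under assumptions, not the accepted answer's formula) is the change in mark-to-market portfolio value net of fees; `cash`, `position`, `price`, and `fee_paid` are hypothetical names:

def step_reward(prev_value, cash, position, price, fee_paid=0.0):
    # mark-to-market value of the portfolio after the action has been applied
    value = cash + position * price
    # reward = change in wealth over this step, minus any transaction cost
    return (value - prev_value) - fee_paid, value

# reward, prev_value = step_reward(prev_value, cash, position, price, fee_paid)
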
6
votes
1 answer

How should I handle invalid actions in a grid world?

I'm building a really simple experiment, where I let an agent move from the bottom-left corner to the upper-right corner of a $3 \times 3$ grid world. I plan to use DQN to do this. I'm having trouble handling the starting point: what if the Q…
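
One common workaround (a sketch, not necessarily the answer's approach) is to mask the Q-values of moves that would leave the grid when taking the greedy action; `valid_actions` is a hypothetical helper output:

import numpy as np

def greedy_valid_action(q_values, valid_actions):
    # q_values: float array of Q(s, a) for the four moves
    # valid_actions: indices of moves that stay inside the 3x3 grid from this cell
    masked = np.full_like(q_values, -np.inf)
    masked[valid_actions] = q_values[valid_actions]
    return int(np.argmax(masked))
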
5
votes
0 answers

How to define a reward function for a humanoid agent whose goal is to stand up from the ground?

I'm trying to teach a humanoid agent how to stand up after falling. The episode starts with the agent lying on the floor with its back touching the ground, and its goal is to stand up in the shortest amount of time. But I'm having trouble in regards…
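
The question has no answers; for orientation, a reward shape often used for get-up tasks combines head height, torso uprightness, and a small per-step time penalty. All names and constants below are assumptions:

def standup_reward(head_height, torso_up_z, target_height=1.4, dt=0.02):
    # head_height: z-coordinate of the head in metres
    # torso_up_z: z-component of the torso's local "up" axis (1.0 when vertical)
    height_term = min(head_height / target_height, 1.0)  # saturates once the agent is standing
    upright_term = max(torso_up_z, 0.0)
    return height_term + 0.5 * upright_term - dt          # the -dt term rewards getting up quickly
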
5
votes
1 answer

Can the rewards be stochastic when the transition model is deterministic?

Suppose we have a deterministic environment where knowing $s,a$ determines $s'$. Is it possible to get two different rewards $r\neq r'$ in some state $s_{\text{fixed}}$? Assume that $s_{\text{fixed}}$ is a fixed state I get to after taking the…
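
In MDP notation the question is whether the joint dynamics $p(s', r \mid s, a)$ can put all probability mass on a single next state while still spreading mass over several reward values, e.g. a toy case with $p(s_{\text{fixed}}, r \mid s, a) = \tfrac{1}{2}$ for $r \in \{0, 1\}$ and $0$ otherwise.
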
4
votes
1 answer

Is a reward given at every step or only given when the RL agent fails or succeeds?

In reinforcement learning, an agent can receive a positive reward for correct actions and a negative reward for wrong actions, but does the agent also receive rewards for every other step/action?
4
votes
2 answers

Can rewards be decomposed into components?

I'm training a robot to walk to a specific $(x, y)$ point using TD3, and, for simplicity, I have something like reward = distance_x + distance_y + standing_up_straight, and then it adds this reward to the replay buffer. However, I think that it…
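
One way to experiment with the decomposition (a sketch with hypothetical names, not code from the answers) is to store each reward component separately in the replay buffer and recombine them with weights at training time:

from collections import namedtuple

# store the components instead of (or alongside) the summed reward
Transition = namedtuple("Transition", "state action next_state r_x r_y r_upright done")

def total_reward(t, w_x=1.0, w_y=1.0, w_up=1.0):
    # recombine with tunable weights when sampling minibatches for TD3
    return w_x * t.r_x + w_y * t.r_y + w_up * t.r_upright
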
4
votes
1 answer

How to avoid rapid actuator movements in favor of smooth movements in a continuous space and action space problem?

I'm working on a continuous state / continuous action controller. It shall control a certain roll angle of an aircraft by issuing the correct aileron commands (in $[-1, 1]$). To this end, I use a neural network and the DDPG algorithm, which shows…
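
A fix frequently suggested for DDPG-style controllers (a sketch, independent of the accepted answer) is to add an action-rate penalty, so the policy is rewarded for smooth aileron commands; the names are assumptions:

def shaped_reward(tracking_reward, action, prev_action, smoothness_weight=0.1):
    # both actions are aileron commands in [-1, 1]; penalize step-to-step jumps
    return tracking_reward - smoothness_weight * abs(action - prev_action)
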
4
votes
1 answer

Can reinforcement learning be used for tasks where only one final reward is received?

Is the reinforcement learning problem adaptable to a setting where there is only one, final, reward? I am aware of problems with sparse and delayed rewards, but what about a single reward and a quite long path?
4
votes
1 answer

Expressing Arbitrary Reward Functions as Potential-Based Advice (PBA)

I am trying to reproduce the results for the simple grid-world environment in [1]. But it turns out that using a dynamically learned PBA makes the performance worse and I cannot obtain the results shown in Figure 1 (a) in [1] (with the same…
3
votes
2 answers

What should I do when the potential value of a state is too high?

I'm working on a Reinforcement Learning task where I use reward shaping as proposed in the paper Policy invariance under reward transformations: Theory and application to reward shaping (1999) by Andrew Y. Ng, Daishi Harada and Stuart Russell. In…
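
For context, the shaping term that the cited paper proves policy-invariant is $F(s, s') = \gamma \Phi(s') - \Phi(s)$; a minimal sketch, where `phi` is whatever (possibly rescaled) potential you choose:

def shaped_reward(r, s, s_next, phi, gamma=0.99, done=False):
    # Ng, Harada & Russell (1999): adding gamma * Phi(s') - Phi(s) leaves the
    # optimal policy unchanged; terminal states conventionally get Phi = 0
    phi_next = 0.0 if done else phi(s_next)
    return r + gamma * phi_next - phi(s)
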