Questions tagged [td3]

For questions related to the Twin Delayed Deep Deterministic policy gradient algorithm (TD3).

8 questions
5
votes
0 answers

Optimal episode length in reinforcement learning

I have a custom environment for stock trading where an episode can be as long as 2000-3000 steps. I've run several experiments with the TD3 and SAC algorithms, and the average reward per episode flattens after a few episodes. I believe average reward per episode…
1
vote
1 answer

Can action be dominated by state features in actor-critic algorithms?

I have a case where my state consists of a relatively large number of features, e.g. 50, whereas my action size is 1. I wonder whether my state features dominate the action in my critic network. I believe that in theory it eventually shouldn't matter, but…
Mika
  • 371
  • 2
  • 10
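A common remedy for this concern is to fuse the action into the critic only after the state has been compressed by its own layer, so the 1-dim action is concatenated with a small state embedding rather than with 50 raw features. A minimal NumPy sketch of the idea (layer sizes and weight scales here are illustrative, not from the question):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Illustrative TD3-style critic: the 50-dim state is first compressed by
# its own hidden layer, and the 1-dim action is concatenated only at the
# second layer, so it is not drowned out among the raw state features.
STATE_DIM, ACTION_DIM, HIDDEN = 50, 1, 32
W_s = rng.normal(0.0, 0.1, (STATE_DIM, HIDDEN))           # state encoder
W_h = rng.normal(0.0, 0.1, (HIDDEN + ACTION_DIM, HIDDEN)) # fusion layer
W_q = rng.normal(0.0, 0.1, (HIDDEN, 1))                   # Q-value head

def q_value(state, action):
    h = relu(state @ W_s)                          # encode state alone
    h = relu(np.concatenate([h, action]) @ W_h)    # inject action late
    return (h @ W_q).item()
```

In a framework such as PyTorch the same pattern is a `torch.cat((h, action), dim=-1)` placed at the second layer of the Q-network instead of the input.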
1
vote
0 answers

If we have a working reward function, would adding another action have a significant effect on the agent's performance if the task remains the same?

If we have a working reward function, providing the desired behavior and optimal policy in a continuous action/state-space problem, would adding another action significantly affect the possible optimal policy? For example, assume you have an RL…
0
votes
1 answer

Why is my agent stuck on the same action in my Twin Delayed Deep Deterministic Policy Gradient (TD3) program?

I've been tirelessly converting a reinforcement learning program that runs Twin Delayed Deep Deterministic Policy Gradient (TD3) from Python to JavaScript using TensorFlow.js. I'm just trying to make a basic blueprint for myself and the…
0
votes
0 answers

Training an RL agent using different data at each episode

I am training an RL agent whose state is composed of two numbers, ranging between 4 ~ 16 and 0 ~ 360. The action is continuous and between 0 ~ 90. I am training the TD3 agent using the Stable Baselines library. In real life, the states can be any…
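When the raw state components live on very different scales (here 4–16 versus 0–360), it usually helps to min–max normalize each one to a common range before feeding it to the agent. A small sketch, assuming the bounds stated in the question (the function and constant names are mine):

```python
import numpy as np

# Hypothetical bounds matching the question: two state components,
# one in [4, 16] and one in [0, 360].
STATE_LOW = np.array([4.0, 0.0])
STATE_HIGH = np.array([16.0, 360.0])

def normalize_state(state):
    """Map each raw component to [-1, 1] so no single scale dominates."""
    state = np.asarray(state, dtype=float)
    return 2.0 * (state - STATE_LOW) / (STATE_HIGH - STATE_LOW) - 1.0
```

The same mapping must be applied consistently at training and evaluation time; Stable Baselines also offers observation normalization wrappers that estimate these statistics online.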
0
votes
0 answers

Is it possible to use Softmax as the activation function for the actor (policy) network in the TD3 or SAC reinforcement learning algorithms?

As I understand from the literature, the last activation in an actor (policy) network in the TD3 and SAC algorithms is normally a Tanh function, which is scaled by a certain limit. My action is perfectly described as a vector, where all values are…
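For a deterministic TD3 actor, nothing prevents replacing the final Tanh with a Softmax when the action must be a set of non-negative values summing to 1 (an allocation): the actor then outputs a point on the probability simplex. (For SAC the situation is subtler, since its squashed-Gaussian policy's log-probability correction assumes the Tanh squashing.) A minimal NumPy sketch of such a head:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax: output is non-negative and sums to 1."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()          # shift by the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Applied to the actor's final pre-activation, this yields an action on
# the probability simplex (e.g. portfolio weights), rather than
# independently bounded components as Tanh would give.
```

One design consequence: exploration noise added *after* the softmax can push the action off the simplex, so noise is usually added to the logits and the softmax re-applied.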
-1
votes
1 answer

TD3 sticking to end values

I am using TD3 on a custom Gym environment, but the problem is that the action values stick to the ends of the action range. Sticking to the end values makes the reward negative; to make it positive, the agent must find action values somewhere in the middle. But the agent doesn't learn…
K_197
  • 1
  • 3
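A TD3 actor typically ends in a Tanh that is rescaled to the environment's action bounds; once the pre-activation grows large, the Tanh saturates, its gradient vanishes, and the policy can lock onto boundary actions. A minimal sketch of the rescaling (the 0–90 bounds are hypothetical) that shows the saturation; common mitigations are a smaller weight initialization for the actor's final layer and sufficient exploration noise:

```python
import numpy as np

ACT_LOW, ACT_HIGH = 0.0, 90.0  # hypothetical action bounds

def scale_action(raw):
    """Map a tanh-squashed pre-activation onto [ACT_LOW, ACT_HIGH]."""
    return ACT_LOW + 0.5 * (np.tanh(raw) + 1.0) * (ACT_HIGH - ACT_LOW)

# tanh saturates for large |raw|, so its gradient there is ~0 and the
# policy can get stuck emitting boundary actions:
mid = scale_action(0.0)     # 45.0, the midpoint of the range
high = scale_action(10.0)   # very close to 90.0: saturated upper bound
```

If the actions sit at the bounds from the very first updates, it is often a sign that the final layer's weights are initialized too large or that the reward scale drives the Q-targets to extremes.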