Questions tagged [td3]

For questions related to the Twin Delayed Deep Deterministic policy gradient algorithm (TD3).

8 questions
5
votes
0 answers

Optimal episode length in reinforcement learning

I have a custom environment for stock trading where an episode can be as long as 2000-3000 steps. I've run several experiments with the TD3 and SAC algorithms, and the average reward per episode flattens after a few episodes. I believe average reward per episode…
1
vote
1 answer

Can action be dominated by state features in actor-critic algorithms?

I have a case where my state consists of a relatively large number of features, e.g. 50, whereas my action size is 1. I wonder whether my state features dominate the action in my critic network. I believe that in theory it eventually shouldn't matter, but…
Mika
  • 371
  • 2
  • 10
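A common remedy for this concern is to fuse the action into the critic only after the state has been compressed by its own layer, so the 1-dim action is concatenated with a small state embedding rather than with 50 raw features. A minimal NumPy sketch of the idea (layer sizes and weight scales here are illustrative, not from the question):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Illustrative TD3-style critic: the 50-dim state is first compressed by
# its own hidden layer, and the 1-dim action is concatenated only at the
# second layer, so it is not drowned out among the raw state features.
STATE_DIM, ACTION_DIM, HIDDEN = 50, 1, 32
W_s = rng.normal(0.0, 0.1, (STATE_DIM, HIDDEN))           # state encoder
W_h = rng.normal(0.0, 0.1, (HIDDEN + ACTION_DIM, HIDDEN)) # fusion layer
W_q = rng.normal(0.0, 0.1, (HIDDEN, 1))                   # Q-value head

def q_value(state, action):
    h = relu(state @ W_s)                          # encode state alone
    h = relu(np.concatenate([h, action]) @ W_h)    # inject action late
    return (h @ W_q).item()
```

In a framework such as PyTorch the same pattern is a `torch.cat((h, action), dim=-1)` placed at the second layer of the Q-network instead of the input.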
1
vote
0 answers

If we have a working reward function, would adding another action have a significant effect on the agent's performance if the task remains the same?

If we have a working reward function, providing the desired behavior and optimal policy in a continuous action/state-space problem, would adding another action significantly affect the possible optimal policy? For example, assume you have an RL…
0
votes
1 answer

Why is my agent stuck on the same action in my Twin Delayed Deep Deterministic Policy Gradient (TD3) program?

I've been tirelessly converting a reinforcement learning program that runs Twin Delayed Deep Deterministic Policy Gradient (TD3) from Python to JavaScript using TensorFlow.js. I'm just trying to make a basic blueprint for myself and the…
0
votes
0 answers

Training an RL agent using different data at each episode

I am training an RL agent whose state is composed of two numbers, ranging between 4 ~ 16 and 0 ~ 360. The action is continuous and between 0 ~ 90. I am training the TD3 agent using the Stable Baselines library. In real life, the states can be any…
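When the raw state components live on very different scales (here 4–16 versus 0–360), it usually helps to min–max normalize each one to a common range before feeding it to the agent. A small sketch, assuming the bounds stated in the question (the function and constant names are mine):

```python
import numpy as np

# Hypothetical bounds matching the question: two state components,
# one in [4, 16] and one in [0, 360].
STATE_LOW = np.array([4.0, 0.0])
STATE_HIGH = np.array([16.0, 360.0])

def normalize_state(state):
    """Map each raw component to [-1, 1] so no single scale dominates."""
    state = np.asarray(state, dtype=float)
    return 2.0 * (state - STATE_LOW) / (STATE_HIGH - STATE_LOW) - 1.0
```

The same mapping must be applied consistently at training and evaluation time; Stable Baselines also offers observation normalization wrappers that estimate these statistics online.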
0
votes
0 answers

Is it possible to use Softmax as the activation function for the actor (policy) network in the TD3 or SAC reinforcement learning algorithms?

As I understand from the literature, the last activation in an actor (policy) network in the TD3 and SAC algorithms is normally a Tanh function, which is scaled by a certain limit. My action is perfectly described as a vector, where all values are…
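For a deterministic TD3 actor, nothing prevents replacing the final Tanh with a Softmax when the action must be a set of non-negative values summing to 1 (an allocation): the actor then outputs a point on the probability simplex. (For SAC the situation is subtler, since its squashed-Gaussian policy's log-probability correction assumes the Tanh squashing.) A minimal NumPy sketch of such a head:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax: output is non-negative and sums to 1."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()          # shift by the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Applied to the actor's final pre-activation, this yields an action on
# the probability simplex (e.g. portfolio weights), rather than
# independently bounded components as Tanh would give.
```

One design consequence: exploration noise added *after* the softmax can push the action off the simplex, so noise is usually added to the logits and the softmax re-applied.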
-1
votes
1 answer

TD3 sticking to end values

I am using TD3 on a custom Gym environment, but the problem is that the action values stick to the ends of the action range. Sticking to the end values makes the reward negative; to make it positive, the agent must find action values somewhere in the middle. But the agent doesn't learn…
K_197
  • 1
  • 3
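A TD3 actor typically ends in a Tanh that is rescaled to the environment's action bounds; once the pre-activation grows large, the Tanh saturates, its gradient vanishes, and the policy can lock onto boundary actions. A minimal sketch of the rescaling (the 0–90 bounds are hypothetical) that shows the saturation; common mitigations are a smaller weight initialization for the actor's final layer and sufficient exploration noise:

```python
import numpy as np

ACT_LOW, ACT_HIGH = 0.0, 90.0  # hypothetical action bounds

def scale_action(raw):
    """Map a tanh-squashed pre-activation onto [ACT_LOW, ACT_HIGH]."""
    return ACT_LOW + 0.5 * (np.tanh(raw) + 1.0) * (ACT_HIGH - ACT_LOW)

# tanh saturates for large |raw|, so its gradient there is ~0 and the
# policy can get stuck emitting boundary actions:
mid = scale_action(0.0)     # 45.0, the midpoint of the range
high = scale_action(10.0)   # very close to 90.0: saturated upper bound
```

If the actions sit at the bounds from the very first updates, it is often a sign that the final layer's weights are initialized too large or that the reward scale drives the Q-targets to extremes.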