For questions related to the asynchronous advantage actor-critic (A3C) algorithm.
Questions tagged [a3c]
13 questions
5
votes
1 answer
How does being on-policy prevent us from using the replay buffer with the policy gradients?
One of the approaches to improving the stability of the Policy
Gradient family of methods is to use multiple environments in
parallel. The reason behind this is the fundamental problem we
discussed in Chapter 6, Deep Q-Network, when we talked about…
jgauth
- 261
- 1
- 13
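The core issue in this question is that the policy-gradient estimator is an expectation under the current policy. As a reminder (standard form, not quoted from the question):

$$\nabla_\theta J(\theta) \;=\; \mathbb{E}_{s,a \sim \pi_\theta}\!\big[\nabla_\theta \log \pi_\theta(a \mid s)\, A^{\pi_\theta}(s,a)\big].$$

Transitions stored in a replay buffer were generated by older parameters, so they no longer follow the distribution this expectation is taken over; running several environments in parallel instead supplies decorrelated samples that are still on-policy.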
4
votes
0 answers
Can deep successor representations be used with the A3C algorithm?
Deep Successor Representations (DSR) have given better performance in tasks like navigation compared to standard model-free RL methods. Basically, DSR is a hybrid of model-free and model-based RL. But the original work has only used value-based…
Shamane Siriwardhana
- 191
- 6
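For context, the value decomposition used by deep successor representations (paraphrased from the original DSR work) is

$$Q^\pi(s,a) \;=\; \psi^\pi(s,a)^{\top} w, \qquad \psi^\pi(s,a) \;=\; \mathbb{E}_\pi\!\Big[\sum_{t=0}^{\infty} \gamma^{t}\, \phi(s_t) \,\Big|\, s_0 = s,\ a_0 = a\Big],$$

where $\phi(s)$ is a learned state feature and $w$ maps features to immediate reward via $r(s) \approx \phi(s)^{\top} w$. The question is whether this value-based factorization can be combined with an actor-critic method such as A3C.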
3
votes
1 answer
What are the pros and cons of increasing or decreasing the number of worker processes in A3C?
In A3C, there are several child processes and one master process. The child processes compute the loss and the gradients via backpropagation, and the master process sums them up and updates the parameters, if I understand it correctly.
But I wonder how I should…
Blaszard
- 1,097
- 4
- 11
- 25
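As a rough illustration of the worker/update split being asked about, here is a minimal PyTorch-style sketch. The names compute_a3c_loss and make_env are hypothetical placeholders, and note that many A3C implementations let each worker apply the shared update itself rather than sending gradients to a master process:

```python
import torch

def worker(global_model, shared_optimizer, make_env, t_max=20):
    """One A3C worker: roll out up to t_max steps with a local copy of the
    model, compute the loss, and push the gradients onto the shared model."""
    local_model = type(global_model)()                  # assumes a no-argument constructor
    local_model.load_state_dict(global_model.state_dict())
    env = make_env()

    loss = compute_a3c_loss(local_model, env, t_max)    # hypothetical rollout + loss helper

    shared_optimizer.zero_grad()
    loss.backward()
    # Copy the local gradients onto the shared (global) parameters.
    for lp, gp in zip(local_model.parameters(), global_model.parameters()):
        gp._grad = lp.grad
    shared_optimizer.step()                             # updates the shared parameters

    # Re-sync the local copy before the next rollout.
    local_model.load_state_dict(global_model.state_dict())
```

Roughly speaking, adding more such workers buys more decorrelated experience per wall-clock second, at the cost of staler gradients and more CPU contention.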
2
votes
1 answer
Implementing A3C for CarRacing-v3 continuous action case
The problem I am facing right now is tying the theory from Sutton & Barto about advantage actor critic to the implementation of A3C I read here.
From what I understand:
The loss function for the critic network (value function) is given by: $L_V =…
DeadAsDuck
- 103
- 6
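The excerpt cuts off mid-equation; the critic loss such implementations typically minimize (the exact form in the linked code may differ) is

$$L_V \;=\; \tfrac{1}{2}\sum_{t}\big(R_t - V_\theta(s_t)\big)^2, \qquad R_t \;=\; \sum_{k=0}^{n-1}\gamma^{k} r_{t+k} \;+\; \gamma^{n} V_\theta(s_{t+n}),$$

i.e. a squared error between the n-step bootstrapped return and the value estimate, with the advantage for the actor given by $A_t = R_t - V_\theta(s_t)$.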
2
votes
1 answer
Why do we also need to normalize the action values in continuous action spaces?
I was reading tips & tricks for training in DRL here and I noticed the following:
always normalize your observation space when you can, i.e., when you know the boundaries
normalize your action space and make it symmetric when continuous (cf…
mkanakis
- 175
- 1
- 6
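In practice the "symmetric action space" advice usually means letting the policy output actions in $[-1, 1]$ (e.g. via tanh) and rescaling them to the environment's true bounds in a wrapper. A minimal sketch (newer versions of gym also ship a similar built-in gym.wrappers.RescaleAction):

```python
import numpy as np
import gym

class SymmetricActionWrapper(gym.ActionWrapper):
    """Expose a [-1, 1] action box and map it to the env's real [low, high]."""
    def __init__(self, env):
        super().__init__(env)
        self.low = env.action_space.low
        self.high = env.action_space.high
        self.action_space = gym.spaces.Box(low=-1.0, high=1.0,
                                           shape=env.action_space.shape,
                                           dtype=np.float32)

    def action(self, act):
        act = np.clip(act, -1.0, 1.0)
        # Linear map from [-1, 1] to [low, high].
        return self.low + 0.5 * (act + 1.0) * (self.high - self.low)
```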
1
vote
0 answers
Understanding loss function gradient in asynchronous advantage actor-critic (A3C) algorithm
This is a question I posted here. I am asking it on this Stack Exchange site as well, so that more people who could potentially answer get to see it.
In the A3C algorithm from the original paper:
the gradient with respect to log policy…
Kagaratsch
- 111
- 2
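For readers who do not follow the link, the gradient in question, as written in the original A3C paper (notation lightly simplified), is

$$\nabla_{\theta'} \log \pi(a_t \mid s_t; \theta')\,\big(R_t - V(s_t; \theta_v)\big) \;+\; \beta\, \nabla_{\theta'} H\big(\pi(\cdot \mid s_t; \theta')\big),$$

where $R_t$ is the n-step return, $V(s_t; \theta_v)$ is the critic's baseline, and $H$ is an entropy term weighted by $\beta$ that discourages premature convergence to a deterministic policy.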
1
vote
1 answer
How do I create a custom gym environment based on an image?
I am trying to create my own gym environment for the A3C algorithm (one implementation is here). The custom environment is a simple login form for any site. I want to create an environment from an image. The idea is to take a screenshot of the web…
Ren
- 21
- 4
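A minimal skeleton of an image-observation environment for this use case (the sizes, the placeholder _screenshot helper, and the reward logic are illustrative assumptions, and the sketch uses the classic four-value gym step API):

```python
import numpy as np
import gym
from gym import spaces

class LoginFormEnv(gym.Env):
    """Toy environment whose observation is a fixed-size RGB screenshot."""

    def __init__(self, width=160, height=120, n_actions=4):
        super().__init__()
        self.observation_space = spaces.Box(low=0, high=255,
                                            shape=(height, width, 3),
                                            dtype=np.uint8)
        self.action_space = spaces.Discrete(n_actions)

    def _screenshot(self):
        # Placeholder: grab and resize the real page screenshot here.
        return np.zeros(self.observation_space.shape, dtype=np.uint8)

    def reset(self):
        return self._screenshot()

    def step(self, action):
        obs = self._screenshot()
        reward, done, info = 0.0, False, {}   # placeholder reward logic
        return obs, reward, done, info
```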
1
vote
0 answers
Is it OK to take random actions while training A3C, as in the code below?
I am trying to train an A3C algorithm, but I am getting the same output from the multinomial function.
Can I train the A3C with random actions, as in the code below?
Could an expert comment?
while count
user2783767
- 121
- 2
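Without the full code it is hard to say, but A3C normally explores by sampling from the policy's own categorical distribution rather than acting uniformly at random; uniform random actions break the on-policy assumption behind the log-probability-times-advantage gradient. A small PyTorch sketch of the intended sampling step, with logits standing in for the actor head's output:

```python
import torch

logits = torch.tensor([[1.2, 0.3, -0.5]])           # actor output for one state
probs = torch.softmax(logits, dim=-1)

# On-policy exploration: sample an action according to the policy itself.
action = torch.multinomial(probs, num_samples=1)     # shape (1, 1)

# If probs has collapsed to a near one-hot vector, multinomial will keep
# returning the same action; that points at the policy, not the sampler.
```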
1
vote
0 answers
When past states contain useful information, does A3C perform better than TD3, given that TD3 does not use an LSTM?
I am trying to build an AI that needs to have some information about the past states as well. Therefore, LSTMs are suitable for this.
Now I want to know: for a problem/game like Breakout, where we require previous states as well, does A3C…
user2783767
- 121
- 2
1
vote
0 answers
How should I deal with variable batch size in A3C?
I am fairly new to reinforcement learning (RL) and deep RL. I have been trying to create my first agent (using A3C) that selects an optimal path with the reward being some associated completion time (the more optimal the path is, packets will be…
mkanakis
- 175
- 1
- 6
0
votes
0 answers
Tensorflow-gpu and multiprocessing
I have finished implementing an Asynchronous Advantage Actor-Critic (A3C) agent for TensorFlow (GPU), using a single RMSProp optimizer with shared statistics. To do so, a central controller holds both the Global Network (ActorCriticModel) and the…
Lyn Cassidy
- 1
- 1
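The excerpt is truncated before the actual question, but a sketch of the architecture being described is a central process that owns the single shared RMSprop optimizer and applies gradients the worker processes send over, e.g., a multiprocessing.Queue (all names here are illustrative, not the asker's code):

```python
import tensorflow as tf

def controller(global_model, grad_queue, n_workers):
    """Central process: owns the one shared RMSprop optimizer and applies
    every gradient list that worker processes put on grad_queue."""
    optimizer = tf.keras.optimizers.RMSprop(learning_rate=7e-4)
    finished = 0
    while finished < n_workers:
        grads = grad_queue.get()              # list of numpy arrays from a worker
        if grads is None:                     # sentinel: a worker has finished
            finished += 1
            continue
        grads = [tf.convert_to_tensor(g) for g in grads]
        optimizer.apply_gradients(zip(grads, global_model.trainable_variables))
```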
0
votes
1 answer
Why would the reward of A3C with LSTM suddenly drop off after many episodes?
I am training an A3C with stacked LSTM.
During initial training, my model was giving a decent positive reward. However, after many episodes, its reward drops to zero and stays there for a long time. Is it because of the LSTM?
Is it normal?
Should I…
user2783767
- 121
- 2
0
votes
2 answers
Why do I get the same action when testing the A2C?
I'm working on an advantage actor-critic (A2C) reinforcement learning model, but when I test the model after training for 3500 episodes, I get almost the same action for all testing episodes, while if I train the system for fewer than 850…
I_Al-thamary
- 52
- 1
- 16