Most Popular (1,500 questions)
5 votes · 2 answers
Policy gradient: Does it use the Markov property?
To derive the policy gradient, we start by writing the equation for the probability of a certain trajectory (e.g. see the Spinning Up tutorial):
$$
\begin{align}
P_\theta(\tau) &= P_\theta(s_0, a_0, s_1, a_1, \dots, s_T, a_T) \\
&= p(s_0) \prod_{i=0}^T …
\end{align}
$$
Gerges (151)
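The factorization the question starts from can be checked numerically. A toy two-state sketch with invented probabilities (none of these numbers come from the question):

```python
# Toy numerical sketch of the trajectory factorization used in the
# policy-gradient derivation (all probabilities below are invented):
#   P_theta(tau) = p(s0) * prod_t pi_theta(a_t | s_t) * p(s_{t+1} | s_t, a_t)

p_s0 = {0: 1.0}                                  # initial-state distribution
pi = {(0, 'a'): 0.7, (0, 'b'): 0.3,              # policy pi_theta(a | s)
      (1, 'a'): 0.5, (1, 'b'): 0.5}
p_trans = {(0, 'a', 1): 0.9, (0, 'a', 0): 0.1,   # dynamics p(s' | s, a)
           (1, 'b', 0): 0.8, (1, 'b', 1): 0.2}

def trajectory_prob(tau):
    """tau = [s0, a0, s1, a1, ..., sT]; returns P_theta(tau)."""
    prob = p_s0[tau[0]]
    for t in range(0, len(tau) - 1, 2):
        s, a, s_next = tau[t], tau[t + 1], tau[t + 2]
        # Markov property: each factor depends only on the current (s, a)
        prob *= pi[(s, a)] * p_trans[(s, a, s_next)]
    return prob

print(trajectory_prob([0, 'a', 1, 'b', 0]))  # 1.0 * (0.7*0.9) * (0.5*0.8) ≈ 0.252
```

Each factor conditions only on the current state-action pair, which is exactly where the Markov property enters the derivation.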
5 votes · 3 answers
What's the difference between architectures and backbones?
In the paper "ForestNet: Classifying Drivers of Deforestation in Indonesia using Deep Learning on Satellite Imagery", the authors talk about using:
- Feature Pyramid Networks (as the architecture)
- EfficientNet-B2 (as the backbone)
Performance…
codinggirl123 (51)
5 votes · 1 answer
Multi-Armed Bandits with a large number of arms
I'm dealing with a (stochastic) Multi-Armed Bandit (MAB) with a large number of arms.
Consider a pizza machine that produces a pizza depending on an input $i$ (equivalent to an arm). The (finite) set of arms $K$ is given by $K = X_1 \times X_2 \times…
D. B. (101)
5 votes · 2 answers
How to deal with the time delay in reinforcement learning?
I have a question regarding the time delay in reinforcement learning (RL).
In RL, one has states, rewards, and actions. It is usually assumed (as far as I understand it) that when an action is executed on the system, the state changes immediately…
jengmge (51)
5 votes · 1 answer
Is there anything that ensures that convolutional filters don't end up the same?
I trained a simple model to recognize handwritten numbers from the MNIST dataset. Here it is:
model = Sequential([
    Conv2D(filters=1, kernel_size=(3, 1), padding='valid', strides=1, input_shape=(28, 28, 1)),
    Flatten(),
    Dense(10,…
mark mark (813)
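Not an answer from the thread, but the usual symmetry-breaking argument behind this question can be demonstrated with a toy two-"filter" network in plain NumPy (the tanh model, shapes, and numbers are invented for illustration; this is not the Keras model from the question):

```python
import numpy as np

# Two "filters" w1, w2 feed a tanh nonlinearity, combined by weights v,
# with loss = output**2. If the filters start identical, their gradients
# are identical, so gradient descent can never separate them; random
# initialization breaks this symmetry. All numbers are invented.
rng = np.random.default_rng(0)
x = rng.normal(size=4)  # toy input

def grad_w(w, v_i, y):
    # d(y**2)/dw for y = ... + v_i * tanh(w @ x) + ...
    return 2 * y * v_i * (1 - np.tanh(w @ x) ** 2) * x

def grads(w1, w2, v):
    y = v[0] * np.tanh(w1 @ x) + v[1] * np.tanh(w2 @ x)
    return grad_w(w1, v[0], y), grad_w(w2, v[1], y)

v = np.array([0.5, 0.5])

# Identical initialization: identical gradients forever.
w = np.full(4, 0.1)
g1, g2 = grads(w, w.copy(), v)
print(np.allclose(g1, g2))   # True

# Random initialization: the gradients differ, so the filters diverge.
g1, g2 = grads(rng.normal(size=4), rng.normal(size=4), v)
print(np.allclose(g1, g2))   # almost surely False
```

So nothing *forces* filters apart; random initialization just makes "two filters stay identical" a measure-zero event.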
5 votes · 1 answer
Why is second-order backpropagation useful?
Raul Rojas's book on Neural Networks dedicates section 8.4.3 to explaining how to do second-order backpropagation, that is, computing the Hessian of the error function with respect to two weights at a time.
What problems are easier to solve using…
EmmanuelMess (227)
5 votes · 1 answer
What is the gradient of an attention unit?
The paper Attention Is All You Need describes the Transformer architecture, which defines attention as a function of the queries $Q = x W^Q$, keys $K = x W^K$, and values $V = x W^V$:
$\text{Attention}(Q, K, V) = \text{softmax}\left(…
user3667125 (1,700)
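One way to sanity-check the gradient asked about: implement scaled dot-product attention in NumPy, derive the gradient with respect to $Q$ by hand through the softmax, and compare against finite differences. Shapes, values, and the choice of loss (just the sum of the output) are invented for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
d_k = 4
Q, K, V = (rng.normal(size=(3, d_k)) for _ in range(3))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    # softmax(Q K^T / sqrt(d_k)) V
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def loss(Q):
    return attention(Q, K, V).sum()

# Analytic gradient of sum(Attention) w.r.t. Q:
#   A = softmax(S), S = Q K^T / sqrt(d_k)
#   dL/dA = ones @ V^T, then the row-wise softmax backward
#   dL/dS = A * (dL/dA - sum(A * dL/dA)), and dL/dQ = dL/dS @ K / sqrt(d_k)
A = softmax(Q @ K.T / np.sqrt(d_k))
G = np.ones((3, d_k)) @ V.T
analytic = (A * (G - (A * G).sum(-1, keepdims=True))) @ K / np.sqrt(d_k)

# Central finite differences for comparison.
eps = 1e-6
numeric = np.zeros_like(Q)
for i in range(Q.shape[0]):
    for j in range(Q.shape[1]):
        Qp = Q.copy(); Qp[i, j] += eps
        Qm = Q.copy(); Qm[i, j] -= eps
        numeric[i, j] = (loss(Qp) - loss(Qm)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))  # True
```

The same pattern (softmax Jacobian sandwiched between the upstream gradient and $K$ or $Q$) gives the gradients with respect to $K$ and $V$ as well.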
5 votes · 2 answers
Transformers: how does the decoder final layer output the desired token?
In the paper Attention Is All You Need, this section confuses me:
"In our model, we share the same weight matrix between the two embedding layers [in the encoding section] and the pre-softmax linear transformation [output of the decoding…"
user3667125 (1,700)
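The weight sharing the quote refers to can be sketched in a few lines of NumPy (sizes invented; the real model also scales the embedding by sqrt(d_model), which is omitted here):

```python
import numpy as np

# Weight tying: ONE matrix E serves both as the token-embedding table
# (row lookup) and, transposed, as the pre-softmax output projection.
rng = np.random.default_rng(0)
vocab, d_model = 10, 8
E = rng.normal(size=(vocab, d_model))

def embed(token_ids):
    return E[token_ids]          # input embedding: look up rows of E

def output_logits(h):
    return h @ E.T               # pre-softmax linear layer reuses E

h = embed([3, 7]).mean(axis=0)   # stand-in for a decoder final hidden state
logits = output_logits(h)
print(logits.shape)              # (10,) — one score per vocabulary token
```

The softmax over these logits then gives the next-token distribution, so the "desired token" is simply the vocabulary row of $E$ most aligned with the final hidden state.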
5 votes · 1 answer
Can AlphaFold predict proteins with metals well?
There are certain proteins that contain metal components, known as metalloproteins. Commonly, the metal is at the active site, which needs the most prediction precision. Typically, there are only one (or a few) metals in a protein, which contains far…
jw_ (199)
5 votes · 2 answers
How to detect a full-fledged self-aware AI?
The premise: a full-fledged self-aware artificial intelligence may have come to exist in a distributed environment like the internet. The possible AI in question may be quite unwilling to reveal itself.
The question: Given a first initial…
user4327 (61)
5 votes · 1 answer
Why does off-policy learning outperform on-policy learning?
I am self-studying reinforcement learning using different online resources, and I now have a basic understanding of how RL works.
I saw this in a book:
"Q-learning is an off-policy learner. An off-policy learner learns the value of an optimal…"
Exploring (371)
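The off-policy property the quoted book describes is visible in the Q-learning update itself: actions come from an epsilon-greedy *behavior* policy, but the bootstrap target takes the greedy max over next actions (the *target* policy). A toy run on an invented 4-state chain MDP (action 1 moves right, action 0 stays, reaching the last state pays reward 1):

```python
import random

random.seed(0)
n_states, actions = 4, [0, 1]
alpha, gamma, eps = 0.5, 0.9, 0.3
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}

def step(s, a):
    s2 = min(s + a, n_states - 1)
    return s2, (1.0 if s2 == n_states - 1 else 0.0)

for _ in range(500):
    s = 0
    while s != n_states - 1:
        if random.random() < eps:                       # behavior: explore
            a = random.choice(actions)
        else:                                           # behavior: exploit
            a = max(actions, key=lambda a_: Q[(s, a_)])
        s2, r = step(s, a)
        # off-policy target: greedy max over next actions, regardless of
        # what the behavior policy will actually do in s2
        target = r + gamma * max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2

print(round(Q[(0, 1)], 2))  # converges to gamma**2 = 0.81
```

Replacing the `max` in the target with the behavior policy's actual next action would turn this into on-policy SARSA, which is exactly the distinction the book is drawing.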
5 votes · 1 answer
Why would a VAE train much better with batch sizes closer to 1 than with a batch size of 100+?
I've been training a VAE to reconstruct human names. When I train it with a batch size of 100+, after about 5 hours of training it tends to output the same thing regardless of the input, and I'm using teacher forcing as well. When I use a lower…
user8714896 (825)
5 votes · 2 answers
Given two optimal policies, is an affine combination of them also optimal?
If there are two different optimal policies $\pi_1, \pi_2$ in a reinforcement learning task, will the affine combination of the two policies $\alpha \pi_1 + \beta \pi_2$, with $\alpha + \beta = 1$, also be an optimal policy?
Here I…
yang liu (53)
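For finite MDPs and mixture weights restricted to $\alpha \in [0, 1]$ (a convex rather than general affine combination, which also keeps the mixture a valid probability distribution), a standard sketch of the affirmative argument, using $q_*$ for the optimal action-value function:

$$
\pi_\alpha(a \mid s) = \alpha\, \pi_1(a \mid s) + (1 - \alpha)\, \pi_2(a \mid s).
$$

Since each $\pi_i$ is optimal, it assigns probability only to actions in $\operatorname{argmax}_a q_*(s, a)$; therefore $\pi_\alpha$ does too, and any policy supported on greedy actions with respect to $q_*$ achieves $v_{\pi_\alpha} = v_*$. With negative weights the combination need not even be a probability distribution, so the general affine case fails for a more basic reason.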
5 votes · 4 answers
Which methods or algorithms to develop a learning application?
I am creating a game application that will generate a new level based on the user's performance in the previous level.
The application concerns language improvement, to be precise. Suppose the user performed well in grammar-related…
Abdallah .E Abdallah (51)
5 votes · 0 answers
What is the justification for Kaiming He initialization?
I've been trying to understand where the formulas for Xavier and Kaiming He initialization come from. My understanding is that these initialization schemes come from a desire to keep the gradients stable during backpropagation (avoiding…
Jack M (302)
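The stability motivation the question alludes to can be checked numerically on the forward pass: Kaiming He initialization draws weights with std = sqrt(2 / fan_in), the factor 2 compensating for ReLU zeroing half of each pre-activation on average. A sketch with invented layer sizes and depth:

```python
import numpy as np

# With He scaling, activation magnitudes stay O(1) through a deep ReLU
# stack instead of exploding or vanishing. Sizes below are invented.
rng = np.random.default_rng(0)
fan_in, n_layers = 256, 20

x = rng.normal(size=(1000, fan_in))
for _ in range(n_layers):
    W = rng.normal(scale=np.sqrt(2.0 / fan_in), size=(fan_in, fan_in))
    x = np.maximum(x @ W, 0.0)          # ReLU

print(x.var())  # stays near 1 after 20 layers
```

Rerunning with `scale=np.sqrt(1.0 / fan_in)` (the Xavier-style scale, derived for symmetric activations like tanh) makes the variance shrink by roughly half per ReLU layer, which is the instability the He correction removes.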