Most Popular (1,500 questions)
5 votes · 2 answers
Policy gradient: Does it use the Markov property?
To derive the policy gradient, we start by writing the equation for the probability of a certain trajectory (e.g. see the Spinning Up tutorial):
$$
\begin{align}
P_\theta(\tau) &= P_\theta(s_0, a_0, s_1, a_1, \dots, s_T, a_T) \\
&= p(s_0) \prod_{i=0}^T …
\end{align}
$$
Gerges (151)
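The factorization the question starts from can be checked numerically. A toy two-state sketch with invented probabilities (none of these numbers come from the question):

```python
# Toy numerical sketch of the trajectory factorization used in the
# policy-gradient derivation (all probabilities below are invented):
#   P_theta(tau) = p(s0) * prod_t pi_theta(a_t | s_t) * p(s_{t+1} | s_t, a_t)

p_s0 = {0: 1.0}                                  # initial-state distribution
pi = {(0, 'a'): 0.7, (0, 'b'): 0.3,              # policy pi_theta(a | s)
      (1, 'a'): 0.5, (1, 'b'): 0.5}
p_trans = {(0, 'a', 1): 0.9, (0, 'a', 0): 0.1,   # dynamics p(s' | s, a)
           (1, 'b', 0): 0.8, (1, 'b', 1): 0.2}

def trajectory_prob(tau):
    """tau = [s0, a0, s1, a1, ..., sT]; returns P_theta(tau)."""
    prob = p_s0[tau[0]]
    for t in range(0, len(tau) - 1, 2):
        s, a, s_next = tau[t], tau[t + 1], tau[t + 2]
        # Markov property: each factor depends only on the current (s, a)
        prob *= pi[(s, a)] * p_trans[(s, a, s_next)]
    return prob

print(trajectory_prob([0, 'a', 1, 'b', 0]))  # 1.0 * (0.7*0.9) * (0.5*0.8) ≈ 0.252
```

Each factor conditions only on the current state-action pair, which is exactly where the Markov property enters the derivation.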
5 votes · 3 answers
What's the difference between architectures and backbones?
In the paper "ForestNet: Classifying Drivers of Deforestation in Indonesia using Deep Learning on Satellite Imagery", the authors talk about using:
- Feature Pyramid Networks (as the architecture)
- EfficientNet-B2 (as the backbone)
Performance…
codinggirl123 (51)
5 votes · 1 answer
Multi-Armed Bandits with a large number of arms
I'm dealing with a (stochastic) Multi-Armed Bandit (MAB) with a large number of arms.
Consider a pizza machine that produces a pizza depending on an input $i$ (equivalent to an arm). The (finite) set of arms $K$ is given by $K = X_1 \times X_2 \times…
D. B. (101)
5 votes · 2 answers
How to deal with the time delay in reinforcement learning?
I have a question regarding the time delay in reinforcement learning (RL).
In RL, one has states, rewards, and actions. It is usually assumed (as far as I understand it) that when an action is executed on the system, the state changes immediately…
jengmge (51)
5 votes · 1 answer
Is there anything that ensures that convolutional filters don't end up the same?
I trained a simple model to recognize handwritten numbers from the MNIST dataset. Here it is:
model = Sequential([
    Conv2D(filters=1, kernel_size=(3, 1), padding='valid', strides=1, input_shape=(28, 28, 1)),
    Flatten(),
    Dense(10,…
mark mark (813)
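Not an answer from the thread, but the usual symmetry-breaking argument behind this question can be demonstrated with a toy two-"filter" network in plain NumPy (the tanh model, shapes, and numbers are invented for illustration; this is not the Keras model from the question):

```python
import numpy as np

# Two "filters" w1, w2 feed a tanh nonlinearity, combined by weights v,
# with loss = output**2. If the filters start identical, their gradients
# are identical, so gradient descent can never separate them; random
# initialization breaks this symmetry. All numbers are invented.
rng = np.random.default_rng(0)
x = rng.normal(size=4)  # toy input

def grad_w(w, v_i, y):
    # d(y**2)/dw for y = ... + v_i * tanh(w @ x) + ...
    return 2 * y * v_i * (1 - np.tanh(w @ x) ** 2) * x

def grads(w1, w2, v):
    y = v[0] * np.tanh(w1 @ x) + v[1] * np.tanh(w2 @ x)
    return grad_w(w1, v[0], y), grad_w(w2, v[1], y)

v = np.array([0.5, 0.5])

# Identical initialization: identical gradients forever.
w = np.full(4, 0.1)
g1, g2 = grads(w, w.copy(), v)
print(np.allclose(g1, g2))   # True

# Random initialization: the gradients differ, so the filters diverge.
g1, g2 = grads(rng.normal(size=4), rng.normal(size=4), v)
print(np.allclose(g1, g2))   # almost surely False
```

So nothing *forces* filters apart; random initialization just makes "two filters stay identical" a measure-zero event.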
5 votes · 1 answer
Why is second-order backpropagation useful?
Raul Rojas's book on Neural Networks dedicates section 8.4.3 to explaining how to do second-order backpropagation, that is, computing the Hessian of the error function with respect to two weights at a time.
What problems are easier to solve using…
EmmanuelMess (227)
5 votes · 1 answer
What is the gradient of an attention unit?
The paper Attention Is All You Need describes the Transformer architecture, which defines attention as a function of the queries $Q = x W^Q$, keys $K = x W^K$, and values $V = x W^V$:
$\text{Attention}(Q, K, V) = \text{softmax}\left(…
user3667125 (1,700)
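One way to sanity-check the gradient asked about: implement scaled dot-product attention in NumPy, derive the gradient with respect to $Q$ by hand through the softmax, and compare against finite differences. Shapes, values, and the choice of loss (just the sum of the output) are invented for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
d_k = 4
Q, K, V = (rng.normal(size=(3, d_k)) for _ in range(3))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    # softmax(Q K^T / sqrt(d_k)) V
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def loss(Q):
    return attention(Q, K, V).sum()

# Analytic gradient of sum(Attention) w.r.t. Q:
#   A = softmax(S), S = Q K^T / sqrt(d_k)
#   dL/dA = ones @ V^T, then the row-wise softmax backward
#   dL/dS = A * (dL/dA - sum(A * dL/dA)), and dL/dQ = dL/dS @ K / sqrt(d_k)
A = softmax(Q @ K.T / np.sqrt(d_k))
G = np.ones((3, d_k)) @ V.T
analytic = (A * (G - (A * G).sum(-1, keepdims=True))) @ K / np.sqrt(d_k)

# Central finite differences for comparison.
eps = 1e-6
numeric = np.zeros_like(Q)
for i in range(Q.shape[0]):
    for j in range(Q.shape[1]):
        Qp = Q.copy(); Qp[i, j] += eps
        Qm = Q.copy(); Qm[i, j] -= eps
        numeric[i, j] = (loss(Qp) - loss(Qm)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))  # True
```

The same pattern (softmax Jacobian sandwiched between the upstream gradient and $K$ or $Q$) gives the gradients with respect to $K$ and $V$ as well.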
5 votes · 2 answers
Transformers: how does the decoder final layer output the desired token?
In the paper Attention Is All You Need, this section confuses me:
"In our model, we share the same weight matrix between the two embedding layers [in the encoding section] and the pre-softmax linear transformation [output of the decoding…"
user3667125 (1,700)
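The weight sharing the quote refers to can be sketched in a few lines of NumPy (sizes invented; the real model also scales the embedding by sqrt(d_model), which is omitted here):

```python
import numpy as np

# Weight tying: ONE matrix E serves both as the token-embedding table
# (row lookup) and, transposed, as the pre-softmax output projection.
rng = np.random.default_rng(0)
vocab, d_model = 10, 8
E = rng.normal(size=(vocab, d_model))

def embed(token_ids):
    return E[token_ids]          # input embedding: look up rows of E

def output_logits(h):
    return h @ E.T               # pre-softmax linear layer reuses E

h = embed([3, 7]).mean(axis=0)   # stand-in for a decoder final hidden state
logits = output_logits(h)
print(logits.shape)              # (10,) — one score per vocabulary token
```

The softmax over these logits then gives the next-token distribution, so the "desired token" is simply the vocabulary row of $E$ most aligned with the final hidden state.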
5 votes · 1 answer
Can AlphaFold predict proteins with metals well?
There are certain proteins that contain metal components, known as metalloproteins. Commonly, the metal is at the active site, which needs the most prediction precision. Typically, there are only one (or a few) metals in a protein, which contains far…
jw_ (199)
5 votes · 2 answers
How to detect a full-fledged self-aware AI?
The premise: a full-fledged self-aware artificial intelligence may have come to exist in a distributed environment like the internet. The possible AI in question may be quite unwilling to reveal itself.
The question: Given a first initial…
user4327 (61)
5 votes · 1 answer
Why does off-policy learning outperform on-policy learning?
I am self-studying reinforcement learning using different online resources, and I now have a basic understanding of how RL works.
I saw this in a book:
"Q-learning is an off-policy learner. An off-policy learner learns the value of an optimal…"
Exploring (371)
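The off-policy property the quoted book describes is visible in the Q-learning update itself: actions come from an epsilon-greedy *behavior* policy, but the bootstrap target takes the greedy max over next actions (the *target* policy). A toy run on an invented 4-state chain MDP (action 1 moves right, action 0 stays, reaching the last state pays reward 1):

```python
import random

random.seed(0)
n_states, actions = 4, [0, 1]
alpha, gamma, eps = 0.5, 0.9, 0.3
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}

def step(s, a):
    s2 = min(s + a, n_states - 1)
    return s2, (1.0 if s2 == n_states - 1 else 0.0)

for _ in range(500):
    s = 0
    while s != n_states - 1:
        if random.random() < eps:                       # behavior: explore
            a = random.choice(actions)
        else:                                           # behavior: exploit
            a = max(actions, key=lambda a_: Q[(s, a_)])
        s2, r = step(s, a)
        # off-policy target: greedy max over next actions, regardless of
        # what the behavior policy will actually do in s2
        target = r + gamma * max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2

print(round(Q[(0, 1)], 2))  # converges to gamma**2 = 0.81
```

Replacing the `max` in the target with the behavior policy's actual next action would turn this into on-policy SARSA, which is exactly the distinction the book is drawing.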
5 votes · 1 answer
Why would a VAE train much better with batch sizes closer to 1 than with a batch size of 100+?
I've been training a VAE to reconstruct human names. When I train it with a batch size of 100+, after about 5 hours of training it tends to output the same thing regardless of the input, and I'm using teacher forcing as well. When I use a lower…
user8714896 (825)
5 votes · 2 answers
Given two optimal policies, is an affine combination of them also optimal?
If there are two different optimal policies $\pi_1, \pi_2$ in a reinforcement learning task, will the affine combination of the two policies $\alpha \pi_1 + \beta \pi_2$, with $\alpha + \beta = 1$, also be an optimal policy?
Here I…
yang liu (53)
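For finite MDPs and mixture weights restricted to $\alpha \in [0, 1]$ (a convex rather than general affine combination, which also keeps the mixture a valid probability distribution), a standard sketch of the affirmative argument, using $q_*$ for the optimal action-value function:

$$
\pi_\alpha(a \mid s) = \alpha\, \pi_1(a \mid s) + (1 - \alpha)\, \pi_2(a \mid s).
$$

Since each $\pi_i$ is optimal, it assigns probability only to actions in $\operatorname{argmax}_a q_*(s, a)$; therefore $\pi_\alpha$ does too, and any policy supported on greedy actions with respect to $q_*$ achieves $v_{\pi_\alpha} = v_*$. With negative weights the combination need not even be a probability distribution, so the general affine case fails for a more basic reason.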
5 votes · 4 answers
Which methods or algorithms to develop a learning application?
I am creating a game application that will generate a new level based on the user's performance in the previous level.
The application concerns language improvement, to be precise. Suppose the user performed well in grammar-related…
Abdallah .E Abdallah (51)
5 votes · 0 answers
What is the justification for Kaiming He initialization?
I've been trying to understand where the formulas for Xavier and Kaiming He initialization come from. My understanding is that these initialization schemes come from a desire to keep the gradients stable during backpropagation (avoiding…
Jack M (302)
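The stability motivation the question alludes to can be checked numerically on the forward pass: Kaiming He initialization draws weights with std = sqrt(2 / fan_in), the factor 2 compensating for ReLU zeroing half of each pre-activation on average. A sketch with invented layer sizes and depth:

```python
import numpy as np

# With He scaling, activation magnitudes stay O(1) through a deep ReLU
# stack instead of exploding or vanishing. Sizes below are invented.
rng = np.random.default_rng(0)
fan_in, n_layers = 256, 20

x = rng.normal(size=(1000, fan_in))
for _ in range(n_layers):
    W = rng.normal(scale=np.sqrt(2.0 / fan_in), size=(fan_in, fan_in))
    x = np.maximum(x @ W, 0.0)          # ReLU

print(x.var())  # stays near 1 after 20 layers
```

Rerunning with `scale=np.sqrt(1.0 / fan_in)` (the Xavier-style scale, derived for symmetric activations like tanh) makes the variance shrink by roughly half per ReLU layer, which is the instability the He correction removes.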