Most Popular

1500 questions
8 votes, 2 answers

How does AlphaZero's MCTS work when starting from the root node?

From the AlphaGo Zero paper, during MCTS, the statistics for each new node are initialized as follows: $\{N(s_L, a) = 0,\ W(s_L, a) = 0,\ Q(s_L, a) = 0,\ P(s_L, a) = p_a\}$. The PUCT algorithm for selecting the best child node is $a_t = \operatorname{argmax}(Q(s,a) +…
sb3 • 167 • 1 • 7
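
For readers skimming this listing, here is a minimal Python sketch of the PUCT selection rule the excerpt quotes (the per-child statistics dictionary and the constant `c_puct` are illustrative assumptions, not taken from the question):

```python
import math

def puct_select(children, c_puct=1.0):
    """Pick the action maximizing Q(s,a) + U(s,a), where
    U(s,a) = c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a)).
    `children` maps each action to its {N, W, Q, P} statistics."""
    total_visits = sum(child["N"] for child in children.values())
    best_action, best_score = None, float("-inf")
    for action, child in children.items():
        u = c_puct * child["P"] * math.sqrt(total_visits) / (1 + child["N"])
        if child["Q"] + u > best_score:
            best_action, best_score = action, child["Q"] + u
    return best_action
```

Note that at a freshly expanded root, where every $N(s,a)$ is zero, the square-root term vanishes and the exploration bonus is zero for all actions, which is exactly the corner case the question title asks about.
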
8 votes, 1 answer

Validation accuracy higher than training accuracy

I implemented a U-Net in TensorFlow for the segmentation of MRI images of the thigh. I noticed that I always get a slightly higher validation accuracy than training accuracy, independently of the initial split. One example: So I researched when this could be…
Lis Louise • 139 • 4
8 votes, 1 answer

Why is there a Uniform and Normal version of He / Xavier initialization in DL libraries?

Two of the most popular initialization schemes for neural network weights today are Xavier and He. Both methods propose random weight initialization with a variance dependent on the number of input and output units. Xavier proposes $$W \sim…
Tinu • 628 • 1 • 4 • 14
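
A rough sketch of how the uniform and normal variants typically coexist: both target the same variance (here the Xavier/Glorot value $2/(\text{fan\_in}+\text{fan\_out})$), and the uniform bound is scaled so a uniform draw has that same variance. Function and argument names below are illustrative.

```python
import numpy as np

def xavier_init(fan_in, fan_out, distribution="normal"):
    """Xavier/Glorot initialization with either a normal or a uniform draw.
    Both variants share the target variance 2 / (fan_in + fan_out)."""
    var = 2.0 / (fan_in + fan_out)
    if distribution == "normal":
        return np.random.normal(0.0, np.sqrt(var), size=(fan_in, fan_out))
    # A uniform distribution on [-a, a] has variance a**2 / 3,
    # so choosing a = sqrt(3 * var) gives the same spread.
    a = np.sqrt(3.0 * var)
    return np.random.uniform(-a, a, size=(fan_in, fan_out))
```
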
8 votes, 2 answers

Why is KL divergence used so often in Machine Learning?

The KL divergence is quite easy to compute in closed form for simple distributions (such as Gaussians), but it has some not-very-nice properties. For example, it is not symmetric (thus it is not a metric) and it does not respect the triangular…
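
For context, the closed form the excerpt alludes to, for two univariate Gaussians, is
$$D_{\mathrm{KL}}\big(\mathcal{N}(\mu_1,\sigma_1^2)\,\|\,\mathcal{N}(\mu_2,\sigma_2^2)\big) = \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1-\mu_2)^2}{2\sigma_2^2} - \frac{1}{2},$$
and swapping the two arguments generally changes the value, which is the asymmetry the excerpt mentions.
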
8 votes, 1 answer

How is the DQN loss derived from (or theoretically motivated by) the Bellman equation, and how is it related to the Q-learning update?

I'm doing a project on Reinforcement Learning. I programmed an agent that uses DDQN. There are a lot of tutorials on that, so the code implementation was not that hard. However, I have problems understanding how one should come up with this kind of…
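
For orientation, the loss usually written for DQN (and, with decoupled action selection, for DDQN) treats the Bellman optimality target as a fixed regression target:
$$L(\theta) = \mathbb{E}\Big[\big(r + \gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta)\big)^2\Big],$$
where $\theta^-$ are the parameters of the target network, held constant while differentiating with respect to $\theta$.
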
8 votes, 1 answer

What is the cost function of a transformer?

The paper Attention Is All You Need describes the transformer architecture that has an encoder and a decoder. However, I wasn't clear on what the cost function to minimize is for such an architecture. Consider a translation task, for example, where…
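
As a rough sketch of the usual objective for such a translation setup: the decoder's per-position softmax is scored with token-level cross-entropy against the reference translation (shapes, names, and the padding id below are assumptions; the original paper additionally applies label smoothing):

```python
import torch.nn.functional as F

def translation_loss(decoder_logits, target_ids, pad_id=0):
    """Token-level cross-entropy for a seq2seq Transformer.
    decoder_logits: (batch, seq_len, vocab); target_ids: (batch, seq_len), long."""
    vocab = decoder_logits.size(-1)
    return F.cross_entropy(
        decoder_logits.reshape(-1, vocab),  # flatten batch and time steps
        target_ids.reshape(-1),
        ignore_index=pad_id,                # do not penalize padded positions
    )
```
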
8 votes, 2 answers

What is the difference between the positional encoding techniques of the Transformer and GPT?

I know the original Transformer and GPT (1-3) use two slightly different positional encoding techniques. More specifically, in GPT they say the positional encoding is learned. What does that mean? OpenAI's papers don't go into much detail. How…
Leevo • 305 • 2 • 9
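
For readers comparing the two approaches, a small sketch of how they are commonly implemented (dimensions are illustrative): the original Transformer adds a fixed sinusoidal table, while GPT-style models add rows of an ordinary embedding table that are updated by backpropagation like any other weight.

```python
import math
import torch
import torch.nn as nn

def sinusoidal_encoding(max_len, d_model):
    """Fixed (non-learned) sinusoidal positional encoding."""
    pos = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
    div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float)
                    * (-math.log(10000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe  # computed once, never trained

# Learned positional encoding (GPT-style): a trainable lookup table.
learned_pe = nn.Embedding(num_embeddings=1024, embedding_dim=512)
```
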
8 votes, 2 answers

Could there be existential threats to humanity due to AI?

We are doing research, spending hours figuring out how we can make real AI software (intelligent agents) work better. We are also trying to implement some applications, e.g. in business, health, and education, using AI technology. Nonetheless,…
quintumnia • 1,173 • 2 • 10 • 35
8 votes, 2 answers

How should we interpret this figure that relates the perceptron criterion and the hinge loss?

I am currently studying the textbook Neural Networks and Deep Learning by Charu C. Aggarwal. Section 1.2.1.2, Relationship with Support Vector Machines, says the following: The perceptron criterion is a shifted version of the hinge-loss used in…
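
For reference, with a label $y \in \{-1,+1\}$ and a linear score $\hat{y} = \bar{W}\cdot\bar{X}$, the two losses being related are commonly written as
$$L_{\text{perceptron}} = \max\{0,\, -y\,\hat{y}\}, \qquad L_{\text{hinge}} = \max\{0,\, 1 - y\,\hat{y}\},$$
so the hinge loss is the perceptron criterion shifted by a margin of one (this notation is a common convention, not necessarily the book's exact form).
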
8 votes, 2 answers

Why is the perceptron criterion function differentiable?

I'm reading Chapter 1 of the book Neural Networks and Deep Learning by Aggarwal. In Section 1.2.1.1, I'm learning about the perceptron. One thing the book says is that, if we use the sign function for the following loss function:…
8 votes, 1 answer

Is there a connection between the bias term in a linear regression model and the bias that can lead to under-fitting?

Here is a linear regression model $$y = mx + b,$$ where $b$ is known as the $y$-intercept, but also as the bias [1], $m$ is the slope, and $x$ is the feature vector. As I understand it, in machine learning there is also the bias that can cause the…
8 votes, 1 answer

Why is the learning rate generally beneath 1?

In all examples I've ever seen, the learning rate of an optimisation method is always less than $1$. However, I've never found an explanation as to why this is. In addition to that, there are some cases where having a learning rate bigger than 1 is…
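
A toy illustration of the overshooting intuition behind this question (purely a sketch; the actual threshold depends on the objective's curvature): gradient descent on $f(x)=x^2$ with too large a step scales the iterate by a factor of magnitude greater than one and diverges.

```python
def gradient_descent(lr, steps=20, x=1.0):
    """Run gradient descent on f(x) = x**2, whose gradient is 2x."""
    for _ in range(steps):
        x = x - lr * 2 * x   # each step scales x by (1 - 2 * lr)
    return x

print(gradient_descent(lr=0.4))  # |1 - 2*lr| < 1: shrinks toward the minimum at 0
print(gradient_descent(lr=1.1))  # |1 - 2*lr| > 1: the iterates blow up
```
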
8 votes, 1 answer

Which loss function should I use in REINFORCE, and what are the labels?

I understand that this is the update for the parameters of a policy in REINFORCE: $$ \Delta \theta_{t}=\alpha \nabla_{\theta} \log \pi_{\theta}\left(a_{t} \mid s_{t}\right) v_{t}, $$ where $v_t$ is usually the discounted future reward and …
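
In practice, the update in the excerpt is usually obtained by minimizing a surrogate loss whose gradient matches it; a minimal PyTorch-style sketch (tensor names are assumptions):

```python
import torch

def reinforce_loss(log_probs: torch.Tensor, returns: torch.Tensor) -> torch.Tensor:
    """Surrogate loss for REINFORCE: minimizing -log pi(a_t|s_t) * v_t
    performs gradient ascent on the policy objective. `log_probs` holds
    log pi(a_t|s_t) for each step of an episode; `returns` holds v_t."""
    return -(log_probs * returns.detach()).sum()
```

One common way to read this is that there is no label in the supervised sense: $v_t$ acts as a per-step weight on the log-likelihood of the actions that were actually taken.
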
8 votes, 1 answer

Why do the standard and deterministic Policy Gradient Theorems differ in their treatment of the derivatives of $R$ and the conditional probability?

I would like to understand the difference between the standard policy gradient theorem and the deterministic policy gradient theorem. These two theorems are quite different, although the only difference is whether the policy function is deterministic…
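
For reference, the usual statements being contrasted are
$$\nabla_\theta J(\theta) = \mathbb{E}_{s,\,a \sim \pi_\theta}\big[\nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi}(s,a)\big]$$
for a stochastic policy, and
$$\nabla_\theta J(\theta) = \mathbb{E}_{s}\big[\nabla_\theta \mu_\theta(s)\, \nabla_a Q^{\mu}(s,a)\big|_{a=\mu_\theta(s)}\big]$$
for a deterministic policy $\mu_\theta$; the second moves the derivative onto the action through the Q-function because there is no action distribution to differentiate.
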
8 votes, 2 answers

What are some best practices when trying to design a reward function?

Generally speaking, is there a best-practice procedure to follow when trying to define a reward function for a reinforcement-learning agent? What common pitfalls are there when defining the reward function, and how should you avoid them? What…