Highest Voted Questions - Artificial Intelligence Stack Exchange

12

votes

3 answers

Why is dot product attention faster than additive attention?

In section 3.2.1 of Attention Is All You Need the claim is made that: Dot-product attention is identical to our algorithm, except for the scaling factor of $\frac{1}{\sqrt{d_k}}$. Additive attention computes the compatibility function using a…

neural-networks machine-learning deep-learning attention

asked Apr 17 '19 at 04:38

user3180

648
5
15

12

votes

3 answers

Is REINFORCE the same as 'vanilla policy gradient'?

I don't know what people mean by 'vanilla policy gradient', but what comes to mind is REINFORCE, which is the simplest policy gradient algorithm I can think of. Is this an accurate statement? By REINFORCE I mean this surrogate objective $$…

reinforcement-learning comparison terminology policy-gradients reinforce

asked Mar 27 '19 at 13:01

yewang

361
2
7

12

votes

5 answers

What is "backprop"?

What does "backprop" mean? Is the "backprop" term basically the same as "backpropagation" or does it have a different meaning?

neural-networks backpropagation terminology definitions

asked Aug 02 '16 at 15:39

kenorb

10,525
6
45
95

11

votes

3 answers

Is it difficult to learn the rotated bounding box for a (rotated) object?

I have checked out many methods and papers, like YOLO, SSD, etc., with good results in detecting a rectangular box around an object. However, I could not find any paper that shows a method that learns a rotated bounding box. Is it difficult to learn…

convolutional-neural-networks computer-vision object-detection yolo

asked Jan 11 '19 at 15:00

Ankish Bansal

253
1
2
8

11

votes

3 answers

What is a deep neural network?

What is the definition of a deep neural network? Why are they so popular or important?

machine-learning deep-learning terminology deep-neural-networks definitions

asked Aug 02 '16 at 17:12

skistaddy

429
4
17

11

votes

1 answer

Why is the n-step tree backup algorithm an off-policy algorithm?

In the reinforcement learning book by Sutton & Barto (2018 edition), specifically in section 7.5, they present an n-step off-policy algorithm that doesn't require importance sampling, called the n-step tree backup algorithm. In other…

reinforcement-learning off-policy-methods

asked Dec 13 '18 at 21:07

Brale

2,416
1
7
15

11

votes

2 answers

How do we prove the n-step return error reduction property?

In section 7.1 (about n-step bootstrapping) of the book Reinforcement Learning: An Introduction (2nd edition), by Andrew Barto and Richard S. Sutton, the authors write about what they call the "n-step return error reduction property": But they…

reinforcement-learning q-learning math proofs sutton-barto

asked Dec 08 '18 at 05:24

123learn

111
3

11

votes

1 answer

What are ontologies in AI?

What exactly are ontologies in AI? How should I write them and why are they important?

ai-design terminology definitions ontology

asked Oct 15 '18 at 12:38

oren revenge

263
2
9

11

votes

1 answer

Can layers of deep neural networks be seen as Hopfield networks?

Hopfield networks are able to store a vector and retrieve it starting from a noisy version of it. They do so by setting weights in order to minimize the energy function when all neurons are set equal to the vector values, and retrieve the vector using…

neural-networks deep-learning deep-neural-networks topology hopfield-network

asked Sep 03 '18 at 09:56

Mario Alemi

211
1
3

11

votes

3 answers

How can AI researchers avoid "overfitting" to commonly-used benchmarks as a community?

In fields such as Machine Learning, we typically (somewhat informally) say that we are overfitting if we improve our performance on a training set at the cost of reduced performance on a test set / the true population from which data is sampled. More…

machine-learning research academia benchmarks

asked Aug 12 '18 at 12:08

Dennis Soemers

10,519
2
29
70

11

votes

7 answers

Why does training an SVM take so long? How can I speed it up?

I'm trying to create and test non-linear SVMs with various kernels (RBF, Sigmoid, Polynomial) in scikit-learn, to create a model which can classify anomalies and benign behaviors. My dataset includes 692703 records and I use a 75/25%…

machine-learning training support-vector-machine

asked Jul 19 '18 at 11:01

Panagiotis

211
1
2
3

11

votes

2 answers

What tools are used to deal with the adversarial examples problem?

The problem of adversarial examples is known to be critical for neural networks. For example, an image classifier can be manipulated by additively superimposing a different low amplitude image to each of many training examples that looks like noise…

resource-request adversarial-ml ai-safety ai-security

asked Jun 26 '18 at 10:39

Ilya Palachev

299
2
11

11

votes

2 answers

Why do we prefer ReLU over linear activation functions?

The ReLU activation function is defined as follows $$y = \operatorname{max}(0,x)$$ And the linear activation function is defined as follows $$y = x$$ The ReLU nonlinearity just clips the values less than 0 to 0 and passes everything else. Then why…

neural-networks deep-learning comparison activation-functions relu

asked May 19 '18 at 16:41

imflash217

499
5
15

11

votes

5 answers

What kind of simulated environment is complex enough to develop a general AI?

Imagine trying to create a simulated virtual environment that is complicated enough to create a "general AI" (which I define as a self aware AI) but is as simple as possible. What would this minimal environment be like? i.e. An environment that was…

agi artificial-consciousness self-awareness

asked May 09 '18 at 15:35

zooby

2,260
1
14
22

11

votes

1 answer

What kind of problems require more than 2 hidden layers?

I've read that most problems can be solved with 1-2 hidden layers. How do you know you need more than 2? For what kind of problems would you need them (give me an example)?

deep-neural-networks hidden-layers

asked Aug 02 '16 at 16:29

kenorb

10,525
6
45
95
