Most Popular
1500 questions
12 votes
3 answers
Why is dot product attention faster than additive attention?
In section 3.2.1 of Attention Is All You Need the claim is made that:
Dot-product attention is identical to our algorithm, except for the scaling factor of $\frac{1}{\sqrt{d_k}}$. Additive attention computes the compatibility function using a…
user3180
- 648
- 5
- 15
12 votes
3 answers
Is REINFORCE the same as 'vanilla policy gradient'?
I don't know what people mean by 'vanilla policy gradient', but what comes to mind is REINFORCE, which is the simplest policy gradient algorithm I can think of. Is this an accurate statement?
By REINFORCE I mean this surrogate objective
$$…
yewang
- 361
- 2
- 7
12 votes
5 answers
What is "backprop"?
What does "backprop" mean? Is the "backprop" term basically the same as "backpropagation" or does it have a different meaning?
kenorb
- 10,525
- 6
- 45
- 95
11 votes
3 answers
Is it difficult to learn the rotated bounding box for a (rotated) object?
I have checked out many methods and papers, like YOLO, SSD, etc., with good results in detecting a rectangular box around an object. However, I could not find any paper that shows a method that learns a rotated bounding box.
Is it difficult to learn…
Ankish Bansal
- 253
- 1
- 2
- 8
11 votes
3 answers
What is a deep neural network?
What is the definition of a deep neural network? Why are they so popular or important?
skistaddy
- 429
- 4
- 17
11 votes
1 answer
Why is the n-step tree backup algorithm an off-policy algorithm?
In the reinforcement learning book by Sutton & Barto (2018 edition), specifically in section 7.5, they present an n-step off-policy algorithm that doesn't require importance sampling, called the n-step tree backup algorithm.
In other…
Brale
- 2,416
- 1
- 7
- 15
11 votes
2 answers
How do we prove the n-step return error reduction property?
In section 7.1 (about the n-step bootstrapping) of the book Reinforcement Learning: An Introduction (2nd edition), by Andrew Barto and Richard S. Sutton, the authors write about what they call the "n-step return error reduction property":
But they…
123learn
- 111
- 3
11 votes
1 answer
What are ontologies in AI?
What exactly are ontologies in AI? How should I write them and why are they important?
oren revenge
- 263
- 2
- 9
11 votes
1 answer
Can layers of deep neural networks be seen as Hopfield networks?
Hopfield networks are able to store a vector and retrieve it starting from a noisy version of it. They do so by setting weights in order to minimize the energy function when all neurons are set equal to the vector values, and retrieve the vector using…
Mario Alemi
- 211
- 1
- 3
11 votes
3 answers
How can AI researchers avoid "overfitting" to commonly-used benchmarks as a community?
In fields such as Machine Learning, we typically (somewhat informally) say that we are overfitting if we improve our performance on a training set at the cost of reduced performance on a test set / the true population from which data is sampled.
More…
Dennis Soemers
- 10,519
- 2
- 29
- 70
11 votes
7 answers
Why does training an SVM take so long? How can I speed it up?
I'm trying to create and test non-linear SVMs with various kernels (RBF, Sigmoid, Polynomial) in scikit-learn, to create a model which can classify anomalies and benign behaviors.
My dataset includes 692703 records and I use a 75/25%…
Panagiotis
- 211
- 1
- 2
- 3
11 votes
2 answers
What tools are used to deal with the adversarial examples problem?
The problem of adversarial examples is known to be critical for neural networks. For example, an image classifier can be manipulated by additively superimposing a different low amplitude image to each of many training examples that looks like noise…
Ilya Palachev
- 299
- 2
- 11
11 votes
2 answers
Why do we prefer ReLU over linear activation functions?
The ReLU activation function is defined as follows
$$y = \operatorname{max}(0,x)$$
And the linear activation function is defined as follows
$$y = x$$
The ReLU nonlinearity just clips the values less than 0 to 0 and passes everything else. Then why…
imflash217
- 499
- 5
- 15
11 votes
5 answers
What kind of simulated environment is complex enough to develop a general AI?
Imagine trying to create a simulated virtual environment that is complicated enough to create a "general AI" (which I define as a self-aware AI) but is as simple as possible. What would this minimal environment be like?
i.e. An environment that was…
zooby
- 2,260
- 1
- 14
- 22
11 votes
1 answer
What kind of problems require more than 2 hidden layers?
I've read that most problems can be solved with 1-2 hidden layers.
How do you know you need more than 2? For what kind of problems would you need them (give me an example)?
kenorb
- 10,525
- 6
- 45
- 95