Most Popular
1500 questions
12
votes
1 answer
Do off-policy policy gradient methods exist?
I know that policy gradient methods themselves use the policy function for sampling rollouts. But can't we easily have a model for sampling from the environment? If so, I've never seen this done before.
echo
- 713
- 1
- 6
- 12
12
votes
1 answer
Strategic planning and multi dimensional knapsack problem
I'm trying to find a planning approach to solve a problem that attempts to model learning of new material. We assume that we only have one resource such as Wikipedia, which contains a list of articles represented as a vector of knowledge it contains…
Artem
- 221
- 1
- 3
12
votes
4 answers
Is overfitting always a bad thing?
DNNs can be used to recognize pictures. Great. For that use, it's better if they are somewhat flexible, so that they recognize cats even in pictures other than those on which they were trained (i.e. avoid overfitting). Agreed.
But when one uses NN as…
ZakC
- 347
- 2
- 7
12
votes
1 answer
Why use ReLU over Leaky ReLU?
From my understanding a leaky ReLU attempts to address issues of vanishing gradients and nonzero-centeredness by keeping neurons that fire with a negative value alive.
With just this info to go off of, it would seem that the leaky ReLU is just an…
John Brown
- 123
- 1
- 1
- 5
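As a rough illustration of the difference the question above refers to, here is a minimal sketch of both activations and their derivatives in plain Python. The function names and the default slope of 0.01 are assumptions chosen for the example, not taken from the question.

```python
def relu(x: float) -> float:
    """Standard ReLU: negative inputs are clamped to zero."""
    return x if x > 0 else 0.0


def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    """Leaky ReLU: negative inputs are scaled by a small slope (assumed 0.01)."""
    return x if x > 0 else negative_slope * x


def relu_grad(x: float) -> float:
    """Derivative of ReLU: zero for negative inputs, so no gradient flows back."""
    return 1.0 if x > 0 else 0.0


def leaky_relu_grad(x: float, negative_slope: float = 0.01) -> float:
    """Derivative of leaky ReLU: a small but nonzero gradient for negative inputs."""
    return 1.0 if x > 0 else negative_slope


if __name__ == "__main__":
    for value in (-2.0, 0.5):
        print(value, relu(value), leaky_relu(value),
              relu_grad(value), leaky_relu_grad(value))
```

For a negative input, the ReLU gradient is exactly zero, while the leaky variant still passes a small gradient through, which is the "keeping neurons alive" effect mentioned in the excerpt.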
12
votes
1 answer
What are the best known gradient-free training methods for deep learning?
As far as I know, the current state-of-the-art methods for training deep learning networks are variants of gradient descent or stochastic gradient descent.
What are the best known gradient-free training methods for deep learning (mostly in visual tasks…
rkellerm
- 334
- 1
- 9
12
votes
2 answers
Is a plain autoencoder a generative model?
I am wondering how a plain autoencoder can be a generative model. A variant of it might be, but how can a plain autoencoder be generative? I know that VAEs, which are a variant of the autoencoder, are generative, as they generate a distribution for…
Nervous Hero
- 195
- 1
- 6
12
votes
2 answers
What are bottleneck features?
In the blog post Building powerful image classification models using very little data, bottleneck features are mentioned. What are the bottleneck features? Do they change with the architecture that is used? Are they the final output of convolutional…
Abhishek Bhatia
- 447
- 2
- 5
- 16
12
votes
1 answer
Is there a proper initialization technique for the weight matrices in multi-head attention?
Self-attention layers have 4 learnable tensors (in the vanilla formulation):
Query matrix $W_Q$
Key matrix $W_K$
Value matrix $W_V$
Output matrix $W_O$
Nice illustration from https://jalammar.github.io/illustrated-transformer/
However, I do not…
spiridon_the_sun_rotator
- 2,852
- 12
- 17
12
votes
1 answer
In Computer Vision, what is the difference between a transformer and attention?
Having studied computer vision for a while, I still cannot understand what the difference between a transformer and attention is.
novice
- 123
- 1
- 4
12
votes
5 answers
Why are deep neural networks and deep learning insufficient to achieve general intelligence?
Everything related to Deep Learning (DL) and deep(er) networks seems "successful", or at least progressing very fast, and cultivates the belief that AGI is within reach. This is popular imagination. DL is a tremendous tool to tackle so many problems,…
Eric Platon
- 1,510
- 10
- 22
12
votes
1 answer
How can Viv generate new code based on some user's query?
I have been looking into Viv, an artificial intelligence agent in development. Here is a demonstration of Viv (by Dag Kittlaus).
Based on what I understand, this AI can generate new code and execute it based on a query from the user.
What I am…
N. Chalifour
- 161
- 2
12
votes
3 answers
Why teach only search algorithms in a short introductory AI course?
I understood that the concept of search is important in AI. There's a question on this website regarding this topic, but one could also intuitively understand why. I've had an introductory course on AI, which lasted half of a semester, so of course…
nbro
- 42,615
- 12
- 119
- 217
12
votes
3 answers
What is the purpose of the decoder mask (triangular mask) in the Transformer?
I'm trying to implement a Transformer model using this tutorial. In the decoder block of the Transformer model, a mask is passed to "pad and mask future tokens in the input received by the decoder". This mask is added to attention weights.
import…
Uchiha Madara
- 173
- 1
- 1
- 8
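The triangular mask described in the question above can be sketched in plain Python. This is not the tutorial's code (which is truncated in the excerpt). It assumes an additive mask that holds negative infinity at future positions and is added to the raw attention scores before the softmax. The helper names `causal_mask` and `masked_softmax` are hypothetical.

```python
import math


def causal_mask(size: int) -> list[list[float]]:
    """Lower-triangular additive mask: 0.0 where attention is allowed
    (position j <= i), -inf where j is a future position."""
    return [[0.0 if j <= i else float("-inf") for j in range(size)]
            for i in range(size)]


def masked_softmax(scores: list[list[float]],
                   mask: list[list[float]]) -> list[list[float]]:
    """Add the mask to the scores, then apply a row-wise softmax."""
    weights = []
    for score_row, mask_row in zip(scores, mask):
        masked = [s + m for s, m in zip(score_row, mask_row)]
        peak = max(masked)  # finite: the diagonal is never masked
        exps = [math.exp(v - peak) for v in masked]  # exp(-inf) == 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


if __name__ == "__main__":
    uniform_scores = [[0.0] * 3 for _ in range(3)]
    for row in masked_softmax(uniform_scores, causal_mask(3)):
        print(row)
```

With uniform scores, each position spreads its attention evenly over itself and earlier positions only, so no weight reaches tokens that come later in the sequence.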
12
votes
3 answers
Are neural networks the only way to reach "true" artificial intelligence?
Currently, most research done in artificial intelligence focuses on neural networks, which have been successfully used to solve many problems. A good example would be DeepMind's AlphaGo, which uses a convolutional neural network. There are many…
Eka
- 1,106
- 8
- 24
12
votes
1 answer
What exactly is the advantage of double DQN over DQN?
I started looking into the double DQN (DDQN). Apparently, the difference between DDQN and DQN is that in DDQN we use the main value network for action selection and the target network for outputting the Q values.
However, I don't understand why…
Chukwudi
- 369
- 2
- 8
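To make the distinction in the question above concrete, here is a minimal sketch of the two bootstrap targets in plain Python. It assumes the Q-values for the next state are already available as lists indexed by action. The function names, the discount factor, and the sample numbers are illustrative assumptions.

```python
def dqn_target(reward: float, next_q_target: list[float],
               gamma: float = 0.99, done: bool = False) -> float:
    """DQN target: the target network both selects and evaluates the action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward: float, next_q_online: list[float],
                      next_q_target: list[float],
                      gamma: float = 0.99, done: bool = False) -> float:
    """Double DQN target: the main (online) network selects the action,
    and the target network supplies the Q-value for that action."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]


if __name__ == "__main__":
    online = [1.0, 2.0]   # main network prefers action 1
    target = [5.0, 3.0]   # target network overestimates action 0
    print(dqn_target(1.0, target, gamma=0.5))
    print(double_dqn_target(1.0, online, target, gamma=0.5))
```

In this toy case the DQN target bootstraps from the largest target-network value, while the double DQN target uses the target network's value for the action chosen by the main network. Separating selection from evaluation is what is meant to reduce overestimation.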