Most Popular
1500 questions
23 votes, 4 answers
Where can I find the original paper that introduced RNNs?
I was able to find the original paper on LSTM, but I was not able to find the paper that introduced "vanilla" RNNs. Where can I find it?
Ahsan Tarique
- 331
- 1
- 2
- 5
23 votes, 1 answer
What are the advantages of ReLU vs Leaky ReLU and Parametric ReLU (if any)?
I think that the advantage of using Leaky ReLU instead of ReLU is that this way we cannot have a vanishing gradient. Parametric ReLU has the same advantage, with the only difference being that the slope of the output for negative inputs is a learnable…
gvgramazio
- 706
- 2
- 8
- 20
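For reference, the excerpt above contrasts the three activations. Below is a minimal NumPy sketch of the usual formulations, assuming the common convention that the negative-side slope is a fixed small constant (0.01 here) for Leaky ReLU and a learnable parameter for Parametric ReLU.

```python
import numpy as np

def relu(x):
    # Standard ReLU: zero for negative inputs, identity for positive inputs.
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Leaky ReLU: a small fixed slope `alpha` on the negative side,
    # so the gradient there is alpha rather than exactly zero.
    return np.where(x > 0, x, alpha * x)

def prelu(x, a):
    # Parametric ReLU: same shape as Leaky ReLU, except the negative-side
    # slope `a` is a parameter learned during training.
    return np.where(x > 0, x, a * x)
```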
23 votes, 6 answers
What activation function does the human brain use?
Does the human brain use a specific activation function?
I've tried doing some research, and as it's a threshold for whether the signal is sent through a neuron or not, it sounds a lot like ReLU. However, I can't find a single article confirming…
mlman
- 341
- 2
- 5
23 votes, 6 answers
How much energy is consumed in generating ChatGPT responses?
I note this question was deemed off-topic, so I'm trying to frame it clearly in terms of the scope of response I'm interested in, namely the ethics and sustainability issues associated with the imminent proliferation of OpenAI ChatGPT types of…
wide_eyed_pupil
- 333
- 1
- 2
- 7
23 votes, 2 answers
Why does GPT-2 Exclude the Transformer Encoder?
After looking into transformers, BERT, and GPT-2, my understanding is that GPT-2 essentially uses only the decoder part of the original transformer architecture, with masked self-attention that can only look at prior tokens.
Why does GPT-2 not…
Athena Wisdom
- 381
- 1
- 2
- 5
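The masked self-attention mentioned in the GPT-2 excerpt above can be illustrated with a small NumPy sketch (a generic causal-masking example, not GPT-2's actual code): attention scores for future positions are set to negative infinity before the softmax, so each token attends only to itself and earlier tokens.

```python
import numpy as np

def causal_self_attention(q, k, v):
    """Scaled dot-product attention with a causal mask.

    q, k, v: arrays of shape (seq_len, d), one query/key/value per token.
    """
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)                      # (seq_len, seq_len)
    mask = np.triu(np.ones((seq_len, seq_len)), k=1)   # 1s strictly above the diagonal
    scores = np.where(mask == 1, -np.inf, scores)      # block attention to future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over allowed positions
    return weights @ v
```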
23 votes, 2 answers
What are the flaws in Jeff Hawkins's AI framework?
In 2004, Jeff Hawkins, inventor of the Palm Pilot, published a very interesting book called On Intelligence, in which he details a theory of how the human neocortex works.
This theory is called the Memory-Prediction Framework, and it has some striking…
BlindKungFuMaster
- 4,265
- 13
- 23
23 votes, 5 answers
What is the difference between machine learning and deep learning?
Can someone explain to me the difference between machine learning and deep learning? Is it possible to learn deep learning without knowing machine learning?
Addis
- 333
- 5
- 9
23 votes, 2 answers
Can Q-learning be used for continuous (state or action) spaces?
Many examples work with a table-based method for Q-learning. This may be suitable for a discrete state (observation) or action space, like a robot in a grid world, but is there a way to use Q-learning for continuous spaces like the control of a…
Bryan McGill
- 491
- 1
- 3
- 12
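For context on the excerpt above, the table-based method reduces to the update below; one simple way to apply it to a continuous state space is to discretize the observation into bins before indexing the table (function approximation, as in DQN, is the other common route). The bin edges, bin count, and action count here are made up for illustration.

```python
import numpy as np

# Hypothetical discretization of a 1-D continuous state into 10 bins.
bins = np.linspace(-1.0, 1.0, 11)
n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))

def discretize(s):
    # Map a continuous observation to a table index.
    return int(np.clip(np.digitize(s, bins) - 1, 0, n_states - 1))

def q_update(s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Standard tabular Q-learning update applied to the discretized states.
    i, j = discretize(s), discretize(s_next)
    Q[i, a] += alpha * (r + gamma * Q[j].max() - Q[i, a])
```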
22 votes, 2 answers
Are Modular Neural Networks more effective than large, monolithic networks at any tasks?
Modular/multiple neural networks (MNNs) revolve around training smaller, independent networks that can feed into each other or into a higher-level network.
In principle, this hierarchical organization could allow us to make sense of more complex problem…
Harsh Sikka
- 321
- 1
- 2
22 votes, 1 answer
Why do you not see dropout layers in reinforcement learning examples?
I've been looking at reinforcement learning, and specifically playing around with creating my own environments to use with OpenAI Gym. I am using agents from the stable_baselines project to test with it.
One thing I've noticed in virtually…
Matt Hamilton
- 353
- 2
- 5
22 votes, 2 answers
How to define states in reinforcement learning?
I am studying reinforcement learning and its variants. I am starting to get an understanding of how the algorithms work and how they apply to an MDP.
What I don't understand is the process of defining the states of the MDP. In most examples…
Andy
- 323
- 1
- 2
- 6
22 votes, 4 answers
Why do we need floats to use neural networks?
Is it possible to make a neural network that uses only integers by scaling the input and output of each function to [-INT_MAX, INT_MAX]? Are there any drawbacks?
elimohl
- 331
- 1
- 2
- 5
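As a rough sketch of the idea in the question above, the snippet below maps floats to integers with a scale factor, performs the matrix multiply in integer arithmetic, and rescales the result. The 8-bit range and the scale handling are simplifying assumptions for illustration, not how production integer inference is actually implemented.

```python
import numpy as np

def quantize(x, scale, bits=8):
    # Map floats to signed integers in [-(2^(bits-1) - 1), 2^(bits-1) - 1].
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)

def int_linear(x, w, x_scale, w_scale):
    # Integer matrix multiply; the accumulated result is rescaled back to float.
    xq = quantize(x, x_scale)
    wq = quantize(w, w_scale)
    acc = xq @ wq  # integer accumulation (a wider integer type may be needed)
    return acc.astype(np.float64) * (x_scale * w_scale)
```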
22 votes, 3 answers
Are softmax outputs of classifiers true probabilities?
BACKGROUND: The softmax function is the most common choice of activation function for the last dense layer of a multiclass neural network classifier. The outputs of the softmax function have the mathematical properties of probabilities and are--in…
Snehal Patel
- 1,037
- 1
- 4
- 27
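For reference, the properties the excerpt above alludes to follow directly from the definition of the softmax: each output is strictly positive and the outputs sum to one, which is what makes them interpretable (calibrated or not) as class probabilities.

```latex
% Softmax over logits z_1, ..., z_K:
\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}},
\qquad \sigma(\mathbf{z})_i > 0,
\qquad \sum_{i=1}^{K} \sigma(\mathbf{z})_i = 1
```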
22 votes, 2 answers
How do neural networks play chess?
I have been spending a few days trying to wrap my head around how and why neural networks are used to play chess.
Although I know very little about how the game of chess works, I can understand the following idea. Theoretically, we could make a…
stats_noob
- 299
- 3
- 12
22 votes, 1 answer
Why has cross-entropy become the standard classification loss function rather than the Kullback-Leibler divergence?
The cross-entropy is identical to the KL divergence plus the entropy of the target distribution. The KL divergence equals zero when the two distributions are the same, which seems more intuitive to me than the entropy of the target distribution,…
Josh Albert
- 331
- 2
- 6
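The identity the excerpt above refers to can be written out explicitly: for a fixed target distribution p and model distribution q, the cross-entropy is the entropy of p plus the KL divergence, so minimizing either quantity over q leads to the same optimum.

```latex
H(p, q) = -\sum_i p_i \log q_i
        = \underbrace{-\sum_i p_i \log p_i}_{H(p)}
        + \underbrace{\sum_i p_i \log \frac{p_i}{q_i}}_{D_{\mathrm{KL}}(p \,\|\, q)}
```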