Most Popular
1500 questions
23 votes, 4 answers
Where can I find the original paper that introduced RNNs?
I was able to find the original paper on LSTM, but I was not able to find the paper that introduced "vanilla" RNNs. Where can I find it?
Ahsan Tarique
- 331
- 1
- 2
- 5
23 votes, 1 answer
What are the advantages of ReLU vs Leaky ReLU and Parametric ReLU (if any)?
I think that the advantage of using Leaky ReLU instead of ReLU is that this way we cannot have a vanishing gradient. Parametric ReLU has the same advantage, with the only difference being that the slope of the output for negative inputs is a learnable…
gvgramazio
- 706
- 2
- 8
- 20
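For reference, the excerpt above contrasts the three activations. Below is a minimal NumPy sketch of the usual formulations, assuming the common convention that the negative-side slope is a fixed small constant (0.01 here) for Leaky ReLU and a learnable parameter for Parametric ReLU.

```python
import numpy as np

def relu(x):
    # Standard ReLU: zero for negative inputs, identity for positive inputs.
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Leaky ReLU: a small fixed slope `alpha` on the negative side,
    # so the gradient there is alpha rather than exactly zero.
    return np.where(x > 0, x, alpha * x)

def prelu(x, a):
    # Parametric ReLU: same shape as Leaky ReLU, except the negative-side
    # slope `a` is a parameter learned during training.
    return np.where(x > 0, x, a * x)
```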
23 votes, 6 answers
What activation function does the human brain use?
Does the human brain use a specific activation function?
I've tried doing some research, and as it's a threshold for whether the signal is sent through a neuron or not, it sounds a lot like ReLU. However, I can't find a single article confirming…
mlman
- 341
- 2
- 5
23 votes, 6 answers
How much energy is consumed in generating ChatGPT responses?
I note this question was deemed off-topic, so I'm trying to frame it clearly in terms of the scope of response I'm interested in, namely the ethics and sustainability issues associated with the imminent proliferation of OpenAI ChatGPT types of…
wide_eyed_pupil
- 333
- 1
- 2
- 7
23 votes, 2 answers
Why does GPT-2 Exclude the Transformer Encoder?
After looking into transformers, BERT, and GPT-2, my understanding is that GPT-2 essentially uses only the decoder part of the original transformer architecture, with masked self-attention that can only look at prior tokens.
Why does GPT-2 not…
Athena Wisdom
- 381
- 1
- 2
- 5
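The masked self-attention mentioned in the GPT-2 excerpt above can be illustrated with a small NumPy sketch (a generic causal-masking example, not GPT-2's actual code): attention scores for future positions are set to negative infinity before the softmax, so each token attends only to itself and earlier tokens.

```python
import numpy as np

def causal_self_attention(q, k, v):
    """Scaled dot-product attention with a causal mask.

    q, k, v: arrays of shape (seq_len, d), one query/key/value per token.
    """
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)                      # (seq_len, seq_len)
    mask = np.triu(np.ones((seq_len, seq_len)), k=1)   # 1s strictly above the diagonal
    scores = np.where(mask == 1, -np.inf, scores)      # block attention to future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over allowed positions
    return weights @ v
```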
23 votes, 2 answers
What are the flaws in Jeff Hawkins's AI framework?
In 2004, Jeff Hawkins, inventor of the Palm Pilot, published a very interesting book called On Intelligence, in which he details a theory of how the human neocortex works.
This theory is called the Memory-Prediction Framework, and it has some striking…
BlindKungFuMaster
- 4,265
- 13
- 23
23 votes, 5 answers
What is the difference between machine learning and deep learning?
Can someone explain to me the difference between machine learning and deep learning? Is it possible to learn deep learning without knowing machine learning?
Addis
- 333
- 5
- 9
23 votes, 2 answers
Can Q-learning be used for continuous (state or action) spaces?
Many examples work with a table-based method for Q-learning. This may be suitable for a discrete state (observation) or action space, like a robot in a grid world, but is there a way to use Q-learning for continuous spaces like the control of a…
Bryan McGill
- 491
- 1
- 3
- 12
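For context on the excerpt above, the table-based method reduces to the update below; one simple way to apply it to a continuous state space is to discretize the observation into bins before indexing the table (function approximation, as in DQN, is the other common route). The bin edges, bin count, and action count here are made up for illustration.

```python
import numpy as np

# Hypothetical discretization of a 1-D continuous state into 10 bins.
bins = np.linspace(-1.0, 1.0, 11)
n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))

def discretize(s):
    # Map a continuous observation to a table index.
    return int(np.clip(np.digitize(s, bins) - 1, 0, n_states - 1))

def q_update(s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Standard tabular Q-learning update applied to the discretized states.
    i, j = discretize(s), discretize(s_next)
    Q[i, a] += alpha * (r + gamma * Q[j].max() - Q[i, a])
```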
22 votes, 2 answers
Are Modular Neural Networks more effective than large, monolithic networks at any tasks?
Modular/multiple neural networks (MNNs) revolve around training smaller, independent networks that can feed into each other or into a higher-level network.
In principle, this hierarchical organization could allow us to make sense of more complex problem…
Harsh Sikka
- 321
- 1
- 2
22 votes, 1 answer
Why do you not see dropout layers in reinforcement learning examples?
I've been looking at reinforcement learning, and specifically playing around with creating my own environments to use with OpenAI Gym. I am using agents from the stable_baselines project to test with it.
One thing I've noticed in virtually…
Matt Hamilton
- 353
- 2
- 5
22 votes, 2 answers
How to define states in reinforcement learning?
I am studying reinforcement learning and its variants. I am starting to get an understanding of how the algorithms work and how they apply to an MDP.
What I don't understand is the process of defining the states of the MDP. In most examples…
Andy
- 323
- 1
- 2
- 6
22 votes, 4 answers
Why do we need floats to use neural networks?
Is it possible to make a neural network that uses only integers by scaling the input and output of each function to [-INT_MAX, INT_MAX]? Are there any drawbacks?
elimohl
- 331
- 1
- 2
- 5
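As a rough sketch of the idea in the question above, the snippet below maps floats to integers with a scale factor, performs the matrix multiply in integer arithmetic, and rescales the result. The 8-bit range and the scale handling are simplifying assumptions for illustration, not how production integer inference is actually implemented.

```python
import numpy as np

def quantize(x, scale, bits=8):
    # Map floats to signed integers in [-(2^(bits-1) - 1), 2^(bits-1) - 1].
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)

def int_linear(x, w, x_scale, w_scale):
    # Integer matrix multiply; the accumulated result is rescaled back to float.
    xq = quantize(x, x_scale)
    wq = quantize(w, w_scale)
    acc = xq @ wq  # integer accumulation (a wider integer type may be needed)
    return acc.astype(np.float64) * (x_scale * w_scale)
```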
22 votes, 3 answers
Are softmax outputs of classifiers true probabilities?
BACKGROUND: The softmax function is the most common choice of activation function for the last dense layer of a multiclass neural network classifier. The outputs of the softmax function have the mathematical properties of probabilities and are--in…
Snehal Patel
- 1,037
- 1
- 4
- 27
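For reference, the properties the excerpt above alludes to follow directly from the definition of the softmax: each output is strictly positive and the outputs sum to one, which is what makes them interpretable (calibrated or not) as class probabilities.

```latex
% Softmax over logits z_1, ..., z_K:
\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}},
\qquad \sigma(\mathbf{z})_i > 0,
\qquad \sum_{i=1}^{K} \sigma(\mathbf{z})_i = 1
```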
22 votes, 2 answers
How do neural networks play chess?
I have been spending a few days trying to wrap my head around how and why neural networks are used to play chess.
Although I know very little about how the game of chess works, I can understand the following idea. Theoretically, we could make a…
stats_noob
- 299
- 3
- 12
22 votes, 1 answer
Why has cross-entropy become the standard classification loss function rather than the Kullback-Leibler divergence?
The cross-entropy is identical to the KL divergence plus the entropy of the target distribution. The KL divergence equals zero when the two distributions are the same, which seems more intuitive to me than the entropy of the target distribution,…
Josh Albert
- 331
- 2
- 6
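The identity the excerpt above refers to can be written out explicitly: for a fixed target distribution p and model distribution q, the cross-entropy is the entropy of p plus the KL divergence, so minimizing either quantity over q leads to the same optimum.

```latex
H(p, q) = -\sum_i p_i \log q_i
        = \underbrace{-\sum_i p_i \log p_i}_{H(p)}
        + \underbrace{\sum_i p_i \log \frac{p_i}{q_i}}_{D_{\mathrm{KL}}(p \,\|\, q)}
```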