Highest Voted Questions - Artificial Intelligence Stack Exchange

14

votes

1 answer

Why does a transformer not use an activation function following the multi-head attention layer?

I was hoping someone could explain to me why, in the transformer model from the "Attention is all you need" paper, no activation is applied after either the multi-head attention layer or the residual connections. It seems to me that there are…

transformer attention

asked Aug 24 '21 at 09:55

chasep255

193
1
7

14

votes

6 answers

Is there actually a lack of fundamental theory on deep learning?

I heard several times that one of the fundamental/open problems of deep learning is the lack of "general theory" on it, because, actually, we don't know why deep learning works so well. Even the Wikipedia page on deep learning has similar comments.…

deep-learning computational-learning-theory

asked Mar 15 '17 at 09:10

heleone

151
1
3

14

votes

3 answers

How to determine the embedding size?

When we are training a neural network, we are going to determine the embedding size to convert the categorical (in NLP, for instance) or continuous (in computer vision or voice) information to hidden vectors (or embeddings), but I wonder if there…

deep-learning hyperparameter-optimization hyper-parameters embeddings

asked Jul 07 '21 at 13:26

Lerner Zhang

1,065
1
9
22

14

votes

5 answers

Is a genetic algorithm an example of artificial intelligence?

Since human intelligence presumably is a function of a natural genetic algorithm in nature, is using a genetic algorithm in a computer an example of artificial intelligence? If not, how do they differ? Or perhaps some are and some are not expressing…

philosophy genetic-algorithms terminology

asked Aug 02 '16 at 16:02

WilliamKF

2,533
1
26
31

14

votes

1 answer

What are the consequences of layer norm vs batch norm?

I'll start with my understanding of the literal difference between these two. First, let's say we have an input tensor to a layer, and that tensor has dimensionality $B \times D$, where $B$ is the size of the batch and $D$ is the dimensionality of…
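As an illustrative sketch of the setup described in this excerpt (not taken from the question itself), the two normalizations on a $B \times D$ tensor differ only in the axis over which the mean and variance are computed. The function names `batch_norm` and `layer_norm` below are hypothetical, and the sketch assumes the simplest form of each: no learnable scale or shift and no running statistics.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Hypothetical helper: normalize each feature (column) using
    # statistics computed across the batch axis (axis 0).
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    # Hypothetical helper: normalize each example (row) using
    # statistics computed across the feature axis (axis 1).
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(4, 8))  # B = 4, D = 8

bn = batch_norm(x)  # each column has approximately zero mean
ln = layer_norm(x)  # each row has approximately zero mean
```

Under these assumptions, `batch_norm` makes each output depend on the other examples in the batch, while `layer_norm` treats every example independently.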

deep-learning comparison batch-normalization layer-normalization

asked Apr 13 '21 at 18:13

Alexander Soare

1,379
3
12
28

14

votes

1 answer

Which approaches could I use to create a simple chatbot using a neural network?

I wanted to start experimenting with neural networks, so I decided to make a chatbot (like Cleverbot, which is not that clever anyway) using them. I looked around for some documentation and I found many tutorials on general tasks, but few on this…

neural-networks recurrent-neural-networks reference-request long-short-term-memory chat-bots

asked Dec 14 '16 at 21:56

Totem

381
2
6

14

votes

3 answers

What are other examples of theoretical machine learning books?

I am looking for a book about machine learning that would suit my physics background. I am more or less familiar with classical and complex analysis, theory of probability, calculus of variations, matrix algebra, etc. However, I have not studied…

machine-learning reference-request computational-learning-theory books

asked Sep 10 '20 at 20:20

Ilya

143
1
6

14

votes

10 answers

Could an AI feel emotions?

Assuming humans had finally developed the first humanoid AI based on the human brain, would it feel emotions? If not, would it still have ethics and/or morals?

philosophy human-like ethics emotional-intelligence emotion-recognition

asked Nov 06 '16 at 01:51

MountainSide Studios

383
3
9

14

votes

2 answers

How does one prove comprehension in machines?

Say we have a machine and we give it a task to do (vision task, language task, game, etc.). How can one prove that the machine actually knows what's going on in that specific task? To narrow it down, some examples: Conversation - How would…

philosophy agi artificial-consciousness chinese-room-argument

asked Jun 03 '20 at 00:19

Landon G

500
2
10

14

votes

3 answers

How does noise affect generalization?

Does increasing the noise in data help to improve the learning ability of a network? Does it make any difference, or does it depend on the problem being solved? How does it affect the generalization process overall?

neural-networks machine-learning statistical-ai generalization

asked Aug 02 '16 at 15:40

kenorb

10,525
6
45
95

14

votes

2 answers

Should deep residual networks be viewed as an ensemble of networks?

The question is about the architecture of Deep Residual Networks (ResNets). The model that won 1st place at the "Large Scale Visual Recognition Challenge 2015" (ILSVRC2015) in all five main tracks: ImageNet Classification: “Ultra-deep” (quote…

neural-networks machine-learning deep-learning deep-neural-networks residual-networks

asked Sep 20 '16 at 10:54

Erba Aitbayev

357
1
10

14

votes

3 answers

Has anyone thought about making a neural network ask questions, instead of only answering them?

Most people are trying to answer questions with a neural network. However, has anyone come up with ideas about how to make a neural network ask questions instead of answering them? For example, if a CNN can decide which category an…

neural-networks deep-learning

asked Sep 12 '16 at 05:41

cha

141
5

14

votes

8 answers

Is consciousness necessary for any AI task?

Consciousness is challenging to define, but for this question let's define it as "actually experiencing sensory input as opposed to just putting a bunch of data through an inanimate machine." Humans, of course, have minds; for normal computers, all…

philosophy artificial-consciousness

asked Sep 08 '16 at 16:20

Ben N

2,589
2
21
35

14

votes

3 answers

Is there a way to understand neural networks without using the concept of brain?

Is there a way to understand, for instance, a multi-layered perceptron without hand-waving about it being similar to a brain, etc.? For example, it is obvious that what a perceptron does is approximate a function; there might be many other ways,…

neural-networks machine-learning function-approximation history

asked Oct 19 '19 at 18:23

Evgeniy

249
1
3

14

votes

3 answers

What is the relationship between the size of the hidden layer and the size of the cell state layer in an LSTM?

I was following some examples to get familiar with TensorFlow's LSTM API, but noticed that all LSTM initialization functions require only the num_units parameter, which denotes the number of hidden units in a cell. According to what I have learned…

neural-networks tensorflow recurrent-neural-networks long-short-term-memory

asked Sep 25 '19 at 13:59

kuixiong

241
2
4
