Most Popular

1500 questions
14
votes
1 answer

Why does a transformer not use an activation function following the multi-head attention layer?

I was hoping someone could explain to me why, in the transformer model from the "Attention is all you need" paper, no activation is applied after either the multi-head attention layer or the residual connections. It seems to me that there are…
chasep255
14
votes
6 answers

Is there actually a lack of fundamental theory on deep learning?

I have heard several times that one of the fundamental open problems of deep learning is the lack of a "general theory" of it, because we don't actually know why deep learning works so well. Even the Wikipedia page on deep learning has similar comments.…
heleone
14
votes
3 answers

How to determine the embedding size?

When we are training a neural network, we are going to determine the embedding size to convert the categorical (in NLP, for instance) or continuous (in computer vision or voice) information to hidden vectors (or embeddings), but I wonder if there…
14
votes
5 answers

Is a genetic algorithm an example of artificial intelligence?

Since human intelligence presumably is a function of a natural genetic algorithm in nature, is using a genetic algorithm in a computer an example of artificial intelligence? If not, how do they differ? Or perhaps some are and some are not expressing…
WilliamKF
14
votes
1 answer

What are the consequences of layer norm vs batch norm?

I'll start with my understanding of the literal difference between these two. First, let's say we have an input tensor to a layer, and that tensor has dimensionality $B \times D$, where $B$ is the size of the batch and $D$ is the dimensionality of…
14
votes
1 answer

Which approaches could I use to create a simple chatbot using a neural network?

I wanted to start experimenting with neural networks, so I decided to make a chatbot (like Cleverbot, which is not that clever anyway) using them. I looked around for some documentation and I found many tutorials on general tasks, but few on this…
14
votes
3 answers

What are other examples of theoretical machine learning books?

I am looking for a book about machine learning that would suit my physics background. I am more or less familiar with classical and complex analysis, probability theory, calculus of variations, matrix algebra, etc. However, I have not studied…
14
votes
10 answers

Could an AI feel emotions?

Assuming humans had finally developed the first humanoid AI based on the human brain, would it feel emotions? If not, would it still have ethics and/or morals?
14
votes
2 answers

How does one prove comprehension in machines?

Say we have a machine and we give it a task to do (vision task, language task, game, etc.). How can one prove that the machine actually knows what is happening in that specific task? To narrow it down, some examples: Conversation - How would…
14
votes
3 answers

How does noise affect generalization?

Does increasing the noise in data help to improve the learning ability of a network? Does it make any difference, or does it depend on the problem being solved? How does it affect the generalization process overall?
kenorb
14
votes
2 answers

Should deep residual networks be viewed as an ensemble of networks?

The question is about the architecture of Deep Residual Networks (ResNets). The model that won first place at the "Large Scale Visual Recognition Challenge 2015" (ILSVRC2015) in all five main tracks: ImageNet Classification: “Ultra-deep” (quote…
14
votes
3 answers

Has anyone thought about making a neural network ask questions, instead of only answering them?

Most people are trying to answer questions with a neural network. However, has anyone come up with ideas about how to make a neural network ask questions instead of answering them? For example, if a CNN can decide which category an…
cha
14
votes
8 answers

Is consciousness necessary for any AI task?

Consciousness is challenging to define, but for this question let's define it as "actually experiencing sensory input as opposed to just putting a bunch of data through an inanimate machine." Humans, of course, have minds; for normal computers, all…
Ben N
14
votes
3 answers

Is there a way to understand neural networks without using the concept of brain?

Is there a way to understand, for instance, a multi-layered perceptron without hand-waving about it being similar to a brain, etc.? For example, it is obvious that what a perceptron does is approximate a function; there might be many other ways,…
14
votes
3 answers

What is the relationship between the size of the hidden layer and the size of the cell state layer in an LSTM?

I was following some examples to get familiar with TensorFlow's LSTM API, but noticed that all LSTM initialization functions require only the num_units parameter, which denotes the number of hidden units in a cell. According to what I have learned…