Highest Voted Questions - Artificial Intelligence Stack Exchange

22

votes

1 answer

What is the intuition behind the dot product attention?

I am watching the video Attention Is All You Need by Yannic Kilcher. My question is: what is the intuition behind the dot product attention? $$A(q, K, V) = \sum_i \frac{e^{q \cdot k_i}}{\sum_j e^{q \cdot k_j}} v_i$$ becomes: $$A(Q, K, V) = \text{softmax}(QK^T)V$$

natural-language-processing papers transformer attention bert

asked Apr 11 '20 at 12:53

DRV

1,843
3
15
20
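For the dot product attention question above, a minimal sketch of the two quoted formulas may help. It uses plain Python lists, and the helper names (`softmax`, `dot`, `attention`, `attention_batch`) are illustrative assumptions, not taken from the paper or the video. Like the quoted formula, it leaves out the $1/\sqrt{d_k}$ scaling that the paper applies to the scores.

```python
import math

def softmax(scores):
    # Subtract the maximum first for numerical stability; the result is unchanged.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(q, keys, values):
    # Single-query form: a weighted average of the values v_i,
    # with weights softmax_i(q . k_i).
    weights = softmax([dot(q, k) for k in keys])
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def attention_batch(queries, keys, values):
    # Matrix form softmax(Q K^T) V: the same computation applied to every row of Q.
    return [attention(q, keys, values) for q in queries]

keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
uniform = attention([0.0, 0.0], keys, values)   # equal scores, so a plain average
focused = attention([10.0, 0.0], keys, values)  # strongly prefers the first key
```

A query that is equally similar to every key returns the mean of the values, while a query aligned with one key returns almost exactly that key's value. This is one way to read the formula as a soft lookup.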

22

votes

3 answers

Is a dystopian surveillance state computationally possible?

This isn't really a conspiracy theory question. It is more of an inquiry into global computational power and data storage logistics. Most recording instruments such as cameras and microphones are typically voluntary opt-in devices, in that,…

social computation

asked Feb 28 '20 at 08:34

Harrison Tran

329
2
6

22

votes

2 answers

Why would you implement the position-wise feed-forward network of the transformer with convolution layers?

The Transformer model introduced in "Attention is all you need" by Vaswani et al. incorporates a so-called position-wise feed-forward network (FFN): In addition to attention sub-layers, each of the layers in our encoder and decoder contains a…

deep-learning keras convolution transformer feedforward-neural-networks

asked Sep 18 '19 at 23:45

Eli Korvigo

321
1
2
6
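For the position-wise feed-forward question above, one way to see the link to convolution is that a position-wise network applies the same two-layer map to every position independently, which is also what a convolution with kernel size 1 computes. The sketch below is only an illustration in plain Python; the function names and the tiny weights are made-up assumptions.

```python
def matvec(W, x):
    # W is a list of rows; returns W @ x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def ffn(x, W1, b1, W2, b2):
    # FFN(x) = W2 * relu(W1 * x + b1) + b2 for a single position vector x.
    hidden = [max(0.0, v + b) for v, b in zip(matvec(W1, x), b1)]
    return [v + b for v, b in zip(matvec(W2, hidden), b2)]

def position_wise_ffn(sequence, W1, b1, W2, b2):
    # The same weights are reused at every position and no information
    # flows between positions, like a kernel-size-1 convolution.
    return [ffn(x, W1, b1, W2, b2) for x in sequence]

# Hypothetical toy weights: model dimension 2, hidden dimension 2, output dimension 1.
W1 = [[1.0, -1.0], [0.0, 2.0]]
b1 = [0.0, 0.0]
W2 = [[1.0, 1.0]]
b2 = [0.5]

sequence = [[1.0, 2.0], [2.0, 0.0]]
outputs = position_wise_ffn(sequence, W1, b1, W2, b2)
reversed_outputs = position_wise_ffn(sequence[::-1], W1, b1, W2, b2)
```

Reversing the input sequence simply reverses the outputs, which shows that each position is processed on its own.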

22

votes

1 answer

Has the Lovelace Test 2.0 been successfully used in an academic setting?

In October 2014, Dr. Mark Riedl published an approach to testing AI intelligence, called the "Lovelace Test 2.0", after being inspired by the original Lovelace Test (published in 2001). Mark believed that the original Lovelace Test would be…

history intelligence-testing

asked Aug 07 '16 at 18:17

Left SE On 10_6_19

1,670
10
23

22

votes

3 answers

Why doesn't Q-learning converge when using function approximation?

The tabular Q-learning algorithm is guaranteed to find the optimal $Q$ function, $Q^*$, provided the following conditions (the Robbins-Monro conditions) regarding the learning rate are satisfied $\sum_{t} \alpha_t(s, a) = \infty$ $\sum_{t}…

reinforcement-learning q-learning deep-rl proofs function-approximation

asked Apr 05 '19 at 18:23

nbro

42,615
12
119
217
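For the Q-learning question above, the tabular update can be sketched as follows. This is a minimal illustration, not the asker's code: the function name, the dictionary-based table, and the step size $\alpha_t(s, a) = 1/n(s, a)$ (one over the visit count of the pair) are assumptions chosen for the example. That step size is a common choice whose sum over visits diverges, in line with the first condition quoted in the excerpt.

```python
from collections import defaultdict

def q_update(Q, visits, s, a, r, s_next, actions, gamma=0.9):
    # Step size 1 / n(s, a): an illustrative choice, decaying with each visit.
    visits[(s, a)] += 1
    alpha = 1.0 / visits[(s, a)]
    # Bootstrapped target: reward plus discounted value of the greedy next action.
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)      # one table entry per (state, action) pair
visits = defaultdict(int)
actions = ["x", "y"]

first = q_update(Q, visits, 0, "x", 1.0, 1, actions)   # alpha = 1
second = q_update(Q, visits, 0, "x", 0.0, 1, actions)  # alpha = 1/2
```

In the tabular case each update touches exactly one entry. With function approximation a single update changes shared parameters and therefore the estimates for other state-action pairs as well, which is the setting the question asks about.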

21

votes

3 answers

How can we process the data from both the true distribution and the generator?

I'm struggling to understand the GAN loss function as provided in Understanding Generative Adversarial Networks (a blog post written by Daniel Seita). In the standard cross-entropy loss, we have an output that has been run through a sigmoid function…

neural-networks deep-learning objective-functions generative-adversarial-networks generative-model

asked Jun 13 '17 at 10:50

tryingtolearn

395
1
2
10

21

votes

3 answers

What kind of word embedding is used in the original transformer?

I am currently trying to understand transformers. To start, I read Attention Is All You Need and also this tutorial. What makes me wonder is the word embedding used in the model. Is word2vec or GloVe being used? Are the word embeddings trained from…

natural-language-processing transformer attention word-embedding

asked Feb 05 '21 at 18:51

Bert Gayus

645
1
5
12

21

votes

1 answer

What is the number of neurons required to approximate a polynomial of degree n?

I learned about the universal approximation theorem from this guide. It states that a network even with a single hidden layer can approximate any function within some bound, given a sufficient number of neurons. Or mathematically, ${|g(x)−f(x)|<…

neural-networks computational-learning-theory multilayer-perceptrons function-approximation universal-approximation-theorems

asked Jan 02 '21 at 17:49

mark mark

813
6
25

21

votes

2 answers

What is the "Hello World" problem of Reinforcement Learning?

As we all know, "Hello World" is usually the first program that any programmer learns/implements in any language/framework. As Aurélien Géron mentions in his book, MNIST is often called the Hello World of Machine Learning. Is there any "Hello…

machine-learning reinforcement-learning

asked Sep 13 '20 at 12:57

Arpit-Gole

404
3
9

21

votes

1 answer

What is a fully convolution network?

I was surveying some literature related to Fully Convolutional Networks and came across the following phrase: A fully convolutional network is achieved by replacing the parameter-rich fully connected layers in standard CNN architectures by…

machine-learning convolutional-neural-networks computer-vision image-segmentation fully-convolutional-networks

asked Jun 12 '20 at 01:35

r4bb1t

345
1
3
8

21

votes

2 answers

What are the main differences between skip-gram and continuous bag of words?

The skip-gram and continuous bag of words (CBOW) are two different types of word2vec models. What are the main differences between them? What are the pros and cons of both methods?

natural-language-processing comparison word2vec cbow skip-gram

asked Mar 14 '20 at 06:52

DRV

1,843
3
15
20
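For the skip-gram versus CBOW question above, the difference in training examples can be sketched in a few lines. The function names and the window size are assumptions for illustration, and the sketch covers only how (input, target) pairs are built, not the training itself.

```python
def context_indices(i, n, window):
    # Indices of the neighbours of position i within the window, excluding i.
    return [j for j in range(max(0, i - window), min(n, i + window + 1)) if j != i]

def skipgram_pairs(tokens, window=1):
    # Skip-gram: the centre word is the input and each context word is a separate target.
    return [(tokens[i], tokens[j])
            for i in range(len(tokens))
            for j in context_indices(i, len(tokens), window)]

def cbow_pairs(tokens, window=1):
    # CBOW: the whole context is the input and the centre word is the target.
    return [([tokens[j] for j in context_indices(i, len(tokens), window)], tokens[i])
            for i in range(len(tokens))]

tokens = ["the", "cat", "sat"]
sg = skipgram_pairs(tokens)
cb = cbow_pairs(tokens)
```

Skip-gram produces one example per (centre, context) pair, whereas CBOW produces one example per position with all context words combined into a single input.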

21

votes

2 answers

Problems that only humans will ever be able to solve

With the increasing complexity of reCAPTCHA, I wondered about the existence of some problem that only a human will ever be able to solve (or that AI won't be able to solve as long as it doesn't reproduce exactly the human brain). For instance, the…

agi problem-solving intelligence-testing ai-completeness

asked May 05 '19 at 08:45

Marc Perlade

313
1
6

20

votes

1 answer

How does LSTM in deep reinforcement learning differ from experience replay?

In the paper Deep Recurrent Q-Learning for Partially Observable MDPs, the author processed the Atari game frames with an LSTM layer at the end. My questions are: How does this method differ from the experience replay, as they both use past…

reinforcement-learning long-short-term-memory deep-rl comparison experience-replay

asked Aug 27 '18 at 01:58

Kevin. Fang

373
1
2
7

20

votes

4 answers

What is the difference between actor-critic and advantage actor-critic?

I'm struggling to understand the difference between actor-critic and advantage actor-critic. At least, I know they are different from asynchronous advantage actor-critic (A3C), as A3C adds an asynchronous mechanism that uses multiple worker agents…

reinforcement-learning terminology actor-critic-methods comparison advantage-actor-critic

asked Aug 02 '18 at 14:59

Blaszard

1,097
4
11
25

20

votes

3 answers

How do I choose the best algorithm for a board game like checkers?

How do I choose the best algorithm for a board game like checkers? So far, I have considered only three algorithms, namely, minimax, alpha-beta pruning, and Monte Carlo tree search (MCTS). Apparently, both the alpha-beta pruning and MCTS are…

game-ai applications monte-carlo-tree-search minimax alpha-beta-pruning

asked Jul 16 '18 at 15:37

JoeyB

487
1
6
15
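For the board game question above, the relationship between minimax and alpha-beta pruning can be sketched on a toy game tree. The nested-list tree representation (inner lists are positions, numbers are leaf evaluations) is an assumption made for this example; a checkers engine would generate moves and evaluate positions instead.

```python
import math

def minimax(node, maximizing):
    # Plain minimax: explore every branch.
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    # Same result as minimax, but stops exploring a node once alpha >= beta,
    # because the opponent would never allow that line of play.
    if not isinstance(node, list):
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:
                break
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:
            break
    return best

tree = [[3, 5], [2, 9], [0, 1]]  # root is a max node, its children are min nodes
value_plain = minimax(tree, True)
value_pruned = alphabeta(tree, True)
```

In this tree the leaf `9` is never evaluated by `alphabeta`: once the second min node sees `2`, it cannot beat the `3` already guaranteed by the first branch. Alpha-beta pruning is therefore an optimisation of minimax rather than a different decision rule, while MCTS replaces exhaustive search with sampled playouts.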
