Most Popular
1500 questions
6
votes
2 answers
In 2016, can \$1000.00 buy enough operations per second to be approximately equal to the computational power of a human brain?
In The Age of Spiritual Machines (1999), Ray Kurzweil predicted that in 2009, a \$1000 computing device would be able to perform a trillion operations per second. Additionally, he claimed that in 2019, a \$1000 computing device would be…
DJG
- 173
- 5
6
votes
1 answer
If the current state is $S_t$ and the actions are chosen according to $\pi$, what is the expectation of $R_{t+1}$ in terms of $\pi$ and $p$?
I'm trying to solve Exercise 3.11 from Sutton and Barto's book (2nd edition).
Exercise 3.11 If the current state is $S_t$, and actions are selected according to a stochastic policy $\pi$, then what is the expectation of $R_{t+1}$ in terms…
tmaric
- 402
- 3
- 14
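For reference on the exercise above, here is a sketch of the expectation. It assumes the book's four-argument dynamics function $p(s', r \mid s, a)$ and finite sets of states, actions, and rewards; neither assumption is stated in the excerpt itself. Conditioning on $S_t = s$ and averaging first over the action drawn from $\pi$, then over the next state and reward, gives:

$$
\mathbb{E}_\pi\left[ R_{t+1} \mid S_t = s \right] = \sum_{a} \pi(a \mid s) \sum_{s'} \sum_{r} r \, p(s', r \mid s, a)
$$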
6
votes
1 answer
How does SGD escape local minima?
SGD is able to jump out of local minima that would otherwise trap BGD
I don't really understand the above statement. Could someone please provide a mathematical explanation for why SGD (Stochastic Gradient Descent) is able to escape local minima,…
stoic-santiago
- 1,201
- 9
- 22
6
votes
1 answer
Is this proof of $\epsilon$-greedy policy improvement correct?
The following paragraph about $\epsilon$-greedy policies can be found at the end of page 100, under section 5.4, of the book "Reinforcement Learning: An Introduction" by Richard Sutton and Andrew Barto (second edition, 2018).
but with probability…
Jarvis1997
- 157
- 6
6
votes
1 answer
What's the difference between LSTM and GRU?
I have been reading about LSTMs and GRUs, which are recurrent neural networks (RNNs). The difference between the two is the number and specific type of gates that they have. The GRU has an update gate, which has a similar role to the role of the…
Pluviophile
- 1,293
- 7
- 20
- 40
6
votes
2 answers
How can we compute the ratio between the distributions if we don't know one of the distributions?
Here is my understanding of importance sampling. If we have two distributions $p(x)$ and $q(x)$, where we have a way of sampling from $p(x)$ but not from $q(x)$, but we want to compute the expectation with respect to $q(x)$, then we use importance sampling.…
pecey
- 353
- 2
- 10
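The importance-sampling setup described in the question above can be sketched as follows. This is only an illustration, under the assumption that both densities can be evaluated pointwise, which is exactly what the question puts in doubt. The choices $p = \mathcal{N}(0, 1)$ and $q = \mathcal{N}(1, 1)$, and the helper names `normal_pdf` and `importance_sampling_mean`, are hypothetical and do not come from the question.

```python
import math
import random


def normal_pdf(x, mean, std=1.0):
    """Density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_sampling_mean(f, n_samples=200_000, seed=0):
    """Estimate E_q[f(x)] using samples drawn from p only.

    Assumed for illustration: p = N(0, 1) is the distribution we can
    sample from, and q = N(1, 1) is the target whose density we can
    evaluate but not sample from.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)                      # sample from p
        w = normal_pdf(x, 1.0) / normal_pdf(x, 0.0)  # ratio q(x) / p(x)
        total += w * f(x)
    return total / n_samples


# The mean of q is 1, so the estimate should land close to 1.
estimate = importance_sampling_mean(lambda x: x)
print(estimate)
```

When the density ratio cannot be computed exactly, for example because one density is only known up to a normalizing constant, a self-normalized variant that divides by the sum of the weights is a common alternative.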
6
votes
1 answer
How should I handle invalid actions in a grid world?
I'm building a really simple experiment, where I let an agent move from the bottom-left corner to the upper-right corner of a $3 \times 3$ grid world.
I plan to use DQN to do this. I'm having trouble handling the starting point: what if the Q…
o_yeah
- 197
- 1
- 7
6
votes
4 answers
What are some datasets to train an MLP on simple tasks?
I have implemented an MLP. Now, I want to train it to solve simple tasks.
Are there any data sets to train the MLP on simple tasks, that is, tasks with a small number of inputs and outputs?
I would like to train it to solve problems which are…
David Price
- 61
- 1
- 3
6
votes
3 answers
What is the difference between artificial intelligence and swarm intelligence?
Artificial intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions. The term may also be applied to any machine that exhibits traits associated with a human mind…
Pluviophile
- 1,293
- 7
- 20
- 40
6
votes
1 answer
How does the region proposal method work in Fast R-CNN?
I have read many articles and the Fast R-CNN paper, but I'm still confused about how the region proposal method works in Fast R-CNN.
As you can see in the image below, they say they used a proposal method, but it is not specified how it works.
What…
ozoubia
- 61
- 2
6
votes
2 answers
How to write a C decompiler using AI?
I would like to learn whether it is possible, and how, to write a program that decompiles an executable binary (an object file) to C source. I'm not asking exactly 'how', but rather how this can be achieved.
Given the following hello.c…
kenorb
- 10,525
- 6
- 45
- 95
6
votes
1 answer
Is Expected SARSA an off-policy or on-policy algorithm?
I understand that SARSA is an on-policy algorithm, and Q-learning an off-policy one.
Sutton and Barto's textbook describes Expected Sarsa thusly:
In these cliff walking results Expected Sarsa was used on-policy, but
in general it might use a…
Y. Xu
- 63
- 1
- 4
6
votes
0 answers
Why can Pixel RNN (Row LSTM) capture triangular contexts?
I'm reading the paper "Pixel Recurrent Neural Networks". I have a question about Row LSTM. Why can Row LSTM capture triangular contexts?
In this paper,
the kernel of the one-dimensional convolution has size $k \times 1$ where $k \geq 3$; the larger…
musako
- 181
- 2
6
votes
4 answers
Would it be ethical to allow an AI to make life-or-death medical decisions?
Would it be ethical to allow an AI to make life-or-death medical decisions?
For instance, where there is an insufficient number of ventilators during a respiratory pandemic, not every patient can have one. It seems like a straightforward question,…
DukeZhou
- 6,209
- 5
- 27
- 54
6
votes
1 answer
In NEAT, is it a good idea to give the same ID to node genes created from the same connection gene?
Do I have to prevent nodes created from the same connection gene from having different IDs/innovation numbers? In this example, node 6 is created from the connection going from node 3 to node 4:
In the case where that specific node was already…
Dara Kong
- 125
- 1
- 7