Most Popular
1500 questions
5
votes
1 answer
Why is the mean used to compute the expectation in the GAN loss?
From Goodfellow et al. (2014), we have the adversarial loss:
$$ \min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)} [\log D(x)] + \mathbb{E}_{z \sim p_z(z)} [\log (1 - D(G(z)))] \text{.} $$
In practice, the expectation is…
A is for Ambition
5
votes
1 answer
Can you convert an MDP problem to a contextual multi-armed bandit problem?
I'm trying to get a better understanding of multi-armed bandits, contextual multi-armed bandits, and Markov decision processes.
Basically, a multi-armed bandit is a special case of a contextual multi-armed bandit where there is no state (features/context). And…
peidaqi
5
votes
2 answers
Why are policy iteration and value iteration studied as separate algorithms?
In Sutton and Barto's book about reinforcement learning, policy iteration and value iteration are presented as separate/different algorithms.
This is very confusing because policy iteration includes an update/change of value and value iteration…
User007
5
votes
2 answers
How can we prevent AGI from doing drugs?
I recently read some introductions to AI alignment, AIXI, and decision theory.
As far as I understand, one of the main problems in AI alignment is how to define a utility function well, so that it does not cause something like the paperclip apocalypse.
Then…
user3584499
5
votes
1 answer
How to run Monte Carlo Tree Search (MCTS) for a stochastic environment?
For MCTS there is an expansion phase where we make a move and list all the next states. But this is complicated by the fact that for some games, after making the move, there is a stochastic change to the environment. Consider the game 2048,…
xiaodai
5
votes
1 answer
How can I find a specific word in an audio file?
I'm trying to train and use a neural network to detect a specific word in an audio file. The input of the neural network is an audio clip 2-3 seconds long, and the neural network must determine whether the input audio (the voice of a person)…
Ali.kavari76
5
votes
1 answer
What is eager learning and lazy learning?
What is the difference between eager learning and lazy learning?
How does eager learning or lazy learning help me build a neural network system? And how can I use it for any target function?
mogoja
5
votes
1 answer
Why do DQNs tend to forget?
Why do DQNs tend to forget? Is it because when you feed highly correlated samples, your model (function approximation) doesn't give a general solution?
For example:
I use level 1 experiences, my model $p$ is fitted to learn how to play that…
Chukwudi
5
votes
2 answers
Could an AI be sentient?
In theory, could an AI become sentient, as in learning and becoming self-aware, all from its source code?
MountainSide Studios
5
votes
3 answers
Why is symbolic AI not as popular as ANNs, although it was used by IBM's Deep Blue?
Everybody is implementing and using DNNs with, for example, TensorFlow or PyTorch.
I thought IBM's Deep Blue was an ANN-based AI system, but this article says that IBM's Deep Blue was symbolic AI.
Are there any special features in symbolic AI that…
Dan D
5
votes
1 answer
NEAT can't solve XOR completely
I'm currently implementing the NEAT algorithm, but problems occur when testing it on problems that don't have a linear solution (for example, XOR). My XOR only produces 3 correct outputs at a time:
1, 0 -> 0.99
0, 0 -> 0
1, 1 -> 0
0, 1 ->…
Creepsy
5
votes
3 answers
Is there an upper limit to the maximum cumulative reward in a deep reinforcement learning problem?
Is there an upper limit to the maximum cumulative reward in a deep reinforcement learning problem?
For example, you want to train a DQN agent in an environment, and you want to know what the highest possible value you can get from the cumulative…
user38696
5
votes
1 answer
Why do we need a target network in deep Q-learning?
I already know deep RL, but to learn it deeply I want to know why we need 2 networks in deep RL. What does the target network do? I know there is a lot of mathematics behind this, but I want to know deep Q-learning deeply, because I am about to make…
dato nefaridze
5
votes
1 answer
What is a "closed expression" in the context of logic?
I was reading about logic systems and the following phrase appeared.
any closed expression that is not derivable inside the same system
What is a "closed expression" in this context? What does "closed expression that is not derivable" mean?
Ale
5
votes
2 answers
What is a trap function in the context of a genetic algorithm?
What is a trap function in the context of a genetic algorithm? How is it related to the concepts of local and global optima?
mountaincloud