Most Popular
1500 questions
7
votes
1 answer
Why does a negative reward for every step really encourage the agent to reach the goal as quickly as possible?
If we shift the rewards by any constant (which is a type of reward shaping), the optimal state-action value function (and so optimal policy) does not change. The proof of this fact can be found here.
If that's the case, then why does a negative…
nbro
7
votes
1 answer
What are the state-of-the-art results in OpenAI's gym environments?
What are the state-of-the-art results in OpenAI's gym environments? Is there a link to a paper/article that describes them and how these SOTA results were calculated?
Tofara Moyo
7
votes
2 answers
What is meant by "ground truth" in the context of AI?
What does "ground truth" mean in the context of AI, especially in the context of machine learning?
I am a little confused because I have read that the ground truth is the same as a label in supervised learning. And I think that's not quite right. I…
MScott
7
votes
1 answer
Are mult-adds and FLOPs equivalent?
I am comparing different CNN architectures for edge implementation. Some papers describing architectures refer to mult-adds, like the MobileNet V1 paper, where it is claimed that this net has 569M mult-adds, and others refer to floating-point…
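As a rough illustration of the two units being compared, here is a minimal sketch that counts mult-adds for one standard convolution layer. It assumes the common convention that one multiply-add counts as two floating-point operations; some papers instead use "FLOPs" to mean mult-adds, so the factor of 2 is an assumption and not a universal rule. The function name and the layer sizes are hypothetical.

```python
def conv_mult_adds(kernel, c_in, c_out, h_out, w_out):
    """Mult-adds of a standard (dense) square convolution, ignoring bias."""
    # Each output element needs kernel * kernel * c_in multiply-adds,
    # and there are c_out * h_out * w_out output elements.
    return kernel * kernel * c_in * c_out * h_out * w_out


# Hypothetical layer: 3x3 kernel, 32 -> 64 channels, 112x112 output map.
macs = conv_mult_adds(kernel=3, c_in=32, c_out=64, h_out=112, w_out=112)

# Assumed convention: 1 mult-add = 1 multiplication + 1 addition = 2 FLOPs.
flops = 2 * macs

print(f"mult-adds: {macs:,}  FLOPs (under the 2x convention): {flops:,}")
```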
Quintus
7
votes
5 answers
Examples of single player games that use modern ML techniques in the AI?
Are there any examples of single player games that use modern ML techniques in their AI? By this I mean AI that plays with or against the human player, and not AI that just plays the game by itself (like Atari).
"Modern ML techniques" is a vague term, but…
k.c. sayz 'k.c sayz'
7
votes
3 answers
What would be the best way to disable a rogue AI?
Suppose that an artificial superintelligence (ASI) has finally been developed, but it has rebelled against humanity. We can assume that the ASI is online and can reproduce itself through electronic devices.
How would you disable the AI in the most…
MountainSide Studios
7
votes
2 answers
What is the difference between vanilla policy gradient with a baseline as value function and advantage actor-critic?
What is the difference between vanilla policy gradient (VPG) with a baseline as value function and advantage actor-critic (A2C)?
By vanilla policy gradient, I am specifically referring to Spinning Up's explanation of VPG.
Vedant Shah
7
votes
1 answer
Can GANs be used to generate something other than images?
AFAIK, GANs are used for generating/synthesizing near-perfect human faces (deepfakes), gallery art, etc., but can GANs be used to generate something other than images?
Pluviophile
7
votes
2 answers
What is a time-step in a Markov Decision Process?
The "discounted sum of future rewards" (or return) using discount factor $\gamma$ is
$$\gamma^1 r_1 +\gamma^2 r_2 + \gamma^3 r_3 + \dots \tag{1}\label{1}$$
where $r_i$ is the reward received at the $i$th time-step.
I am confused as to what…
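For concreteness, the sum in equation (1) can be computed as in the minimal sketch below. It follows the indexing used above, where the reward at time-step $i$ is weighted by $\gamma^i$ starting from $i = 1$. The function name and the sample rewards are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Return sum over i of gamma**i * r_i, with i starting at 1 as in (1)."""
    return sum(gamma ** i * r for i, r in enumerate(rewards, start=1))


# Hypothetical episode with three time-steps: r_1 = 1, r_2 = 0, r_3 = 2.
rewards = [1.0, 0.0, 2.0]
print(discounted_return(rewards, gamma=0.5))
```

Here one list entry corresponds to one time-step, that is, one action taken and one reward received.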
Abhishek Bhatia
7
votes
1 answer
Has reinforcement learning been used to prove mathematical theorems?
Coq exists, and there are other similar projects out there. Further, reinforcement learning has made splashes in the domain of playing games (à la DeepMind, OpenAI, and other less well-known efforts).
It seems to me that these two domains deserve to…
Frank Bryce
7
votes
1 answer
What happens when you select actions using softmax instead of epsilon greedy in DQN?
I understand the two major branches of RL are Q-Learning and Policy Gradient methods.
From my understanding (correct me if I'm wrong), policy gradient methods have inherent exploration built in, as they select actions using a probability…
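To make the two selection rules concrete, here is a minimal sketch of epsilon-greedy and softmax (Boltzmann) action selection over a vector of Q-values. This is not taken from any particular DQN implementation. The function names, the `temperature` parameter, and the sample Q-values are assumptions for illustration.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniform random action, else the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def softmax_probs(q_values, temperature=1.0):
    """Boltzmann distribution over actions; higher Q means higher probability."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_action(q_values, temperature=1.0, rng=random):
    """Sample an action in proportion to its softmax probability."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


# Hypothetical Q-values for three actions.
q = [0.1, 0.9, 0.3]
print(epsilon_greedy(q, epsilon=0.1), softmax_action(q, temperature=0.5))
```

One difference is visible in the sketch: epsilon-greedy explores uniformly over all actions, while softmax explores in proportion to the current Q-value estimates.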
Linsu Han
7
votes
1 answer
How to measure sample efficiency of a reinforcement learning algorithm?
I want to know whether there is a metric for measuring the sample efficiency of a reinforcement learning algorithm. From reading research papers, I see claims that proposed models are more sample efficient, but how does one reach this conclusion when…
rert588
7
votes
2 answers
How to resolve lexical ambiguity in natural language processing?
I'm interested in implementing a program for natural language processing (aka ELIZA).
Assume that I'm already storing semantic-lexical connections between words, along with their strengths.
What are the methods of dealing with words which have very…
kenorb
7
votes
2 answers
Is there any difference between reward and return in reinforcement learning?
I am reading Sutton and Barto's book on reinforcement learning. I thought that reward and return were the same thing.
However, in Section 5.6 of the book, 3rd line, first paragraph, it is written:
Whereas in Chapter 2 we averaged rewards, in…
SJa
7
votes
2 answers
Is there any good reference for double deep Q-learning?
I am new to reinforcement learning, but I already know Q-learning and deep Q-learning. Now, I want to learn about double deep Q-learning.
Do you know any good references for double deep Q-learning?
I have read some articles, but some of them don't…
dato nefaridze