Most Popular

1500 questions
7 votes, 1 answer

Why does a negative reward for every step really encourage the agent to reach the goal as quickly as possible?

If we shift the rewards by any constant (which is a type of reward shaping), the optimal state-action value function (and so optimal policy) does not change. The proof of this fact can be found here. If that's the case, then why does a negative…
nbro
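The arithmetic behind the title's claim can be sketched under assumptions the question does not spell out: an episodic task that terminates when the goal is reached, a reward of -1 on every step, and (by default) no discounting. The return of an episode that reaches the goal in k steps is then -k, so shorter paths score strictly higher. This relies on episodes having different lengths; if every path lasted the same number of steps, the -1 rewards would sum to the same total. The function name and step counts below are illustrative only.

```python
# Assumption (illustrative, not from the question): episodic task that ends at
# the goal, a constant reward of -1 per step, and gamma = 1 unless overridden.
STEP_REWARD = -1.0


def episode_return(num_steps: int, step_reward: float = STEP_REWARD, gamma: float = 1.0) -> float:
    """Discounted sum of a constant per-step reward over an episode of num_steps steps."""
    return sum((gamma ** t) * step_reward for t in range(num_steps))


# Compare hypothetical paths that reach the goal in different numbers of steps.
for steps in (3, 5, 10):
    print(steps, episode_return(steps))
```

With gamma < 1 the ordering is the same, because every extra step still adds a negative term to the sum.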
7 votes, 1 answer

What are the state-of-the-art results in OpenAI's gym environments?

What are the state-of-the-art results in OpenAI's gym environments? Is there a link to a paper/article that describes them and how these SOTA results were calculated?
7 votes, 2 answers

What is meant by "ground truth" in the context of AI?

What does "ground truth" mean in the context of AI, especially in the context of machine learning? I am a little confused because I have read that the ground truth is the same as a label in supervised learning, and I don't think that's quite right. I…
MScott
7 votes, 1 answer

Are mult-adds and FLOPs equivalent?

I am comparing different CNN architectures for edge implementation. Some papers describing architectures refer to mult-adds, like the MobileNet V1 paper, where it is claimed that this net has 569M mult-adds, and others refer to floating-point…
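The two counts are related by a convention rather than being the same unit: one multiply-add (a multiplication followed by an addition) is often counted as two floating-point operations, so FLOPs ≈ 2 × mult-adds under that convention, while some papers and profiling tools report multiply-adds under the name "FLOPs". A minimal sketch, assuming bias terms and activations are ignored and using hypothetical layer sizes, that counts mult-adds for a standard convolution and for a depthwise separable one (the MobileNet-style block) and converts them with an explicit convention parameter:

```python
def conv_mult_adds(h_out: int, w_out: int, c_in: int, c_out: int, k: int) -> int:
    """Mult-adds of a standard k x k convolution (bias and activation ignored)."""
    return h_out * w_out * c_out * c_in * k * k


def depthwise_separable_mult_adds(h_out: int, w_out: int, c_in: int, c_out: int, k: int) -> int:
    """Mult-adds of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = h_out * w_out * c_in * k * k
    pointwise = h_out * w_out * c_in * c_out
    return depthwise + pointwise


def flops_from_mult_adds(mult_adds: int, flops_per_mult_add: int = 2) -> int:
    """Convert with an explicit convention: 2 if the multiply and the add are
    counted separately, 1 if one multiply-add is reported as one "FLOP"."""
    return mult_adds * flops_per_mult_add


# Hypothetical layer: 14 x 14 output, 512 -> 512 channels, 3 x 3 kernel.
std = conv_mult_adds(14, 14, 512, 512, 3)
sep = depthwise_separable_mult_adds(14, 14, 512, 512, 3)
print("standard:", std, "mult-adds,", flops_from_mult_adds(std), "FLOPs")
print("separable:", sep, "mult-adds,", flops_from_mult_adds(sep), "FLOPs")
```

Under the two-FLOPs convention, the 569M mult-adds quoted for MobileNet V1 would correspond to roughly 1.14 GFLOPs; when comparing numbers across papers, it is worth checking which convention each one uses.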
7 votes, 5 answers

Examples of single player games that use modern ML techniques in the AI?

Are there any examples of single player games that use modern ML techniques in their AI? By this I mean AI that plays with or against the human player, not AI that just plays the game by itself (like Atari). "Modern ML techniques" is a vague term, but…
k.c. sayz 'k.c sayz'
7 votes, 3 answers

What would be the best way to disable a rogue AI?

Suppose that an artificial superintelligence (ASI) has finally been developed, but it has rebelled against humanity. We can assume that the ASI is online and can reproduce itself through electronic devices. How would you disable the AI in the most…
7 votes, 2 answers

What is the difference between vanilla policy gradient with a baseline as value function and advantage actor-critic?

What is the difference between vanilla policy gradient (VPG) with a baseline as value function and advantage actor-critic (A2C)? By vanilla policy gradient, I am specifically referring to Spinning Up's explanation of VPG.
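One place the two are often contrasted is how the advantage is formed. A minimal sketch, assuming one finished episode and an already-fitted value estimate (all numbers made up): a Monte Carlo advantage, reward-to-go minus the baseline V(s_t), versus a one-step bootstrapped TD advantage, r_t + γV(s_{t+1}) - V(s_t), the form typically associated with actor-critic methods. Which estimator a particular VPG or A2C implementation actually uses varies (generalized advantage estimation interpolates between these two), so this is an illustration, not a definition of either algorithm.

```python
from typing import List


def rewards_to_go(rewards: List[float], gamma: float) -> List[float]:
    """Discounted reward-to-go for every step of one finished episode."""
    out = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


def mc_advantages(rewards: List[float], values: List[float], gamma: float) -> List[float]:
    """Monte Carlo advantage: reward-to-go minus the baseline V(s_t)."""
    return [g - v for g, v in zip(rewards_to_go(rewards, gamma), values)]


def td_advantages(rewards: List[float], values: List[float], gamma: float) -> List[float]:
    """One-step TD advantage: r_t + gamma * V(s_{t+1}) - V(s_t).

    The value after the final step is taken as 0 (terminal state).
    """
    next_values = values[1:] + [0.0]
    return [r + gamma * nv - v for r, nv, v in zip(rewards, next_values, values)]


rewards = [0.0, 0.0, 1.0]
values = [0.5, 0.6, 0.9]  # made-up baseline estimates V(s_t)
print(mc_advantages(rewards, values, gamma=0.9))
print(td_advantages(rewards, values, gamma=0.9))
```

The Monte Carlo form uses only observed rewards (higher variance, no bias from V), while the TD form leans on the value estimate at the next state (lower variance, but biased when V is inaccurate).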
7 votes, 1 answer

Can GANs be used to generate something other than images?

AFAIK, GANs are used for generating/synthesizing near-perfect human faces (deepfakes), artwork, etc., but can GANs be used to generate something other than images?
7 votes, 2 answers

What is a time-step in a Markov Decision Process?

The "discounted sum of future rewards" (or return) using discount factor $\gamma$ is $$\gamma^1 r_1 +\gamma^2 r_2 + \gamma^3 r_3 + \dots \tag{1}\label{1}$$ where $r_i$ is the reward received at the $i$th time-step. I am confused as to what…
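Equation (1) can be read as a loop over time-steps in which each time-step contributes the one reward received at that step. A minimal sketch that follows the indexing of (1) exactly, so the first reward is weighted by γ^1; many texts, including Sutton and Barto, instead start the exponent at 0 for the first reward, which only rescales the whole sum by a factor of γ. The function name is illustrative.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Equation (1) as written: sum over i >= 1 of gamma**i * r_i.

    rewards[0] is r_1 (the reward at time-step 1), rewards[1] is r_2, and so on.
    """
    return sum(gamma ** i * r for i, r in enumerate(rewards, start=1))


# Three time-steps, reward 1 at each, gamma = 0.5: 0.5 + 0.25 + 0.125
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
```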
7 votes, 1 answer

Has reinforcement learning been used to prove mathematical theorems?

Coq exists, and there are other similar projects out there. Further, reinforcement learning has made splashes in the domain of playing games (à la DeepMind and OpenAI, and other less well-known efforts). It seems to me that these two domains deserve to…
7 votes, 1 answer

What happens when you select actions using softmax instead of epsilon-greedy in DQN?

I understand the two major branches of RL are Q-learning and policy gradient methods. From my understanding (correct me if I'm wrong), policy gradient methods have exploration built in, as they select actions using a probability…
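The practical difference between the two rules is how non-greedy actions are weighted. A sketch of both applied to the same Q-values (the values and the temperature parameter are illustrative): ε-greedy spreads its exploration uniformly over all actions, while softmax (Boltzmann) selection makes near-best actions much more likely than clearly bad ones, with a lower temperature pushing it closer to greedy.

```python
import math
import random
from typing import List


def epsilon_greedy(q_values: List[float], epsilon: float, rng: random.Random) -> int:
    """With probability epsilon pick a uniformly random action (possibly the
    greedy one); otherwise pick the action with the highest Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def softmax_probs(q_values: List[float], temperature: float) -> List[float]:
    """Boltzmann probabilities exp(Q/T) / sum(exp(Q/T)).

    Subtracting max(Q) first does not change the result but keeps exp() from overflowing.
    """
    m = max(q_values)
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_action(q_values: List[float], temperature: float, rng: random.Random) -> int:
    """Sample one action in proportion to its softmax probability."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs)[0]


rng = random.Random(0)
q = [1.0, 2.0, 1.9]
print(softmax_probs(q, temperature=1.0))  # actions 1 and 2 get similar, larger shares
print(epsilon_greedy(q, epsilon=0.1, rng=rng), softmax_action(q, temperature=1.0, rng=rng))
```

With these Q-values, ε-greedy would treat action 2 (Q = 1.9) and action 0 (Q = 1.0) identically when exploring, whereas softmax gives action 2 a much larger share.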
7 votes, 1 answer

How to measure sample efficiency of a reinforcement learning algorithm?

I want to know if there is any metric for measuring the sample efficiency of a reinforcement learning algorithm. From reading research papers, I see claims that proposed models are more sample efficient, but how does one reach this conclusion when…
rert588
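One simple proxy, sketched under assumptions that are not from the question: record a learning curve of (environment steps, mean evaluation return) pairs for each algorithm, then compare either the number of environment steps needed before the return first reaches a chosen threshold, or the best return reached within a fixed step budget. Fewer steps to threshold, or a higher return at the same budget, suggests better sample efficiency on that task. The function names, curves and threshold below are hypothetical.

```python
from typing import List, Optional, Tuple

# Each curve is a list of (environment_steps, mean_evaluation_return) pairs.
Curve = List[Tuple[int, float]]


def steps_to_threshold(curve: Curve, threshold: float) -> Optional[int]:
    """First environment-step count at which the return reaches threshold, or None."""
    for env_steps, mean_return in sorted(curve):
        if mean_return >= threshold:
            return env_steps
    return None


def return_at_budget(curve: Curve, budget: int) -> Optional[float]:
    """Best evaluation return observed within the first `budget` environment steps."""
    within = [ret for steps, ret in curve if steps <= budget]
    return max(within) if within else None


# Made-up learning curves for two hypothetical algorithms.
curve_a = [(10_000, 50.0), (20_000, 120.0), (30_000, 200.0)]
curve_b = [(10_000, 20.0), (20_000, 80.0), (30_000, 150.0), (40_000, 210.0)]
print(steps_to_threshold(curve_a, 195.0), steps_to_threshold(curve_b, 195.0))
print(return_at_budget(curve_a, 20_000), return_at_budget(curve_b, 20_000))
```

Because both numbers depend on the threshold or budget chosen and on noise across random seeds, they are usually reported over several seeds rather than a single run.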
7 votes, 2 answers

How to resolve lexical ambiguity in natural language processing?

I'm interested in implementing a program for natural language processing (something like ELIZA). Assume that I'm already storing semantic-lexical connections between words and their strengths. What are the methods of dealing with words which have very…
kenorb
7 votes, 2 answers

Is there any difference between reward and return in reinforcement learning?

I am reading Sutton and Barto's book on reinforcement learning. I thought that reward and return were the same thing. However, in the third line of the first paragraph of Section 5.6 of the book, it is written: Whereas in Chapter 2 we averaged rewards, in…
SJa
7 votes, 2 answers

Is there any good reference for double deep Q-learning?

I am new to reinforcement learning, but I already know deep Q-learning and Q-learning. Now, I want to learn about double deep Q-learning. Do you know any good references for double deep Q-learning? I have read some articles, but some of them don't…
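For orientation while reading references: the core change double DQN makes relative to DQN is in the bootstrap target. The online network chooses the next action and the target network evaluates it, instead of the target network doing both through a single max. A minimal sketch with plain lists standing in for the two networks' Q-value outputs at the next state; the function names and numbers are hypothetical.

```python
from typing import List


def argmax(values: List[float]) -> int:
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda a: values[a])


def dqn_target(reward: float, next_q_target: List[float], gamma: float, done: bool) -> float:
    """DQN target: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward: float, next_q_online: List[float],
                      next_q_target: List[float], gamma: float, done: bool) -> float:
    """Double DQN target: online network selects the action, target network evaluates it."""
    if done:
        return reward
    best_action = argmax(next_q_online)
    return reward + gamma * next_q_target[best_action]


# Hypothetical Q-values at the next state from the two networks.
q_online_next = [1.0, 3.0]
q_target_next = [2.5, 0.5]
print(dqn_target(1.0, q_target_next, gamma=0.5, done=False))
print(double_dqn_target(1.0, q_online_next, q_target_next, gamma=0.5, done=False))
```

Decoupling selection from evaluation is what reduces the overestimation that comes from taking a max over noisy Q-value estimates.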