Highest Voted Questions - Artificial Intelligence Stack Exchange

7

votes

1 answer

Why does a negative reward for every step really encourage the agent to reach the goal as quickly as possible?

If we shift the rewards by any constant (which is a type of reward shaping), the optimal state-action value function (and so the optimal policy) does not change. The proof of this fact can be found here. If that's the case, then why does a negative…

reinforcement-learning proofs reward-shaping reward-functions

asked Nov 01 '20 at 23:09

nbro

42,615
12
119
217

7

votes

1 answer

What are the state-of-the-art results in OpenAI's gym environments?

What are the state-of-the-art results in OpenAI's gym environments? Is there a link to a paper/article that describes them and how these SOTA results were calculated?

reinforcement-learning reference-request gym state-of-the-art

asked Oct 30 '20 at 01:38

Tofara Moyo

71
3

7

votes

2 answers

What is meant by "ground truth" in the context of AI?

What does "ground truth" mean in the context of AI, especially in the context of machine learning? I am a little confused because I have read that the ground truth is the same as a label in supervised learning. And I think that's not quite right. I…

machine-learning terminology

asked Sep 29 '20 at 15:33

MScott

445
4
13

7

votes

1 answer

Are mult-adds and FLOPs equivalent?

I am comparing different CNN architectures for edge implementation. Some papers describing architectures refer to mult-adds, like the MobileNet V1 paper, where it is claimed that this net has 569M mult-adds, and others refer to floating-point…

convolutional-neural-networks terminology papers complexity-theory

asked Sep 08 '20 at 14:38

Quintus

71
1
2

7

votes

5 answers

Examples of single player games that use modern ML techniques in the AI?

Are there any examples of single player games that use modern ML techniques in their AI? By this I mean AI that plays with or against the human player, and not AI that just plays the game by itself (like Atari). "Modern ML techniques" is a vague term, but…

game-ai reference-request

asked Aug 27 '20 at 17:30

k.c. sayz 'k.c sayz'

2,121
13
27

7

votes

3 answers

What would be the best way to disable a rogue AI?

Suppose that an artificial superintelligence (ASI) has finally been developed, but it has rebelled against humanity. We can assume that the ASI is online and can reproduce itself through electronic devices. How would you disable the AI in the most…

agi superintelligence control-problem ai-takeover self-replicating-machines

asked Nov 05 '16 at 11:08

MountainSide Studios

383
3
9

7

votes

2 answers

What is the difference between vanilla policy gradient with a baseline as value function and advantage actor-critic?

What is the difference between vanilla policy gradient (VPG) with a baseline as value function and advantage actor-critic (A2C)? By vanilla policy gradient, I am specifically referring to Spinning Up's explanation of VPG.

reinforcement-learning comparison policy-gradients actor-critic-methods advantage-actor-critic

asked Jul 27 '20 at 04:40

Vedant Shah

125
1
7

7

votes

1 answer

Can GANs be used to generate something other than images?

AFAIK, GANs are used for generating/synthesizing near-perfect human faces (deepfakes), gallery arts, etc., but can GANs be used to generate something other than images?

deep-learning applications generative-adversarial-networks deepfakes

asked Jul 22 '20 at 12:02

Pluviophile

1,293
7
20
40

7

votes

2 answers

What is a time-step in a Markov Decision Process?

The "discounted sum of future rewards" (or return) using discount factor $\gamma$ is $$\gamma^1 r_1 +\gamma^2 r_2 + \gamma^3 r_3 + \dots \tag{1}\label{1}$$ where $r_i$ is the reward received at the $i$th time-step. I am confused as to what…

reinforcement-learning terminology markov-decision-process return time-step

asked Oct 27 '16 at 19:57

Abhishek Bhatia

447
2
5
16
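
The discounted return in the time-step question above can be computed directly from a list of rewards. The following is only a minimal sketch: the function name `discounted_return` and the sample rewards are illustrative assumptions, and the exponent starts at 1 to match the excerpt's formula (many textbooks start it at 0 instead).

```python
def discounted_return(rewards, gamma):
    """Return gamma**1 * r_1 + gamma**2 * r_2 + ... for the given rewards.

    The indexing (exponent starting at 1) follows the excerpt's formula;
    this is an assumption of the sketch, not a universal convention.
    """
    total = 0.0
    for i, r in enumerate(rewards, start=1):
        total += (gamma ** i) * r
    return total


# Hypothetical rewards for three consecutive time-steps.
example_rewards = [1.0, 1.0, 1.0]
example_return = discounted_return(example_rewards, gamma=0.5)
print(example_return)
```

Each entry of the list corresponds to one time-step, so the position of a reward in the list determines how strongly it is discounted.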

7

votes

1 answer

Has reinforcement learning been used to prove mathematical theorems?

Coq exists, and there are other similar projects out there. Further, Reinforcement Learning has made splashes in the domain of playing games (a la DeepMind & OpenAI and other less well-known efforts). It seems to me that these two domains deserve to…

reinforcement-learning automated-theorem-proving coq

asked Jun 28 '20 at 02:19

Frank Bryce

173
4

7

votes

1 answer

What happens when you select actions using softmax instead of epsilon greedy in DQN?

I understand the two major branches of RL are Q-Learning and Policy Gradient methods. From my understanding (correct me if I'm wrong), policy gradient methods have inherent exploration built in, as they select actions using a probability…

reinforcement-learning dqn policy-gradients epsilon-greedy-policy softmax-policy

asked Jun 23 '20 at 16:47

Linsu Han

73
1
4
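
The two action-selection rules named in the softmax question above can be contrasted with a small sketch over a list of Q-values. This is an illustration under stated assumptions, not the asker's code or any library's API: the function names, the `temperature` parameter, and the sample Q-values are all hypothetical.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def softmax_probs(q_values, temperature=1.0):
    """Turn Q-values into action probabilities (temperature is an assumed knob)."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_action(q_values, temperature=1.0, rng=random):
    """Sample an action in proportion to its softmax probability."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


# Hypothetical Q-values for three actions.
q = [1.0, 3.0, 2.0]
print(epsilon_greedy(q, epsilon=0.1), softmax_action(q), softmax_probs(q))
```

In this sketch, epsilon-greedy treats all non-greedy actions alike when it explores, while softmax selection samples actions in proportion to the exponentiated Q-values, so higher-valued actions are chosen more often.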

7

votes

1 answer

How to measure sample efficiency of a reinforcement learning algorithm?

I want to know if there is any metric to use for measuring sample-efficiency of a reinforcement learning algorithm? From reading research papers, I see claims that proposed models are more sample efficient but how does one reach this conclusion when…

reinforcement-learning sample-efficiency

asked Jun 18 '20 at 09:29

rert588

330
1
7

7

votes

2 answers

How to resolve lexical ambiguity in natural language processing?

I'm interested in implementing a program for natural language processing (aka ELIZA). Assume that I'm already storing semantic-lexical connections between the words and their strengths. What are the methods of dealing with words which have very…

natural-language-processing lexical-recognition

asked Aug 03 '16 at 14:17

kenorb

10,525
6
45
95

7

votes

2 answers

Is there any difference between reward and return in reinforcement learning?

I am reading Sutton and Barto's book on reinforcement learning. I thought that reward and return were the same things. However, in Section 5.6 of the book, 3rd line, first paragraph, it is written: Whereas in Chapter 2 we averaged rewards, in…

reinforcement-learning comparison rewards return

asked Jun 04 '20 at 03:35

SJa

393
3
17

7

votes

2 answers

Is there any good reference for double deep Q-learning?

I am new to reinforcement learning, but I already know deep Q-learning and Q-learning. Now, I want to learn about double deep Q-learning. Do you know any good references for double deep Q-learning? I have read some articles, but some of them don't…

reinforcement-learning q-learning reference-request deep-rl

asked May 28 '20 at 15:55

dato nefaridze

882
10
22
