Highest Voted Questions - Artificial Intelligence Stack Exchange

6

votes

2 answers

How is parallelism implemented in RL algorithms like PPO?

There are multiple ways to implement parallelism in reinforcement learning. One is to use parallel workers running in their own environments to collect data in parallel, instead of using replay memory buffers (this is how A3C works, for…

reinforcement-learning actor-critic-methods implementation proximal-policy-optimization

asked Apr 30 '19 at 01:15

alex vdk

61
2

6

votes

1 answer

How to detect LEGO bricks by using a deep learning approach?

In my thesis, I dealt with the question of how a computer can recognize LEGO bricks. For multiple-object detection, I chose a deep learning approach. I also looked at an existing training set of LEGO brick images and tried to optimize it. My…

deep-learning image-recognition tensorflow datasets object-recognition

asked Apr 03 '19 at 18:50

melawiki

61
1
3

6

votes

1 answer

Can this tic tac toe program be considered AI?

I coded a tic tac toe program, but I don't know if I can call it artificial intelligence. Here's what I did. There is a random player, which always makes random valid moves. And then there is the AI player, which will receive input before every…

neural-networks game-ai

asked Apr 02 '19 at 20:57

Pablo Carrasco Hernández

163
1
7

6

votes

1 answer

When should we use algorithms like Adam as opposed to SGD?

As far as I know, Stochastic Gradient Descent is an optimization algorithm which belongs to the category of algorithms where hyper-parameters have to be defined beforehand. They are useful in many cases, but there are some cases that the…

machine-learning optimization

asked Mar 25 '19 at 22:46

Utku

173
1
5

6

votes

1 answer

Why is Q2 a more or less independent estimate in Twin Delayed DDPG (TD3)?

Twin Delayed Deep Deterministic (TD3) policy gradient is inspired by both double Q-learning and double DQN. In double Q-learning, I understand that Q1 and Q2 are independent because they are trained on different samples. In double DQN, I understand…

reinforcement-learning q-learning dqn deep-rl ddpg

asked Mar 24 '19 at 05:26

Luke Guye

181
2

6

votes

1 answer

Reinforcement Learning with more actions than states

I have read a lot about RL recently. As far as I understand, most RL applications have many more states than there are actions to choose from. I am thinking about using RL for a problem where I have a lot of actions to choose from, but only very…

reinforcement-learning policy-gradients greedy-ai

asked Mar 18 '19 at 23:16

Jan

361
3
13

6

votes

4 answers

Which machine learning algorithm is used in self-driving cars?

Which deep neural network is used in Google's driverless cars to analyze the surroundings? Is this information open to the public?

deep-neural-networks algorithm autonomous-vehicles

asked Aug 02 '16 at 18:59

kenorb

10,525
6
45
95

6

votes

1 answer

Why is a constant plane of ones added into the input features of AlphaGo?

In the paper Mastering the game of Go with deep neural networks and tree search, the input features of the networks of AlphaGo contain a plane of constant ones and a plane of constant zeros, as follows. Feature #of planes Description Stone…

machine-learning alphago

asked Mar 05 '19 at 09:22

Yangcy

61
2

6

votes

1 answer

Why would someone use NEAT over other machine learning algorithms?

Why would someone use a neuroevolution algorithm, such as NEAT, over other machine learning algorithms? What situation would only apply to an algorithm such as NEAT, but no other machine learning algorithm?

machine-learning neat comparison neuroevolution

asked Mar 02 '19 at 10:17

Sebastian Dixon

107
1
6

6

votes

2 answers

What is "planning" in the context of reinforcement learning, and how is it different from RL and SL?

This is an excerpt taken from Sutton and Barto (pg. 3): Another key feature of reinforcement learning is that it explicitly considers the whole problem of a goal-directed agent interacting with an uncertain environment. This is in contrast with…

reinforcement-learning comparison terminology supervised-learning planning

asked Feb 16 '19 at 14:04

user9947

6

votes

1 answer

What is the relation between a policy which is the solution to an MDP and a policy like $\epsilon$-greedy?

In the context of reinforcement learning, a policy, $\pi$, is often defined as a function from the space of states, $\mathcal{S}$, to the space of actions, $\mathcal{A}$, that is, $\pi : \mathcal{S} \rightarrow \mathcal{A}$. This function is the…

reinforcement-learning definitions markov-decision-process policies exploration-strategies

asked Feb 10 '19 at 16:34

nbro

42,615
12
119
217
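
For the $\epsilon$-greedy question above, the following is a minimal sketch (not taken from the question or its answer) of how the two notions of policy can sit side by side. It assumes a tabular setting in which action values are already available; the names `epsilon_greedy`, `greedy_policy`, and `q_table` are hypothetical and chosen only for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Stochastic behaviour policy over action indices.

    With probability epsilon pick a uniformly random action (exploration),
    otherwise pick the action with the highest estimated value (exploitation).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def greedy_policy(q_table):
    """Deterministic policy pi: S -> A read off a table of action values.

    q_table maps each state to a list of action values (an assumption of
    this sketch, not something stated in the question).
    """
    return {
        state: max(range(len(values)), key=lambda a: values[a])
        for state, values in q_table.items()
    }


if __name__ == "__main__":
    q_table = {"s0": [0.1, 0.9, 0.3], "s1": [2.0, -1.0, 0.5]}
    print(greedy_policy(q_table))
    print(epsilon_greedy(q_table["s0"], epsilon=0.1))
```

With `epsilon=0` the stochastic rule reduces to the greedy mapping, which is one way to see how an exploration strategy relates to a deterministic policy of the form $\pi : \mathcal{S} \rightarrow \mathcal{A}$.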

6

votes

1 answer

Can TD($\lambda$) be used with deep reinforcement learning?

TD($\lambda$) is a way to interpolate between TD(0), which bootstraps over a single step, and TD(max), which bootstraps over the entire episode length, that is, Monte Carlo. Reading the link above, I see that an eligibility trace is kept for each state in order…

reinforcement-learning deep-rl temporal-difference-methods eligibility-traces td-lambda

asked Feb 02 '19 at 17:30

Gulzar

789
1
10
27
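
For the TD($\lambda$) question above, the per-state eligibility trace it mentions can be sketched in the tabular case as follows. This is an illustrative sketch under stated assumptions, not the method from the question or its answer: it uses accumulating traces, a dictionary of state values, and a hypothetical episode format of `(state, reward, next_state, done)` tuples. It does not address the deep RL case the question asks about.

```python
def td_lambda_episode(values, episode, alpha=0.1, gamma=0.99, lam=0.9):
    """Run tabular TD(lambda) with accumulating traces over one episode.

    values:  dict mapping state -> current value estimate (updated in place)
    episode: list of (state, reward, next_state, done) tuples (assumed format)
    """
    traces = {state: 0.0 for state in values}
    for state, reward, next_state, done in episode:
        # One-step TD error; terminal transitions do not bootstrap.
        target = reward + (0.0 if done else gamma * values[next_state])
        delta = target - values[state]

        # Mark the visited state as eligible for credit.
        traces[state] += 1.0

        # Every state is updated in proportion to its trace, then traces decay.
        for s in values:
            values[s] += alpha * delta * traces[s]
            traces[s] *= gamma * lam
    return values


if __name__ == "__main__":
    episode = [("A", 0.0, "B", False), ("B", 1.0, None, True)]
    print(td_lambda_episode({"A": 0.0, "B": 0.0}, episode, alpha=0.5, gamma=1.0, lam=1.0))
    print(td_lambda_episode({"A": 0.0, "B": 0.0}, episode, alpha=0.5, gamma=1.0, lam=0.0))
```

Setting `lam=0` leaves only the most recently visited state eligible (the TD(0) end of the interpolation), while `lam=1` with `gamma=1` spreads the final reward back over all visited states (the Monte Carlo end).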

6

votes

2 answers

Does a rollout algorithm like Monte Carlo search suggest model-based reinforcement learning?

From what I understand, the Monte Carlo Tree Search algorithm is a solution algorithm for model-free reinforcement learning (RL). Model-free RL means the agent doesn't know the transition and reward model. Thus, for it to know which next state it will observe…

reinforcement-learning models monte-carlo-tree-search

asked Jan 30 '19 at 08:20

user21872

61
1
4

6

votes

2 answers

What does learning mean?

Can someone explain what the process of learning is? What does it mean to learn something?

machine-learning terminology philosophy definitions

asked Jan 28 '19 at 20:27

Jay Critch

343
1
7

6

votes

1 answer

What is a "logit probability"?

DeepMind's paper "Mastering the game of Go without human knowledge" states in its "Methods" section on its "Neural network architecture" that the output layer of AlphaGo Zero's policy head is "A fully connected linear layer that outputs a vector of…

neural-networks terminology activation-functions alphago-zero

asked Jan 23 '19 at 15:33

sadakatsu

163
4
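
For the "logit probability" question above, one common reading is that logits are unnormalized scores that a softmax turns into a probability distribution. The sketch below illustrates only that general relationship; it is an assumption about the term, not a statement of what the paper's network does, and the function name `softmax` and the example values are hypothetical.

```python
import math


def softmax(logits):
    """Convert a list of unnormalized scores (logits) into probabilities.

    Subtracting the maximum first keeps math.exp from overflowing and
    does not change the result.
    """
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical output of a linear layer for three possible moves.
    logits = [1.0, 2.0, 3.0]
    probs = softmax(logits)
    print(probs, sum(probs))
```

Because softmax is unchanged when a constant is added to every logit, the raw values of a linear output layer carry meaning only relative to one another.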
