Highest Voted Questions - Artificial Intelligence Stack Exchange

6 votes · 1 answer

What is the probability of selecting the greedy action in a 0.5-greedy selection method for the 2-armed bandit problem?

I'm new to reinforcement learning and I'm going through Sutton and Barto. Exercise 2.1 states the following: In $\varepsilon$-greedy action selection, for the case of two actions and $\varepsilon=0.5$, what is the probability that the greedy action…

reinforcement-learning multi-armed-bandits epsilon-greedy-policy

asked May 23 '21 at 20:48 by Daviiid (reputation 585; badge counts 5 and 17)
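
For Exercise 2.1, the greedy action can be chosen in two ways: by exploiting (probability $1-\varepsilon$), or by exploring and happening to pick it (probability $\varepsilon/k$, assuming, as Sutton and Barto do, that exploration picks uniformly among all $k$ actions, the greedy one included). A minimal Python sketch under that assumption computes the closed form and checks it by simulation; `greedy_prob` and `simulate` are hypothetical names.

```python
import random


def greedy_prob(epsilon: float, k: int) -> float:
    """Closed form: exploit with prob 1 - epsilon, or explore and land on the greedy arm."""
    return (1 - epsilon) + epsilon / k


def simulate(epsilon: float, k: int, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability; arm 0 plays the greedy arm."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if rng.random() < epsilon:
            action = rng.randrange(k)  # explore: uniform over ALL arms, greedy included
        else:
            action = 0  # exploit: take the greedy arm
        hits += action == 0
    return hits / trials


print(greedy_prob(0.5, 2))  # 0.75
print(simulate(0.5, 2))     # close to 0.75
```

If exploration instead excluded the greedy action, the answer would be $1-\varepsilon = 0.5$, so it is worth checking which convention a given text uses.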

6 votes · 1 answer

What is the effect of parallel environments in reinforcement learning?

Do parallel environments improve the agent's ability to learn or does it not really make a difference? Specifically, I am using PPO, but I think this applies across the board to other algorithms too.

reinforcement-learning deep-rl proximal-policy-optimization

asked May 23 '21 at 12:18 by Dylan Kerler (reputation 313; badge counts 3 and 9)

6 votes · 3 answers

What exactly are partially observable environments?

I have trouble understanding the meaning of partially observable environments. Here's my doubt. According to what I understand, the state of the environment is what precisely determines the next state and reward for any particular action taken. So,…

reinforcement-learning definitions environment state-spaces pomdp

asked May 22 '21 at 07:39 by Chandrasekhara (reputation 63; badge counts 1 and 5)
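
One way to make "partially observable" concrete: the agent never sees the state $s$ directly, only an observation $o$ drawn from $O(o \mid s')$, so it has to maintain a belief $b(s)$ and update it with Bayes' rule, $b'(s') \propto O(o \mid s') \sum_s T(s' \mid s, a)\, b(s)$. A minimal sketch for a hypothetical two-state problem (the door states, the `stay` action and the 80%-accurate sensor are invented for illustration):

```python
def belief_update(belief, action, obs, T, O):
    """Bayes filter for a discrete POMDP: predict with T, correct with O, normalise.

    belief:    dict state -> probability
    T[(s, a)]: dict next_state -> probability
    O[s_next]: dict observation -> probability (its keys are assumed to list every state)
    """
    predicted = {s2: 0.0 for s2 in O}
    for s, p in belief.items():
        for s2, pt in T[(s, action)].items():
            predicted[s2] += p * pt  # prediction step
    unnorm = {s2: O[s2].get(obs, 0.0) * p for s2, p in predicted.items()}  # correction step
    z = sum(unnorm.values())
    if z == 0:
        raise ValueError("observation has zero probability under the current belief")
    return {s2: p / z for s2, p in unnorm.items()}


# Hypothetical example: a door is open or closed, "stay" changes nothing,
# and a noisy sensor reports the true state 80% of the time.
T = {("open", "stay"): {"open": 1.0}, ("closed", "stay"): {"closed": 1.0}}
O = {"open": {"see_open": 0.8, "see_closed": 0.2},
     "closed": {"see_open": 0.2, "see_closed": 0.8}}
b = belief_update({"open": 0.5, "closed": 0.5}, "stay", "see_open", T, O)
print(b)  # {'open': 0.8, 'closed': 0.2}
```

The same observation sequence is consistent with different true states, which is exactly why the observation alone does not determine the next state and reward.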

6 votes · 1 answer

Reward interpolation between MDPs. Will an optimal policy on both ends stay optimal inside the interval?

Say I've got two Markov Decision Processes (MDPs): $$\mathcal{M}_0 = (\mathcal{S}, \mathcal{A}, P, R_0),\quad\text{and}\quad\mathcal{M}_1 = (\mathcal{S}, \mathcal{A}, P, R_1)$$ Both have the same set of states and actions, and the transition…

markov-decision-process rewards reward-shaping interpolation

asked May 21 '21 at 22:32 by Kostya (reputation 2,667; badge counts 12 and 24)
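
A short derivation settles the question under some stated assumptions. The excerpt says the two MDPs share $\mathcal{S}$, $\mathcal{A}$ and $P$; assume in addition a common discount factor $\gamma$ and the linear interpolation $R_\alpha = (1-\alpha)R_0 + \alpha R_1$ for $\alpha \in [0,1]$ (both are assumptions, since the full question is truncated here). Because the trajectory distribution depends only on $\pi$ and $P$, the value of any fixed policy is linear in the reward:

$$V^{\pi}_{\alpha}(s) = \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty}\gamma^{t}\big((1-\alpha)R_0(S_t,A_t) + \alpha R_1(S_t,A_t)\big)\,\Big|\,S_0=s\Big] = (1-\alpha)V^{\pi}_{0}(s) + \alpha V^{\pi}_{1}(s).$$

If $\pi^*$ is optimal in both $\mathcal{M}_0$ and $\mathcal{M}_1$, then for every policy $\pi$ and every state $s$, using $1-\alpha \ge 0$ and $\alpha \ge 0$:

$$V^{\pi}_{\alpha}(s) = (1-\alpha)V^{\pi}_{0}(s) + \alpha V^{\pi}_{1}(s) \le (1-\alpha)V^{\pi^*}_{0}(s) + \alpha V^{\pi^*}_{1}(s) = V^{\pi^*}_{\alpha}(s),$$

so under these assumptions $\pi^*$ remains optimal for every $\alpha \in [0,1]$.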

6 votes · 4 answers

What are the typical sizes of practical/commercial artificial neural networks?

I'm interested in artificial neural networks (ANN) and I wonder how big ANNs in practical use are, for example, Tesla Autopilot, Google Translate, and others. The only thing I found about Tesla is this one: "A full build of Autopilot neural…

neural-networks machine-learning reference-request autonomous-vehicles

asked May 16 '21 at 15:01 by Mitarano (reputation 63; badge counts 1 and 6)

6 votes · 1 answer

If $\gamma \in (0,1)$, what is the on-policy state distribution for episodic tasks?

In Reinforcement Learning: An Introduction, section 9.2 (page 199), Sutton and Barto describe the on-policy distribution in episodic tasks, with $\gamma =1$, as being \begin{equation} \mu(s) = \frac{\eta(s)}{\sum_{k \in S}…

reinforcement-learning policy-gradients sutton-barto on-policy-methods discount-factor

asked May 13 '21 at 22:22 by Felipe Costa (reputation 103; badge count 5)
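
In the book, $\eta(s) = h(s) + \sum_{\bar{s}} \eta(\bar{s}) \sum_a \pi(a \mid \bar{s})\, p(s \mid \bar{s}, a)$, where $h$ is the start-state distribution, and $\mu(s) = \eta(s) / \sum_{s'} \eta(s')$. The same section suggests treating discounting as a form of termination by putting a factor $\gamma$ in front of the second term. A minimal sketch of that discounted version, solved by fixed-point iteration on a hypothetical two-state episodic chain (`P_pi` is the policy-averaged transition matrix among non-terminal states, an assumed input format):

```python
def on_policy_distribution(h, P_pi, gamma=1.0, iters=10_000, tol=1e-12):
    """Solve eta(s) = h(s) + gamma * sum_sbar eta(sbar) * P_pi[sbar][s], then normalise.

    h:     start-state probabilities over non-terminal states
    P_pi:  P_pi[sbar][s] = sum_a pi(a|sbar) * p(s|sbar, a) among NON-terminal states;
           rows sum to less than 1 wherever the episode can terminate
    gamma: discount, treated as an extra per-step probability of termination
    """
    n = len(h)
    eta = list(h)
    for _ in range(iters):
        new = [h[s] + gamma * sum(eta[sb] * P_pi[sb][s] for sb in range(n)) for s in range(n)]
        converged = max(abs(a - b) for a, b in zip(new, eta)) < tol
        eta = new
        if converged:
            break
    total = sum(eta)
    return [e / total for e in eta]


# Hypothetical chain: always start in state 0, move to state 1, then terminate.
h = [1.0, 0.0]
P_pi = [[0.0, 1.0],
        [0.0, 0.0]]
print(on_policy_distribution(h, P_pi, gamma=1.0))  # [0.5, 0.5]
print(on_policy_distribution(h, P_pi, gamma=0.5))  # about [0.667, 0.333]
```

With $\gamma < 1$ the distribution shifts weight toward states visited early in the episode, which is the intended effect of reading discounting as termination.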

6 votes · 0 answers

Are generative models actually used in practice for industrial drug design?

I just finished reading the paper MoFlow: An Invertible Flow Model for Generating Molecular Graphs. The paper, which is about generating molecular graphs with certain chemical properties, improved the SOTA at the time of writing by a bit and used a…

generative-adversarial-networks generative-model geometric-deep-learning mo-flow drug-design

asked May 09 '21 at 16:39 by Adriaan (reputation 61; badge count 2)

6 votes · 1 answer

How does AlphaZero's move encoding work?

I am a beginner in AI. I'm trying to train a multi-agent RL algorithm to play chess. One issue that I ran into was representing the action space (legal moves, or honestly just moves in general) numerically. I looked up how AlphaZero represented it,…

reinforcement-learning alphazero chess multi-agent-systems action-spaces

asked Apr 14 '21 at 17:57 by Akshay Ghosh (reputation 105; badge count 5)
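
For reference, the AlphaZero paper describes the chess policy as an $8 \times 8 \times 73$ stack: the $8 \times 8$ part picks the from-square, and the 73 planes pick the move type. Those are 56 queen-like moves (8 directions × distances 1 to 7), 8 knight moves and 9 underpromotions (knight, bishop or rook × 3 pawn directions), for $4672$ actions in total; promotions to a queen use the queen-like planes. The paper fixes the shape, not an exact plane order, so the ordering and flattening below are assumptions, and `encode_move` is a hypothetical helper with squares given as `(rank, file)` from the side to move:

```python
# ASSUMED plane order (only the 8x8x73 shape comes from the paper):
#   planes 0-55  : queen-like moves, 8 directions x distances 1..7
#   planes 56-63 : the 8 knight moves
#   planes 64-72 : underpromotions to knight/bishop/rook x 3 pawn directions
QUEEN_DIRS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]  # (d_rank, d_file)
KNIGHT_DELTAS = [(2, 1), (1, 2), (-1, 2), (-2, 1), (-2, -1), (-1, -2), (1, -2), (2, -1)]
UNDERPROMO_PIECES = ["n", "b", "r"]
UNDERPROMO_FILE_DELTAS = [-1, 0, 1]  # capture to lower file, straight push, capture to higher file
NUM_PLANES = 73


def encode_move(from_sq, to_sq, underpromotion=None):
    """Map a move to a flat index in [0, 8 * 8 * 73) = [0, 4672).

    underpromotion is None or one of 'n', 'b', 'r'; queen promotions use queen planes.
    """
    from_rank, from_file = from_sq
    to_rank, to_file = to_sq
    d_rank, d_file = to_rank - from_rank, to_file - from_file
    if underpromotion is not None:
        plane = 64 + 3 * UNDERPROMO_PIECES.index(underpromotion) + UNDERPROMO_FILE_DELTAS.index(d_file)
    elif (d_rank, d_file) in KNIGHT_DELTAS:
        plane = 56 + KNIGHT_DELTAS.index((d_rank, d_file))
    else:
        dist = max(abs(d_rank), abs(d_file))
        if dist == 0 or (d_rank != 0 and d_file != 0 and abs(d_rank) != abs(d_file)):
            raise ValueError("not a queen-like, knight or underpromotion move")
        direction = (d_rank // dist, d_file // dist)  # unit step along the line
        plane = 7 * QUEEN_DIRS.index(direction) + (dist - 1)
    # Flatten (from-square, plane) into a single policy index.
    return (from_rank * 8 + from_file) * NUM_PLANES + plane


print(encode_move((1, 4), (3, 4)))  # e2-e4 under this ordering: 877
```

Illegal moves still get an index; in practice the network's output is masked to the legal moves before sampling.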

6 votes · 1 answer

In MCTS, what to do if I do not want to simulate till the end of the game?

I'm trying to implement MCTS with UCT for a board game and I'm kinda stuck. The state space is quite large (3e15), and I'd like to compute a good move in less than 2 seconds. I already have MCTS implemented in Java from here, and I noticed that it…

monte-carlo-tree-search monte-carlo-methods upper-confidence-bound

asked Apr 05 '21 at 01:28 by Sami (reputation 163; badge count 4)
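
A common workaround is to cut the random playout off after a fixed number of plies and score the position with a heuristic evaluation instead of the final game result. AlphaZero-style methods go further and replace the playout entirely with a learned value estimate. A minimal sketch of such a depth-limited rollout, assuming a hypothetical game interface (`legal_moves`, `apply_move`, `is_terminal`, `result`, `evaluate`), values in $[-1, 1]$ from the root player's point of view, and non-terminal states always having at least one move:

```python
import random
from typing import Callable, List, Optional, TypeVar

S = TypeVar("S")  # game state
M = TypeVar("M")  # move


def truncated_rollout(state: S,
                      legal_moves: Callable[[S], List[M]],
                      apply_move: Callable[[S, M], S],
                      is_terminal: Callable[[S], bool],
                      result: Callable[[S], float],
                      evaluate: Callable[[S], float],
                      max_depth: int = 20,
                      rng: Optional[random.Random] = None) -> float:
    """Play uniformly random moves for at most max_depth plies.

    Returns the true result if the game ends, otherwise a heuristic estimate
    of the position where the playout was cut off.
    """
    if rng is None:
        rng = random.Random()
    for _ in range(max_depth):
        if is_terminal(state):
            return result(state)
        state = apply_move(state, rng.choice(legal_moves(state)))
    return result(state) if is_terminal(state) else evaluate(state)


# Toy game for illustration only: a counter, each move adds 1 or 2, the game ends at >= 10.
def toy_moves(s: int) -> List[int]:
    return [1, 2]


def toy_apply(s: int, m: int) -> int:
    return s + m


def toy_terminal(s: int) -> bool:
    return s >= 10


def toy_result(s: int) -> float:
    return 1.0 if s == 10 else -1.0  # "win" only by landing exactly on 10


def toy_eval(s: int) -> float:
    return s / 10  # crude heuristic: closer to 10 looks better


value = truncated_rollout(0, toy_moves, toy_apply, toy_terminal, toy_result, toy_eval,
                          max_depth=3, rng=random.Random(42))
```

The backed-up value is used in the UCT tree exactly as a full playout result would be; `max_depth` trades rollout cost against the bias of the heuristic.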

6 votes · 2 answers

Has any schema-agnostic database engine been implemented?

natural-language-processing knowledge-representation

asked Jan 23 '17 at 09:22 by Leo (reputation 111; badge count 6)

6 votes · 2 answers

Are there RL algorithms that also try to predict the next state?

So far I've developed simple RL algorithms, like Deep Q-Learning and Double Deep Q-Learning. Also, I read a bit about A3C and policy gradient but superficially. If I remember correctly, all these algorithms focus on the value of the action and try…

reinforcement-learning deep-rl model-based-methods algorithm-request

asked Apr 01 '21 at 20:40 by Ram Rachum (reputation 260; badge counts 1 and 11)
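
Model-based methods do exactly this. The textbook example is Dyna-Q (Sutton and Barto, chapter 8), which learns a one-step model of reward and next state from real transitions and then replays simulated transitions from that model as extra planning updates. A minimal tabular sketch, assuming a deterministic environment as in the book's tabular Dyna-Q; the class, its method names and the 5-state chain are hypothetical:

```python
import random
from collections import defaultdict


class DynaQ:
    """Tabular Dyna-Q: Q-learning plus a learned one-step model used for planning."""

    def __init__(self, actions, alpha=0.5, gamma=0.95, epsilon=0.1, planning_steps=10, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.planning_steps = planning_steps
        self.q = defaultdict(float)  # (state, action) -> value estimate
        self.model = {}              # (state, action) -> (reward, next_state, done)
        self.rng = random.Random(seed)

    def act(self, s):
        """Epsilon-greedy with random tie-breaking."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        best = max(self.q[(s, a)] for a in self.actions)
        return self.rng.choice([a for a in self.actions if self.q[(s, a)] == best])

    def _update(self, s, a, r, s2, done):
        target = r if done else r + self.gamma * max(self.q[(s2, b)] for b in self.actions)
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])

    def learn(self, s, a, r, s2, done):
        self._update(s, a, r, s2, done)     # direct RL from the real transition
        self.model[(s, a)] = (r, s2, done)  # model learning: remember what happened
        keys = list(self.model)
        for _ in range(self.planning_steps):  # planning with simulated transitions
            ps, pa = self.rng.choice(keys)
            self._update(ps, pa, *self.model[(ps, pa)])

    def predict(self, s, a):
        """The model's prediction (reward, next_state, done), or None if (s, a) is unseen."""
        return self.model.get((s, a))


def step(s, a):
    """Hypothetical chain 0..4: actions -1/+1, reward 1 on reaching state 4."""
    s2 = min(max(s + a, 0), 4)
    return (1.0 if s2 == 4 else 0.0), s2, s2 == 4


agent = DynaQ(actions=[-1, 1])
for episode in range(30):
    s, done = 0, False
    while not done:
        a = agent.act(s)
        r, s2, done = step(s, a)
        agent.learn(s, a, r, s2, done)
        s = s2
print(agent.predict(0, 1))  # (0.0, 1, False)
```

Methods that learn a neural next-state model (world models, MuZero-style latent models) follow the same idea with function approximation instead of a lookup table.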

6 votes · 4 answers

How widely accepted is the definition of intelligence by Marcus Hutter & Shane Legg?

I came across several papers by M. Hutter & S. Legg. Especially this one: Universal Intelligence: A Definition of Machine Intelligence, Shane Legg, Marcus Hutter. Given that it was published back in 2007, how much recognition or agreement has it…

agi definitions intelligent-agent intelligence

asked Mar 23 '21 at 06:18 by Aether (reputation 285; badge counts 2 and 7)
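
For reference, the paper's central definition, universal intelligence, averages an agent's expected performance over all computable environments and weights each environment by its simplicity:

$$\Upsilon(\pi) := \sum_{\mu \in E} 2^{-K(\mu)}\, V^{\pi}_{\mu}$$

Here $E$ is the set of computable environments with bounded total reward, $K(\mu)$ is the Kolmogorov complexity of environment $\mu$, and $V^{\pi}_{\mu}$ is the expected total reward agent $\pi$ obtains in $\mu$. Because $K$ is incomputable, $\Upsilon$ cannot be evaluated exactly, which is relevant when judging how usable the definition is in practice.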

6 votes · 1 answer

How does mating take place in NEAT?

In the Evolving Neural Networks through Augmenting Topologies (NEAT) paper it says (p. 110): The entire population is then replaced by the offspring of the remaining organisms in each species. But how does it take place? Are they paired and then…

neural-networks genetic-algorithms evolutionary-algorithms neat neuroevolution

asked Jan 18 '17 at 17:38 by Miemels (reputation 389; badge counts 2 and 12)
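
The NEAT paper describes the gene-level part of mating: genes of the two parents are aligned by innovation number, matching genes are inherited at random from either parent, and disjoint and excess genes are inherited from the fitter parent. A minimal sketch of that crossover step only, with genomes simplified to dicts from innovation number to connection weight (a hypothetical representation that ignores node genes, the enabled/disabled flag, and how parents are picked within a species; ties in fitness are treated here as the first parent being fitter, a simplification):

```python
import random
from typing import Dict, Optional

Genome = Dict[int, float]  # innovation number -> connection weight (simplified)


def crossover(parent_a: Genome, parent_b: Genome,
              fitness_a: float, fitness_b: float,
              rng: Optional[random.Random] = None) -> Genome:
    """NEAT-style crossover: align genes by innovation number."""
    if rng is None:
        rng = random.Random()
    if fitness_b > fitness_a:  # make parent_a the fitter parent
        parent_a, parent_b = parent_b, parent_a
    child: Genome = {}
    for innov, weight in parent_a.items():
        if innov in parent_b:
            # Matching gene: inherit from either parent with equal probability.
            child[innov] = weight if rng.random() < 0.5 else parent_b[innov]
        else:
            # Disjoint or excess gene of the fitter parent: always inherited.
            child[innov] = weight
    # Disjoint/excess genes that only the less fit parent has are dropped.
    return child


a = {1: 0.1, 2: 0.2, 4: 0.4}
b = {1: 1.0, 3: 3.0, 5: 5.0}
child = crossover(a, b, fitness_a=2.0, fitness_b=1.0, rng=random.Random(0))
print(sorted(child))  # [1, 2, 4]
```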

6 votes · 1 answer

Is it possible to use deep learning to give approximate solutions to NP-hard graph theory problems?

Is it possible to use deep learning to give approximate solutions to NP-hard graph theory problems? For example, take the travelling salesman problem (or the dominating set problem). Let's say I have a bunch of smaller examples, where I…

neural-networks deep-learning tensorflow keras graph-theory

asked Mar 10 '21 at 20:58 by Jake B. (reputation 181; badge count 1)

6 votes · 2 answers

Why is tf.abs non-differentiable in Tensorflow?

I understand why tf.abs is non-differentiable in principle (its derivative is undefined at 0), but the same applies to tf.nn.relu, and for that function the gradient is simply set to 0 at 0. Why isn't the same logic applied to tf.abs? Whenever I tried to use…

tensorflow backpropagation relu gradient

asked Feb 17 '21 at 17:29 by zedsdead (reputation 63; badge count 4)
