Most Popular
1500 questions
6
votes
1 answer
What is the probability of selecting the greedy action in a 0.5-greedy selection method for the 2-armed bandit problem?
I'm new to reinforcement learning and I'm going through Sutton and Barto. Exercise 2.1 states the following:
In $\varepsilon$-greedy action selection, for the case of two actions and $\varepsilon=0.5$, what is the probability that the greedy action…
Daviiid
- 585
- 5
- 17
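A minimal sketch of the arithmetic behind this exercise, assuming there is a single greedy action and that the exploration step samples uniformly from all actions, the greedy one included (the usual reading of $\varepsilon$-greedy). The function name `greedy_selection_probability` is hypothetical and only for illustration.

```python
def greedy_selection_probability(epsilon: float, num_actions: int) -> float:
    """Probability that epsilon-greedy picks the greedy action.

    Assumption: exactly one greedy action, and exploration draws
    uniformly from ALL actions (so it can also land on the greedy one).
    """
    exploit = 1.0 - epsilon                   # greedy action chosen directly
    explore_hits_greedy = epsilon / num_actions  # exploration happens to pick it
    return exploit + explore_hits_greedy


# The setting from the exercise: two actions, epsilon = 0.5.
p = greedy_selection_probability(epsilon=0.5, num_actions=2)
print(p)
```

Under these assumptions the result is $(1-\varepsilon) + \varepsilon/2 = 0.75$ for the two-armed case. If exploration were instead restricted to non-greedy actions, the answer would simply be $1-\varepsilon$.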
6
votes
1 answer
What is the effect of parallel environments in reinforcement learning?
Do parallel environments improve the agent's ability to learn, or do they not really make a difference? Specifically, I am using PPO, but I think this applies across the board to other algorithms too.
Dylan Kerler
- 313
- 3
- 9
6
votes
3 answers
What exactly are partially observable environments?
I have trouble understanding the meaning of partially observable environments. Here's my doubt.
According to what I understand, the state of the environment is what precisely determines the next state and reward for any particular action taken. So,…
Chandrasekhara
- 63
- 1
- 5
6
votes
1 answer
Reward interpolation between MDPs. Will an optimal policy on both ends stay optimal inside the interval?
Say I've got two Markov Decision Processes (MDPs):
$$\mathcal{M}_0 = (\mathcal{S}, \mathcal{A}, P, R_0),\quad\text{and}\quad\mathcal{M}_1 = (\mathcal{S}, \mathcal{A}, P, R_1)$$
Both have the same set of states and actions, and the transition…
Kostya
- 2,667
- 12
- 24
6
votes
4 answers
What are the typical sizes of practical/commercial artificial neural networks?
I'm interested in artificial neural networks (ANN) and I wonder how big ANNs in practical use are, for example, Tesla Autopilot, Google Translate, and others.
The only thing I found about Tesla is this one:
"A full build of Autopilot neural…
Mitarano
- 63
- 1
- 6
6
votes
1 answer
If $\gamma \in (0,1)$, what is the on-policy state distribution for episodic tasks?
In Reinforcement Learning: An Introduction, section 9.2 (page 199), Sutton and Barto describe the on-policy distribution in episodic tasks, with $\gamma =1$, as being
\begin{equation}
\mu(s) = \frac{\eta(s)}{\sum_{k \in S}…
Felipe Costa
- 103
- 5
6
votes
0 answers
Are generative models actually used in practice for industrial drug design?
I just finished reading this paper MoFlow: An Invertible Flow Model for Generating Molecular Graphs.
The paper, which is about generating molecular graphs with certain chemical properties, improved the SOTA at the time of writing by a bit and used a…
Adriaan
- 61
- 2
6
votes
1 answer
How does AlphaZero's move encoding work?
I am a beginner in AI. I'm trying to train a multi-agent RL algorithm to play chess. One issue that I ran into was representing the action space (legal moves, or honestly just moves in general) numerically. I looked up how AlphaZero represented it,…
Akshay Ghosh
- 105
- 5
6
votes
1 answer
In MCTS, what to do if I do not want to simulate till the end of the game?
I'm trying to implement MCTS with UCT for a board game and I'm kinda stuck. The state space is quite large (3e15), and I'd like to compute a good move in less than 2 seconds. I already have MCTS implemented in Java from here, and I noticed that it…
Sami
- 163
- 4
6
votes
2 answers
Has any schema-agnostic database engine been implemented?
Has any schema-agnostic database engine been implemented?
Leo
- 111
- 6
6
votes
2 answers
Are there RL algorithms that also try to predict the next state?
So far I've developed simple RL algorithms, like Deep Q-Learning and Double Deep Q-Learning. Also, I read a bit about A3C and policy gradient but superficially.
If I remember correctly, all these algorithms focus on the value of the action and try…
Ram Rachum
- 260
- 1
- 11
6
votes
4 answers
How widely accepted is the definition of intelligence by Marcus Hutter & Shane Legg?
I came across several papers by M. Hutter & S. Legg.
Especially this one:
Universal Intelligence: A Definition of Machine Intelligence, Shane Legg, Marcus Hutter
Given that it was published back in 2007, how much recognition or agreement has it…
Aether
- 285
- 2
- 7
6
votes
1 answer
How does mating take place in NEAT?
In the Evolving Neural Networks through Augmenting Topologies (NEAT) paper it says (p. 110):
The entire population is then replaced by the offspring of the remaining organisms in each species.
But how does it take place? Are they paired and then…
Miemels
- 389
- 2
- 12
6
votes
1 answer
Is it possible to use deep learning to give approximate solutions to NP-hard graph theory problems?
Is it possible to use deep learning to give approximate solutions to NP-hard graph theory problems?
If we take, for example, the travelling salesman problem (or the dominating set problem). Let's say I have a bunch of smaller examples, where I…
Jake B.
- 181
- 1
6
votes
2 answers
Why is tf.abs non-differentiable in TensorFlow?
I understand why tf.abs is non-differentiable in principle (its derivative is discontinuous at 0), but the same applies to tf.nn.relu, yet in that case the gradient is simply set to 0 at 0. Why is the same logic not applied to tf.abs? Whenever I tried to use…
zedsdead
- 63
- 4