Most Popular

1500 questions
6
votes
1 answer

What is the probability of selecting the greedy action in a 0.5-greedy selection method for the 2-armed bandit problem?

I'm new to reinforcement learning and I'm going through Sutton and Barto. Exercise 2.1 states the following: In $\varepsilon$-greedy action selection, for the case of two actions and $\varepsilon=0.5$, what is the probability that the greedy action…
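As a rough illustration of the quantity this exercise asks about, here is a minimal sketch. It assumes the common convention that, with probability $\varepsilon$, the agent explores by choosing uniformly among all actions (the greedy one included), and that the greedy action is unique. The function name `greedy_selection_probability` is hypothetical and chosen only for this example.

```python
def greedy_selection_probability(epsilon: float, num_actions: int) -> float:
    """Probability that epsilon-greedy selects the greedy action.

    Assumptions (not stated in the excerpt): exploration picks uniformly
    among ALL actions, including the greedy one, and the greedy action
    is unique.
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    if num_actions < 1:
        raise ValueError("num_actions must be at least 1")
    # Exploit with probability (1 - epsilon); otherwise explore and
    # still land on the greedy action with probability 1 / num_actions.
    return (1.0 - epsilon) + epsilon / num_actions


if __name__ == "__main__":
    # Two actions, epsilon = 0.5, as in the exercise.
    print(greedy_selection_probability(0.5, 2))  # 0.75
```

Under these assumptions the two-action, $\varepsilon=0.5$ case gives $0.5 + 0.5/2 = 0.75$. If exploration instead excluded the greedy action, the result would differ.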
6
votes
1 answer

What is the effect of parallel environments in reinforcement learning?

Do parallel environments improve the agent's ability to learn or does it not really make a difference? Specifically, I am using PPO, but I think this applies across the board to other algorithms too.
6
votes
3 answers

What exactly are partially observable environments?

I have trouble understanding the meaning of partially observable environments. Here's my doubt. According to what I understand, the state of the environment is what precisely determines the next state and reward for any particular action taken. So,…
6
votes
1 answer

Reward interpolation between MDPs. Will an optimal policy on both ends stay optimal inside the interval?

Say I've got two Markov Decision Processes (MDPs): $$\mathcal{M}_0 = (\mathcal{S}, \mathcal{A}, P, R_0),\quad\text{and}\quad\mathcal{M}_1 = (\mathcal{S}, \mathcal{A}, P, R_1)$$ Both have the same set of states and actions, and the transition…
6
votes
4 answers

What are the typical sizes of practical/commercial artificial neural networks?

I'm interested in artificial neural networks (ANN) and I wonder how big ANNs in practical use are, for example, Tesla Autopilot, Google Translate, and others. The only thing I found about Tesla is this one: "A full build of Autopilot neural…
6
votes
1 answer

If $\gamma \in (0,1)$, what is the on-policy state distribution for episodic tasks?

In Reinforcement Learning: An Introduction, section 9.2 (page 199), Sutton and Barto describe the on-policy distribution in episodic tasks, with $\gamma =1$, as being \begin{equation} \mu(s) = \frac{\eta(s)}{\sum_{k \in S}…
6
votes
0 answers

Are generative models actually used in practice for industrial drug design?

I just finished reading the paper MoFlow: An Invertible Flow Model for Generating Molecular Graphs. The paper, which is about generating molecular graphs with certain chemical properties, improved the SOTA at the time of writing by a bit and used a…
6
votes
1 answer

How does AlphaZero's move encoding work?

I am a beginner in AI. I'm trying to train a multi-agent RL algorithm to play chess. One issue that I ran into was representing the action space (legal moves, or honestly just moves in general) numerically. I looked up how AlphaZero represented it,…
6
votes
1 answer

In MCTS, what to do if I do not want to simulate till the end of the game?

I'm trying to implement MCTS with UCT for a board game and I'm kinda stuck. The state space is quite large (3e15), and I'd like to compute a good move in less than 2 seconds. I already have MCTS implemented in Java from here, and I noticed that it…
6
votes
2 answers

Has any schema-agnostic database engine been implemented?

6
votes
2 answers

Are there RL algorithms that also try to predict the next state?

So far I've developed simple RL algorithms, like Deep Q-Learning and Double Deep Q-Learning. Also, I read a bit about A3C and policy gradient but superficially. If I remember correctly, all these algorithms focus on the value of the action and try…
6
votes
4 answers

How widely accepted is the definition of intelligence by Marcus Hutter & Shane Legg?

I came across several papers by M. Hutter & S. Legg, especially this one: Universal Intelligence: A Definition of Machine Intelligence (Shane Legg, Marcus Hutter). Given that it was published back in 2007, how much recognition or agreement has it…
6
votes
1 answer

How does mating take place in NEAT?

In the Evolving Neural Networks through Augmenting Topologies (NEAT) paper, it says (p. 110): "The entire population is then replaced by the offspring of the remaining organisms in each species." But how does this take place? Are they paired and then…
6
votes
1 answer

Is it possible to use deep learning to give approximate solutions to NP-hard graph theory problems?

Is it possible to use deep learning to give approximate solutions to NP-hard graph theory problems? Take, for example, the travelling salesman problem (or the dominating set problem). Let's say I have a bunch of smaller examples, where I…
6
votes
2 answers

Why is tf.abs non-differentiable in Tensorflow?

I understand why tf.abs is non-differentiable in principle (its derivative is discontinuous at 0), but the same applies to tf.nn.relu, yet for that function the gradient is simply set to 0 at 0. Why is the same logic not applied to tf.abs? Whenever I tried to use…