Most Popular

1500 questions
16
votes
3 answers

How to implement a variable action space in Proximal Policy Optimization?

I'm coding a Proximal Policy Optimization (PPO) agent with the Tensorforce library (which is built on top of TensorFlow). The first environment was very simple. Now, I'm diving into a more complex environment, where not all actions are available…
16
votes
5 answers

What is the most general definition of "intelligence"?

When we talk about artificial intelligence, human intelligence, or any other form of intelligence, what do we mean by the term intelligence in a general sense? What would you call intelligent and what not? In other words, how do we define the term…
user79161
16
votes
3 answers

What is a "trajectory" in reinforcement learning?

I'm now learning about reinforcement learning, but I just found the word "trajectory" in this answer. However, I'm not sure what it means. I read a few books on reinforcement learning, but none of them mentioned it. Usually these introductory…
Blaszard
16
votes
1 answer

What is the fringe in the context of search algorithms?

What is the fringe in the context of search algorithms?
tahasozgen
16
votes
5 answers

Why are the initial weights of neural networks randomly initialised?

This might sound silly to someone who has plenty of experience with neural networks but it bothers me... Random initial weights might give you better results that would be somewhat closer to what a trained neural network should look like, but it…
16
votes
3 answers

What roles do knowledge bases play now, and what roles will they play in the future?

Nowadays, artificial intelligence seems almost equal to machine learning, especially deep learning. Some have said that deep learning will replace human experts, traditionally very important for feature engineering, in this field. It is said that…
16
votes
1 answer

What language is the GPT-3 engine written in?

I know that the API is Python-based, but what is the GPT-3 engine mostly written in? C? C++? I'm having some trouble finding this info.
16
votes
4 answers

Why do activation functions need to be differentiable in the context of neural networks?

Why should an activation function of a neural network be differentiable? Is it strictly necessary or is it just advantageous?
user3642
16
votes
2 answers

Why is reinforcement learning not the answer to AGI?

I previously asked a question, "How can an AI freely make decisions?". I got a great answer about how current algorithms lack agency. The first thing I thought of was reinforcement learning, since the entire concept is oriented around an agent…
16
votes
1 answer

Will parameter sweeping on one split of data followed by cross validation discover the right hyperparameters?

Let's call our dataset splits train/test/evaluate. We're in a situation where we require months of data. So we prefer to use the evaluation dataset as infrequently as possible to avoid polluting our results. Instead, we do 10-fold cross validation…
16
votes
2 answers

How can I automate the choice of the architecture of a neural network for an arbitrary problem?

Assume that I want to solve a problem with a neural network that I either can't fit to existing architectures (perceptron, Kohonen, etc.), or I'm simply not aware of the existence of those, or I'm unable to understand their mechanics and I rely on my…
16
votes
1 answer

How to stay an up-to-date researcher in the ML/RL community?

As a student who wants to work on machine learning, I would like to know how to start my studies and how to keep up with the field afterwards. For example, I am willing to work on RL and MAB problems, but there is a huge literature on…
16
votes
2 answers

Why is it called Latent Vector?

I just learned about GANs and I'm a little bit confused about the naming of the latent vector. First, in my understanding, a latent variable is a random variable that can't be measured directly (we need some calculation from other…
malioboro
16
votes
3 answers

Is the optimal policy always stochastic if the environment is also stochastic?

Is the optimal policy always stochastic (that is, a map from states to a probability distribution over actions) if the environment is also stochastic? Intuitively, if the environment is deterministic (that is, if the agent is in a state $s$ and…
15
votes
1 answer

Why does the policy network in AlphaZero work?

In AlphaZero, the policy network (or head of the network) maps game states to a distribution of the likelihood of taking each action. This distribution covers all possible actions from that state. How is such a network possible? The possible actions…
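The mapping described in this excerpt, from a game state to a probability distribution over actions, can be illustrated with a small sketch. It assumes the network emits one logit per action in a fixed-size action set and that a separate legality mask is available; the function name `masked_policy` and the example numbers are hypothetical, and this is not presented as AlphaZero's actual code.

```python
import math


def masked_policy(logits, legal_mask):
    """Softmax over a fixed-size logit vector, restricted to legal actions.

    Hypothetical helper for illustration: illegal actions receive probability
    zero and the remaining probabilities are renormalised to sum to one.
    """
    if len(logits) != len(legal_mask):
        raise ValueError("logits and legal_mask must have the same length")
    legal_logits = [l for l, ok in zip(logits, legal_mask) if ok]
    if not legal_logits:
        raise ValueError("at least one action must be legal")
    # Subtract the largest legal logit for numerical stability.
    peak = max(legal_logits)
    weights = [math.exp(l - peak) if ok else 0.0
               for l, ok in zip(logits, legal_mask)]
    total = sum(weights)
    return [w / total for w in weights]


# Four possible actions, of which only the first and third are legal here.
probs = masked_policy([2.0, 1.0, 0.5, -1.0], [True, False, True, False])
```

The output vector keeps the same length regardless of the state, which is one common way to reconcile a fixed network output with a set of available actions that changes from state to state.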