Most Popular (1500 questions)
8 votes · 2 answers
What is the appropriate way to deal with multiple paths to the same state in MCTS?
Many games have multiple paths to the same states. What is the appropriate way to deal with this in MCTS?
If the state appears once in the tree, but with multiple parents, then it seems difficult to define backpropagation: do we only…
Jay McCarthy
8 votes · 1 answer
Can I train a neural network incrementally given new daily data?
I would like to know whether it is possible to train a neural network on new daily data. Let me explain in more detail. Say you have daily data from 2010 to 2019. You train your NN on all of it, but, from now on, every day in 2019 you get new…
neomatriciel
8 votes · 1 answer
Can a single neural network handle recognizing two types of objects, or should it be split into two smaller networks?
In particular, an embedded computer (with limited resources) analyzes a live video stream from a traffic camera, trying to pick good frames that contain the license plate numbers of passing cars. Once a plate is located, the frame is handed over to an OCR…
SF.
8 votes · 1 answer
What is the purpose of the actor in actor-critic algorithms?
For discrete action spaces, what is the purpose of the actor in actor-critic algorithms?
My current understanding is that the critic estimates the future reward given an action, so why not just take the action that maximizes the estimated…
David Rein
8 votes · 3 answers
What is the difference between a stochastic and a deterministic policy?
In reinforcement learning, there are the concepts of stochastic (or probabilistic) and deterministic policies. What is the difference between them?
nbro
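A minimal sketch of the distinction, assuming a small discrete action space (the probabilities and function names below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical action distribution for one state, over 3 discrete actions.
action_probs = np.array([0.7, 0.2, 0.1])

def deterministic_policy(probs):
    # A deterministic policy maps a state to exactly one action.
    return int(np.argmax(probs))

def stochastic_policy(probs):
    # A stochastic policy defines a probability distribution over actions
    # and samples an action from it.
    return int(rng.choice(len(probs), p=probs))

print(deterministic_policy(action_probs))                   # always 0
print([stochastic_policy(action_probs) for _ in range(5)])  # e.g. [0, 0, 1, 0, 2]
```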
8 votes · 1 answer
Which machine learning models are universal function approximators?
The universal approximation theorem states that a feed-forward neural network with a single hidden layer containing a finite number of neurons can approximate any continuous function (provided some assumptions on the activation function are…
nbro
8 votes · 1 answer
What's the advantage of log_softmax over softmax?
I previously learned that softmax as the output layer, coupled with the log-likelihood cost function (the same as nll_loss in PyTorch), can solve the learning slowdown problem.
However, while working through the PyTorch MNIST tutorial,…
user1024
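For context, a small PyTorch sketch (not taken from the question; the tensors are made up) of why log_softmax plus nll_loss is usually preferred: the pair is equivalent to cross-entropy on the raw logits, while computing log(softmax(x)) in two separate steps can be numerically unstable:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)            # hypothetical batch of 4 samples, 10 classes
targets = torch.tensor([1, 0, 4, 9])

# log_softmax followed by the negative log-likelihood loss ...
loss_a = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# ... matches cross-entropy computed directly on the logits.
loss_b = F.cross_entropy(logits, targets)

print(torch.allclose(loss_a, loss_b))  # True
```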
8 votes · 1 answer
How is the policy gradient calculated in REINFORCE?
Reading Sutton and Barto, I see the following in describing policy gradients:
How is the gradient calculated with respect to an action (taken at time $t$)? I've read implementations of the algorithm, but conceptually I'm not sure I understand how the…
Hanzy
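For reference, the policy-gradient expression Sutton and Barto use for REINFORCE (presumably the one elided above) is
$$ \nabla_\theta J(\theta) \propto \mathbb{E}_\pi\!\left[ G_t \, \nabla_\theta \ln \pi(A_t \mid S_t, \theta) \right], $$
which yields the per-step update $\theta \leftarrow \theta + \alpha \, \gamma^t \, G_t \, \nabla_\theta \ln \pi(A_t \mid S_t, \theta)$: the gradient is taken of the log-probability of the action actually chosen at time $t$, scaled by the return $G_t$.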
8 votes · 2 answers
How can AlphaZero learn if the tree search stops and restarts before finishing a game?
I am trying to understand how AlphaZero works, but there is one point that I have trouble understanding, even after reading several different explanations. As I understand it (see, for example,…
Jonathan Lindgren
8 votes · 2 answers
Can DQN perform better than Double DQN?
I'm training both DQN and Double DQN in the same environment, but DQN performs significantly better than Double DQN. According to the Double DQN paper, Double DQN should perform better than DQN. Am I doing something wrong, or is this possible?
Angelo
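For context, the two bootstrap targets being compared (with online weights $\theta$ and target-network weights $\theta^-$) are
$$ y_{\text{DQN}} = r + \gamma \max_{a'} Q(s', a'; \theta^-), \qquad y_{\text{Double DQN}} = r + \gamma \, Q\!\big(s', \operatorname*{arg\,max}_{a'} Q(s', a'; \theta); \, \theta^-\big), $$
so Double DQN only changes how the bootstrap action is selected, decoupling action selection from action evaluation to reduce maximization bias.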
8 votes · 2 answers
Is reinforcement learning using shallow neural networks still deep reinforcement learning?
I often see the term deep reinforcement learning used to refer to RL algorithms that use neural networks, regardless of whether or not the networks are deep.
For example, PPO is often considered a deep RL algorithm, but using a deep network is not…
yewang
8 votes · 1 answer
Are there existing examples of using neural networks for static code analysis?
Background Context:
In the past I've heavily applied various "code quality metrics" to statically analyze code and get an inkling of how "maintainable" it is, using things like the Maintainability Index alluded to here.
However, a problem that…
PhD
8 votes · 1 answer
Which unsupervised learning technique can be used for anomaly detection in a time series?
I've started working on anomaly detection in Python. My dataset is a time series; the data is collected by sensors that record measurements on semiconductor manufacturing machines.
My dataset looks like this:
ContextID Time_ms…
some_programmer
8 votes · 1 answer
What are the main benefits of using Bayesian networks?
I have some trouble understanding the benefits of Bayesian networks.
Am I correct that the key benefit of the network is that one does not need to use the chain rule of probability in order to calculate joint distributions?
So, using the chain…
Sebastian Dine
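As a hypothetical illustration (not from the question): in a three-variable network where $B$ and $C$ each depend only on $A$, the joint distribution factorizes as
$$ P(A, B, C) = P(A) \, P(B \mid A) \, P(C \mid A), $$
whereas the unrestricted chain rule would give $P(A) \, P(B \mid A) \, P(C \mid A, B)$; the network's conditional-independence assumptions are what shrink the factors.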
8 votes · 1 answer
Why isn't the ElliotSig activation function widely used?
The Softsign (a.k.a. ElliotSig) activation function is really simple:
$$ f(x) = \frac{x}{1+|x|} $$
It is bounded in $[-1,1]$, has a first derivative, is monotonic, and is computationally extremely simple (easy for, e.g., a GPU).
Why is it not…
Pietro
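A minimal NumPy sketch of the function and its first derivative (the helper names are mine, for illustration):

```python
import numpy as np

def softsign(x):
    # Softsign / ElliotSig: f(x) = x / (1 + |x|), bounded between -1 and 1.
    return x / (1.0 + np.abs(x))

def softsign_grad(x):
    # First derivative: f'(x) = 1 / (1 + |x|)^2 -- no exponentials required,
    # unlike tanh or the logistic sigmoid.
    return 1.0 / (1.0 + np.abs(x)) ** 2

x = np.linspace(-5.0, 5.0, 11)
print(softsign(x))
print(softsign_grad(x))
```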