Most Popular
1500 questions
5
votes
2 answers
InstructGPT: What is the sigma in the loss function and why is $\log(\cdot)$ being used?
$$ \operatorname{loss}(\theta) = -\frac{1}{\binom{K}{2}}E_{(x,y_w,y_l)\sim D}[\log(\sigma(r_{\theta}(x, y_w) - r_{\theta}(x, y_l)))] $$
The equation was taken…
Nathan G
- 161
- 3
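A minimal sketch of the loss quoted in the question above, for a single prompt, assuming $\sigma$ is the logistic sigmoid $\sigma(z) = 1/(1+e^{-z})$ and that the $K$ responses are already ordered from most to least preferred. The function names and the plain list of reward values are hypothetical stand-ins for $r_{\theta}(x, y)$; this is an illustration, not the authors' implementation.

```python
import math
from itertools import combinations


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigma(z)) with sigma(z) = 1 / (1 + exp(-z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def pairwise_ranking_loss(rewards_ranked: list[float]) -> float:
    """Pairwise loss for one prompt with K >= 2 ranked responses.

    rewards_ranked[i] is a hypothetical stand-in for r_theta(x, y_i),
    listed from most preferred to least preferred (an assumption of
    this sketch).
    """
    # combinations() keeps input order, so each pair is (preferred, rejected).
    pairs = list(combinations(rewards_ranked, 2))
    # Dividing by len(pairs) is the 1 / C(K, 2) factor in the formula.
    return -sum(log_sigmoid(r_w - r_l) for r_w, r_l in pairs) / len(pairs)


if __name__ == "__main__":
    print(pairwise_ranking_loss([2.0, 1.0, 0.0]))  # rewards agree with the ranking
    print(pairwise_ranking_loss([0.0, 1.0, 2.0]))  # rewards contradict the ranking
```

Under these assumptions the loss shrinks as the reward gap $r_{\theta}(x, y_w) - r_{\theta}(x, y_l)$ grows, and it equals $\log 2$ when every pair is tied.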
5
votes
4 answers
Would minimizing influence on the world be a safe directive for a general AI?
Let's take our standard paperclip maximizer General AI and attempt to obtain precisely one million paper clips, over the course of a year, without destroying the universe in the process.
Most maximization directives make the process run away. As cheaply…
SF.
- 464
- 3
- 13
5
votes
2 answers
Why is $\sum_{s} \eta(s)$ a constant of proportionality in the proof of the policy gradient theorem?
In Sutton and Barto's book (http://incompleteideas.net/book/bookdraft2017nov5.pdf), a proof of the policy gradient theorem is provided on pg. 269 for an episodic case and a start state policy objective function (see picture below, last 3…
jwl17
- 59
- 2
5
votes
1 answer
Is logistic regression more free from the conditional independence assumption than naive Bayes?
To my understanding, logistic regression is an extension of naive Bayes.
Suppose $X = \{x_1, x_2, \dots, x_N \}$ and $Y = \{0, 1\}$, the $x_i$ are i.i.d., and $P(x_i \mid Y=y_k) \sim \mathcal{N}(\mu, \sigma^2)$ is a Gaussian distribution.
In order to…
imflash217
- 499
- 5
- 15
5
votes
1 answer
Why is it recommended to use a "separate test environment" when evaluating a model?
I am training an agent (a Stable-Baselines3 algorithm) on a custom environment. During training, I want to have a callback so that for every $N$ steps of the learning process, I get the current model and run it on my environment $M$ times and log the…
jgklsdjfgkldsfaSDF
- 61
- 3
5
votes
1 answer
Is a decision tree less suitable for incremental learning than e.g. a neural net?
I can recall that a professor once said that decision trees are not good for incremental learning, as they have to be rebuilt from the ground up if new training examples arrive.
Is this basically true? Quick googling just brought me to a lot of…
Ulu83
- 153
- 4
5
votes
2 answers
Concrete examples of unintentional adversarial AI behaviour
Are there any real-world examples of unintentional "bad" AI behaviour? I'm not looking for hypothetical arguments of malicious AI (AI in a box, paperclip maximizer), but for actual instances in history where some AI directly did something bad due to…
k.c. sayz 'k.c sayz'
- 2,121
- 13
- 27
5
votes
1 answer
Does a bias also have a chance to be dropped out in Dropout layer?
Suppose that you have 80 neurons in a layer, where one neuron is bias. Then you add a dropout layer after the activation function of this layer.
In this case, does it have a chance to drop out the bias neuron, or does the dropout only affect the…
Blaszard
- 1,097
- 4
- 11
- 25
5
votes
1 answer
Traveling salesman problem variant: which algorithm to choose?
I have an industrial problem which I'm trying to cast as a Traveling Salesman Problem (TSP) in 3D Euclidean space. There are physical limitations which imply that some subpaths may or may not be valid based on simple rules.
What algorithm is best…
Oliver
- 51
- 1
5
votes
2 answers
What is the intuition behind self-attention?
I've been watching a few lectures on transformers, especially for language translation, though it seemingly becomes more confusing the more I watch.
In this lecture, there seem to be two conflicting views of self-attention. First, there's an Iron…
User
- 215
- 1
- 5
5
votes
1 answer
Do Support Vector Machines have the ability to learn while in use?
I've read in some literature that SVMs are characterized by their adaptivity. Does that mean they can learn while in use?
anon
5
votes
2 answers
Can hidden Markov models be used to model any time series data?
Can HMMs be used to model any time series data? Or does the data have to be that of a Markov process?
In the HTK documentation, I see that the first few lines state that it can model any time series:
HTK is a toolkit for building Hidden Markov Models…
vinjk
- 53
- 2
5
votes
0 answers
What exactly is non-delusional Q-learning?
Problems occur when we combine Q-learning with a function approximator.
What exactly are delusional bias and non-delusional Q-learning? I am talking about the NeurIPS 2018 best paper, Non-delusional Q-learning and value-iteration.
I have trouble…
wrek
- 183
- 4
5
votes
3 answers
Why does the Decision Tree Learning Algorithm preferably output the smallest Decision Tree?
I have been following the ML course by Tom Mitchell.
The inherent assumption when using the decision tree learning algorithm is that it preferably chooses the smallest decision tree.
Why is this so when we can have bigger extensions of the…
imflash217
- 499
- 5
- 15
5
votes
3 answers
How to implement an Automatic Learning Rate for a Neural Network?
I'm learning about neural networks, and everything works as planned. But just as humans adjust themselves to learn more efficiently, I'm trying to understand conceptually how one might implement an auto-adjusting learning rate for a neural network.
I…
Laceanu George
- 137
- 4