Highest Voted Questions - Artificial Intelligence Stack Exchange

5

votes

2 answers

InstructGPT: What is the sigma in the loss function and why $\log(\cdot)$ is being used?

InstructGPT: What is the sigma in the loss function and why $\log(\cdot)$ is being used? $$ \operatorname{loss}(\theta) = -\frac{1}{\binom{K}{2}}E_{(x,y_w,y_l)\sim D}[\log(\sigma(r_{\theta}(x, y_w) - r_{\theta}(x, y_l)))] $$ The equation was taken…
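A minimal sketch of the loss above for a single prompt, assuming $\sigma$ denotes the logistic sigmoid $\sigma(z) = 1/(1+e^{-z})$, as is conventional for this kind of pairwise comparison loss. The function names and the input format (a list of reward scores ordered from most to least preferred) are hypothetical, chosen only for illustration. The expectation over $D$ is replaced here by an average over the $\binom{K}{2}$ pairs of one prompt.

```python
import math
from itertools import combinations


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigma(z)), with sigma(z) = 1 / (1 + exp(-z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def pairwise_ranking_loss(rewards_ranked: list[float]) -> float:
    """Average of -log(sigma(r_w - r_l)) over all K-choose-2 pairs.

    rewards_ranked: reward-model scores r_theta(x, y) for K responses to one
    prompt x, ordered from most preferred to least preferred (assumed format).
    """
    # combinations() keeps input order, so the first item of each pair is
    # the preferred response (y_w) and the second is the rejected one (y_l).
    pairs = list(combinations(rewards_ranked, 2))
    return -sum(log_sigmoid(r_w - r_l) for r_w, r_l in pairs) / len(pairs)


if __name__ == "__main__":
    # Scores that agree with the human ranking give a smaller loss
    # than scores that contradict it.
    print(pairwise_ranking_loss([2.0, 1.0, 0.0]))
    print(pairwise_ranking_loss([0.0, 1.0, 2.0]))
```

Under this reading, the sigmoid maps the reward difference to a probability that $y_w$ is preferred over $y_l$, and the $\log$ turns the objective into a negative log-likelihood.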

machine-learning papers open-ai chat-bots instruct-gpt

asked Jan 17 '23 at 11:49

Nathan G

161
3

5

votes

4 answers

Would minimizing influence on the world be a safe directive to a general AI?

Let's take our standard paperclip maximizer General AI and attempt to obtain precisely one million paper clips, over the course of a year, without destroying the universe in the process. Most maximization directives make the process run away. As cheaply…

agi

asked Aug 23 '17 at 15:41

SF.

464
3
13

5

votes

2 answers

Why is $\sum_{s} \eta(s)$ a constant of proportionality in the proof of the policy gradient theorem?

In Sutton and Barto's book (http://incompleteideas.net/book/bookdraft2017nov5.pdf), a proof of the policy gradient theorem is provided on pg. 269 for an episodic case and a start state policy objective function (see picture below, last 3…
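For reference, a sketch of the step the question refers to, assuming the book's notation, where $\eta(s)$ is the expected number of time steps spent in state $s$ per episode and $\mu(s)$ is the on-policy state distribution obtained by normalizing $\eta$:

$$ \mu(s) = \frac{\eta(s)}{\sum_{s'} \eta(s')} $$

$$ \sum_{s} \eta(s) \sum_{a} \nabla \pi(a \mid s)\, q_\pi(s, a) = \Big(\sum_{s'} \eta(s')\Big) \sum_{s} \mu(s) \sum_{a} \nabla \pi(a \mid s)\, q_\pi(s, a) \;\propto\; \sum_{s} \mu(s) \sum_{a} \nabla \pi(a \mid s)\, q_\pi(s, a) $$

The factor $\sum_{s'} \eta(s')$ does not depend on the summation variable $s$, so it can be pulled out of the sum; in the episodic case it equals the average episode length.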

reinforcement-learning policy-gradients proofs sutton-barto policy-gradient-theorem

asked Jan 11 '23 at 20:50

jwl17

59
2

5

votes

1 answer

Is logistic regression more free from the conditional independence assumption than naive Bayes?

To my understanding, logistic regression is an extension of naive Bayes. Suppose $X = \{x_1, x_2, \dots, x_N \}$ and $Y = \{0, 1\}$, each $x_i$ is i.i.d and $P(x_i \mid Y=y_k) \sim \mathcal{N}(\mu, \sigma^2)$ is a Gaussian distribution. In order to…

machine-learning logistic-regression naive-bayes

asked Aug 21 '17 at 15:40

imflash217

499
5
15

5

votes

1 answer

Why is it recommended to use a "separate test environment" when evaluating a model?

I am training an agent (Stable-Baselines3 algorithm) on a custom environment. During training, I want to have a callback so that for every $N$ steps of the learning process, I get the current model and run it on my environment $M$ times and log the…

reinforcement-learning deep-rl intelligent-agent environment stable-baselines

asked Dec 09 '22 at 17:18

jgklsdjfgkldsfaSDF

61
3

5

votes

1 answer

Is a decision tree less suitable for incremental learning than e.g. a neural net?

I can recall that a professor once said that decision trees are not good for incremental learning, as they have to be rebuilt from the ground up if new training examples arrive. Is this basically true? Quick googling just brought me to a lot of…

machine-learning reinforcement-learning learning-algorithms

asked Aug 16 '17 at 19:29

Ulu83

153
4

5

votes

2 answers

Concrete examples of unintentional adversarial AI behaviour

Are there any real-world examples of unintentional "bad" AI behaviour? I'm not looking for hypothetical arguments of malicious AI (AI in a box, paperclip maximizer), but for actual instances in history where some AI directly did something bad due to…

ai-safety

asked Aug 13 '17 at 16:48

k.c. sayz 'k.c sayz'

2,121
13
27

5

votes

1 answer

Does a bias also have a chance to be dropped out in Dropout layer?

Suppose that you have 80 neurons in a layer, where one neuron is bias. Then you add a dropout layer after the activation function of this layer. In this case, does it have a chance to drop out the bias neuron, or does the dropout only affect the…

deep-learning dropout

asked Aug 12 '17 at 04:38

Blaszard

1,097
4
11
25

5

votes

1 answer

Traveling salesman problem variant: which algorithm to choose?

I have an industrial problem which I'm trying to cast as a Traveling Salesman Problem (TSP) in 3D Euclidean space. There are physical limitations which imply that some subpaths may or may not be valid based on simple rules. What algorithm is best…

machine-learning reinforcement-learning genetic-algorithms combinatorics

asked Aug 11 '17 at 12:26

Oliver

51
1

5

votes

2 answers

What is the intuition behind self-attention?

I've been watching a few lectures on transformers, especially for language translation, though it seemingly becomes more confusing the more I watch. In this lecture, there seem to be two conflicting views of self-attention. First, there's an Iron…

neural-networks transformer attention

asked Nov 23 '22 at 04:32

User

215
1
5

5

votes

1 answer

Do Support Vector Machines have the ability to learn while in use?

I've read in some literature that SVMs are characterized by their adaptivity. Does that mean they can learn while in use?

machine-learning support-vector-machine

asked Oct 21 '22 at 17:29

anon

5

votes

2 answers

Can hidden Markov models be used to model any time series data?

Can HMMs be used to model any time series data? Or does the data have to be that of a Markov process? In the HTK documentation, I see that the first few lines state that it can model any time series: HTK is a toolkit for building Hidden Markov Models…

machine-learning applications time-series hidden-markov-model

asked Aug 03 '17 at 08:56

vinjk

53
2

5

votes

0 answers

What exactly is non-delusional Q-learning?

Problems occur when we combine Q-learning with a function approximator. What exactly are delusional bias and non-delusional Q-learning? I am talking about the NeurIPS 2018 best paper, Non-delusional Q-learning and value-iteration. I have trouble…

reinforcement-learning q-learning papers value-iteration policy-iteration

asked Oct 15 '22 at 20:38

wrek

183
4

5

votes

3 answers

Why does the Decision Tree Learning Algorithm preferably output the smallest Decision Tree?

I have been following the ML course by Tom Mitchell. The inherent assumption when using the Decision Tree Learning Algorithm is that the algorithm preferably chooses the smallest Decision Tree. Why is this so when we can have bigger extensions of the…

machine-learning unsupervised-learning learning-algorithms

asked Jul 31 '17 at 08:23

imflash217

499
5
15

5

votes

3 answers

How to implement an Automatic Learning Rate for a Neural Network?

I'm learning Neural Networks, and everything works as planned. But just as humans adjust themselves to learn more efficiently, I'm trying to understand conceptually how one might implement an auto-adjusting learning rate for a Neural Network. I…

neural-networks

asked Jul 30 '17 at 19:06

Laceanu George

137
4
