For questions related to the softmax function, which is a function that takes as input a vector of K real numbers and normalizes it into a probability distribution consisting of K probabilities proportional to the exponentials of the input numbers. The softmax is often used as the activation function of the output layer of a neural network.
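A minimal NumPy sketch of that definition (the function name and inputs are illustrative):

import numpy as np

def softmax(z):
    # Subtracting the max does not change the output: softmax is invariant
    # to adding a constant to every input, and it keeps exp() from overflowing.
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([1.0, 2.0, 3.0])
p = softmax(z)
print(p)        # probabilities proportional to exp(z)
print(p.sum())  # 1.0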
Questions tagged [softmax]
42 questions
22
votes
3 answers
Are softmax outputs of classifiers true probabilities?
BACKGROUND: The softmax function is the most common choice for an activation function for the last dense layer of a multiclass neural network classifier. The outputs of the softmax function have mathematical properties of probabilities and are--in…
Snehal Patel
- 1,037
- 1
- 4
- 27
7
votes
2 answers
Why do the TensorFlow docs discourage using softmax as activation for the last layer?
The beginner Colab example for TensorFlow states:
Note: It is possible to bake this tf.nn.softmax in as the activation function for the last layer of the network. While this can make the model output more directly interpretable, this approach is…
galah92
- 173
- 5
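One likely reason behind that note (a sketch of the numerics, not the docs' own explanation): a loss computed from raw logits can use the log-sum-exp trick, while taking the log of an already-softmaxed output can produce nan/-inf for extreme logits.

import numpy as np

z = np.array([1000.0, 0.0, -1000.0])  # extreme logits

# Naive: softmax first, then log -- exp() overflows, yielding nan and -inf.
p = np.exp(z) / np.exp(z).sum()
print(np.log(p))

# Stable: log-softmax straight from the logits via log-sum-exp.
m = np.max(z)
log_p = z - (m + np.log(np.exp(z - m).sum()))
print(log_p)  # finite: [0., -1000., -2000.]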
6
votes
1 answer
Which paper introduced the term "softmax"?
Nowadays, the softmax function is widely used in deep learning and, specifically, classification with neural networks. However, the origins of this term and function are almost never mentioned anywhere. So, which paper introduced this term?
nbro
- 42,615
- 12
- 119
- 217
5
votes
2 answers
What is the advantage of using cross entropy loss & softmax?
I am trying to do the standard MNIST dataset image recognition test with a standard feed forward NN, but my network failed pretty badly. Now I have debugged it quite a lot and found & fixed some errors, but I had a few more ideas. For one, I am…
Ben
- 455
- 3
- 11
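One frequently cited advantage (a sketch assuming one-hot targets): paired with softmax, the cross-entropy gradient with respect to the logits collapses to softmax(z) - y, which is cheap and well-behaved. A finite-difference check:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(z, y):
    return -np.sum(y * np.log(softmax(z)))  # y is one-hot

z = np.array([0.5, -1.2, 2.0])
y = np.array([0.0, 0.0, 1.0])

analytic = softmax(z) - y  # the closed-form gradient

eps = 1e-6
numeric = np.zeros_like(z)
for i in range(len(z)):
    dz = np.zeros_like(z)
    dz[i] = eps
    numeric[i] = (cross_entropy(z + dz, y) - cross_entropy(z - dz, y)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))  # True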
4
votes
1 answer
Why are policy gradient methods more effective in high-dimensional action spaces?
David Silver argues, in his Reinforcement Learning course, that policy-based reinforcement learning (RL) is more effective than value-based RL in high-dimensional action spaces. He points out that the implicit policy (e.g., $\epsilon$-greedy) in…
Saucy Goat
- 153
- 5
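A rough numeric illustration of Silver's point (my own sketch, not from the course): a greedy or $\epsilon$-greedy policy needs a max over every action, which grows combinatorially, while a factorized softmax policy only samples each action dimension.

import numpy as np

n_dims, n_choices = 6, 10
print(n_choices ** n_dims)  # 1,000,000 joint actions an argmax must scan

rng = np.random.default_rng(0)
logits = rng.normal(size=(n_dims, n_choices))  # stand-in for network outputs
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
action = [rng.choice(n_choices, p=p) for p in probs]
print(action)  # one sampled choice per dimension: O(dims * choices) work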
2
votes
1 answer
Why do we use the softmax instead of no activation function?
Why do we use the softmax activation function on the last layer?
Suppose $i$ is the index with the highest value (in the case where we don't use softmax at all). If we use softmax and take the $i$th value, it would still be the highest value because $e$ is…
dato nefaridze
- 882
- 10
- 22
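The asker's observation checks out directly: $e^x$ is strictly increasing, so softmax never changes the argmax. Its value lies elsewhere, in producing a distribution that losses like cross-entropy are defined on. A two-line check:

import numpy as np

z = np.array([0.3, 2.5, -1.0, 2.4])
p = np.exp(z - z.max()) / np.exp(z - z.max()).sum()
print(np.argmax(z), np.argmax(p))  # same index (1): argmax is preserved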
2
votes
1 answer
Why are there two versions of softmax cross entropy? Which one to use in what situation?
I have seen 2 forms of softmax cross-entropy loss and am confused by the two. Which one is the right one?
For example in this Quora answer, there are 2 answers:
$L(\mathbf{w})=\frac{1}{N} \sum_{n=1}^{N} H\left(p_{n}, q_{n}\right)=-\frac{1}{N}…
Herbert
- 123
- 4
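For reference, a sketch of the averaged form quoted above, with $p_n$ the target distribution and $q_n$ the softmax output for sample $n$ (variable names are mine):

import numpy as np

def batch_cross_entropy(P, Q):
    # L = (1/N) * sum_n H(p_n, q_n), where H(p, q) = -sum_c p_c * log(q_c)
    return -np.mean(np.sum(P * np.log(Q), axis=1))

P = np.array([[0.0, 1.0], [1.0, 0.0]])  # one-hot targets, shape (N, C)
Q = np.array([[0.2, 0.8], [0.7, 0.3]])  # predicted distributions
print(batch_cross_entropy(P, Q))  # (1/2) * (-log 0.8 - log 0.7)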
2
votes
3 answers
What do the authors of this paper mean by the bias term in this picture of a neural network implementation?
I am reading a paper implementing a deep deterministic policy gradient algorithm for portfolio management. My question is about a specific neural network implementation they depict in this picture (paper, picture is on page 14).
The first three…
Mike
- 141
- 4
1
vote
1 answer
The scoring function of the policy
I was reading the book when I saw the formula to optimize $\theta$:
$$
\theta \leftarrow \theta + \alpha \nabla_\theta J(\pi_\theta) \\
\nabla_\theta J(\pi_\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[\sum_{t=0}^T…
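The truncated expectation is the usual policy-gradient form; a minimal REINFORCE-style sketch with a softmax policy (illustrative only, not the book's code):

import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(3)  # one logit per action in a trivial one-state problem
alpha = 0.1

probs = np.exp(theta - theta.max()) / np.exp(theta - theta.max()).sum()
a = rng.choice(3, p=probs)
G = 1.0  # stand-in return for the sampled trajectory

# d/dtheta log softmax(theta)[a] = onehot(a) - probs
grad_log_pi = -probs
grad_log_pi[a] += 1.0
theta += alpha * grad_log_pi * G  # single-sample estimate of the expectation
print(theta)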
1
vote
3 answers
Why do softmax/sigmoid use base e instead of 2?
Performing -ln(ε) in NumPy returns relatively small values like this:
import numpy as np

print(-np.log(np.finfo(np.float32).eps))
print(-np.log(np.finfo(np.float64).eps))
Output:
15.942385
36.04365338911715
Compared to -log2(ε), which has a greater range compared to…
Muhammad Ikhwan Perwira
- 800
- 3
- 10
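Worth noting when comparing those ranges: logarithms in different bases differ only by a constant factor, $\log_2 x = \ln x / \ln 2$, so the choice of base rescales the values but changes nothing qualitatively:

import numpy as np

eps32 = np.finfo(np.float32).eps     # 2**-23
print(-np.log2(eps32))               # 23.0
print(-np.log(eps32) / np.log(2))    # identical, by change of base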
1
vote
1 answer
What is it called when a model asks for validation?
As we know, a classifier's output is just a bunch of probabilities, commonly coming from logits or a softmax output. Performing $\arg\max$ to get the class the model favors most discards some information, such as the distribution of…
Muhammad Ikhwan Perwira
- 800
- 3
- 10
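One way to keep that information, whatever the technique is called (an illustrative sketch, not an answer to the question): use the full softmax distribution, e.g. its entropy or top probability, to decide when to abstain.

import numpy as np

def entropy(p):
    return -np.sum(p * np.log(p + 1e-12))  # small constant avoids log(0)

confident = np.array([0.95, 0.03, 0.02])
uncertain = np.array([0.40, 0.35, 0.25])

for p in (confident, uncertain):
    label = np.argmax(p) if p.max() > 0.9 else None  # None = ask for validation
    print(entropy(p), label)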
1
vote
2 answers
One Softmax or two separate logistic regressions for the task of classifying pictures as a/b and c/d
Simply put, question 11 in chapter 4 of Aurélien Géron's book "Hands-on Machine Learning" asks:
Suppose you want to classify pictures as outdoor/indoor and daytime/nighttime. Should you implement two logistic regression classifiers or one…
Dimitri
- 33
- 6
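The crux, sketched with made-up numbers: outdoor/indoor and daytime/nighttime are two independent binary attributes, so two sigmoid (logistic) outputs model them directly, whereas one softmax forces a single pick among the four combinations.

import numpy as np

attr_logits = np.array([1.2, -0.4])       # [outdoor, daytime] scores
print(1 / (1 + np.exp(-attr_logits)))     # two independent probabilities

joint_logits = np.array([0.5, 1.0, -0.2, 0.1])  # scores for the 4 combinations
p = np.exp(joint_logits) / np.exp(joint_logits).sum()
print(p, p.sum())                         # one distribution over 4 classes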
1
vote
1 answer
Since $f_c$ returns the probability of class label $c$, we require $0 \le f_c \le 1$ for each $c$, and $\sum_{c = 1}^C f_c = 1$. Why avoid this?
Section 1.2.1.5 (Uncertainty) of Probabilistic Machine Learning: An Introduction by Kevin P. Murphy says the following:
We can capture our uncertainty using the following conditional probability distribution:
$$p(y = c \mid \mathbf{x};…
The Pointer
- 611
- 5
- 22
1
vote
1 answer
Is Softmax Necessary as the Activation Function for Self-Attention Mechanisms?
I’m curious about the mathematical reasoning behind the use of the softmax function as the activation function in self-attention mechanisms within neural networks. Specifically, I’m interested in understanding if there is a theoretical basis that…
Kasia
- 303
- 2
- 9
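For context, the slot softmax fills in scaled dot-product attention, $\mathrm{softmax}(QK^\top/\sqrt{d_k})V$, as a plain NumPy sketch:

import numpy as np

rng = np.random.default_rng(0)
n, d_k = 4, 8                                # sequence length, key dimension
Q, K, V = (rng.normal(size=(n, d_k)) for _ in range(3))

scores = Q @ K.T / np.sqrt(d_k)              # (n, n) similarity scores
scores -= scores.max(axis=1, keepdims=True)  # stabilize the exponentials
W = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
print(W.sum(axis=1))                         # each row is a distribution: all 1.0

out = W @ V                                  # convex combination of the values
print(out.shape)                             # (4, 8)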
1
vote
1 answer
Dealing with noise in models with softmax output
I have a device with an accelerometer and gyroscope (6-axis). The device sends live raw telemetry data to the model: 40 samples per input, 6 values per sample (accelerometer xyz, gyroscope xyz). The model predicts between 12 different labels of…
Sterling Duchess
- 113
- 3
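One common mitigation for that kind of jitter (a generic sketch, not specific to this device): average the softmax outputs over a short window before taking the argmax.

import numpy as np

rng = np.random.default_rng(0)
n_classes, window = 12, 5

# Stand-in for five consecutive noisy softmax outputs from the model
preds = rng.dirichlet(np.ones(n_classes), size=window)

smoothed = preds.mean(axis=0)  # average distribution over the window
print(np.argmax(smoothed))     # steadier than a per-frame argmax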