Questions tagged [regularization]

For questions about the application of regularization techniques.

In mathematics, statistics, and computer science, particularly in the fields of machine learning and inverse problems, regularization is a process of introducing additional information in order to solve an ill-posed problem or to prevent overfitting.
https://en.wikipedia.org/wiki/Regularization_(mathematics)

57 questions
12 votes · 3 answers

Are there any rules of thumb for estimating what capacity a neural network needs for a given problem?

To give an example, let's consider the MNIST dataset of handwritten digits. Here are some things that might have an impact on the optimal model capacity: there are 10 output classes; the inputs are 28x28 grayscale pixels (I think this…
11 votes · 2 answers

Can someone explain the R1 regularization function in simple terms?

I'm trying to understand the R1 regularization function, both the abstract concept and every symbol in the formula. According to the article, the definition of R1 is: it penalizes the discriminator for deviating from the Nash equilibrium via…
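
For reference, the R1 penalty as usually stated (following Mescheder et al., 2018, "Which Training Methods for GANs Do Actually Converge?"; the article in question may use slightly different notation) is a gradient penalty on real data only:

$$R_1(\psi) = \frac{\gamma}{2}\,\mathbb{E}_{p_{\mathcal{D}}(x)}\left[\lVert \nabla D_\psi(x) \rVert^2\right]$$

Here $D_\psi$ is the discriminator with parameters $\psi$, $p_{\mathcal{D}}$ is the real-data distribution, and $\gamma$ scales the penalty: large discriminator gradients on real samples are discouraged.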
10 votes · 1 answer

What is "early stopping" in machine learning?

What is early stopping in machine learning and, in general, artificial intelligence? What are the advantages of using this method? How does it help exactly? I'd be interested in perspectives and links to recent research.
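
As a concrete illustration, here is a minimal sketch of early stopping with a patience counter; `train_one_epoch`, `evaluate`, `model`, and the data objects are hypothetical placeholders for whatever your framework provides:

```python
import copy

best_val_loss = float("inf")
best_state = None
patience = 10                    # epochs to wait for an improvement
epochs_without_improvement = 0

for epoch in range(max_epochs):
    train_one_epoch(model, train_data)    # hypothetical training step
    val_loss = evaluate(model, val_data)  # hypothetical validation step
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        # Checkpoint the best model so far (`model.state` is a hypothetical attribute).
        best_state = copy.deepcopy(model.state)
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break  # validation loss stopped improving: stop early

model.state = best_state  # roll back to the best checkpoint
```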
8 votes · 3 answers

How should we regularize an LSTM model?

If I am correct, there are five parameters on an LSTM layer for regularization. To deal with overfitting, I would start with: reducing the layers, reducing the hidden units, and applying dropout or regularizers. There are kernel_regularizer,…
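
For concreteness, here is a sketch of where those knobs live on a Keras LSTM layer (tf.keras API; the coefficients are illustrative, not recommendations):

```python
import tensorflow as tf

# Illustrative regularization settings on a Keras LSTM layer.
layer = tf.keras.layers.LSTM(
    64,
    kernel_regularizer=tf.keras.regularizers.l2(1e-4),     # input-to-hidden weights
    recurrent_regularizer=tf.keras.regularizers.l2(1e-4),  # hidden-to-hidden weights
    bias_regularizer=tf.keras.regularizers.l2(1e-4),       # bias terms
    activity_regularizer=tf.keras.regularizers.l1(1e-5),   # layer outputs
    dropout=0.2,            # dropout on the input transformation
    recurrent_dropout=0.2,  # dropout on the recurrent connections
)
```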
7 votes · 2 answers

Why is dropout favoured compared to reducing the number of units in hidden layers?

Why is dropout favored over reducing the number of units in hidden layers for convolutional networks? If a large set of units leads to overfitting and dropping out "averages" the response units, why not just suppress units? I have read…
5 votes · 2 answers

Why did the L1/L2 regularization technique not improve my accuracy?

I am training a multilayer neural network with 146 samples (97 for the training set, 20 for the validation set, and 29 for the testing set). I am using: automatic differentiation, SGD method, fixed learning rate + momentum term, logistic…
5 votes · 1 answer

How does L2 regularization make weights smaller?

I'm learning logistic regression and $L_2$ regularization. The cost function looks like this: $$J(w) = -\sum_{i=1}^{n} \left(y^{(i)}\log\left(\phi(z^{(i)})\right)+(1-y^{(i)})\log\left(1-\phi(z^{(i)})\right)\right)$$ And the regularization term is added. ($\lambda$ is a…
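
For context, a standard way to write the regularized cost and the resulting gradient-descent update (notation follows the question; $J_0$ is the unregularized cost and $\eta$ the learning rate):

$$J(w) = J_0(w) + \frac{\lambda}{2}\lVert w\rVert^2, \qquad w \leftarrow w - \eta\left(\nabla J_0(w) + \lambda w\right) = (1 - \eta\lambda)\,w - \eta\,\nabla J_0(w)$$

Each step multiplies the weights by $(1 - \eta\lambda) < 1$ before the usual gradient update, which is why $L_2$ regularization is also known as weight decay.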
4 votes · 1 answer

What is the best way to combine or weight multiple losses with gradient descent?

I am optimizing a neural network with Adam using 3 different losses. Their scales are very different, and the current method is either to sum the losses and clip the gradient or to manually weight them within the sum. Something like:…
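
For illustration, a minimal PyTorch-style sketch of the "manually weight within the sum, then clip" approach; the weights and the loss/optimizer/model objects are hypothetical placeholders:

```python
import torch

# Hypothetical fixed weights, chosen so the weighted terms have similar scale.
w_a, w_b, w_c = 1.0, 0.1, 0.01

total_loss = w_a * loss_a + w_b * loss_b + w_c * loss_c  # weighted sum of the 3 losses
optimizer.zero_grad()
total_loss.backward()
# Clip the global gradient norm so one badly scaled loss cannot dominate a step.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```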
4 votes · 2 answers

How does Regularization Reduce Overfitting?

As I understand it, this is the general summary of the regularization-overfitting problem: the classical "bias-variance tradeoff" suggests that complicated models (i.e. models with more parameters, e.g. neural networks with many layers/weights) are…
4 votes · 0 answers

When is using weight regularization bad?

Regularization of weights (e.g. L1 or L2) keeps them small and standardized, which can help reduce data overfitting. From this article, regularization sounds favorable in many cases, but is it always encouraged? Are there scenarios in which it…
4 votes · 1 answer

Why does L1 regularization yield sparse features?

In contrast to L2 regularization, L1 regularization usually yields sparse feature vectors, and most feature weights are zero. What's the reason for this? Could someone explain it mathematically and/or provide some intuition (maybe…
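
One standard piece of intuition comes from the one-dimensional proximal problem (a sketch, not a full derivation). With an $L_1$ penalty, $\min_w \tfrac{1}{2}(w - v)^2 + \lambda\lvert w\rvert$ is solved by soft-thresholding:

$$w^\star = \operatorname{sign}(v)\,\max(\lvert v\rvert - \lambda,\ 0)$$

so every coordinate with $\lvert v\rvert \le \lambda$ is set exactly to zero. The analogous $L_2$ problem $\min_w \tfrac{1}{2}(w - v)^2 + \tfrac{\lambda}{2}w^2$ gives $w^\star = v/(1+\lambda)$: every coordinate shrinks, but none reaches zero.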
4 votes · 1 answer

Is there a way to ensure that my model is able to recognize an unseen example?

My question is more theoretical than practical. Let's say that I am training my cat classifier with a dataset that I feel is pretty representative of cat images in general. But then a new breed of cat is created that is distinct from other cats and…
4 votes · 1 answer

What is the $\ell_{2, 1}$ norm?

I'm reading this paper and it says: In this paper, we present a multi-class embedded feature selection method called sparse optimal scoring with adjustment (SOSA), which is capable of addressing the data heterogeneity issue. We propose to…
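
For reference, a common definition (conventions vary; some papers sum over columns instead of rows): for a matrix $W \in \mathbb{R}^{d \times k}$ with rows $w_i$,

$$\lVert W \rVert_{2,1} = \sum_{i=1}^{d} \lVert w_i \rVert_2 = \sum_{i=1}^{d} \sqrt{\sum_{j=1}^{k} W_{ij}^2}$$

that is, an $\ell_2$ norm within each row followed by an $\ell_1$ norm across rows. Penalizing it tends to zero out entire rows at once, which is why it shows up in feature-selection methods.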
3 votes · 0 answers

Regarding L0 sparsification of DNNs proposed by Louizos, Kingma and Welling

I am reading the paper on $\ell_0$ regularization of DNNs by Louizos, Welling and Kingma (2017) (link to arXiv). In Section 2.1 the authors define the cost function as follows: $$ \mathcal{R}\left( \tilde{\theta}, \pi \right) =…
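
The expectation-form objective is cut off above; for orientation, the underlying $\ell_0$-penalized objective that such methods relax (a standard statement, not the paper's exact equation) is

$$\min_\theta\ \frac{1}{N}\sum_{i=1}^{N} \mathcal{L}\big(f(x_i;\theta),\, y_i\big) + \lambda\,\lVert\theta\rVert_0, \qquad \lVert\theta\rVert_0 = \sum_{j}\mathbb{1}\left[\theta_j \neq 0\right],$$

which the paper makes amenable to gradient descent by attaching stochastic gates to the parameters and penalizing the expected number of open gates.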
3 votes · 1 answer

How does dropout work during backpropagation?

I've searched for an answer to this and read several scientific articles on the subject, but I can't find a practical explanation of how dropout actually drops nodes in an algorithm. I've read that dropout zeros out the activation function for…
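
To make the mechanism concrete, here is a minimal NumPy sketch of inverted dropout (an illustration, not any particular framework's implementation): the mask sampled in the forward pass is reused in the backward pass, so dropped units receive zero gradient.

```python
import numpy as np

def dropout_forward(a, p_drop, rng):
    # Keep each unit with probability (1 - p_drop); rescale survivors by
    # 1/(1 - p_drop) so the expected activation is unchanged (inverted dropout).
    mask = (rng.random(a.shape) >= p_drop) / (1.0 - p_drop)
    return a * mask, mask

def dropout_backward(grad_out, mask):
    # Dropped units contributed nothing forward, so their gradient is zero;
    # kept units pass the gradient through with the same rescaling.
    return grad_out * mask

# Usage sketch:
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 3))
out, mask = dropout_forward(a, p_drop=0.5, rng=rng)
grad_a = dropout_backward(np.ones_like(out), mask)
```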