Questions tagged [regularization]

For questions about the application of regularization techniques.

In mathematics, statistics, and computer science, particularly in the fields of machine learning and inverse problems, regularization is a process of introducing additional information in order to solve an ill-posed problem or to prevent overfitting.
https://en.wikipedia.org/wiki/Regularization_(mathematics)

57 questions
12 votes · 3 answers

Are there any rules of thumb for estimating what capacity a neural network needs for a given problem?

To give an example, let's consider the MNIST dataset of handwritten digits. Here are some things that might have an impact on the optimal model capacity: there are 10 output classes; the inputs are 28x28 grayscale pixels (I think this…
11 votes · 2 answers

Can someone explain the R1 regularization function in simple terms?

I'm trying to understand the R1 regularization function, both the abstract concept and every symbol in the formula. According to the article, the definition of R1 is: it penalizes the discriminator for deviating from the Nash equilibrium via…
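
For reference, the R1 penalty as usually stated (following Mescheder et al., 2018, "Which Training Methods for GANs Do Actually Converge?"; the article in question may use slightly different notation) is a gradient penalty on real data only:

$$R_1(\psi) = \frac{\gamma}{2}\,\mathbb{E}_{p_{\mathcal{D}}(x)}\left[\lVert \nabla D_\psi(x) \rVert^2\right]$$

Here $D_\psi$ is the discriminator with parameters $\psi$, $p_{\mathcal{D}}$ is the real-data distribution, and $\gamma$ scales the penalty: large discriminator gradients on real samples are discouraged.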
10 votes · 1 answer

What is "early stopping" in machine learning?

What is early stopping in machine learning and, in general, artificial intelligence? What are the advantages of using this method? How does it help exactly? I'd be interested in perspectives and links to recent research.
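
As a concrete illustration, here is a minimal sketch of early stopping with a patience counter; `train_one_epoch`, `evaluate`, `model`, and the data objects are hypothetical placeholders for whatever your framework provides:

```python
import copy

best_val_loss = float("inf")
best_state = None
patience = 10                    # epochs to wait for an improvement
epochs_without_improvement = 0

for epoch in range(max_epochs):
    train_one_epoch(model, train_data)    # hypothetical training step
    val_loss = evaluate(model, val_data)  # hypothetical validation step
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        # Checkpoint the best model so far (`model.state` is a hypothetical attribute).
        best_state = copy.deepcopy(model.state)
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break  # validation loss stopped improving: stop early

model.state = best_state  # roll back to the best checkpoint
```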
8 votes · 3 answers

How should we regularize an LSTM model?

If I am correct, there are five parameters on an LSTM layer for regularization. To deal with overfitting, I would start with: reducing the layers, reducing the hidden units, and applying dropout or regularizers. There are kernel_regularizer,…
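
For concreteness, here is a sketch of where those knobs live on a Keras LSTM layer (tf.keras API; the coefficients are illustrative, not recommendations):

```python
import tensorflow as tf

# Illustrative regularization settings on a Keras LSTM layer.
layer = tf.keras.layers.LSTM(
    64,
    kernel_regularizer=tf.keras.regularizers.l2(1e-4),     # input-to-hidden weights
    recurrent_regularizer=tf.keras.regularizers.l2(1e-4),  # hidden-to-hidden weights
    bias_regularizer=tf.keras.regularizers.l2(1e-4),       # bias terms
    activity_regularizer=tf.keras.regularizers.l1(1e-5),   # layer outputs
    dropout=0.2,            # dropout on the input transformation
    recurrent_dropout=0.2,  # dropout on the recurrent connections
)
```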
7 votes · 2 answers

Why is dropout favoured compared to reducing the number of units in hidden layers?

Why is dropout favored over reducing the number of units in hidden layers for convolutional networks? If a large set of units leads to overfitting and dropping out "averages" the response units, why not just suppress units? I have read…
5 votes · 2 answers

Why did the L1/L2 regularization technique not improve my accuracy?

I am training a multilayer neural network with 146 samples (97 for the training set, 20 for the validation set, and 29 for the testing set). I am using: automatic differentiation, SGD method, fixed learning rate + momentum term, logistic…
5 votes · 1 answer

How does L2 regularization make weights smaller?

I'm learning logistic regression and $L_2$ regularization. The cost function looks like this: $$J(w) = -\sum_{i=1}^{n} \left(y^{(i)}\log\left(\phi(z^{(i)})\right)+(1-y^{(i)})\log\left(1-\phi(z^{(i)})\right)\right)$$ And the regularization term is added. ($\lambda$ is a…
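
For context, a standard way to write the regularized cost and the resulting gradient-descent update (notation follows the question; $J_0$ is the unregularized cost and $\eta$ the learning rate):

$$J(w) = J_0(w) + \frac{\lambda}{2}\lVert w\rVert^2, \qquad w \leftarrow w - \eta\left(\nabla J_0(w) + \lambda w\right) = (1 - \eta\lambda)\,w - \eta\,\nabla J_0(w)$$

Each step multiplies the weights by $(1 - \eta\lambda) < 1$ before the usual gradient update, which is why $L_2$ regularization is also known as weight decay.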
4 votes · 1 answer

What is the best way to combine or weight multiple losses with gradient descent?

I am optimizing a neural network with Adam using 3 different losses. Their scales are very different, and the current method is either to sum the losses and clip the gradient or to manually weight them within the sum. Something like:…
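
For illustration, a minimal PyTorch-style sketch of the "manually weight within the sum, then clip" approach; the weights and the loss/optimizer/model objects are hypothetical placeholders:

```python
import torch

# Hypothetical fixed weights, chosen so the weighted terms have similar scale.
w_a, w_b, w_c = 1.0, 0.1, 0.01

total_loss = w_a * loss_a + w_b * loss_b + w_c * loss_c  # weighted sum of the 3 losses
optimizer.zero_grad()
total_loss.backward()
# Clip the global gradient norm so one badly scaled loss cannot dominate a step.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```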
4 votes · 2 answers

How does Regularization Reduce Overfitting?

As I understand it, this is the general summary of the regularization-overfitting problem: the classical "bias-variance tradeoff" suggests that complicated models (i.e. models with more parameters, e.g. neural networks with many layers/weights) are…
4 votes · 0 answers

When is using weight regularization bad?

Regularization of weights (e.g. L1 or L2) keeps them small and standardized, which can help reduce data overfitting. From this article, regularization sounds favorable in many cases, but is it always encouraged? Are there scenarios in which it…
4 votes · 1 answer

Why does L1 regularization yield sparse features?

In contrast to L2 regularization, L1 regularization usually yields sparse feature vectors, and most feature weights are zero. What's the reason for this? Could someone explain it mathematically and/or provide some intuition (maybe…
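
One standard piece of intuition comes from the one-dimensional proximal problem (a sketch, not a full derivation). With an $L_1$ penalty, $\min_w \tfrac{1}{2}(w - v)^2 + \lambda\lvert w\rvert$ is solved by soft-thresholding:

$$w^\star = \operatorname{sign}(v)\,\max(\lvert v\rvert - \lambda,\ 0)$$

so every coordinate with $\lvert v\rvert \le \lambda$ is set exactly to zero. The analogous $L_2$ problem $\min_w \tfrac{1}{2}(w - v)^2 + \tfrac{\lambda}{2}w^2$ gives $w^\star = v/(1+\lambda)$: every coordinate shrinks, but none reaches zero.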
4 votes · 1 answer

Is there a way to ensure that my model is able to recognize an unseen example?

My question is more theoretical than practical. Let's say that I am training my cat classifier with a dataset that I feel is pretty representative of cat images in general. But then a new breed of cat is created that is distinct from other cats and…
4 votes · 1 answer

What is the $\ell_{2, 1}$ norm?

I'm reading this paper and it says: In this paper, we present a multi-class embedded feature selection method called sparse optimal scoring with adjustment (SOSA), which is capable of addressing the data heterogeneity issue. We propose to…
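
For reference, a common definition (conventions vary; some papers sum over columns instead of rows): for a matrix $W \in \mathbb{R}^{d \times k}$ with rows $w_i$,

$$\lVert W \rVert_{2,1} = \sum_{i=1}^{d} \lVert w_i \rVert_2 = \sum_{i=1}^{d} \sqrt{\sum_{j=1}^{k} W_{ij}^2}$$

that is, an $\ell_2$ norm within each row followed by an $\ell_1$ norm across rows. Penalizing it tends to zero out entire rows at once, which is why it shows up in feature-selection methods.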
3 votes · 0 answers

Regarding L0 sparsification of DNNs proposed by Louizos, Kingma and Welling

I am reading the paper on $\ell_0$ regularization of DNNs by Louizos, Welling and Kingma (2017) (link to arXiv). In Section 2.1 the authors define the cost function as follows: $$ \mathcal{R}\left( \tilde{\theta}, \pi \right) =…
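
The expectation-form objective is cut off above; for orientation, the underlying $\ell_0$-penalized objective that such methods relax (a standard statement, not the paper's exact equation) is

$$\min_\theta\ \frac{1}{N}\sum_{i=1}^{N} \mathcal{L}\big(f(x_i;\theta),\, y_i\big) + \lambda\,\lVert\theta\rVert_0, \qquad \lVert\theta\rVert_0 = \sum_{j}\mathbb{1}\left[\theta_j \neq 0\right],$$

which the paper makes amenable to gradient descent by attaching stochastic gates to the parameters and penalizing the expected number of open gates.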
3 votes · 1 answer

How does dropout work during backpropagation?

I've searched for an answer to this and read several scientific articles on the subject, but I can't find a practical explanation of how dropout actually drops nodes in an algorithm. I've read that dropout zeros out the activation function for…
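
To make the mechanism concrete, here is a minimal NumPy sketch of inverted dropout (an illustration, not any particular framework's implementation): the mask sampled in the forward pass is reused in the backward pass, so dropped units receive zero gradient.

```python
import numpy as np

def dropout_forward(a, p_drop, rng):
    # Keep each unit with probability (1 - p_drop); rescale survivors by
    # 1/(1 - p_drop) so the expected activation is unchanged (inverted dropout).
    mask = (rng.random(a.shape) >= p_drop) / (1.0 - p_drop)
    return a * mask, mask

def dropout_backward(grad_out, mask):
    # Dropped units contributed nothing forward, so their gradient is zero;
    # kept units pass the gradient through with the same rescaling.
    return grad_out * mask

# Usage sketch:
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 3))
out, mask = dropout_forward(a, p_drop=0.5, rng=rng)
grad_a = dropout_backward(np.ones_like(out), mask)
```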