For questions about mini-batch (or batch) gradient descent, a variant of gradient descent in which each parameter update is computed from a batch of typically more than one input-label pair.
Questions tagged [mini-batch-gradient-descent]
24 questions
12
votes
2 answers
What exactly is averaged when doing batch gradient descent?
I have a question about how the averaging works when doing mini-batch gradient descent.
I think I now understand the general gradient descent algorithm, but only for online learning. When doing mini-batch gradient descent, do I have to:
forward…
Ben
- 455
- 3
- 11
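A minimal sketch of what the averaging usually amounts to (a linear model with MSE loss is assumed here purely for illustration; it is not from the question): you forward the whole mini-batch, and the averaged per-sample gradients equal the gradient of the mean loss, giving one parameter update per batch.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 5))   # mini-batch of 32 samples, 5 features
y = rng.normal(size=(32,))     # targets
w = np.zeros(5)
lr = 0.1

pred = X @ w                   # forward pass for the whole batch
err = pred - y                 # per-sample errors, shape (32,)
grad = X.T @ err / len(X)      # average of the per-sample gradients, shape (5,)
w -= lr * grad                 # a single parameter update for the batch
```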
10
votes
2 answers
Is neural network training done one-by-one?
I'm trying to learn neural networks by watching this series of videos and implementing a simple neural network in Python.
Here's one of the things I'm wondering about: I'm training the neural network on sample data, and I've got 1,000 samples. The…
Ram Rachum
- 260
- 1
- 11
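For contrast, here is a hedged sketch (a toy linear model, not the video's network) of the two ways the 1,000 samples could be consumed: one update per sample (online/stochastic) versus one update per mini-batch.

```python
import numpy as np

rng = np.random.default_rng(1)
X, y = rng.normal(size=(1000, 5)), rng.normal(size=(1000,))
lr = 0.01

# (a) one-by-one: one parameter update per sample
w = np.zeros(5)
for xi, yi in zip(X, y):
    w -= lr * xi * (xi @ w - yi)                   # gradient for a single sample

# (b) mini-batches: one parameter update per batch of 50 samples
w = np.zeros(5)
for start in range(0, len(X), 50):
    Xb, yb = X[start:start + 50], y[start:start + 50]
    w -= lr * Xb.T @ (Xb @ w - yb) / len(Xb)       # averaged gradient, one update
```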
10
votes
1 answer
Is back-propagation applied for each data point or for a batch of data points?
I am new to deep learning and trying to understand the concept of back-propagation. I have a question about when back-propagation is applied. Assume that I have a training data set of 1000 images of handwritten letters.
Is back-propagation…
Maanu
- 245
- 1
- 2
- 7
3
votes
1 answer
When using experience replay, do we update the parameters for all samples of the mini-batch or for each sample in the mini-batch separately?
I've been reading Google's DeepMind Atari paper and I'm trying to understand how to implement experience replay.
Do we update the parameters $\theta$ of function $Q$ once for all the samples of the minibatch, or do we do that for each sample of the…
user491626
- 241
- 1
- 5
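A hedged sketch of the usual DQN-style answer (names like `q_net` and `replay_buffer` are assumptions, not from the paper, and the Nature version would use a separate target network): the TD loss is averaged over the sampled mini-batch and the parameters $\theta$ are updated once for all samples, not once per sample.

```python
import random
import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.SGD(q_net.parameters(), lr=1e-3)
gamma = 0.99

# replay_buffer: list of (state, action, reward, next_state, done) tuples
replay_buffer = [(torch.randn(4), random.randrange(2), random.random(),
                  torch.randn(4), False) for _ in range(1000)]

batch = random.sample(replay_buffer, 32)
states = torch.stack([b[0] for b in batch])
actions = torch.tensor([b[1] for b in batch])
rewards = torch.tensor([b[2] for b in batch], dtype=torch.float32)
next_states = torch.stack([b[3] for b in batch])
dones = torch.tensor([b[4] for b in batch], dtype=torch.float32)

with torch.no_grad():                 # TD targets computed without tracking gradients
    targets = rewards + gamma * (1 - dones) * q_net(next_states).max(dim=1).values
q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
loss = nn.functional.mse_loss(q_values, targets)   # averaged over the mini-batch

optimizer.zero_grad()
loss.backward()
optimizer.step()                      # ONE update of theta for all 32 samples
```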
3
votes
3 answers
What is the difference between batch and mini-batch gradient descent?
I am learning deep learning from Andrew Ng's tutorial Mini-batch Gradient Descent.
Can anyone explain the similarities and dissimilarities between batch GD and mini-batch GD?
DRV
- 1,843
- 3
- 15
- 20
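A minimal sketch of the distinction (a toy linear model, not Andrew Ng's code): batch gradient descent uses all samples for every update, so there is one update per epoch; mini-batch gradient descent uses a small subset per update, so it makes many more (noisier) updates per pass over the data.

```python
import numpy as np

rng = np.random.default_rng(3)
X, y = rng.normal(size=(1000, 5)), rng.normal(size=(1000,))
lr = 0.05

# batch gradient descent: one update per epoch, gradient over all 1000 samples
w = np.zeros(5)
for epoch in range(100):
    w -= lr * X.T @ (X @ w - y) / len(X)

# mini-batch gradient descent: one update per batch of 64, many updates per epoch
w = np.zeros(5)
for epoch in range(100):
    for start in range(0, len(X), 64):
        Xb, yb = X[start:start + 64], y[start:start + 64]
        w -= lr * Xb.T @ (Xb @ w - yb) / len(Xb)
```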
2
votes
2 answers
What's the rationale behind mini-batch gradient descent?
I am reading a book that states
As the mini-batch size increases, the gradient computed is closer to the 'true' gradient
So, I assume that they are saying that mini-batch training only focuses on decreasing the cost function in a certain 'plane',…
ngc1300
- 133
- 5
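A quick numerical check of the book's claim (a sketch, not a proof, with an arbitrary linear model): the mini-batch gradient is an average of per-sample gradients, so its spread around the full-batch ("true") gradient shrinks as the batch size grows.

```python
import numpy as np

rng = np.random.default_rng(2)
X, y = rng.normal(size=(10000, 5)), rng.normal(size=(10000,))
w = rng.normal(size=5)

per_sample = X * (X @ w - y)[:, None]       # per-sample gradients, shape (N, 5)
true_grad = per_sample.mean(axis=0)         # full-batch ("true") gradient

for m in (1, 10, 100, 1000):
    idx = rng.choice(len(X), size=(500, m))  # 500 random mini-batches of size m
    est = per_sample[idx].mean(axis=1)       # mini-batch gradient estimates
    rmse = np.sqrt(((est - true_grad) ** 2).sum(axis=1)).mean()
    print(f"batch size {m:4d}: mean distance to true gradient = {rmse:.3f}")
```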
2
votes
1 answer
Is it possible to use Mini-Batches with Adam optimization?
Is it possible/advisable to use mini-batch-like accumulation with Adam optimization?
How would that work?
Do I accumulate the loss function for each sample in the batch and then run Adam, or should I divide the loss by the number of samples in the batch…
CoffeDeveloper
- 291
- 1
- 7
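A hedged sketch of the common pattern (plain PyTorch, not tied to any particular framework recipe): accumulate gradients over several small batches, scaling each loss by 1/accum_steps so the accumulated gradient matches one large averaged batch, then take a single Adam step.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
accum_steps = 4

optimizer.zero_grad()
for step in range(accum_steps):
    x, y = torch.randn(8, 10), torch.randn(8, 1)   # one small batch
    loss = loss_fn(model(x), y) / accum_steps      # divide; don't just sum the losses
    loss.backward()                                # gradients accumulate in .grad
optimizer.step()                                   # one Adam update for all 4 batches
optimizer.zero_grad()
```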
2
votes
1 answer
Why is it called "batch" gradient descent if it consumes the full dataset before calculating the gradient?
While training a neural network, we can follow three methods: batch gradient descent, mini-batch gradient descent and stochastic gradient descent.
For this question, assume that your dataset has $n$ training samples and we divided it into $k$…
hanugm
- 4,102
- 3
- 29
- 63
2
votes
1 answer
When is the loss calculated, and when does the back-propagation take place?
I have read different articles and keep getting confused about this point. I'm not sure whether the literature is giving mixed information or I'm interpreting it incorrectly.
So, from reading articles, my (loose) understanding of the following terms is as…
Hazzaldo
- 309
- 3
- 9
1
vote
1 answer
How to normalize the gradient value with respect to the batch size?
A = (m x n) - input
B = (n x k) - weight
output = A @ B = (m x k)
outputloss = (m x k)
doutput/dB = A.T @ outputloss = (n x m) @ (m x k) = (n x k)
So, as we can see, m (the batch size) is the inner dimension and it drops out in the product. But the value of the gradient is…
Тима
- 39
- 4
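A sketch following the shapes in the question (A: m x n input, B: n x k weight; the MSE loss here is an assumption for illustration). Whether you divide dB by the batch size m depends on how the loss is defined: if the loss is a mean over the batch, the 1/m factor is already inside the output gradient; if it is a sum, you scale the accumulated gradient by 1/m yourself.

```python
import numpy as np

m, n, k = 32, 5, 3
A = np.random.randn(m, n)          # input
B = np.random.randn(n, k)          # weight
target = np.random.randn(m, k)

output = A @ B                      # (m, k)
# mean-squared-error averaged over the batch:
output_grad = 2 * (output - target) / m   # dLoss/dOutput; the 1/m lives here
dB = A.T @ output_grad                    # (n, k): m is summed out, already scaled
```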
1
vote
1 answer
Why use gradient accumulation?
I know that gradient accumulation is (1) a way to reduce memory usage while still training with a large effective batch size, and (2) a way to reduce the noise of the gradient compared to SGD, thus smoothing the training process.
However, I wonder what…
Cyrus
- 111
- 2
1
vote
1 answer
What is the order of execution of steps in back-propagation algorithm in a neural network?
I am a machine learning newbie. I am trying to understand the back-propagation algorithm. I have a training dataset of 60 instances/records.
What is the correct order of the process? This one?
Forward pass of the first instance. Calculate the…
gokul
- 53
- 5
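A sketch of the usual per-mini-batch ordering for a 60-instance dataset (the model, batch size, and learning rate are assumptions, not from the question): forward pass, then loss, then backward pass, then one parameter update, repeated batch by batch and epoch by epoch.

```python
import numpy as np

X, y = np.random.randn(60, 4), np.random.randn(60, 1)
W = np.random.randn(4, 1)
lr, batch_size = 0.01, 10

for epoch in range(5):
    for start in range(0, len(X), batch_size):
        Xb, yb = X[start:start + batch_size], y[start:start + batch_size]
        pred = Xb @ W                              # 1) forward pass for the batch
        loss = ((pred - yb) ** 2).mean()           # 2) compute the (averaged) loss
        grad = 2 * Xb.T @ (pred - yb) / len(Xb)    # 3) backward pass (gradient of the loss)
        W -= lr * grad                             # 4) update the parameters once per batch
```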
1
vote
0 answers
Why would one prefer the gradient of the sum rather than the sum of the gradients?
When gradients are aggregated over mini-batches, I sometimes see formulations like this, e.g., in the "Deep Learning" book by Goodfellow et al.
$$\mathbf{g} = \frac{1}{m} \nabla_{\mathbf{w}} \left( \sum\limits_{i=1}^{m} L \left( f \left(…
Eddie C
- 11
- 1
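A small check (a sketch, not from the book, with an arbitrary per-sample loss standing in for $L(f(\cdot))$): by linearity of the gradient, the gradient of the sum equals the sum of the per-sample gradients, so the two formulations differ only in how the computation is grouped, not in the result.

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)
xs = [torch.randn(3) for _ in range(4)]

def per_sample_loss(x, w):
    return (x @ w) ** 2                 # stand-in for L(f(x; w))

# gradient of the (averaged) sum
loss_sum = sum(per_sample_loss(x, w) for x in xs) / len(xs)
g1 = torch.autograd.grad(loss_sum, w)[0]

# (averaged) sum of the per-sample gradients
g2 = sum(torch.autograd.grad(per_sample_loss(x, w), w)[0] for x in xs) / len(xs)

print(torch.allclose(g1, g2))           # True
```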
1
vote
1 answer
Is it possible to use stochastic gradient descent at the beginning, then switch to batch gradient descent with only a few training examples?
Batch gradient descent is extremely slow for large datasets, but it can find the lowest possible value for the cost function. Stochastic gradient descent is relatively fast, but it kind of finds the general area where convergence happens and it kind…
Adith Raghav
- 121
- 3
1
vote
2 answers
When would it make sense to perform a gradient descent step for each term of a loss function with multiple terms?
I am training a neural network using a mini-batch gradient descent algorithm.
Now, consider the following loss function, which is composed of 2 terms.
$$L = L_{\text{MSE}} + L_{\text{regularization}} \label{1}\tag{1}$$
As far as I understand,…
hanugm
- 4,102
- 3
- 29
- 63
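A hedged sketch contrasting the two options for $L = L_{\text{MSE}} + L_{\text{regularization}}$ (an illustration with an assumed toy model, not a recommendation from the question or its answers): the usual choice is one step on the combined loss, since the gradients of the terms simply add; taking a separate step per term is not equivalent, because the second step sees parameters already moved by the first.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
x, y = torch.randn(16, 10), torch.randn(16, 1)

def l_mse():
    return nn.functional.mse_loss(model(x), y)

def l_reg():
    return 1e-4 * sum(p.pow(2).sum() for p in model.parameters())

# Usual choice: one step on the combined loss (term gradients simply add).
opt.zero_grad()
(l_mse() + l_reg()).backward()
opt.step()

# Alternative: a separate step per term; the second step uses parameters
# already moved by the first, so this is not equivalent to the combined step.
opt.zero_grad()
l_mse().backward()
opt.step()
opt.zero_grad()
l_reg().backward()
opt.step()
```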