Questions tagged [batch-learning]

For questions about machine learning algorithms that learn from batches of data rather than one example at a time (the latter is known as online learning). Batch learning is also called offline learning, and it is the most common way of training machine learning models.

17 questions
5
votes
1 answer

Why would a VAE train much better with batch sizes closer to 1 than with a batch size of 100+?

I've been training a VAE to reconstruct human names. When I train it with a batch size of 100+, after about 5 hours of training it tends to just output the same thing regardless of the input, and I'm using teacher forcing as well. When I use a lower…
user8714896 (825 reputation; 1 gold, 9 silver, 24 bronze badges)
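The symptom described (identical outputs regardless of input) is commonly called posterior collapse. For reference, here is a minimal sketch of the per-batch VAE objective; the tensor names are hypothetical, not the asker's code:

```python
# Minimal sketch (hypothetical tensors) of the per-batch VAE objective.
# With a large batch the averaged gradient is much smoother, which can let
# the decoder settle into one "safe" output (posterior collapse) faster.
import torch
import torch.nn.functional as F

def vae_batch_loss(recon_logits, targets, mu, logvar):
    # Reconstruction term, averaged over the batch.
    recon = F.cross_entropy(
        recon_logits.view(-1, recon_logits.size(-1)),
        targets.view(-1),
        reduction="mean",
    )
    # KL divergence of q(z|x) = N(mu, sigma^2) from the unit Gaussian prior.
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```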
4
votes
1 answer

Is batch learning with gradient descent equivalent to "rehearsal" in incremental learning?

I am learning about incremental learning and read that rehearsal learning is retraining with old data. In essence, isn't this the exact same thing as batch learning (with stochastic gradient descent)? You train a model by passing in batches of data…
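The distinction the question is probing can be made concrete: plain mini-batch SGD repeatedly samples from one fixed dataset, while rehearsal deliberately mixes stored old examples into batches of newly arriving data. A minimal sketch, with all names hypothetical:

```python
# Sketch of "rehearsal": each update mixes stored old examples into the
# batch of new data. All names here are hypothetical.
import random

def rehearsal_batches(new_data, memory, batch_size, replay_fraction=0.5):
    n_replay = int(batch_size * replay_fraction)
    n_new = batch_size - n_replay
    for i in range(0, len(new_data), n_new):
        batch = new_data[i:i + n_new]
        if memory:
            batch = batch + random.sample(memory, min(n_replay, len(memory)))
        yield batch  # train on this mixed batch with SGD as usual
```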
4
votes
2 answers

Why is Batch Gradient Descent performing worse than Stochastic and Mini-Batch Gradient Descent?

I have implemented a neural network from scratch (using only numpy) and I am having trouble understanding why the results are so different between Stochastic/Mini-Batch Gradient Descent and Batch Gradient Descent: The training data is a collection…
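For context, the three variants differ only in how many examples feed each parameter update; with a full batch you get one update per epoch instead of many, which at a fixed learning rate often explains the gap. A sketch in numpy, with `grad` as a hypothetical gradient function:

```python
# Sketch: the three variants differ only in how many examples feed each
# parameter update. grad(w, Xb, yb) is a hypothetical gradient function.
import numpy as np

def sgd_epoch(w, X, y, grad, lr, batch_size):
    idx = np.random.permutation(len(X))
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]
        w = w - lr * grad(w, X[b], y[b])
    return w

# batch_size=1        -> stochastic GD: many noisy updates per epoch
# batch_size=32       -> mini-batch GD
# batch_size=len(X)   -> batch GD: one smooth update per epoch
```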
3
votes
2 answers

Are batches useful for REINFORCE without strong episode cutoffs?

I'm following along with PyTorch's example implementations (found here) of reinforcement learning algorithms, which happen to be largely REINFORCE (vanilla policy gradient) based, and I notice they don't use batches. This leads me to ask: are batch…
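For reference, "batching" REINFORCE usually means averaging the policy-gradient loss over several complete episodes before a single optimizer step. A minimal sketch, assuming per-episode tensors of log-probabilities and returns:

```python
# Sketch: batched REINFORCE averages the policy-gradient loss over several
# complete episodes before one optimizer step. log_probs[i] / returns[i]
# are per-episode tensors; the names are hypothetical.
import torch

def reinforce_batch_loss(log_probs, returns):
    losses = []
    for lp, G in zip(log_probs, returns):   # one entry per episode
        losses.append(-(lp * G).sum())      # REINFORCE: -sum_t log pi * G_t
    return torch.stack(losses).mean()       # average over the episode batch
```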
2
votes
0 answers

What's the most efficient way of performing batched training of Causal Language Models?

I have seen a number of ways to train (yes, train, not fine-tune) these models efficiently with batches. I will illustrate these techniques with the following example dataset and context window: Context window: ----------------- Data samples: 1.…
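Without reproducing the asker's truncated example, the two approaches most commonly contrasted are padding each sample to the context window (with an attention mask) and packing samples into a continuous stream cut into fixed-length blocks. A rough sketch, with hypothetical token-id lists:

```python
# Sketch of the two usual options (token ids and pad_id are hypothetical):
# (a) pad each sample to the context window and mask the padding, or
# (b) pack/concatenate samples and split into fixed-length blocks.
import torch

def pad_batch(samples, ctx_len, pad_id):
    batch, mask = [], []
    for ids in samples:
        ids = ids[:ctx_len]
        pad = ctx_len - len(ids)
        batch.append(ids + [pad_id] * pad)
        mask.append([1] * len(ids) + [0] * pad)
    return torch.tensor(batch), torch.tensor(mask)

def pack_blocks(samples, ctx_len):
    stream = [t for ids in samples for t in ids]   # concatenate everything
    n = len(stream) // ctx_len
    return torch.tensor(stream[:n * ctx_len]).view(n, ctx_len)
```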
2
votes
1 answer

How to sample the tuples during the initial time steps of the DDPG algorithm?

I am facing an issue in understanding the following line from the pseudocode of the DDPG algorithm: "Sample a random minibatch of $N$ transitions $(s_i, a_i, r_i, s_{i+1})$ from $R$". Here $N$ is a hyperparameter that is equal to the number of…
hanugm (4,102 reputation; 3 gold, 29 silver, 63 bronze badges)
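The usual reading of that pseudocode line: during the initial time steps the agent only collects transitions, and minibatch updates begin once the replay buffer holds at least $N$ of them. A minimal sketch, names hypothetical:

```python
# Sketch of the usual answer: updates simply don't start until the replay
# buffer holds at least N transitions (all names hypothetical).
import random
from collections import deque

buffer = deque(maxlen=100_000)
N = 64  # minibatch size from the pseudocode

def maybe_sample():
    if len(buffer) < N:              # initial time steps: act, but no update
        return None
    return random.sample(buffer, N)  # uniform minibatch of transitions
```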
2
votes
0 answers

Methodologies for choosing the best samples for a neural network to learn from

Just an idea I am sure I read in a book some time ago, but I can't remember the name. Given a very large dataset and a neural network (or anything that can learn via something like stochastic gradient descent), passing a subset of samples to modify…
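One family of methods matching this description is hard-example mining, closely related to importance sampling of training data: score a candidate pool with the current model and train on the highest-loss samples. A hedged sketch, with all names hypothetical:

```python
# Sketch of one such methodology (hard-example mining): score a candidate
# pool by current loss and train on the hardest samples. Names hypothetical.
import numpy as np

def pick_hardest(model_loss, X_pool, y_pool, k):
    losses = model_loss(X_pool, y_pool)   # per-example losses, one per row
    hardest = np.argsort(losses)[-k:]     # indices of the k largest losses
    return X_pool[hardest], y_pool[hardest]
```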
2
votes
1 answer

Offline/Batch Reinforcement Learning: when to stop training and what agent to select

Context: My team and I are working on an RL problem for a specific application. We have data collected from user interactions (states, actions, rewards, etc.). It is too costly for us to emulate agents. We therefore decided to concentrate on Offline…
1
vote
1 answer

Batch-wise inference to speed up MuZero's MCTS

Context: I've implemented MuZero for the game Tic-tac-toe. Unfortunately, the self-play and training are very slow (around 10 hours until it plays quite well). I ran the Python profiler to find the parts that take the most time. The result is that…
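The common remedy is to batch the network evaluations: gather leaf states from many positions or parallel simulations, run one forward pass, and scatter the results back to the trees. A rough sketch, where `net` and the state encoding are assumptions:

```python
# Sketch of the usual speed-up: collect leaf states from many game positions
# (or parallel simulations), run the network once on the whole batch, then
# scatter the results back. `net` and the state encoding are hypothetical.
import torch

def evaluate_leaves(net, leaf_states):
    batch = torch.stack(leaf_states)   # one forward pass instead of many
    with torch.no_grad():
        policies, values = net(batch)
    return policies, values            # distribute back to the search trees
```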
1
vote
1 answer

Batching together similar length sequences to avoid padding and packing

I am training an RNN in PyTorch to produce captions for images. It's a pretty standard architecture – the image is processed by a pre-trained InceptionV3 to extract features, the recurrent module processes the words seen so far and then its result…
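A minimal sketch of length bucketing: sort examples so each batch contains (nearly) equal-length sequences and needs little or no padding. In practice the resulting batches are usually shuffled afterwards so training order is not strictly by length:

```python
# Sketch of length bucketing: sort by caption length so each batch holds
# (nearly) equal-length sequences. length_of is a hypothetical key function.
def length_bucketed_batches(examples, batch_size, length_of):
    ordered = sorted(examples, key=length_of)
    for i in range(0, len(ordered), batch_size):
        yield ordered[i:i + batch_size]
```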
1
vote
0 answers

How is it possible to use batches of data from within the same sequence with an LSTM?

ETA (more concise wording): Why do some implementations use batches of data taken from within the same sequence? Does this not make the cell state useless? Using the example of an LSTM, it has a hidden state and a cell state. These states are updated…
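What such implementations usually do is truncated backpropagation through time: the long sequence is cut into windows, and the hidden and cell states are carried (detached) across windows, so the cell state is not made useless. A minimal PyTorch sketch, with sizes chosen arbitrarily:

```python
# Sketch of truncated BPTT: a long sequence is cut into windows, and the
# hidden/cell state is carried (detached) across windows, so the state is
# not "useless" even though training proceeds in chunks.
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
long_seq = torch.randn(1, 1000, 8)            # one long sequence
state = None
for t in range(0, 1000, 50):                  # 50-step training windows
    out, state = lstm(long_seq[:, t:t + 50], state)
    state = tuple(s.detach() for s in state)  # keep values, cut the graph
```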
1
vote
1 answer

Why does the output shape of a Dense layer contain a batch size?

I understand that the batch size is the number of examples you pass into the neural network (NN). If the batch size is 10, it means you feed the NN 10 examples at once. Assuming I have an NN with a single Dense layer. This Dense layer of 20 units…
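For reference, the batch dimension shows up as None because the layer accepts any batch size; only the per-example shape is fixed. A minimal Keras sketch, where the 10-feature input is an assumption:

```python
# Sketch: Keras reports the batch dimension as None because the layer works
# for any batch size; only the per-example shape is fixed.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(10,)),   # 10 features per example (assumed)
    tf.keras.layers.Dense(20),
])
print(model.output_shape)  # (None, 20): None is the yet-unknown batch size
```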
1
vote
1 answer

What is the difference between batches in deep Q learning and supervised learning?

How is the batch loss calculated in both DQNs and simple classifiers? From what I understand, in a classifier, a common method is to sample a mini-batch, calculate the loss for every example, calculate the average loss over the whole batch,…
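In short, the DQN batch loss follows the same recipe as the classifier, except the "label" is a bootstrapped TD target computed from the same batch. A minimal sketch, with all names hypothetical:

```python
# Sketch: a DQN batch loss mirrors the classifier recipe, except the "label"
# is a bootstrapped TD target built from the same batch. Names hypothetical.
import torch
import torch.nn.functional as F

def dqn_batch_loss(q_net, target_net, batch, gamma):
    s, a, r, s2, done = batch
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a) per example
    with torch.no_grad():                               # frozen TD targets
        target = r + gamma * target_net(s2).max(1).values * (1 - done)
    return F.mse_loss(q, target)   # mean over the batch, as in a classifier
```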
0
votes
0 answers

Why does chunked dataset training give different results compared to full-batch training in my Siren model?

I'm implementing a Siren model for audio reconstruction using PyTorch. My first approach processes the entire dataset in a single batch, while my second approach loads and trains the dataset in smaller chunks to avoid memory overload. Approach 1…
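One likely source of the difference: a single full-batch step uses the gradient of the mean loss over everything, whereas sequential chunked steps each change the weights that the next chunk sees. Accumulating chunk gradients before stepping reproduces the full-batch update; a sketch, assuming `loss_fn` averages over its batch:

```python
# Sketch: accumulate per-chunk gradients (weighted by chunk size) and take
# one step, reproducing the full-batch update without the full-batch memory.
import torch

def full_batch_equivalent_step(model, optimizer, loss_fn, chunks):
    # chunks: list of (inputs, targets) pairs covering the whole dataset
    optimizer.zero_grad()
    total = sum(x.shape[0] for x, _ in chunks)
    for x, y in chunks:
        loss = loss_fn(model(x), y) * (x.shape[0] / total)
        loss.backward()                      # gradients accumulate
    optimizer.step()                         # one full-batch-style update
```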
0
votes
1 answer

Should batch size be as large as possible, even the entire training set?

Should batch size be as large as possible, even the entire training set (if memory allows for it)?