Questions tagged [batch-learning]

For questions about machine learning algorithms that learn from batches of data rather than one example at a time (the latter is known as online learning). Batch learning is also called offline learning, and it is the most common way of training machine learning models.

17 questions
5
votes
1 answer

Why would a VAE train much better with batch sizes closer to 1 than with a batch size of 100+?

I've been training a VAE to reconstruct human names. When I train it with a batch size of 100+, after about 5 hours of training it tends to just output the same thing regardless of the input, and I'm using teacher forcing as well. When I use a lower…
user8714896 (825 reputation; 1 gold, 9 silver, 24 bronze badges)
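The symptom described (identical outputs regardless of input) is commonly called posterior collapse. For reference, here is a minimal sketch of the per-batch VAE objective; the tensor names are hypothetical, not the asker's code:

```python
# Minimal sketch (hypothetical tensors) of the per-batch VAE objective.
# With a large batch the averaged gradient is much smoother, which can let
# the decoder settle into one "safe" output (posterior collapse) faster.
import torch
import torch.nn.functional as F

def vae_batch_loss(recon_logits, targets, mu, logvar):
    # Reconstruction term, averaged over the batch.
    recon = F.cross_entropy(
        recon_logits.view(-1, recon_logits.size(-1)),
        targets.view(-1),
        reduction="mean",
    )
    # KL divergence of q(z|x) = N(mu, sigma^2) from the unit Gaussian prior.
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```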
4
votes
1 answer

Is batch learning with gradient descent equivalent to "rehearsal" in incremental learning?

I am learning about incremental learning and read that rehearsal learning is retraining with old data. In essence, isn't this the exact same thing as batch learning (with stochastic gradient descent)? You train a model by passing in batches of data…
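The distinction the question is probing can be made concrete: plain mini-batch SGD repeatedly samples from one fixed dataset, while rehearsal deliberately mixes stored old examples into batches of newly arriving data. A minimal sketch, with all names hypothetical:

```python
# Sketch of "rehearsal": each update mixes stored old examples into the
# batch of new data. All names here are hypothetical.
import random

def rehearsal_batches(new_data, memory, batch_size, replay_fraction=0.5):
    n_replay = int(batch_size * replay_fraction)
    n_new = batch_size - n_replay
    for i in range(0, len(new_data), n_new):
        batch = new_data[i:i + n_new]
        if memory:
            batch = batch + random.sample(memory, min(n_replay, len(memory)))
        yield batch  # train on this mixed batch with SGD as usual
```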
4
votes
2 answers

Why is Batch Gradient Descent performing worse than Stochastic and Mini-Batch Gradient Descent?

I have implemented a neural network from scratch (using only numpy) and I am having trouble understanding why the results are so different between Stochastic/Mini-Batch Gradient Descent and Batch Gradient Descent: The training data is a collection…
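For context, the three variants differ only in how many examples feed each parameter update; with a full batch you get one update per epoch instead of many, which at a fixed learning rate often explains the gap. A sketch in numpy, with `grad` as a hypothetical gradient function:

```python
# Sketch: the three variants differ only in how many examples feed each
# parameter update. grad(w, Xb, yb) is a hypothetical gradient function.
import numpy as np

def sgd_epoch(w, X, y, grad, lr, batch_size):
    idx = np.random.permutation(len(X))
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]
        w = w - lr * grad(w, X[b], y[b])
    return w

# batch_size=1        -> stochastic GD: many noisy updates per epoch
# batch_size=32       -> mini-batch GD
# batch_size=len(X)   -> batch GD: one smooth update per epoch
```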
3
votes
2 answers

Are batches useful for REINFORCE without strong episode cutoffs?

I'm following along with PyTorch's example implementations (found here) of reinforcement learning algorithms, which happen to be largely REINFORCE (vanilla policy gradient) based, and I notice they don't use batches. This leads me to ask: are batch…
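For reference, "batching" REINFORCE usually means averaging the policy-gradient loss over several complete episodes before a single optimizer step. A minimal sketch, assuming per-episode tensors of log-probabilities and returns:

```python
# Sketch: batched REINFORCE averages the policy-gradient loss over several
# complete episodes before one optimizer step. log_probs[i] / returns[i]
# are per-episode tensors; the names are hypothetical.
import torch

def reinforce_batch_loss(log_probs, returns):
    losses = []
    for lp, G in zip(log_probs, returns):   # one entry per episode
        losses.append(-(lp * G).sum())      # REINFORCE: -sum_t log pi * G_t
    return torch.stack(losses).mean()       # average over the episode batch
```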
2
votes
0 answers

What's the most efficient way of performing batched training of Causal Language Models?

I have seen a number of ways to train (yes, train, not fine-tune) these models efficiently with batches. I will illustrate these techniques with the following example dataset and context window: Context window: ----------------- Data samples: 1.…
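Without reproducing the asker's truncated example, the two approaches most commonly contrasted are padding each sample to the context window (with an attention mask) and packing samples into a continuous stream cut into fixed-length blocks. A rough sketch, with hypothetical token-id lists:

```python
# Sketch of the two usual options (token ids and pad_id are hypothetical):
# (a) pad each sample to the context window and mask the padding, or
# (b) pack/concatenate samples and split into fixed-length blocks.
import torch

def pad_batch(samples, ctx_len, pad_id):
    batch, mask = [], []
    for ids in samples:
        ids = ids[:ctx_len]
        pad = ctx_len - len(ids)
        batch.append(ids + [pad_id] * pad)
        mask.append([1] * len(ids) + [0] * pad)
    return torch.tensor(batch), torch.tensor(mask)

def pack_blocks(samples, ctx_len):
    stream = [t for ids in samples for t in ids]   # concatenate everything
    n = len(stream) // ctx_len
    return torch.tensor(stream[:n * ctx_len]).view(n, ctx_len)
```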
2
votes
1 answer

How to sample the tuples during the initial time steps of the DDPG algorithm?

I am facing an issue in understanding the following line from the pseudocode of the DDPG algorithm: "Sample a random minibatch of $N$ transitions $(s_i, a_i, r_i, s_{i+1})$ from $R$". Here $N$ is a hyperparameter that is equal to the number of…
hanugm (4,102 reputation; 3 gold, 29 silver, 63 bronze badges)
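The usual reading of that pseudocode line: during the initial time steps the agent only collects transitions, and minibatch updates begin once the replay buffer holds at least $N$ of them. A minimal sketch, names hypothetical:

```python
# Sketch of the usual answer: updates simply don't start until the replay
# buffer holds at least N transitions (all names hypothetical).
import random
from collections import deque

buffer = deque(maxlen=100_000)
N = 64  # minibatch size from the pseudocode

def maybe_sample():
    if len(buffer) < N:              # initial time steps: act, but no update
        return None
    return random.sample(buffer, N)  # uniform minibatch of transitions
```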
2
votes
0 answers

Methodologies for choosing the best samples for a neural network to learn from

Just an idea I am sure I read in a book some time ago, but I can't remember the name. Given a very large dataset and a neural network (or anything that can learn via something like stochastic gradient descent), passing a subset of samples to modify…
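One family of methods matching this description is hard-example mining, closely related to importance sampling of training data: score a candidate pool with the current model and train on the highest-loss samples. A hedged sketch, with all names hypothetical:

```python
# Sketch of one such methodology (hard-example mining): score a candidate
# pool by current loss and train on the hardest samples. Names hypothetical.
import numpy as np

def pick_hardest(model_loss, X_pool, y_pool, k):
    losses = model_loss(X_pool, y_pool)   # per-example losses, one per row
    hardest = np.argsort(losses)[-k:]     # indices of the k largest losses
    return X_pool[hardest], y_pool[hardest]
```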
2
votes
1 answer

Offline/Batch Reinforcement Learning: when to stop training and what agent to select

Context: My team and I are working on an RL problem for a specific application. We have data collected from user interactions (states, actions, rewards, etc.). It is too costly for us to emulate agents. We therefore decided to concentrate on Offline…
1
vote
1 answer

Batch-wise inference to speed up MuZero's MCTS

Context: I've implemented MuZero for the game Tic-tac-toe. Unfortunately, the self-play and training are very slow (around 10 hours until it plays quite well). I ran the Python profiler to find the parts that take the most time. The result is that…
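The common remedy is to batch the network evaluations: gather leaf states from many positions or parallel simulations, run one forward pass, and scatter the results back to the trees. A rough sketch, where `net` and the state encoding are assumptions:

```python
# Sketch of the usual speed-up: collect leaf states from many game positions
# (or parallel simulations), run the network once on the whole batch, then
# scatter the results back. `net` and the state encoding are hypothetical.
import torch

def evaluate_leaves(net, leaf_states):
    batch = torch.stack(leaf_states)   # one forward pass instead of many
    with torch.no_grad():
        policies, values = net(batch)
    return policies, values            # distribute back to the search trees
```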
1
vote
1 answer

Batching together similar length sequences to avoid padding and packing

I am training an RNN in PyTorch to produce captions for images. It's a pretty standard architecture – the image is processed by a pre-trained InceptionV3 to extract features, the recurrent module processes the words seen so far and then its result…
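A minimal sketch of length bucketing: sort examples so each batch contains (nearly) equal-length sequences and needs little or no padding. In practice the resulting batches are usually shuffled afterwards so training order is not strictly by length:

```python
# Sketch of length bucketing: sort by caption length so each batch holds
# (nearly) equal-length sequences. length_of is a hypothetical key function.
def length_bucketed_batches(examples, batch_size, length_of):
    ordered = sorted(examples, key=length_of)
    for i in range(0, len(ordered), batch_size):
        yield ordered[i:i + batch_size]
```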
1
vote
0 answers

How is it possible to use batches of data from within the same sequence with an LSTM?

ETA (more concise wording): Why do some implementations use batches of data taken from within the same sequence? Does this not make the cell state useless? Using the example of an LSTM, it has a hidden state and a cell state. These states are updated…
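What such implementations usually do is truncated backpropagation through time: the long sequence is cut into windows, and the hidden and cell states are carried (detached) across windows, so the cell state is not made useless. A minimal PyTorch sketch, with sizes chosen arbitrarily:

```python
# Sketch of truncated BPTT: a long sequence is cut into windows, and the
# hidden/cell state is carried (detached) across windows, so the state is
# not "useless" even though training proceeds in chunks.
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
long_seq = torch.randn(1, 1000, 8)            # one long sequence
state = None
for t in range(0, 1000, 50):                  # 50-step training windows
    out, state = lstm(long_seq[:, t:t + 50], state)
    state = tuple(s.detach() for s in state)  # keep values, cut the graph
```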
1
vote
1 answer

Why does the output shape of a Dense layer contain a batch size?

I understand that the batch size is the number of examples you pass into the neural network (NN). If the batch size is 10, it means you feed the NN 10 examples at once. Assuming I have an NN with a single Dense layer. This Dense layer of 20 units…
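For reference, the batch dimension shows up as None because the layer accepts any batch size; only the per-example shape is fixed. A minimal Keras sketch, where the 10-feature input is an assumption:

```python
# Sketch: Keras reports the batch dimension as None because the layer works
# for any batch size; only the per-example shape is fixed.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(10,)),   # 10 features per example (assumed)
    tf.keras.layers.Dense(20),
])
print(model.output_shape)  # (None, 20): None is the yet-unknown batch size
```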
1
vote
1 answer

What is the difference between batches in deep Q learning and supervised learning?

How is the batch loss calculated in both DQNs and simple classifiers? From what I understand, in a classifier, a common method is to sample a mini-batch, calculate the loss for every example, calculate the average loss over the whole batch,…
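In short, the DQN batch loss follows the same recipe as the classifier, except the "label" is a bootstrapped TD target computed from the same batch. A minimal sketch, with all names hypothetical:

```python
# Sketch: a DQN batch loss mirrors the classifier recipe, except the "label"
# is a bootstrapped TD target built from the same batch. Names hypothetical.
import torch
import torch.nn.functional as F

def dqn_batch_loss(q_net, target_net, batch, gamma):
    s, a, r, s2, done = batch
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a) per example
    with torch.no_grad():                               # frozen TD targets
        target = r + gamma * target_net(s2).max(1).values * (1 - done)
    return F.mse_loss(q, target)   # mean over the batch, as in a classifier
```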
0
votes
0 answers

Why does chunked dataset training give different results compared to full-batch training in my Siren model?

I'm implementing a Siren model for audio reconstruction using PyTorch. My first approach processes the entire dataset in a single batch, while my second approach loads and trains the dataset in smaller chunks to avoid memory overload. Approach 1…
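One likely source of the difference: a single full-batch step uses the gradient of the mean loss over everything, whereas sequential chunked steps each change the weights that the next chunk sees. Accumulating chunk gradients before stepping reproduces the full-batch update; a sketch, assuming `loss_fn` averages over its batch:

```python
# Sketch: accumulate per-chunk gradients (weighted by chunk size) and take
# one step, reproducing the full-batch update without the full-batch memory.
import torch

def full_batch_equivalent_step(model, optimizer, loss_fn, chunks):
    # chunks: list of (inputs, targets) pairs covering the whole dataset
    optimizer.zero_grad()
    total = sum(x.shape[0] for x, _ in chunks)
    for x, y in chunks:
        loss = loss_fn(model(x), y) * (x.shape[0] / total)
        loss.backward()                      # gradients accumulate
    optimizer.step()                         # one full-batch-style update
```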
0
votes
1 answer

Should batch size be as large as possible, even the entire training set?

Should batch size be as large as possible, even the entire training set (if memory allows for it)?