
In Andrew Ng's Deep Learning Specialization, in the Sequence Models course (video at minute 4:13), he says that with negative sampling we train on a small sample of words from the corpus rather than on the whole corpus. He also says that for smaller datasets we need a larger number of samples, for example 5-20, while for larger datasets we need a smaller number, for example 2-5. By "samples" I mean the number of words we pick along with the target word to train the model.
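For concreteness, here is a minimal sketch of the skip-gram negative-sampling objective (a NumPy illustration; the matrix names, toy counts, and sizes are my own assumptions, not taken from the course), where `k` is the number of negative words drawn per positive (target, context) pair:

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, dim = 10_000, 300
k = 5  # number of negative samples per positive (target, context) pair

# hypothetical input/output embedding matrices
W_in = rng.normal(scale=0.1, size=(vocab_size, dim))
W_out = rng.normal(scale=0.1, size=(vocab_size, dim))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def negative_sampling_loss(target, context, noise_dist):
    """Negative-sampling loss for one (target, context) pair."""
    v_t = W_in[target]
    # one positive example: the observed context word ...
    pos = np.log(sigmoid(W_out[context] @ v_t))
    # ... plus k negative examples drawn from the noise distribution
    negatives = rng.choice(vocab_size, size=k, p=noise_dist)
    neg = np.sum(np.log(sigmoid(-(W_out[negatives] @ v_t))))
    return -(pos + neg)

# unigram counts raised to the 3/4 power, as in the original word2vec paper
counts = rng.integers(1, 100, size=vocab_size).astype(float)
noise_dist = counts ** 0.75
noise_dist /= noise_dist.sum()

print(negative_sampling_loss(target=42, context=7, noise_dist=noise_dist))
```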

Why do small datasets require more samples, while big datasets require fewer samples?

1 Answer


He likely found this to be a best practice for avoiding overfitting. With a small dataset, if you only use small and easy-to-learn sequences (fewer words means fewer degrees of freedom), you open the model to the risk of overfitting that dataset. On a large dataset, which contains a lot more total information, you can train on small sequences without that risk: although the smaller sequences are easier to learn, the variance across sequences is much higher.
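As a practical illustration of where this number gets set (a sketch assuming gensim's Word2Vec implementation, where the negative-sampling count is the `negative` parameter and `vector_size` is the gensim 4.x name for the embedding dimension; the toy sentences are my own):

```python
from gensim.models import Word2Vec

# toy corpus; in practice `sentences` is your tokenized text
sentences = [["the", "quick", "brown", "fox"],
             ["jumps", "over", "the", "lazy", "dog"]]

# small corpus: more negative samples per positive pair (e.g. 5-20)
small_corpus_model = Word2Vec(sentences, vector_size=100, sg=1,
                              negative=20, min_count=1)

# large corpus: fewer negative samples per positive pair (e.g. 2-5)
large_corpus_model = Word2Vec(sentences, vector_size=100, sg=1,
                              negative=5, min_count=1)
```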
