6

I am developing an LSTM for sequence tagging. During development, I make various changes to the system: for example, I add new features or change the number of nodes in the hidden layers. After each change, I check the accuracy using cross-validation on a development dataset.

Currently, I train the system for 100 iterations in each check, which takes a lot of time. So I thought that, during development, I could train for only, say, 20 iterations, which would make each check faster. Once I find the best configuration, I can switch back to 100 iterations to get better accuracy.

My question is: is this reasoning sound? More specifically, if model A is better than model B after 20 training iterations, is A likely to be better than B after 100 training iterations as well?

Alternatively, is there a better way to speed up the development process?

nbro

2 Answers

5

Your scenario is common.

The most straightforward approach is to subsample your data randomly. Unless your data or your model has a strong bias, the performance on the smaller data set should be comparable. The accuracy might be lower, but the purpose is a quick sanity check, not a final number.
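A minimal sketch of such a subsample, assuming the data set is a Python list with one entry per tagged sequence. The function name `subsample` and its `fraction` and `seed` parameters are made up for illustration. Sampling whole sequences rather than individual tokens keeps each training example intact, and a fixed seed means every configuration is checked on the same subset.

```python
import random


def subsample(sequences, fraction, seed=0):
    """Return a random subset of whole sequences (hypothetical helper).

    sequences: list with one entry per tagged sequence
    fraction:  share of the data to keep, in (0, 1]
    seed:      fixed so that every configuration sees the same subset
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    # Keep at least one sequence, even for very small fractions.
    k = max(1, round(len(sequences) * fraction))
    return rng.sample(sequences, k)
```

Calling `subsample(train_sequences, 0.2)` would keep roughly a fifth of the sequences; `train_sequences` here stands for whatever list your own pipeline produces.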

SmallChess
2

This might work in your case, but it isn't guaranteed: it depends on how much data the network goes through in each iteration. You can test it by making a small change, training for the full 100 iterations, and checking whether the performance changes significantly and whether the final result could have been predicted from the 20th iteration.
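One way to make that check concrete is sketched below, under the assumption that you have recorded a development-set score for a handful of configurations at both 20 and 100 iterations. The function `pairwise_agreement` is a hypothetical helper, not part of any library: it reports the fraction of configuration pairs that are ordered the same way at both checkpoints. A value near 1.0 suggests the short runs are a usable proxy; a value near 0.5 suggests they are not.

```python
from itertools import combinations


def pairwise_agreement(short_scores, long_scores):
    """Fraction of configuration pairs ranked identically by both runs.

    short_scores, long_scores: dicts mapping a configuration name to its
    score after the short run (e.g. 20 iterations) and the long run
    (e.g. 100 iterations). Ties are counted as disagreements.
    """
    if short_scores.keys() != long_scores.keys():
        raise ValueError("both dicts must cover the same configurations")
    pairs = list(combinations(sorted(short_scores), 2))
    if not pairs:
        raise ValueError("need at least two configurations to compare")
    agree = 0
    for a, b in pairs:
        diff_short = short_scores[a] - short_scores[b]
        diff_long = long_scores[a] - long_scores[b]
        # Same sign means the short run picked the same winner.
        if diff_short * diff_long > 0:
            agree += 1
    return agree / len(pairs)
```

This only needs a few full-length runs up front, after which the short runs can be trusted (or not) for the rest of development.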

Another approach that may work for you is to preload the lower layers of your network (if you have more than one layer). For instance, if you have 5 layers and are changing only the last 2, you could preload the bottom 3 layers with previously trained weights. This should reduce the amount of training needed, because the network can already discern some basic features of your problem.
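The preloading idea can be sketched independently of any framework, assuming each model's weights are available as a dict that maps a layer name to a list of parameters. The helper `preload_layers` and the layer names in the usage note are hypothetical; in a real framework you would use its own mechanism for loading saved weights.

```python
import copy


def preload_layers(new_weights, trained_weights, layer_names):
    """Return a copy of new_weights with the named layers taken from
    trained_weights (hypothetical helper; weights are plain dicts)."""
    result = copy.deepcopy(new_weights)
    for name in layer_names:
        if name not in trained_weights or name not in result:
            raise KeyError(f"layer {name!r} is missing from one of the models")
        # A preloaded layer must keep the size it had when it was trained.
        if len(trained_weights[name]) != len(result[name]):
            raise ValueError(f"layer {name!r} has a different size")
        result[name] = copy.deepcopy(trained_weights[name])
    return result
```

For the 5-layer example, you would pass the names of the bottom 3 layers (say `["layer1", "layer2", "layer3"]`) and leave the 2 modified layers at their fresh initialisation. The size check reflects the constraint that this only works for layers whose shape did not change between configurations.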

Jaden Travnik