I am developing an LSTM for sequence tagging. During development, I make various changes to the system, for example adding new features or changing the number of nodes in the hidden layers. After each change, I check the accuracy using cross-validation on a development dataset.
Currently, each check trains the system for 100 iterations, which takes a long time. So I thought that during development I could use fewer iterations, e.g. 20, to make each check faster. Once I find the best configuration, I would switch back to 100 iterations to get better accuracy.
My question is: is this reasoning sound? More specifically, if model A is better than model B after 20 training iterations, is A likely to be better than B after 100 training iterations as well?
Alternatively, is there a better way to speed up the development process?
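To make the 20-versus-100 question concrete, here is a minimal sketch of how I could test it empirically on a few configurations before relying on the shortcut. The function name `pairwise_agreement` and the accuracy values are my own placeholders, not measured results; the idea is simply to count how often two configurations are ranked the same way under both training budgets.

```python
from itertools import combinations


def pairwise_agreement(short_scores, long_scores):
    """Fraction of configuration pairs ranked the same way under both budgets.

    Each argument maps a configuration name to its cross-validated accuracy.
    Pairs that are tied under either budget are skipped.
    """
    names = sorted(set(short_scores) & set(long_scores))
    agree = 0
    total = 0
    for a, b in combinations(names, 2):
        d_short = short_scores[a] - short_scores[b]
        d_long = long_scores[a] - long_scores[b]
        if d_short == 0 or d_long == 0:
            continue  # a tie carries no ranking information
        total += 1
        if (d_short > 0) == (d_long > 0):
            agree += 1
    return agree / total if total else float("nan")


# Placeholder accuracies for illustration only, NOT real measurements.
acc_20 = {"A": 0.80, "B": 0.78, "C": 0.75}    # after 20 iterations
acc_100 = {"A": 0.88, "B": 0.89, "C": 0.84}   # after 100 iterations
print(pairwise_agreement(acc_20, acc_100))
```

If the agreement on a handful of configurations is close to 1, ranking by the short budget would seem reasonably safe for my setup; if it is much lower, the shortcut is probably misleading.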