
Is it possible to give a rule of thumb estimate about the size of neural networks that are trainable on common consumer-grade GPUs?

For example, the Emergence of Locomotion paper (reinforcement learning) trains a network with tanh activations: a 3-layer NN with 300, 200, and 100 units for the Planar Walker. However, the authors don’t report the hardware or the training time.

But could a rule of thumb be developed?

Perhaps it could be based just on current empirical results. For example: $X$ units using sigmoid activation can run $Y$ learning iterations per hour on a 1060.

Or: using activation function $a$ instead of $b$ causes an $n$-fold decrease in performance.

If a student, researcher, or curious mind is going to buy a GPU for playing around with these networks, how do you decide what to get? A 1060 is apparently the entry-level budget option, but how can you evaluate whether it is smarter to just get a crappy netbook instead of building a high-power desktop, and spend the saved $ on on-demand cloud infrastructure?

Motivation for the question: I just purchased a 1060 and (clever, to ask the question afterwards, huh?) wonder whether I should have kept the $ and made a Google Cloud account instead. I also wonder whether I can run my master’s thesis simulation on the GPU.

nbro
pascalwhoop

3 Answers

Usually the problem is fitting the model into video RAM. If it does not fit, you cannot train your model at all without significant effort (such as training parts of the model separately). If it does fit, time is your only problem. But the difference in training time between consumer GPUs like the Nvidia GTX 1080 and much more expensive GPU accelerators like the Nvidia K80 is not very large. In fact, the best consumer cards can be faster than GPU accelerators such as the K80, but they lack other properties, such as large amounts of VRAM. Some comparisons and benchmarks: Which GPU(s) to Get for Deep Learning: My Experience and Advice for Using GPUs in Deep Learning, A Comparison between NVIDIA’s GeForce GTX 1080 and Tesla P100 for Deep Learning.

To check whether your model fits into VRAM, approximate how much memory your data and hyperparameters require (inputs, outputs, weights, layers, batch size, data type, and so on).
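As a rough illustration of that approximation, here is a minimal sketch for a fully connected network. It assumes float32 values (4 bytes each), an Adam-style optimizer that keeps two extra state values per parameter, and one stored activation per unit per sample; the input size (60) and output size (10) are hypothetical, since the question only gives the hidden layers. Real frameworks add workspace buffers, the CUDA context, and allocator overhead on top of this.

```python
# Rough VRAM estimate for training a dense network (a sketch, not exact).
# Assumptions: float32 values, Adam-like optimizer with 2 state tensors per
# parameter, activations stored once per unit per sample for backprop.

def mlp_param_count(layer_sizes):
    """Weights plus biases for consecutive fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

def estimate_training_bytes(layer_sizes, batch_size, bytes_per_value=4,
                            optimizer_states=2):
    params = mlp_param_count(layer_sizes)
    # Parameters + gradients + optimizer state (e.g. Adam's two moments).
    param_bytes = params * (2 + optimizer_states) * bytes_per_value
    # Activations kept for backprop: one value per unit (inputs included) per sample.
    activation_bytes = batch_size * sum(layer_sizes) * bytes_per_value
    return param_bytes + activation_bytes

# Hypothetical sizes: 60 inputs, hidden layers 300/200/100, 10 outputs.
sizes = [60, 300, 200, 100, 10]
print(mlp_param_count(sizes))  # 99610
print(f"{estimate_training_bytes(sizes, batch_size=256) / 2**20:.2f} MiB")
```

Even with generous overhead, a network of this size needs only a few MiB, far below the 3 GB or 6 GB of a GTX 1060; VRAM becomes the binding constraint mainly for large convolutional models and large batches.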

Shayan Shafiq
C. Yduqoli

As a caveat, I’d suggest that unless you’re pushing up against fundamental technological limits, computation speed and resources should be secondary to design rationale when developing a neural network architecture.

That said, earlier this year I finished my MS thesis, which involved bioinformatics analytics pipelines with whole-genome sequencing data; according to our cluster’s job manager, that project took over 100,000 hours of compute time to develop. When you’re on a deadline, resources can be a real constraint and speed can be critical.

So, to answer your questions as I understand them:

Would I have been better off to use the money to buy time in the cloud?

Probably. The few hundred dollars you spent on the 1060 would take you far training your model(s) in the cloud. Further, as far as I can tell, you don’t need the GPU to be running at 100% all the time (as you would if you were, say, mining cryptocurrencies). Finally, with cloud instances you can scale out, training multiple models at once, which can speed up the exploration and validation of whatever architecture you settle on.
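One way to make the buy-versus-rent decision concrete is a break-even calculation: how many GPU-hours of training before owning the card becomes cheaper than renting one. The sketch below uses hypothetical placeholder prices (a $300 card, $0.50 per cloud GPU-hour, 150 W at $0.20/kWh); substitute current quotes for your region and provider.

```python
# Break-even sketch: buying a GPU vs. renting one by the hour.
# All prices are hypothetical placeholders, not real quotes.

def break_even_hours(gpu_price, cloud_price_per_hour, power_watts=0.0,
                     electricity_price_per_kwh=0.0):
    """Hours of use after which owning becomes cheaper than renting.

    Owning costs the purchase price plus electricity per hour of use;
    renting costs a flat hourly rate.
    """
    own_cost_per_hour = power_watts / 1000.0 * electricity_price_per_kwh
    saving_per_hour = cloud_price_per_hour - own_cost_per_hour
    if saving_per_hour <= 0:
        return float("inf")  # owning never pays off at these prices
    return gpu_price / saving_per_hour

hours = break_even_hours(gpu_price=300.0, cloud_price_per_hour=0.50,
                         power_watts=150.0, electricity_price_per_kwh=0.20)
print(f"{hours:.0f} GPU-hours")
```

With these made-up numbers the break-even point is roughly 640 hours of actual training. If you expect to use the GPU far less than that, renting is cheaper; the sketch ignores resale value and the cost of the host machine, which could shift the answer either way.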

Is there a way to gauge the compute time of a neural network on a given GPU?

Well, Big O is one estimator, but it sounds like you want a more precise method. I’m sure such methods exist, but I’d counter that you can make your estimate with simple back-of-the-envelope calculations that account for threads, memory, code iterations, etc. Do you really want to dig into the GPU processing pipeline on the 1060? You might come up with a very good estimate by understanding everything happening between your code and the metal, but ultimately it’s probably not worth the time and effort; it will likely confirm that Big O notation (the simple model, if you will) captures most of the variation in compute time. If you do notice bottlenecks, performance profiling will show you where the time goes.
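A back-of-the-envelope estimate of this kind can be sketched by counting floating-point operations. The sketch relies on a common approximation (about 6 FLOPs per weight per training sample: roughly 2 for the forward pass and 4 for the backward pass, biases ignored), an assumed peak of about 4 TFLOPS FP32 for a 1060-class card, and an assumed 30% utilization. All three figures are assumptions to check against your card’s specifications and your own measurements, and the input/output sizes are hypothetical.

```python
# Back-of-the-envelope training time for a dense network (a sketch).
# Assumptions: ~6 FLOPs per weight per sample per training step,
# assumed peak throughput and utilization (not measured values).

def dense_weights(layer_sizes):
    """Number of weights (biases ignored) in consecutive dense layers."""
    return sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))

def estimated_seconds(layer_sizes, samples, peak_flops, utilization=0.3):
    flops = 6 * dense_weights(layer_sizes) * samples
    return flops / (peak_flops * utilization)

sizes = [60, 300, 200, 100, 10]            # hypothetical input/output sizes
secs = estimated_seconds(sizes, samples=10_000_000,
                         peak_flops=4e12)  # assumed ~4 TFLOPS FP32
print(f"{secs:.2f} s")  # 4.95 s
```

For a network as small as the 300/200/100 walker, the raw arithmetic takes seconds, which suggests that in such a setup other costs (the simulation itself, data transfer, Python overhead) are likely to dominate; that is exactly what profiling would reveal.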

Greenstick

It depends on what you need. You can train a network of any size on any hardware; the problem is training time. If you want to train Inception on an average CPU, it will take months to converge. So it all depends on how long you can wait to see results from your network. Because neural nets involve not just one operation but many (concatenation, max pooling, padding, etc.), it is impossible to make the kind of estimate you are looking for. Just start training some well-known networks and measure the time. Then you can interpolate how long it will take to train the networks you are interested in.
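The measure-then-estimate approach can be sketched as follows: time a few training steps after a warm-up, then scale up to a full run. Here `fake_step` is a hypothetical stand-in for one real training step. With GPU frameworks, kernels run asynchronously, so the step function should wait for the device to finish (in PyTorch, for example, by calling `torch.cuda.synchronize()`), otherwise the timing only measures kernel launches.

```python
import time

def seconds_per_step(step_fn, n_steps=20, warmup=3):
    """Average wall-clock time of step_fn() after a few warm-up calls."""
    for _ in range(warmup):
        step_fn()  # warm-up: caches, lazy initialization, etc.
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    return (time.perf_counter() - start) / n_steps

def extrapolate_hours(sec_per_step, steps_per_epoch, epochs):
    """Scale a measured per-step time up to a whole training run."""
    return sec_per_step * steps_per_epoch * epochs / 3600.0

def fake_step():
    # Hypothetical stand-in for one training step of a real model.
    sum(i * i for i in range(10_000))

t = seconds_per_step(fake_step)
print(f"{t * 1000:.2f} ms/step, ~{extrapolate_hours(t, 1000, 50):.2f} h for 50 epochs")
```

This linear scaling assumes the per-step cost stays constant over the run, which holds for a fixed architecture and batch size but not when, for example, episode lengths in reinforcement learning change as the agent improves.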

Deniz Beker