
Currently, I am working on a few projects that use feedforward neural networks for regression and classification of simple tabular data. I have noticed that training a neural network using TensorFlow-GPU is often slower than training the same network using TensorFlow-CPU.
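For concreteness, the models I train are small Keras networks along these lines (the layer sizes and data shapes here are only illustrative, not my exact code):

    import numpy as np
    import tensorflow as tf

    # Illustrative stand-in for my tabular data: 10k rows, 20 features.
    X = np.random.rand(10_000, 20).astype("float32")
    y = np.random.randint(0, 2, size=(10_000,))

    # A small feedforward network for binary classification.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    model.fit(X, y, epochs=10, batch_size=32, verbose=0)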

Could something be wrong with my setup/code or is it possible that sometimes GPU is slower than CPU?

GKozinski

3 Answers


This depends on your data and the complexity of your models. See the following article by Microsoft. Their conclusion is:

The results suggest that the throughput from GPU clusters is always better than CPU throughput for all models and frameworks proving that GPU is the economical choice for inference of deep learning models. ...

It is important to note that, for standard machine learning models where number of parameters are not as high as deep learning models, CPUs should still be considered as more effective and cost efficient.

Since you are training an MLP, it cannot be considered a standard machine learning model. See my preprint, The impact of using large training data set KDD99 on classification accuracy, in which I compare different machine learning algorithms using Weka.

[Image: table of Weka classifier training times on the KDD99 data set, e.g. MLP: 722 minutes, Naive Bayes: ~2 minutes]

As you can see from the image above, the MLP takes 722 minutes to train while Naive Bayes takes about 2 minutes. If your data is small and your model does not have many parameters, you will see better performance on the CPU.
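You can check this on your own machine by timing the same small model on both devices. A minimal sketch, assuming TensorFlow 2.x, synthetic data, and a visible GPU (the absolute numbers will depend on your hardware):

    import time
    import numpy as np
    import tensorflow as tf

    X = np.random.rand(10_000, 20).astype("float32")
    y = np.random.randint(0, 2, size=(10_000,))

    def train_on(device):
        # Pin variable creation and training ops to the given device.
        with tf.device(device):
            model = tf.keras.Sequential([
                tf.keras.layers.Dense(64, activation="relu"),
                tf.keras.layers.Dense(1, activation="sigmoid"),
            ])
            model.compile(optimizer="adam", loss="binary_crossentropy")
            start = time.perf_counter()
            model.fit(X, y, epochs=5, batch_size=32, verbose=0)
            return time.perf_counter() - start

    print(f"CPU: {train_on('/CPU:0'):.1f} s")
    print(f"GPU: {train_on('/GPU:0'):.1f} s")  # often slower for tiny models

With small batches and few parameters, the per-batch kernel-launch and host-to-device copy overhead on the GPU tends to dominate the actual computation, which is why the CPU can come out ahead.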

Atilla Ozgur

I advise you to always prefer GPU over CPU for training your models. This advice is driven by the usage of deep learning methods on images and text, where the data is very rich.

You must have a GPU well suited for training (e.g. an NVIDIA GTX 1080, an NVIDIA Titan, or better). If you don't have a powerful GPU, I wouldn't be surprised to find that your CPU is faster.
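It is also worth confirming that TensorFlow sees a GPU at all; if the list below is empty, everything silently runs on the CPU. A quick check, assuming TensorFlow 2.x:

    import tensorflow as tf

    # An empty list means TensorFlow will execute everything on the CPU.
    print(tf.config.list_physical_devices("GPU"))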


It depends. If you have to solve a "simple" problem that does not require CNNs or stacked models, and that involves no multidimensional data and few large multiplications, then choosing a CNN / stacked architecture AND a GPU is like using a hammer to insert a needle. It will not only waste energy, but the computations will also do zero padding in memory, and you will observe a degradation in speed.