Questions tagged [gpu]

Graphics processing units (GPUs) are specialized hardware for manipulating images and computing local image properties.

The mathematical bases of neural networks and of image manipulation are similar, embarrassingly parallel tasks involving matrices, which has led GPUs to become increasingly used for machine learning. As of 2016, GPUs are popular for AI work, and they continue to evolve in a direction that facilitates deep learning, both for training and for inference in devices such as self-driving cars. GPU vendors are adding connective capability for the kind of dataflow workloads AI benefits from, and as GPUs are increasingly applied to AI acceleration, manufacturers have incorporated neural-network-specific hardware: tensor cores, for example, are intended to speed up the training of neural networks. (Adapted from Wikipedia.)
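To make the matrix parallelism concrete, here is a minimal PyTorch sketch (an illustration, not part of the tag wiki): the same dense multiply that underlies a fully connected layer, dispatched to the GPU with one device change.

```python
import torch

# Same operation, two devices: a dense matrix multiply like the one
# at the heart of a fully connected layer.
a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

c_cpu = a @ b                      # runs on the CPU

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    c_gpu = a_gpu @ b_gpu          # dispatched across thousands of GPU cores
    torch.cuda.synchronize()       # CUDA kernels are asynchronous; wait for the result
```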

46 questions
9
votes
3 answers

Is a GPU always faster than a CPU for training neural networks?

Currently, I am working on a few projects that use feedforward neural networks for regression and classification of simple tabular data. I have noticed that training a neural network using TensorFlow-GPU is often slower than training the same…
GKozinski
  • 1,290
  • 11
  • 22
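One frequent explanation for this (not necessarily what the answers here say; sizes below are illustrative, not from the question): with a tiny feedforward net and tiny tabular batches, per-step host-to-device transfer and kernel-launch overhead can outweigh anything the GPU's parallelism buys.

```python
import time
import torch
import torch.nn as nn

def time_train(device, steps=200):
    # A deliberately tiny feedforward net on tiny batches:
    # the GPU has almost nothing to parallelize here.
    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1)).to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    x, y = torch.randn(64, 10), torch.randn(64, 1)
    start = time.perf_counter()
    for _ in range(steps):
        xb, yb = x.to(device), y.to(device)   # per-step transfer overhead
        loss = ((model(xb) - yb) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    if device == "cuda":
        torch.cuda.synchronize()
    return time.perf_counter() - start

print("cpu :", time_train("cpu"))
if torch.cuda.is_available():
    print("cuda:", time_train("cuda"))
```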
8
votes
2 answers

Can LSTM neural networks be sped up by a GPU?

I am training LSTM neural networks with Keras on a small mobile GPU. Training is slower on the GPU than on the CPU. I found some articles that say that it is hard to train LSTMs (and, in general, RNNs) on GPUs because the training cannot be…
Dieshe
  • 289
  • 1
  • 2
  • 6
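One detail worth checking in such cases (a common cause of slow GPU LSTMs, though not necessarily the asker's): in TensorFlow 2, tf.keras.layers.LSTM only uses the fast fused cuDNN kernel when its arguments stay at the cuDNN-compatible defaults.

```python
import tensorflow as tf

# Default arguments satisfy the cuDNN requirements (tanh activation,
# sigmoid recurrent activation, recurrent_dropout=0, unroll=False),
# so this layer runs the fused GPU kernel.
fast = tf.keras.layers.LSTM(128)

# Any non-default choice below silently falls back to the generic,
# much slower implementation -- one way a GPU LSTM ends up slower
# than expected.
slow = tf.keras.layers.LSTM(128, recurrent_dropout=0.2)
```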
8
votes
2 answers

Effect of batch size and number of GPUs on model accuracy

I have a data set that was split using a fixed random seed and I am going to use 80% of the data for training and the rest for validation. Here are my GPU and batch size configurations: use a batch size of 64 with one GTX 1080 Ti; use a batch size of 128 with…
bit_scientist
  • 241
  • 2
  • 5
  • 16
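For context, a heuristic that usually comes up in answers to this kind of question (the linear scaling rule; an approximation, not a guarantee): when the effective batch size grows with the number of GPUs, scale the learning rate proportionally, or accuracy will drift between configurations.

```python
# Linear scaling rule (a common heuristic, not a guarantee):
# scale the learning rate with the effective batch size.
base_lr = 0.1           # tuned at the reference batch size
base_batch = 64         # e.g. one GTX 1080 Ti
num_gpus = 2
per_gpu_batch = 64

effective_batch = per_gpu_batch * num_gpus        # 128
lr = base_lr * effective_batch / base_batch       # 0.2
print(f"effective batch {effective_batch}, lr {lr}")
```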
5
votes
3 answers

For an LLM, how can I estimate its memory requirements based on storage usage?

It is easy to see the amount of disk space consumed by an LLM (downloaded from Hugging Face, for instance). Just go into the relevant directory and check the file sizes. How can I estimate the amount of GPU RAM required to run the model? For…
ahron
  • 265
  • 2
  • 7
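A rough rule that answers to this question tend to converge on: the checkpoint stores the weights, so VRAM must hold at least the on-disk size, plus runtime overhead for the KV cache, activations, and the CUDA context. A sketch, where the 1.2x factor is an assumption rather than a fixed constant:

```python
import os

def estimate_vram_gb(model_dir, overhead=1.2):
    """Rough lower bound: the weight files must fit in VRAM, plus
    runtime overhead (KV cache, activations, CUDA context).
    The overhead factor is an assumption; long contexts need more."""
    size = sum(
        os.path.getsize(os.path.join(root, f))
        for root, _, files in os.walk(model_dir)
        for f in files
    )
    return size * overhead / 1024**3

# e.g. a 7B model stored in fp16 is ~14 GB on disk -> ~17 GB estimate
```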
5
votes
3 answers

How does a transformer leverage the GPU to be trained faster than RNNs?

How does a transformer leverage the GPU to be trained faster than RNNs? I understand the parameter space of the transformer might be significantly larger than that of the RNN. But why can the transformer structure leverage multiple GPUs, and…
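The usual answer in miniature (untrained toy tensors; learned projections omitted): an RNN must walk the sequence step by step, while self-attention expresses the whole sequence as dense matrix products that the GPU can parallelize across all positions at once.

```python
import torch

T, d = 512, 64
x = torch.randn(T, d)

# RNN: each step depends on the previous hidden state, so the T steps
# run one after another no matter how many GPU cores are available.
W = torch.randn(d, d)
h = torch.zeros(d)
for t in range(T):
    h = torch.tanh(x[t] + h @ W)       # sequential chain of length T

# Self-attention: all positions are processed at once as dense matmuls,
# which a GPU (or several, via standard tensor/data parallelism)
# computes in parallel over the whole sequence.
q = k = v = x                          # untrained projections omitted
scores = (q @ k.T) / d ** 0.5          # (T, T), computed in parallel
out = torch.softmax(scores, dim=-1) @ v
```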
3
votes
0 answers

Good model and training algorithm to store texture data for fast GPU inference

Now, the following may sound silly, but I want to do it to better understand the performance and implementation of GPU inference for a set of deep learning problems. What I want to do is to replace a surface texture for a 3D model by a NN that…
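One plausible reading of the question (my assumption, in the spirit of coordinate networks such as NeRF): fit a small MLP that maps a (u, v) texture coordinate to an RGB value, then evaluate the net at render time instead of sampling the stored texture.

```python
import torch
import torch.nn as nn

# Hypothetical setup: a tiny coordinate network mapping (u, v) in
# [0, 1]^2 to RGB. At render time the shader would evaluate this net
# instead of doing a texture lookup.
net = nn.Sequential(
    nn.Linear(2, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 3), nn.Sigmoid(),    # RGB in [0, 1]
)

uv = torch.rand(1024, 2)               # random texture coordinates
rgb = torch.rand(1024, 3)              # in practice: samples of the real texture
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(200):
    loss = ((net(uv) - rgb) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In practice a Fourier-feature encoding of the coordinates is usually needed before the first layer to recover high-frequency texture detail.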
3
votes
1 answer

Why is training a network with 4 GPUs not exactly 4 times faster than with one GPU?

Training a neural network with 4 GPUs using PyTorch, the performance is not even 2 times (between 1 and 2 times) that of one GPU. From nvidia-smi we see GPU usage for a few milliseconds, and for the next 5-10 seconds it looks like data is off-loaded and loaded…
Troy
  • 83
  • 4
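The pattern in the excerpt (bursts of GPU activity separated by seconds of idling) usually points at the input pipeline rather than the GPUs themselves. A common PyTorch mitigation, with illustrative values:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset; replace with the real one.
dataset = TensorDataset(torch.randn(1_000, 3, 64, 64),
                        torch.randint(0, 10, (1_000,)))

loader = DataLoader(
    dataset,
    batch_size=256,
    shuffle=True,
    num_workers=8,       # prepare batches in parallel worker processes
    pin_memory=True,     # enables faster, asynchronous host-to-GPU copies
    prefetch_factor=4,   # queue batches ahead so the GPUs never starve
)
```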
3
votes
2 answers

How can I reduce the GPU memory usage with large images?

I am trying to train a CNN-LSTM model. The size of my images is 640x640. I have a GTX 1080 Ti (11 GB). I am using Keras with the TensorFlow backend. Here is the model. img_input_1 = Input(shape=(1, n_width, n_height, n_channels)) conv_1 =…
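Beyond shrinking the batch size, one lever that often helps on memory-bound Keras models is mixed precision, which keeps float32 master weights but computes most activations in float16 (available in TensorFlow 2.4+; the savings vary by model):

```python
from tensorflow.keras import mixed_precision

# Compute most ops in float16 while keeping float32 master weights.
# On an 11 GB card this roughly halves activation memory (an estimate;
# actual savings depend on the model).
mixed_precision.set_global_policy("mixed_float16")

# Keep the final layer's output in float32 for numerical stability, e.g.
# Activation("linear", dtype="float32") after the last Dense layer.
```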
2
votes
1 answer

Complete formula to get LLM VRAM usage

I would like to find the GPU size required to run a hypothetical LLM, considering all possible factors, like: P: model parameters (total, or MoE active parameters); Q: quantization bits; C: context length cap (from what I understand, the context can…
rikyeah
  • 121
  • 2
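Pulling the listed factors together, a commonly used first-order estimate covers the weights plus the KV cache (assuming a standard decoder; grouped-query attention, quantized caches, and framework overhead all shift the numbers):

```python
def llm_vram_bytes(P, Q, C, n_layers, d_model, n_heads, n_kv_heads, kv_bits=16):
    """First-order VRAM estimate for inference.

    P: parameters that must be resident (for MoE, usually all experts)
    Q: weight quantization bits
    C: context length in tokens
    KV cache: 2 (K and V) * layers * C * head_dim * kv_heads * bytes.
    """
    weights = P * Q / 8
    head_dim = d_model // n_heads
    kv_cache = 2 * n_layers * C * head_dim * n_kv_heads * kv_bits / 8
    return weights + kv_cache

# e.g. a 7B model at 4-bit with an 8k context (Llama-2-7B-like shapes)
est = llm_vram_bytes(P=7e9, Q=4, C=8192, n_layers=32,
                     d_model=4096, n_heads=32, n_kv_heads=32)
print(f"{est / 1024**3:.1f} GiB")   # ~3.3 GiB weights + ~4 GiB KV cache
```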
2
votes
1 answer

Has anyone tried to use llama.cpp with NVLink?

Apparently it's possible to pool the memory of two 3090s using NVLink (although not with 4090s). This would make it possible to run large LLMs on consumer hardware. https://huggingface.co/transformers/v4.9.2/performance.html Although before I invest…
user2741831
  • 135
  • 6
2
votes
0 answers

How does one deal with images that are too large to fit in the GPU memory for doing ML image analysis?

How does one deal with images that are too large to fit in the GPU memory for doing ML image analysis? I am interested in detecting small structures on images which are themselves many GB in size. Beyond simple downsampling and maybe doing…
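The standard workaround, assuming the model is fully convolutional (my assumption, since the question is truncated), is tiled inference: only one overlapping crop is resident on the GPU at a time, and the per-tile predictions are stitched back together.

```python
import numpy as np

def tiled_predict(image, model, tile=1024, overlap=128):
    """Run `model` (a function crop -> per-pixel scores of the same
    spatial size) over overlapping tiles and stitch the results;
    only one tile is ever resident on the GPU."""
    h, w = image.shape[:2]
    out = np.zeros((h, w), dtype=np.float32)
    step = tile - overlap
    for y in range(0, h, step):
        for x in range(0, w, step):
            crop = image[y:y + tile, x:x + tile]
            # Overlapping borders are simply overwritten here;
            # real pipelines blend or crop the tile margins.
            out[y:y + crop.shape[0], x:x + crop.shape[1]] = model(crop)
    return out
```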
2
votes
0 answers

How to train neural networks with multiprocessing?

I am trying to figure out how multiprocessing works in neural networks. In the example I've seen, the database is split into $x$ parts (depending on how many workers you have) and each worker is responsible for training the network using a different…
Yedidya kfir
  • 121
  • 1
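The scheme the excerpt describes is data parallelism: each worker computes gradients on its own shard, and averaging those gradients is arithmetically equivalent to one step on the combined batch. A bare-tensor sketch of the update:

```python
import torch

# Two workers compute gradients on different data shards for the same
# weights; averaging the gradients equals one step on the combined batch.
w = torch.zeros(3)
grad_worker_0 = torch.tensor([0.2, -0.4, 0.1])   # from shard 0
grad_worker_1 = torch.tensor([0.4, -0.2, 0.3])   # from shard 1

avg_grad = (grad_worker_0 + grad_worker_1) / 2
w -= 0.1 * avg_grad   # identical update applied on every worker
```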
2
votes
2 answers

How do GPUs facilitate the training of a deep learning architecture?

I would love to know in detail how exactly GPUs help, in technical terms, in training deep learning models. To my understanding, GPUs help in performing independent tasks simultaneously to improve the speed. For example, in calculation of the…
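A concrete instance of that "independent tasks" intuition: frameworks lower convolution to a single large matrix multiply via im2col, turning every output pixel into an independent row. A numpy sketch for one channel (valid padding, cross-correlation convention, as in deep learning frameworks):

```python
import numpy as np

def im2col_conv(img, kernel):
    """Lower a 2D convolution to one matrix multiply: each output
    pixel becomes an independent row, so a GPU can compute all of
    them in parallel."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    cols = np.stack([
        img[y:y + kh, x:x + kw].ravel()
        for y in range(oh) for x in range(ow)
    ])                                        # (oh*ow, kh*kw)
    return (cols @ kernel.ravel()).reshape(oh, ow)

out = im2col_conv(np.random.rand(8, 8), np.random.rand(3, 3))
```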
2
votes
0 answers

In addition to matrix algebra, can GPUs also handle the various kernel functions for neural networks?

I've read a number of articles on how GPUs can speed up matrix algebra calculations, but I'm wondering how calculations are performed when one uses various kernel functions in a neural network. If I use sigmoid functions in my neural network, does…
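For what it's worth, in the major frameworks activation functions such as the sigmoid run as elementwise kernels on the GPU, with no round-trip to the CPU. A one-look PyTorch check:

```python
import torch

if torch.cuda.is_available():
    z = torch.randn(1024, 1024, device="cuda")  # pre-activations on the GPU
    a = torch.sigmoid(z)     # elementwise CUDA kernel, one thread per element
    print(a.device)          # cuda:0 -- the data never left the GPU
```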
1
vote
0 answers

Using FLOPS estimate of Transformer to approximate time given GPU FLOPS per second

Intro: I am attempting to approximate the time it takes for a Transformer to generate tokens given a GPU. Based on experiments I ran, the approach below significantly underestimates the actual runtime. The model's runtime does not scale in any…
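For reference, the usual first-order accounting, and the standard explanation for why a pure-FLOPs model underestimates decode time: autoregressive generation is typically memory-bandwidth bound, since every weight is read once per generated token.

```python
def decode_time_s(P, n_tokens, flops_per_s, mem_bw_bytes, bytes_per_param=2):
    """Two lower bounds per generated token:
    compute: ~2 * P FLOPs per token (the standard 2*params estimate);
    memory:  every weight is read once per token during decoding.
    Decoding is usually limited by the memory bound, which is why
    FLOPs-only models underestimate runtime."""
    compute = n_tokens * 2 * P / flops_per_s
    memory = n_tokens * P * bytes_per_param / mem_bw_bytes
    return max(compute, memory)

# e.g. a 7B fp16 model on a GPU with 300 TFLOP/s and 1 TB/s bandwidth:
# the memory bound (~14 ms/token) dominates the compute bound.
print(decode_time_s(7e9, n_tokens=100, flops_per_s=3e14, mem_bw_bytes=1e12))
```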