
I'm working on Kaggle's Galaxy Zoo competition with Keras/TensorFlow, but the huge amount of data (lots of images) sends my computer to limbo. Mine is a more or less ordinary PC (i5) with a generous 48 GB of RAM, although I can't use my GPU (my video card is not CUDA-compatible). I use an Ubuntu & Anaconda combo.

The actual problem is that Python throws a MemoryError while reading the images from disk into a stacked NumPy array. Apparently my memory is insufficient for the job, and I imagine the same would be true for any serious task (there are, of course, projects beyond MNIST classification).
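Roughly what my loading code does is the following (the folder name, file pattern, and target image size here are just placeholders for illustration, not my exact script):

```python
import glob
import numpy as np
from PIL import Image

# Placeholder path/pattern for the competition's training JPEGs
image_paths = sorted(glob.glob("images_training_rev1/*.jpg"))

images = []
for path in image_paths:
    # Resize each galaxy image and convert it to a float array in [0, 1]
    img = Image.open(path).resize((224, 224))
    images.append(np.asarray(img, dtype=np.float32) / 255.0)

# Stacking everything into one big array is where the MemoryError appears
X = np.stack(images)
print(X.shape, X.nbytes / 1e9, "GB")
```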

So my question is: what kind of infrastructure can handle jobs of this scale, and how could I get it? What is the real bottleneck here, actually? Memory? Curiously, the Linux top command shows only about 10% memory usage for the running Python process.

Of course, I'm not on the level of institutional players, so only reasonable costs are acceptable...

