I'm working on Kaggle's Galaxy Zoo competition with Keras/TensorFlow, but the huge amount of data (a lot of images) sends my computer into limbo. Mine is a more or less ordinary PC (i5) with a generous 48 GB of RAM, although I can't use my GPU (my video card is not CUDA-compatible). I use an Ubuntu & Anaconda combo.
The actual problem is that Python throws a "MemoryError" while reading the images from disk into a stacked numpy array. Seemingly my memory is insufficient for the job, and I imagine the same would be true for any serious task (there are, of course, projects beyond MNIST classification).
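Roughly, the loading step looks like this (a simplified sketch, not my exact code; the directory path and filenames are just placeholders):

```python
import glob
import numpy as np
from PIL import Image

# Placeholder path to the competition's training JPEGs (tens of thousands of images)
files = sorted(glob.glob("images_training/*.jpg"))

# Load every image and normalize to [0, 1] as float32
images = [np.asarray(Image.open(f), dtype=np.float32) / 255.0 for f in files]

# Stack everything into one big (N, height, width, 3) array
X = np.stack(images)  # the MemoryError is raised around here
```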
So my question is: what kind of infrastructure is capable of handling jobs of this scale, and how could I get it? And what is the real bottleneck here, actually? Memory? Curiously, the Linux top command shows only about 10% memory usage for the running Python process.
Of course, I'm not at the level of institutional players, so only reasonable costs are acceptable...