
I would like an order-of-magnitude estimate of the resources required to build an image recognition system.

Let's say you want to build a startup whose main product has to distinguish 20 different kinds of objects (bottles, dogs, cars, flowers...). The images are already tagged.

  • How many images are needed for the training set? 1k, 10k, 100k, 1 million?
  • What kind of hardware is needed, and how long will training take?
  • How many developers, and how much time?
  • Does it change a lot if the number of target classes is reduced to two, or increased to one thousand?

A link to a real-life paper would be perfect. Thank you.

bokan

1 Answer


One answer is: an infinite amount of time, because the system can always be made better.

Another answer is:

  • 10k images for the training set
  • A PC with a GPU (3~4k USD), Google Colab (10 USD per month), or another cloud service (probably more expensive than Colab)
  • One developer, 1 day lol
  • Distinguishing two kinds is easier than distinguishing many kinds
  • There is no paper that seeks to answer your question the way you put it. I wouldn't even recommend a paper. In fact, for you I'd recommend an AutoML tutorial. Check these. (No offence if I've misjudged your knowledge/skill level.)
  • Here's a paper anyway :) https://paperswithcode.com/lib/torchvision/alexnet

To conclude, please be aware that your question is super open-ended, and my answer is bad (but maybe good enough for now). A good answer doesn't really exist, because it's always going to be context dependent. For instance, you never said whether you need 90% or 99% accuracy.

Alexander Soare