
I'm on a robotics team and we've been tasked with writing a program to differentiate between a live and a dead fish. We've been given ~15 minutes of training footage and it's absolutely terrible: it's low quality, hard to label (even for humans), and it runs at only about 20 frames per second.

I have tried everything I can think of: YOLO, 3D convolutions (to take movement over time into account), residual networks with anywhere from 1 to 10 layers, and more. I have narrowed the problem down to the data itself: it is just terrible.

Is there anything I can do to fix this? I know of data augmentation and have used it, but that doesn't increase the usefulness of the data; it just creates more terrible data. I also suspect that using machine learning to clean the data wouldn't help: given studies (whose names I can't remember) showing that adding a single white pixel to an image can completely confuse an object classifier, I assume that using a machine learning model to alter an image would likewise just confuse the network. Is that an accurate assumption?

Either way: is there any way to improve the data I've been given, or another way to approach this problem?

nbro

1 Answer


I suggest using a super-resolution model to enhance the quality of your dataset and then annotating it properly; by that point it should at least be readable from a human perspective.

You can check out this blog post for more information.
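For what it's worth, here is a minimal sketch of how you might run a pretrained super-resolution model over your footage once, offline, before annotating. It assumes opencv-contrib-python and an ESPCN weights file (e.g. ESPCN_x4.pb) downloaded separately; the file names and paths are placeholders, not part of your setup.

```python
import cv2

def upscale_video(in_path: str, out_path: str,
                  model_path: str = "ESPCN_x4.pb", scale: int = 4) -> None:
    # Load a pretrained super-resolution network via OpenCV's dnn_superres module.
    sr = cv2.dnn_superres.DnnSuperResImpl_create()
    sr.readModel(model_path)
    sr.setModel("espcn", scale)  # model name and scale must match the weights file

    cap = cv2.VideoCapture(in_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) * scale
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) * scale
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Upscale each frame. This is slow, so do it once offline and
        # annotate/train on the enhanced copy of the footage.
        writer.write(sr.upsample(frame))

    cap.release()
    writer.release()

# Hypothetical file names for illustration only.
upscale_video("fish_footage.mp4", "fish_footage_x4.mp4")
```

Whether this actually adds useful information depends on your footage; super-resolution can hallucinate detail, so treat it as an aid for human labelling rather than a guaranteed fix for the classifier.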

haddagart