Should I use U-net to label keys in a keyboard image?

Question

This is a 600×800 image.

Which algorithm/model should I use to get an image like the one below, in which each key is detected and labeled by a rectangle?

I guess this is some kind of segmentation problem, for which U-net is the most popular algorithm, but I don't know how to apply it to this particular problem.

nbro · Accepted Answer · 2021-04-22T01:09:13.623

If you just need to draw a rectangle around each key, this is an object detection or template matching problem, so you can use any of the available object detection models (e.g. YOLO) or any multi-template matching technique (e.g. sequential RANSAC or T-Linkage). In the first case, you will need a labeled dataset; in the second case, you will need the original image and the templates (in your case, a template would be an image of a key).
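A minimal sketch of multi-instance template matching, assuming a grayscale image stored as nested lists and a single key template. It scores every position by sum of squared differences (SSD) and greedily keeps the best non-overlapping matches. This is plain greedy matching, not sequential RANSAC or T-Linkage, and the names `ssd`, `match_all` and `max_ssd` are hypothetical. In practice you would compute the score map with OpenCV's `cv2.matchTemplate` (its `cv2.TM_SQDIFF` method is the same SSD score) rather than in pure Python.

```python
def ssd(image, template, top, left):
    """Sum of squared differences between the template and one image patch."""
    return sum(
        (image[top + r][left + c] - template[r][c]) ** 2
        for r in range(len(template))
        for c in range(len(template[0]))
    )


def match_all(image, template, max_ssd):
    """Greedily collect non-overlapping matches of one template.

    Every position is scored by SSD (lower is better). Positions are visited
    best-first; a position is accepted if its SSD is at most max_ssd and its
    rectangle does not overlap a rectangle accepted earlier.
    Returns boxes as (top, left, height, width).
    """
    h, w = len(template), len(template[0])
    candidates = sorted(
        (ssd(image, template, top, left), top, left)
        for top in range(len(image) - h + 1)
        for left in range(len(image[0]) - w + 1)
    )
    boxes = []
    for score, top, left in candidates:
        if score > max_ssd:
            break  # candidates are sorted, so the rest are even worse
        if all(
            top >= t + h or t >= top + h or left >= l + w or l >= left + w
            for t, l, _, _ in boxes
        ):
            boxes.append((top, left, h, w))
    return boxes


# Toy grayscale "keyboard": two bright 2x2 keys on a dark background.
toy_image = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0, 9, 9, 0],
    [0, 9, 9, 0, 0, 9, 9, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
key_template = [[9, 9], [9, 9]]
print(match_all(toy_image, key_template, max_ssd=0))  # [(1, 1, 2, 2), (1, 5, 2, 2)]
```

Real keys differ in their printed labels and sizes, so you would likely need several templates (or a more tolerant threshold) rather than one exact template.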

So, no, this is not a segmentation problem (which would be the task of classifying each pixel in the objects of interest, and not just locating the objects).
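To make the difference concrete: a segmentation model such as U-net outputs a per-pixel mask, not rectangles, so getting boxes from it needs an extra post-processing step. The sketch below assumes a binary mask stored as nested lists of 0/1 (the function name `mask_to_boxes` is hypothetical); it groups 4-connected foreground pixels and returns one bounding box per group. Keys whose masks touch would merge into a single box, which is one reason a detector is the more direct fit for this task.

```python
from collections import deque


def mask_to_boxes(mask):
    """Convert a binary mask (nested lists of 0/1) into bounding boxes.

    Each 4-connected group of 1-pixels is treated as one object, and its
    box is returned as (top, left, bottom, right) with inclusive bounds.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill over one connected component.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


toy_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
print(mask_to_boxes(toy_mask))  # [(0, 0, 1, 1), (1, 3, 2, 4)]
```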

Kirill Fedyanin · Answer 2 · 2021-03-25T10:41:53.267

There are two related problems in image analysis:

Semantic segmentation, where you need to assign a class to each pixel in the image. E.g., you have a satellite image and want to segment it into roads/forests/fields and so on.
Object detection, where you need to detect different types of objects and draw a bounding box around each one. E.g., MS COCO is a popular dataset for this task, where you need to localize all the bikes/people/cats/etc. in the image.
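Detection outputs are usually compared to ground-truth boxes by their overlap, measured as intersection over union (IoU); COCO-style evaluation, for example, averages precision over IoU thresholds from 0.50 to 0.95. A minimal sketch, assuming boxes given as `(x1, y1, x2, y2)` with `x2 > x1` and `y2 > y1` (the function name `iou` is illustrative):

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # half-overlapping boxes: 50 / 150
```

A predicted key box would typically count as correct only if its IoU with a labeled key box exceeds the chosen threshold.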

U-net is good for semantic segmentation, but I would say you have an object detection problem. You can use something like YOLOv3 (if you need fast inference) or Fast R-CNN (if you need precision). If you need really good performance, you can browse the top methods for each task on paperswithcode.com: semantic segmentation, object detection.
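If you go the YOLO route, each training image needs box annotations. The sketch below assumes the common Darknet/YOLO text-label convention: one `.txt` file per image and one line per object containing the class id, then the box center x, center y, width and height, each normalized by the image size. The helper name `to_yolo_line`, the class id `0` for "key", and the 800×600 (width × height) image size are hypothetical; other implementations may expect a different format (e.g. COCO JSON), so check the documentation of the one you use.

```python
def to_yolo_line(class_id, box, image_width, image_height):
    """Format a pixel box (x1, y1, x2, y2) as a YOLO-style label line.

    Output: "class x_center y_center width height", where the four geometry
    values are divided by the image width/height so they lie in [0, 1].
    """
    x1, y1, x2, y2 = box
    x_center = (x1 + x2) / 2 / image_width
    y_center = (y1 + y2) / 2 / image_height
    width = (x2 - x1) / image_width
    height = (y2 - y1) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical key at pixels x 100..160, y 50..110 in an 800x600 (w x h) image.
print(to_yolo_line(0, (100, 50, 160, 110), image_width=800, image_height=600))
# 0 0.162500 0.133333 0.075000 0.100000
```

Normalizing by the image size keeps the labels valid if the images are resized during training.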
