Why does test data need to be labelled?

Question

I'm having trouble understanding why test data needs to be labelled in order to test a trained Faster R-CNN model. Maybe it's basic, but I don't get why the labels are needed.

When the right answer is not obvious from the image, say the type of disease, a label is useful because you're not a doctor and can't judge whether the prediction is correct. But for classifying airplanes versus cars, labels seem optional to me: even if the classifier is wrong, you'll be able to recognize that and adjust accordingly.

Why does test data need to be labelled even in "obvious" cases?

chessprogrammer · Accepted Answer · 2023-12-06T22:59:27.487

2

If the data is not labeled, you will have no way of knowing if your model was wrong or correct. You seem to suggest evaluating by eye, but that will not scale to datasets with thousands or millions of samples.

edited Dec 06 '23 at 22:59

answered Dec 06 '23 at 20:25

score 1 · Answer 2 · answered Dec 06 '23 at 22:50

You're grading a multiple-choice test, so you need to know the correct answers in order to mark your model's answers (predictions) as correct or incorrect. If you lack the answer key (which you might get from external labeling, from labeling the data yourself, or from sophisticated semi-supervised learning methods), you cannot mark the predictions as correct or incorrect.
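To make the grading analogy concrete, here is a minimal sketch in plain Python. The `accuracy` helper and the five example predictions and labels are made up for illustration; they are not from any particular library or dataset. The point is only that the score cannot be computed without the `labels` list.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the ground-truth labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot score an empty test set")
    # Each comparison is True (1) or False (0); the sum counts the matches.
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical model outputs and answer key for five test images.
predictions = ["car", "airplane", "car", "car", "airplane"]
labels = ["car", "airplane", "airplane", "car", "airplane"]

score = accuracy(predictions, labels)
print(f"accuracy = {score:.2f}")  # 4 of 5 match, so this prints 0.80
```

The same loop runs unchanged over five samples or five million, which is what checking the predictions by eye cannot do.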

If you don't have the labels and have to generate them yourself, you have to spend the time doing the labeling. This can be done, but your time as a software engineer or data scientist is probably better spent on tasks other than deciding whether images show cars or helicopters (or you might lack the expertise to make the labels, such as when the labels come from medical images that physicians spend years learning to interpret).

Since it is common to have data without labels, there is an area called semi-supervised learning that uses labeled data to generate labels for unlabeled data, which are then treated as labeled data (at least loosely speaking).
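One simple form of this idea is often called pseudo-labeling. The following is a toy sketch, not a real pipeline: it assumes each image has already been reduced to a single made-up numeric feature, uses a nearest-centroid rule as a stand-in for the model, and the `MARGIN` threshold is an arbitrary choice for illustration. It also assumes at least two classes.

```python
def centroids(points, labels):
    """Mean feature value per class (a 1-D nearest-centroid 'model')."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(model, x):
    """Return (label, margin): the nearest class and how much closer
    it is than the runner-up. Assumes at least two classes."""
    dists = sorted((abs(x - c), y) for y, c in model.items())
    (d0, label), (d1, _) = dists[0], dists[1]
    return label, d1 - d0


# Hypothetical 1-D features standing in for images.
labeled_x = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labeled_y = ["car", "car", "car", "airplane", "airplane", "airplane"]
unlabeled_x = [1.1, 4.7, 3.0]

model = centroids(labeled_x, labeled_y)
MARGIN = 1.0  # hypothetical confidence threshold

# Keep only the predictions the model is confident about.
pseudo_x, pseudo_y = [], []
for x in unlabeled_x:
    label, margin = predict(model, x)
    if margin >= MARGIN:
        pseudo_x.append(x)
        pseudo_y.append(label)

# Refit on the original labels plus the pseudo-labels.
model = centroids(labeled_x + pseudo_x, labeled_y + pseudo_y)
print(list(zip(pseudo_x, pseudo_y)))
```

Here the point at 3.0 sits almost midway between the two class centroids, so it gets no pseudo-label; only the two confident points are added before refitting.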
