
(I apologize for the title being too broad and the question not being particularly 'technical'.)

Suppose that my task is to label news articles: given a news article, I am supposed to classify which category it belongs to. E.g., 'Ronaldo scores a fantastic goal' should be classified under 'Sports'.

After much experimentation, I came up with a model that does this labeling for me. It has, say, 50% validation accuracy. (Assume this is the best achievable.)

And so I deployed this model for my task (on unseen data, obviously). Of course, from a probabilistic perspective, I should get roughly 50% of the articles labelled correctly. But how do I know which labels are actually correct and which need to be corrected? If I were to check manually (say, by hiring people to do so), how is deploying such a model better than just hiring people to do the classification directly? (Do not forget that the manpower cost of developing the model could have been saved.)

DukeZhou

2 Answers


There are several advantages:

  1. Some text classification systems are much more accurate than 50%. For example, most spam classifiers are 99.9% accurate or better, so there is little value in having employees review those labels.
  2. Many text classification systems can output a confidence score as well as a label. You can selectively have employees review only the examples the model is not confident about, and these are often few in number (see the first sketch after this list).
  3. You can usually test a text classification model by having it classify some unseen data and then asking people to check its work. If you do this for a small number of examples, you can verify that the system is working, and then confidently apply it to a much larger set of unlabelled examples with a reasonable estimate of its accuracy.
  4. For text, it is also important to measure how much different people agree on the labels, because this gives you a notion of how subjective your specific problem is. An automated system is unlikely to do better than that inter-annotator agreement. If people disagree 50% of the time anyway, you may as well accept a 50% error rate from the automated system and not bother checking its work (see the second sketch after this list).
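
Here is a minimal sketch of the confidence-based triage in point 2, assuming a scikit-learn style classifier with a `predict_proba` method; the training headlines, categories, and the 0.8 threshold are all hypothetical placeholders, not part of the original question.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labelled training data: (headline, category) pairs.
train_texts = ["Ronaldo scores a fantastic goal", "Stock markets rally on rate cut"]
train_labels = ["Sports", "Business"]

# Any probabilistic text classifier works; TF-IDF + logistic regression is a common baseline.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

# New, unlabelled articles to triage.
new_texts = ["Local team wins the championship", "Parliament debates new budget"]

probs = model.predict_proba(new_texts)            # shape: (n_articles, n_categories)
preds = model.classes_[np.argmax(probs, axis=1)]  # predicted category per article
confidence = probs.max(axis=1)                    # highest class probability per article

THRESHOLD = 0.8  # assumed cutoff; tune it on held-out data
for text, label, conf in zip(new_texts, preds, confidence):
    if conf >= THRESHOLD:
        print(f"AUTO-ACCEPT  [{label} @ {conf:.2f}] {text}")
    else:
        print(f"HUMAN REVIEW [{label} @ {conf:.2f}] {text}")
```

Only the low-confidence articles then go to human reviewers, which is typically a small fraction of the stream.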
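
And a small sketch of the agreement check in point 4, assuming you have two human annotators label the same sample of articles; the labels below are made up, and Cohen's kappa (here via scikit-learn) is one common way to quantify chance-corrected agreement.

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Hypothetical labels from two annotators on the same eight articles.
annotator_a = ["Sports", "Politics", "Sports", "Business", "Tech", "Sports", "Politics", "Tech"]
annotator_b = ["Sports", "Business", "Sports", "Business", "Tech", "Politics", "Politics", "Tech"]

# Raw agreement and chance-corrected agreement.
print("Raw agreement :", accuracy_score(annotator_a, annotator_b))
print("Cohen's kappa :", cohen_kappa_score(annotator_a, annotator_b))

# If the model's accuracy against one annotator is close to the raw agreement
# between annotators, the model is roughly at the human ceiling for this task.
```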
John Doucette

First of all, to be more realistic, you would usually expect more than 50% validation accuracy on article classification.

Back to your question: you should definitely try to automate this process if you are looking for a long-term solution for labeling articles. Deploying such a model should not cost more than hiring employees to do this manually, at least from a long-term perspective.

theonekeyg