Confused about NER evaluation

Question

I'm reading the book Speech and Language Processing by Dan Jurafsky and James H. Martin, and I've been stuck for a while trying to understand what the authors mean by the following:

The fact that named entity tagging has a segmentation component which is not present in tasks like text categorization or part-of-speech tagging causes some problems with evaluation. For example, a system that labeled Jane but not Jane Villanueva as a person would cause two errors, a false positive for O and a false negative for I-PER. In addition, using entities as the unit of response but words as the unit of training means that there is a mismatch between the training and test conditions.

In particular, I don't see what the problem is with this sentence:

For example, a system that labeled Jane but not Jane Villanueva as a person would cause two errors, a false positive for O and a false negative for I-PER.

Was it expected to cause only one error? And what if Jane Villanueva were labeled as B-LOC I-LOC?

Thanks.

Brian O'Donnell · Answer 1 · 2022-12-10T01:42:17.043

You left some critical context out. The author is referring to the following sentence:

[PER Jane Villanueva ] of [ORG United] , a unit of [ORG United Airlines Holding] , said the fare applies to the [LOC Chicago ] route.

An NER algorithm should label "Jane Villanueva" as a person. The BIO labels would be B-PER for 'Jane' and I-PER for 'Villanueva.' If the system labels only 'Jane', then 'Villanueva' is tagged O. That single wrong token counts twice at the tag level: predicting O where the gold tag is not O is a false positive for O, and failing to predict I-PER where the gold tag is I-PER is a false negative for I-PER. As Jurafsky explains, "any tokens outside of any span of interest are labeled O."
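To make the counting concrete, here is a minimal sketch of token-level scoring. The function name `token_level_errors` and the four-token slice of the sentence are my own choices for illustration, not something from the book.

```python
from collections import Counter

def token_level_errors(gold, pred):
    """Count per-tag false positives and false negatives, token by token."""
    fp, fn = Counter(), Counter()
    for g, p in zip(gold, pred):
        if g != p:
            fp[p] += 1  # the predicted tag p is wrong at this token
            fn[g] += 1  # the gold tag g was missed at this token
    return fp, fn

# Illustrative slice of the example sentence: "Jane Villanueva of United"
gold       = ["B-PER", "I-PER", "O", "B-ORG"]
partial    = ["B-PER", "O",     "O", "B-ORG"]  # only "Jane" tagged as a person
wrong_type = ["B-LOC", "I-LOC", "O", "B-ORG"]  # right span, wrong entity type

fp_partial, fn_partial = token_level_errors(gold, partial)
fp_wrong, fn_wrong = token_level_errors(gold, wrong_type)

print(dict(fp_partial), dict(fn_partial))  # {'O': 1} {'I-PER': 1}
print(dict(fp_wrong), dict(fn_wrong))
```

One mislabeled token shows up in two per-tag counts, which is the "two errors" the book describes.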

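Labeling 'Jane Villanueva' as B-LOC I-LOC would also be incorrect. Both tokens would count as false negatives for the PER tags (B-PER and I-PER) and, correspondingly, as false positives for the LOC tags (B-LOC and I-LOC).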
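For comparison, the quoted passage also mentions using entities as the unit of response. Below is a rough sketch of entity-level scoring under an exact-match assumption (a predicted span counts only if its boundaries and type both match the gold span). The helpers `bio_to_spans` and `entity_level_counts` are hypothetical names, and stray I- tags with no preceding B- are simply ignored here.

```python
def bio_to_spans(tags):
    """Convert BIO tags to a set of (start, end_exclusive, type) spans."""
    spans, start, etype = set(), None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.add((start, i, etype))  # close the currently open span
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]  # open a new span
    return spans

def entity_level_counts(gold, pred):
    """Return (true positives, false positives, false negatives) over spans."""
    g, p = bio_to_spans(gold), bio_to_spans(pred)
    return len(g & p), len(p - g), len(g - p)

gold       = ["B-PER", "I-PER", "O", "B-ORG"]
partial    = ["B-PER", "O",     "O", "B-ORG"]
wrong_type = ["B-LOC", "I-LOC", "O", "B-ORG"]

print(entity_level_counts(gold, partial))     # (1, 1, 1)
print(entity_level_counts(gold, wrong_type))  # (1, 1, 1)
```

Under this assumption both mistakes look the same: one spurious entity and one missed entity, regardless of how many tokens were mislabeled.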

The author assumes that the reader knows the BIO tagging labels. For a reference, see this page from Hugging Face.

For those interested, the book is online here.
