I'm reading Speech and Language Processing by Dan Jurafsky and James H. Martin, and I've been stuck for a while trying to understand what the authors mean by the following:
The fact that named entity tagging has a segmentation component which is not present in tasks like text categorization or part-of-speech tagging causes some problems with evaluation. For example, a system that labeled Jane but not Jane Villanueva as a person would cause two errors, a false positive for O and a false negative for I-PER. In addition, using entities as the unit of response but words as the unit of training means that there is a mismatch between the training and test conditions.
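The two ways of counting that the passage contrasts can be sketched as follows. This is a minimal illustration that assumes IOB tags and exact-match entity scoring. The helper names (`extract_entities`, `token_errors`, `entity_errors`) are hypothetical and are not taken from the book or from any library.

```python
from collections import Counter

def extract_entities(tags):
    """Hypothetical helper: collect (start, end, type) spans from IOB tags, end exclusive."""
    spans = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" sentinel closes the last open span
        # A span ends at B-, at O, or at an I- tag whose type differs from the open span.
        if tag.startswith("B-") or tag == "O" or (tag.startswith("I-") and tag[2:] != etype):
            if start is not None:
                spans.add((start, i, etype))
                start, etype = None, None
            if tag != "O":  # B-X, or a stray I-X, opens a new span
                start, etype = i, tag[2:]
    return spans

def token_errors(gold, pred):
    """Per-token scoring: each wrong tag is a false positive for the predicted tag
    and a false negative for the gold tag."""
    fp, fn = Counter(), Counter()
    for g, p in zip(gold, pred):
        if g != p:
            fp[p] += 1
            fn[g] += 1
    return fp, fn

def entity_errors(gold, pred):
    """Exact-match entity scoring: a span counts only if start, end and type all match."""
    g, p = extract_entities(gold), extract_entities(pred)
    return {"tp": len(g & p), "fp": len(p - g), "fn": len(g - p)}

gold = ["B-PER", "I-PER"]       # "Jane Villanueva" as one PER entity
only_jane = ["B-PER", "O"]      # the system tags "Jane" but not "Villanueva"
as_loc = ["B-LOC", "I-LOC"]     # right span, wrong type

print(token_errors(gold, only_jane))   # false positive for O, false negative for I-PER
print(entity_errors(gold, only_jane))  # {'tp': 0, 'fp': 1, 'fn': 1}
print(entity_errors(gold, as_loc))     # {'tp': 0, 'fp': 1, 'fn': 1}
```

In this sketch, tagging only "Jane" produces a single mismatched token. Per-token scoring records that one mismatch twice, as a false positive for O and a false negative for I-PER. Entity-level scoring records it as one spurious PER span plus one missed PER span.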
In particular, I don't see what the problem is with this part:
For example, a system that labeled Jane but not Jane Villanueva as a person would cause two errors, a false positive for O and a false negative for I-PER.
Was it expected to cause only one error? And what if Jane Villanueva were labeled as B-LOC I-LOC?
Thanks.
