It seems that nowadays lots of people are building tools to extract data from different sources (e.g. PDF reports) with the help of models like GPT-4, LLaMA or Falcon.
I'm wondering whether this really works. As far as I know, you can train generative models to reproduce entities exactly as they are written in the original text, but the model still generates that output itself instead of pointing to the corresponding parts of the text.
So can extraction be done reliably with GenAI, or is extractive AI with BERT-like models (e.g. SpanMarker for NER) the way to go?
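To illustrate the difference I mean: a span-based extractive model returns spans of the input, so every entity is tied to offsets in the source by construction. Output from a generative model has to be mapped back to the source to check that it was copied verbatim. The sketch below is a minimal, hypothetical example of such a check (`Span` and `ground_generated_entities` are made-up names, not from any library). It assumes exact string matching and only finds the first occurrence of each entity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # character offset where the entity starts in the source
    end: int    # exclusive end offset
    text: str
    label: str


def ground_generated_entities(
    source: str, generated: list[tuple[str, str]]
) -> tuple[list[Span], list[tuple[str, str]]]:
    """Map (entity_text, label) pairs produced by a generative model back to
    character offsets in the source text. Entities that do not occur verbatim
    are returned separately as ungrounded (paraphrased or hallucinated)."""
    grounded: list[Span] = []
    ungrounded: list[tuple[str, str]] = []
    for text, label in generated:
        start = source.find(text)  # exact match, first occurrence only
        if start == -1:
            ungrounded.append((text, label))
        else:
            grounded.append(Span(start, start + len(text), text, label))
    return grounded, ungrounded


# Hypothetical example: the third entity is a paraphrase, not a copy.
source = "Revenue of Acme Corp. rose to $4.2 million in 2022."
generated = [("Acme Corp.", "ORG"), ("$4.2 million", "MONEY"), ("Acme Corporation", "ORG")]
grounded, ungrounded = ground_generated_entities(source, generated)
```

A check like this catches paraphrased or invented entities, but it cannot tell which occurrence the model "meant" when a string appears more than once. An extractive model avoids that ambiguity because it predicts the offsets directly.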