Again and again I ask myself what goes on in a pre-trained transformer-based language model (like ChatGPT) when it comes to "know" that it cannot give an appropriate answer and either
- states it ("I do not have enough information to answer this question.")
- asks for more specific information ("Please tell me which kind of XY you mean.") 
- calls a plugin (like Wolfram or ScholarAI) 
(I assume that this will never happen without reinforcement learning from human feedback (RLHF). A pre-trained-only model would always answer something (possibly hallucinating) and would not "reflect" on its lack of knowledge.)
The only possibility that I can see, though it is not really explanatory, is that after some steps of execution the sum of the top_k probabilities of the final vector (which assigns a probability to every word in the vocabulary) is too small. But what if this happens only late? ChatGPT would already have produced lots of words, yet one never observes it stopping after some lengthy text and only then ending with something like "Ah, finally I see that I'm missing information. I wasn't aware in the beginning." When ChatGPT admits that it doesn't know, it does so immediately. And when ChatGPT calls a plugin, e.g. ScholarAI, it does so without having produced a single word of response to the last message.
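To make the top_k idea concrete, here is a minimal sketch of the check I have in mind. It is only an illustration of my guess, not a claim about how ChatGPT actually works: the function names, the value of `k`, and the `threshold` are all made up, and the logits are toy numbers over a six-word vocabulary.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_mass(logits, k):
    """Sum of the k largest next-word probabilities."""
    probs = softmax(logits)
    return sum(sorted(probs, reverse=True)[:k])

def looks_uncertain(logits, k=2, threshold=0.5):
    """Hypothetical rule: flag a step as 'uncertain' when the top-k mass is small."""
    return top_k_mass(logits, k) < threshold

# Toy final-layer scores (assumed values, not taken from any real model).
peaked = [8.0, 1.0, 0.5, 0.2, 0.1, 0.0]  # one word clearly dominates
flat = [0.1, 0.0, 0.1, 0.0, 0.1, 0.0]    # probability spread almost evenly

print(looks_uncertain(peaked))  # False
print(looks_uncertain(flat))    # True
```

Such a check can only fire at the step where the distribution becomes flat, which is exactly why it does not explain an immediate refusal or plugin call.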
In principle, ChatGPT could generate a complete response in the background, which is then somehow checked for whether it is "satisfactory". If it is, it is given as output (simulating word-by-word generation); if not, it is regenerated with some sort of trigger (a hidden token?) that makes ChatGPT admit that it is missing information or call a plugin.
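The generate-then-check idea could look roughly like the following sketch. Everything in it is hypothetical: `respond`, the `TRIGGER` string, and the two toy stand-ins for the model and the checker are placeholders I invented to spell out the control flow, not anything I know to exist inside ChatGPT.

```python
TRIGGER = "<ADMIT_OR_CALL_PLUGIN>"  # hypothetical hidden token

def respond(prompt, generate, is_satisfactory):
    """Draft a full answer in the background, then either release it or regenerate."""
    draft = generate(prompt)
    if is_satisfactory(prompt, draft):
        return draft  # would be streamed to the user word by word
    # Draft rejected: regenerate with the hidden trigger prepended.
    return generate(TRIGGER + " " + prompt)

# Toy stand-ins so the sketch runs; a real system would call a model here.
def toy_generate(prompt):
    if prompt.startswith(TRIGGER):
        return "I do not have enough information to answer this question."
    return "The answer is 42."

def toy_check(prompt, draft):
    # Pretend that any question mentioning the vague term "XY" cannot be answered well.
    return "XY" not in prompt

print(respond("What is 6 times 7?", toy_generate, toy_check))
print(respond("Which XY is best?", toy_generate, toy_check))
```

In this scheme the user never sees the rejected draft, which would match the observation that the admission or the plugin call comes first.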
What's the clever trick under the hood (in some technical detail)?
 
     
    