(To preface: I do not have a great understanding of LLMs or AI in general...)
My question is: when I pose a question to an LLM, will it present the fastest response that satisfies the parameters of the query, irrespective of whether it is capable of providing a better answer with more compute? ("Better" in this sense means that if you gave the LLM the question and both answers, the LLM itself would acknowledge that the second answer was "better" by some metric.)
I would provide some examples, but I feel that might be counterproductive - I want to avoid focusing on a specific type of query. Also, there are clearly many questions whose answers have no complexity - e.g. "what is the capital of Thailand?" - so more compute will not improve the answer in any meaningful way.
I guess another way of asking this question is, what parameters does an LLM use to decide when an answer is complete/sufficient?