The main thing I can spot missing from your outline of "a general conversation and problem-solving AI" is that there is no assessment of direction or outcome.
In AlphaZero, that assessment is provided by a game rules engine, which tells the AI when it has won. There is no such rules engine for open-ended conversation, nor in general for goal-directed conversation such as technical support or sales. Without one, an engine like GPT-3 has nothing grounding it to any particular text other than statistics: it could pick the most (or least) likely continuation of a conversation, but it cannot apply any concept of utility, either for itself or for the other participant.
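To make the asymmetry concrete, here is a minimal sketch (my own illustration, not anything from AlphaZero's actual implementation) using tic-tac-toe as a stand-in for a game rules engine. The game side has a well-defined terminal reward; the conversation side simply has no equivalent function to write:

```python
def game_reward(board):
    """Toy rules engine: a flat 3x3 tic-tac-toe board as a list of 9 cells.
    Returns +1 if 'X' has completed any line, else 0 - a ground-truth
    training signal that exists only because the game defines winning."""
    lines = [board[0:3], board[3:6], board[6:9],   # rows
             board[0::3], board[1::3], board[2::3], # columns
             board[0::4], board[2:7:2]]             # diagonals
    return 1 if any(line == ['X'] * 3 for line in lines) else 0

def conversation_reward(transcript):
    """There is no analogous function for open-ended dialogue:
    nothing in the transcript defines a 'win'."""
    raise NotImplementedError("open-ended conversation has no win condition")

finished = ['X', 'X', 'X',
            'O', 'O', '.',
            '.', '.', '.']
print(game_reward(finished))  # the completed top row scores +1
```

The point of the sketch is that `game_reward` is cheap to write because the game's rules define the objective; `conversation_reward` is the function that the combined system would still be missing.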
The grounding problem also remains a major issue in its own right. A conversation-"solving" AI may have no concept of the subject under discussion, or of the state of the person it is talking to. It models the language around these things: for example, it may adjust to a human expressing fear or excitement. But it has no concept of what these states mean, beyond producing matching text in its reply. Importantly, it has no state model for the other side of the conversation beyond what is written.
You may find, though, that you need to solve the grounding problem in order to perform any assessment of a conversation in a general context.
> By evaluating its own responses
How, and on what basis? That is exactly the problem that still needs to be solved after combining the two technologies.