The main thing I can spot missing from your outline of "a general conversation and problem-solving AI" is that there is no assessment of direction or outcome.
In AlphaZero, that assessment is provided by a game rules engine, which tells the AI when it has won. There is no such rules engine for open-ended conversation, nor in general for goal-directed conversation such as technical support or sales. Without one, an engine like GPT-3 has nothing grounding it to any particular text other than statistics: it could pick the most (or least) likely continuation of a conversation, but it cannot apply any concept of utility, either for itself or for the other participant.
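To make the asymmetry concrete, here is a minimal sketch (my own illustration, not anything from AlphaZero's actual implementation) using tic-tac-toe as a stand-in for a game rules engine. The game side has a well-defined terminal reward; the conversation side simply has no equivalent function to write:

```python
def game_reward(board):
    """Toy rules engine: a flat 3x3 tic-tac-toe board as a list of 9 cells.
    Returns +1 if 'X' has completed any line, else 0 - a ground-truth
    training signal that exists only because the game defines winning."""
    lines = [board[0:3], board[3:6], board[6:9],   # rows
             board[0::3], board[1::3], board[2::3], # columns
             board[0::4], board[2:7:2]]             # diagonals
    return 1 if any(line == ['X'] * 3 for line in lines) else 0

def conversation_reward(transcript):
    """There is no analogous function for open-ended dialogue:
    nothing in the transcript defines a 'win'."""
    raise NotImplementedError("open-ended conversation has no win condition")

finished = ['X', 'X', 'X',
            'O', 'O', '.',
            '.', '.', '.']
print(game_reward(finished))  # the completed top row scores +1
```

The point of the sketch is that `game_reward` is cheap to write because the game's rules define the objective; `conversation_reward` is the function that the combined system would still be missing.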
The grounding problem also remains a major issue in its own right. A conversation-"solving" AI may have no concept of the subject under discussion, or of the state of the person it is talking to. It models the language around these things: for example, it may adjust to a human expressing fear or excitement. But it has no concept of what these states mean, beyond producing matching text in its reply. Importantly, it has no state model for the other side of the conversation beyond what is written.
You may find, though, that you need to solve the grounding problem in order to perform any assessment of a conversation in a general context.
> By evaluating its own responses
How, and on what basis? That is exactly the problem that still needs to be solved after combining the two technologies.