7

From what I understand, ChatGPT is just a fancy neural network, operating like a sophisticated Markov Chain generator. As such, it should only be able to generate tokens that are in its training dataset. One thing it should not be able to generate is a token unique to the conversation, such as a random number, since the conversation is not part of its training.

However, ChatGPT very easily, and with complete accuracy, repeats many long random numbers I give it. I also split up the random number across my request, and have ChatGPT piece it together again.

Eric Holloway: How about concatenating these numbers, and removing the intermittent filler words? 9475 3008 3305 11 frog 4357 5075 0021 3385 0894 2324 3812 9336 29 dog 52 1375 8236 3840 5500 6080 3841 log 7266 0722 9112 9774 4967 4707 81

ChatGPT: To concatenate the numbers you have provided, while removing the filler words “frog”, “dog”, and “log”, you can simply combine the numbers in the order that they are given. Here is the resulting number: 9475 3008 3305 1143 5750 7500 2133 8508 9423 2438 1293 3629 5213 7582 3638 4055 0060 8038 4172 6607 2291 1297 7449 6747 0781
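For what it's worth, the transformation is deterministic, so the reply can be checked mechanically. A quick sketch (the number strings are copied from the exchange above):

```python
# Sanity check: drop the filler words from the prompt, concatenate the
# remaining digits, and compare against ChatGPT's regrouped answer.
prompt = ("9475 3008 3305 11 frog 4357 5075 0021 3385 0894 2324 3812 "
          "9336 29 dog 52 1375 8236 3840 5500 6080 3841 log 7266 0722 "
          "9112 9774 4967 4707 81")
reply = ("9475 3008 3305 1143 5750 7500 2133 8508 9423 2438 1293 3629 "
         "5213 7582 3638 4055 0060 8038 4172 6607 2291 1297 7449 6747 0781")

expected = "".join(tok for tok in prompt.split() if tok.isdigit())
actual = reply.replace(" ", "")
print(expected == actual)  # → True: the concatenation is digit-for-digit exact
```

So the model's answer is not merely plausible-looking; it is exactly correct over all 100 digits.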

How is a neural network able to do this?

yters

3 Answers

11

As well as decent modelling of phrases or strings of digits, ChatGPT can identify when a completely novel string might be a product name, a fictional person or place, etc.

The language model in ChatGPT is partially character based. There are around 100,000 tokens in the latest GPT - the encoding is called cl100k_base. Most of them are common English words (e.g. "the"), but it also supports individual characters, including individual digits, and the model processes sequences of characters, effectively modelling groups of them at a time.

From what I understand, ChatGPT is just a fancy neural network, operating like a sophisticated Markov Chain generator.

That's a reasonable analogy. ChatGPT is a bit like a 2048-gram prediction engine: predict the next token, shift the sequence along by one, and repeat. No different from toy fantasy name generators when viewed from 10,000 feet up.
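To make the predict/shift/repeat loop concrete, here is a toy character-level bigram (2-gram) generator - the same loop as above, just with a context of one character instead of 2048 tokens (the corpus and names here are purely illustrative):

```python
import random
from collections import defaultdict, Counter

def train_bigrams(corpus):
    """Count, for each character, which characters follow it."""
    counts = defaultdict(Counter)
    for a, b in zip(corpus, corpus[1:]):
        counts[a][b] += 1
    return counts

def generate(counts, start, length, rng):
    """Predict the next char from counts, shift the context along one, repeat."""
    out = start
    for _ in range(length):
        followers = counts[out[-1]]
        if not followers:
            break
        chars, weights = zip(*followers.items())
        out += rng.choices(chars, weights=weights)[0]
    return out

counts = train_bigrams("banana bandana cabana ")
print(generate(counts, "b", 10, random.Random(0)))
```

The neural network plays the role of `counts` here, except that instead of a lookup table it is a learned function that can return sensible probabilities for contexts it has never tabulated.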

As such, it should only be able to generate tokens that are in its training dataset. One thing it should not be able to generate is a token unique to the conversation, such as a random number, since the conversation is not part of its training.

A couple of misunderstandings here. First, the random number will not become a single token, but will be one token per digit, or per pair or triple of digits, depending on the sequence - you can try this yourself, in the encoding that ChatGPT uses, to help visualise it. Of course each of those tokens will have been seen before, millions of times, in the training data.
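The exact splits come from cl100k_base's learned merges (OpenAI's `tiktoken` library will show the real token IDs), but its pre-tokenization happens to take runs of digits in groups of at most three, which a regex can roughly mimic:

```python
import re

def rough_digit_chunks(text):
    """Approximate how cl100k_base pre-splits a digit run: numbers are
    taken in groups of at most three digits, left to right (the actual
    token IDs then come from the learned BPE vocabulary)."""
    return re.findall(r"\d{1,3}", text)

print(rough_digit_chunks("947530083305"))
# → ['947', '530', '083', '305']
```

Each of those short chunks is an ordinary, frequently-seen token; nothing about the long number as a whole ever needs to exist in the vocabulary.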

Second, sequences do not need to appear in the training data in order for ChatGPT to work with them. In fact, with an input sequence length of 2048, pretty much all inputs to ChatGPT in inference mode are unique, never-seen-before sequences. Whether or not some of the tokens represent a long random number, the chance that any 2048-token sequence of letters and short words is unique is very high.

This is where the neural network model differs from a true 2048-gram. It has generalised from the training data well enough that it can actually predict meaningful and useful probabilities for the next token, even though in all likelihood it has never seen the exact same sequence before. In this regard it is an approximation of a "perfect" 2048-gram prediction engine that has somehow been trained on infinite human writings.

A lot of language modelling is about correctly processing the context of a subsequence, so recognising a number sequence as being a grammatical "unit" that can be reused as-is is not a surprising feature.

Neil Slater
  • 33,739
2

The existing answer covers model generalization well, but I would like to add a point about an important inductive bias of the Transformer architecture used for ChatGPT.

In the Transformer architecture there is a mechanism called attention. An attention block can access all of the context (the input and the previously generated output) and retrieve whatever it needs, based on the model's learned behaviour. A partial analogy for the attention mechanism is the way a human's eyes move back to earlier text while writing.

Thanks to the attention mechanism, a Transformer model is very good at repeating its input verbatim in its output. This differs from earlier architectures such as RNNs, where the model has to compress the input into an internal representation.
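A minimal sketch of why attention makes copying easy (pure Python, not a real Transformer; the one-hot position codes standing in for learned queries and keys are an assumption for illustration): when a query matches exactly one key, the softmax weights are nearly one-hot, so the output is essentially the value at that position - a copy.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values, scale=10.0):
    """One attention head: weight each value by softmax(query . key).
    Real attention returns a weighted sum of value vectors; since our
    values are token strings, we return the top-weighted one instead."""
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    return values[max(range(len(weights)), key=weights.__getitem__)]

tokens = ["9475", "3008", "3305"]
positions = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # one-hot "position" codes

# To copy position 1, issue a query that matches key 1.
print(attend(positions[1], positions, tokens))  # → 3008
```

An RNN would have to squeeze all three numbers through a fixed-size hidden state; attention just points back at the right spot.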

That being said, the example provided is something more than a mere repeat, and it is very difficult to say "how" a large language model does anything in general, especially as the ChatGPT model is not public.

pcpthm
-6

Turns out ChatGPT is indeed human curated, by open admission.

During this conversation, ChatGPT outright states that the OpenAI team filters and edits the GPT-generated responses.

...the response you are receiving is being filtered and edited by the OpenAI team, who ensures that the text generated by the model is coherent, accurate and appropriate for the given prompt.

Apparently, the fact that OpenAI actively curates ChatGPT's responses is indirectly implied in the documentation here.

Human in the loop (HITL): Wherever possible, we recommend having a human review outputs before they are used in practice. This is especially critical in high-stakes domains, and for code generation. Humans should be aware of the limitations of the system, and have access to any information needed to verify the outputs (for example, if the application summarizes notes, a human should have easy access to the original notes to refer back).

So, that explains that :)

yters