
What prompt (or other technique) should I use with an LLM so that

  1. The result is guaranteed to be reliably parseable as a list of values (e.g. a Python list of strings)
  2. The LLM understands during generation that a variable-length list of values is expected

EDIT: Actual Solution

Note: For a more in-depth understanding, please read the excellent post Schema-Aligned Parsing

And for comparing different structured output generation methods, read https://www.boundaryml.com/blog/structured-output-from-llms


BAML worked perfectly for me


More specifically, I wrote this definition

class Resume {
  skills string[] @description("Only include programming languages")
}

BAML then handled the communication with the LLM (both generating the prompt and parsing the completion returned by the LLM), and the skills came back as a plain Python list of strings.

Example

Run it yourself

Step 1

Create a resume.baml file

class Resume {
  name string
  education Education[] @description("Extract in the same order listed")
  skills string[] @description("Only include programming languages")
}

class Education {
  school string
  degree string
  year int
}

function ExtractResume(resume_text: string) -> Resume {
  client GPT4o

  prompt #"
    Parse the following resume and return a structured representation of the data in the schema below.

    Resume:
    ---
    {{ resume_text }}
    ---

    {# special macro to print the output instructions. #}
    {{ ctx.output_format }}

    JSON:
  "#
}

Step 2

BAML will generate bindings for your language (Python/TS/Ruby/Java/C#/Rust/Go or HTTP API, etc.)

from baml_client import b

resume = b.ExtractResume("""
  John Doe

  Education
  - University of California, Berkeley
    - B.S. in Computer Science
    - 2020

  Skills
  - Python
  - Java
  - C++
""")

Note: resume is of Pydantic type Resume
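
For example (with hypothetical output values), the skills field is already a plain Python list of strings, which is exactly what point 1 of the question asks for:

# the parsed Resume model guarantees the types; the exact values depend on the LLM
print(resume.name)    # e.g. "John Doe"
print(resume.skills)  # e.g. ["Python", "Java", "C++"]
assert isinstance(resume.skills, list)
assert all(isinstance(s, str) for s in resume.skills)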

BAML will format the input as follows and send it to the LLM

Parse the following resume and return a structured representation of the data in the schema below.

Resume:

John Doe

Education

  • University of California, Berkeley
    • B.S. in Computer Science
    • 2020

Skills

  • Python
  • Java
  • C++

Answer in JSON using this schema:
{
  name: string,
  // Extract in the same order listed
  education: [
    {
      school: string,
      degree: string,
      year: int,
    }
  ],
  // Only include programming languages
  skills: string[],
}

JSON:

The resume variable then holds the structured result.

3 Answers


There is no guarantee that an LLM used directly will conform to any specific output constraints.

You can use prompts, fine-tuning or other customisation to request a specific output, and that often helps, but it does not guarantee that the output will conform to what you requested. At best you can get a certain level of reliability, e.g. 90% or 99% accurate. How accurate you can get depends on the complexity of the task you have stated in the prompt.

I have found that ChatGPT v4 (and I assume others at the same level) will output code snippets quite reliably, with either Python or JSON structure and using markdown formatting, when this is specified when setting up an Assistant or a GPT, or in a system prompt.

A simplified example of a system prompt that might do this:

You are a sentiment analysis tool for user reviews on a shopping site. For each user prompt, decide whether it is "negative", "neutral" or "positive". Then output a JSON snippet with key of "sentiment" and the decision as the value.

Example user input:

This television had a scratch on the screen when it was delivered.

Example LLM output:

A scratched screen is a sign of fault that may have occurred in manufacture or during delivery. Although the customer did not express any emotions in their post, they are likely to be unhappy with this defect.

{
 "sentiment": "negative"
}

In addition, it usually helps accuracy if you allow the chat session to add a preamble before the structured output, as in my example. I.e. do not demand, in the prompt, that you receive only the final processed result. That is because the LLM does not "think" about the text it outputs, it simply outputs text, and the coherency of that text can be used to emulate some forms of processing.

This is a common misunderstanding when using services like ChatGPT: many people seem to assume that posting a question causes the AI to "think" and then return an answer. It doesn't do that. So if you want any kind of logical analysis of your input, it can help to have the LLM work through the issue - a bit like thinking out loud - before it outputs structured data.

To get your structured data:

  • In the web UI, code snippets with markdown will have a copy icon, and you can click it to get a copy of the code block to paste into something else.
  • Via the API, you will need to write code that looks for the markdown block, extracts it and parses it. Regular expressions are good for this, and available in most programming languages (see the sketch below).

In both cases, you need to allow for the times when the LLM doesn't output what you want. You can retry, fail with an error, or whatever is appropriate for your use case.
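
A minimal Python sketch of that extract-and-parse step, assuming the model wraps its JSON in a markdown fence (the regex and the fallback behaviour are illustrative choices, not the only way to do it):

import json
import re

def extract_json_block(llm_output: str):
    # prefer a fenced ```json ... ``` (or plain ```) block if one is present
    match = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", llm_output, re.DOTALL)
    candidate = match.group(1) if match else llm_output
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None  # caller can retry, fail with an error, etc.

reply = 'The customer is likely unhappy.\n```json\n{"sentiment": "negative"}\n```'
print(extract_json_block(reply))  # {'sentiment': 'negative'}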

Neil Slater

I agree with @Neil Slater -- it's never guaranteed, because LLMs are inherently stochastic models, not deterministic ones.

Given the need for JSON output, OpenAI heeded the call of the community and made it easy to increase the likelihood of getting JSON output, even with the cheaper gpt-3.5-turbo model.

Here's a code snippet from their documentation:

from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo-0125",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "You are a helpful assistant designed to output JSON."},
        {"role": "user", "content": "Who won the world series in 2020?"}
    ]
)
print(response.choices[0].message.content)

A tip I would offer is to create a simple test in which you pass a set of typical inputs over, say, 20 iterations, and then examine the accuracy of the output; a sketch of this follows below.
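
A minimal sketch of such a test, reusing the snippet above and simply counting how many responses parse as valid JSON (you would swap in your own inputs and success criteria):

import json
from openai import OpenAI

client = OpenAI()
iterations = 20
passes = 0

for _ in range(iterations):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-0125",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": "You are a helpful assistant designed to output JSON."},
            {"role": "user", "content": "Who won the world series in 2020?"},
        ],
    )
    try:
        json.loads(response.choices[0].message.content)
        passes += 1          # parsed cleanly
    except json.JSONDecodeError:
        pass                 # count as a failure

print(f"{passes}/{iterations} responses parsed as valid JSON")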

Ari

I'd say it's possible, although it depends on the model you're using.

If you're using llama.cpp (or its Python bindings), you can use its grammar support to constrain the output:

GBNF (GGML BNF) is a format for defining formal grammars to constrain model outputs in llama.cpp. For example, you can use it to force the model to generate valid JSON, or speak only in emojis.
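
For example, with the llama-cpp-python bindings you can pass a grammar when generating. This is only a sketch: the model path is a placeholder, and the grammar below is a simplified one that constrains the reply to a JSON array of strings:

from llama_cpp import Llama, LlamaGrammar

# GBNF grammar that only allows a JSON array of strings
grammar = LlamaGrammar.from_string(r'''
root   ::= "[" ws ( string ( "," ws string )* )? ws "]"
string ::= "\"" [^"]* "\""
ws     ::= [ \t\n]*
''')

llm = Llama(model_path="path/to/model.gguf")  # placeholder path
out = llm(
    "List three programming languages as a JSON array of strings:",
    grammar=grammar,
    max_tokens=64,
)
print(out["choices"][0]["text"])  # matches the grammar, e.g. ["Python", "Java", "C++"]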

For a more generic approach, and if you're using Python, take a look at guidance.

guidance is a programming paradigm that offers superior control and efficiency compared to conventional prompting and chaining. It allows users to constrain generation (e.g. with regex and CFGs) as well as to interleave control (conditional, loops) and generation seamlessly.

Here's a snippet of what a guidance configuration would look like (borrowed from their docs):

# imports and model setup added so the snippet runs end to end
# (assuming the guidance >= 0.1 API; the model path is a placeholder)
import guidance
from guidance import models, gen, select

llama2 = models.LlamaCpp("path/to/model.gguf")

@guidance
def character_maker(lm, id, description, valid_weapons):
    lm += f"""\
    The following is a character profile for an RPG game in JSON format.
    ```json
    {{
        "id": "{id}",
        "description": "{description}",
        "name": "{gen('name', stop='"')}",
        "age": {gen('age', regex='[0-9]+', stop=',')},
        "armor": "{select(options=['leather', 'chainmail', 'plate'], name='armor')}",
        "weapon": "{select(options=valid_weapons, name='weapon')}",
        "class": "{gen('class', stop='"')}",
        "mantra": "{gen('mantra', stop='"')}",
        "strength": {gen('strength', regex='[0-9]+', stop=',')},
        "items": ["{gen('item', list_append=True, stop='"')}", "{gen('item', list_append=True, stop='"')}", "{gen('item', list_append=True, stop='"')}"]
    }}```"""
    return lm

lm = llama2 + character_maker(1, 'A nimble fighter', ['axe', 'sword', 'bow'])
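
If I'm reading the guidance docs right, the values captured by gen() and select() can then be read back off the resulting lm object, which gives you the variable-length list of values the question asks for (the keys below are the names used in the snippet):

print(lm["name"])   # single captured string
print(lm["armor"])  # one of 'leather', 'chainmail', 'plate'
print(lm["item"])   # a list, because the items were captured with list_append=True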

Vlad Iliescu