What prompt (or other technique) should I use with an LLM so that:
- The result is reliably parseable as a list of values (e.g. a Python list of strings)
- The LLM understands during generation that a variable-length list of values is expected
EDIT: Actual Solution
Note: For a more in-depth understanding, please read the excellent post Schema-Aligned Parsing.
For a comparison of different structured output generation methods, read https://www.boundaryml.com/blog/structured-output-from-llms
BAML worked perfectly for me.
More specifically, I wrote this definition:
class Resume {
  skills string[] @description("Only include programming languages")
}
BAML then handled the communication with the LLM, both generating the prompt and parsing the completion the LLM returned.
The result was parsed into a list of strings.
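To make the parsing half of this concrete, here is a minimal sketch of lenient list extraction in plain Python. It is not BAML's actual algorithm, only an illustration of the general idea of recovering a list of strings from a completion that may contain extra prose; the function name parse_string_list and the sample completion are made up for this example.

```python
import json
import re


def parse_string_list(completion: str) -> list[str]:
    """Extract a JSON array from an LLM completion and return it as list[str]."""
    # Models often surround the JSON with prose, so search for the span from
    # the first "[" to the last "]" instead of parsing the whole completion.
    match = re.search(r"\[.*\]", completion, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON array found in completion")
    # json.loads raises json.JSONDecodeError (a ValueError) on malformed JSON.
    values = json.loads(match.group(0))
    # Coerce every element to str so callers always get a list of strings.
    return [str(v) for v in values]


# Hypothetical completion text, not real model output.
completion = 'Sure! Here are the skills: ["Python", "Java", "C++"] Anything else?'
skills = parse_string_list(completion)
print(skills)
```

A sketch like this still fails on truncated or badly malformed output, which is the gap that tools such as BAML aim to close.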
Example
Step 1
Create a resume.baml file:
class Resume {
  name string
  education Education[] @description("Extract in the same order listed")
  skills string[] @description("Only include programming languages")
}
class Education {
  school string
  degree string
  year int
}
function ExtractResume(resume_text: string) -> Resume {
  client GPT4o
  prompt #"
    Parse the following resume and return a structured representation of the data in the schema below.
    Resume:
    ---
    {{ resume_text }}
    ---
    {# special macro to print the output instructions. #}
    {{ ctx.output_format }}
    JSON:
  "#
}
Step 2
BAML will generate bindings for your language (Python/TS/Ruby/Java/C#/Rust/Go or HTTP API, etc.)
from baml_client import b
resume = b.ExtractResume("""
John Doe
Education
- University of California, Berkeley
- B.S. in Computer Science
- 2020
Skills
- Python
- Java
- C++
""")
Note: resume is an instance of the generated Pydantic model Resume.
BAML formats the input as follows and sends it to the LLM:
Parse the following resume and return a structured representation of the data in the schema below.
Resume:
John Doe
Education
- University of California, Berkeley
- B.S. in Computer Science
- 2020
Skills
- Python
- Java
- C++
Answer in JSON using this schema:
{
  name: string,
  // Extract in the same order listed
  education: [
    {
      school: string,
      degree: string,
      year: int,
    }
  ],
  // Only include programming languages
  skills: string[],
}
JSON:
The variable resume then holds the structured result.
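As a rough picture of what that structured result looks like, the sketch below uses plain dataclasses as stand-ins for the generated models. The real classes come from baml_client and are Pydantic models; the values here are built by hand to mirror the example resume and are not actual LLM output.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for the generated models; the real ones are
# Pydantic classes produced by BAML and are not reproduced here.
@dataclass
class Education:
    school: str
    degree: str
    year: int


@dataclass
class Resume:
    name: str
    education: list[Education] = field(default_factory=list)
    skills: list[str] = field(default_factory=list)


# Hand-built value mirroring the example resume (assumed, not real output).
resume = Resume(
    name="John Doe",
    education=[
        Education("University of California, Berkeley", "B.S. in Computer Science", 2020)
    ],
    skills=["Python", "Java", "C++"],
)

# skills is an ordinary Python list of strings, so it can be iterated,
# indexed, or passed to any code that expects list[str].
for skill in resume.skills:
    print(skill)
```

The point is that the list field arrives as a normal typed Python list, so no extra string parsing is needed on the caller's side.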
