Python GraphRAG retrieval: is it good or bad to embed the user's whole question when querying data from the graph?

Question

I'm following a tutorial on using the GraphRAG Python library to answer users' questions about the actors who appeared in movies matching a given description. The approach works in three steps. First, the user's question (also called the query text) is embedded with an embedder (an OpenAI model). Next, a vector retriever queries the graph database and returns the movies whose embedded_plot property is most similar to the embedded query text. Finally, an LLM is invoked with the user's question plus the results returned from the graph as context, and it produces the final answer.
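To make the retrieval step concrete, here is a minimal pure-Python sketch of what the vector search does conceptually: score every stored plot vector against the query vector with cosine similarity, keep the `top_k` best matches, and hand them to the LLM together with the question. The three-dimensional vectors, the movie list, and the `build_prompt` helper are hypothetical; in the real pipeline the embeddings come from `OpenAIEmbeddings`, the search runs inside Neo4j's `moviePlotsEmbedding` index, and `GraphRAG` uses its own prompt template.

```python
import math

# Hypothetical 3-dimensional "plot embeddings"; real ones come from
# OpenAIEmbeddings and are stored in the moviePlotsEmbedding vector index.
PLOT_VECTORS = {
    "Jumanji": [0.9, 0.8, 0.1],
    "The Jungle Book": [0.7, 0.1, 0.2],
    "Toy Story": [0.1, 0.6, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, vectors, k):
    """Return the k (title, score) pairs most similar to query_vector."""
    scored = [(title, cosine(query_vector, vec)) for title, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(question, titles):
    """Hypothetical helper: combine the retrieved titles and the question."""
    context = "\n".join(f"- {title}" for title in titles)
    return f"Context:\n{context}\n\nQuestion: {question}"


question = "Who were the actors in the movie about the magic jungle board game?"
query_vector = [0.85, 0.75, 0.15]  # hypothetical embedding of `question`
hits = top_k(query_vector, PLOT_VECTORS, k=2)
prompt = build_prompt(question, [title for title, _ in hits])
print(prompt)
```

With `retriever_config={"top_k": 3}`, the real retriever keeps the three best-scoring movies in the same spirit, and the retrieval query then adds each movie's actors to the context.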

The main part of the code looks as follows:

llm = OpenAILLM(model_name="gpt-4o", model_params={"temperature": 0})
# Initialize the RAG pipeline
rag = GraphRAG(retriever=vc_retriever, llm=llm)
# Query the graph
query_text = "Who were the actors in the movie about the magic jungle board game?"
response = rag.search(query_text=query_text, retriever_config={"top_k": 3})

However, I'm not sure whether embedding the whole user question (i.e. the query_text) is accurate. When retrieving data from the graph, shouldn't the movies' embedded_plot property be matched against the description, here "movie about the magic jungle board game", rather than against the irrelevant part "who were the actors in", which is mainly used as context for the LLM?
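One way to look at this concern is to rank the same plots against both query variants and compare. The sketch below does that with a toy bag-of-words "embedding" and a tiny stopword list, both assumptions standing in for a real model such as `text-embedding-ada-002`; the plot summaries are hypothetical, not taken from the demo database.

```python
import math
import re
from collections import Counter

# Tiny stopword list for the toy embedder (an assumption, not from the tutorial).
STOPWORDS = {"a", "an", "the", "in", "of", "by", "about", "that", "who", "were"}


def toy_embed(text):
    """Hypothetical bag-of-words 'embedding': counts of non-stopword tokens.
    A crude stand-in for a semantic embedding model."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, plots, embed=toy_embed):
    """Rank titles by similarity between the query and each plot."""
    q = embed(query)
    scores = [(title, cosine(q, embed(plot))) for title, plot in plots.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical plot summaries, not taken from the demo database.
PLOTS = {
    "Jumanji": "Two kids find a magic board game that unleashes a jungle.",
    "The Jungle Book": "A boy raised by wolves in the jungle.",
    "Toy Story": "A cowboy doll feels threatened by a new toy.",
}

full_question = "Who were the actors in the movie about the magic jungle board game?"
description = "movie about the magic jungle board game"
for query in (full_question, description):
    print(query)
    for title, score in rank(query, PLOTS):
        print(f"  {title}: {score:.3f}")
```

In this toy setup the extra word "actors" lowers every score a little but leaves the ranking unchanged. A real embedding model may behave differently, so repeating the comparison with the actual embedder on your own data is the reliable check.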

The full code looks like this:

from neo4j import GraphDatabase
from neo4j_graphrag.retrievers import VectorCypherRetriever
from neo4j_graphrag.embeddings.openai import OpenAIEmbeddings
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.generation import GraphRAG
# Demo database credentials
URI = "neo4j+s://demo.neo4jlabs.com"
AUTH = ("recommendations", "recommendations")
# Connect to Neo4j database
driver = GraphDatabase.driver(URI, auth=AUTH)
embedder = OpenAIEmbeddings(model="text-embedding-ada-002")
retrieval_query = """
MATCH
(actor:Actor)-[:ACTED_IN]->(node)
RETURN
node.title AS movie_title,
node.plot AS movie_plot,
collect(actor.name) AS actors;
"""
vc_retriever = VectorCypherRetriever(
    driver,
    index_name="moviePlotsEmbedding",
    embedder=embedder,
    # return_properties=["title", "plot"],
    retrieval_query=retrieval_query,
)
# LLM
# Note: the OPENAI_API_KEY must be in the env vars
llm = OpenAILLM(model_name="gpt-4o", model_params={"temperature": 0})
# Initialize the RAG pipeline
rag = GraphRAG(retriever=vc_retriever, llm=llm)
# Query the graph
query_text = "movie about the magic jungle board game?"
response = rag.search(query_text=query_text, retriever_config={"top_k": 3})
print(response.answer)
