Leveraging structured knowledge to improve accuracy and relevance in retrieval augmented generation systems
Retrieval Augmented Generation (RAG) systems have revolutionized how large language models (LLMs) access and utilize external knowledge. By retrieving relevant information from a knowledge base before generating responses, RAG enables LLMs to provide more accurate, up-to-date, and verifiable answers. As shown in Figure 1, a RAG system consists of three essential components: a Retriever that identifies relevant information, a Knowledge Store that maintains the indexed document repository, and a Generator (typically an LLM) that synthesizes the retrieved context with the user's query to produce a comprehensive response. However, traditional RAG systems face significant challenges when dealing with complex information structures.
Standard RAG implementations rely primarily on vector similarity search, treating documents as collections of independent chunks with limited contextual relationships. While effective for simple question answering, this approach struggles with multi-hop questions, queries whose answers depend on document-level context, and collections in which many chunks look nearly identical.
For example, when answering questions about financial data, a traditional RAG system might retrieve document chunks containing relevant numbers but miss crucial context about which fiscal periods, products, or business segments they relate to. This is particularly problematic for standardized financial documents, whose chunks are highly similar across companies and reporting periods.
Knowledge graphs provide a natural solution to these challenges by explicitly modeling entities and their relationships. By representing documents as interconnected nodes and edges rather than isolated chunks, knowledge graph-based RAG systems can follow explicit relationships between entities, preserve document-level context, and retrieve chunks that are related to the query structurally as well as semantically.
Figure 2: Interactive visualization of a knowledge graph for financial data. This graph represents entities like Apple Inc., its financial metrics across different quarters, and the relationships between them. Click on any node to explore its connections.
Knowledge graph-based approaches for RAG have been gaining attention in recent research. Notable examples include Microsoft's GraphRAG, which we revisit later in this article, and the knowledge graph query engines available in LlamaIndex.
While this article focuses on knowledge graph-based enhancements to RAG, several other approaches have been developed to address the limitations of basic vector-based retrieval, including hierarchical retrieval, re-ranking of retrieved chunks, and hybrid keyword-plus-vector search.
These approaches each have their own strengths and are often complementary to knowledge graph methods. The entity-based KG-RAG approach we present in this article shares similarities with hierarchical retrieval methods, as it first identifies relevant entities before exploring their local neighborhoods to find relevant document chunks. However, it distinguishes itself by explicitly modeling and utilizing the relationships between entities in a structured knowledge graph.
As illustrated in Figure 1, a comprehensive RAG system consists of three primary components that work together to provide accurate, context-aware responses: a Knowledge Store that maintains the indexed document repository, a Retriever that identifies the information relevant to a given query, and a Generator (typically an LLM) that synthesizes the retrieved context into a response.
In this article, we focus primarily on the knowledge store and retriever components, as these are where knowledge graph enhancements have the most significant impact. The following sections will explore how these components are implemented in both traditional vector-based RAG and our knowledge graph-based approach.
The standard RAG approach relies on vector similarity between the query embedding and the pre-embedded document chunks in the vector database to retrieve relevant context. This implementation aligns with the three core components shown in Figure 1, where the Knowledge Store contains chunk embeddings mapped to document chunks, the Retriever uses embedding similarity to match queries to relevant chunks, and the Generator is an LLM. The process follows a straightforward pipeline as illustrated below:
The workflow consists of three main stages: indexing, in which documents are split into chunks and embedded into the vector store; retrieval, in which the query is embedded and the most similar chunks are returned; and generation, in which the LLM answers the query using the retrieved chunks as context.
In our implementation of the baseline system, we incorporate metadata into chunks by adding source information (e.g., "From: 2023_Q3_AAPL.pdf") at the top of each chunk, although this metadata is not included in the embedding calculation itself.
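A minimal sketch of this baseline, assuming a list of chunks that each carry their text and source filename (the helper names and the choice of embedding model here are illustrative, not our exact implementation), looks roughly like this:

from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS

# Embed only the raw chunk text; keep the source filename as metadata.
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vector_store = FAISS.from_texts(
    texts=[chunk["text"] for chunk in chunks],
    embedding=embeddings,
    metadatas=[{"source": chunk["source"]} for chunk in chunks],  # e.g. "2023_Q3_AAPL.pdf"
)

def retrieve(query: str, k: int = 5) -> str:
    # Prepend the source header only after retrieval, so it is visible to the
    # generator but never influences the embedding similarity.
    docs = vector_store.similarity_search(query, k=k)
    return "\n\n".join(f"From: {d.metadata['source']}\n{d.page_content}" for d in docs)

llm = ChatOpenAI(model="gpt-4o", temperature=0)
question = "What was Apple's gross margin percentage in Q3 2023?"
answer = llm.invoke(f"Context:\n{retrieve(question)}\n\nQuestion: {question}")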
This approach works well for many question-answering tasks but has certain limitations when dealing with complex, relationship-heavy domains: questions that span multiple chunks or documents, answers that depend on document-level metadata such as the filing period or company, and collections (like quarterly filings) in which many chunks are nearly interchangeable.
To address the limitations of vector-based RAG, we introduce knowledge graph-based approaches that incorporate structured relationships into the retrieval process. These methods build and leverage a knowledge graph representing entities and relationships extracted from the document collection.
The effectiveness of any KG-RAG system heavily depends on the quality of the underlying knowledge graph. But how exactly is this graph generated from the unstructured document text? The process typically involves using LLMs to extract entities and their relationships from text chunks.
The core of our knowledge graph generation process is the LLMGraphTransformer from LangChain, which leverages large language models to identify entities and relationships from document text. The process follows these key steps: the documents are split into chunks, the LLM is prompted to extract typed entities and relationships from each chunk, the extractions are converted into graph documents, and these are merged into a single unified graph.
Let's take a closer look at the extraction process (the allowed entity and relationship types in the snippet below are illustrative):
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_openai import ChatOpenAI

# Define the graph transformer; strict mode filters the output to the allowed
# entity and relationship types (the types listed here are illustrative)
transformer = LLMGraphTransformer(
    llm=ChatOpenAI(model="gpt-4o", temperature=0),
    allowed_nodes=["Company", "Financial metric", "Fiscal period"],
    allowed_relationships=["REPORTED", "FOR_PERIOD", "HAS_SEGMENT"],
    strict_mode=True,
)
# Process documents to extract graph elements
graph_documents = transformer.convert_to_graph_documents(documents)
# Create a unified graph from the extracted elements
graph = create_graph_from_graph_documents(graph_documents)
The system prompts the LLM with specific instructions to identify entities and their relationships. Here's a simplified view of how the extraction works:
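For instance, given a chunk stating that Apple reported a gross margin percentage of 44% for the third quarter of fiscal 2023, the structured output might look roughly like the following (the entity and relationship names are illustrative, not the exact extraction schema):

extraction = {
    "nodes": [
        {"id": "Apple Inc.", "type": "Company"},
        {"id": "Gross Margin Percentage", "type": "Financial metric"},
        {"id": "2023 Q3", "type": "Fiscal period"},
    ],
    "relationships": [
        {"source": "Apple Inc.", "target": "Gross Margin Percentage", "type": "REPORTED"},
        {"source": "Gross Margin Percentage", "target": "2023 Q3", "type": "FOR_PERIOD"},
    ],
}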
The transformer then converts this structured output into graph nodes and edges with appropriate types and properties.
Building high-quality knowledge graphs from unstructured text presents several challenges: the same entity may be extracted under different names in different chunks, document-level context is easily lost during chunking, and LLM-based extraction tends to favor semantic relationships over specific numeric values.
Our implementation addresses some of these challenges by constraining the extraction to a fixed set of entity and relationship types, preserving document-level context in the entity embeddings (described below), and mapping each extracted entity back to the chunks it came from.
One significant challenge we encountered was that document-level hierarchies and metadata are not reliably extracted during knowledge graph creation. This occurs because critical context (like document titles "2023 Q3 AAPL" or "2022 Q2 MSFT") typically appears only on the first page or in the document title, but gets lost during the chunking process.
To address this limitation, we implemented a context preservation technique that hyphenates entities with their source document titles before creating entity embeddings. For example, rather than just embedding "Gross Margin Percentage" as a standalone entity, we embed "Gross Margin Percentage - 2023 Q3 AAPL" to incorporate document-level context. This approach ensures that even when the chunk-level extraction misses the hierarchical relationship, the entity embeddings still capture the critical document-source context.
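A short sketch of this step, assuming each graph document keeps a reference to its source document and that the document title is available in its metadata (the key names and helper function are ours):

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

def embed_entities(graph_documents):
    # Combine each entity name with its source document title, e.g.
    # "Gross Margin Percentage - 2023 Q3 AAPL", before embedding.
    texts, entity_keys = [], []
    for gd in graph_documents:
        doc_title = gd.source.metadata.get("title", "")  # e.g. "2023 Q3 AAPL"
        for node in gd.nodes:
            entity_keys.append((node.id, doc_title))
            texts.append(f"{node.id} - {doc_title}")
    vectors = embeddings.embed_documents(texts)
    return dict(zip(entity_keys, vectors))

entity_vectors = embed_entities(graph_documents)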
The Entity-Based KG-RAG approach enhances the standard RAG pipeline by reimagining the Knowledge Store and Retriever components shown in Figure 1. Instead of storing only chunk embeddings, the Knowledge Store holds a knowledge graph of interconnected entities, together with entity-chunk relationships that map each entity to the document chunks from which it was extracted. The Retriever uses a two-step process that first identifies relevant entities and then explores their connections before selecting chunks.
The workflow consists of four main stages: embedding the query and matching it against entity embeddings, exploring the local subgraph around the most similar entities, scoring and selecting document chunks through the entity-chunk mappings, and generating the answer from the selected chunks.
This approach leverages both semantic similarity (through embeddings) and structural relationships (through the knowledge graph) to provide more accurate and comprehensive answers. The entities in the knowledge graph are embedded based on the entity name and source document using the OpenAI text-embedding-3-small
model, and compared to the query embedding using cosine similarity.
It's important to note that the number of "top nodes" is separate from the final number of chunks selected. We first take the top N nodes by similarity to the query; candidate chunks are then scored by combining how often those nodes are associated with them (frequency) with the nodes' similarity scores, and the top K chunks by this combined score are selected.
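The sketch below illustrates this two-level selection; summing the similarity of each top node over the chunks it maps to is one simple way to combine frequency and similarity into a single score (the exact weighting in our implementation may differ, and in the full pipeline the top nodes are first expanded with their graph neighbors):

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_chunks(query_vec, entity_vectors, entity_to_chunks, top_n=30, top_k=5):
    # 1. Rank entities by cosine similarity between the query and entity embeddings.
    sims = {entity: cosine(query_vec, vec) for entity, vec in entity_vectors.items()}
    top_nodes = sorted(sims, key=sims.get, reverse=True)[:top_n]

    # 2. Accumulate each top node's similarity onto the chunks it was extracted from,
    #    so chunks associated with many highly similar entities score highest.
    chunk_scores = {}
    for entity in top_nodes:
        for chunk_id in entity_to_chunks.get(entity, []):
            chunk_scores[chunk_id] = chunk_scores.get(chunk_id, 0.0) + sims[entity]

    # 3. Select the top-K chunks by combined score.
    return sorted(chunk_scores, key=chunk_scores.get, reverse=True)[:top_k]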
To better understand how the Entity-Based KG-RAG method works in practice, let's look at an interactive visualization of the process for a query that the baseline system answers incorrectly but the entity-based system answers correctly:
Figure 5: Interactive visualization of the Entity-Based KG-RAG approach showing query processing flow for a question about Apple's gross margin percentage. The visualization demonstrates how entities are identified, the subgraph is explored, and relevant document chunks are selected to provide comprehensive answers.
The visualization above shows how a query about Apple's gross margin percentage flows through the Entity-Based KG-RAG system: the query is embedded and matched against entity embeddings, the most similar entities anchor an exploration of their local subgraph, the connected entities are mapped back to their source chunks, and the highest-scoring chunks are passed to the LLM to generate the final answer.
The Entity-Based KG-RAG approach offers several advantages over traditional RAG systems: it preserves document-level context in the entity embeddings, it supports multi-hop questions by following explicit relationships between entities, it makes retrieval more explainable because the entities and paths behind each selected chunk can be inspected, and the added structural signal is especially helpful for smaller generator models.
Despite these advantages, knowledge graph-based RAG approaches also face several challenges and limitations: graph construction requires additional LLM calls and careful prompt design, retrieval quality depends heavily on the quality of the extracted graph, the extraction tends to favor semantic relationships over specific values, and the method introduces extra parameters (such as the number of top nodes) that must be tuned.
In our implementation, we initially aimed to create a knowledge graph that would map all the way to terminal nodes containing specific values (e.g., APPLE -> HAS_DOCUMENT -> 2023 Q3 -> REPORTED -> Gross Margin Percentage -> 44%). However, we found that the knowledge graph creation process was biased toward extracting semantic relationships rather than embedding specific values. This led us to develop the entity-chunk mapping approach as a pragmatic solution to connect entities in the graph with the document chunks containing relevant values.
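In practice this mapping falls out of the extraction step, since each graph document produced by the transformer retains a reference to the chunk it came from. The following is a sketch, assuming chunk IDs were attached to the chunk metadata during splitting:

from collections import defaultdict

def build_entity_chunk_mapping(graph_documents):
    # Map each extracted entity to the set of chunks it was extracted from.
    entity_to_chunks = defaultdict(set)
    for gd in graph_documents:
        chunk_id = gd.source.metadata["chunk_id"]  # assumed, set when chunking
        for node in gd.nodes:
            entity_to_chunks[node.id].add(chunk_id)
    return entity_to_chunks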
To evaluate the performance of different RAG approaches, we leverage a specialized dataset from Docugami based on SEC 10-Q quarterly financial reports from major technology companies. The dataset includes quarterly 10-Q filings from companies such as Apple and Microsoft across multiple fiscal periods, along with human-reviewed, LLM-generated question-answer pairs.
This dataset was chosen because financial documents represent an ideal use case for knowledge graph approaches - they contain numerous entities with complex relationships between them, and answering questions often requires connecting information across different sections.
While the original dataset included human-reviewed LLM-generated question-answer pairs, these tended to be qualitative in nature, making precise evaluation challenging. To address this limitation, we developed a set of 100 synthetic question-answer pairs with the following characteristics: each question has a single, unambiguous numerical answer; the questions span both single-document lookups and multi-document, multi-hop comparisons; and each question specifies the exact format in which the answer should be given.
For example, a qualitative question like:
"Can any trends be identified in Apple's Services segment revenue over the reported periods?"
was transformed into a quantitative question such as:
"What was the increase in Apple's Services segment net sales from the quarter ended June 25, 2022, to the quarter ended July 1, 2023, as reported in their 2022 Q3 and 2023 Q3 10-Q filings? Provide the answer in millions of dollars as a whole number without commas."
Here's an example of a multi-hop question where the KG-RAG method outperforms the baseline approach:
This question requires the system to find and connect information about R&D expenses from two different filing periods, perform a calculation, and return the result in a specific format. The knowledge graph approach excels at this type of question because it can explicitly model the relationships between entities (Apple, R&D expenses, time periods) and facilitate the multi-hop reasoning required.
It's important to note that our focus on quantitative questions is primarily for evaluation simplicity and not to suggest that end users would only ask for numerical answers. In real-world deployments, users would likely ask a much broader range of questions, including qualitative ones about trends, strategies, risks, and other textual information in the reports.
We evaluated each RAG system using the following methodology: for each of the 100 questions, the system retrieved context and generated an answer using either GPT-4o or GPT-4o-mini as the generator, and the answer was marked correct only if the returned number matched the ground-truth value.
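Because every reference answer is a single whole number, grading reduces to a normalized exact-match check along these lines (a sketch of the kind of comparison we mean, not the exact grading code):

def accuracy(predictions, references):
    def normalize(answer: str) -> str:
        # Strip whitespace, commas, and dollar signs before comparing.
        return answer.strip().replace(",", "").replace("$", "")
    correct = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return correct / len(references)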
It's worth noting that we did not implement a re-ranker for either method in these experiments. However, re-ranking could be an interesting future exploration for the KG-RAG method by supplying token-efficient subgraph path definitions to have the model re-rank paths based on their usefulness to the query, with the associated nodes then used to select chunks.
Our evaluation revealed significant performance differences between the baseline RAG and Entity-Based KG-RAG approaches. The following visualization shows the overall accuracy comparison:
The Entity-Based KG-RAG approach showed a substantial improvement over the baseline, with accuracy increasing from 40% to 55% when using GPT-4o, and from 36.36% to 56% when using GPT-4o-mini. This represents a relative improvement of approximately 37.5% and 54% respectively.
Surprisingly, the improvement from the entity-based approach was even more pronounced with the smaller GPT-4o-mini model, which typically performs worse than the larger GPT-4o model. This suggests that the structural knowledge provided by the knowledge graph compensates for the limitations of the smaller model, allowing it to leverage relationships more effectively than the baseline approach.
Regarding latency, our measurements showed that the KG-RAG method adds only minimal overhead to the retrieval process compared to the baseline method:
Here, we conduct a detailed error analysis using a confusion matrix to understand the patterns of success and failure between the two approaches:
The confusion matrix reveals that both approaches answer a shared core of questions correctly, that the KG-RAG approach correctly answers many questions the baseline gets wrong, and that only a small number of questions are answered correctly by the baseline but missed by KG-RAG.
This asymmetric pattern suggests that the KG-RAG approach maintains most of the strengths of the baseline approach while addressing many of its weaknesses through improved structural understanding.
We also investigated how the performance of the KG-RAG approach varies with different configuration parameters, particularly the number of top nodes considered in the similarity matching stage:
This analysis reveals that performance peaks when considering between 30 and 40 top similar nodes (56% accuracy), with a noticeable decline when considering either too few (< 10 nodes) or too many (> 50 nodes) similar entities. This suggests an optimal balance where the system has enough similar entities to explore related connections, but not so many that it introduces noise or dilutes the relevance of the retrieved context.
While this article has focused on the entity-based KG-RAG approach, we also implemented several other knowledge graph-based methods that show promise for different use cases. These approaches were not included in the main evaluation for various reasons detailed below, but they offer valuable alternatives for specific scenarios.
The Cypher-based KG-RAG method leverages a Neo4j graph database and uses a structured query language (Cypher) rather than vector embeddings as the primary retrieval mechanism: the LLM translates the user's question into a Cypher query, the query is executed against the graph database, and the returned records are passed back to the LLM to compose the final answer.
This approach excels when questions map clearly to specific relationship patterns in the knowledge graph, but requires more specialized knowledge of the underlying graph structure. We use the LangChain Neo4j cookbooks as a reference for our implementation. LlamaIndex also implements a Knowledge Graph Query Engine using Neo4j and Cypher generation.
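Following those cookbooks, a minimal version of the pipeline looks roughly like this (the connection details are placeholders):

from langchain_neo4j import GraphCypherQAChain, Neo4jGraph
from langchain_openai import ChatOpenAI

graph = Neo4jGraph(url="bolt://localhost:7687", username="neo4j", password="password")

chain = GraphCypherQAChain.from_llm(
    llm=ChatOpenAI(model="gpt-4o", temperature=0),
    graph=graph,  # the graph schema is included in the Cypher-generation prompt
    verbose=True,
    allow_dangerous_requests=True,  # generated Cypher is executed directly against the database
)

result = chain.invoke({"query": "What gross margin percentage did Apple report in Q3 2023?"})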
In our initial testing, the Cypher-based approach had difficulty with the knowledge graph provided for the SEC 10-Q dataset, as the LLM-generated Cypher queries could not reliably capture the complexity of the relationships in the financial documents. This led to brittle query generation and inconsistent results, which is why we excluded it from the main evaluation.
Future work could leverage custom models for Cypher generation, or validate generated queries in a loop, refining them until a valid Cypher statement is produced.
GraphRAG combines knowledge graphs with embedding-based retrieval and community detection algorithms: entities and relationships are extracted into a graph, communities of closely related entities are detected and summarized, and at query time these community summaries are retrieved alongside relevant chunks to ground the answer.
This approach is particularly effective for documents with natural community structures, such as research papers with distinct sections or reports covering various business segments. We leverage a LangChain implementation of GraphRAG for ease of use, though other implementations, such as those from LlamaIndex, offer similar capabilities.
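Implementations differ in their details, but the community-detection step itself can be illustrated over the graph_documents produced during extraction; the sketch below uses networkx's greedy modularity algorithm purely for illustration (Microsoft's GraphRAG uses the Leiden algorithm, and the LLM summarization step is omitted):

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Build an undirected entity graph from the extracted relationships.
entity_graph = nx.Graph()
for gd in graph_documents:
    for rel in gd.relationships:
        entity_graph.add_edge(rel.source.id, rel.target.id, type=rel.type)

# Detect communities of densely connected entities; in GraphRAG each community
# would then be summarized by an LLM and those summaries used at query time.
communities = greedy_modularity_communities(entity_graph)
for i, community in enumerate(communities):
    print(f"Community {i}: {sorted(community)[:5]} ...")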
In our early evaluations, this particular implementation of GraphRAG achieved approximately 20% accuracy (without any prompt tuning) on the SEC 10-Q dataset. Due to time constraints, we decided to focus our comprehensive evaluation on the entity-based approach that showed more promising initial results.
Knowledge graph-based RAG approaches represent a significant advancement over traditional vector-based methods, especially for domains with complex relational structures. By incorporating structured relationships into the retrieval process, these methods can provide more accurate, comprehensive, and explainable answers.
Our experiments with the Entity-Based KG-RAG method show promising results, particularly for questions that require understanding relationships between multiple entities and documents. The ability to explore subgraphs and combine structural knowledge with textual information enables more nuanced and accurate responses.
The comparative analysis clearly demonstrates that incorporating structural knowledge through knowledge graphs significantly improves the ability of RAG systems to handle complex information needs, particularly in domains with rich relational structures like financial documentation.
Future directions for this research include: re-ranking retrieved subgraph paths to improve chunk selection, more robust Cypher query generation (for example, validating and refining generated queries in a loop), extending the evaluation to qualitative questions, and improving graph construction so that specific values are captured directly in the graph.
The source code for reproducing all experiments and implementations described in this article is available in our KG-RAG repository.
We would like to thank the Vector Institute for supporting this research, and the open-source community for providing valuable tools and frameworks that made this work possible.