Enhancing RAG with Knowledge Graphs

Leveraging structured knowledge to improve accuracy and relevance in retrieval augmented generation systems

Introduction

Retrieval Augmented Generation (RAG) systems have revolutionized how large language models (LLMs) access and utilize external knowledge. By retrieving relevant information from a knowledge base before generating responses, RAG enables LLMs to provide more accurate, up-to-date, and verifiable answers. As shown in Figure 1, a RAG system consists of three essential components: a Retriever that identifies relevant information, a Knowledge Store that maintains the indexed document repository, and a Generator (typically an LLM) that synthesizes the retrieved context with the user's query to produce a comprehensive response. However, traditional RAG systems face significant challenges when dealing with complex information structures.

Figure 1: Components of a RAG system: Retriever, Knowledge Store, and Generator

The Challenges of Traditional RAG

Standard RAG implementations rely primarily on vector similarity search, treating documents as collections of independent chunks with limited contextual relationships. While effective for simple question answering, this approach struggles with:

Connecting Related Information: When relevant information is spread across multiple documents or sections
Understanding Complex Relationships: Between entities mentioned in different contexts
Multi-hop Reasoning: Questions that require synthesizing facts from multiple sources
Preserving Structural Information: Important relationships that exist in the original documents

For example, when answering questions about financial data, a traditional RAG system might retrieve document chunks containing relevant numbers but may miss crucial context about which fiscal periods, products, or business segments they relate to. This is particularly relevant for standardized financial documentation with highly correlated chunks.

Knowledge Graphs: A Structural Solution

Knowledge graphs provide a natural solution to these challenges by explicitly modeling entities and their relationships. By representing documents as interconnected nodes and edges rather than isolated chunks, knowledge graph-based RAG systems can:

Capture Meaningful Relationships: Between entities mentioned across documents
Enable Traversal-Based Retrieval: Following connection paths between related concepts
Combine Structural and Semantic Information: Leveraging both relationships and textual content
Support Explainable Retrieval: Making it clear why certain information was selected

Figure 2: Interactive visualization of a knowledge graph for financial data. This graph represents entities like Apple Inc., its financial metrics across different quarters, and the relationships between them.

Knowledge graph-based approaches for RAG have been gaining attention in recent research. Notable examples include Microsoft's GraphRAG, which leverages graph structures for query-focused summarization, and MiniRAG, which explores efficient retrieval methods that combine text chunks and named entities in a unified structure.

Alternative RAG Enhancement Approaches

While this article focuses on knowledge graph-based enhancements to RAG, several other approaches have been developed to address the limitations of basic vector-based retrieval:

Metadata Filtering: Enhances retrieval by using document metadata (e.g., titles, dates, authors) to filter or re-rank results. This can be particularly effective when users' queries include specific metadata elements.
Hierarchical (Big-to-Small) Retrieval: Implements a multi-stage retrieval process that first identifies relevant high-level documents or sections before retrieving specific chunks within them.
Advanced Embedding Models: Models like ColBERTv2 and E5 offer more sophisticated embedding capabilities than basic models, capturing more nuanced semantic relationships. Similarly, Cohere's Rerank provides re-ranking capabilities to further improve the relevance of retrieved context.
Hybrid Search: Combines multiple retrieval methods (e.g., keyword search with vector search) to leverage the strengths of different approaches.

These approaches each have their own strengths and are often complementary to knowledge graph methods. The entity-based KG-RAG approach we present in this article shares similarities with hierarchical retrieval methods, as it first identifies relevant entities before exploring their local neighborhoods to find relevant document chunks. However, it distinguishes itself by explicitly modeling and utilizing the relationships between entities in a structured knowledge graph.

RAG Architecture Overview

Key Components

As illustrated in Figure 1, a comprehensive RAG system consists of three primary components that work together to provide accurate, context-aware responses:

Knowledge Store: Responsible for storing, indexing, and organizing information from source documents. The knowledge store can be implemented using various approaches:
- Vector databases (traditional RAG)
- Knowledge graphs (KG-RAG)
- Hybrid stores (combining multiple representation methods)
Retriever: Responsible for identifying and retrieving the most relevant information from the knowledge store based on the user query. Retrieval mechanisms vary by implementation:
- Embedding similarity (traditional RAG)
- Graph traversal (KG-RAG)
- Hybrid approaches combining multiple retrieval strategies
Generator: The large language model that synthesizes the retrieved context and the user query to produce a comprehensive response. While the generator is typically consistent across different RAG implementations, its effectiveness depends heavily on the quality and relevance of the retrieved context.
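To make the division of responsibilities concrete, here is a minimal sketch of how the three components might be wired together. The `RagPipeline` class and the stub lambdas are hypothetical names introduced for illustration; they are not part of our implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RagPipeline:
    """Wires a retriever and a generator together; the store sits behind the retriever."""
    retrieve: Callable[[str], List[str]]        # query -> retrieved context chunks
    generate: Callable[[str, List[str]], str]   # (query, context) -> answer

    def answer(self, query: str) -> str:
        context = self.retrieve(query)
        return self.generate(query, context)


# Stub components (hypothetical): a dict plays the knowledge store, keyword
# matching plays the retriever, and string formatting plays the LLM.
store = {"gross margin": "Gross margin was 44.3% in Q3 2023."}
pipeline = RagPipeline(
    retrieve=lambda q: [text for key, text in store.items() if key in q.lower()],
    generate=lambda q, ctx: f"Q: {q}\nContext: {' '.join(ctx)}",
)
result = pipeline.answer("What was Apple's gross margin?")
```

Because the generator only sees the retrieved context, swapping the `retrieve` callable (vector search versus graph traversal) changes the system's behavior without touching the generation step.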

In this article, we focus primarily on the knowledge store and retriever components, as these are where knowledge graph enhancements have the most significant impact. The following sections will explore how these components are implemented in both traditional vector-based RAG and our knowledge graph-based approach.

Baseline RAG Implementation

Workflow

The standard RAG approach relies on vector similarity between the query embedding and the pre-embedded document chunks in the vector database to retrieve relevant context. This implementation aligns with the three core components shown in Figure 1, where the Knowledge Store contains chunk embeddings mapped to document chunks, the Retriever uses embedding similarity to match queries to relevant chunks, and the Generator is an LLM. The process follows a straightforward pipeline as illustrated below:

Figure 3: Standard RAG architecture showing the Knowledge Store with chunk embeddings and the Retriever using direct similarity matching between the query and document chunks

The workflow consists of three main stages:

Query Embedding: Convert the user query into a vector embedding
Chunk Similarity Matching: Find document chunks with embeddings similar to the query (the Retriever calculates similarity scores like 0.73, 0.54, etc.)
Chunk Selection: Select the top-k most similar chunks based on these scores

In our implementation of the baseline system, we incorporate metadata into chunks by adding source information (e.g., "From: 2023_Q3_AAPL.pdf") at the top of each chunk, although this metadata is not included in the embedding calculation itself.
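The three stages and the source header can be sketched in a few lines. This is a toy illustration, assuming hand-written 3-dimensional vectors in place of real embeddings; the function names and the chunk dictionary layout are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(query_vec, chunks, k=5):
    """Score every chunk against the query and return the k best (score, chunk) pairs."""
    scored = [(cosine(query_vec, c["embedding"]), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def render_context(scored_chunks):
    """Prefix each chunk with its source; the prefix is not part of the embedded text."""
    return "\n\n".join(f"From: {c['source']}\n{c['text']}" for _, c in scored_chunks)


# Toy data: real embeddings would come from an embedding model.
chunks = [
    {"source": "2023_Q3_AAPL.pdf", "text": "Gross margin was 44.3%.", "embedding": [0.9, 0.1, 0.0]},
    {"source": "2022_Q2_MSFT.pdf", "text": "Placeholder chunk B.", "embedding": [0.1, 0.9, 0.0]},
    {"source": "2023_Q3_AAPL.pdf", "text": "Placeholder chunk C.", "embedding": [0.6, 0.4, 0.2]},
]
query_vec = [1.0, 0.0, 0.0]
top = retrieve_top_k(query_vec, chunks, k=2)
context = render_context(top)
```

Note that the ranking depends only on the embeddings, so two chunks with near-identical wording from different filings can score almost the same, which is the weakness discussed in the next section.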

Limitations

This approach works well for many question-answering tasks but has certain limitations when dealing with complex, relationship-heavy domains:

The Knowledge Store only captures direct mappings between embeddings and chunks, without preserving relationships between information across different chunks
The Retriever relies solely on direct similarity matching, making it difficult to handle multi-hop questions that require following chains of relationships
Without explicit entity relationships, the system has limited ability to leverage structural information that exists in the original documents
The similarity-only approach risks retrieving chunks that are semantically related to the query but lack the specific contextual relationships needed for accurate answers

Knowledge Graph-Based RAG

To address the limitations of vector-based RAG, we introduce knowledge graph-based approaches that incorporate structured relationships into the retrieval process. These methods build and leverage a knowledge graph representing entities and relationships extracted from the document collection.

Knowledge Graph Generation

The effectiveness of any KG-RAG system heavily depends on the quality of the underlying knowledge graph. But how exactly is this graph generated from the unstructured document text? The process typically involves using LLMs to extract entities and their relationships from text chunks.

LLM-Based Entity and Relationship Extraction

The core of our knowledge graph generation process is the LLMGraphTransformer from LangChain, which leverages large language models to identify entities and relationships from document text. The process follows these key steps:

Text Chunking: Documents are first split into manageable chunks
Entity and Relationship Extraction: Each chunk is processed by an LLM with specialized prompting
Graph Construction: The extracted entities and relationships are assembled into a cohesive graph structure

Let's take a closer look at the extraction process:

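    from langchain_experimental.graph_transformers import LLMGraphTransformer
    from langchain_openai import ChatOpenAI

    # Define the graph transformer. No allowed_nodes/allowed_relationships are
    # passed in this snippet, so entity and relationship types are left to the LLM.
    transformer = LLMGraphTransformer(
        llm=ChatOpenAI(model="gpt-4o", temperature=0),
        strict_mode=True,
    )
    # Process documents (a list of chunked LangChain Document objects)
    # to extract graph elements
    graph_documents = transformer.convert_to_graph_documents(documents)
    # Create a unified graph from the extracted elements.
    # create_graph_from_graph_documents is not defined in this excerpt.
    graph = create_graph_from_graph_documents(graph_documents)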

Python code for knowledge graph generation using LLMGraphTransformer

Prompt Engineering for Graph Extraction

The system prompts the LLM with specific instructions to identify entities and their relationships. Here's a simplified view of how the extraction works:

Input: "Apple Inc. reported a gross margin of 44.3% for Q3 2023, compared to 43.3% in the same quarter of 2022."

Output Entities:
- "Apple Inc." (type: Company)
- "gross margin" (type: Metric)
- "44.3%" (type: Amount)
- "Q3 2023" (type: Quarter)
- "43.3%" (type: Amount)
- "Q3 2022" (type: Quarter)

Output Relationships:
- (Apple Inc., REPORTED, gross margin)
- (gross margin, HAS_VALUE, 44.3%)
- (44.3%, REPORTED_IN, Q3 2023)
- (43.3%, REPORTED_IN, Q3 2022)
- (44.3%, COMPARED_TO, 43.3%)

The graph transformer then converts this structured LLM output into graph nodes and edges with appropriate types and properties.
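As a rough illustration of what "nodes and edges" means here, the example triples above can be loaded into a plain adjacency map and traversed. This sketch uses only the standard library rather than the graph library used in our implementation, and the helper names are hypothetical.

```python
from collections import defaultdict

# Triples copied from the example output above.
triples = [
    ("Apple Inc.", "REPORTED", "gross margin"),
    ("gross margin", "HAS_VALUE", "44.3%"),
    ("44.3%", "REPORTED_IN", "Q3 2023"),
    ("43.3%", "REPORTED_IN", "Q3 2022"),
    ("44.3%", "COMPARED_TO", "43.3%"),
]


def build_adjacency(triples):
    """Map each source node to its outgoing (relation, target) edges."""
    adjacency = defaultdict(list)
    for source, relation, target in triples:
        adjacency[source].append((relation, target))
    return adjacency


def reachable(adjacency, start, max_hops):
    """Collect nodes reachable from `start` within `max_hops` directed hops."""
    frontier, seen = {start}, {start}
    for _ in range(max_hops):
        frontier = {t for n in frontier for _, t in adjacency.get(n, [])} - seen
        seen |= frontier
    return seen - {start}


graph = build_adjacency(triples)
two_hops = reachable(graph, "Apple Inc.", max_hops=2)
four_hops = reachable(graph, "Apple Inc.", max_hops=4)
```

With these edges, the Q3 2022 comparison value is only reachable from "Apple Inc." by following the COMPARED_TO edge, which shows how a single missing or misdirected relationship can cut off part of the graph.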

Challenges in Knowledge Graph Construction

Building high-quality knowledge graphs from unstructured text presents several challenges:

Entity Resolution: The LLM must correctly identify when different mentions refer to the same entity (e.g., "Apple", "Apple Inc.", "the company")
Relationship Accuracy: Extracting accurate relationships between entities requires understanding complex linguistic patterns and domain knowledge
Schema Consistency: Maintaining a consistent ontology (types of entities and relationships) across diverse documents
Processing Limitations: LLM context windows limit how much text can be processed at once, requiring careful document chunking strategies

Our implementation addresses some of these challenges through:

Entity normalization to reduce duplication (nodes with identical identifiers are merged by default when the NetworkX graph is created)
Careful document chunking to balance context preservation with processing efficiency
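Merging identical identifiers only helps if different surface forms are first mapped to the same string. The sketch below shows one simple way to do that; the alias table and the `normalize_entity` function are hypothetical, and pronoun-like mentions such as "the company" would need coreference resolution that this sketch does not attempt.

```python
import re

# Hypothetical alias table; a real system would curate or derive this per domain.
ALIASES = {"apple": "Apple Inc.", "apple inc": "Apple Inc."}


def normalize_entity(name: str) -> str:
    """Collapse case, punctuation, and whitespace, then map known aliases."""
    key = re.sub(r"[^\w\s]", "", name).casefold()
    key = re.sub(r"\s+", " ", key).strip()
    return ALIASES.get(key, name.strip())


mentions = ["Apple", "Apple Inc.", "apple inc", "  APPLE  ", "gross margin"]
canonical = {normalize_entity(m) for m in mentions}
```

After normalization the four Apple variants collapse to one identifier, so adding them to a graph keyed by identifier produces a single node.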

Document-Level Context Preservation

One significant challenge we encountered was that document-level hierarchies and metadata are not reliably extracted during knowledge graph creation. This occurs because critical context (like document titles "2023 Q3 AAPL" or "2022 Q2 MSFT") typically appears only on the first page or in the document title, but gets lost during the chunking process.

To address this limitation, we implemented a context preservation technique that appends the source document title to each entity name, separated by a hyphen, before creating entity embeddings. For example, rather than just embedding "Gross Margin Percentage" as a standalone entity, we embed "Gross Margin Percentage - 2023 Q3 AAPL" to incorporate document-level context. This approach ensures that even when the chunk-level extraction misses the hierarchical relationship, the entity embeddings still capture the critical document-source context.
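The string construction itself is trivial; a minimal sketch follows, with `entity_embedding_text` as a hypothetical helper name. The resulting strings are what would be passed to the embedding model.

```python
def entity_embedding_text(entity_name: str, document_title: str) -> str:
    """Join the entity name and its source document title with a hyphen."""
    return f"{entity_name} - {document_title}"


# The same metric extracted from two filings now yields two distinct strings.
texts = [
    entity_embedding_text("Gross Margin Percentage", title)
    for title in ("2023 Q3 AAPL", "2022 Q2 MSFT")
]
```

The effect is that a query mentioning a company and quarter lands closer to the entity from the matching filing than to the identically named entity from another one.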

Entity-Based Approach

The Entity-Based KG-RAG approach enhances the standard RAG pipeline by reimagining the Knowledge Store and Retriever components shown in Figure 1. Instead of just storing chunk embeddings, the Knowledge Store consists of a knowledge graph of interconnected entities with entity-chunk relationships that map these entities to the relevant document chunks from which they were extracted. The Retriever uses a two-step process that first identifies relevant entities and then explores their connections before selecting chunks. The process follows these steps:

Figure 4: Entity-Based KG-RAG architecture showing the Knowledge Store with a knowledge graph and entity-chunk relationships, and the Retriever using entity similarity matching followed by subgraph exploration

The workflow consists of four main stages:

Query Embedding: Convert the user query to a vector embedding
Entity Similarity Matching: Find top N entities in the knowledge graph most similar to the query (with similarity scores like 0.73, 0.54, etc.)
Subgraph Exploration: Explore the local neighborhood around similar entities to discover related entities and their connections
Chunk-Entity Voting: Select the top K most relevant document chunks based on their connections and similarity to the identified entities, and provide the explored subgraph as additional context.

This approach leverages both semantic similarity (through embeddings) and structural relationships (through the knowledge graph) to provide more accurate and comprehensive answers. The entities in the knowledge graph are embedded based on the entity name and source document using the OpenAI text-embedding-3-small model, and compared to the query embedding using cosine similarity.

It's important to note that the number of top nodes (N) is separate from the final number of chunks selected (K). We first choose the top N nodes, then score each chunk by both frequency (how many of those nodes are associated with the chunk) and the nodes' similarity scores to the query. Finally, we select the top K chunks by that combined score.
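The four stages can be sketched end to end on toy data. The exact scoring formula is an assumption made for illustration: here each explored entity adds its query similarity to every chunk it is linked to, so a chunk's score grows with both how many entities point to it and how similar they are. All names and the one-hop exploration depth are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def entity_based_retrieve(query_vec, entity_vecs, neighbors, entity_chunks, top_n, top_k):
    """Return the top_k chunk ids voted for by the top_n entities and their neighbors."""
    # 1. Entity similarity matching: keep the top_n entities closest to the query.
    sims = {e: cosine(query_vec, v) for e, v in entity_vecs.items()}
    seeds = sorted(sims, key=sims.get, reverse=True)[:top_n]
    # 2. Subgraph exploration: add the one-hop neighbors of each seed entity.
    explored = set(seeds)
    for e in seeds:
        explored.update(neighbors.get(e, ()))
    # 3. Chunk-entity voting (assumed formula): sum entity similarities per chunk.
    votes = defaultdict(float)
    for e in explored:
        for chunk_id in entity_chunks.get(e, ()):
            votes[chunk_id] += sims.get(e, 0.0)
    # Sort by descending score, breaking ties by chunk id for determinism.
    return sorted(votes, key=lambda c: (-votes[c], c))[:top_k]


# Toy 2-dimensional entity embeddings and entity-chunk links.
entity_vecs = {
    "Gross Margin Percentage - 2023 Q3 AAPL": [0.9, 0.1],
    "Apple Inc. - 2023 Q3 AAPL": [0.7, 0.3],
    "Revenue - 2022 Q2 MSFT": [0.1, 0.9],
}
neighbors = {"Gross Margin Percentage - 2023 Q3 AAPL": ["Apple Inc. - 2023 Q3 AAPL"]}
entity_chunks = {
    "Gross Margin Percentage - 2023 Q3 AAPL": ["aapl_c7"],
    "Apple Inc. - 2023 Q3 AAPL": ["aapl_c7", "aapl_c1"],
    "Revenue - 2022 Q2 MSFT": ["msft_c3"],
}
selected = entity_based_retrieve([1.0, 0.0], entity_vecs, neighbors, entity_chunks, top_n=1, top_k=2)
```

In this toy run only one entity is matched directly, but the chunk linked to both that entity and its neighbor ranks first, which is the voting behavior described above.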

Interactive Visualization

To better understand how the Entity-Based KG-RAG method works in practice, let's look at an interactive visualization of the process for a query that the baseline system answers incorrectly but the entity-based system answers correctly:

Figure 5: Interactive visualization of the Entity-Based KG-RAG approach showing query processing flow for a question about Apple's gross margin percentage. The visualization demonstrates how entities are identified, the subgraph is explored, and relevant document chunks are selected to provide comprehensive answers.

The visualization above shows how a query about Apple's gross margin percentage flows through the Entity-Based KG-RAG system:

The system first identifies relevant entities in the knowledge graph based on similarity to the query
It then explores the subgraph around these entities to discover related information
Based on the explored subgraph, it selects the most relevant document chunks
Finally, it assembles a comprehensive context that combines both structural knowledge and textual information

Advantages

The Entity-Based KG-RAG approach offers several advantages over traditional RAG systems:

Relational Context Preservation: The Knowledge Store's graph structure explicitly maintains the relationships between entities, preserving crucial contextual information that might be lost in vector-based approaches.
Multi-hop Reasoning Support: The Retriever's subgraph exploration capability allows the system to discover relevant entities and information that may be multiple hops away from the initially matched entities.
Entity-Grounded Context Selection: The entity-chunk relationships in the Knowledge Store ensure that document chunks are selected based on their connections to relevant entities, not just the embedding similarity between the chunk and the query.
Structural Patterns in Financial Data: Financial documents follow predictable structures, with information organized around key entities like companies, time periods, and financial metrics. Knowledge graphs naturally capture these patterns, making them particularly effective for this domain.
Explanation and Transparency: The paths in the knowledge graph provide a clear explanation of how different pieces of information are related, enhancing the transparency of the retrieval process.

Challenges and Limitations

Despite its advantages, knowledge graph-based RAG approaches also face several challenges and limitations:

Domain Specificity: The effectiveness of a knowledge graph depends heavily on how well it captures the domain-specific relationships in the documents. Different domains may require different graph schemas and extraction approaches.
Computational Overhead: Building and maintaining a knowledge graph introduces additional computational requirements compared to simple vector stores, particularly for large document collections.
Graph Quality vs. Performance: The quality of the knowledge graph directly impacts the performance of the KG-RAG system. Incomplete or inaccurate graphs can lead to missing connections or irrelevant retrievals.
Optimization Challenges: Finding the optimal configuration for knowledge graph construction and exploration (e.g., similarity thresholds, number of hops) often requires extensive experimentation.

In our implementation, we initially aimed to create a knowledge graph that would map all the way to terminal nodes containing specific values (e.g., APPLE -> HAS_DOCUMENT -> 2023 Q3 -> REPORTED -> Gross Margin Percentage -> 44%). However, we found that the knowledge graph creation process was biased toward extracting semantic relationships rather than embedding specific values. This led us to develop the entity-chunk mapping approach as a pragmatic solution to connect entities in the graph with the document chunks containing relevant values.

SEC 10-Q Dataset & Evaluation

Dataset Overview

To evaluate the performance of different RAG approaches, we leverage a specialized dataset from Docugami based on SEC 10-Q quarterly financial reports from major technology companies. The dataset includes:

Financial reports from Apple, Amazon, Intel, Microsoft, and NVIDIA
Multiple quarters per company (2022-2023)
PDF files with extractable text content
Structured financial data including revenue, profit margins, and other metrics

Figure 6: Sample SEC 10-Q document from Apple's Q3 2023 report. The document contains structured financial data and textual information.

This dataset was chosen because financial documents represent an ideal use case for knowledge graph approaches: they contain numerous entities with complex relationships between them, and answering questions often requires connecting information across different sections.

Evaluation Methodology

While the original dataset included human-reviewed LLM-generated question-answer pairs, these tended to be qualitative in nature, making precise evaluation challenging. To address this limitation, we developed a set of 100 synthetic question-answer pairs with the following characteristics:

Derived from original Q&A pairs but focused on quantitative answers
Designed to have objective numerical answers that can be precisely evaluated
Questions require understanding relationships between entities (e.g., companies, time periods, financial metrics)
Manually verified to be answerable from the original documents
Include a mix of single-hop and multi-hop questions, though the latter comprise a smaller fraction

For example, a qualitative question like:

"Can any trends be identified in Apple's Services segment revenue over the reported periods?"

was transformed into a quantitative question such as:

"What was the increase in Apple's Services segment net sales from the quarter ended June 25, 2022, to the quarter ended July 1, 2023, as reported in their 2022 Q3 and 2023 Q3 10-Q filings? Provide the answer in millions of dollars as a whole number without commas."

Here's an example of a multi-hop question where the KG-RAG method outperforms the baseline approach:

Multi-hop Question Example:
"What was the increase in Apple's R&D expenses from the third quarter of 2022 to the first quarter of 2023, as reported in their 2022 Q3 and 2023 Q1 10-Q filings? Provide the answer in millions of dollars as a whole number without commas."

This question requires the system to find and connect information about R&D expenses from two different filing periods, perform a calculation, and return the result in a specific format. The knowledge graph approach excels at this type of question because it can explicitly model the relationships between entities (Apple, R&D expenses, time periods) and facilitate the multi-hop reasoning required.

It's important to note that our focus on quantitative questions is primarily for evaluation simplicity and not to suggest that end users would only ask for numerical answers. In real-world deployments, users would likely ask a much broader range of questions, including qualitative ones about trends, strategies, risks, and other textual information in the reports.

We evaluated each RAG system using the following methodology:

Accuracy: An answer is considered correct only if the numerical value exactly matches the ground truth.
Controlled Environment: All systems used the same LLMs (GPT-4o/GPT-4o-mini) for generation, ensuring that performance differences were attributable to the retrieval components.
Hyperparameter Consistency: Where applicable, we used consistent hyperparameters (e.g., top-k = 5 chunks) across systems for fair comparison. For document chunking, we used a standard approach of 512 tokens with an overlap of 24 tokens, kept constant for both baseline and KG-RAG implementations.
Error Analysis: Beyond simple accuracy, we analyzed the confusion matrix between systems to understand where and why different approaches succeeded or failed.
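The chunking and scoring settings above can be sketched as follows. This is an illustration under stated assumptions: integers stand in for tokenizer output, answers are compared as stripped strings, and both function names are hypothetical.

```python
def chunk_tokens(tokens, chunk_size=512, overlap=24):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the current window already reaches the end of the document
    return chunks


def exact_match_accuracy(predictions, ground_truth):
    """Fraction of answers whose stripped string equals the expected value exactly."""
    correct = sum(p.strip() == g.strip() for p, g in zip(predictions, ground_truth))
    return correct / len(ground_truth)


tokens = list(range(1000))  # stand-in for the output of a tokenizer
chunks = chunk_tokens(tokens)
# Made-up answers: the second fails only because of the comma.
accuracy = exact_match_accuracy(["1234", "1,234", "57"], ["1234", "1234", "58"])
```

The strictness of exact matching is why the synthetic questions specify the output format ("a whole number without commas"): a correct value in the wrong format would otherwise count as an error.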

It's worth noting that we did not implement a re-ranker for either method in these experiments. However, re-ranking is an interesting direction for the KG-RAG method: the model could be given token-efficient descriptions of subgraph paths and asked to re-rank those paths by their usefulness to the query, with the nodes on the top-ranked paths then used to select chunks.

Performance Results

Our evaluation revealed significant performance differences between the baseline RAG and Entity-Based KG-RAG approaches. The following visualization shows the overall accuracy comparison:

Figure 7: Performance comparison between Entity-based KG-RAG and Baseline RAG across different LLM models. Entity-based KG-RAG consistently outperforms the baseline approach.

The Entity-Based KG-RAG approach showed a substantial improvement over the baseline, with accuracy increasing from 40% to 55% when using GPT-4o, and from 36.36% to 56% when using GPT-4o-mini. This represents a relative improvement of approximately 37.5% and 54% respectively.
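For readers who want to check the arithmetic, the relative improvements follow directly from the accuracies reported above. The helper name is hypothetical.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, as a percentage."""
    return (improved - baseline) / baseline * 100


# Accuracies (in percent) as reported in the paragraph above.
gpt4o_gain = relative_improvement(40.0, 55.0)
gpt4o_mini_gain = relative_improvement(36.36, 56.0)
```

In absolute terms these correspond to gains of 15 and 19.64 percentage points, respectively.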

Surprisingly, the improvement from the entity-based approach was even more pronounced with the smaller GPT-4o-mini model, which typically performs worse than the larger GPT-4o model. This suggests that the structural knowledge provided by the knowledge graph compensates for the limitations of the smaller model, allowing it to leverage relationships more effectively than the baseline approach.

Regarding latency, our measurements showed that the KG-RAG method adds modest overhead to the retrieval process compared to the baseline method, roughly 0.05 seconds at the mean and 0.10 seconds at the median:

Baseline Method: Mean Latency: 0.5679 seconds, Median: 0.3511 seconds
KG-RAG Method: Mean Latency: 0.6224 seconds, Median: 0.4533 seconds

Here, we conduct a detailed error analysis using a confusion matrix to understand the patterns of success and failure between the two approaches:

The confusion matrix reveals that:

Both systems correctly answered 38 questions (38% of the dataset)
KG-RAG correctly answered 17 questions that the Baseline RAG missed
Baseline RAG correctly answered only 2 questions that KG-RAG missed
Both systems incorrectly answered 43 questions (43% of the dataset)

This asymmetric pattern suggests that the KG-RAG approach maintains most of the strengths of the baseline approach while addressing many of its weaknesses through improved structural understanding.
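The four counts can be turned back into per-system accuracies, which is a useful consistency check. The variable names are hypothetical; the counts are those listed above.

```python
# Counts copied from the confusion matrix summary above.
both_correct, kg_only, baseline_only, both_wrong = 38, 17, 2, 43

total = both_correct + kg_only + baseline_only + both_wrong
kg_accuracy = (both_correct + kg_only) / total
baseline_accuracy = (both_correct + baseline_only) / total
# Share of disagreements (exactly one system correct) that went to KG-RAG.
kg_share_of_disagreements = kg_only / (kg_only + baseline_only)
```

The resulting 55% and 40% match the GPT-4o accuracies reported earlier, and of the 19 questions on which the two systems disagree, KG-RAG is the correct one in 17.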

We also investigated how the performance of the KG-RAG approach varies with different configuration parameters, particularly the number of top nodes considered in the similarity matching stage:

Figure 9: KG-RAG performance with varying numbers of top similarity nodes considered. Performance peaks at around 30-40 nodes and declines when too many or too few nodes are considered.

This analysis reveals that performance peaks when considering between 30 and 40 top similar nodes (56% accuracy), with a noticeable decline when considering either too few (< 10 nodes) or too many (> 50 nodes) similar entities. This suggests an optimal balance where the system has enough similar entities to explore related connections, but not so many that it introduces noise or dilutes the relevance of the retrieved context.

Other Knowledge Graph RAG Approaches

While this article has focused on the entity-based KG-RAG approach, we also implemented several other knowledge graph-based methods that show promise for different use cases. These approaches were not included in the main evaluation for various reasons detailed below, but they offer valuable alternatives for specific scenarios.

Cypher-Based KG-RAG

The Cypher-based KG-RAG method leverages a Neo4j graph database and uses a declarative graph query language (Cypher) instead of vector embeddings as the primary retrieval mechanism:

Cypher Query Generation: A specialized LLM prompt template helps generate valid Cypher queries based on natural language questions
Schema-Aware Design: The system maintains awareness of the underlying graph schema to ensure generated queries use the correct entity types and relationships
Declarative Retrieval: Rather than exploring a subgraph based on similarity, this approach directly queries for specific patterns of relationships
Error Handling: Includes mechanisms to detect and correct malformed queries through an iterative process

This approach excels when questions map clearly to specific relationship patterns in the knowledge graph, but requires more specialized knowledge of the underlying graph structure. We used the LangChain Neo4j cookbooks as a reference for our implementation. LlamaIndex also implements a Knowledge Graph Query Engine using Neo4j and Cypher generation.

In our initial testing, the Cypher-based approach had difficulty with the knowledge graph provided for the SEC 10-Q dataset, as the LLM-generated Cypher queries could not reliably capture the complexity of the relationships in the financial documents. This led to brittle query generation and inconsistent results, which is why we excluded it from the main evaluation.

Future work could leverage custom models for Cypher generation, or validate generated queries in a loop and refine them until a valid Cypher query is emitted.
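A validate-and-refine loop of this kind might look like the sketch below. The `generate` and `validate` callables are hypothetical placeholders: in practice the first would call an LLM with the question and the previous error, and the second might ask the database to plan the query without running it (for example with Cypher's EXPLAIN). The stubs here only check parentheses so the example runs without a database.

```python
from typing import Callable, Optional


def generate_valid_cypher(
    question: str,
    generate: Callable[[str, Optional[str]], str],
    validate: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> Optional[str]:
    """Regenerate a query, feeding back the last error, until it validates."""
    error = None
    for _ in range(max_attempts):
        query = generate(question, error)
        error = validate(query)  # None means the query passed validation
        if error is None:
            return query
    return None  # no valid query within the budget; the caller decides the fallback


# Stubs standing in for an LLM call and a database-side syntax check.
def fake_generate(question, error):
    return "MATCH (c:Company) RETURN c" if error else "MATCH (c:Company RETURN c"


def fake_validate(query):
    return None if query.count("(") == query.count(")") else "unbalanced parentheses"


query = generate_valid_cypher("Which companies are in the graph?", fake_generate, fake_validate)
```

Bounding the number of attempts matters because a syntactically valid query can still be semantically wrong, so the loop should guard against cost rather than promise correctness.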

GraphRAG

GraphRAG combines knowledge graphs with embedding-based retrieval and community detection algorithms:

Document-to-Graph Transformation: Transforms documents into graph structures with nodes, edges, and community clusters
Hybrid Search Strategies: Implements both local (node-centered) and global (community-based) search strategies
Community Detection: Uses graph algorithms to identify clusters of related information
LangChain Integration: Built on the LangChain framework for seamless integration with other components

This approach is particularly effective for documents with natural community structures, such as research papers with distinct sections or reports covering various business segments. We leverage a LangChain implementation of GraphRAG for ease of use, though other implementations, such as those from LlamaIndex, offer similar capabilities.
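To give a feel for what "community clusters" means, the sketch below groups nodes by connected components. This is a deliberately crude stand-in and not the algorithm GraphRAG uses: real implementations apply dedicated community detection methods that can also split a single connected graph into several communities. The function name and toy edges are hypothetical.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Group nodes into components; a crude stand-in for community detection."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search from every node not yet assigned to a component.
        queue, component = deque([start]), set()
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in adjacency[node] - seen:
                seen.add(nxt)
                queue.append(nxt)
        components.append(component)
    return components


# Toy undirected edges: two unrelated clusters of entities.
edges = [
    ("Apple Inc.", "gross margin"),
    ("gross margin", "Q3 2023"),
    ("Entity X", "Entity Y"),
]
communities = connected_components(edges)
```

A global (community-based) search would then summarize or score each group as a unit, while a local search starts from individual nodes.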

In our early evaluations, this particular implementation of GraphRAG achieved approximately 20% accuracy (without any prompt tuning) on the SEC 10-Q dataset. Due to time constraints, we decided to focus our comprehensive evaluation on the entity-based approach that showed more promising initial results.

Conclusion and Future Directions

Knowledge graph-based RAG approaches represent a significant advancement over traditional vector-based methods, especially for domains with complex relational structures. By incorporating structured relationships into the retrieval process, these methods can provide more accurate, comprehensive, and explainable answers.

Our experiments with the Entity-Based KG-RAG method show promising results, particularly for questions that require understanding relationships between multiple entities and documents. The ability to explore subgraphs and combine structural knowledge with textual information enables more nuanced and accurate responses.

The comparative analysis clearly demonstrates that incorporating structural knowledge through knowledge graphs significantly improves the ability of RAG systems to handle complex information needs, particularly in domains with rich relational structures like financial documentation.

Future directions for this research include:

Improved graph construction techniques: Developing better methods for automatically extracting entities and relationships from documents
Dynamic graph updates: Creating systems that can continuously update the knowledge graph as new information becomes available
Reasoning-enhanced retrieval: Incorporating logical reasoning capabilities into the graph exploration process
Hybrid approaches: Integrating the strengths of different KG-RAG methods for optimal performance across diverse question types

The source code for reproducing all experiments and implementations described in this article is available in our KG-RAG repository.

Acknowledgments

We would like to thank the Vector Institute for supporting this research, and the open-source community for providing valuable tools and frameworks that made this work possible.