
Build a RAG System

In this quick start example, we'll demonstrate how to build a RAG system with FedRAG and subsequently query it. We begin with a short primer on the components of RAG, which mirror the abstractions FedRAG uses to build such systems.

Components of RAG

Figure: The three main components of RAG.

A RAG system comprises three main components, namely:

  • Knowledge Store — contains non-parametric knowledge facts that the system can use at inference time in order to produce more accurate responses to queries.
  • Retriever — a model that takes in a user query and retrieves the most relevant knowledge facts from the knowledge store.
  • Generator — a model that takes in the user's query and additional context and provides a response to that query.

Info

The Retriever is also used to populate (i.e., index) the Knowledge Store during setup.
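
To make this division of labor concrete, here is a minimal, library-agnostic sketch of the flow these three components implement. Everything in it is a toy placeholder (word-overlap "embeddings", an echoing generator), not a FedRAG API; the real classes are introduced in the sections that follow.

RAG flow sketch
import re

def encode(text: str) -> set[str]:
    # Toy "embedding": the set of lowercase words (stand-in for a dense vector).
    return set(re.findall(r"[a-z]+", text.lower()))

def similarity(a: set[str], b: set[str]) -> int:
    # Toy similarity: word overlap (stand-in for a dot product of embeddings).
    return len(a & b)

def generate(query: str, context: str) -> str:
    # Toy generator: echoes the retrieved context (stand-in for an LLM).
    return f"Answering {query!r} using context: {context!r}"

# 1. Indexing: encode each knowledge fact and place it in the (toy) store.
facts = ["Orchids are flowering plants.", "Tulips are spring-blooming bulbs."]
toy_store = [(fact, encode(fact)) for fact in facts]

# 2. Retrieval: encode the query and keep the most similar fact.
query = "What are tulips?"
top_fact = max(toy_store, key=lambda item: similarity(encode(query), item[1]))[0]

# 3. Generation: answer the query conditioned on the retrieved context.
print(generate(query, context=top_fact))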

Building a RAG system

This time, we'll install FedRAG with the huggingface extra so that we can build our RAG system with HuggingFace models.

pip install "fed-rag[huggingface]"

Retriever

With the huggingface extra, we can use any ~sentence_transformers.SentenceTransformer as the retriever model, including dual encoders like the one used below. This HuggingFace encoder is used to define an HFSentenceTransformerRetriever.

retriever
from fed_rag.retrievers.huggingface.hf_sentence_transformer import (
    HFSentenceTransformerRetriever,
)

QUERY_ENCODER_NAME = "nthakur/dragon-plus-query-encoder"
CONTEXT_ENCODER_NAME = "nthakur/dragon-plus-context-encoder"

retriever = HFSentenceTransformerRetriever(
    query_model_name=QUERY_ENCODER_NAME,
    context_model_name=CONTEXT_ENCODER_NAME,
    load_model_at_init=False,
)
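
As a quick sanity check, we can embed a passage with the context encoder; this is the same encode_context call we'll use below to index the knowledge store. (The DRAGON+ encoders are BERT-base sized models, so we'd expect 768-dimensional embeddings, though it's worth verifying the dimensionality yourself.)

retriever sanity check
# Embed a single passage with the context encoder (triggers lazy model loading).
emb = retriever.encode_context("Tulips are spring-blooming bulbs.")
print(len(emb.tolist()))  # embedding dimensionality (expected: 768 for DRAGON+)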

Knowledge Store

To build our knowledge store, we've created a toy set of just two knowledge artifacts, which we'll encode and subsequently load into an InMemoryKnowledgeStore.

knowledge artifacts
import json

# knowledge chunks
chunks_json_strs = [
    '{"id": "0", "title": "Orchid", "text": "Orchids are easily distinguished from other plants, as they share some very evident derived characteristics or synapomorphies. Among these are: bilateral symmetry of the flower (zygomorphism), many resupinate flowers, a nearly always highly modified petal (labellum), fused stamens and carpels, and extremely small seeds"}'
    '{"id": "1", "title": "Tulip", "text": "Tulips are easily distinguished from other plants, as they share some very evident derived characteristics or synapomorphies. Among these are: bilateral symmetry of the flower (zygomorphism), many resupinate flowers, a nearly always highly modified petal (labellum), fused stamens and carpels, and extremely small seeds"}'
]
chunks = [json.loads(line) for line in chunks_json_strs]
knowledge store
from fed_rag.knowledge_stores.in_memory import InMemoryKnowledgeStore
from fed_rag.types.knowledge_node import KnowledgeNode, NodeType

knowledge_store = InMemoryKnowledgeStore()

# create knowledge nodes
nodes = []
for c in chunks:
    node = KnowledgeNode(
        embedding=retriever.encode_context(c["text"]).tolist(),
        node_type=NodeType.TEXT,
        text_content=c["text"],
        metadata={"title": c["title"], "id": c["id"]},
    )
    nodes.append(node)

# load into knowledge_store
knowledge_store.load_nodes(nodes=nodes)
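
Before wiring everything into a RAGSystem, we can sanity-check the index by scoring the stored nodes against a query embedding ourselves; the RAGSystem below performs the equivalent lookup through the knowledge store. This sketch assumes the retriever exposes an encode_query counterpart to the encode_context method used above.

manual retrieval check
import numpy as np

# NOTE: encode_query is assumed to mirror the encode_context call used above.
query_emb = np.asarray(retriever.encode_query("What is a Tulip?"))

# DRAGON+ scores query/passage pairs with a dot product.
scored = [
    (float(query_emb @ np.asarray(node.embedding)), node.metadata["title"])
    for node in nodes
]
for score, title in sorted(scored, reverse=True):
    print(f"{title}: {score:.4f}")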

Generator

With the huggingface extra installed, we can use any ~transformers.PreTrainedModel as well as any ~peft.PeftModel as the generator model. For this example, we use the latter and define an HFPeftModelGenerator.

generator
from fed_rag.generators.huggingface import HFPeftModelGenerator
from transformers.generation.utils import GenerationConfig
from transformers.utils.quantization_config import BitsAndBytesConfig

PEFT_MODEL_NAME = "Styxxxx/llama2_7b_lora-quac"
BASE_MODEL_NAME = "meta-llama/Llama-2-7b-hf"

generation_cfg = GenerationConfig(
    do_sample=True,
    eos_token_id=2,  # Llama-2's end-of-sequence token id
    bos_token_id=1,  # Llama-2's beginning-of-sequence token id
    max_new_tokens=4096,
    top_p=0.9,
    temperature=0.6,
    cache_implementation="offloaded",
    stop_strings="</response>",
)
quantization_config = BitsAndBytesConfig(load_in_4bit=True)
generator = HFPeftModelGenerator(
    model_name=PEFT_MODEL_NAME,
    base_model_name=BASE_MODEL_NAME,
    generation_config=generation_cfg,
    load_model_at_init=False,
    load_model_kwargs={"is_trainable": True, "device_map": "auto"},
    load_base_model_kwargs={
        "device_map": "auto",
        "quantization_config": quantization_config,
    },
)
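
A few notes on this configuration, as we read it: BitsAndBytesConfig(load_in_4bit=True) quantizes the frozen Llama-2 base model to 4-bit precision to cut its memory footprint, while is_trainable=True keeps the LoRA adapter weights trainable, which matters for the RAG fine-tuning covered in later guides. Setting load_model_at_init=False defers loading the (large) model weights until they're actually needed.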

RAG System

Finally, with our three main components in hand, we can build our first RAGSystem!

RAG system
from fed_rag.types.rag_system import RAGConfig, RAGSystem

rag_config = RAGConfig(top_k=2)
rag_system = RAGSystem(
    knowledge_store=knowledge_store,
    generator=generator,
    retriever=retriever,
    rag_config=rag_config,
)
querying our RAGSystem
# query the rag system
response = rag_system.query("What is a Tulip?")

print(f"\n{response}")

# inspect source nodes
print(response.source_nodes)
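
Since rag_config.top_k is 2 and our toy knowledge store holds only two nodes, both should appear in response.source_nodes for this query, making it easy to verify exactly which context the generator was conditioned on.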

What's next?

Having explored federating a centralized task and building a RAG system, we'll dive deeper into fine-tuning RAG systems in our upcoming how-to guides. For now, let's proceed to installation and the core usage patterns.