Using Qdrant for Knowledge Storage¶
Introduction¶
The fed-rag library supports a simple, in-memory knowledge store for rapid creation and development cycles of RAG systems. For larger-scale fine-tuning jobs, you may need a more optimized knowledge store. FedRAG supports a seamless Qdrant integration in the form of the QdrantKnowledgeStore, allowing you to connect to any Qdrant service, whether running locally or in a managed/cloud environment.
In this notebook, we demonstrate how to launch a local Qdrant service and use it as the knowledge storage for your RAG system.
Install dependencies¶
The QdrantKnowledgeStore requires the installation of the qdrant extra. Note that we will also use a HuggingFace SentenceTransformer as the retriever/embedding model to encode our knowledge artifacts prior to loading them into our knowledge store.
# If running in a Google Colab, the first attempt at installing fed-rag may fail,
# though for reasons unknown to me yet, if you try a second time, it magically works...
!uv pip install fed-rag[huggingface,qdrant] -q
# We'll use the docker SDK to launch the Qdrant docker image
!uv pip install docker -q
Launch a Local Qdrant Service (with Docker)¶
This step assumes that you have Docker installed on your machine. If it is not installed, refer to the installation instructions in the official Docker docs.
IMPORTANT NOTE: if you are running this within Google Colab, you won't be able to run a Docker image. Instead, you can run the rest of this notebook by using an in-memory instance of Qdrant.
If using Colab, set WITH_DOCKER to False.
WITH_DOCKER = True

if WITH_DOCKER:
    import docker
    import os
    import time

    client = docker.from_env()
    image_name = "qdrant/qdrant"

    # first see if we need to pull the docker image
    try:
        client.images.get(image_name)
        print(f"Image '{image_name}' already exists locally")
    except docker.errors.ImageNotFound:
        print(f"Image '{image_name}' not found locally. Pulling...")
        # Pull with progress information
        for line in client.api.pull(image_name, stream=True, decode=True):
            if "progress" in line:
                print(f"\r{line['status']}: {line['progress']}", end="")
            elif "status" in line:
                print(f"\r{line['status']}", end="")
        print("\nPull complete!")

    # run the Qdrant container
    container = client.containers.run(
        image_name,
        detach=True,  # Run in background
        ports={"6333/tcp": 6333, "6334/tcp": 6334},
        volumes={
            f"{os.getcwd()}/qdrant_storage": {
                "bind": "/qdrant/storage",
                "mode": "rw",
            }
        },
        name="qdrant-demo-fedrag-nb",
    )
    print(f"Container started with ID: {container.id}")

    # wait a moment for the container to initialize
    time.sleep(3)

    # Check container status
    container.reload()  # Refresh container data
    print(f"Container status: {container.status}")
    print("Container logs:")
    print(container.logs().decode("utf-8"))
Image 'qdrant/qdrant' already exists locally
Container started with ID: 8e615fa4f54eeb349a1bd62fe3c9531104d0547b2390935ab9196a4090dcf692
Container status: running
Container logs:
           _                 _
  __ _  __| |_ __ __ _ _ __ | |_
 / _` |/ _` | '__/ _` | '_ \| __|
| (_| | (_| | | | (_| | | | | |_
 \__, |\__,_|_|  \__,_|_| |_|\__|
    |_|

Version: 1.14.0, build: 3617a011
Access web UI at http://localhost:6333/dashboard

2025-05-20T17:32:14.275811Z INFO storage::content_manager::consensus::persistent: Loading raft state from ./storage/raft_state.json
2025-05-20T17:32:14.277018Z INFO qdrant: Distributed mode disabled
2025-05-20T17:32:14.277054Z INFO qdrant: Telemetry reporting enabled, id: 68306805-e05f-434f-9075-ebf937a54e6a
2025-05-20T17:32:14.277091Z INFO qdrant: Inference service is not configured.
2025-05-20T17:32:14.278552Z INFO qdrant::actix: TLS disabled for REST API
2025-05-20T17:32:14.278595Z INFO qdrant::actix: Qdrant HTTP listening on 6333
2025-05-20T17:32:14.278603Z INFO actix_server::builder: Starting 11 workers
2025-05-20T17:32:14.278608Z INFO actix_server::server: Actix runtime found; starting in Actix runtime
2025-05-20T17:32:14.281871Z INFO qdrant::tonic: Qdrant gRPC listening on 6334
2025-05-20T17:32:14.281879Z INFO qdrant::tonic: TLS disabled for gRPC API
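The fixed `time.sleep(3)` in the cell above may be too short on a slow machine or longer than needed on a fast one. As a minimal sketch, the helper below polls an HTTP endpoint until it answers with status 200. The function name `wait_until_ready` and its parameters are hypothetical, and the `/readyz` path in the usage comment is an assumption about the Qdrant release you are running; adjust the URL if your version does not expose it.

```python
import time
import urllib.request


def wait_until_ready(
    url: str, timeout_s: float = 30.0, interval_s: float = 0.5
) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout_s` seconds elapse."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Covers urllib.error.URLError (a subclass of OSError), e.g. when
            # the service is not accepting connections yet. Retry after a pause.
            pass
        time.sleep(interval_s)
    return False


# Usage (assumed endpoint; run after the container has been started):
# if not wait_until_ready("http://localhost:6333/readyz"):
#     raise RuntimeError("Qdrant did not become ready in time")
```

Polling with a deadline keeps the notebook from proceeding against a service that is still starting, while failing clearly if the container never comes up.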
Set Up the Retriever and QdrantKnowledgeStore¶
from fed_rag.knowledge_stores import QdrantKnowledgeStore
from fed_rag.retrievers.huggingface import (
    HFSentenceTransformerRetriever,
)
from fed_rag.data_structures import KnowledgeNode, NodeType

QUERY_ENCODER_NAME = "nthakur/dragon-plus-query-encoder"
CONTEXT_ENCODER_NAME = "nthakur/dragon-plus-context-encoder"

# retriever
retriever = HFSentenceTransformerRetriever(
    query_model_name=QUERY_ENCODER_NAME,
    context_model_name=CONTEXT_ENCODER_NAME,
    load_model_at_init=False,
)

# knowledge store
if WITH_DOCKER:
    knowledge_store = QdrantKnowledgeStore(
        collection_name="nthakur.dragon-plus-context-encoder"
    )
else:
    knowledge_store = QdrantKnowledgeStore(
        collection_name="nthakur.dragon-plus-context-encoder", in_memory=True
    )
Let's Add Some Knowledge¶
# a small sample from the Dec 2021 Wikipedia dump
text_chunks = [
    {
        "id": "140",
        "title": "History of marine biology",
        "section": "James Cook",
        "text": " James Cook is well known for his voyages of exploration for the British Navy in which he mapped out a significant amount of the world's uncharted waters. Cook's explorations took him around the world twice and led to countless descriptions of previously unknown plants and animals. Cook's explorations influenced many others and led to a number of scientists examining marine life more closely. Among those influenced was Charles Darwin who went on to make many contributions of his own. ",
    },
    {
        "id": "141",
        "title": "History of marine biology",
        "section": "Charles Darwin",
        "text": " Charles Darwin, best known for his theory of evolution, made many significant contributions to the early study of marine biology. He spent much of his time from 1831 to 1836 on the voyage of HMS Beagle collecting and studying specimens from a variety of marine organisms. It was also on this expedition where Darwin began to study coral reefs and their formation. He came up with the theory that the overall growth of corals is a balance between the growth of corals upward and the sinking of the sea floor. He then came up with the idea that wherever coral atolls would be found, the central island where the coral had started to grow would be gradually subsiding",
    },
    {
        "id": "142",
        "title": "History of marine biology",
        "section": "Charles Wyville Thomson",
        "text": " Another influential expedition was the voyage of HMS Challenger from 1872 to 1876, organized and later led by Charles Wyville Thomson. It was the first expedition purely devoted to marine science. The expedition collected and analyzed thousands of marine specimens, laying the foundation for present knowledge about life near the deep-sea floor. The findings from the expedition were a summary of the known natural, physical and chemical ocean science to that time.",
    },
]
from fed_rag.data_structures import KnowledgeNode, NodeType

# create knowledge nodes
nodes = []
texts = []
for c in text_chunks:
    text = c.pop("text")
    title = c.pop("title")
    section = c.pop("section")
    context_text = f"title: {title}\nsection: {section}\ntext: {text}"
    texts.append(context_text)

# batch encode
batch_embeddings = retriever.encode_context(texts)

for jx, c in enumerate(text_chunks):
    node = KnowledgeNode(
        embedding=batch_embeddings[jx].tolist(),
        node_type=NodeType.TEXT,
        text_content=texts[jx],
        metadata=c,
    )
    nodes.append(node)

knowledge_store.load_nodes(nodes)
knowledge_store.count
3
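Note that the loop above uses `pop`, so it removes the "text", "title", and "section" keys from each dictionary in `text_chunks`; re-running that cell without re-creating `text_chunks` would raise a `KeyError`. As a minimal sketch of a non-mutating alternative, the hypothetical helper `split_chunk` below (not part of fed-rag) builds the same context string and returns the remaining fields as metadata, leaving the input untouched.

```python
from typing import Any


def split_chunk(chunk: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Return (context_text, metadata) for one chunk without modifying it."""
    context_text = (
        f"title: {chunk['title']}\n"
        f"section: {chunk['section']}\n"
        f"text: {chunk['text']}"
    )
    # Everything that is not part of the encoded text is kept as metadata.
    metadata = {
        key: value
        for key, value in chunk.items()
        if key not in ("title", "section", "text")
    }
    return context_text, metadata


# Shortened sample in the same shape as the chunks above.
sample = {
    "id": "140",
    "title": "History of marine biology",
    "section": "James Cook",
    "text": "James Cook is well known for his voyages.",
}
context_text, metadata = split_chunk(sample)
print(context_text)
print(metadata)
```

The returned `context_text` would be what gets encoded, and `metadata` would be what is passed to the node, so the cell can be re-run safely.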
Retrieve From the Knowledge Store¶
query = "Who is James Cook?"
query_emb = retriever.encode_query(query).tolist()
retrieved_nodes = knowledge_store.retrieve(query_emb=query_emb, top_k=1)
score, knowledge_node = retrieved_nodes[0]
print("Similarity score: ", score)
print("Text content: ", knowledge_node.text_content)
Similarity score: 0.49984106
Text content: title: History of marine biology
section: James Cook
text: James Cook is well known for his voyages of exploration for the British Navy in which he mapped out a significant amount of the world's uncharted waters. Cook's explorations took him around the world twice and led to countless descriptions of previously unknown plants and animals. Cook's explorations influenced many others and led to a number of scientists examining marine life more closely. Among those influenced was Charles Darwin who went on to make many contributions of his own.
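To build intuition for the similarity score above, here is a toy, brute-force sketch of top-k retrieval over a handful of vectors. It assumes cosine similarity as the metric, which this notebook does not confirm for the QdrantKnowledgeStore, and it is not how Qdrant searches internally (Qdrant relies on an approximate nearest-neighbour index instead of scanning every vector). The names `cosine_similarity`, `top_k`, and `toy_store` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(
    query_emb: list[float], stored: dict[str, list[float]], k: int = 1
) -> list[tuple[float, str]]:
    """Score every stored vector against the query and keep the k best."""
    scored = [
        (cosine_similarity(query_emb, emb), label)
        for label, emb in stored.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Two-dimensional stand-ins for real embeddings.
toy_store = {
    "cook": [1.0, 0.0],
    "darwin": [0.0, 1.0],
    "thomson": [-1.0, 0.0],
}
results = top_k([0.9, 0.1], toy_store, k=1)
score, label = results[0]
print(label, score)
```

The `(score, item)` pairs mirror the shape unpacked from `knowledge_store.retrieve` in the cell above; the closest vector by angle is returned first.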
Clean up¶
if WITH_DOCKER:
    # stop and remove container
    container.stop()
    container.remove()
Note on Connecting to Managed Qdrant Service¶
If you have a managed Qdrant service, connecting to it is easy. Simply pass in the credentials (i.e., api_key), the host name, and the collection name at instantiation.
knowledge_store = QdrantKnowledgeStore(
    # qdrant credentials
    api_key="...",
    host="...",
    collection_name="...",
    https=True,
)
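To avoid pasting an API key into a notebook cell, one option is to read the credentials from environment variables. The sketch below is only an illustration: the helper `qdrant_kwargs_from_env` and the variable names `QDRANT_HOST` and `QDRANT_API_KEY` are hypothetical choices, not something fed-rag defines. It only assembles the keyword arguments shown in the cell above.

```python
import os


def qdrant_kwargs_from_env(collection_name: str) -> dict:
    """Build keyword arguments for QdrantKnowledgeStore from the environment."""
    required = ("QDRANT_HOST", "QDRANT_API_KEY")  # assumed variable names
    missing = [name for name in required if not os.environ.get(name)]
    if missing:
        raise RuntimeError(
            f"Missing environment variables: {', '.join(missing)}"
        )
    return {
        "api_key": os.environ["QDRANT_API_KEY"],
        "host": os.environ["QDRANT_HOST"],
        "collection_name": collection_name,
        "https": True,
    }


# Usage (assumes both variables are exported before the notebook starts):
# knowledge_store = QdrantKnowledgeStore(
#     **qdrant_kwargs_from_env("my-collection")
# )
```

Failing early with a clear message when a variable is missing is usually easier to debug than a connection error raised later by the client.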