Skip to content

Data Persistence

Issue

ChromaDB collections are lost when containers restart or when running docker system prune.

Solution

Current Setup

The Docker configuration uses named volumes for persistence: - Development: chroma_data_dev - Production: chroma_data

Prevent Data Loss

Option 1: Selective Pruning (Recommended)

# Safe cleanup - preserves volumes
docker system prune --volumes=false

# Or clean individually
docker container prune
docker image prune
docker network prune

Option 2: Volume Backups

# Check volume status
docker volume ls | grep chroma

# Backup before cleanup
docker run --rm -v health-rec_chroma_data_dev:/source -v $(pwd)/backups:/backup alpine tar czf /backup/chroma_backup_$(date +%Y%m%d).tar.gz -C /source .

Restore Data

# Restore from backup
docker run --rm -v health-rec_chroma_data_dev:/target -v $(pwd)/backups:/backup alpine tar xzf /backup/chroma_backup_YYYYMMDD.tar.gz -C /target

Alternative: Bind Mounts for Development

For guaranteed persistence during development, uncomment the bind mount in docker-compose.dev.yml:

chromadb-dev:
  volumes:
    # - chroma_data_dev:/chroma/chroma        # Named volume
    - ./data/chroma_dev:/chroma/chroma        # Bind mount (uncomment this)

Quick Commands

# Safe restart with data preservation
docker compose --env-file .env.development -f docker-compose.dev.yml down
docker compose --env-file .env.development --profile frontend -f docker-compose.dev.yml up

# Check if data exists
docker exec $(docker compose --env-file .env.development -f docker-compose.dev.yml ps -q chromadb-dev) ls -la /chroma/chroma

Best Practice

Never use docker system prune without the --volumes=false flag to preserve your data.