# Deployment
LlamaFarm splits into two primary services: the FastAPI server and the RAG worker (Celery). You can run them locally for development or deploy them to your own infrastructure using the same configuration.
## Local Development
### Option A: `lf start`
The easiest path is the integrated dev stack:
```bash
lf start
```
This command:

- Starts the server and RAG worker (using Docker/Nx under the hood).
- Watches `llamafarm.yaml` for changes.
- Opens an interactive chat UI.
### Option B: Manual control
Open two terminals:
```bash
# Terminal 1 – API
cd server
uv sync
uv run uvicorn server.main:app --reload
```

```bash
# Terminal 2 – RAG worker
cd rag
uv sync
uv run python cli.py worker
```
Then run CLI commands against the default server at `http://localhost:8000`.
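To verify the API is up, you can probe the interactive docs that FastAPI serves by default (assuming they have not been disabled in this build):

```bash
# FastAPI exposes Swagger UI at /docs out of the box
curl -I http://localhost:8000/docs
```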
## Production Checklist
- Environment variables: store API keys (OpenAI, Together, etc.) in `.env` files or secret managers, and update `runtime.api_key` to reference them (see the sketch after this list).
- Process management: use process supervisors (systemd, PM2, Docker Compose) to keep the server and Celery workers running.
- Workers: size Celery workers based on ingestion load; large PDFs can take minutes.
- Observability: enable logging and monitoring around ingestion jobs (out-of-scope for this quick guide, but recommended).
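A minimal sketch of the environment-variable pattern, assuming `llamafarm.yaml` supports `${VAR}`-style interpolation for `runtime.api_key` (check the Configuration Guide for the exact syntax your version uses):

```yaml
# llamafarm.yaml – hypothetical excerpt
runtime:
  api_key: ${RUNTIME_API_KEY}  # resolved from the environment or a secret manager
```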
## Docker Compose (Example)
`docker-compose.yml`:

```yaml
version: "3.8"
services:
  server:
    build: ./server
    environment:
      - LLAMAFARM_CONFIG=/app/llamafarm.yaml
      - RUNTIME_API_KEY=${RUNTIME_API_KEY}
    volumes:
      - ./llamafarm.yaml:/app/llamafarm.yaml
    ports:
      - "8000:8000"
  rag-worker:
    build: ./rag
    environment:
      - LLAMAFARM_CONFIG=/app/llamafarm.yaml
    volumes:
      - ./llamafarm.yaml:/app/llamafarm.yaml
```
Adjust to mount datasets, persistence layers, or external vector stores as needed.
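For example, a hypothetical extension of the `rag-worker` service that mounts a datasets directory and persists a local vector index (the paths are illustrative; match them to your own layout):

```yaml
# Hypothetical additions under services: – adjust paths to your project
  rag-worker:
    volumes:
      - ./llamafarm.yaml:/app/llamafarm.yaml
      - ./datasets:/app/datasets   # raw documents to ingest
      - chroma-data:/app/chroma    # persisted vector index

volumes:
  chroma-data:
```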
## Scaling Beyond Local
- vLLM / Hosted runtimes – run models on separate infrastructure; update `runtime.base_url` and `api_key`.
- Managed vector stores – swap `ChromaStore` for Qdrant cloud or another backend you register.
- Multiple workers – run additional Celery workers to parallelize ingestion (see the sketch after this list).
- Container orchestration – convert the Docker Compose example to Kubernetes or ECS, ensuring environment variables and secrets propagate.
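With the Docker Compose example above, extra workers are one flag away; Compose's `--scale` option runs several copies of a service:

```bash
# Run three RAG workers against the same configuration
docker compose up -d --scale rag-worker=3
```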
## Tests Before Release
- `uv run --group test python -m pytest` (server)
- `go test ./...` (CLI)
- `nx build docs` (documentation)
- Smoke tests on ingestion/queries using `lf datasets` and `lf rag query` (see the sketch below)
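A minimal smoke test might look like the following; the exact arguments are illustrative, so check `lf --help` on your installed version:

```bash
# List known datasets, then issue a test query against the running server
lf datasets
lf rag query "What does the deployment guide cover?"
```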
## Resources
- Quickstart – local installation steps.
- Configuration Guide – runtime/provider settings.
- Extending LlamaFarm – adapt to your infrastructure.