Core Concepts
Understand the moving pieces—projects, sessions, runtimes, and the service architecture—before you customize or extend LlamaFarm.
Architecture Overview
```text
┌────────────┐         ┌───────────────┐       ┌──────────────┐
│   lf CLI   │────────▶│   LlamaFarm   │──────▶│ Runtime Host │
│            │  HTTP   │  Server (API) │       │ (Ollama/vLLM,│
└─────┬──────┘         │               │       │  OpenAI,...) │
      │                │  ┌─────────┐  │       └──────────────┘
      │  WebSocket     │  │ Celery  │◀─┼─┐
      │  (streaming)   │  │ Workers │  │ │ ingest jobs
      │                │  └─────────┘  │ │
      │                │       ▲       │ │
      ▼                │       │       │ │
┌────────────┐         │  ┌─────────┐  │ │   ┌──────────────┐
│   Config   │◀────────┘  │   RAG   │◀─┼─┼───│ Vector Store │
│  Watcher   │  updates   │ Worker  │  │ │   │ (Chroma,...) │
└────────────┘            └─────────┘  │ │   └──────────────┘
                                         │
                                         ▼
                                  ┌────────────┐
                                  │  Dataset   │
                                  │  Storage   │
                                  └────────────┘
```
- CLI (`lf`) orchestrates everything: talking to the API, streaming responses, uploading datasets, and watching config changes.
- Server exposes REST endpoints under `/v1/projects/{namespace}/{project}/...` for chat completions, datasets, and RAG queries (illustrated after this list).
- Celery workers handle ingestion tasks asynchronously; the CLI polls and surfaces progress.
- Runtime hosts can be local (Ollama) or remote OpenAI-compatible endpoints (vLLM, Together). Configuration controls provider, base URL, API key, and instructor mode.
- RAG worker processes documents via configured pipelines and writes to vector databases (default Chroma, configurable).
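As a concrete sketch, a chat completion call against the server could look like the following. The path prefix comes from the endpoint pattern above; the host, port, and OpenAI-style JSON body are assumptions, so treat this as illustrative rather than the authoritative contract.

```bash
# Illustrative only: the /v1/projects/{namespace}/{project}/... prefix is documented
# above, but the host/port and the OpenAI-style body shape are assumptions.
curl -X POST "http://localhost:8000/v1/projects/default/my-project/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Summarize the onboarding docs."}]}'
```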
Projects & Namespaces
- A project is a configuration bundle stored in `llamafarm.yaml` plus server-side metadata.
- Projects live within a namespace (defaults to `default`). Namespaces isolate resources, dataset names, and sessions.
- `lf init` creates a project using the server's template; you can list existing projects with `lf projects list --namespace my-team` (see the example below).
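For example:

```bash
# Scaffold a project from the server's template (lands in the default namespace)
lf init

# List projects in another namespace
lf projects list --namespace my-team
```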
Sessions
- `lf chat` creates or resumes a session when you pass a `--session-id` or use the `LLAMAFARM_SESSION_ID` environment variable.
- `lf start` opens a stateful dev session whose history persists under `.llamafarm/projects/<namespace>/<project>/dev/context`.
- `lf chat --no-rag` is stateless by default unless you provide a session identifier.
- API consumers pass `session_id` directly to `/chat/completions` to control continuity (see the sketch below).
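A quick sketch of those options; the session name is a placeholder, and placing `session_id` in the JSON body is an assumption about the payload shape:

```bash
# Resume a named session explicitly
lf chat --session-id support-123

# Or pin the session through the environment
LLAMAFARM_SESSION_ID=support-123 lf chat

# API consumers include session_id alongside the messages (assumed body shape)
curl -X POST "http://localhost:8000/v1/projects/default/my-project/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"session_id": "support-123", "messages": [{"role": "user", "content": "Where were we?"}]}'
```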
Configuration-Driven Behaviour
- `llamafarm.yaml` defines runtime, prompts, and RAG strategies (see the Configuration Guide and the sketch below).
- Changes to the file trigger the config watcher; the CLI reloads live during dev sessions.
- Missing runtime fields (`provider`/`base_url`/`api_key`) are treated as errors; there are no hidden defaults.
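A minimal sketch of the shape; beyond the documented `provider`, `base_url`, and `api_key` fields, the key names here are illustrative, not the authoritative schema:

```yaml
# Illustrative llamafarm.yaml sketch -- see the Configuration Guide for the real schema.
runtime:
  provider: ollama                  # required; there is no hidden default
  base_url: http://localhost:11434  # required; where the runtime host listens
  api_key: your-api-key-here        # required; placeholder value
  instructor_mode: json             # illustrative name for the instructor-mode setting
prompts:
  system: You are a helpful assistant for this project.
rag:
  # databases and data processing strategies -- see the next section
```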
RAG Strategies
- RAG configuration is composed of databases and data processing strategies.
- Each dataset references a strategy and database; CLI enforces this relationship when creating datasets.
- Strategies describe parsers, extractors, metadata processors, and embedding choices.
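A hedged sketch of how those pieces might compose; every field name below is illustrative, and `rag/schema.yaml` (mentioned in the next section) is the place to confirm the real structure:

```yaml
# Illustrative shape only -- field names are assumptions, not the real schema.
rag:
  databases:
    - name: main
      type: chroma              # default store; swappable
  strategies:
    - name: pdf-ingest
      parsers: [pdf]
      extractors: [text]
      metadata_processors: [basic]
      embedding:
        model: nomic-embed-text # illustrative embedding choice
datasets:
  - name: product-docs
    strategy: pdf-ingest        # each dataset references a strategy...
    database: main              # ...and a database; the CLI enforces both
```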
Extensibility Mindset
Everything in LlamaFarm is intended to be swapped or extended:
- Point `runtime.base_url` to a vLLM or custom OpenAI-compatible gateway (see the sketch after this list).
- Register a new vector store backend, update `rag/schema.yaml`, and regenerate types.
- Add parsers/extractors to support new file formats.
- Create new CLI subcommands under `cli/cmd` to automate workflows.
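For example, retargeting the runtime at a local vLLM gateway might look like this; `runtime.base_url` is the documented key, while the provider value and port are illustrative (vLLM serves an OpenAI-compatible API on port 8000 by default):

```yaml
runtime:
  provider: openai                    # illustrative: any OpenAI-compatible provider
  base_url: http://localhost:8000/v1  # vLLM's OpenAI-compatible endpoint
  api_key: your-gateway-key           # placeholder
```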
See Extending LlamaFarm for detailed instructions.
Component Health
When commands run, you might see a summary like:
```text
⚠️ Server is degraded
Summary: server=healthy, storage=healthy, ollama=healthy, celery=degraded, rag-service=healthy, project=healthy
⚠️ celery degraded No workers replied to ping (latency: 533ms)
```
- Degraded does not always mean failure; ingestion may continue in the background.
- `lf rag health` reports the live status of the embedder, store, and processing pipeline.
- Address warnings before production deployment (ensure Celery workers are running, Ollama/vLLM is accessible, etc.).
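To check the pipeline directly:

```bash
# Live status of the embedder, vector store, and processing pipeline
lf rag health
```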
Next Steps
- Quickstart – run through the onboarding flow if you haven’t already.
- CLI Reference – learn each command in detail.
- RAG Guide – configure databases, strategies, and retrieval.