# Configuration Guide

Every LlamaFarm project is defined by a single file: `llamafarm.yaml`. The server validates it against JSON Schema, so missing fields surface as errors instead of hidden defaults. This guide explains each section and shows how to extend the schema responsibly.

## File Layout

```yaml
version: v1
name: my-project
namespace: default
runtime: { ... }
prompts: [...]
rag: { ... }
memory: { ... }
datasets: [...]
```

## Metadata

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| `version` | string | ✅ (`v1`) | Schema version. |
| `name` | string |  | Project identifier. |
| `namespace` | string |  | Grouping for isolation (matches server namespace). |

## Runtime

Controls how chat completions are executed. LlamaFarm supports both multi-model (recommended) and legacy single-model configurations.

### Multi-Model Configuration (Recommended)

Configure multiple models and switch between them via the CLI or API:

```yaml
runtime:
  default_model: fast  # Which model to use by default

  models:
    fast:
      description: "Fast Ollama model"
      provider: ollama
      model: gemma3:1b
      prompt_format: unstructured

    powerful:
      description: "More capable model"
      provider: ollama
      model: qwen3:8b
```

Using the multi-model setup:

- CLI: `lf chat --model powerful "your question"`
- CLI: `lf models list`
- API: `POST /v1/projects/{ns}/{id}/chat/completions` with `{"model": "powerful", ...}`

### Legacy Single-Model Configuration (Still Supported)

The original flat runtime configuration is still accepted and is automatically converted to the multi-model format internally:

```yaml
runtime:
  provider: openai
  model: qwen2.5:7b
  base_url: http://localhost:8000/v1
  api_key: sk-local-placeholder
  instructor_mode: tools
  model_api_parameters:
    temperature: 0.2
```

### Runtime Fields

Multi-model format:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `default_model` | string |  | Name of the default model to use |
| `models` | array |  | List of model configurations (see below) |

Per-model fields:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `name` | string |  | Unique identifier for this model |
| `provider` | enum (`openai`, `ollama`, `lemonade`, `universal`) |  | `openai` for OpenAI-compatible APIs, `ollama` for local Ollama, `lemonade` for local GGUF models with NPU/GPU support, `universal` for the Universal Runtime |
| `model` | string |  | Model identifier understood by the provider |
| `description` | string | Optional | Human-readable description of the model |
| `default` | boolean | Optional | Set to `true` to make this the default model (alternative to `default_model`) |
| `base_url` | string or null | ⚠️ Required for non-default hosts (vLLM, Together, Lemonade) | API endpoint URL |
| `api_key` | string or null | ⚠️ Required for most hosted providers; use `.env` + environment variables | Authentication key |
| `instructor_mode` | string or null | Optional | `json`, `md_json`, or `tools` for structured output modes |
| `prompt_format` | string | Optional | `unstructured` or other format |
| `model_api_parameters` | object | Optional | Passthrough parameters (`temperature`, `top_p`, etc.) |
| `lemonade` | object | ⚠️ Required for `provider: lemonade` | Lemonade-specific configuration (see below) |
| `extra_body` | object | Optional | Provider-specific parameters (see `n_ctx` below) |
| `encoder_config` | object | Optional | Configuration for BERT-style encoder models (Universal runtime only) |
| `tool_call_strategy` | enum | Optional (default: `native_api`) | `native_api` or `prompt_based` tool calling strategy |
| `mcp_servers` | array | Optional | List of MCP server names to use (omit for all, empty for none) |
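As an illustration, a hosted OpenAI-compatible model might look like the sketch below. The model id and URL are placeholders, and the `${VAR}` expansion is an assumption — check how your deployment injects secrets, and keep real keys out of the YAML:

```yaml
runtime:
  default_model: hosted
  models:
    hosted:
      provider: openai                       # OpenAI-compatible API
      model: meta-llama/Llama-3-8b-chat      # illustrative model id
      base_url: https://api.together.xyz/v1  # non-default host, so base_url is required
      api_key: ${TOGETHER_API_KEY}           # assumption: value injected from the environment
      model_api_parameters:
        temperature: 0.2
        top_p: 0.9
```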

`extra_body` fields (Universal runtime):

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `n_ctx` | integer | auto | Context window size for GGUF models. Auto-detected if not specified. |
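A sketch of pinning the context window for a GGUF model on the Universal runtime (the model path is illustrative):

```yaml
runtime:
  models:
    local_gguf:
      provider: universal
      model: ./models/llama-3-8b.Q4_K_M.gguf  # illustrative local GGUF path
      extra_body:
        n_ctx: 8192  # override the auto-detected context window
```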

`encoder_config` fields (Universal runtime):

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `max_length` | integer | auto | Maximum sequence length (ModernBERT: 8192, classic BERT: 512) |
| `use_flash_attention` | boolean | true | Enable Flash Attention 2 for faster inference |
| `task` | enum | `embedding` | `embedding`, `classification`, `reranking`, or `ner` |
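A sketch of a BERT-style encoder model on the Universal runtime (the model id is illustrative):

```yaml
runtime:
  models:
    embedder:
      provider: universal
      model: nomic-ai/modernbert-embed-base  # illustrative encoder model id
      encoder_config:
        task: embedding
        max_length: 8192           # ModernBERT supports long sequences
        use_flash_attention: true  # Flash Attention 2 for faster inference
```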

Lemonade-specific fields:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `backend` | string |  | `llamacpp`, `onnx`, or `transformers` |
| `port` | number |  | Port number (default: 11534) |
| `context_size` | number | Optional | Context window size (default: 32768) |
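Putting it together, a Lemonade-backed model might be declared like this (the model id is illustrative; backend, port, and context size mirror the defaults above):

```yaml
runtime:
  models:
    npu_local:
      provider: lemonade
      model: Llama-3.2-1B-Instruct  # illustrative model id
      lemonade:
        backend: llamacpp    # or onnx / transformers
        port: 11534          # default Lemonade port
        context_size: 32768  # default context window
```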

Extending providers: To add a new provider enum, update `config/schema.yaml`, regenerate types via `config/generate_types.py`, and implement routing in the server/CLI. See Extending runtimes.

## Prompts

Prompts are named sets of messages that seed instructions for each session.

```yaml
prompts:
  - name: default
    messages:
      - role: system
        content: >-
          You are a supportive assistant. Cite documents when relevant.
```
- Each prompt has a `name` and a list of `messages` with `role` and `content`.
- Roles can be `system`, `user`, or `assistant` (anything supported by the runtime).
- Models can select which prompt sets to use via `prompts: [list of names]`; if omitted, all prompts stack in definition order.
- Prompts are appended before user input; combine with RAG context via the RAG guide.
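Tying these together, a second prompt set and a model that pins it might look like this sketch (names are illustrative):

```yaml
prompts:
  - name: default
    messages:
      - role: system
        content: You are a supportive assistant.
  - name: terse
    messages:
      - role: system
        content: Answer in one sentence.

runtime:
  models:
    fast:
      provider: ollama
      model: gemma3:1b
      prompts: [terse]  # use only the "terse" prompt set for this model
```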

## RAG Configuration

The `rag` section mirrors `rag/schema.yaml`. It defines databases and data-processing strategies.

```yaml
rag:
  databases:
    - name: main_db
      type: ChromaStore
      default_embedding_strategy: default_embeddings
      default_retrieval_strategy: semantic_search
      embedding_strategies:
        - name: default_embeddings
          type: OllamaEmbedder
          config:
            model: nomic-embed-text:latest
      retrieval_strategies:
        - name: semantic_search
          type: VectorRetriever
          config:
            top_k: 5
  data_processing_strategies:
    - name: pdf_ingest
      parsers:
        - type: PDFParser_LlamaIndex
          config:
            chunk_size: 1500
            chunk_overlap: 200
      extractors:
        - type: HeadingExtractor
        - type: ContentStatisticsExtractor
```

Key points:

- `databases` map to vector stores; choose from `ChromaStore` or `QdrantStore` by default.
- `embedding_strategies` and `retrieval_strategies` let you define hybrid or metadata-aware search.
- `data_processing_strategies` describe parser/extractor pipelines applied during ingestion.
- For a complete field reference, see the RAG Guide.
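For instance, switching the store type is a small change; a minimal `QdrantStore` declaration might look like the sketch below (this assumes `QdrantStore` accepts the same strategy fields — see the RAG Guide for store-specific options):

```yaml
rag:
  databases:
    - name: qdrant_db
      type: QdrantStore  # alternative to ChromaStore
      default_embedding_strategy: default_embeddings
      default_retrieval_strategy: semantic_search
```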

## Memory Configuration

The `memory` section configures optional memory stores for working memory, time-series data, spatial data, and graph relationships.

```yaml
memory:
  default_store: main_memory
  stores:
    - name: main_memory
      working_memory:
        enabled: true
        ttl_seconds: 3600
        max_records: 10000
      timeseries:
        enabled: true
        retention_days: 30
      spatial:
        enabled: false
      graph:
        enabled: true
        max_path_depth: 10
        entity_extraction: true
        relationship_extraction: false
      consolidation:
        enabled: true
        interval_seconds: 300
        min_records: 10
        batch_size: 100
```

### Memory Store Fields

| Field | Type | Description |
| --- | --- | --- |
| `name` | string | Unique identifier for this memory store |
| `working_memory` | object | Short-term memory buffer configuration |
| `timeseries` | object | Time-series store configuration |
| `spatial` | object | Geo-spatial store configuration |
| `graph` | object | Graph store configuration |
| `consolidation` | object | Memory consolidation settings |

#### Working Memory

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `enabled` | boolean | true | Enable working memory |
| `ttl_seconds` | integer | 3600 | Time-to-live for records |
| `max_records` | integer | 10000 | Maximum records before auto-prune |

#### Time-series

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `enabled` | boolean | true | Enable time-series store |
| `retention_days` | integer | 30 | Days to retain data |

#### Spatial

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `enabled` | boolean | true | Enable spatial store |
| `retention_days` | integer | 30 | Days to retain data |
| `index_type` | string | `rtree` | `rtree` or `geohash` |

#### Graph

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `enabled` | boolean | true | Enable graph store |
| `max_path_depth` | integer | 10 | Maximum depth for path finding |
| `entity_extraction` | boolean | true | Extract entities using NER |
| `relationship_extraction` | boolean | false | Extract relationships via LLM |

#### Consolidation

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `enabled` | boolean | true | Enable consolidation |
| `interval_seconds` | integer | 300 | Consolidation interval |
| `min_records` | integer | 10 | Minimum records before consolidation |
| `batch_size` | integer | 100 | Batch size for consolidation |
| `prune_after_consolidate` | boolean | true | Prune after consolidation |
| `extract_summaries` | boolean | false | Create embeddings from consolidated data |

## Datasets

The `datasets` section keeps metadata about datasets you manage via the CLI.

```yaml
datasets:
  - name: research-notes
    data_processing_strategy: pdf_ingest
    database: main_db
    files:
      - 2d5fd8424e62c56cad39864fac9ecff7af9639cf211deb936a16dc05aca5b3ea
```
- `files` are SHA-256 hashes tracked by the server.
- Not required, but useful for syncing dataset metadata across environments.

## Validation & Errors

- The CLI enforces schema validation when loading configs. Missing runtime fields raise `Error: runtime.provider is required`.
- Use `lf chat --curl` to inspect the raw request if responses look wrong (verify prompts and RAG toggles).
- The server logs include full validation errors when API calls fail due to config mismatches.

## Extending the Schema

1. Edit `config/schema.yaml` or `rag/schema.yaml` to add new enums/properties.
2. Run `config/generate_types.py` to regenerate Pydantic/Go datamodels.
3. Update server/CLI logic to accept the new fields.
4. Document the addition in this guide and the Extending section.

Example: To support a new provider `together`, add it to the `provider` enum, regenerate types, and update runtime selection to issue HTTP requests to Together's API.

## Best Practices

- Keep secrets out of YAML; use environment variables and reference them at runtime.
- Version control your config; treat `llamafarm.yaml` like application code.
- Use separate namespaces or configs for dev/staging/prod to avoid cross-talk.
- Document uncommon parser/extractor choices for future maintainers.

Need concrete samples? Check the Example configs and the examples in the repo (`examples/fda_rag/llamafarm-example.yaml`).