# Example Configurations
Use these snippets as starting points for real projects. Every example validates against the current schema.
## Local RAG with Ollama (Multi-Model)
```yaml
version: v1
name: local-rag
namespace: default

runtime:
  default_model: default
  models:
    - name: default
      description: "Primary Ollama model"
      provider: ollama
      model: llama3:8b
      default: true

prompts:
  - name: default
    messages:
      - role: system
        content: >-
          You are a friendly assistant. Reference document titles when possible.

rag:
  databases:
    - name: main_db
      type: ChromaStore
      default_embedding_strategy: default_embeddings
      default_retrieval_strategy: semantic_search
      embedding_strategies:
        - name: default_embeddings
          type: OllamaEmbedder
          config:
            model: nomic-embed-text:latest
      retrieval_strategies:
        - name: semantic_search
          type: VectorRetriever
          config:
            top_k: 5
  data_processing_strategies:
    - name: pdf_ingest
      parsers:
        - type: PDFParser_LlamaIndex
          config:
            chunk_size: 1200
            chunk_overlap: 150
      extractors:
        - type: HeadingExtractor
        - type: ContentStatisticsExtractor
  datasets:
    - name: policies
      data_processing_strategy: pdf_ingest
      database: main_db
```
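With documents added to the `policies` dataset, queries go through the default model and `semantic_search`. A minimal usage sketch, reusing the `lf chat` command shown in the mixed-providers example below (the question text is illustrative):

```bash
lf chat "What does the travel policy say about reimbursements?"
```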
## Lemonade Local Runtime (Multi-Model)
```yaml
version: v1
name: lemonade-local
namespace: default

runtime:
  default_model: balanced
  models:
    - name: fast
      description: "Fast 0.6B model for quick responses"
      provider: lemonade
      model: user.Qwen3-0.6B
      base_url: "http://127.0.0.1:11534/v1"
      lemonade:
        backend: llamacpp
        port: 11534
        context_size: 32768
    - name: balanced
      description: "Balanced 4B model - recommended"
      provider: lemonade
      model: user.Qwen3-4B
      base_url: "http://127.0.0.1:11535/v1"
      default: true
      lemonade:
        backend: llamacpp
        port: 11535
        context_size: 32768
    - name: powerful
      description: "Powerful 8B reasoning model"
      provider: lemonade
      model: user.Qwen3-8B
      base_url: "http://127.0.0.1:11536/v1"
      lemonade:
        backend: llamacpp
        port: 11536
        context_size: 65536

prompts:
  - name: default
    messages:
      - role: system
        content: >-
          You are a helpful assistant with access to local models.
```
**Setup:**

- Download the models (the 4B variant is shown; repeat for each model you plan to run):

  ```bash
  uv run lemonade-server-dev pull user.Qwen3-4B --checkpoint unsloth/Qwen3-4B-GGUF:Q4_K_M --recipe llamacpp
  ```

- Start one instance per model (from the llamafarm project root):

  ```bash
  LEMONADE_MODEL=user.Qwen3-0.6B LEMONADE_PORT=11534 nx start lemonade
  LEMONADE_MODEL=user.Qwen3-4B LEMONADE_PORT=11535 nx start lemonade
  LEMONADE_MODEL=user.Qwen3-8B LEMONADE_PORT=11536 nx start lemonade
  ```

Note: Lemonade currently has to be started manually; in the future it will run as a container and be auto-started by the LlamaFarm server.

See the Lemonade Quickstart for detailed setup.
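With all three instances running, you can pick a model per request. A hedged usage sketch, mirroring the `lf chat --model` commands from the mixed-providers example below (the model names come from the config above):

```bash
lf chat --model fast "Quick question"
lf chat "Defaults to the balanced model"
lf chat --model powerful "Complex multi-step reasoning task"
```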
## vLLM Gateway with Structured Output
```yaml
version: v1
name: llm-gateway
namespace: enterprise

runtime:
  default_model: vllm-model
  models:
    - name: vllm-model
      description: "vLLM gateway model"
      provider: openai
      model: qwen2.5:7b
      base_url: https://llm.company.internal/v1
      api_key: ${VLLM_API_KEY}
      instructor_mode: tools
      default: true
      model_api_parameters:
        temperature: 0.1

prompts:
  - name: default
    messages:
      - role: system
        content: >-
          You are a compliance assistant returning JSON with fields: `summary`, `citations`.

rag:
  databases:
    - name: compliance_db
      type: QdrantStore
      default_embedding_strategy: openai_embeddings
      default_retrieval_strategy: hybrid_search
      embedding_strategies:
        - name: openai_embeddings
          type: OpenAIEmbedder
          config:
            model: text-embedding-3-small
      retrieval_strategies:
        - name: hybrid_search
          type: HybridUniversalStrategy
          config:
            dense_weight: 0.7
            sparse_weight: 0.3
  data_processing_strategies:
    - name: docx_ingest
      parsers:
        - type: DocxParser_LlamaIndex
          config:
            chunk_size: 1000
            chunk_overlap: 100
      extractors:
        - type: EntityExtractor
          config:
            include_types: [ORGANIZATION, LAW]
```
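Because the gateway exposes an OpenAI-compatible endpoint, you can smoke-test it before wiring it into LlamaFarm. A minimal sketch, assuming `VLLM_API_KEY` is exported and the gateway follows the standard `/chat/completions` convention (the user question is illustrative):

```bash
curl -s https://llm.company.internal/v1/chat/completions \
  -H "Authorization: Bearer $VLLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen2.5:7b",
        "temperature": 0.1,
        "messages": [
          {"role": "system", "content": "Return JSON with fields: summary, citations."},
          {"role": "user", "content": "Summarize our data retention obligations."}
        ]
      }'
```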
## Multi-Strategy Retrieval

This example shows only the `rag` section; merge it into a full configuration like the ones above.
```yaml
rag:
  databases:
    - name: research_db
      type: ChromaStore
      default_embedding_strategy: dense_embeddings
      default_retrieval_strategy: reranked_search
      embedding_strategies:
        - name: dense_embeddings
          type: SentenceTransformerEmbedder
          config:
            model: all-MiniLM-L6-v2
      retrieval_strategies:
        - name: keyword_search
          type: BM25Retriever
          config:
            stop_words: ["the", "a", "and"]
        - name: reranked_search
          type: RerankedStrategy
          config:
            candidate_strategy: keyword_search
            reranker: bm25+embedding
```
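Here `reranked_search` is the database default, so an ordinary query exercises the full pipeline: `keyword_search` fetches BM25 candidates, which are then reranked with the combined `bm25+embedding` scorer. A hedged check from the CLI, assuming a dataset has already been ingested into `research_db` (the question is illustrative):

```bash
lf chat "Which documents cover embedding-based reranking?"
```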
## Mixed Providers (Ollama + Lemonade)
```yaml
version: v1
name: mixed-providers
namespace: default

runtime:
  default_model: ollama-default
  models:
    - name: ollama-default
      description: "Primary Ollama model"
      provider: ollama
      model: llama3:8b
      default: true
    - name: ollama-small
      description: "Small Ollama model"
      provider: ollama
      model: gemma3:1b
    - name: lemon-fast
      description: "Lemonade fast model with NPU/GPU"
      provider: lemonade
      model: user.Qwen3-0.6B
      base_url: "http://127.0.0.1:11534/v1"
      lemonade:
        backend: llamacpp
        port: 11534
        context_size: 32768

prompts:
  - name: default
    messages:
      - role: system
        content: >-
          You are a helpful assistant. Use the appropriate model for the task.
```
**Usage:**

- Fast responses: `lf chat --model lemon-fast "Quick question"`
- Default model: `lf chat "Normal question"`
- Small model: `lf chat --model ollama-small "Simple task"`
Mix and match these patterns to suit your project. Remember to regenerate schema types if you add new providers or store options.