Offline operation

LlamaFarm runtimes can run fully offline — no network calls to HuggingFace, no runtime downloads of llama.cpp binaries, no silent retries. This is the deployment pattern used by llamadrone / arc on air-gapped Raspberry Pi devices, and it works for any runtime that imports llamafarm_common.

Offline mode is env-var driven. There are two flags:

| Variable | What it does |
| --- | --- |
| `LLAMAFARM_OFFLINE=1` | Strict offline mode. The runtime fails loudly if a model or llama.cpp binary is missing, with a remediation message pointing at the right lf CLI command. Also propagates `HF_HUB_OFFLINE=1` and `TRANSFORMERS_OFFLINE=1` so transitive huggingface_hub/transformers calls honor offline mode. |
| `LLAMAFARM_MODEL_DIR=/opt/llamafarm/models` | Flat-directory model layout root. When set, the runtime looks for models under `$LLAMAFARM_MODEL_DIR/<alias>/` first, before falling back to the HuggingFace cache. |

The two can be used together, separately, or not at all. The default behavior (neither set) is unchanged from prior releases.
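In code, the flag handling amounts to roughly the following. This is a sketch only; `resolve_offline_config` is an illustrative name, not the actual llamafarm_common API:

```python
import os

def resolve_offline_config() -> dict:
    """Read the two LlamaFarm offline flags from the environment.

    Illustrative sketch; the real llamafarm_common helper may differ.
    """
    offline = os.environ.get("LLAMAFARM_OFFLINE", "") == "1"
    model_dir = os.environ.get("LLAMAFARM_MODEL_DIR")  # None if unset
    if offline:
        # Strict mode also propagates to huggingface_hub / transformers
        # so their transitive calls honor offline mode.
        os.environ.setdefault("HF_HUB_OFFLINE", "1")
        os.environ.setdefault("TRANSFORMERS_OFFLINE", "1")
    return {"offline": offline, "model_dir": model_dir}
```

Note the one-way coupling: setting `LLAMAFARM_OFFLINE=1` implies the two HF variables, but `LLAMAFARM_MODEL_DIR` on its own changes only the lookup path, not network behavior.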

Canonical on-device layout

The layout that lf models path (from the CLI) emits as target paths:

$LLAMAFARM_MODEL_DIR/
├── manifest.json            ← written by your ops tooling, not the runtime
├── qwen3-1.7b/              ← alias from runtime.models[].name
│   ├── model.Q4_K_M.gguf    ← main weights
│   └── mmproj.f16.gguf      ← optional multimodal projector
├── smollm-135m/
│   └── model.Q8_0.gguf
└── yolo11n/
    └── yolo11n.pt           ← vision model (when runtime supports it)

The runtime discovers files by format sniffing (extension plus GGUF magic bytes for .gguf files) rather than requiring specific filenames. Both the canonical names above (model.<QUANT>.gguf, mmproj.<precision>.gguf) and HF-preserved filenames (Qwen3-1.7B-Q4_K_M.gguf, mmproj-qwen-f16.gguf) work identically. When multiple candidate weights files exist in the same alias directory, the quantization preference order Q4_K_M > Q4_K > Q5_K_M > Q5_K > Q8_0 > ... is applied and the most-preferred file wins.
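The sniffing and preference logic can be sketched as follows. `is_gguf`, `pick_weights`, and `QUANT_PREFERENCE` are illustrative names; the real runtime's internals may differ:

```python
from pathlib import Path

# Preference order from the docs: lower index = more preferred quantization.
QUANT_PREFERENCE = ["Q4_K_M", "Q4_K", "Q5_K_M", "Q5_K", "Q8_0"]

def is_gguf(path: Path) -> bool:
    """True if the file starts with the 4-byte GGUF magic."""
    try:
        with path.open("rb") as f:
            return f.read(4) == b"GGUF"
    except OSError:
        return False

def pick_weights(alias_dir: Path):
    """Pick the best weights candidate in an alias directory.

    Sketch of the documented behavior: format-sniff .gguf files, skip
    multimodal projectors (mmproj*), then apply the quant preference order.
    """
    candidates = [
        p for p in alias_dir.glob("*.gguf")
        if is_gguf(p) and not p.name.startswith("mmproj")
    ]

    def rank(p: Path) -> int:
        for i, quant in enumerate(QUANT_PREFERENCE):
            if quant in p.name:
                return i
        return len(QUANT_PREFERENCE)  # unknown quant sorts last

    return min(candidates, key=rank, default=None)
```

Because the candidate set is built from content, not names, both `model.Q4_K_M.gguf` and `Qwen3-1.7B-Q4_K_M.gguf` rank identically.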

How the alias is determined per runtime

Edge runtime — the alias is auto-derived from the incoming model field of the API request by stripping the org/ prefix and :quant suffix. All of these request values map to the same alias directory Qwen3-0.6B-GGUF:

| Request model | Derived alias |
| --- | --- |
| Qwen/Qwen3-0.6B-GGUF:Q4_K_M | Qwen3-0.6B-GGUF |
| Qwen/Qwen3-0.6B-GGUF:Q8_0 | Qwen3-0.6B-GGUF |
| Qwen/Qwen3-0.6B-GGUF | Qwen3-0.6B-GGUF |
| Qwen3-0.6B-GGUF | Qwen3-0.6B-GGUF |

This means API clients don't have to change anything — an operator places /opt/llamafarm/models/Qwen3-0.6B-GGUF/model.Q4_K_M.gguf on the Pi, and every request form above finds it. The trade-off: foo/my-model and bar/my-model collide on the same alias directory. If you need to disambiguate, send distinct base names in your request model_ids.
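The derivation rule amounts to a few lines. `derive_alias` is an illustrative name, not the edge runtime's actual function:

```python
def derive_alias(model: str) -> str:
    """Derive the on-disk alias from an incoming `model` request field.

    Sketch of the documented edge-runtime rule: strip the `org/` prefix
    and the `:quant` suffix. The real implementation may handle more cases.
    """
    # Drop everything up to the last '/' (the org prefix), if present.
    base = model.rsplit("/", 1)[-1]
    # Drop the ':quant' suffix, if present.
    return base.split(":", 1)[0]
```

The collision mentioned above falls straight out of the rule: any two org prefixes with the same base name reduce to the same alias.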

Universal runtime — if you're using the universal runtime's GGUF models, pass alias=<name> explicitly when constructing GGUFLanguageModel, or leave it unset to keep the legacy HF-cache-first behavior. Auto-derivation is edge-specific for now.

Resolution order

For each model alias, the runtime resolves its weights file in this order, first match wins:

1. $LLAMAFARM_MODEL_DIR/<alias>/ (format-sniffed)
2. HuggingFace cache (existing behavior)
3. Network download (skipped when LLAMAFARM_OFFLINE=1)

In strict offline mode, step 3 is removed. If no tier matches, the runtime raises FileNotFoundError with a multi-line message naming the alias, the paths that were tried, and the lf command that would make the file available.
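The three tiers can be sketched as below. `resolve_weights` and `hf_cache_lookup` are illustrative stand-ins, not the real runtime API, and the network tier is elided:

```python
import os
from pathlib import Path

def resolve_weights(alias: str, hf_cache_lookup) -> Path:
    """Resolve a model's weights file following the documented tier order.

    `hf_cache_lookup` stands in for the existing HuggingFace-cache
    resolution; all names here are illustrative.
    """
    model_dir = os.environ.get("LLAMAFARM_MODEL_DIR")
    tried = []
    # Tier 1: flat-directory layout (format-sniffed in the real runtime).
    if model_dir:
        alias_dir = Path(model_dir) / alias
        tried.append(str(alias_dir))
        ggufs = sorted(alias_dir.glob("*.gguf")) if alias_dir.is_dir() else []
        weights = [p for p in ggufs if not p.name.startswith("mmproj")]
        if weights:
            return weights[0]
    # Tier 2: HuggingFace cache (existing behavior).
    cached = hf_cache_lookup(alias)
    if cached is not None:
        return cached
    # Tier 3: network download -- removed entirely in strict offline mode.
    if os.environ.get("LLAMAFARM_OFFLINE") == "1":
        raise FileNotFoundError(
            f"Model '{alias}' not available in offline mode.\n"
            + "\n".join(f"Tried: {t}" for t in tried)
        )
    raise NotImplementedError("network download path elided in this sketch")
```

First match wins, so a file placed under the alias directory always shadows a cached copy of the same model.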

What about absolute paths in runtime.models[].model?

Absolute filesystem paths (e.g. runtime.models[0].model: /data/custom.gguf) are not resolved through the LLAMAFARM_MODEL_DIR tier — they flow through the legacy get_gguf_file_path entry point, which handles .gguf-suffixed inputs via a safe-directory basename lookup under ~/.llamafarm/models/ or $GGUF_MODELS_DIR/. This preserves existing behavior for projects that reference hand-placed files by absolute path; nothing about this handling changes with LLAMAFARM_MODEL_DIR.

If you want your hand-placed GGUF to be discovered via the canonical <alias>/ layout, either (a) move it under $LLAMAFARM_MODEL_DIR/<alias>/ and reference it by alias, or (b) move it under $GGUF_MODELS_DIR/ with its basename matching the runtime.models[].model value and continue referencing by filename.
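The safe-directory basename lookup can be sketched like this. `legacy_basename_lookup` is an illustrative name, and the real `get_gguf_file_path` may differ in details; the key property is that only the basename of the input is used, so a request can never escape the trusted roots:

```python
import os
from pathlib import Path

def legacy_basename_lookup(model_value: str):
    """Sketch of the legacy safe-directory lookup for .gguf-suffixed inputs.

    An absolute input like /data/custom.gguf is reduced to its basename
    and searched only inside the trusted roots. Illustrative only.
    """
    if not model_value.endswith(".gguf"):
        return None
    basename = Path(model_value).name  # discard any directory components
    roots = []
    if "GGUF_MODELS_DIR" in os.environ:
        roots.append(Path(os.environ["GGUF_MODELS_DIR"]))
    roots.append(Path.home() / ".llamafarm" / "models")
    for root in roots:
        candidate = root / basename
        if candidate.is_file():
            return candidate
    return None
```

This is why option (b) above works: the directory part of the configured path is irrelevant, only the filename has to match.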

End-to-end workflow with lf models path

The companion workflow on the CLI side (from feat-cli-models-path):

# 1. On a build host with internet, populate the HF cache
lf models pull

# 2. Get a transport plan telling you where the files live and where they
# should go on the device
lf models path --format json --target-root /opt/llamafarm/models

# 3. Your ops tooling (ansible, packer, rsync) copies the files per the plan
# Example: the Ansible playbook snippet from the lf-models docs

See lf models path for the full flag reference and example Ansible playbook.
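Step 3 can be scripted against the JSON plan. Note that the `source`/`target` field names below are an assumption for illustration, not the confirmed schema; check the lf models path reference for the actual output format before relying on them:

```python
import json
import shutil
from pathlib import Path

def apply_plan(plan_json: str) -> list:
    """Copy files per a transport plan.

    Assumes each plan entry carries 'source' and 'target' paths; the
    actual schema of `lf models path --format json` may differ.
    """
    copied = []
    for entry in json.loads(plan_json):
        src, dst = Path(entry["source"]), Path(entry["target"])
        dst.parent.mkdir(parents=True, exist_ok=True)  # create alias dirs
        shutil.copy2(src, dst)                         # preserve mtimes
        copied.append(dst)
    return copied
```

In practice you would feed the same plan to ansible or rsync rather than Python; the point is that the plan, not the runtime, owns the mapping from cache paths to device paths.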

Docker compose example

# docker-compose.yml on an air-gapped edge device
services:
  llamafarm-edge:
    image: llamafarm/edge-runtime:latest
    ports:
      - "11540:11540"
    environment:
      LLAMAFARM_OFFLINE: "1"               # strict offline, no retries
      LLAMAFARM_MODEL_DIR: /models         # flat-dir layout
      HF_HUB_OFFLINE: "1"                  # belt-and-suspenders
      TRANSFORMERS_OFFLINE: "1"            # belt-and-suspenders
      LD_LIBRARY_PATH: /opt/llamafarm/bin  # where llama.cpp lives
    volumes:
      - /opt/llamafarm/models:/models:ro           # bind-mount from host
      - /opt/llamafarm/bin:/opt/llamafarm/bin:ro   # llama.cpp binary + deps

Note that LLAMAFARM_OFFLINE=1 automatically sets the two HF_*OFFLINE variables for you, so in practice you only need LLAMAFARM_OFFLINE and LLAMAFARM_MODEL_DIR in the compose file. The others are shown above for clarity.

The /opt/llamafarm/bin directory is populated on the host via:

lf runtime binary pull \
  --platform linux/arm64 \
  --accelerator cpu \
  --export /opt/llamafarm/bin

See lf runtime binary for details.

Startup verification

On startup, the runtime emits a single structured log line showing the resolved mode:

INFO  llamafarm_offline_mode  mode=offline
model_dir=/opt/llamafarm/models
hf_hub_offline=1
transformers_offline=1

This is your grep-able verification that the deployment configuration was picked up correctly. If you see mode=online when you expected offline, the env var was not inherited by the runtime process.

Troubleshooting

Error: Model 'qwen3-1.7b' not available in offline mode.

Model 'qwen3-1.7b' not available in offline mode.
Tried: /opt/llamafarm/models/qwen3-1.7b/
Tried: /root/.cache/huggingface/hub/models--Qwen--Qwen3-1.7B-GGUF
To fix: run 'lf models pull Qwen/Qwen3-1.7B-GGUF' on a host with internet,
then sync the files to this host
Note: If the build host has internet, you can also use
'lf models path --ensure' to pull before emitting a plan.

The error names the alias, both places it looked, and the command that would populate them. On the build host, either:

  • Run lf models pull Qwen/Qwen3-1.7B-GGUF to cache the model, then re-sync the files to the device, OR
  • Run lf models path --ensure --format json to pull and emit in one shot, then ship the files per the plan.

Error: llama.cpp binary not available in offline mode for linux/arm64

The runtime couldn't find the llama.cpp shared library in any cached location. Fix with:

lf runtime binary pull --platform linux/arm64 --accelerator cpu --export /opt/llamafarm/bin

Then sync /opt/llamafarm/bin/ to the device.

Warning: skipping <path>: .gguf extension but missing GGUF magic bytes

Something with a .gguf extension exists in your alias directory but its first four bytes are not GGUF. Usually this means:

  • The file is a partial download (truncated) — delete it and re-sync.
  • The file is a text file or symlink target that isn't resolvable.
  • The file is from a corrupt copy step.

The runtime refuses to hand this file to llama-cpp because loading it would produce an opaque crash later.

Warning: LLAMAFARM_MODEL_DIR=<path> does not exist on disk

The root directory named by LLAMAFARM_MODEL_DIR does not exist. The runtime logs this as a warning and falls through to the HF cache. If you expected offline mode to fail here, you also need LLAMAFARM_OFFLINE=1; otherwise the cache fallback is doing its job.

I see mode=online but I set LLAMAFARM_OFFLINE=1

The env var isn't reaching the runtime process. In a Docker container, check that the environment: block in docker-compose.yml actually contains the variable and that docker compose config shows it in the resolved configuration. In an ansible-managed unit file, check that the Environment= line is present in the final rendered unit and that you ran systemctl daemon-reload.