lf runtime

Manage shared runtime components that the LlamaFarm server, edge runtime, and universal runtime all depend on. Today this is scoped to llama.cpp binaries; future subcommands may manage other shared infrastructure.

These commands are primarily for deployment pipelines — especially air-gapped or Dockerfile builds where you cannot rely on first-run downloads.

Synopsis

lf runtime binary pull [flags]
lf runtime binary path [flags]

lf runtime binary pull

Download the pinned llama.cpp shared library and its dependency files for a target platform and accelerator into the local LlamaFarm cache.

# Current host, best accelerator (default)
lf runtime binary pull

# Linux ARM64 build for a Raspberry Pi (from a macOS dev host)
lf runtime binary pull --platform linux/arm64 --accelerator cpu

# Fetch and materialize a flat directory for Ansible/Packer pickup
lf runtime binary pull --platform linux/arm64 --export /tmp/lf-bin

# Pin a specific llama.cpp release
lf runtime binary pull --version b7800

Flags:

  • --platform <os>/<arch> — Target OS/arch (e.g. linux/arm64, darwin/arm64). Defaults to the current host.
  • --accelerator <backend> — Compute backend: cpu, cuda, metal, vulkan, rocm. Defaults to the best supported backend for the target platform.
  • --version <tag> — llama.cpp version tag. Defaults to the version baked into this CLI (from llama-cpp-version.txt).
  • --export <dir> — After download, copy the binary and all dependency libraries into this directory as a flat layout (preserves symlinks). Size on disk is typically 50–200 MB.

Supported platform/accelerator combinations:

OS       Arch    Accelerators
darwin   arm64   metal
darwin   amd64   cpu
linux    amd64   cpu, vulkan
linux    arm64   cpu
windows  amd64   cpu, cuda, vulkan

On Linux, requesting --accelerator cuda or rocm falls back to vulkan, because upstream llama.cpp no longer ships prebuilt CUDA/ROCm binaries for Linux.

lf runtime binary path

Print the absolute path to the cached main library for the specified target. Exits non-zero with a remediation message if the binary has not been downloaded for that target yet.

lf runtime binary path
lf runtime binary path --platform linux/arm64 --accelerator cpu

Flags match lf runtime binary pull except there is no --export.
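In provisioning scripts the two subcommands compose naturally: use path as a cache probe and pull only on a miss. A minimal sketch; the platform and accelerator values are illustrative:

```shell
# Illustrative target; substitute your deployment's values.
PLATFORM=linux/arm64
ACCEL=cpu

# `path` exits non-zero when the binary is not cached for this target,
# so it doubles as a cheap cache check before pulling.
if ! LIB=$(lf runtime binary path --platform "$PLATFORM" --accelerator "$ACCEL"); then
  lf runtime binary pull --platform "$PLATFORM" --accelerator "$ACCEL"
  LIB=$(lf runtime binary path --platform "$PLATFORM" --accelerator "$ACCEL")
fi
echo "llama.cpp library: $LIB"
```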

Example: Dockerfile integration

Pre-fetch the llama.cpp binary at image build time instead of on first container start. Note that models are not baked into the image — they are mounted from the host filesystem at runtime.

FROM ubuntu:24.04 AS base
# ... base setup ...

# Fetch the llama.cpp binary for the target platform into a flat dir.
# The dir is then copied into the image at a known location.
RUN lf runtime binary pull \
      --platform linux/arm64 \
      --accelerator cpu \
      --export /opt/llamafarm/bin

ENV LD_LIBRARY_PATH=/opt/llamafarm/bin
ENV LLAMAFARM_OFFLINE=1

# Models are mounted at runtime, not baked in:
VOLUME /opt/llamafarm/models
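At run time the models directory is then bind-mounted from the host. A sketch of the corresponding run command; the host path and image tag are placeholders:

```shell
# Host model dir and image tag are placeholders for illustration.
docker run -v /srv/llamafarm/models:/opt/llamafarm/models my-lf-image:latest
```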

Example: Ansible integration

- name: Pull llama.cpp binary for target
  command: >
    lf runtime binary pull
    --platform linux/arm64
    --accelerator cpu
    --export /tmp/lf-bin
  delegate_to: localhost

- name: Sync binary dir to device
  synchronize:
    src: /tmp/lf-bin/
    dest: /opt/llamafarm/bin/
    rsync_opts: ["--checksum"]
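After the sync, a small on-device smoke check can confirm the export directory actually landed. This helper is a sketch, not part of the lf CLI:

```shell
# Sketch: succeed only if the given dir exists and is non-empty.
check_bin_dir() {
  [ -d "$1" ] && [ -n "$(ls -A "$1")" ]
}

# Example (on the device, after the synchronize task):
#   check_bin_dir /opt/llamafarm/bin || exit 1
```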

See Also