
Vision Pipeline Guide

The Universal Runtime provides a complete computer vision pipeline: YOLO object detection, CLIP zero-shot classification, cascade streaming inference, model training, and export to deployment formats such as ONNX.

Overview

The vision pipeline supports:

  • YOLO Object Detection: Detect and locate objects in images with bounding boxes
  • CLIP Classification: Zero-shot image classification with custom class labels
  • Cascade Streaming: Multi-model chains that escalate when confidence is low
  • Training: Fine-tune detection and classification models on custom datasets
  • Model Management: Save, load, list, and export models (ONNX, CoreML, TensorRT)

Quick Start

Detect Objects

# Base64-encode an image and detect objects
IMAGE=$(base64 -w0 photo.jpg)

curl -X POST http://localhost:11540/v1/vision/detect \
  -H "Content-Type: application/json" \
  -d '{
    "image": "'$IMAGE'",
    "model": "yolov8n",
    "confidence_threshold": 0.5
  }'

Response:

{
  "detections": [
    {
      "box": {"x1": 120.5, "y1": 80.2, "x2": 350.8, "y2": 420.1},
      "class_name": "person",
      "class_id": 0,
      "confidence": 0.92
    },
    {
      "box": {"x1": 400.0, "y1": 200.0, "x2": 550.0, "y2": 380.0},
      "class_name": "dog",
      "class_id": 16,
      "confidence": 0.87
    }
  ],
  "model": "yolov8n",
  "inference_time_ms": 45.2
}
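
Box coordinates are pixel values; x1/y1 and x2/y2 are assumed to be opposite corners of the box, as in the response above. A minimal local helper (not part of the API) derives width and height:

```python
def box_size(box: dict) -> tuple:
    """Width and height of a detection box.

    Assumes x1/y1 and x2/y2 are opposite corners in pixels,
    matching the sample response above.
    """
    return box["x2"] - box["x1"], box["y2"] - box["y1"]

# Using the first detection from the sample response:
w, h = box_size({"x1": 120.5, "y1": 80.2, "x2": 350.8, "y2": 420.1})
print(f"{w:.1f} x {h:.1f}")  # 230.3 x 339.9
```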

Classify Images

curl -X POST http://localhost:11540/v1/vision/classify \
  -H "Content-Type: application/json" \
  -d '{
    "image": "'$IMAGE'",
    "model": "clip-vit-base",
    "classes": ["cat", "dog", "bird", "car", "person"],
    "top_k": 3
  }'

Response:

{
  "class_name": "dog",
  "class_id": 1,
  "confidence": 0.89,
  "all_scores": {
    "cat": 0.05,
    "dog": 0.89,
    "bird": 0.02,
    "car": 0.01,
    "person": 0.03
  },
  "model": "clip-vit-base",
  "inference_time_ms": 32.1
}
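
The all_scores map lets you rank labels yourself client-side. A small sketch (the ranking is local logic, not an API feature):

```python
def rank_scores(all_scores: dict, k: int = 3) -> list:
    """Return the k highest-scoring (label, score) pairs."""
    return sorted(all_scores.items(), key=lambda item: item[1], reverse=True)[:k]

# Using the all_scores from the sample response:
scores = {"cat": 0.05, "dog": 0.89, "bird": 0.02, "car": 0.01, "person": 0.03}
print(rank_scores(scores))  # [('dog', 0.89), ('cat', 0.05), ('person', 0.03)]
```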

Object Detection (YOLO)

POST /v1/vision/detect

Detect objects in an image using YOLO models. Returns bounding boxes with class labels and confidence scores.

Request:

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| image | string | Yes | — | Base64-encoded image |
| model | string | No | yolov8n | YOLO model variant |
| confidence_threshold | float | No | 0.5 | Minimum confidence (0.0–1.0) |
| classes | string[] | No | all | Filter to specific class names |

Available Models:

| Model | Speed | Accuracy | Use Case |
|---|---|---|---|
| yolov8n | Fastest | Good | Real-time, edge devices |
| yolov8s | Fast | Better | Balanced performance |
| yolov8m | Medium | High | General purpose |
| yolov8l | Slow | Higher | High accuracy needs |
| yolov8x | Slowest | Highest | Maximum accuracy |

Python Example

import base64
import requests

with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = requests.post("http://localhost:11540/v1/vision/detect", json={
    "image": image_b64,
    "model": "yolov8n",
    "confidence_threshold": 0.5,
    "classes": ["person", "car"]
})

for det in resp.json()["detections"]:
    print(f"{det['class_name']}: {det['confidence']:.2f} at ({det['box']['x1']:.0f},{det['box']['y1']:.0f})")

Zero-Shot Classification (CLIP)

POST /v1/vision/classify

Classify images into arbitrary categories without training using CLIP. You provide the class labels at inference time.

Request:

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| image | string | Yes | — | Base64-encoded image |
| model | string | No | clip-vit-base | CLIP model variant |
| classes | string[] | Yes | — | Class labels for zero-shot classification |
| top_k | int | No | 5 | Number of top results (1–100) |

Python Example

resp = requests.post("http://localhost:11540/v1/vision/classify", json={
    "image": image_b64,
    "model": "clip-vit-base",
    "classes": ["defective product", "good product", "packaging damage"],
    "top_k": 3
})

result = resp.json()
print(f"Classification: {result['class_name']} ({result['confidence']:.2%})")

Cascade Streaming

Cascade streaming processes frames through a chain of models, escalating to more powerful (or remote) models when confidence is low. This is ideal for real-time monitoring where you want fast inference most of the time but accuracy on difficult frames.

Start a Session

curl -X POST http://localhost:11540/v1/vision/stream/start \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "chain": ["yolov8n", "yolov8m"],
      "confidence_threshold": 0.7
    },
    "target_fps": 1.0,
    "action_classes": ["person", "vehicle"],
    "cooldown_seconds": 5.0
  }'

Response:

{"session_id": "a1b2c3d4"}

Process Frames

curl -X POST http://localhost:11540/v1/vision/stream/frame \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "a1b2c3d4",
    "image": "'$IMAGE'"
  }'

Response (confident detection):

{
  "status": "action",
  "detections": [
    {"x1": 100, "y1": 50, "x2": 300, "y2": 400, "class_name": "person", "class_id": 0, "confidence": 0.85}
  ],
  "confidence": 0.85,
  "resolved_by": "yolov8n"
}

Response (escalated to larger model):

{
  "status": "escalated",
  "detections": [...],
  "confidence": 0.78,
  "resolved_by": "yolov8m"
}

Response (no confident detection):

{"status": "ok"}

Stop Session

curl -X POST http://localhost:11540/v1/vision/stream/stop \
  -H "Content-Type: application/json" \
  -d '{"session_id": "a1b2c3d4"}'

Response:

{
  "session_id": "a1b2c3d4",
  "frames_processed": 150,
  "actions_triggered": 12,
  "escalations": 3,
  "duration_seconds": 152.4
}

Cascade Configuration

| Field | Type | Default | Description |
|---|---|---|---|
| chain | string[] | ["yolov8n"] | Models to try in order. Can include remote:http://... for remote models |
| confidence_threshold | float | 0.7 | Minimum confidence before escalating to next model |
| target_fps | float | 1.0 | Target frame processing rate |
| action_classes | string[] | all | Filter detections to specific classes |
| cooldown_seconds | float | 5.0 | Minimum seconds between action triggers |

Remote Cascade

You can include remote models in the chain for Atmosphere mesh integration:

{"chain": ["yolov8n", "remote:http://gpu-server:11540/v1/vision/detect"]}

Remote hosts must be in the configured allowlist (SSRF protection).

Python Streaming Example

import base64
import time
import requests

BASE = "http://localhost:11540"

# Start session
session = requests.post(f"{BASE}/v1/vision/stream/start", json={
    "config": {"chain": ["yolov8n", "yolov8m"], "confidence_threshold": 0.7},
    "action_classes": ["person"],
    "cooldown_seconds": 2.0
}).json()

sid = session["session_id"]

# Process frames (e.g., from a camera)
for frame_bytes in camera_frames():
    image_b64 = base64.b64encode(frame_bytes).decode()
    result = requests.post(f"{BASE}/v1/vision/stream/frame", json={
        "session_id": sid,
        "image": image_b64
    }).json()

    if result["status"] in ("action", "escalated"):
        print(f"Detected: {result['detections']} (by {result['resolved_by']})")

# Stop session
stats = requests.post(f"{BASE}/v1/vision/stream/stop", json={"session_id": sid}).json()
print(f"Processed {stats['frames_processed']} frames, {stats['actions_triggered']} actions")

Training

Fine-tune detection or classification models on your own datasets.

Start Training

curl -X POST http://localhost:11540/v1/vision/train \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-detector",
    "dataset": "/path/to/dataset",
    "task": "detection",
    "config": {
      "epochs": 50,
      "batch_size": 16,
      "learning_rate": 0.001
    },
    "base_model": "yolov8n"
  }'

Response:

{
  "job_id": "train-abc123",
  "status": "running",
  "progress": 0.0,
  "metrics": null
}

Check Training Status

curl http://localhost:11540/v1/vision/train/train-abc123

Response:

{
  "job_id": "train-abc123",
  "status": "running",
  "progress": 0.65,
  "current_epoch": 33,
  "total_epochs": 50,
  "metrics": {
    "mAP50": 0.82,
    "mAP50-95": 0.61,
    "loss": 0.034
  },
  "error": null
}
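
For long runs it is convenient to poll this endpoint until the job finishes. A sketch, assuming any status other than "running" is terminal (the exact terminal status names are not documented here; verify them against your deployment):

```python
import time
import requests  # third-party: pip install requests

BASE = "http://localhost:11540"

def format_progress(job: dict) -> str:
    """One-line progress summary for a training-status response."""
    pct = job.get("progress", 0.0) * 100
    return f"epoch {job.get('current_epoch')}/{job.get('total_epochs')} ({pct:.0f}%)"

def wait_for_training(job_id: str, poll_seconds: float = 10.0) -> dict:
    """Poll GET /v1/vision/train/{job_id} until status leaves 'running'."""
    while True:
        job = requests.get(f"{BASE}/v1/vision/train/{job_id}").json()
        if job["status"] != "running":
            return job
        print(format_progress(job))
        time.sleep(poll_seconds)

# Usage (with a real job id from POST /v1/vision/train):
# final = wait_for_training("train-abc123")
# print(final["status"], final.get("metrics"))
```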

Cancel Training

curl -X DELETE http://localhost:11540/v1/vision/train/train-abc123

Training Parameters

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | — | Name for the trained model |
| dataset | string | Yes | — | Path to training dataset |
| task | string | Yes | — | detection or classification |
| config.epochs | int | No | 10 | Training epochs (1–1000) |
| config.batch_size | int | No | 16 | Batch size (1–256) |
| config.learning_rate | float | No | 0.001 | Learning rate |
| base_model | string | No | — | Pre-trained model to fine-tune from |

Model Management

List Models

curl http://localhost:11540/v1/vision/models

Response:

{
  "models": [
    {
      "name": "my-detector",
      "source_model_id": "yolov8n",
      "versions": 3,
      "has_current": true,
      "size_mb": 12.4
    }
  ],
  "total": 1
}

Save a Model

curl -X POST "http://localhost:11540/v1/vision/models/save?model_id=yolov8n&name=production-detector"

Load a Model

curl -X POST "http://localhost:11540/v1/vision/models/load?name=production-detector"

Export Model

Export models to optimized formats for deployment:

curl -X POST http://localhost:11540/v1/vision/models/export \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "my-detector",
    "format": "onnx",
    "quantization": "fp16"
  }'

Response:

{
  "export_path": "/models/exports/my-detector.onnx",
  "format": "onnx",
  "size_mb": 6.2,
  "export_time_seconds": 3.45
}
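
The same export call can be made from Python. A sketch mirroring the curl example above (the request body fields come straight from that example; error handling is minimal):

```python
import requests  # third-party: pip install requests

BASE = "http://localhost:11540"

def export_payload(model_id: str, fmt: str = "onnx", quantization: str = "fp16") -> dict:
    """Build the JSON body for POST /v1/vision/models/export."""
    return {"model_id": model_id, "format": fmt, "quantization": quantization}

def export_model(model_id: str, fmt: str = "onnx", quantization: str = "fp16") -> dict:
    resp = requests.post(f"{BASE}/v1/vision/models/export",
                         json=export_payload(model_id, fmt, quantization))
    resp.raise_for_status()
    return resp.json()

# Usage:
# info = export_model("my-detector", "onnx", "fp16")
# print(info["export_path"], info["size_mb"])
```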

Export Formats:

| Format | Description | Best For |
|---|---|---|
| onnx | Open Neural Network Exchange | Cross-platform deployment |
| coreml | Apple Core ML | iOS/macOS apps |
| tensorrt | NVIDIA TensorRT | NVIDIA GPU inference |
| tflite | TensorFlow Lite | Mobile/edge devices |
| openvino | Intel OpenVINO | Intel hardware |

Quantization Options:

| Option | Description |
|---|---|
| fp32 | Full precision (largest, highest quality) |
| fp16 | Half precision (good balance) |
| int8 | 8-bit integer (smallest, fastest) |

API Reference

| Endpoint | Method | Description |
|---|---|---|
| /v1/vision/detect | POST | Detect objects with YOLO |
| /v1/vision/classify | POST | Classify images with CLIP |
| /v1/vision/stream/start | POST | Start cascade streaming session |
| /v1/vision/stream/frame | POST | Process a frame in a session |
| /v1/vision/stream/stop | POST | Stop a streaming session |
| /v1/vision/train | POST | Start a training job |
| /v1/vision/train/{job_id} | GET | Get training job status |
| /v1/vision/train/{job_id} | DELETE | Cancel a training job |
| /v1/vision/models | GET | List saved models |
| /v1/vision/models/save | POST | Save a model |
| /v1/vision/models/load | POST | Load a saved model |
| /v1/vision/models/export | POST | Export to ONNX/CoreML/etc. |

Next Steps