ML Training

Classifier Training

The Designer includes built-in tools for training classifiers and anomaly detection models — no ML expertise required. These run on the Universal Runtime.

Text Classifiers

Train a text classifier using SetFit (few-shot fine-tuning). Great for sentiment analysis, ticket routing, content categorization, and more.

How It Works

  1. Choose a base model — select an embedder model (default: all-MiniLM-L6-v2)
  2. Add training data — enter text/label pairs, or use a sample dataset
  3. Train — click Train and watch progress in real-time
  4. Test — enter text and see predictions with confidence scores
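
Step 2 can be prepared outside the Designer before pasting data in. The sketch below cleans a list of text/label pairs and checks that every class has a minimum number of examples. The helper name `prepare_examples`, the `min_per_class` value, and the output dictionary shape are illustrative assumptions, not part of the Designer.

```python
from collections import Counter


def prepare_examples(pairs, min_per_class=2):
    """Clean text/label pairs and check that every class has enough examples.

    min_per_class is a hypothetical sanity limit, not a documented requirement.
    """
    cleaned = []
    for text, label in pairs:
        text, label = text.strip(), label.strip().lower()
        if text and label:  # skip rows with an empty text or label
            cleaned.append({"text": text, "label": label})

    counts = Counter(row["label"] for row in cleaned)
    too_small = sorted(name for name, n in counts.items() if n < min_per_class)
    if too_small:
        raise ValueError(
            f"classes with fewer than {min_per_class} examples: {too_small}"
        )
    return cleaned, dict(counts)


pairs = [
    ("I love this product", "positive"),
    ("Works exactly as described", "Positive"),  # label case is normalized
    ("Terrible support experience", "negative"),
    ("It broke after one day", "negative"),
    ("   ", "neutral"),  # blank text, dropped
]
examples, class_counts = prepare_examples(pairs)
print(class_counts)
```

Checking class counts up front matters more in few-shot training than in large-scale training, since a single missing or mislabeled example is a large share of a small class.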

Available Base Models

Model               Dimensions  Notes
all-MiniLM-L6-v2    384         Default, fast, good general purpose
bge-small-en-v1.5   384         Strong English performance
bge-base-en-v1.5    768         Larger, more accurate
bge-large-en-v1.5   1024        Best accuracy, slower
bge-m3              1024        Multilingual support
e5-base-v2          768         Good for retrieval tasks
e5-large-v2         1024        Larger variant

Sample Datasets

Built-in sample datasets to get started quickly:

  • Sentiment analysis — 3 classes, 200 examples (positive/negative/neutral)
  • Additional domain-specific samples available

Managing Trained Models

The Trained Models view lists all your classifier models with:

  • Model name and version
  • Training timestamp
  • Number of classes and examples
  • Actions: load, test, delete

Anomaly Detection

Train anomaly detection models using PyOD backends. Useful for fraud detection, system monitoring, quality control, and any scenario where you need to identify outliers.

Backends

12 PyOD backends organized by category:

Category               Backends                           Best for
Fast (Recommended)     ECOD, HBOS, COPOD                  General purpose, parameter-free
Legacy (Well-Tested)   Isolation Forest, LOF, KNN, OCSVM  Traditional ML approaches
Deep Learning          AutoEncoder, VAE                   Complex patterns
Ensemble               SUOD, LSCP                         Combining multiple detectors

Training Flow

  1. Select a backend — ECOD recommended for most cases
  2. Configure features — define feature columns with encoding types and normalization
  3. Add training data — paste text/CSV or use table input mode
  4. Set threshold — contamination ratio (default varies by backend)
  5. Train — model trains and shows results
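
To make step 4 concrete: a contamination ratio is commonly turned into a score threshold by taking the (1 - contamination) percentile of the training scores and flagging anything above it, which is the convention PyOD detectors follow. The sketch below reproduces that idea in plain Python. Whether the Designer applies exactly this rule is an assumption, and the scores are made-up illustration values.

```python
def percentile(values, q):
    """Percentile with linear interpolation between neighbours (q in [0, 100])."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def threshold_from_contamination(scores, contamination):
    """Return the score above which a point is flagged as an outlier."""
    if not 0 < contamination <= 0.5:
        raise ValueError("contamination must be in (0, 0.5]")
    return percentile(scores, 100 * (1 - contamination))


# Hypothetical anomaly scores from a trained model (higher = more unusual).
scores = [0.1, 0.2, 0.15, 0.3, 0.25, 0.2, 0.1, 0.35, 0.3, 5.0]
threshold = threshold_from_contamination(scores, contamination=0.1)
flags = [s > threshold for s in scores]
print(threshold, flags)
```

A higher contamination ratio lowers the threshold and flags more points, so it is best set from an estimate of how many outliers the training data actually contains.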

Feature Configuration

Each feature column supports:

  • Encoding types — numeric, one-hot, label, ordinal, binary, frequency
  • Normalization — standard, min-max, robust, none
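
As a rough illustration of what these options do to a column, the sketch below implements the three normalizations and three of the encodings with the standard library only. The formulas are the conventional definitions (z-score, min-max scaling, median/IQR scaling); the Designer's exact implementation and its handling of edge cases are not documented here, so treat the details as assumptions.

```python
import statistics
from collections import Counter


def standard(values):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mean, std = statistics.fmean(values), statistics.pstdev(values)
    return [(v - mean) / std if std else 0.0 for v in values]


def min_max(values):
    """Rescale to the range [0, 1]."""
    lo, span = min(values), max(values) - min(values)
    return [(v - lo) / span if span else 0.0 for v in values]


def robust(values):
    """Subtract the median, divide by the interquartile range."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [(v - median) / iqr if iqr else 0.0 for v in values]


def one_hot(values):
    """One 0/1 column per category, with categories in sorted order."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def label_encode(values):
    """Map each category to an integer index (sorted order)."""
    index = {c: i for i, c in enumerate(sorted(set(values)))}
    return [index[v] for v in values]


def frequency_encode(values):
    """Replace each category with its relative frequency in the column."""
    counts = Counter(values)
    return [counts[v] / len(values) for v in values]


amounts = [10, 20, 30, 40, 50]
methods = ["card", "cash", "card"]
print(min_max(amounts))
print(robust(amounts))
print(one_hot(methods), label_encode(methods), frequency_encode(methods))
```

Robust scaling is usually the safer choice for anomaly detection inputs, because the median and interquartile range are much less affected by the outliers the model is meant to find than the mean, minimum, or maximum.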

Streaming Anomaly Detection

For real-time monitoring, the streaming mode provides:

  • Mode toggle — switch between batch and streaming detection
  • Status panel — connection status, events processed, anomalies detected
  • Results chart — live visualization of anomaly scores over time
  • Mode panel — configure streaming parameters
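
The difference between batch and streaming detection can be sketched with a small stand-in detector. The example below scores each incoming event against a sliding window of recent values using a rolling z-score, and keeps the same counters the status panel reports (events processed, anomalies detected). It is not a PyOD backend and not the Designer's streaming implementation; the class name, window size, warm-up length, and threshold are all hypothetical.

```python
import statistics
from collections import deque


class StreamingDetector:
    """Rolling z-score detector over a fixed-size window of recent values."""

    def __init__(self, window=20, threshold=3.0, warmup=5):
        self.window = deque(maxlen=window)  # oldest values fall out automatically
        self.threshold = threshold
        self.warmup = warmup  # minimum history before scoring starts
        self.events_processed = 0
        self.anomalies_detected = 0

    def update(self, value):
        """Score one event against the current window, then add it to the window."""
        self.events_processed += 1
        score = 0.0
        if len(self.window) >= self.warmup:
            mean = statistics.fmean(self.window)
            std = statistics.pstdev(self.window)
            # A constant window has zero spread; the score stays 0.0 in that case.
            score = abs(value - mean) / std if std else 0.0
        is_anomaly = score > self.threshold
        if is_anomaly:
            self.anomalies_detected += 1
        self.window.append(value)
        return score, is_anomaly


detector = StreamingDetector(window=10, threshold=3.0, warmup=5)
stream = [10.0, 10.5, 9.8, 10.2, 10.1, 9.9, 10.3, 42.0, 10.0, 10.2]
results = [detector.update(v) for v in stream]
print(detector.events_processed, detector.anomalies_detected)
```

Unlike batch mode, the detector never sees the full dataset: each score depends only on the events that arrived before it, which is why a warm-up period is needed before scores are meaningful.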

API Routes

Action                  Method  Route
Train classifier        POST    /v1/ml/classifier/fit
Predict (classifier)    POST    /v1/ml/classifier/predict
List classifier models  GET     /v1/ml/classifier/models
Train anomaly model     POST    /v1/ml/anomaly/fit
Score anomaly           POST    /v1/ml/anomaly/score
List anomaly models     GET     /v1/ml/anomaly/models
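
These routes can be called from a script as well as from the Designer. The sketch below builds the HTTP requests with the standard library and does not send them. The methods and paths come from the table above; the base URL, the action keys, and every payload field name (`model`, `texts`) are assumptions for illustration, since the request schemas are not described on this page.

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000"  # assumption: replace with your runtime's address

# Methods and paths as listed in the API Routes table.
ROUTES = {
    "train_classifier": ("POST", "/v1/ml/classifier/fit"),
    "predict_classifier": ("POST", "/v1/ml/classifier/predict"),
    "list_classifiers": ("GET", "/v1/ml/classifier/models"),
    "train_anomaly": ("POST", "/v1/ml/anomaly/fit"),
    "score_anomaly": ("POST", "/v1/ml/anomaly/score"),
    "list_anomaly_models": ("GET", "/v1/ml/anomaly/models"),
}


def build_request(action, payload=None, base_url=BASE_URL):
    """Build (but do not send) a JSON request for one of the routes above."""
    method, path = ROUTES[action]
    headers = {"Accept": "application/json"}
    data = None
    if method == "POST":
        data = json.dumps(payload or {}).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return request.Request(base_url + path, data=data, headers=headers, method=method)


# Hypothetical payload: the field names are placeholders, not a documented schema.
predict_payload = {"model": "my-classifier", "texts": ["Where is my refund?"]}
predict_req = build_request("predict_classifier", predict_payload)
list_req = build_request("list_classifiers")
print(predict_req.get_method(), predict_req.full_url)

# To send a request against a running instance:
#     with request.urlopen(predict_req) as resp:
#         body = json.load(resp)
```

Keeping the route table in one dictionary means a change to a path or method only has to be made in one place.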

Route

/chat/models/train/classifier/new
/chat/models/train/classifier/:id
/chat/models/train/anomaly/new
/chat/models/train/anomaly/:id