Ai Plugin

Client-side AI plugin providing Text Inference (LLM) capabilities. All processing runs locally on-device without server dependencies.

Features

TextInference: Local LLM chat with streaming responses
AiModelManager: Model download, caching, and registry management
Cross-platform: Desktop (macOS/Windows/Linux), Browser (WASM), Mobile (iOS/Android)
Automatic model downloads from HuggingFace

Quick Start

import Clayground.Ai

TextInference {
    id: llm
    modelId: "smollm2-1.7b"
    systemPrompt: "You are a helpful assistant."

    onToken: (tok) => console.log(tok)
    onResponse: (full) => console.log("Done:", full)
}

Button {
    text: "Ask"
    onClicked: llm.send("Hello, what can you do?")
}

Components

TextInference

Local LLM text generation with automatic model management.

Properties:

modelId: Model to use (triggers auto-download)
systemPrompt: System prompt for conversation
maxTokens: Maximum tokens per response
temperature: Sampling temperature (0.0-2.0)
modelReady: Whether model is loaded
generating: Whether generation is in progress
downloading: Whether model is downloading
downloadProgress: Download progress (0.0-1.0)

Methods:

send(message): Send user message
stop(): Stop generation
clear(): Clear conversation
unload(): Unload model

Signals:

token(string): Emitted per token (streaming)
response(string): Emitted when complete
error(string): Emitted on error

AiModelManager

Manages model downloads and caching.

Properties:

registryUrl: Custom model registry URL
hasWebGPU: WebGPU availability (browser)
platform: Current platform
activeDownloads: In-progress downloads

Methods:

isAvailable(modelId): Check if cached
modelInfo(modelId): Get model metadata
availableModels(type): List models (“llm”, “stt”, “tts”)
download(modelId): Start download
cancelDownload(modelId): Cancel download
checkMemory(modelId): Check memory requirements

Available Models

Model	Size	Platform	Use Case
smollm2-1.7b	~1 GB	Desktop, WebGPU	Best quality for size
smollm2-360m	~230 MB	All	Lightweight, fast
qwen2.5-1.5b	~986 MB	Desktop, WebGPU	Better reasoning
llama3.2-1b	~776 MB	All	Meta optimized

Platform Notes

Desktop (macOS)

Uses llama.cpp with Metal acceleration
Models cached in ~/.cache/clayground_ai/models/

Browser (WASM)

Uses wllama (llama.cpp WASM binding)
Models cached in IndexedDB
WebGPU auto-detected for faster inference

Mobile

CPU inference only
Use smaller models (smollm2-360m) for better performance

Future Ideas

TextToSpeech: Client-side TTS using sherpa-onnx
SpeechToText: Client-side STT using whisper.cpp