# AI Plugin
Client-side AI plugin providing Text Inference (LLM) capabilities. All processing runs locally on-device without server dependencies.
## Features
- TextInference: Local LLM chat with streaming responses
- AiModelManager: Model download, caching, and registry management
- Cross-platform: Desktop (macOS/Windows/Linux), Browser (WASM), Mobile (iOS/Android)
- Automatic model downloads from Hugging Face
## Quick Start

```qml
import Clayground.Ai

TextInference {
    id: llm
    modelId: "smollm2-1.7b"
    systemPrompt: "You are a helpful assistant."
    onToken: (tok) => console.log(tok)
    onResponse: (full) => console.log("Done:", full)
}

Button {
    text: "Ask"
    onClicked: llm.send("Hello, what can you do?")
}
```
## Components

### TextInference
Local LLM text generation with automatic model management.
Properties:

- `modelId`: Model to use (triggers auto-download)
- `systemPrompt`: System prompt for the conversation
- `maxTokens`: Maximum tokens per response
- `temperature`: Sampling temperature (0.0-2.0)
- `modelReady`: Whether the model is loaded
- `generating`: Whether generation is in progress
- `downloading`: Whether a model download is in progress
- `downloadProgress`: Download progress (0.0-1.0)
Methods:

- `send(message)`: Send a user message
- `stop()`: Stop generation
- `clear()`: Clear the conversation
- `unload()`: Unload the model
Signals:

- `token(string)`: Emitted per token (streaming)
- `response(string)`: Emitted when the response is complete
- `error(string)`: Emitted on error
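Putting these together, here is a sketch of a small streaming chat UI that uses only the properties, methods, and signals documented above (the widget ids, layout, and prompt text are illustrative):

```qml
import QtQuick
import QtQuick.Controls
import Clayground.Ai

Column {
    width: 400

    TextInference {
        id: llm
        modelId: "smollm2-360m"
        systemPrompt: "Answer in one sentence."
        maxTokens: 128
        temperature: 0.7
        onToken: (tok) => output.text += tok          // append each streamed token
        onResponse: (full) => console.log("Done:", full)
        onError: (msg) => console.warn("LLM error:", msg)
    }

    // Visible only while the model file is being fetched
    ProgressBar {
        visible: llm.downloading
        value: llm.downloadProgress
    }

    Text { id: output; width: parent.width; wrapMode: Text.Wrap }

    Row {
        Button {
            text: "Ask"
            enabled: llm.modelReady && !llm.generating
            onClicked: { output.text = ""; llm.send("Summarize QML in one sentence.") }
        }
        Button {
            text: "Stop"
            enabled: llm.generating
            onClicked: llm.stop()
        }
    }
}
```

Binding `enabled` to `modelReady` and `generating` keeps the UI consistent while the model downloads or a response is streaming.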
### AiModelManager
Manages model downloads and caching.
Properties:

- `registryUrl`: Custom model registry URL
- `hasWebGPU`: WebGPU availability (browser)
- `platform`: Current platform
- `activeDownloads`: In-progress downloads
Methods:

- `isAvailable(modelId)`: Check whether a model is cached
- `modelInfo(modelId)`: Get model metadata
- `availableModels(type)`: List models of a type (`"llm"`, `"stt"`, `"tts"`)
- `download(modelId)`: Start a download
- `cancelDownload(modelId)`: Cancel a download
- `checkMemory(modelId)`: Check memory requirements
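A pre-download check can be sketched from the methods above. This assumes `AiModelManager` is exposed as a QML singleton and that `checkMemory` returns a boolean; verify both against the actual API:

```qml
import QtQuick
import Clayground.Ai

Item {
    Component.onCompleted: {
        const id = "qwen2.5-1.5b"
        // Only download if not already cached and memory permits
        if (!AiModelManager.isAvailable(id)) {
            if (AiModelManager.checkMemory(id))
                AiModelManager.download(id)
            else
                console.warn("Insufficient memory for", id)
        }
        console.log("Known LLM models:", AiModelManager.availableModels("llm"))
    }
}
```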
## Available Models
| Model | Size | Platform | Use Case |
|---|---|---|---|
| smollm2-1.7b | ~1 GB | Desktop, WebGPU | Best quality for size |
| smollm2-360m | ~230 MB | All | Lightweight, fast |
| qwen2.5-1.5b | ~986 MB | Desktop, WebGPU | Better reasoning |
| llama3.2-1b | ~776 MB | All | Meta optimized |
## Platform Notes

### Desktop (macOS)
- Uses llama.cpp with Metal acceleration
- Models cached in `~/.cache/clayground_ai/models/`
### Browser (WASM)
- Uses wllama (llama.cpp WASM binding)
- Models cached in IndexedDB
- WebGPU auto-detected for faster inference
### Mobile
- CPU inference only
- Use smaller models (smollm2-360m) for better performance
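A single code base can follow this advice by choosing the model per platform. The sketch below assumes `AiModelManager.platform` reports lowercase names such as `"android"` and `"ios"`; check the actual strings on your targets:

```qml
import Clayground.Ai

TextInference {
    id: llm
    // Prefer the lightweight model on mobile, the larger one elsewhere
    // (platform string values are assumptions, not confirmed by this README)
    modelId: (AiModelManager.platform === "android" || AiModelManager.platform === "ios")
             ? "smollm2-360m"
             : "smollm2-1.7b"
}
```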
## Future Ideas
- TextToSpeech: Client-side TTS using sherpa-onnx
- SpeechToText: Client-side STT using whisper.cpp