# AI Plugin
Client-side AI plugin providing Text Inference (LLM) capabilities. All processing runs locally on-device without server dependencies.
## Features
- **TextInference**: local LLM chat with streaming responses
- **AiModelManager**: model download, caching, and registry management
- **Cross-platform**: Desktop (macOS/Windows/Linux), Browser (WASM), Mobile (iOS/Android)
- **Automatic model downloads** from HuggingFace
## Quick Start
```qml
import QtQuick.Controls  // for Button
import Clayground.Ai

TextInference {
    id: llm
    modelId: "smollm2-1.7b"
    systemPrompt: "You are a helpful assistant."
    onToken: (tok) => console.log(tok)
    onResponse: (full) => console.log("Done:", full)
}

Button {
    text: "Ask"
    onClicked: llm.send("Hello, what can you do?")
}
```
## Components
### TextInference

Local LLM text generation with automatic model management.

**Properties:**

- `modelId`: model to use (triggers auto-download)
- `systemPrompt`: system prompt for the conversation
- `maxTokens`: maximum tokens per response
- `temperature`: sampling temperature (0.0-2.0)
- `modelReady`: whether the model is loaded
- `generating`: whether generation is in progress
- `downloading`: whether the model is downloading
- `downloadProgress`: download progress (0.0-1.0)

**Methods:**

- `send(message)`: send a user message
- `stop()`: stop generation
- `clear()`: clear the conversation
- `unload()`: unload the model

**Signals:**

- `token(string)`: emitted per token (streaming)
- `response(string)`: emitted when the response is complete
- `error(string)`: emitted on error
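The state properties above can drive a small status UI. Below is a minimal sketch using only the documented properties; the model id and UI layout are illustrative, not prescribed:

```qml
import QtQuick
import QtQuick.Controls
import Clayground.Ai

Column {
    spacing: 4

    TextInference {
        id: llm
        modelId: "smollm2-360m"
    }

    // downloadProgress is only meaningful while `downloading` is true
    ProgressBar {
        visible: llm.downloading
        value: llm.downloadProgress  // 0.0-1.0
    }

    Label {
        text: llm.modelReady ? "Ready"
              : llm.downloading ? "Downloading…"
              : "Loading…"
    }

    Button {
        text: llm.generating ? "Stop" : "Send"
        onClicked: llm.generating ? llm.stop()
                                  : llm.send("Hi!")
    }
}
```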
### AiModelManager

Manages model downloads and caching.

**Properties:**

- `registryUrl`: custom model registry URL
- `hasWebGPU`: WebGPU availability (browser only)
- `platform`: current platform
- `activeDownloads`: in-progress downloads

**Methods:**

- `isAvailable(modelId)`: check whether a model is cached
- `modelInfo(modelId)`: get model metadata
- `availableModels(type)`: list models ("llm", "stt", "tts")
- `download(modelId)`: start a download
- `cancelDownload(modelId)`: cancel a download
- `checkMemory(modelId)`: check memory requirements
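A pre-flight check before downloading might look like the sketch below. It assumes `AiModelManager` is exposed as a QML singleton (if it is instantiable instead, declare an instance with an `id`), and that `checkMemory` returns true when the device has enough memory:

```qml
import QtQuick
import Clayground.Ai

Item {
    Component.onCompleted: {
        const modelId = "smollm2-1.7b"
        if (AiModelManager.isAvailable(modelId)) {
            console.log("Already cached:", modelId)
        } else if (AiModelManager.checkMemory(modelId)) {
            AiModelManager.download(modelId)
        } else {
            console.log("Not enough memory for", modelId)
        }
    }

    // Track download lifecycle via the documented signals
    Connections {
        target: AiModelManager
        function onDownloadProgress(modelId, progress, bytesDownloaded, totalBytes) {
            console.log(modelId, Math.round(progress * 100) + "%")
        }
        function onDownloadComplete(modelId) { console.log("Done:", modelId) }
        function onDownloadError(modelId, message) { console.warn(modelId, message) }
    }
}
```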
## Available Models

| Model | Size | Platform | Use Case |
|-------|------|----------|----------|
| smollm2-1.7b | ~1 GB | Desktop, WebGPU | Best quality for size |
| smollm2-360m | ~230 MB | All | Lightweight, fast |
| qwen2.5-1.5b | ~986 MB | Desktop, WebGPU | Better reasoning |
| llama3.2-1b | ~776 MB | All | Meta optimized |
## Desktop (macOS)

- Uses llama.cpp with Metal acceleration
- Models cached in `~/.cache/clayground_ai/models/`
## Browser (WASM)

- Uses wllama (llama.cpp WASM binding)
- Models cached in IndexedDB
- WebGPU is auto-detected for faster inference
## Mobile

- CPU inference only
- Use smaller models (e.g. smollm2-360m) for better performance
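Putting the platform notes together, model choice can be made at runtime. A sketch, assuming `AiModelManager` is available as a singleton and `platform`/`hasWebGPU` behave as documented above:

```qml
import QtQuick
import Clayground.Ai

TextInference {
    id: llm
    // Mobile and non-WebGPU browsers run CPU-only inference,
    // so prefer the lightweight model there.
    modelId: {
        const p = AiModelManager.platform
        if (p === "ios" || p === "android")
            return "smollm2-360m"
        if (p === "wasm" && !AiModelManager.hasWebGPU)
            return "smollm2-360m"
        return "smollm2-1.7b"
    }
}
```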
## Future Ideas

- **TextToSpeech**: client-side TTS using sherpa-onnx
- **SpeechToText**: client-side STT using whisper.cpp
## API Reference
### AiModelManager

Manages AI model downloads, caching, and the model registry.
#### Properties

| Name | Type | Description |
|------|------|-------------|
| `activeDownloads` (readonly) | list | List of currently active downloads |
| `hasWebGPU` (readonly) | bool | Whether WebGPU is available (browser only) |
| `platform` (readonly) | string | Current platform: `"desktop"`, `"wasm"`, `"ios"`, or `"android"` |
| `registryReady` (readonly) | bool | Whether the model registry has been loaded |
| `registryUrl` | url | Custom model registry URL |
#### Methods

| Method | Returns | Description |
|--------|---------|-------------|
| `availableModels(string type)` | list | List registry models of the given type (`"llm"`, `"stt"`, `"tts"`) |
| `cachedModels()` | list | List locally cached models |
| `cancelDownload(string modelId)` | void | Cancel an in-progress download |
| `checkMemory(string modelId)` | bool | Check memory requirements for the model |
| `download(string modelId)` | void | Start downloading the model |
| `isAvailable(string modelId)` | bool | Whether the model is already cached |
| `modelInfo(string modelId)` | object | Get model metadata |
| `modelPath(string modelId)` | string | Local path of the cached model |
| `refreshRegistry()` | void | Re-fetch the model registry |
| `remove(string modelId)` | void | Delete the cached model |
#### Signals

| Signal | Description |
|--------|-------------|
| `downloadCancelled(string modelId)` | Emitted when a download is cancelled |
| `downloadComplete(string modelId)` | Emitted when a download finishes |
| `downloadError(string modelId, string message)` | Emitted when a download fails |
| `downloadProgress(string modelId, real progress, int bytesDownloaded, int totalBytes)` | Emitted as download progress updates |
| `downloadStarted(string modelId, int totalBytes)` | Emitted when a download starts |
| `registryUpdated()` | Emitted when the model registry has been loaded |
### AiModelManagerBackend

C++ backend for downloading and managing AI models.
#### Properties

| Name | Type | Description |
|------|------|-------------|
| `activeDownloads` (readonly) | list | List of currently active downloads |
| `hasWebGPU` (readonly) | bool | Whether WebGPU is available (browser only) |
| `platform` (readonly) | string | Current platform identifier |
| `registryReady` (readonly) | bool | Whether the model registry has been loaded |
| `registryUrl` | url | Custom model registry URL |
#### Methods

| Method | Returns | Description |
|--------|---------|-------------|
| `availableModels(string type)` | list | List registry models of the given type (`"llm"`, `"stt"`, `"tts"`) |
| `cachedModels()` | list | List locally cached models |
| `cancelDownload(string modelId)` | void | Cancel an in-progress download |
| `checkMemory(string modelId)` | bool | Check memory requirements for the model |
| `download(string modelId)` | void | Start downloading the model |
| `isAvailable(string modelId)` | bool | Whether the model is already cached |
| `modelInfo(string modelId)` | object | Get model metadata |
| `modelPath(string modelId)` | string | Local path of the cached model |
| `refreshRegistry()` | void | Re-fetch the model registry |
| `remove(string modelId)` | void | Delete the cached model |
#### Signals

| Signal | Description |
|--------|-------------|
| `downloadCancelled(string modelId)` | Emitted when a download is cancelled |
| `downloadComplete(string modelId)` | Emitted when a download finishes |
| `downloadError(string modelId, string message)` | Emitted when a download fails |
| `downloadProgress(string modelId, real progress, int bytesDownloaded, int totalBytes)` | Emitted as download progress updates |
| `downloadStarted(string modelId, int totalBytes)` | Emitted when a download starts |
| `registryUpdated()` | Emitted when the model registry has been loaded |
### LlmEngineBackend

C++ backend for local LLM inference using llama.cpp.
#### Properties

| Name | Type | Description |
|------|------|-------------|
| `currentResponse` (readonly) | string | Response text accumulated so far during generation |
| `generating` (readonly) | bool | Whether text generation is in progress |
| `loadProgress` (readonly) | real | Model loading progress (0.0 to 1.0) |
| `maxTokens` | int | Maximum number of tokens to generate per response |
| `modelLoading` (readonly) | bool | Whether the model is currently being loaded |
| `modelPath` | string | Path to the GGUF model file |
| `modelReady` (readonly) | bool | Whether the model is loaded and ready for inference |
| `systemPrompt` | string | System prompt prepended to every conversation |
| `temperature` | real | Sampling temperature (0.0 to 2.0) |
#### Methods

| Method | Returns | Description |
|--------|---------|-------------|
| `clear()` | void | Clear the conversation |
| `send(string message)` | void | Send a user message and start generation |
| `stop()` | void | Stop generation |
| `unload()` | void | Unload the model |
#### Signals

| Signal | Description |
|--------|-------------|
| `error(string message)` | Emitted on error |
| `response(string fullText)` | Emitted with the full text when generation completes |
| `token(string token)` | Emitted for each generated token (streaming) |
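Since the backend takes a file path rather than a registry id, it can in principle be wired up directly. A sketch based only on the properties and signals listed above (`TextInference` remains the intended high-level entry point, and `AiModelManager.modelPath` is assumed to return the cached GGUF location):

```qml
import QtQuick
import Clayground.Ai

LlmEngineBackend {
    id: engine
    modelPath: AiModelManager.modelPath("smollm2-360m")  // path to a cached GGUF file
    systemPrompt: "You are concise."
    maxTokens: 256
    temperature: 0.7

    onToken: (token) => console.log(token)
    onResponse: (fullText) => console.log("Full:", fullText)
    onError: (message) => console.warn(message)

    // Wait until the model is loaded before sending
    onModelReadyChanged: if (modelReady) send("Hello")
}
```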
### Sandbox

Test sandbox for AI plugin components.
### TextInference

Client-side LLM text generation.
#### Properties

| Name | Type | Description |
|------|------|-------------|
| `currentResponse` (readonly) | string | Current response being generated |
| `downloadProgress` (readonly) | real | Download progress (0.0 to 1.0) |
| `downloadedBytes` (readonly) | int | Bytes downloaded so far |
| `downloading` (readonly) | bool | Whether the model is being downloaded |
| `generating` (readonly) | bool | Whether text generation is in progress |
| `loadProgress` (readonly) | real | Model loading progress (0.0 to 1.0) |
| `maxTokens` | int | Maximum tokens to generate per response |
| `modelId` | string | Model to use for inference |
| `modelLoading` (readonly) | bool | Whether the model is being loaded into memory |
| `modelReady` (readonly) | bool | Whether the model is loaded and ready for inference |
| `noModel` (readonly) | string | Special value assigned to `modelId` to cancel a download or unload the model |
| `systemPrompt` | string | System prompt for the conversation |
| `temperature` | real | Sampling temperature (0.0 to 2.0) |
| `totalBytes` (readonly) | int | Total bytes to download |
#### Methods

| Method | Returns | Description |
|--------|---------|-------------|
| `clear()` | void | Clear the conversation |
| `send(string message)` | void | Send a user message and start generation |
| `stop()` | void | Stop generation |
| `unload()` | void | Unload the model |
#### Signals

| Signal | Description |
|--------|-------------|
| `downloadCancelled()` | Emitted when the model download is cancelled |
| `downloadStarted(int totalBytes)` | Emitted when the model download starts |
| `error(string message)` | Emitted on error |
| `modelDownloaded()` | Emitted when the model download completes |
| `modelReadySignal()` | Emitted when the model is loaded and ready |
| `response(string fullText)` | Emitted with the full response when generation completes |
| `token(string token)` | Emitted for each generated token (streaming) |
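Per the `noModel` property above, assigning it to `modelId` cancels an in-progress download or unloads the current model. A sketch of a cancel button built on that (the surrounding layout is illustrative):

```qml
import QtQuick
import QtQuick.Controls
import Clayground.Ai

Row {
    TextInference { id: llm; modelId: "qwen2.5-1.5b" }

    Button {
        text: "Cancel"
        visible: llm.downloading
        // Assigning the sentinel value cancels the download
        onClicked: llm.modelId = llm.noModel
    }
}
```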