browser-onnx

Implements high-performance local machine learning inference in the browser using ONNX Runtime Web. Use this skill when the user needs privacy-first, low-latency, or offline AI capabilities (e.g., image classification, object detection, or NLP) without server-side processing.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "browser-onnx" with this command: npx skills add thongnt0208/browser-onnx-skills/thongnt0208-browser-onnx-skills-browser-onnx

Browser-Based ONNX Inference

This skill provides a comprehensive workflow for executing ONNX models locally in the browser using ONNX Runtime Web (ORT-Web). Local inference offers significant advantages: data stays private, server costs drop, and capacity scales naturally because each user supplies their own compute.

1. Setup and Installation

Install the required library via npm:

npm install onnxruntime-web

Note: For experimental features like WebGPU or WebNN, use the nightly version onnxruntime-web@dev.

2. Global Environment Configuration

Set global ort.env flags before creating a session to optimize the runtime environment.

  • WebAssembly (CPU): Enable multi-threading by setting ort.env.wasm.numThreads (default is half of hardware concurrency) and use a Proxy Worker (ort.env.wasm.proxy = true) to keep the UI responsive.
  • WASM Paths: If binaries are not in the same directory as the JS bundle, manually override paths using ort.env.wasm.wasmPaths to point to local assets or a CDN.
  • WebGPU (GPU): Use ort.env.webgpu.profiling = { mode: 'default' } for performance diagnosis during development.
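The flags above can be set once at startup, before the first session is created. A minimal configuration sketch (the thread cap and the `/assets/ort/` path are illustrative assumptions, not required values):

```javascript
import * as ort from 'onnxruntime-web';

// WASM (CPU) tuning: cap threads and move inference off the main thread.
ort.env.wasm.numThreads = Math.min(4, navigator.hardwareConcurrency || 1);
ort.env.wasm.proxy = true; // run the WASM backend in a Proxy Worker

// Serve the .wasm binaries from a known location (local assets or a CDN).
ort.env.wasm.wasmPaths = '/assets/ort/';

// WebGPU profiling, for development diagnostics only.
ort.env.webgpu.profiling = { mode: 'default' };
```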

3. Creating an Inference Session

Initialize the session by choosing the appropriate Execution Provider (EP):

import * as ort from 'onnxruntime-web';

const session = await ort.InferenceSession.create('./model.onnx', {
  executionProviders: ['webgpu', 'wasm'], // Prioritize GPU, fallback to CPU
  graphOptimizationLevel: 'all' // Enable all graph-level optimizations
});

4. Data Preprocessing

Input data must match the model's training format (e.g., NCHW for vision models).

  • Image-to-Tensor: Use libraries like JIMP or OpenCV.js to resize, normalize (divide by 255.0), and convert RGBA to RGB.
  • Tensor Creation: Use new ort.Tensor('float32', float32Data, dims) (e.g., dims = [1, 3, 224, 224] for a single NCHW image) to prepare the input feeds.
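As a concrete sketch of the image-to-tensor step (the function name is illustrative, and only the divide-by-255.0 normalization from above is applied; some models additionally expect mean/std normalization):

```javascript
// Convert RGBA pixel data (e.g. from CanvasRenderingContext2D.getImageData)
// into a normalized NCHW Float32Array for a [1, 3, height, width] input.
function rgbaToNchw(rgba, width, height) {
  const plane = width * height;
  const out = new Float32Array(3 * plane);
  for (let i = 0; i < plane; i++) {
    out[i]             = rgba[i * 4]     / 255.0; // R plane
    out[plane + i]     = rgba[i * 4 + 1] / 255.0; // G plane
    out[2 * plane + i] = rgba[i * 4 + 2] / 255.0; // B plane
    // the alpha channel (rgba[i * 4 + 3]) is dropped
  }
  return out;
}
```

The result can then be wrapped with new ort.Tensor('float32', data, [1, 3, height, width]) and passed to session.run() as part of the input feeds.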

5. Optimized Inference Patterns

  • Graph Capture: For models with static shapes on WebGPU, enable enableGraphCapture: true to reduce CPU overhead by replaying kernel executions.
  • IO Binding: For transformer models, keep data on the GPU by using ort.Tensor.fromGpuBuffer() and setting preferredOutputLocation: 'gpu-buffer' to avoid expensive memory copies.
  • Quantization: Prefer uint8 quantized models for CPU (WASM) inference to improve performance; avoid float16 on CPU as it lacks native support and is slow.
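The WebGPU-specific options above combine into a session configuration sketch (assuming a WebGPU-capable browser and a model with fully static input shapes; option names follow the ORT-Web session options):

```javascript
const options = {
  executionProviders: ['webgpu'],
  graphOptimizationLevel: 'all',
  enableGraphCapture: true,             // static shapes only
  preferredOutputLocation: 'gpu-buffer' // keep outputs on the GPU
};
```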

6. Large Model Handling (>2GB)

  • Platform Limits: Browsers like Chrome limit ArrayBuffer to ~2GB. Models exceeding this must be exported with external data.
  • Loading External Data: Explicitly link external weight files in the session options:
    const session = await ort.InferenceSession.create(modelUrl, {
      externalData: [{ path: './model.data', data: dataUrl }]
    });
    

7. Common Edge Cases

  • Memory Management: Explicitly call tensor.dispose() for GPU tensors to prevent memory leaks.
  • Zero-Sized Tensors: ORT-Web treats tensors with a dimension of 0 as CPU tensors regardless of the selected EP.
  • Thermal Throttling: Sustained inference on mobile devices can trigger frequency scaling, which may sharply increase latency over time. Prefer lightweight "tiny" model variants to stay within the device's thermal budget.

8. Examples

Multilingual Translation

Offload heavy translation tasks to a separate Web Worker using a singleton pattern to ensure the model (e.g., NLLB-200) loads only once.
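The singleton idea can be sketched independently of the model; createSession here is a hypothetical stand-in for the actual ort.InferenceSession.create call made inside the worker:

```javascript
// Lazily create the session once and share the same promise across callers,
// so concurrent requests never trigger a second model download.
let sessionPromise = null;

function getSession(createSession) {
  if (sessionPromise === null) {
    sessionPromise = createSession();
  }
  return sessionPromise;
}
```

Inside the worker, createSession would be something like () => ort.InferenceSession.create(modelUrl, options); every message handler awaits getSession() instead of creating its own session.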

Object Detection (YOLO)

Implement Non-Max Suppression (NMS). If the browser lacks support for specific NMS ops, run a separate NMS ONNX model to filter overlapping boxes locally.
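A pure-JavaScript fallback is also possible when neither the model nor the runtime provides the NMS op. A minimal greedy sketch, with boxes as [x1, y1, x2, y2] and an illustrative 0.5 IoU threshold:

```javascript
// Intersection-over-union of two axis-aligned boxes [x1, y1, x2, y2].
function iou(a, b) {
  const x1 = Math.max(a[0], b[0]), y1 = Math.max(a[1], b[1]);
  const x2 = Math.min(a[2], b[2]), y2 = Math.min(a[3], b[3]);
  const inter = Math.max(0, x2 - x1) * Math.max(0, y2 - y1);
  const areaA = (a[2] - a[0]) * (a[3] - a[1]);
  const areaB = (b[2] - b[0]) * (b[3] - b[1]);
  return inter / (areaA + areaB - inter);
}

// Greedy non-max suppression: visit boxes in descending score order,
// keeping each box only if it does not overlap an already-kept box.
function nms(boxes, scores, iouThreshold = 0.5) {
  const order = scores.map((s, i) => i).sort((i, j) => scores[j] - scores[i]);
  const keep = [];
  for (const i of order) {
    if (keep.every(k => iou(boxes[i], boxes[k]) <= iouThreshold)) keep.push(i);
  }
  return keep; // indices of surviving boxes
}
```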

