Gemma Gem Browser AI

Skill by ara.so — Daily 2026 Skills collection.

Gemma Gem is a Chrome extension that runs Google's Gemma 4 model entirely on-device via WebGPU. It injects a chat overlay into every page and exposes a tool-calling agent loop that can read pages, click elements, fill forms, execute JavaScript, and take screenshots — all without sending data to any server.

Architecture Overview

Offscreen Document          Service Worker           Content Script
(Gemma 4 + Agent Loop)  <-> (Message Router)    <-> (Chat UI + DOM Tools)
       |                         |
  WebGPU inference          Screenshot capture
  Token streaming           JS execution

Offscreen document (offscreen/): Loads the ONNX model via @huggingface/transformers, runs the agent loop, streams tokens.
Service worker (background/): Routes messages, handles take_screenshot and run_javascript.
Content script (content/): Injects shadow DOM chat UI, executes DOM tools.
agent/: Zero-dependency module defining ModelBackend and ToolExecutor interfaces — extractable as a standalone library.

Install & Build

# Prerequisites: Node.js 18+, pnpm
pnpm install

# Development build (logging active, source maps)
pnpm build

# Production build (errors only, minified)
pnpm build:prod

Load the extension:

Open chrome://extensions
Enable Developer mode
Click Load unpacked → select .output/chrome-mv3-dev/

Model download happens automatically on first chat open:

onnx-community/gemma-4-E2B-it-ONNX — ~500 MB (default)
onnx-community/gemma-4-E4B-it-ONNX — ~1.5 GB

Models are cached in the browser's cache storage after the first run.

Key Interfaces (`agent/`)

ModelBackend

// agent/types.ts
export interface ModelBackend {
  generate(
    messages: ChatMessage[],
    tools: ToolDefinition[],
    options: GenerateOptions
  ): AsyncGenerator<StreamChunk>;
}

export interface ToolDefinition {
  name: string;
  description: string;
  parameters: JSONSchema;
}

export interface GenerateOptions {
  maxNewTokens?: number;
  thinking?: boolean;
}

ToolExecutor

// agent/types.ts
export interface ToolExecutor {
  execute(toolName: string, args: Record<string, unknown>): Promise<unknown>;
}

Agent Loop

// agent/loop.ts — simplified illustration
export async function* runAgentLoop(
  userMessage: string,
  history: ChatMessage[],
  model: ModelBackend,
  tools: ToolExecutor,
  toolDefs: ToolDefinition[],
  maxIterations: number
): AsyncGenerator<AgentEvent> {
  const messages = [...history, { role: "user", content: userMessage }];

  for (let i = 0; i < maxIterations; i++) {
    for await (const chunk of model.generate(messages, toolDefs, {})) {
      if (chunk.type === "token") yield { type: "token", token: chunk.token };
      if (chunk.type === "tool_call") {
        yield { type: "tool_start", name: chunk.name };
        const result = await tools.execute(chunk.name, chunk.args);
        yield { type: "tool_result", name: chunk.name, result };
        messages.push({ role: "tool", name: chunk.name, content: String(result) });
      }
      if (chunk.type === "done") return;
    }
  }
}

Built-in Tools

Tool	Location	Description
`read_page_content`	Content script	Read page text/HTML or a CSS selector
`take_screenshot`	Service worker	Capture visible tab as PNG
`click_element`	Content script	Click by CSS selector
`type_text`	Content script	Type into input by CSS selector
`scroll_page`	Content script	Scroll by pixel amount
`run_javascript`	Service worker	Execute JS in page context

Adding a New Tool

Tools live in two places: the definition (in the offscreen agent) and the executor (in content script or service worker).

Step 1 — Define the tool schema

// offscreen/tools/definitions.ts
export const MY_TOOL_DEFINITION: ToolDefinition = {
  name: "get_page_title",
  description: "Returns the document title of the current page.",
  parameters: {
    type: "object",
    properties: {},
    required: [],
  },
};

Step 2 — Register in the tool list

// offscreen/tools/index.ts
import { MY_TOOL_DEFINITION } from "./definitions";

export const ALL_TOOLS: ToolDefinition[] = [
  // ...existing tools
  MY_TOOL_DEFINITION,
];

Step 3 — Implement execution in the content script

// content/tools/executor.ts
export async function executeContentTool(
  name: string,
  args: Record<string, unknown>
): Promise<unknown> {
  switch (name) {
    case "get_page_title":
      return document.title;

    case "read_page_content": {
      const selector = args.selector as string | undefined;
      if (selector) {
        return document.querySelector(selector)?.textContent ?? "Not found";
      }
      return document.body.innerText;
    }

    case "click_element": {
      const el = document.querySelector(args.selector as string) as HTMLElement;
      if (!el) throw new Error(`Element not found: ${args.selector}`);
      el.click();
      return "clicked";
    }

    case "type_text": {
      const input = document.querySelector(args.selector as string) as HTMLInputElement;
      if (!input) throw new Error(`Input not found: ${args.selector}`);
      input.focus();
      input.value = args.text as string;
      input.dispatchEvent(new Event("input", { bubbles: true }));
      input.dispatchEvent(new Event("change", { bubbles: true }));
      return "typed";
    }

    default:
      throw new Error(`Unknown content tool: ${name}`);
  }
}

Step 4 — Handle service-worker-side tools

// background/tools.ts
export async function executeSwTool(
  name: string,
  args: Record<string, unknown>,
  tabId: number
): Promise<unknown> {
  switch (name) {
    case "take_screenshot": {
      const dataUrl = await chrome.tabs.captureVisibleTab({ format: "png" });
      return dataUrl;
    }

    case "run_javascript": {
      const results = await chrome.scripting.executeScript({
        target: { tabId },
        func: new Function(args.code as string) as () => unknown,
      });
      return results[0]?.result ?? null;
    }

    default:
      return null; // not a SW tool — forward to content script
  }
}

Message Routing Pattern

The service worker acts as a message bus. All communication uses chrome.runtime.sendMessage.

// Message types (shared/messages.ts)
export type ExtMessage =
  | { type: "TOOL_CALL"; name: string; args: Record<string, unknown>; tabId: number }
  | { type: "TOOL_RESULT"; name: string; result: unknown }
  | { type: "TOKEN"; token: string }
  | { type: "AGENT_DONE" }
  | { type: "AGENT_ERROR"; error: string };

// Offscreen → SW
chrome.runtime.sendMessage<ExtMessage>({
  type: "TOOL_CALL",
  name: "click_element",
  args: { selector: "#submit-btn" },
  tabId: currentTabId,
});

// SW → Content script
chrome.tabs.sendMessage<ExtMessage>(tabId, {
  type: "TOOL_CALL",
  name: "click_element",
  args: { selector: "#submit-btn" },
  tabId,
});

Model Configuration

// offscreen/model.ts — loading with transformers.js
import { pipeline, TextGenerationPipeline } from "@huggingface/transformers";

const MODEL_IDS = {
  E2B: "onnx-community/gemma-4-E2B-it-ONNX",
  E4B: "onnx-community/gemma-4-E4B-it-ONNX",
} as const;

export type ModelSize = keyof typeof MODEL_IDS;

export async function loadModel(
  size: ModelSize,
  onProgress: (progress: number) => void
): Promise<TextGenerationPipeline> {
  return pipeline("text-generation", MODEL_IDS[size], {
    dtype: "q4f16",
    device: "webgpu",
    progress_callback: (p: { progress: number }) => onProgress(p.progress),
  });
}

Settings & Persistence

Settings are stored via chrome.storage.sync:

export interface GemmaGemSettings {
  modelSize: "E2B" | "E4B";
  thinking: boolean;
  maxIterations: number;
  disabledHosts: string[];
}

const DEFAULT_SETTINGS: GemmaGemSettings = {
  modelSize: "E2B",
  thinking: false,
  maxIterations: 10,
  disabledHosts: [],
};

export async function getSettings(): Promise<GemmaGemSettings> {
  const stored = await chrome.storage.sync.get("settings");
  return { ...DEFAULT_SETTINGS, ...(stored.settings ?? {}) };
}

export async function saveSettings(patch: Partial<GemmaGemSettings>): Promise<void> {
  const current = await getSettings();
  await chrome.storage.sync.set({ settings: { ...current, ...patch } });
}

// Disable extension on current host
async function disableOnCurrentSite() {
  const host = new URL(location.href).hostname;
  const settings = await getSettings();
  if (!settings.disabledHosts.includes(host)) {
    await saveSettings({ disabledHosts: [...settings.disabledHosts, host] });
  }
}

Shadow DOM Chat UI Pattern

The content script injects a shadow DOM to isolate styles:

// content/ui.ts
export function injectChatOverlay(): ShadowRoot {
  const host = document.createElement("div");
  host.id = "gemma-gem-host";
  // Prevent page styles from leaking in
  const shadow = host.attachShadow({ mode: "closed" });

  // Inject styles
  const style = document.createElement("style");
  style.textContent = CHAT_STYLES; // imported CSS string
  shadow.appendChild(style);

  // Inject chat container
  const container = document.createElement("div");
  container.id = "gemma-gem-container";
  shadow.appendChild(container);

  document.body.appendChild(host);
  return shadow;
}

Debugging

All logs use [Gemma Gem] prefix. Development builds log info/debug/warn; production only logs errors.

# Service worker logs
chrome://extensions → Gemma Gem → "Inspect views: service worker"

# Offscreen document (most useful: model loading, prompts, tool calls)
chrome://extensions → Gemma Gem → "Inspect views: offscreen.html"

# Content script logs
DevTools on any page → Console (filter: [Gemma Gem])

# All extension contexts
chrome://inspect#other

Key things to check in offscreen document logs:

Model download progress
Full prompt construction
Token counts per turn
Raw model output (before tool call parsing)
Tool execution results

Common Patterns & Gotchas

WebGPU availability check:

if (!navigator.gpu) {
  throw new Error("WebGPU not supported. Use Chrome 113+ with hardware acceleration enabled.");
}
const adapter = await navigator.gpu.requestAdapter();
if (!adapter) throw new Error("No WebGPU adapter found.");

Offscreen document lifecycle — Chrome may suspend the offscreen document. Ping it before sending messages:

async function ensureOffscreen() {
  const existing = await chrome.offscreen.hasDocument();
  if (!existing) {
    await chrome.offscreen.createDocument({
      url: "offscreen.html",
      reasons: [chrome.offscreen.Reason.WORKERS],
      justification: "Run Gemma 4 inference via WebGPU",
    });
  }
}

Context window management — Gemma 4 supports 128K tokens but inference slows with long contexts. Clear history per-page with clear_context or limit stored turns:

const MAX_HISTORY_TURNS = 20;
function trimHistory(messages: ChatMessage[]): ChatMessage[] {
  if (messages.length <= MAX_HISTORY_TURNS * 2) return messages;
  return messages.slice(-MAX_HISTORY_TURNS * 2);
}

Tool call parsing — Gemma 4 emits tool calls in a structured format. If adding custom parsing, guard against partial/streamed JSON:

function safeParseToolCall(raw: string): { name: string; args: Record<string, unknown> } | null {
  try {
    return JSON.parse(raw);
  } catch {
    return null; // still streaming
  }
}

CSS selector safety for DOM tools:

function safeQuerySelector(selector: string): Element | null {
  try {
    return document.querySelector(selector);
  } catch {
    return null; // invalid selector from model
  }
}