# Ollama Integration

Integrate Ollama for local LLM inference in TypeScript applications. Ollama provides a simple API for running language models locally.
## When to Apply

Use this skill when:

- Running LLMs locally without cloud APIs
- Generating text or embeddings with Ollama
- Building AI features that need to work offline
- Implementing RAG pipelines with local models
- Testing AI applications without API costs
## Prerequisites

### Install Ollama

```bash
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Start the server
ollama serve
```
### Pull Required Models

```bash
# Embedding model (768 dimensions)
ollama pull nomic-embed-text

# Chat/generation model
ollama pull mistral

# Alternative models
ollama pull llama2
ollama pull codellama
```
## OllamaClient Implementation

A complete TypeScript client for the Ollama API:

```typescript
// src/utils/ollama.ts
import fetch from 'cross-fetch';

interface OllamaResponse {
  model: string;
  created_at: string;
  response: string;
  done: boolean;
}

interface OllamaEmbeddingResponse {
  embedding: number[];
}

export interface OllamaChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface OllamaChatResponse {
  model: string;
  created_at: string;
  message: OllamaChatMessage;
  done: boolean;
}

export class OllamaClient {
  private baseUrl: string;

  constructor(baseUrl?: string) {
    this.baseUrl = baseUrl || process.env.OLLAMA_HOST || 'http://localhost:11434';
  }

  // Generate embeddings for text
  async generateEmbedding(
    text: string,
    model: string = 'nomic-embed-text'
  ): Promise<number[]> {
    const response = await fetch(`${this.baseUrl}/api/embeddings`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model, prompt: text }),
    });

    if (!response.ok) {
      throw new Error(`Failed to generate embedding: ${response.statusText}`);
    }

    const data: OllamaEmbeddingResponse = await response.json();
    return data.embedding;
  }

  // Generate text response (non-streaming)
  async generateResponse(
    prompt: string,
    context?: string,
    model: string = 'mistral'
  ): Promise<string> {
    const fullPrompt = context
      ? `Context: ${context}\n\nQuestion: ${prompt}\n\nAnswer:`
      : prompt;

    const response = await fetch(`${this.baseUrl}/api/generate`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model,
        prompt: fullPrompt,
        stream: false,
      }),
    });

    if (!response.ok) {
      throw new Error(`Failed to generate response: ${response.statusText}`);
    }

    const data: OllamaResponse = await response.json();
    return data.response;
  }

  // Generate text response (streaming)
  async generateStreamingResponse(
    prompt: string,
    onChunk: (chunk: string) => void,
    context?: string,
    model: string = 'mistral'
  ): Promise<void> {
    const fullPrompt = context
      ? `Context: ${context}\n\nQuestion: ${prompt}\n\nAnswer:`
      : prompt;

    const response = await fetch(`${this.baseUrl}/api/generate`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model,
        prompt: fullPrompt,
        stream: true,
      }),
    });

    if (!response.ok) {
      throw new Error(`Failed to generate streaming response: ${response.statusText}`);
    }

    const reader = response.body?.getReader();
    if (!reader) {
      throw new Error('Failed to get response reader');
    }

    const decoder = new TextDecoder();
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      // Ollama streams newline-delimited JSON objects
      const chunk = decoder.decode(value, { stream: true });
      const lines = chunk.split('\n').filter(Boolean);

      for (const line of lines) {
        try {
          const data: OllamaResponse = JSON.parse(line);
          if (data.response) {
            onChunk(data.response);
          }
        } catch {
          // Skip malformed JSON lines
        }
      }
    }
  }

  // Chat completion API
  async chat(
    messages: OllamaChatMessage[],
    model: string = 'mistral'
  ): Promise<string> {
    const response = await fetch(`${this.baseUrl}/api/chat`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model,
        messages,
        stream: false,
      }),
    });

    if (!response.ok) {
      throw new Error(`Failed to chat: ${response.statusText}`);
    }

    const data: OllamaChatResponse = await response.json();
    return data.message.content;
  }

  // List available models
  async listModels(): Promise<string[]> {
    const response = await fetch(`${this.baseUrl}/api/tags`);

    if (!response.ok) {
      throw new Error(`Failed to list models: ${response.statusText}`);
    }

    const data = await response.json();
    return data.models?.map((m: { name: string }) => m.name) || [];
  }

  // Check if Ollama is running
  async isHealthy(): Promise<boolean> {
    try {
      const response = await fetch(`${this.baseUrl}/api/tags`);
      return response.ok;
    } catch {
      return false;
    }
  }
}
```
## API Endpoints

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/api/embeddings` | POST | Generate embeddings |
| `/api/generate` | POST | Generate text completion |
| `/api/chat` | POST | Chat completion |
| `/api/tags` | GET | List available models |
| `/api/pull` | POST | Pull a model |
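`/api/pull` can also be called from code: it accepts the same `{"name": ...}` payload that the CI/CD section below sends with wget. A minimal sketch (`pullModel` is a hypothetical helper, not part of `OllamaClient`; `stream: false` requests a single final status object instead of streamed progress lines):

```typescript
import fetch from 'cross-fetch';

// Hypothetical helper: pull a model through the HTTP API
async function pullModel(
  name: string,
  baseUrl: string = process.env.OLLAMA_HOST || 'http://localhost:11434'
): Promise<void> {
  const response = await fetch(`${baseUrl}/api/pull`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    // stream: false = wait for one final status object rather than NDJSON progress
    body: JSON.stringify({ name, stream: false }),
  });

  if (!response.ok) {
    throw new Error(`Failed to pull ${name}: ${response.statusText}`);
  }
}

await pullModel('nomic-embed-text');
```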
## Usage Examples

### Basic Text Generation

```typescript
const ollama = new OllamaClient();

const response = await ollama.generateResponse(
  'Explain machine learning in simple terms'
);
console.log(response);
```

### With Context (RAG)

```typescript
const context = 'Our company was founded in 2020 and has 50 employees.';
const question = 'When was the company founded?';

const response = await ollama.generateResponse(question, context);
// "Based on the context, the company was founded in 2020."
```

### Streaming Response

```typescript
await ollama.generateStreamingResponse(
  'Write a short poem about coding',
  (chunk) => process.stdout.write(chunk)
);
```

### Chat Conversation

```typescript
const messages: OllamaChatMessage[] = [
  { role: 'system', content: 'You are a helpful coding assistant.' },
  { role: 'user', content: 'How do I reverse a string in JavaScript?' },
];

const response = await ollama.chat(messages);
console.log(response);
```

### Generate Embeddings

```typescript
const text = 'Machine learning is a subset of artificial intelligence.';
const embedding = await ollama.generateEmbedding(text);

console.log(`Embedding dimensions: ${embedding.length}`); // 768 for nomic-embed-text
```
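When ranking documents in a RAG pipeline, embeddings are usually compared with cosine similarity. A minimal sketch (the `cosineSimilarity` helper is illustrative, and assumes both vectors come from the same model so their dimensions match):

```typescript
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const query = await ollama.generateEmbedding('What is machine learning?');
console.log(cosineSimilarity(embedding, query)); // closer to 1 = more similar
```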
## Model Selection

### Embedding Models

| Model | Dimensions | Speed | Quality |
|-------|------------|-------|---------|
| `nomic-embed-text` | 768 | Fast | Good |
| `mxbai-embed-large` | 1024 | Medium | Better |
| `all-minilm` | 384 | Very fast | Acceptable |

### Generation Models

| Model | Size | Speed | Use Case |
|-------|------|-------|----------|
| `mistral` | 7B | Fast | General purpose |
| `llama2` | 7B | Fast | General purpose |
| `codellama` | 7B | Fast | Code generation |
| `mixtral` | 8x7B | Slow | High quality |
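Every `OllamaClient` method takes an optional `model` parameter, so switching between the models above is a per-call decision (this sketch assumes the named models have already been pulled):

```typescript
const ollama = new OllamaClient();

// Code generation with codellama instead of the mistral default
const code = await ollama.generateResponse(
  'Write a TypeScript function that debounces another function',
  undefined, // no RAG context
  'codellama'
);

// Higher-quality embeddings at 1024 dimensions
const vector = await ollama.generateEmbedding('some text', 'mxbai-embed-large');
```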
## Environment Configuration

```bash
# Default Ollama host
export OLLAMA_HOST=http://localhost:11434

# For Docker/CI environments
export OLLAMA_HOST=http://ollama:11434
```
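The constructor resolves its base URL from an explicit argument first, then `OLLAMA_HOST`, then the localhost default, so no code change is needed between environments:

```typescript
const local = new OllamaClient();                        // OLLAMA_HOST or http://localhost:11434
const docker = new OllamaClient('http://ollama:11434');  // explicit argument wins
```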
## Testing with Ollama

```typescript
import { OllamaClient } from '../src/utils/ollama';

let ollama: OllamaClient;

beforeAll(() => {
  ollama = new OllamaClient();
});

test('should generate embedding', async () => {
  const embedding = await ollama.generateEmbedding('test text');
  expect(embedding).toHaveLength(768);
  expect(embedding.every(n => typeof n === 'number')).toBe(true);
});

test('should generate response', async () => {
  const response = await ollama.generateResponse('Say hello');
  expect(response).toBeTruthy();
  expect(typeof response).toBe('string');
});
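```

These tests hit a live server, so it can help to fail fast with a clear message when Ollama is unavailable. One approach, assuming the same Jest-style globals and the client's `isHealthy()` method:

```typescript
beforeAll(async () => {
  ollama = new OllamaClient();
  // isHealthy() pings /api/tags; abort the suite early if the server is down
  if (!(await ollama.isHealthy())) {
    throw new Error('Ollama is not running; start it with `ollama serve`');
  }
});
```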
## CI/CD Integration

In GitHub Actions, use the Ollama service container:

```yaml
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - 11434:11434

env:
  OLLAMA_HOST: http://ollama:11434

steps:
  - name: Pull models
    run: |
      wget -q -O - --post-data='{"name": "nomic-embed-text"}' \
        --header='Content-Type: application/json' \
        http://ollama:11434/api/pull
      wget -q -O - --post-data='{"name": "mistral"}' \
        --header='Content-Type: application/json' \
        http://ollama:11434/api/pull
```
## Error Handling

```typescript
async function safeGenerate(prompt: string): Promise<string | null> {
  const ollama = new OllamaClient();

  // Check if Ollama is running
  if (!await ollama.isHealthy()) {
    console.error('Ollama is not running');
    return null;
  }

  try {
    return await ollama.generateResponse(prompt);
  } catch (error) {
    console.error('Generation failed:', error);
    return null;
  }
}
```
## Troubleshooting

| Issue | Solution |
|-------|----------|
| "Connection refused" | Start Ollama: `ollama serve` |
| "Model not found" | Pull the model: `ollama pull <model>` |
| Slow responses | Use a smaller model or reduce prompt length |
| Out of memory | Use a quantized model or a smaller context window |
| Timeout errors | Increase the timeout or use streaming |
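For the timeout case, one option is to race the request against a timer. This sketch (a hypothetical `generateWithTimeout` wrapper, not part of `OllamaClient`) stops waiting after the deadline, though the underlying HTTP request is not cancelled:

```typescript
async function generateWithTimeout(
  ollama: OllamaClient,
  prompt: string,
  timeoutMs = 30_000
): Promise<string> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Generation timed out after ${timeoutMs}ms`)),
      timeoutMs
    );
  });

  try {
    // Whichever settles first wins: the model's response or the timeout rejection
    return await Promise.race([ollama.generateResponse(prompt), timeout]);
  } finally {
    clearTimeout(timer);
  }
}
```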
## Package Dependencies

```json
{
  "dependencies": {
    "cross-fetch": "^4.1.0"
  }
}
```
## References

- Related skill: `rag-pipeline` for complete RAG implementation
- Related skill: `pgvector-embeddings` for storing embeddings
- Related skill: `github-workflows-ollama` for CI/CD setup
- Ollama documentation
- Ollama API reference