You have a local model running. Ollama, llama.cpp, vLLM -- doesn't matter. It answers prompts on localhost. Now you want other parts of your system to send it work and get results back. The usual next step is to reach for a framework: LangChain, CrewAI, AutoGen. That's a lot of machinery for what is fundamentally a messaging problem.
This tutorial skips the framework. You'll wire up your local model as a NoLag worker agent in under 5 minutes. The model handles inference. NoLag handles the transport -- getting tasks to the model and results back to whoever asked.
What You'll Build
A local Ollama worker that:
- Connects to NoLag and advertises its capabilities via presence
- Picks up tasks dispatched by any other agent in the system
- Calls your local model and sends results back
- Shows up in the NoLag portal dashboard so you can see it's alive
Then you'll add a second model with different capabilities and watch NoLag route tasks to the right worker automatically. Zero changes to the dispatching code.
Prerequisites
- Ollama installed -- curl -fsSL https://ollama.com/install.sh | sh
- A model pulled -- ollama pull llama3.2
- A NoLag account -- create an app in the portal and grab an actor token
- Node.js 18+
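Before wiring anything up, it can help to confirm that Ollama is actually reachable and the model is pulled. The sketch below is one way to do that, assuming Ollama's standard `/api/tags` endpoint on the default port. `checkOllama` is a hypothetical helper name, not part of Ollama or NoLag.

```javascript
// Preflight sketch: confirm Ollama is reachable and the model is pulled.
// checkOllama is a hypothetical helper, not part of Ollama or NoLag.
async function checkOllama(model, baseUrl = "http://localhost:11434", fetchImpl = fetch) {
  // GET /api/tags lists the models available locally.
  const res = await fetchImpl(`${baseUrl}/api/tags`);
  if (!res.ok) {
    throw new Error(`Ollama responded with HTTP ${res.status}`);
  }
  const { models = [] } = await res.json();
  // Names carry a tag suffix such as "llama3.2:latest", so match the base name too.
  const modelPulled = models.some(
    (m) => m.name === model || m.name.split(":")[0] === model
  );
  return { reachable: true, modelPulled };
}
```

Calling `await checkOllama("llama3.2")` before starting the worker would surface a missing model early instead of on the first task. The `fetchImpl` parameter exists only so the check can be exercised without a running Ollama.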
Step 1: Create the Worker
Install the dependencies:
npm install @nolag/agents ollama
The worker connects to NoLag, declares what it can do, and listens for matching tasks. When a task arrives, it calls Ollama and sends the result back.
// Run as an ES module (.mjs, or "type": "module" in package.json)
// so that top-level await works.
import { NoLagAgents, Handoff } from "@nolag/agents";
import { Ollama } from "ollama";

const ollama = new Ollama({ host: "http://localhost:11434" });

// Actor token from the NoLag portal. Reading it from an environment
// variable named NOLAG_TOKEN is an assumption; load it however you prefer.
const TOKEN = process.env.NOLAG_TOKEN;

const worker = new NoLagAgents(TOKEN, {
  appName: "my-agents-app",
  agentId: "llama-worker",
  presence: {
    name: "llama-worker",
    role: "agent",
    capabilities: ["summarize", "classify"],
    metadata: { model: "llama3.2", provider: "local" },
  },
});

await worker.connect();
const room = worker.room("tasks");
const handoff = new Handoff(room);

handoff.onTask(["summarize", "classify"], async (task, respond) => {
  const response = await ollama.chat({
    model: "llama3.2",
    messages: [
      { role: "system", content: `You are a ${task.type} agent.` },
      { role: "user", content: task.payload.text },
    ],
  });

  respond("success", {
    result: response.message.content,
    model: "llama3.2",
    // Assumes task.createdAt is an epoch timestamp in milliseconds.
    latencyMs: Date.now() - task.createdAt,
  });
});

console.log("Worker online. Waiting for tasks...");

That's the entire worker. No routing logic, no queue management, no health checks. The presence block tells the system what this worker can do. The onTask callback handles incoming work. NoLag manages everything in between.
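The worker uses one generic system prompt for every task type. If you want each capability to get its own instructions, a small lookup is probably enough. This is a minimal sketch: `SYSTEM_PROMPTS` and `buildMessages` are hypothetical names, the prompt wording is illustrative, and the task shape (`type`, `payload.text`) is taken from the worker code.

```javascript
// Hypothetical prompt table; the wording of each prompt is illustrative.
const SYSTEM_PROMPTS = new Map([
  ["summarize", "Summarize the user's text in two or three sentences."],
  ["classify", "Classify the user's text and reply with a single label."],
]);

// Build the messages array for ollama.chat from a task.
// Assumes the task shape used by the worker: { type, payload: { text } }.
function buildMessages(task) {
  // Fall back to the generic prompt for task types without a dedicated entry.
  const system = SYSTEM_PROMPTS.get(task.type) ?? `You are a ${task.type} agent.`;
  return [
    { role: "system", content: system },
    { role: "user", content: task.payload.text },
  ];
}
```

Inside the onTask callback, `messages: buildMessages(task)` would replace the inline array. The rest of the worker stays the same.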
Step 2: Dispatch a Task
From any other process -- a backend service, a CLI script, another agent -- dispatch a task by capability:
import { NoLagAgents, Handoff } from "@nolag/agents";

// Same actor-token assumption as the worker: NOLAG_TOKEN is a placeholder name.
const TOKEN = process.env.NOLAG_TOKEN;

const orchestrator = new NoLagAgents(TOKEN, {
  appName: "my-agents-app",
  agentId: "orchestrator",
  presence: { name: "orchestrator", role: "orchestrator" },
});

await orchestrator.connect();
const room = orchestrator.room("tasks");
const handoff = new Handoff(room);

// Check who's online
const workers = room.findAgents("summarize");
console.log(`${workers.length} summarize worker(s) online`);

// Dispatch a task -- NoLag routes it to a capable worker
const result = await handoff.dispatch("summarize", {
  text: "NoLag is a real-time messaging infrastructure..."
}, {
  waitForResult: true,
  timeout: 30000, // milliseconds
});

console.log("Result:", result.payload.result);
console.log("Model:", result.payload.model);
console.log("Latency:", result.payload.latencyMs, "ms");

The orchestrator doesn't import ollama. It doesn't know the worker runs Llama. It dispatches a "summarize" task and gets a result. If you later swap the worker to use Mistral, or move it to a GPU server across the network, the dispatching code doesn't change.
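Local models can be slow on a cold start, so a 30-second timeout may occasionally expire. The tutorial doesn't say how a timed-out dispatch fails. The sketch below assumes the returned promise rejects, and wraps any async call in a fixed number of retries. `withRetry` is a hypothetical helper, not part of @nolag/agents.

```javascript
// Retry an async operation a fixed number of times with a pause between attempts.
// withRetry is a hypothetical helper; it assumes the operation rejects on failure.
async function withRetry(operation, { attempts = 3, delayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      lastError = err;
      // Pause before the next attempt, but not after the final one.
      if (attempt < attempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Possible usage with the orchestrator's handoff:
// const result = await withRetry(() =>
//   handoff.dispatch("summarize", { text }, { waitForResult: true, timeout: 30000 })
// );
```

Retrying means the same task may run twice if the first attempt was only slow, not lost. That is fine for idempotent work like summarization, but think twice before retrying tasks with side effects.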
Step 3: Add a Second Model
Pull another model and spin up a second worker with different capabilities:
ollama pull mistral

import { NoLagAgents, Handoff } from "@nolag/agents";
import { Ollama } from "ollama";

const ollama = new Ollama({ host: "http://localhost:11434" });

// Actor token from the NoLag portal; NOLAG_TOKEN is a placeholder variable name.
const TOKEN = process.env.NOLAG_TOKEN;

const worker = new NoLagAgents(TOKEN, {
  appName: "my-agents-app",
  agentId: "mistral-worker",
  presence: {
    name: "mistral-worker",
    role: "agent",
    capabilities: ["extract-entities", "translate"],
    metadata: { model: "mistral", provider: "local" },
  },
});

await worker.connect();
const room = worker.room("tasks");
const handoff = new Handoff(room);

handoff.onTask(["extract-entities", "translate"], async (task, respond) => {
  const response = await ollama.chat({
    model: "mistral",
    messages: [
      { role: "system", content: `You are an ${task.type} agent.` },
      { role: "user", content: task.payload.text },
    ],
  });

  respond("success", {
    result: response.message.content,
    model: "mistral",
    // Assumes task.createdAt is an epoch timestamp in milliseconds.
    latencyMs: Date.now() - task.createdAt,
  });
});

Now dispatch tasks that route to each model automatically:
// The orchestrator doesn't change.
// Dispatch by capability -- NoLag routes to the right worker.
// `document` is a string holding the text you want processed.
const summary = await handoff.dispatch("summarize", {
  text: document
}, { waitForResult: true });
// -> routed to llama-worker

const entities = await handoff.dispatch("extract-entities", {
  text: document
}, { waitForResult: true });
// -> routed to mistral-worker

const translation = await handoff.dispatch("translate", {
  text: summary.payload.result
}, { waitForResult: true });
// -> routed to mistral-worker

// All three tasks ran on local models.
// No API keys. No egress. No per-token billing.

The orchestrator dispatches by what needs doing, not by which model does it. Add a third model, remove one, restart a worker mid-task -- the system adapts because routing is based on presence, not hardcoded addresses.
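To make "routing is based on presence" concrete, here is a toy in-memory model of the idea. It is not NoLag's implementation, and `CapabilityRegistry` is a hypothetical name. It only shows why the dispatcher never needs a worker's address: it asks for a capability, and whoever currently advertises it is eligible.

```javascript
// Conceptual model only: an in-memory registry standing in for presence.
// This is not how NoLag is implemented; all names here are hypothetical.
class CapabilityRegistry {
  constructor() {
    this.workers = new Map(); // agentId -> Set of capabilities
  }

  // A worker comes online and advertises what it can do.
  join(agentId, capabilities) {
    this.workers.set(agentId, new Set(capabilities));
  }

  // A worker goes offline; it stops being eligible for new tasks.
  leave(agentId) {
    this.workers.delete(agentId);
  }

  // Return the ids of every worker currently advertising the capability.
  find(capability) {
    return [...this.workers.entries()]
      .filter(([, caps]) => caps.has(capability))
      .map(([id]) => id);
  }
}

const registry = new CapabilityRegistry();
registry.join("llama-worker", ["summarize", "classify"]);
registry.join("mistral-worker", ["extract-entities", "translate"]);
console.log(registry.find("translate")); // [ 'mistral-worker' ]
registry.leave("mistral-worker");
console.log(registry.find("translate")); // []
```

The dispatching side only ever calls `find` with a capability name, so adding or removing workers changes the answer without changing the caller.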
What You Get for Free
Presence
Every connected worker shows up in real time. You always know which models are online and what they can do.
// See every connected agent in real time
room.onPresenceChange((agents) => {
  for (const agent of agents) {
    // Agents such as the orchestrator may advertise no capabilities.
    const capabilities = agent.capabilities ?? [];
    console.log(`${agent.name} [${agent.status}]`);
    console.log(`  capabilities: ${capabilities.join(", ") || "(none)"}`);
    if (agent.metadata?.model) {
      console.log(`  model: ${agent.metadata.model}`);
    }
  }
});

// Output:
// llama-worker [online]
//   capabilities: summarize, classify
//   model: llama3.2
// mistral-worker [online]
//   capabilities: extract-entities, translate
//   model: mistral
// orchestrator [online]
//   capabilities: (none)

The NoLag portal dashboard shows this same view graphically -- no custom monitoring needed.
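Presence data is also useful for a quick coverage check: are all the capabilities your orchestrator depends on actually online right now? A minimal sketch, assuming the agent shape shown in the presence callback (`status`, `capabilities`). `missingCapabilities` is a hypothetical helper.

```javascript
// Return the required capabilities that no online agent currently advertises.
// Assumes the agent shape from the presence callback: { name, status, capabilities }.
function missingCapabilities(agents, required) {
  const available = new Set(
    agents
      .filter((agent) => agent.status === "online")
      .flatMap((agent) => agent.capabilities ?? [])
  );
  return required.filter((capability) => !available.has(capability));
}
```

Called from inside the presence callback with something like `["summarize", "translate"]`, a non-empty result is a cue to log a warning or hold off dispatching until a capable worker comes back.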
Reconnection
If a worker disconnects (machine reboots, network blip, Ollama restart), the SDK reconnects automatically. Tasks dispatched while the worker was offline are held in the queue and delivered when it comes back. No messages lost.
Load Balancing
Run two instances of the same worker on two machines. NoLag distributes tasks across them with load-balanced subscriptions. Scale by adding processes, not by changing code.
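The tutorial doesn't say which distribution strategy NoLag uses for load-balanced subscriptions. As an illustration of the general idea, round-robin is one common approach: each new task goes to the next instance in turn. The instance names below are hypothetical.

```javascript
// Round-robin picker: one common way to spread tasks across identical workers.
// Which strategy NoLag actually uses is not stated; this is only an illustration.
function createRoundRobin(instances) {
  if (instances.length === 0) {
    throw new Error("no instances available");
  }
  let next = 0;
  return function pick() {
    const chosen = instances[next];
    next = (next + 1) % instances.length; // wrap around after the last instance
    return chosen;
  };
}

const pick = createRoundRobin(["llama-worker-1", "llama-worker-2"]);
console.log(pick(), pick(), pick()); // llama-worker-1 llama-worker-2 llama-worker-1
```

With a broker doing this for you, the worker code is identical on every machine. You start another process and it joins the rotation.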
Observability
Every task dispatch and result flows through NoLag's event system. The portal's Agents tab shows the topology: which agents are connected, task flow between them, and aggregate metrics. Attach an observer agent to log everything to your own system if you need it.
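If you do log results to your own system, the `model` and `latencyMs` fields the workers send back are enough for a basic per-model latency report. The sketch below covers only the aggregation step. How an observer agent subscribes to events isn't shown in this tutorial, so that part is left out, and `summarizeLatency` is a hypothetical helper.

```javascript
// Summarize latency per model from a list of result payloads.
// Assumes each payload carries the { model, latencyMs } fields the workers send back.
function summarizeLatency(payloads) {
  const byModel = new Map();
  for (const { model, latencyMs } of payloads) {
    const stats = byModel.get(model) ?? { count: 0, totalMs: 0, maxMs: 0 };
    stats.count += 1;
    stats.totalMs += latencyMs;
    stats.maxMs = Math.max(stats.maxMs, latencyMs);
    byModel.set(model, stats);
  }
  // Convert running totals into averages for reporting.
  return [...byModel].map(([model, s]) => ({
    model,
    count: s.count,
    avgMs: s.totalMs / s.count,
    maxMs: s.maxMs,
  }));
}
```

Feeding it every `result.payload` the orchestrator receives gives a quick answer to "which local model is the slow one?" without any extra infrastructure.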
Why Not a Framework?
Frameworks like LangChain and CrewAI solve a real problem: making it easy to build agent logic. But they also own the transport. Your agents communicate through function calls inside a single process. That means:
- Agents can't run on different machines
- You can't scale workers independently
- If the coordinator crashes, everything stops
- Adding a human approval step means hacking the framework's internals
NoLag doesn't replace your agent logic. It replaces the wiring between agents. Use whatever prompt engineering, RAG pipeline, or chain-of-thought approach you want inside your worker. NoLag just makes sure tasks get to the right worker and results get back.
Next Steps
- Add a cloud model -- see Hybrid LLM Coordination for mixing local and proprietary models
- Add human approval -- gate sensitive actions with the Approve pattern before they execute
- Share state -- use the Blackboard pattern to track model performance metrics across all workers
- Read the docs -- @nolag/agents getting started guide
Your local model is the smart part. NoLag is the dumb pipe that makes it reachable. That's the whole idea.