An agent in OpenSwarm can do anything you can do on your computer — read and write files, run commands, search the web, control a browser, send emails, manage your calendar — and handle long-running, multi-step tasks autonomously. Think of each agent as a teammate you can brief on a task and let loose, while you watch it work in real time.

What makes an agent different from a chatbot

A regular chatbot takes a prompt and returns text. An OpenSwarm agent takes a prompt and acts on it — it plans, uses tools, reads results, adapts, and keeps going until the job is done or it needs your input. Agents can:
  • Read, create, and modify files across your entire filesystem
  • Browse the web via a real browser — navigating, clicking, filling forms
  • Use integrations — Gmail, Google Calendar, Drive, Sheets, X/Twitter, Reddit, and any custom integrations you connect
  • Spawn sub-agents to parallelize work or delegate specialized tasks
  • Generate interactive apps that render live on the dashboard
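The plan, act, observe cycle described above can be sketched in a few lines of Python. This is only an illustration of the general pattern, not OpenSwarm's implementation: `run_agent`, `toy_policy`, and the tool names are hypothetical, and the "model" is replaced by a hard-coded policy function.

```python
from typing import Callable, Optional

# Hypothetical sketch of an agent loop; none of these names come from OpenSwarm.
Tool = Callable[[str], str]
Action = Optional[tuple[str, str]]  # (tool name, argument), or None when finished


def run_agent(task: str, tools: dict[str, Tool],
              choose_action: Callable[[str, list], Action],
              max_steps: int = 10) -> list:
    """Pick a tool, run it, record the result, and repeat until done."""
    history = []  # (tool name, argument, result) tuples
    for _ in range(max_steps):
        action = choose_action(task, history)
        if action is None:  # the policy decided the job is finished
            break
        name, arg = action
        result = tools[name](arg)  # act, then observe the result
        history.append((name, arg, result))
    return history


def toy_policy(task: str, history: list) -> Action:
    """Stand-in for the model: read a file, summarize it, then stop."""
    if not history:
        return ("read_file", "notes.txt")
    if len(history) == 1:
        return ("summarize", history[-1][2])
    return None


tools = {
    "read_file": lambda path: f"contents of {path}",
    "summarize": lambda text: text.upper(),
}

steps = run_agent("summarize my notes", tools, toy_policy)
```

A chatbot corresponds to a single call with no tools; the loop and the `history` fed back into each decision are what let an agent adapt to what it finds.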

Agents in the context of OpenSwarm

OpenSwarm isn’t about running one agent; it’s about running many at once. Each agent lives as a card on a workspace canvas, and you can run as many agents simultaneously as you need. The workspace gives you a bird’s-eye view of everything that’s happening.

The swarm model

  1. You launch an agent from the toolbar and give it a task.
  2. That agent can spawn sub-agents, open browsers, and execute actions as needed.
  3. Permissions are set globally so you control what exactly an agent can or can’t do.
  4. When human approval or insight is needed, a popup appears within that agent’s chat.
In the center of the OpenSwarm application’s header, you can also find the floating island bubble, which gives you a quick overview of all agent states. If human input is needed, the floating island morphs into an input field that you can interact with.
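To make the idea of spawning sub-agents concrete, here is a rough sketch of a parent fanning work out in parallel and collecting the results. It assumes nothing about how OpenSwarm schedules agents internally: `parent_agent` and `sub_agent` are hypothetical, and a thread pool from Python's standard library stands in for real agent processes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch; a real sub-agent would run its own tool loop.


def sub_agent(subtask: str) -> str:
    """Stand-in for a sub-agent that completes one delegated subtask."""
    return f"done: {subtask}"


def parent_agent(task: str, subtasks: list[str]) -> dict[str, str]:
    """Delegate each subtask to its own worker and gather the results."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map runs the subtasks concurrently and yields results
        # in the same order as the input list.
        results = list(pool.map(sub_agent, subtasks))
    return dict(zip(subtasks, results))


report = parent_agent("audit the repository", ["lint", "tests", "docs"])
```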

Modes

Agents run in a mode that controls their behavior — which tools are available, what system prompt they use, and how they approach tasks. OpenSwarm ships with built-in modes:
  • Agent — Full access to all tools. The default general-purpose mode.
  • Ask — Conversational only, no tool use. For quick questions.
  • Plan — Thinks through a plan before acting. Read-only tools only.
  • App Builder — Specialized for generating interactive Views.
  • Skill Builder — Specialized for creating reusable skills.
You can also create custom modes with your own system prompts and tool restrictions. Switch modes at any time from the chat input.
In most cases, the default “Agent” mode is all you need. Modes are helpful when you want reusable presets for how agents behave.
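One way to picture a mode is as a named bundle of a system prompt and a set of permitted tools. The sketch below models three of the built-in modes and one custom mode that way. The `Mode` class, the tool names, and the prompt strings are all assumptions made for illustration; they are not OpenSwarm's configuration format.

```python
from dataclasses import dataclass

# Hypothetical data model; tool names and prompts are placeholders.


@dataclass(frozen=True)
class Mode:
    name: str
    system_prompt: str
    allowed_tools: frozenset  # an empty set means no tool use at all


ALL_TOOLS = frozenset({"read_file", "write_file", "run_command", "browser"})
READ_ONLY = frozenset({"read_file"})

MODES = {
    "Agent": Mode("Agent", "You are a general-purpose agent.", ALL_TOOLS),
    "Ask": Mode("Ask", "Answer conversationally.", frozenset()),
    "Plan": Mode("Plan", "Think through a plan before acting.", READ_ONLY),
}


def can_use(mode: Mode, tool: str) -> bool:
    """Return True if the given mode permits the given tool."""
    return tool in mode.allowed_tools


# A custom mode is just another preset with its own prompt and restrictions.
reviewer = Mode("Reviewer", "Review code; never modify it.", READ_ONLY)
```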

Models

Each agent is powered by an AI model that you select. Here is a brief overview of the models:
| Model | Best for |
| --- | --- |
| Sonnet 4.6 | Fast, capable, good default for most tasks |
| Opus 4.6 | Most capable, best for complex reasoning and long-horizon tasks |
| Haiku 3.5 | Fastest and cheapest, good for simple tasks |
You can switch models mid-conversation — the new model takes effect on the next message.

Integration approvals

When an agent wants to use an integration, it may need your permission depending on how that integration is configured:
  • Allow — runs automatically, no approval needed
  • Ask — pauses and shows you exactly what it’s about to do (the command, the file, the API call) so you can approve or deny
  • Deny — blocked entirely
You can set these permissions on the Integrations page.
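The three permission levels amount to a small decision function. The following sketch shows one plausible shape for it. Everything here is hypothetical: the `authorize` function, the integration keys, and in particular the choice to treat an unconfigured integration as Ask are assumptions, not documented OpenSwarm behavior.

```python
from enum import Enum
from typing import Callable

# Hypothetical sketch of an Allow / Ask / Deny check.


class Permission(Enum):
    ALLOW = "allow"
    ASK = "ask"
    DENY = "deny"


def authorize(integration: str, action: str,
              permissions: dict[str, Permission],
              ask_user: Callable[[str], bool]) -> bool:
    """Decide whether an agent may perform an action with an integration."""
    # Assumption: integrations without an explicit setting fall back to Ask.
    level = permissions.get(integration, Permission.ASK)
    if level is Permission.ALLOW:
        return True   # runs automatically
    if level is Permission.DENY:
        return False  # blocked entirely
    # Ask: show the user exactly what is about to happen and wait for a decision.
    return ask_user(f"{integration}: {action}")


permissions = {
    "calendar": Permission.ALLOW,
    "gmail": Permission.ASK,
    "reddit": Permission.DENY,
}

# The user callback here always declines, so the email is not sent.
decision = authorize("gmail", "send email to bob", permissions,
                     ask_user=lambda prompt: False)
```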

Cost tracking

Every agent session tracks token usage and cost in real time. A context window ring in the chat input shows how much of the context window has been consumed.
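The arithmetic behind a running cost and a context-usage ring is straightforward, and a short sketch may help. The `UsageTracker` class is hypothetical, the per-million-token prices and the 200,000-token window are placeholder values rather than real rates, and treating the latest request's input plus output as the current context size is a simplifying assumption.

```python
from dataclasses import dataclass

# Hypothetical sketch; prices and window size are placeholders.


@dataclass
class UsageTracker:
    context_window: int            # maximum tokens the model can hold
    input_price_per_mtok: float    # placeholder price per million input tokens
    output_price_per_mtok: float   # placeholder price per million output tokens
    input_tokens: int = 0
    output_tokens: int = 0
    context_tokens: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Add one request's usage to the session totals."""
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        # Assumption: the latest request's input plus its output
        # approximates the current size of the conversation context.
        self.context_tokens = input_tokens + output_tokens

    @property
    def cost(self) -> float:
        """Total session cost in the same currency as the prices."""
        return (self.input_tokens * self.input_price_per_mtok
                + self.output_tokens * self.output_price_per_mtok) / 1_000_000

    @property
    def context_fraction(self) -> float:
        """Share of the context window in use, capped at 1.0."""
        return min(1.0, self.context_tokens / self.context_window)


tracker = UsageTracker(context_window=200_000,
                       input_price_per_mtok=3.0,
                       output_price_per_mtok=15.0)
tracker.record(input_tokens=10_000, output_tokens=2_000)
tracker.record(input_tokens=14_000, output_tokens=1_000)
```

Cost accumulates across every request in the session, while the context fraction reflects only the current conversation size, which is why the two numbers can move independently.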