Built-in Actions - Open Swarm

The Action Library ships with built-in general purpose actions and first-party integrations for popular services. Everything is managed from the Actions page in the sidebar.

Built-in Action Sets

Built-in actions are organized into collapsible sections, each with its own enable/disable toggle.

Core Actions

These are always loaded into every agent session (unless disabled). They cover fundamental operations:

Action	Category	Description
`Read`	Filesystem	Read files and directories
`Edit`	Filesystem	Make targeted edits using search and replace
`Write`	Filesystem	Create new files or overwrite existing ones
`Bash`	System	Execute shell commands in a terminal
`Glob`	Search	Find files matching glob and wildcard patterns
`Grep`	Search	Search file contents using regular expressions
`AskUserQuestion`	Interaction	Ask the user a question and wait for a response

Extended Actions (On-Demand)

Extended actions are deferred — they aren’t loaded at session start. Instead, the agent discovers them via ToolSearch when it needs them. This keeps the base tool set lean.

Action	Category	Description
`WebSearch`	Search	Search the web for real-time information
`WebFetch`	Search	Fetch and read content from a URL
`NotebookEdit`	Filesystem	Edit Jupyter notebook cells
`TodoWrite`	Planning	Write and manage a structured todo list
`EnterPlanMode`	Planning	Enter plan mode for designing solutions
`ExitPlanMode`	Planning	Exit plan mode and return to execution
`EnterWorktree`	System	Enter a git worktree for isolated work
`TaskOutput`	System	Read output from a background task
`TaskStop`	System	Stop a running background task
`CronCreate`	Scheduling	Create a scheduled or recurring task
`CronList`	Scheduling	List all scheduled tasks
`CronDelete`	Scheduling	Delete a scheduled task
`RenderOutput`	Views	Render a reusable View artifact with structured input data
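
The deferred loading described above can be sketched as a small registry. This is a minimal illustration, not OpenSwarm's implementation: the `ActionRegistry` class, its fields, and the substring matching in `tool_search` are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ActionRegistry:
    """Hypothetical registry: core actions load eagerly, extended ones on demand."""

    core: dict[str, str]
    extended: dict[str, str]
    loaded: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Core actions are available from the start of the session.
        self.loaded.update(self.core)

    def tool_search(self, query: str) -> list[str]:
        """Load and return extended actions whose name or description matches."""
        q = query.lower()
        hits = [
            name
            for name, desc in self.extended.items()
            if q in name.lower() or q in desc.lower()
        ]
        for name in hits:
            self.loaded[name] = self.extended[name]
        return hits


registry = ActionRegistry(
    core={
        "Read": "Read files and directories",
        "Bash": "Execute shell commands in a terminal",
    },
    extended={
        "CronCreate": "Create a scheduled or recurring task",
        "CronList": "List all scheduled tasks",
        "WebFetch": "Fetch and read content from a URL",
    },
)
print(sorted(registry.loaded))       # only the core actions so far
print(registry.tool_search("cron"))  # matching extended actions are loaded
print(sorted(registry.loaded))
```

The point of the design is that the session starts with a small tool list, and the cost of describing an extended action is only paid once the agent actually looks for it.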

Browser Actions

Browser automation is split into two layers.

Delegation layer (what the main agent calls):

Action	Description
`CreateBrowserAgent`	Create a new browser and run a task on it
`BrowserAgent`	Delegate a task to an existing browser agent
`BrowserAgents`	Run multiple browser tasks in parallel

Action layer (what the browser sub-agent executes):

Action	Description
`BrowserScreenshot`	Capture a screenshot of the page
`BrowserNavigate`	Navigate to a URL
`BrowserClick`	Click an element by CSS selector
`BrowserType`	Type text into an input element
`BrowserEvaluate`	Execute JavaScript in the browser
`BrowserGetText`	Get visible text content of the page
`BrowserGetElements`	List interactive elements with CSS selectors
`BrowserScroll`	Scroll the page up or down
`BrowserWait`	Wait for page loads or animations

The main agent never calls browser action tools directly. It delegates via CreateBrowserAgent or BrowserAgent, and a sub-agent handles the low-level browser interactions autonomously.
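
One way to picture the split is as a per-role allowlist. The sketch below is illustrative only: the role labels `"main"` and `"browser_sub_agent"` and the `can_call` helper are hypothetical, and it assumes the sub-agent is limited to the action layer, which this page does not state explicitly. It covers browser tools only, not the other action sets.

```python
# Action names are taken from the two tables above.
DELEGATION_LAYER = {"CreateBrowserAgent", "BrowserAgent", "BrowserAgents"}
ACTION_LAYER = {
    "BrowserScreenshot", "BrowserNavigate", "BrowserClick", "BrowserType",
    "BrowserEvaluate", "BrowserGetText", "BrowserGetElements",
    "BrowserScroll", "BrowserWait",
}


def can_call(role: str, action: str) -> bool:
    """Return True if the role may call the given browser-related action."""
    if role == "main":
        # The main agent only delegates; it never drives the page itself.
        return action in DELEGATION_LAYER
    if role == "browser_sub_agent":
        # Assumption: the sub-agent is restricted to low-level page actions.
        return action in ACTION_LAYER
    return False


print(can_call("main", "CreateBrowserAgent"))
print(can_call("main", "BrowserClick"))
print(can_call("browser_sub_agent", "BrowserClick"))
```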

Apps

If you’ve created Apps, they appear here as an additional action set. Each App is exposed as a RenderOutput call that the agent can invoke to display data. Apps have their own per-item permission toggles.

Enabling and Disabling Sections

Each action set has a toggle switch in its header. Disabling a section sets all of its actions to deny, which means agents cannot use any of them. Re-enabling restores them to always_allow. You can also control permissions at a more granular level — see Permissions.
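
The toggle behaviour can be sketched as a bulk update over a permission map. The `set_section_enabled` function and the dictionary layout are assumptions for illustration; only the `deny` and `always_allow` values come from the description above.

```python
def set_section_enabled(
    permissions: dict[str, str],
    section_actions: list[str],
    enabled: bool,
) -> dict[str, str]:
    """Return a copy of `permissions` with every action in the section updated."""
    policy = "always_allow" if enabled else "deny"
    updated = dict(permissions)  # leave the caller's mapping untouched
    for action in section_actions:
        updated[action] = policy
    return updated


perms = {"CronCreate": "always_allow", "CronList": "always_allow", "Read": "always_allow"}
scheduling = ["CronCreate", "CronList"]

disabled = set_section_enabled(perms, scheduling, enabled=False)
print(disabled)

restored = set_section_enabled(disabled, scheduling, enabled=True)
print(restored)
```

Note that in this sketch, as in the description, re-enabling sets every action in the section to `always_allow` rather than restoring whatever per-action setting it had before.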

First-Party Integrations

Integrations are pre-configured connections for popular services. They appear in their own section of the Action Library.

Google Workspace

Enable the integration

Toggle Google Workspace on in the integrations section.

Connect your account

Click Connect Google. A popup opens the Google OAuth consent screen. Sign in and grant the requested scopes.

Actions are discovered automatically

After OAuth completes, OpenSwarm populates all available actions (Docs, Sheets, Slides, Calendar, Gmail, Drive, etc.) with their permission controls.

Once connected, the integration shows the connected account email (e.g., you@gmail.com), and you can disconnect or switch accounts at any time.

Token refresh is handled automatically. When an access token expires, OpenSwarm uses the stored refresh token to obtain a new one before the next agent session starts. If refresh fails, you’ll be prompted to reconnect.
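
The refresh flow above can be sketched as a check that runs before a session starts. Everything named here (`StoredTokens`, `ensure_fresh`, `ReconnectRequired`, and the injected `refresh` callable) is hypothetical; the real token storage and the call to Google's token endpoint are not shown on this page.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StoredTokens:
    access_token: str
    refresh_token: str
    expires_at: float  # Unix timestamp in seconds


class ReconnectRequired(Exception):
    """Raised when refresh fails and the user must reconnect the account."""


def ensure_fresh(
    tokens: StoredTokens,
    refresh: Callable[[str], tuple[str, float]],
    now: Optional[float] = None,
) -> StoredTokens:
    """Return usable tokens, refreshing first if the access token has expired.

    `refresh` takes a refresh token and returns (new_access_token, lifetime_seconds).
    """
    now = time.time() if now is None else now
    if now < tokens.expires_at:
        return tokens  # still valid, nothing to do
    try:
        access_token, lifetime = refresh(tokens.refresh_token)
    except Exception as exc:
        raise ReconnectRequired("Token refresh failed; reconnect the account") from exc
    return StoredTokens(access_token, tokens.refresh_token, now + lifetime)


def fake_refresh(refresh_token: str) -> tuple[str, float]:
    # Stand-in for the real token endpoint call.
    return ("new-access-token", 3600.0)


expired = StoredTokens("old-access-token", "stored-refresh-token", expires_at=100.0)
fresh = ensure_fresh(expired, fake_refresh, now=200.0)
print(fresh.access_token, fresh.expires_at)
```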

Twitter / X (via xbird)

Enable the integration

Toggle xbird on. This installs the xbird MCP server (bunx @checkra1n/xbird).

Provide credentials

Click Connect and enter your auth_token and ct0 cookie values from x.com. To find these: open x.com in your browser, press F12, go to Application → Cookies → x.com, and copy the values.

Actions are discovered

OpenSwarm syncs credentials to ~/.config/xbird/config.json and discovers all available actions (search tweets, read profiles, post, like, follow, etc.).

The connected account handle (e.g., @yourhandle) is displayed after successful connection.

Reddit

Reddit requires no authentication. Actions (browse subreddits, search posts, get post details, user analysis) are discovered immediately.

Disconnecting an Integration

For OAuth integrations (Google Workspace): clicking Disconnect revokes the token on Google’s side and clears the stored tokens locally. You can then reconnect with a different account.

For credential-based integrations (xbird): disconnecting clears the stored credentials and removes them from the external config file.

Disabling an integration (toggling it off) does not disconnect it; it only prevents agents from using it. Your credentials and connection remain intact, so you can re-enable without re-authenticating.
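
The difference between disabling and disconnecting can be modelled as two independent pieces of state. This is a sketch under assumptions: the `Integration` class and its fields are hypothetical, and whether disconnecting also flips the toggle off is not specified here, so the sketch leaves the toggle unchanged.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Integration:
    name: str
    enabled: bool = False
    credentials: Optional[dict] = None  # tokens or cookies, depending on the service

    def disable(self) -> None:
        # Toggling off blocks agents but keeps the connection intact.
        self.enabled = False

    def enable(self) -> None:
        self.enabled = True

    def disconnect(self) -> None:
        # Disconnecting discards the stored credentials.
        self.credentials = None

    def usable_by_agents(self) -> bool:
        return self.enabled and self.credentials is not None


xbird = Integration("xbird", enabled=True, credentials={"auth_token": "example", "ct0": "example"})

xbird.disable()
print(xbird.usable_by_agents(), xbird.credentials is not None)

xbird.enable()  # no re-authentication needed
print(xbird.usable_by_agents())

xbird.disconnect()
print(xbird.usable_by_agents(), xbird.credentials is None)
```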
