The Action Library ships with built-in general purpose actions and first-party integrations for popular services. Everything is managed from the Actions page in the sidebar.

Built-in Action Sets

Built-in actions are organized into collapsible sections, each with its own enable/disable toggle.

Core Actions

These are always loaded into every agent session (unless disabled). They cover fundamental operations:
| Action | Category | Description |
| --- | --- | --- |
| Read | Filesystem | Read files and directories |
| Edit | Filesystem | Make targeted edits using search and replace |
| Write | Filesystem | Create new files or overwrite existing ones |
| Bash | System | Execute shell commands in a terminal |
| Glob | Search | Find files matching glob and wildcard patterns |
| Grep | Search | Search file contents using regular expressions |
| AskUserQuestion | Interaction | Ask the user a question and wait for a response |

Extended Actions (On-Demand)

Extended actions are deferred — they aren’t loaded at session start. Instead, the agent discovers them via ToolSearch when it needs them. This keeps the base tool set lean.
| Action | Category | Description |
| --- | --- | --- |
| WebSearch | Search | Search the web for real-time information |
| WebFetch | Search | Fetch and read content from a URL |
| NotebookEdit | Filesystem | Edit Jupyter notebook cells |
| TodoWrite | Planning | Write and manage a structured todo list |
| EnterPlanMode | Planning | Enter plan mode for designing solutions |
| ExitPlanMode | Planning | Exit plan mode and return to execution |
| EnterWorktree | System | Enter a git worktree for isolated work |
| TaskOutput | System | Read output from a background task |
| TaskStop | System | Stop a running background task |
| CronCreate | Scheduling | Create a scheduled or recurring task |
| CronList | Scheduling | List all scheduled tasks |
| CronDelete | Scheduling | Delete a scheduled task |
| RenderOutput | Views | Render a reusable View artifact with structured input data |
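The deferred-loading idea can be sketched as a small registry. This is an illustrative model only, not OpenSwarm's implementation: the `ActionRegistry` class, its fields, and the substring matching in `tool_search` are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ActionRegistry:
    """Hypothetical registry: core actions load eagerly, extended ones on demand."""

    core: dict[str, str]
    extended: dict[str, str]
    loaded: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Core actions are available from session start.
        self.loaded.update(self.core)

    def tool_search(self, query: str) -> list[str]:
        """Load and return extended actions whose name or description matches."""
        q = query.lower()
        hits = [
            name
            for name, desc in self.extended.items()
            if q in name.lower() or q in desc.lower()
        ]
        for name in hits:
            # Only matching actions join the loaded set; the rest stay deferred.
            self.loaded[name] = self.extended[name]
        return hits


registry = ActionRegistry(
    core={
        "Read": "Read files and directories",
        "Bash": "Execute shell commands in a terminal",
    },
    extended={
        "CronCreate": "Create a scheduled or recurring task",
        "CronList": "List all scheduled tasks",
        "WebFetch": "Fetch and read content from a URL",
    },
)
```

Calling `registry.tool_search("scheduled")` in this sketch loads only the two matching Cron actions and leaves `WebFetch` deferred, which is the sense in which the base tool set stays lean.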

Browser Actions

Browser automation is split into two layers.

Delegation layer (what the main agent calls):

| Action | Description |
| --- | --- |
| CreateBrowserAgent | Create a new browser and run a task on it |
| BrowserAgent | Delegate a task to an existing browser agent |
| BrowserAgents | Run multiple browser tasks in parallel |

Action layer (what the browser sub-agent executes):

| Action | Description |
| --- | --- |
| BrowserScreenshot | Capture a screenshot of the page |
| BrowserNavigate | Navigate to a URL |
| BrowserClick | Click an element by CSS selector |
| BrowserType | Type text into an input element |
| BrowserEvaluate | Execute JavaScript in the browser |
| BrowserGetText | Get visible text content of the page |
| BrowserGetElements | List interactive elements with CSS selectors |
| BrowserScroll | Scroll the page up or down |
| BrowserWait | Wait for page loads or animations |
The main agent never calls browser action tools directly. It delegates via CreateBrowserAgent or BrowserAgent, and a sub-agent handles the low-level browser interactions autonomously.
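The split between the two layers can be pictured as a simple role check. The sketch below is a hypothetical model: the role names `"main"` and `"browser_sub_agent"` and the `can_call` function are invented for illustration, and it covers only the browser-related actions listed above.

```python
# Action names come from the tables above; everything else is illustrative.
DELEGATION_ACTIONS = frozenset(
    {"CreateBrowserAgent", "BrowserAgent", "BrowserAgents"}
)
BROWSER_ACTIONS = frozenset(
    {
        "BrowserScreenshot",
        "BrowserNavigate",
        "BrowserClick",
        "BrowserType",
        "BrowserEvaluate",
        "BrowserGetText",
        "BrowserGetElements",
        "BrowserScroll",
        "BrowserWait",
    }
)


def can_call(role: str, action: str) -> bool:
    """Return True if the given (hypothetical) role may call a browser-related action."""
    if role == "main":
        # The main agent only delegates; it never drives the browser itself.
        return action in DELEGATION_ACTIONS
    if role == "browser_sub_agent":
        # The sub-agent performs the low-level page interactions.
        return action in BROWSER_ACTIONS
    return False
```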

Apps

If you’ve created Apps, they appear here as an additional action set. Each App is exposed as a RenderOutput call that the agent can invoke to display data. Views have their own per-item permission toggles.

Enabling and Disabling Sections

Each action set has a toggle switch in its header. Disabling a section sets all of its actions to deny, which means agents cannot use any of them. Re-enabling restores them to always_allow. You can also control permissions at a more granular level — see Permissions.
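As a rough model of the section toggle, assume permissions are stored as a mapping from action name to a permission string. The `deny` and `always_allow` values come from the text above; the mapping shape and the `set_section_enabled` helper are assumptions for illustration.

```python
DENY = "deny"
ALWAYS_ALLOW = "always_allow"


def set_section_enabled(
    permissions: dict[str, str],
    section_actions: list[str],
    enabled: bool,
) -> dict[str, str]:
    """Return a copy of `permissions` with every action in the section toggled."""
    updated = dict(permissions)
    for action in section_actions:
        # Disabling denies every action; re-enabling resets each to always_allow.
        updated[action] = ALWAYS_ALLOW if enabled else DENY
    return updated
```

In this sketch, re-enabling a section resets every action in it to `always_allow`, matching the behavior described above, so a finer-grained setting chosen before the section was disabled would not be preserved.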

First-Party Integrations

Integrations are pre-configured connections for popular services. They appear in their own section of the Action Library.

Google Workspace

1. Enable the integration
   Toggle Google Workspace on in the integrations section.

2. Connect your account
   Click Connect Google. A popup opens the Google OAuth consent screen. Sign in and grant the requested scopes.

3. Actions are discovered automatically
   After OAuth completes, OpenSwarm populates all available actions (Docs, Sheets, Slides, Calendar, Gmail, Drive, etc.) with their permission controls.

Once connected, the integration shows the connected account email (e.g., you@gmail.com), and you can disconnect or switch accounts at any time.

Token refresh is handled automatically. When an access token expires, OpenSwarm uses the stored refresh token to obtain a new one before the next agent session starts. If refresh fails, you’ll be prompted to reconnect.
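The refresh behavior can be sketched as a check that runs before a session starts. This is not OpenSwarm's code: `TokenSet`, `ReconnectRequired`, and `ensure_fresh_token` are hypothetical names, and the `refresh` callable stands in for whatever request exchanges a refresh token for new tokens.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TokenSet:
    access_token: str
    refresh_token: str
    expires_at: float  # Unix timestamp at which the access token expires


class ReconnectRequired(Exception):
    """Raised when refresh fails and the user must reconnect the account."""


def ensure_fresh_token(
    tokens: TokenSet,
    refresh: Callable[[str], TokenSet],
    now: Optional[float] = None,
) -> TokenSet:
    """Return usable tokens, refreshing them first if the access token expired."""
    now = time.time() if now is None else now
    if now < tokens.expires_at:
        return tokens  # Still valid; nothing to do.
    try:
        # Exchange the stored refresh token for a new token set.
        return refresh(tokens.refresh_token)
    except Exception as exc:
        # Any refresh failure is surfaced as a prompt to reconnect.
        raise ReconnectRequired("Token refresh failed; reconnect the account") from exc
```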

Twitter / X (via xbird)

1. Enable the integration
   Toggle xbird on. This installs the xbird MCP server (bunx @checkra1n/xbird).

2. Provide credentials
   Click Connect and enter your auth_token and ct0 cookie values from x.com. To find these: open x.com in your browser, press F12, go to Application → Cookies → x.com, and copy the values.

3. Actions are discovered
   OpenSwarm syncs credentials to ~/.config/xbird/config.json and discovers all available actions (search tweets, read profiles, post, like, follow, etc.).

The connected account handle (e.g., @yourhandle) is displayed after successful connection.
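The credential sync step can be pictured as writing the two cookie values into a JSON file. The sketch below is an assumption throughout: the key names `auth_token` and `ct0` mirror the cookie names, but the actual schema of xbird's config file is not documented here, and `sync_xbird_credentials` is a hypothetical helper.

```python
import json
from pathlib import Path


def sync_xbird_credentials(auth_token: str, ct0: str, config_path: Path) -> None:
    """Write the cookie values to a JSON config file (assumed schema)."""
    config_path.parent.mkdir(parents=True, exist_ok=True)
    # Preserve any unrelated keys already present in the file.
    existing = json.loads(config_path.read_text()) if config_path.exists() else {}
    existing.update({"auth_token": auth_token, "ct0": ct0})
    config_path.write_text(json.dumps(existing, indent=2))
    # The file holds session secrets, so restrict it to the owner where supported.
    config_path.chmod(0o600)
```

A call such as `sync_xbird_credentials(token, ct0, Path.home() / ".config" / "xbird" / "config.json")` would target the path named above. These cookie values grant access to the account, so they should be handled like a password.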

Reddit

Reddit requires no authentication. Actions (browse subreddits, search posts, get post details, user analysis) are discovered immediately.

Disconnecting an Integration

For OAuth integrations (Google Workspace): clicking Disconnect revokes the token on Google’s side and clears the stored tokens locally. You can then reconnect with a different account.

For credential-based integrations (xbird): disconnecting clears the stored credentials and removes them from the external config file.

Disabling an integration (toggling it off) does not disconnect it; it only prevents agents from using it. Your credentials and connection remain intact, so you can re-enable without re-authenticating.
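The difference between disabling and disconnecting can be modeled as two independent pieces of state. The `Integration` class below is a hypothetical illustration, not OpenSwarm's data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Integration:
    """Hypothetical model: `enabled` and `credentials` vary independently."""

    name: str
    enabled: bool = False
    credentials: Optional[dict] = None

    @property
    def connected(self) -> bool:
        return self.credentials is not None

    @property
    def usable_by_agents(self) -> bool:
        # Agents need the integration both switched on and connected.
        return self.enabled and self.connected

    def disable(self) -> None:
        # Toggling off keeps credentials, so re-enabling needs no re-auth.
        self.enabled = False

    def enable(self) -> None:
        self.enabled = True

    def disconnect(self) -> None:
        # Disconnecting discards the stored credentials.
        self.credentials = None
```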