# The Agentic OS Protocol

> The standard for multi-agent systems. Define agents, coordinate workflows, and build systems where agents work together.

---

# Apps


<Callout type="warn">
  This feature is experimental. The App schema and distribution manifest format
  have no real implementation yet and are subject to change.
</Callout>

## Overview

An app is a distribution of the Agentic OS — a manifest that declares which vendors implement which protocol interfaces, forming a complete agentic system configuration. Think of it like `package.json` for Node.js, `docker-compose.yml` for containers, or `vercel.json` for deployments: a single file that describes how all the pieces fit together.

The format is YAML frontmatter for the structured, machine-readable manifest, combined with a free-form markdown body for human-readable documentation. The filename is up to the author — `SYNER.md`, `APP.md`, `ACADEMY.md`, or anything else. The format is what matters.

## Distribution File

An app file combines a YAML frontmatter manifest with a markdown documentation body:

```md
---
name: '@synerops/syner-os'
version: '1.0.0'
description: 'Syner OS — AI-powered development platform'
protocol: '0.1.0'
providers:
  system:
    sandbox: { provider: '@vercel/sandbox' }
    fs: { provider: '@vercel/blob' }
  context:
    embeddings: { provider: '@upstash/vector' }
---

# Syner OS

Syner OS is an AI-powered development platform...
```

The frontmatter declares the distribution metadata and provider bindings. The markdown body is free-form — documentation, usage instructions, architecture notes, or anything else relevant to the distribution.

## TypeScript API

Import types from `@osprotocol/schema/apps`:

```ts
import type { App, AppMetadata, ProviderMap, ProviderEntry } from '@osprotocol/schema/apps'
```

### ProviderEntry

A single vendor binding for a protocol interface:

```ts
interface ProviderEntry {
  provider: string
  version?: string
  enabled?: boolean
  metadata?: Record<string, unknown>
}
```

| Field      | Type                      | Description                                                 |
| ---------- | ------------------------- | ----------------------------------------------------------- |
| `provider` | `string`                  | Package name or identifier of the vendor implementation     |
| `version`  | `string`                  | Optional semver range for the provider                      |
| `enabled`  | `boolean`                 | Whether this binding is active. Defaults to true if omitted |
| `metadata` | `Record<string, unknown>` | Arbitrary provider-specific configuration                   |

### ProviderMap

Maps protocol domains to their provider bindings, one entry per interface:

```ts
interface ProviderMap {
  system?: Record<string, ProviderEntry>
  context?: Record<string, ProviderEntry>
  actions?: Record<string, ProviderEntry>
  checks?: Record<string, ProviderEntry>
}
```

Each key within a domain corresponds to a specific protocol interface (e.g., `sandbox`, `fs`, `embeddings`), not the domain as a whole. Granularity is per interface.
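
To make the per-interface granularity concrete, the sketch below flattens a `ProviderMap` into `domain.interface` pairs. The types are local copies of the interfaces documented above so the snippet stands alone; `listBindings` and `DOMAINS` are hypothetical names for this illustration, not part of `@osprotocol/schema`.

```ts
// Local copies of the documented types, so the sketch is self-contained.
interface ProviderEntry {
  provider: string
  version?: string
  enabled?: boolean
  metadata?: Record<string, unknown>
}

interface ProviderMap {
  system?: Record<string, ProviderEntry>
  context?: Record<string, ProviderEntry>
  actions?: Record<string, ProviderEntry>
  checks?: Record<string, ProviderEntry>
}

const DOMAINS = ['system', 'context', 'actions', 'checks'] as const

// Hypothetical helper: one [key, provider] pair per bound interface.
function listBindings(providers: ProviderMap): Array<[string, string]> {
  const pairs: Array<[string, string]> = []
  for (const domain of DOMAINS) {
    for (const [iface, entry] of Object.entries(providers[domain] ?? {})) {
      pairs.push([`${domain}.${iface}`, entry.provider])
    }
  }
  return pairs
}

const bindings = listBindings({
  system: {
    sandbox: { provider: '@vercel/sandbox' },
    fs: { provider: '@vercel/blob' },
  },
  context: { embeddings: { provider: '@upstash/vector' } },
})
```

Each pair names exactly one interface, which is the unit a distribution binds; there is no way to bind a whole domain to a single vendor in this shape.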

### AppMetadata

The structured frontmatter of a distribution file:

```ts
interface AppMetadata {
  name: string
  version: string
  description?: string
  protocol?: string
  providers?: ProviderMap
  metadata?: Record<string, unknown>
}
```

| Field         | Type                      | Description                                              |
| ------------- | ------------------------- | -------------------------------------------------------- |
| `name`        | `string`                  | Distribution name, typically a scoped package identifier |
| `version`     | `string`                  | Semver version of this distribution                      |
| `description` | `string`                  | Human-readable description                               |
| `protocol`    | `string`                  | OS Protocol spec version this distribution targets       |
| `providers`   | `ProviderMap`             | Vendor bindings per domain and interface                 |
| `metadata`    | `Record<string, unknown>` | Arbitrary additional metadata                            |

### App

The parsed representation of a full distribution file:

```ts
interface App {
  metadata: AppMetadata
  content: string
  path: string
}
```

| Field      | Type          | Description                              |
| ---------- | ------------- | ---------------------------------------- |
| `metadata` | `AppMetadata` | Parsed frontmatter                       |
| `content`  | `string`      | Raw markdown body after the frontmatter  |
| `path`     | `string`      | Filesystem path to the distribution file |

## Provider Bindings

The `providers` field in the frontmatter maps each protocol domain's interfaces to concrete vendor implementations. Each domain (`system`, `context`, `actions`, `checks`) contains a record where each key is a specific interface name and each value is a `ProviderEntry`.

```yaml
providers:
  system:
    env:
      provider: '@vercel/env'
    sandbox:
      provider: '@vercel/sandbox'
      version: '^1.0.0'
  context:
    embeddings: { provider: '@upstash/vector' }
  checks:
    screenshot: { provider: 'playwright' }
    judge: { provider: '@braintrust/judge' }
```

This declaration says: for the `system` domain, use `@vercel/env` for environment access and `@vercel/sandbox` for sandboxed execution; for `context`, use `@upstash/vector` for embeddings; for `checks`, use Playwright for screenshots and Braintrust for LLM-as-judge evaluation.

Provider bindings are resolved at runtime by the OS Protocol host. Interfaces with no binding fall back to any default registered by the host environment.
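
A minimal sketch of that resolution order, under stated assumptions: the host keeps its defaults in the same `ProviderMap` shape, and a binding with `enabled: false` is treated as absent so the host default applies. Neither detail is specified above; `resolveProvider` and `hostDefaults` are hypothetical names.

```ts
// Local copies of the documented types, so the sketch is self-contained.
interface ProviderEntry {
  provider: string
  version?: string
  enabled?: boolean
  metadata?: Record<string, unknown>
}

interface ProviderMap {
  system?: Record<string, ProviderEntry>
  context?: Record<string, ProviderEntry>
  actions?: Record<string, ProviderEntry>
  checks?: Record<string, ProviderEntry>
}

// Hypothetical resolver: manifest binding first, then the host default.
function resolveProvider(
  manifest: ProviderMap,
  hostDefaults: ProviderMap,
  domain: keyof ProviderMap,
  interfaceName: string
): string | undefined {
  const bound = manifest[domain]?.[interfaceName]
  // `enabled` defaults to true when omitted, per the ProviderEntry table.
  if (bound !== undefined && bound.enabled !== false) {
    return bound.provider
  }
  // Assumption: a missing or disabled binding falls back to the host default.
  return hostDefaults[domain]?.[interfaceName]?.provider
}

const manifestBindings: ProviderMap = {
  system: {
    sandbox: { provider: '@vercel/sandbox' },
    fs: { provider: '@vercel/blob', enabled: false },
  },
}

// Hypothetical host defaults, invented for the example.
const hostDefaults: ProviderMap = {
  system: {
    fs: { provider: 'host-local-fs' },
    env: { provider: 'host-process-env' },
  },
}

const resolvedSandbox = resolveProvider(manifestBindings, hostDefaults, 'system', 'sandbox')
const resolvedFs = resolveProvider(manifestBindings, hostDefaults, 'system', 'fs')
const resolvedEnv = resolveProvider(manifestBindings, hostDefaults, 'system', 'env')
const resolvedJudge = resolveProvider(manifestBindings, hostDefaults, 'checks', 'judge')
```

A host could equally treat a disabled binding as "no provider at all" rather than falling back; the protocol text here does not settle that choice.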

## Usage Examples

### Load an app manifest

```ts
import { readFileSync } from 'fs'
import matter from 'gray-matter'
import type { App, AppMetadata } from '@osprotocol/schema/apps'

function loadApp(filePath: string): App {
  const raw = readFileSync(filePath, 'utf-8')
  const { data, content } = matter(raw)
  return {
    metadata: data as AppMetadata,
    content,
    path: filePath,
  }
}

const app = loadApp('./SYNER.md')
console.log(app.metadata.name)    // '@synerops/syner-os'
console.log(app.metadata.version) // '1.0.0'
```
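
The `data as AppMetadata` cast in the loader is unchecked, so malformed frontmatter would pass through silently. One option is a runtime type guard applied before the cast. The sketch below is an assumption about how a consumer might validate; `isRecord`, `isProviderMap`, and `isAppMetadata` are hypothetical helpers, and the types are local copies of the documented interfaces.

```ts
interface ProviderEntry {
  provider: string
  version?: string
  enabled?: boolean
  metadata?: Record<string, unknown>
}

interface ProviderMap {
  system?: Record<string, ProviderEntry>
  context?: Record<string, ProviderEntry>
  actions?: Record<string, ProviderEntry>
  checks?: Record<string, ProviderEntry>
}

interface AppMetadata {
  name: string
  version: string
  description?: string
  protocol?: string
  providers?: ProviderMap
  metadata?: Record<string, unknown>
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

// Checks that every domain is a record whose entries carry a string `provider`.
// Unknown domain names are tolerated here; a stricter check could reject them.
function isProviderMap(value: unknown): value is ProviderMap {
  if (!isRecord(value)) return false
  return Object.values(value).every(
    (domain) =>
      isRecord(domain) &&
      Object.values(domain).every(
        (entry) => isRecord(entry) && typeof entry.provider === 'string'
      )
  )
}

// Required fields must be strings; optional fields are checked only if present.
function isAppMetadata(value: unknown): value is AppMetadata {
  if (!isRecord(value)) return false
  if (typeof value.name !== 'string' || typeof value.version !== 'string') return false
  if (value.description !== undefined && typeof value.description !== 'string') return false
  if (value.protocol !== undefined && typeof value.protocol !== 'string') return false
  if (value.providers !== undefined && !isProviderMap(value.providers)) return false
  return true
}
```

With such a guard, a loader could throw a descriptive error when `isAppMetadata(data)` is false instead of returning a mistyped `App`.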

### Look up a provider binding

```ts
import type { ProviderMap } from '@osprotocol/schema/apps'

function getProviderForInterface(
  providers: ProviderMap,
  domain: keyof ProviderMap,
  interfaceName: string
): string | undefined {
  return providers[domain]?.[interfaceName]?.provider
}

const sandboxProvider = getProviderForInterface(
  app.metadata.providers ?? {},
  'system',
  'sandbox'
)
// '@vercel/sandbox'
```

### Compare two distributions

```ts
import type { App } from '@osprotocol/schema/apps'

function diffProviders(a: App, b: App): string[] {
  const differences: string[] = []
  const domains = ['system', 'context', 'actions', 'checks'] as const

  for (const domain of domains) {
    const aBindings = a.metadata.providers?.[domain] ?? {}
    const bBindings = b.metadata.providers?.[domain] ?? {}
    const allInterfaces = new Set([...Object.keys(aBindings), ...Object.keys(bBindings)])

    for (const iface of allInterfaces) {
      const aProvider = aBindings[iface]?.provider
      const bProvider = bBindings[iface]?.provider
      if (aProvider !== bProvider) {
        differences.push(`${domain}.${iface}: ${aProvider ?? 'none'} → ${bProvider ?? 'none'}`)
      }
    }
  }

  return differences
}
```

## Cross-References

The app manifest format is analogous to other ecosystem configuration files:

| Format                       | Ecosystem         | Defines                                                |
| ---------------------------- | ----------------- | ------------------------------------------------------ |
| `package.json`               | Node.js / npm     | Package dependencies and scripts                       |
| `docker-compose.yml`         | Docker            | Service definitions and bindings                       |
| `Chart.yaml`                 | Helm / Kubernetes | Chart metadata and dependencies                        |
| `claude_desktop_config.json` | Claude Desktop    | MCP server registrations                               |
| `mcp.json` (Cursor)          | Cursor            | MCP server registrations for Cursor                    |
| `.vscode/mcp.json`           | VS Code           | MCP server registrations for VS Code                   |
| `vercel.json`                | Vercel            | Deployment configuration and routing                   |
| App manifest (`*.md`)        | OS Protocol       | Agentic OS provider bindings and distribution metadata |

The app manifest occupies the same role in the Agentic OS ecosystem that these files occupy in their respective ecosystems: a single source of truth for how a system is configured and what implements each capability.

## Integration

Provider bindings in an app manifest map to the following protocol domains:

* [System](/docs/system) — environment, sandboxing, storage, file system interfaces
* [Context](/docs/context) — memory, embeddings, and knowledge retrieval interfaces
* [Actions](/docs/actions) — external integrations and side-effect interfaces
* [Checks](/docs/checks) — validation, evaluation, and quality assurance interfaces


---

# Architecture


## Overview

The Agentic OS Protocol (OSP) is built on a modular architecture that separates concerns and enables flexible composition of agent systems.

## Philosophy and Foundations

OSP doesn't create everything from scratch. Instead, it adapts proven philosophies and patterns that have demonstrated effectiveness in production environments, combined with our own contributions and experience in infrastructure design.

### Influences and Adaptations

OSP draws inspiration from several key sources:

* **[Anthropic: Building Effective Agents](https://www.anthropic.com/engineering/building-effective-agents)**: Workflow patterns (Routing, Prompt Chaining, Orchestrator-Workers, Parallelization, Evaluator-Optimizer) form the foundation of our workflow taxonomy.
* **[Agent Communication Protocol](https://agentcommunicationprotocol.dev/core-concepts/agent-run-lifecycle)**: The concept of **Runs**—essential for multi-agent systems—provides the lifecycle management framework.
* **[Claude Agent SDK](https://www.anthropic.com/engineering/building-agents-with-the-claude-agent-sdk)**: The **Agent Loop** execution pattern (Gather Context → Take Action → Verify Work → Iterate) defines our core cognitive cycle.

These patterns have been adapted and extended to work together in a unified protocol specification that emphasizes interoperability, scalability, and system-level orchestration.

### The Operating System Concept

Where OSP contributes uniquely is in the **Operating System** abstraction—a layer that provides system intelligence through standardized APIs. This OS layer manages the lifecycle, coordination, and resource management that multi-agent systems require.

> **Understanding the Metaphor**: Just as traditional operating systems abstract hardware resources (CPU, memory, disk), an Agentic OS abstracts cognitive resources (inference, context, knowledge, tools). The [Agentic OS concept](/docs/concepts/agentic-os) defines this paradigm—OSP is the protocol specification that implementations follow.

Just as traditional operating systems provide process management, memory management, and I/O interfaces, OSP's Operating System provides:

* **Agent Registry**: Discovery and capability management
* **System APIs**: Environment, Filesystem, Settings, Sandbox
* **Context Facades**: System Context, Embeddings, Key-Value
* **Quality Assurance**: Rules, Audit, Judge, Screenshot

## Architecture Layers

The protocol architecture is organized into distinct domains:

### 1. System (Infrastructure)

The OS layer provides infrastructure services that all agents depend on. Unlike traditional protocols that focus solely on agent-to-agent communication, OSP includes system-level intelligence:

* **Registry**: Manages agent registration, discovery, and capability matching
* **Environment**: Handles configuration and environment variable management
* **Filesystem**: Provides standardized file system operation interfaces
* **Sandbox**: Isolated execution environments for untrusted code
* **Settings**: Manages system-level configurations
* **Preferences**: User-level preferences and personalization
* **Installer**: Package and dependency management
* **MCP Client**: Integrates with Model Context Protocol for external tool access

### 2. Context & Actions (Read/Write Facades)

OSP separates read-only information gathering from state-changing operations. This mirrors the Agent Loop phases: **gather** context, then **take** action.

**Context** (read-only facades for the gather phase):

* **System Context**: Aggregated read-only view of system state (environment, settings, filesystem metadata)
* **Embeddings**: Vector database integration for semantic search
* **Key-Value**: Lightweight key-value storage for agent state

**Actions** (write facades for the act phase):

* **System Actions**: Aggregated write operations (filesystem writes, setting changes)
* **Tools**: Tool registration and invocation interfaces
* **MCP Servers**: External tool access through Model Context Protocol servers

### 3. Checks (Quality Assurance)

Built-in mechanisms for ensuring reliability, compliance, and quality:

* **Rules**: Behavioral constraints and validation frameworks
* **Judge**: LLM-based evaluation of agent outputs and decisions
* **Audit**: Comprehensive monitoring and logging of agent behavior
* **Screenshot**: Visual validation for UI-related tasks

### 4. Workflows (Execution Patterns)

Composable patterns for agent coordination, based on [Anthropic's building blocks for agentic systems](https://www.anthropic.com/engineering/building-effective-agents):

* **Routing**: Classify input and delegate to the appropriate handler
* **Parallelization**: Split work across parallel branches and merge results
* **Orchestrator-Workers**: Plan, delegate to workers, synthesize outputs
* **Evaluator-Optimizer**: Generate, evaluate, refine in a loop
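
As one illustration of these patterns, the Evaluator-Optimizer loop might be sketched as follows. This is not the OSP workflow API: `Evaluation`, `evaluatorOptimizer`, and the `maxRounds` cap are names and choices assumed for the example.

```ts
interface Evaluation {
  pass: boolean
  feedback: string
}

interface OptimizerResult {
  output: string
  rounds: number
  passed: boolean
}

// Generate a draft, evaluate it, and feed the evaluator's feedback into the
// next generation until the draft passes or the round budget is exhausted.
async function evaluatorOptimizer(
  generate: (feedback?: string) => Promise<string>,
  evaluate: (draft: string) => Promise<Evaluation>,
  maxRounds = 3
): Promise<OptimizerResult> {
  let feedback: string | undefined
  let output = ''
  for (let round = 1; round <= maxRounds; round++) {
    output = await generate(feedback)
    const result = await evaluate(output)
    if (result.pass) {
      return { output, rounds: round, passed: true }
    }
    feedback = result.feedback
  }
  // Budget exhausted: return the last draft and flag that it did not pass.
  return { output, rounds: maxRounds, passed: false }
}
```

In a real system `generate` and `evaluate` would typically be LLM calls; here they are plain async functions so the control flow stays visible.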

### 5. Runs (Lifecycle Control)

Every agent execution is a **Run** with a well-defined lifecycle and control mechanisms:

* **Timeout**: Time-based execution limits
* **Retry**: Automatic retry with configurable strategies
* **Cancel**: Graceful cancellation with cleanup
* **Approval**: Human-in-the-loop gates for sensitive operations
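
A rough sketch of how timeout, retry, and cancel could compose around a single task, using the standard `AbortController` API. `RunOptions` and `runWithControls` are hypothetical names, not the OSP Runs interface, and approval gates are left out for brevity.

```ts
interface RunOptions {
  timeoutMs: number
  maxAttempts: number
  /** Optional external cancellation signal. */
  signal?: AbortSignal
}

async function runWithControls<T>(
  task: (signal: AbortSignal) => Promise<T>,
  options: RunOptions
): Promise<T> {
  let lastError: unknown = new Error('No attempts were made')
  for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
    // Cancel: never start a new attempt once the caller has cancelled.
    if (options.signal?.aborted) throw new Error('Run cancelled')

    const controller = new AbortController()
    const onCancel = () => controller.abort()
    options.signal?.addEventListener('abort', onCancel, { once: true })
    // Timeout: abort this attempt when the time budget is spent.
    const timer = setTimeout(() => controller.abort(), options.timeoutMs)
    // Rejects on abort, so the attempt ends even if the task ignores the signal.
    const aborted = new Promise<never>((_, reject) => {
      controller.signal.addEventListener(
        'abort',
        () => reject(new Error('Attempt aborted')),
        { once: true }
      )
    })

    try {
      return await Promise.race([task(controller.signal), aborted])
    } catch (error) {
      if (options.signal?.aborted) throw new Error('Run cancelled')
      // Retry: remember the failure and move on to the next attempt.
      lastError = error
    } finally {
      clearTimeout(timer)
      options.signal?.removeEventListener('abort', onCancel)
    }
  }
  throw lastError
}
```

This version retries immediately; a backoff delay between attempts would be a natural extension of the "configurable strategies" mentioned above.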

## Core Execution Model

### The Agent Loop

At the heart of every agent is the **Agent Loop**, the cognitive cycle that drives execution:

<Mermaid
  chart="flowchart TD
    Start([Agent Loop])
    Gather[Gather Context]
    Action[Take Action]
    Verify[Verify Work]
    Iterate{Complete?}
    End([End])
    
    Start --> Gather
    Gather --> Action
    Action --> Verify
    Verify --> Iterate
    Iterate -->|No| Gather
    Iterate -->|Yes| End"
/>

This loop executes within workflows and is managed by the Operating System during the Execution phase of the Agent Lifecycle.
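
The cycle in the diagram can be sketched as a generic loop. `LoopSteps`, `agentLoop`, and the `maxIterations` safety cap are assumptions made for this illustration; the protocol text here does not prescribe these names or an iteration limit.

```ts
interface LoopSteps<Context, Result> {
  /** Read-only phase; receives the previous result, if any. */
  gatherContext: (previous: Result | undefined) => Promise<Context>
  /** State-changing phase. */
  takeAction: (context: Context) => Promise<Result>
  /** Returns true when the work is complete. */
  verifyWork: (result: Result) => Promise<boolean>
}

interface LoopOutcome<Result> {
  result: Result | undefined
  iterations: number
  complete: boolean
}

async function agentLoop<Context, Result>(
  steps: LoopSteps<Context, Result>,
  maxIterations = 5
): Promise<LoopOutcome<Result>> {
  let result: Result | undefined
  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    const context = await steps.gatherContext(result) // Gather Context
    result = await steps.takeAction(context)          // Take Action
    if (await steps.verifyWork(result)) {             // Verify Work
      return { result, iterations: iteration, complete: true }
    }
    // Not complete yet: iterate, starting again from the gather phase.
  }
  return { result, iterations: maxIterations, complete: false }
}
```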

Learn more: [Agent Loop](/docs/concepts/agent-loop)

### The Agent Lifecycle

At the system level, agents follow a **Lifecycle** that spans their entire existence:

<Mermaid
  chart="flowchart LR
    Reg[Registration]
    Disc[Discovery]
    Exec[Execution]
    Eval[Evaluation]
    
    Reg --> Disc
    Disc --> Exec
    Exec --> Eval"
/>

* **Registration**: Agents declare capabilities to the Registry
* **Discovery**: The OS matches agents to tasks based on capabilities
* **Execution**: The Agent Loop runs within workflows
* **Evaluation**: Performance, quality, and compliance are assessed
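
The Registration and Discovery phases can be pictured with a small in-memory registry. This is a sketch only: `AgentRecord`, `InMemoryRegistry`, and the rule that an agent matches when it declares every required capability are assumptions, not the OSP Registry interface.

```ts
interface AgentRecord {
  id: string
  capabilities: string[]
}

class InMemoryRegistry {
  private agents = new Map<string, AgentRecord>()

  // Registration: an agent declares its capabilities.
  register(agent: AgentRecord): void {
    this.agents.set(agent.id, agent)
  }

  // Discovery: return agents that declare every required capability.
  discover(required: string[]): AgentRecord[] {
    return Array.from(this.agents.values()).filter((agent) =>
      required.every((capability) => agent.capabilities.includes(capability))
    )
  }
}

const registry = new InMemoryRegistry()
registry.register({ id: 'reviewer', capabilities: ['code-review', 'typescript'] })
registry.register({ id: 'writer', capabilities: ['docs'] })

const matches = registry.discover(['code-review'])
```

A production registry would likely rank candidates rather than filter on exact capability names, but the two-step shape of declare then match stays the same.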

Learn more: [Agent Lifecycle](/docs/concepts/lifecycle)

## Integration Patterns

OSP is designed to integrate with existing protocols and systems:

### MCP Integration

OSP includes native support for the Model Context Protocol, allowing agents to access external tools and resources through standardized MCP servers. The OS provides MCP client functionality that any agent can leverage.

### Workflow Orchestration

The protocol defines workflow patterns that can be composed and combined. Workflows handle coordination; Runs handle lifecycle control (timeouts, retries, cancellation, approval).

Learn more: [Workflows](/docs/workflows) · [Runs](/docs/runs)

### Multi-Agent Coordination

OSP enables agents from different implementations to work together through standardized interfaces. Agents can coordinate tasks, share context (with proper isolation), and participate in distributed workflows.

## Design Philosophy

The architecture prioritizes:

* **Modularity**: Components can be used independently or together
* **Extensibility**: New components and capabilities can be added without breaking existing implementations
* **Interoperability**: Different implementations can work together through standardized contracts
* **Reliability**: Built-in quality assurance through Rules, Audit, and Judge
* **Observability**: Comprehensive monitoring, auditing, and evaluation capabilities
* **Scalability**: Designed to grow from single-agent systems to complex multi-agent environments

## Next Steps

* Learn about **[System](/docs/system)** for infrastructure services
* Understand **[Context](/docs/context)** and **[Actions](/docs/actions)** for the read/write split
* Review **[Checks](/docs/checks)** for quality assurance mechanisms
* Explore **[Workflows](/docs/workflows)** for coordination patterns
* See **[Runs](/docs/runs)** for lifecycle control


---

# Introduction


import { Orbit, Boxes, CircuitBoard, Server } from 'lucide-react';

## What is the Agentic OS Protocol?

The Agentic OS Protocol (OSP) is a specification—a shared contract that defines the interfaces, behaviors, and data formats for orchestrating AI agents at scale.

Think of it as the blueprint you implement: it tells you which domains exist (System, Context, Actions, Checks, Workflows, Runs), how they interact, and what "conformant" behavior looks like. It's not a runtime or framework—you build to it.

### Protocol Domains

| Domain                       | Purpose                                                                                  |
| ---------------------------- | ---------------------------------------------------------------------------------------- |
| [System](/docs/system)       | Infrastructure — registry, environment, filesystem, settings                             |
| [Context](/docs/context)     | Read-only facades — system context, embeddings, key-value                                |
| [Actions](/docs/actions)     | Write facades — system actions, tools, MCP servers                                       |
| [Checks](/docs/checks)       | Quality assurance — rules, judge, audit, screenshot                                      |
| [Workflows](/docs/workflows) | Execution patterns — routing, parallelization, orchestrator-workers, evaluator-optimizer |
| [Runs](/docs/runs)           | Lifecycle control — timeout, retry, cancel, approval                                     |

## Getting Started

OSP defines the contract for building agent systems that can work together seamlessly, with standardized patterns for coordination, quality assurance, and context management. Below are the key entry points to understand and implement the protocol.

<Cards>
  <Card icon={<Boxes />} href="/docs/architecture" title="See the big picture">
    Explore the architecture and see how everything fits together before you dive into the details.
  </Card>

  <Card icon={<Orbit />} href="/docs/concepts/agent-loop" title="Agent Loop">
    Gather Context → Take Action → Verify Work → Iterate
  </Card>

  <Card icon={<CircuitBoard />} href="/docs/system" title="System Intelligence">
    Registry, Environment, Filesystem, Settings—the infrastructure layer.
  </Card>

  <Card icon={<Server />} href="/docs/concepts/agentic-os" title="What is Agentic OS?">
    The architectural paradigm where the LLM functions as the Kernel of the system.
  </Card>
</Cards>


---

# Motivation


## The Challenge

As AI agents become more sophisticated and capable, we face new challenges in orchestrating, managing, and executing them at scale. Traditional approaches to agent management often fall short once a system grows beyond a single agent running in isolation.

Picture this: you start with one agent handling a simple task. It works perfectly. Then you need two agents to work together. Still manageable. But as you add more agents—each with different capabilities, running across different environments, coordinating complex workflows—things quickly become messy. What seemed simple at small scale reveals complexity you didn't anticipate.

## When Systems Grow

Orchestration, at its core, is about coordinating multiple agents to work together toward a common goal. When you have a handful of agents, coordination feels straightforward. But scale changes everything. Your agents start running across different environments—some in the cloud, others on edge devices, each with different capabilities and constraints. Coordinating them becomes a challenge in itself. Then workflows get complex: one agent's output becomes another's input, creating chains of dependencies that span multiple agents and environments. A failure in one step can cascade through the entire process. You find yourself managing not just individual agents, but intricate relationships between them—who depends on whom, what happens when something fails, how to retry, how to recover.

As your system grows, new questions emerge that traditional approaches struggle to answer. How do you ensure quality when you can't monitor everything manually? How do you maintain context when agents operate independently across sessions? How do you scale from ten agents to a thousand without everything breaking? In distributed systems where agents collaborate, quality is emergent—it's about how agents interact, not just how each performs alone. Context management becomes critical: agents need to share information but also maintain isolation. Without standardized approaches, everyone solves these problems differently. Agent platforms built by different teams can't interoperate. The ecosystem fragments, and innovation slows because everyone is reinventing the same solutions.

## Why a Protocol

This is where protocols shine. A protocol defines a shared contract—the interfaces, behaviors, and data formats that enable interoperability. Just as HTTP allows any web browser to communicate with any web server, a protocol for agent orchestration would allow different implementations to work together while remaining free to innovate in their specific domains.

A protocol, unlike a framework or library, doesn't prescribe implementation details. It defines *what* must be supported and *how* components should interact, but leaves the way you build it up to you. This flexibility is crucial: teams working in different languages, under different constraints, and with different use cases can all implement the same protocol and achieve interoperability.

## What OSP Does

OSP provides standardized patterns for agent coordination, quality assurance, and resource management: proven patterns that are reusable but not prescriptive. It provides infrastructure for common problems such as agent discovery, context management, and quality monitoring. And it is designed for scale from the start, so systems can grow from a single agent to complex multi-agent environments without fundamental redesigns.

With a standardized protocol, the entire ecosystem benefits. Developers can build agent systems knowing they'll interoperate with others. Teams can share agents, workflows, and patterns. Agents from different platforms can collaborate on complex tasks. Workflows can span multiple systems. The ecosystem becomes composable—you can combine agents and tools from different sources, knowing they'll work together because they follow the same protocol. This creates the foundation for systems where agents collaborate at unprecedented scale, where workflows span organizations and platforms, where the whole ecosystem is greater than the sum of its parts.

## Building in the Open

The Agentic OS Protocol is in active development, maintained by [SynerOps](https://synerops.com), and we're building it in the open.

Why? Because the protocol needs to solve real problems, work in real environments, and evolve based on how people actually use it.

We welcome contributions, feedback, and collaboration. Whether you're implementing the protocol, using it in production, researching agent systems, or just curious about what's possible—your perspective matters. Together, we're not just defining a protocol; we're shaping how agents will work together for years to come.


---

# MCP Servers


<Callout type="warn">
  This interface is experimental. No real implementation exists yet. The API surface
  may change as the MCP ecosystem and OS Protocol integration patterns mature.
</Callout>

## Overview

`McpServers` is the agent-facing interface for MCP-specific capabilities that go beyond tool execution. Resources (data and content exposed by a server) and prompts (reusable templates) are accessed through this interface. Tool execution from MCP servers goes through the unified Tools interface, not here. Connection management and server lifecycle are handled at the infrastructure level by `system/mcp-client`. Provider analogues include Anthropic MCP and the AAIF MCP Standard.

## Architecture

<Mermaid
  chart="flowchart TD
  Agent[&#x22;Agent&#x22;]

  Agent -->|&#x22;tool calls&#x22;| Tools[&#x22;Tools interface&#x22;]
  Agent -->|&#x22;resources & prompts&#x22;| McpServers[&#x22;McpServers interface&#x22;]

  Tools --> MCP[&#x22;MCP Server&#x22;]
  McpServers --> MCP

  McpClient[&#x22;system/McpClient&#x22;] -.->|&#x22;manages connection&#x22;| MCP"
/>

## TypeScript API

```ts
import type { McpServers, McpResource, McpPrompt } from "@osprotocol/schema/actions/mcp-servers"
```

### McpResource

Represents a data or content resource exposed by an MCP server.

```ts
interface McpResource {
  uri: string
  name: string
  mimeType?: string
  description?: string
  metadata?: Record<string, unknown>
}
```

| Field         | Type                      | Description                                        |
| ------------- | ------------------------- | -------------------------------------------------- |
| `uri`         | `string`                  | Unique resource identifier within the server       |
| `name`        | `string`                  | Human-readable resource name                       |
| `mimeType`    | `string`                  | Optional MIME type of the resource content         |
| `description` | `string`                  | Optional description of what the resource contains |
| `metadata`    | `Record<string, unknown>` | Optional server-defined metadata                   |

### McpPrompt

Represents a reusable prompt template provided by an MCP server.

```ts
interface McpPrompt {
  name: string
  description?: string
  arguments?: object
  metadata?: Record<string, unknown>
}
```

| Field         | Type                      | Description                                  |
| ------------- | ------------------------- | -------------------------------------------- |
| `name`        | `string`                  | Unique prompt name within the server         |
| `description` | `string`                  | Optional description of the prompt's purpose |
| `arguments`   | `object`                  | Optional argument schema for the prompt      |
| `metadata`    | `Record<string, unknown>` | Optional server-defined metadata             |

### McpServers

The primary interface agents use to interact with MCP server resources and prompts.

```ts
interface McpServers {
  listResources(server: string): Promise<McpResource[]>
  readResource(server: string, uri: string): Promise<string | null>
  listPrompts(server: string): Promise<McpPrompt[]>
  getPrompt(server: string, name: string, args?: Record<string, string>): Promise<string | null>
}
```

| Method                           | Description                                                      |
| -------------------------------- | ---------------------------------------------------------------- |
| `listResources(server)`          | List all resources available on the given MCP server             |
| `readResource(server, uri)`      | Read the content of a specific resource by URI                   |
| `listPrompts(server)`            | List all prompt templates available on the given MCP server      |
| `getPrompt(server, name, args?)` | Retrieve a rendered prompt by name, optionally passing arguments |

## Usage Examples

### List and read resources from a server

```ts
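import type { McpServers } from "@osprotocol/schema/actions/mcp-servers"

// Assumes the host supplies an McpServers implementation; how it is obtained is not specified here
declare const mcp: McpServers

const resources = await mcp.listResources("knowledge-base")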

for (const resource of resources) {
  console.log(`${resource.name} (${resource.mimeType ?? "unknown type"})`)
}

const content = await mcp.readResource("knowledge-base", "docs://api-reference")
if (content) {
  // process the resource content
}
```

### Get a prompt with arguments

```ts
const rendered = await mcp.getPrompt("code-assistant", "explain-function", {
  language: "typescript",
  context: "async generator",
})

if (rendered) {
  // use the rendered prompt text in an LLM call
}
```

### Discover what an MCP server offers

```ts
const [resources, prompts] = await Promise.all([
  mcp.listResources("my-server"),
  mcp.listPrompts("my-server"),
])

console.log(`Resources: ${resources.map((r) => r.name).join(", ")}`)
console.log(`Prompts: ${prompts.map((p) => p.name).join(", ")}`)
```
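
Since no real implementation exists yet, an in-memory stub can stand in for `McpServers` when trying out the examples above. The sketch below is an assumption throughout: `FakeServer`, `createFakeMcpServers`, the `{{name}}` placeholder syntax, and the choice to return empty lists or `null` for unknown servers are all invented for illustration. The three interfaces are local copies of the ones documented on this page.

```ts
interface McpResource {
  uri: string
  name: string
  mimeType?: string
  description?: string
  metadata?: Record<string, unknown>
}

interface McpPrompt {
  name: string
  description?: string
  arguments?: object
  metadata?: Record<string, unknown>
}

interface McpServers {
  listResources(server: string): Promise<McpResource[]>
  readResource(server: string, uri: string): Promise<string | null>
  listPrompts(server: string): Promise<McpPrompt[]>
  getPrompt(server: string, name: string, args?: Record<string, string>): Promise<string | null>
}

// Hypothetical fixture shape: contents keyed by URI, templates keyed by prompt name.
interface FakeServer {
  resources: McpResource[]
  contents: Record<string, string>
  prompts: McpPrompt[]
  templates: Record<string, string>
}

function createFakeMcpServers(servers: Record<string, FakeServer>): McpServers {
  return {
    async listResources(server) {
      return servers[server]?.resources ?? []
    },
    async readResource(server, uri) {
      return servers[server]?.contents[uri] ?? null
    },
    async listPrompts(server) {
      return servers[server]?.prompts ?? []
    },
    async getPrompt(server, name, args = {}) {
      const template = servers[server]?.templates[name]
      if (template === undefined) return null
      // Substitute {{key}} placeholders; unmatched placeholders are left as-is.
      return template.replace(/\{\{(\w+)\}\}/g, (match, key: string) => args[key] ?? match)
    },
  }
}

const fakeMcp = createFakeMcpServers({
  'knowledge-base': {
    resources: [{ uri: 'docs://api-reference', name: 'API Reference', mimeType: 'text/markdown' }],
    contents: { 'docs://api-reference': '# API Reference' },
    prompts: [{ name: 'explain-function', description: 'Explain a function' }],
    templates: { 'explain-function': 'Explain this {{language}} function.' },
  },
})
```

The stub satisfies the `McpServers` interface, so it can be passed anywhere the examples above use `mcp`.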

## Integration

* [Tools](/docs/actions/tools) — unified interface for tool execution, including tools from MCP servers
* [MCP Client](/docs/system/mcp-client) — infrastructure-level connection management for MCP servers
* [SystemActions](/docs/actions/system) — broader system actions context


---

# System Actions


<Callout type="warn">
  This interface is experimental — no production implementation exists yet.
  The API surface may change.
</Callout>

## Overview

System Actions composes all system write interfaces into a single entry point for the actions phase of the agent loop. It is a pure facade — each system API owns its own Actions interface, and `SystemActions` re-exports them under a unified namespace.

The read counterpart is [System Context](/docs/context/system), which provides read-only access to the same system interfaces.

<Mermaid
  chart="flowchart LR
    Agent([Agent]) --> SA[SystemActions]
    SA --> env[EnvActions]
    SA --> fs[FsActions]
    SA --> sandbox[SandboxActions]
    SA --> settings[SettingsActions]
    SA --> preferences[PreferencesActions]
    SA --> registry[RegistryActions]
    SA --> installer[InstallerActions]
    SA --> mcp[McpActions]"
/>

## TypeScript API

```ts
import type { SystemActions } from '@osprotocol/schema/actions/system'
```

### SystemActions

Composes all system write interfaces.

```ts
interface SystemActions {
  /** Environment variables */
  env: EnvActions
  /** System-wide settings */
  settings: SettingsActions
  /** Scoped preferences */
  preferences: PreferencesActions
  /** Resource registries */
  registry: RegistryActions
  /** Host filesystem */
  fs: FsActions
  /** Sandbox environments */
  sandbox: SandboxActions
  /** Installed packages */
  installer: InstallerActions
  /** MCP server connections */
  mcp: McpActions
}
```

Individual Actions interfaces are also re-exported:

```ts
import type {
  EnvActions,
  SettingsActions,
  PreferencesActions,
  RegistryActions,
  FsActions,
  SandboxActions,
  InstallerActions,
  McpActions,
} from '@osprotocol/schema/actions/system'
```

## Usage Examples

### Mutate system state through the facade

```ts
// Set an environment variable
await system.env.set({ key: 'DATABASE_URL', value: 'postgres://...' })

// Create a sandbox for code execution
const sandbox = await system.sandbox.create({ runtime: 'node24', timeout: 60000 })

// Install a dependency
await system.installer.install({ name: '@osprotocol/schema', version: '^0.2.0' })
```

### Use individual Actions interfaces directly

```ts
// When you only need filesystem write access
async function saveArtifact(fs: FsActions, path: string, content: string) {
  await fs.write(path, content)
}
```

## Design Rationale

The agent loop enforces read/write separation by phase:

* **Context phase** → read-only (`SystemContext` in `context/system.ts`)
* **Actions phase** → write operations (`SystemActions` in `actions/system.ts`)

This zero-trust pattern ensures agents gather all context before mutating state. The facade is pure composition — it adds no logic, just groups the individual Actions interfaces for convenience.
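
The "pure composition" point can be illustrated with a minimal sketch. The `EnvActions` and `FsActions` shapes below are simplified stand-ins inferred from the usage examples above, not the published schema, and `composeSystemActions`, `MiniSystemActions`, and the in-memory stores are hypothetical names. The sketch only shows that a facade of this kind hands back the same objects it was given.

```ts
// Simplified stand-ins for two of the Actions interfaces (assumed shapes,
// inferred from the usage examples rather than taken from the schema).
interface EnvActions {
  set(input: { key: string; value: string }): Promise<void>
}
interface FsActions {
  write(path: string, content: string): Promise<void>
}

// A reduced facade carrying only the two members sketched here.
interface MiniSystemActions {
  env: EnvActions
  fs: FsActions
}

// Hypothetical helper: pure composition, no wrapping and no added logic.
function composeSystemActions(parts: MiniSystemActions): MiniSystemActions {
  return { env: parts.env, fs: parts.fs }
}

// In-memory implementations, for illustration only.
const envStore = new Map<string, string>()
const fileStore = new Map<string, string>()

const memoryEnv: EnvActions = {
  async set({ key, value }) {
    envStore.set(key, value)
  },
}
const memoryFs: FsActions = {
  async write(path, content) {
    fileStore.set(path, content)
  },
}

const miniSystem = composeSystemActions({ env: memoryEnv, fs: memoryFs })
```

Because the members are passed through unchanged, code that accepts a single interface such as `FsActions` behaves the same whether it receives the standalone object or `miniSystem.fs`.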

## Integration

System Actions integrates with:

* **[System Context](/docs/context/system)**: Read counterpart — same interfaces, read-only view
* **[Tools](/docs/actions/tools)**: System mutations can be exposed as agent tools
* **[Audit](/docs/checks/audit)**: Audit trails record system mutations for verification


---

# Tools


<Callout type="warn">
  This interface is experimental — no production implementation exists yet.
  The API surface may change.
</Callout>

## Overview

`Tools` provides a unified surface for agent tool discovery and execution. Agents call tools without needing to know whether they originate from MCP servers, built-in capabilities, or custom providers — the implementation aggregates all sources transparently. Provider analogues include Anthropic MCP Tools, OpenAI Function Calling, Vercel AI SDK Tools, and LangChain Tools.

## TypeScript API

```ts
import type { Tool, ToolResult, Tools } from "@osprotocol/schema/actions/tools"
```

### Tool

Represents a single callable tool with a name, description, parameter schema, and execute function.

```ts
interface Tool<TParams = unknown, TResult = unknown> {
  name: string
  description: string
  parameters?: object
  execute(params: TParams): Promise<TResult>
  metadata?: Record<string, unknown>
}
```

| Field         | Type                                    | Description                                         |
| ------------- | --------------------------------------- | --------------------------------------------------- |
| `name`        | `string`                                | Unique tool identifier                              |
| `description` | `string`                                | Human-readable description of what the tool does    |
| `parameters`  | `object`                                | Optional JSON Schema describing accepted parameters |
| `execute`     | `(params: TParams) => Promise<TResult>` | Function that performs the tool action              |
| `metadata`    | `Record<string, unknown>`               | Optional extra data attached to the tool            |

### ToolResult

Returned by `Tools.execute`. Wraps the result with success/error state.

```ts
interface ToolResult<T = unknown> {
  toolName: string
  result: T
  success: boolean
  error?: string
  metadata?: Record<string, unknown>
}
```

| Field      | Type                      | Description                               |
| ---------- | ------------------------- | ----------------------------------------- |
| `toolName` | `string`                  | Name of the tool that was executed        |
| `result`   | `T`                       | The value returned by the tool            |
| `success`  | `boolean`                 | Whether execution completed without error |
| `error`    | `string`                  | Error message if `success` is `false`     |
| `metadata` | `Record<string, unknown>` | Optional extra data from the execution    |

### Tools

The primary interface agents use to discover and invoke tools.

```ts
interface Tools {
  get(name: string): Promise<Tool | null>
  list(): Promise<Tool[]>
  execute<T = unknown>(name: string, params?: unknown): Promise<ToolResult<T>>
}
```

| Method                   | Returns                  | Description                                            |
| ------------------------ | ------------------------ | ------------------------------------------------------ |
| `get(name)`              | `Promise<Tool \| null>`  | Retrieve a tool by name, or `null` if not found        |
| `list()`                 | `Promise<Tool[]>`        | Return all available tools from all registered sources |
| `execute(name, params?)` | `Promise<ToolResult<T>>` | Invoke a tool by name with optional parameters         |
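
One way an implementation might aggregate several sources behind `Tools` is sketched below. `createTools` is a hypothetical factory, not part of the schema. Two behaviors are assumptions rather than protocol requirements: the first source wins on a name clash, and thrown errors or unknown tool names resolve to `success: false` instead of rejecting.

```ts
interface Tool<TParams = unknown, TResult = unknown> {
  name: string
  description: string
  parameters?: object
  execute(params: TParams): Promise<TResult>
  metadata?: Record<string, unknown>
}

interface ToolResult<T = unknown> {
  toolName: string
  result: T
  success: boolean
  error?: string
  metadata?: Record<string, unknown>
}

interface Tools {
  get(name: string): Promise<Tool | null>
  list(): Promise<Tool[]>
  execute<T = unknown>(name: string, params?: unknown): Promise<ToolResult<T>>
}

// Hypothetical factory: merges tool lists from any number of sources.
// Assumption: on a name clash, the first source wins.
function createTools(...sources: Tool[][]): Tools {
  const byName = new Map<string, Tool>()
  for (const source of sources) {
    for (const tool of source) {
      if (!byName.has(tool.name)) byName.set(tool.name, tool)
    }
  }

  return {
    async get(name) {
      return byName.get(name) ?? null
    },
    async list() {
      return Array.from(byName.values())
    },
    async execute<T = unknown>(name: string, params?: unknown): Promise<ToolResult<T>> {
      const tool = byName.get(name)
      if (!tool) {
        return { toolName: name, result: undefined as T, success: false, error: `Unknown tool: ${name}` }
      }
      try {
        const result = (await tool.execute(params)) as T
        return { toolName: name, result, success: true }
      } catch (err) {
        // Assumed policy: surface failures through ToolResult instead of rejecting.
        const error = err instanceof Error ? err.message : String(err)
        return { toolName: name, result: undefined as T, success: false, error }
      }
    },
  }
}

// Example sources: a built-in tool and one standing in for an MCP-provided tool.
const builtInTools: Tool[] = [
  {
    name: 'add',
    description: 'Adds two numbers',
    parameters: { type: 'object', properties: { a: { type: 'number' }, b: { type: 'number' } } },
    async execute(params) {
      const { a, b } = params as { a: number; b: number }
      return a + b
    },
  },
]
const mcpTools: Tool[] = [
  {
    name: 'fail',
    description: 'Always throws, to show error wrapping',
    async execute() {
      throw new Error('boom')
    },
  },
]

const aggregatedTools = createTools(builtInTools, mcpTools)
```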

## Usage Examples

### Execute a tool

```ts
const result = await tools.execute("read_file", { path: "/data/config.json" })

if (result.success) {
  console.log(result.result)
} else {
  console.error(result.error)
}
```

### List available tools and find one

```ts
const allTools = await tools.list()
const searchTool = allTools.find((t) => t.name === "web_search")

if (searchTool) {
  console.log(searchTool.description)
  console.log(searchTool.parameters)
}
```

### Handle tool errors

```ts
const result = await tools.execute<string>("send_email", {
  to: "user@example.com",
  subject: "Hello",
  body: "Message body",
})

if (!result.success) {
  // Log the error and fall back gracefully
  console.error(`Tool "${result.toolName}" failed: ${result.error}`)
}
```

## Integration

* [MCP Servers](/docs/actions/mcp-servers) — tool sources exposed over the Model Context Protocol
* [MCP Client](/docs/system/mcp-client) — connects to MCP servers and surfaces their tools
* [SystemActions](/docs/actions/system) — built-in system-level actions available as tools


---

# Audit


## Overview

Agents generate formal audit reports as markdown files with YAML frontmatter. The frontmatter conforms to the `AuditEntry` schema, enabling machine-parseable compliance records.

The schema is aligned with **ISO 27001** audit reporting and **ISACA/ITAF** expression of opinion standards.

* **Implementation patterns:** gray-matter, Contentlayer, Fumadocs.
* **Consumers:** Drata and Scytale (compliance automation); LangSmith and Langfuse (agent observability).

## Audit Flow

<Mermaid
  chart="flowchart TD
  P[Audit prompt] --> A[Agent analyzes]
  A --> F[Generates file]
  F --> Y[YAML frontmatter]
  F --> B[Markdown body]
  Y --> Q[Machine queries]"
/>

## Schema

```ts
import type {
  AuditEntry,
  AuditOpinion,
  AuditFindings,
  AuditQuery,
  Audit,
} from '@osprotocol/schema/checks/audit'
```

### AuditOpinion

ISACA expression of opinion.

```ts
type AuditOpinion =
  | 'unqualified'  // No significant issues, full compliance
  | 'qualified'    // Minor issues that don't affect overall compliance
  | 'adverse'      // Significant issues, non-compliant
  | 'disclaimer'   // Unable to form opinion (insufficient evidence)
```

### AuditFindings

Finding severity counts aligned with ISO 27001 non-conformity classification.

```ts
interface AuditFindings {
  critical: number  // Immediate action required
  major: number     // Should be addressed soon
  minor: number     // Low risk, normal course
}
```

### AuditEntry

Schema for the YAML frontmatter in audit report files.

```ts
interface AuditEntry {
  id: string
  createdAt: number
  agentId?: string
  executionId?: string

  // ISO 27001 / ISACA fields
  objectives: string        // What the audit aims to determine
  scope: string[]           // Files, systems, or processes audited
  opinion: AuditOpinion     // Expression of opinion
  findings: AuditFindings   // Counts by severity

  // Detailed results (optional)
  ruleResults?: RuleResult[]
  judgeResult?: JudgeResult
  metadata?: Record<string, unknown>
}
```

### AuditQuery

Filter criteria for querying audit entries.

```ts
interface AuditQuery {
  opinion?: AuditOpinion | AuditOpinion[]
  minCritical?: number
  minMajor?: number
  agentId?: string
  executionId?: string
  since?: number  // Unix ms
  until?: number  // Unix ms
}
```
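
The filter semantics are not spelled out in the schema, so the following is a sketch under stated assumptions: all criteria set on the query are ANDed, `minCritical` and `minMajor` are lower bounds on the corresponding counts, and `since` / `until` are inclusive bounds on `createdAt`. `matchesQuery` is a hypothetical helper, and the `AuditEntry` shown here omits the optional `ruleResults`, `judgeResult`, and `metadata` fields for brevity.

```ts
type AuditOpinion = 'unqualified' | 'qualified' | 'adverse' | 'disclaimer'

interface AuditFindings {
  critical: number
  major: number
  minor: number
}

// Reduced AuditEntry: optional result and metadata fields are omitted here.
interface AuditEntry {
  id: string
  createdAt: number
  agentId?: string
  executionId?: string
  objectives: string
  scope: string[]
  opinion: AuditOpinion
  findings: AuditFindings
}

interface AuditQuery {
  opinion?: AuditOpinion | AuditOpinion[]
  minCritical?: number
  minMajor?: number
  agentId?: string
  executionId?: string
  since?: number // Unix ms
  until?: number // Unix ms
}

// Hypothetical helper: true when `entry` satisfies every criterion set on `query`.
function matchesQuery(entry: AuditEntry, query: AuditQuery): boolean {
  if (query.opinion !== undefined) {
    const allowed = Array.isArray(query.opinion) ? query.opinion : [query.opinion]
    if (!allowed.includes(entry.opinion)) return false
  }
  if (query.minCritical !== undefined && entry.findings.critical < query.minCritical) return false
  if (query.minMajor !== undefined && entry.findings.major < query.minMajor) return false
  if (query.agentId !== undefined && entry.agentId !== query.agentId) return false
  if (query.executionId !== undefined && entry.executionId !== query.executionId) return false
  if (query.since !== undefined && entry.createdAt < query.since) return false
  if (query.until !== undefined && entry.createdAt > query.until) return false
  return true
}

// Illustrative data only.
const sampleAudits: AuditEntry[] = [
  {
    id: 'a1',
    createdAt: 1000,
    agentId: 'reviewer',
    objectives: 'Verify configuration files follow security best practices',
    scope: ['config/production.yaml'],
    opinion: 'qualified',
    findings: { critical: 0, major: 1, minor: 2 },
  },
  {
    id: 'a2',
    createdAt: 2000,
    agentId: 'reviewer',
    objectives: 'Verify secrets are not committed',
    scope: ['config/staging.yaml'],
    opinion: 'adverse',
    findings: { critical: 2, major: 0, minor: 0 },
  },
]

const withCritical = sampleAudits.filter((entry) => matchesQuery(entry, { minCritical: 1 }))
```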

### Audit

Operations for parsing, writing, and querying audit entries.

```ts
interface Audit {
  parse(content: string): AuditEntry
  write(entry: Omit<AuditEntry, 'id' | 'createdAt'>, body: string): string
  query(query: AuditQuery): Promise<AuditEntry[]>
}
```

* `parse` — Extract `AuditEntry` from file content with YAML frontmatter
* `write` — Generate file content from entry and markdown body
* `query` — Find entries matching filter criteria
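
A minimal sketch of what `write` could do is shown below, assuming the implementation assigns `id` and `createdAt` itself. `writeAudit` is a hypothetical function, the `audit-<timestamp>` id scheme is illustrative, and the extra `now` parameter is not part of the `Audit` interface; it is only there to make the output deterministic. Strings are emitted as JSON literals, which YAML parsers generally accept as double-quoted scalars. A real implementation would more likely delegate to a YAML or frontmatter library.

```ts
type AuditOpinion = 'unqualified' | 'qualified' | 'adverse' | 'disclaimer'

interface AuditFindings {
  critical: number
  major: number
  minor: number
}

// Reduced AuditEntry: ruleResults, judgeResult, and metadata are omitted here.
interface AuditEntry {
  id: string
  createdAt: number
  agentId?: string
  executionId?: string
  objectives: string
  scope: string[]
  opinion: AuditOpinion
  findings: AuditFindings
}

// Hypothetical sketch of `write`: builds frontmatter, then appends the body.
function writeAudit(
  entry: Omit<AuditEntry, 'id' | 'createdAt'>,
  body: string,
  now: number = Date.now(), // not in the interface; added for determinism
): string {
  const quote = (value: string) => JSON.stringify(value)
  const id = `audit-${now}` // illustrative id scheme

  const lines: string[] = ['---', `id: ${quote(id)}`, `createdAt: ${now}`]
  if (entry.agentId !== undefined) lines.push(`agentId: ${quote(entry.agentId)}`)
  if (entry.executionId !== undefined) lines.push(`executionId: ${quote(entry.executionId)}`)
  lines.push(`objectives: ${quote(entry.objectives)}`)
  if (entry.scope.length === 0) {
    lines.push('scope: []')
  } else {
    lines.push('scope:', ...entry.scope.map((item) => `  - ${quote(item)}`))
  }
  lines.push(
    `opinion: ${entry.opinion}`,
    'findings:',
    `  critical: ${entry.findings.critical}`,
    `  major: ${entry.findings.major}`,
    `  minor: ${entry.findings.minor}`,
    '---',
    '',
    body,
  )
  return lines.join('\n')
}

const auditFile = writeAudit(
  {
    agentId: 'reviewer',
    objectives: 'Verify configuration files follow security best practices',
    scope: ['config/production.yaml', 'config/staging.yaml'],
    opinion: 'qualified',
    findings: { critical: 0, major: 1, minor: 2 },
  },
  '## Findings',
  1771632000000,
)
```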

## Agentic Usage

### Prompt

```
Audit the production configuration files
```

### Output

File: `audits/2026-02-21-config-review.md`

```yaml
---
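type: audit
id: audit-2026-02-21-config-review  # illustrative value
createdAt: 1771632000000            # 2026-02-21T00:00:00Z in Unix ms
date: 2026-02-21
agentId: reviewer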
objectives: "Verify configuration files follow security best practices"
scope:
  - config/production.yaml
  - config/staging.yaml
status: complete
opinion: qualified
findings:
  critical: 0
  major: 1
  minor: 2
---

## Criteria

- Security best practices
- Secret management
- Environment isolation

## Findings

### Major

**M1: Hardcoded API endpoint**
Production config contains hardcoded URL instead of environment variable...

### Minor

**m1: Missing timeout configuration**
...

## Recommendations

**M1**: Use environment variable for API endpoint
...

## Conclusion

Configuration is mostly secure but contains one hardcoded value that
should be externalized. Qualified opinion issued.
```

### Querying Audits

The frontmatter is machine-parseable:

```bash
# Find audits with critical findings
grep -l "critical: [1-9]" audits/*.md

# Find adverse opinions
grep -l "opinion: adverse" audits/*.md
```

Or programmatically via the `Audit` interface:

```ts
const critical = await audit.query({ minCritical: 1 })
const adverse = await audit.query({ opinion: 'adverse' })
```

## Standards Mapping

| AuditEntry Field | ISO 27001            | ISACA/ITAF              |
| ---------------- | -------------------- | ----------------------- |
| `objectives`     | Scope and Objectives | Objectives of the Audit |
| `scope`          | Scope and Objectives | Scope of Engagement     |
| `opinion`        | Audit Conclusion     | Expression of Opinion   |
| `findings`       | Non-Conformities     | Findings with Severity  |
| `ruleResults`    | Evidence             | Supporting Data         |
| `judgeResult`    | Evaluation           | Quality Assessment      |

## Integration

* [Rules](/docs/checks/rules) — `RuleResult[]` can be included in audit entries as evidence.
* [Judge](/docs/checks/judge) — `JudgeResult` can be included for quality assessment.
* [Screenshot](/docs/checks/screenshot) — Visual comparison results can support findings.


---

# Judge


<Callout type="warn">
  This interface is **experimental**. No real implementation exists yet.
  The API shape may change before stabilization.
</Callout>

## Overview

`Judge` is the checks-phase interface for LLM-as-judge evaluation. It uses a model to score agent output against quality criteria, returning a numeric score (0–1), a pass/fail result, and natural-language reasoning. Results can optionally include a per-criterion breakdown via `ruleResults`, feed into audit records, and trigger approval flows when a score falls below threshold.

Provider analogues: OpenAI Evals, Braintrust Scorers, LangSmith Evaluators, Arize Phoenix.

## Evaluation Flow

<Mermaid
  chart="flowchart TD
  A[content + JudgeConfig] --> B[Judge.evaluate]
  B --> C[JudgeResult]
  C --> D{score >= threshold?}
  D -- yes --> E[passed: true]
  D -- no --> F[passed: false]
  F --> G[trigger approval / escalation]"
/>

## TypeScript API

```ts
import type { JudgeConfig, JudgeResult, Judge } from '@osprotocol/schema/checks/judge'
```

### JudgeConfig

```ts
interface JudgeConfig {
  model?: string
  criteria: string
  threshold?: number
  metadata?: Record<string, unknown>
}
```

| Field       | Type                      | Description                                                                              |
| ----------- | ------------------------- | ---------------------------------------------------------------------------------------- |
| `model`     | `string`                  | Model identifier to use as judge. Falls back to provider default when omitted.           |
| `criteria`  | `string`                  | Natural-language description of what constitutes a passing result. Required.             |
| `threshold` | `number`                  | Minimum score (0–1) for `passed: true`. Defaults to provider-defined value when omitted. |
| `metadata`  | `Record<string, unknown>` | Arbitrary metadata attached to this evaluation run.                                      |

### JudgeResult

```ts
interface JudgeResult {
  score: number
  passed: boolean
  reasoning: string
  ruleResults?: RuleResult[]
  metadata?: Record<string, unknown>
}
```

| Field         | Type                      | Description                                         |
| ------------- | ------------------------- | --------------------------------------------------- |
| `score`       | `number`                  | Numeric quality score in the range 0–1.             |
| `passed`      | `boolean`                 | `true` when `score >= threshold`.                   |
| `reasoning`   | `string`                  | Model-generated explanation for the score.          |
| `ruleResults` | `RuleResult[]`            | Optional per-criterion breakdown from linked rules. |
| `metadata`    | `Record<string, unknown>` | Arbitrary metadata returned by the judge.           |

### Judge

```ts
interface Judge {
  evaluate(content: unknown, config: JudgeConfig): Promise<JudgeResult>
}
```

`evaluate` accepts any `content` value (string, object, or structured output) and a `JudgeConfig`, and returns a `Promise<JudgeResult>`.
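
The relationship between `score`, `threshold`, and `passed` can be sketched as a small pure function. `toJudgeResult` and `DEFAULT_THRESHOLD` are hypothetical names, and the value `0.7` is an arbitrary stand-in for the provider-defined default, which the protocol does not specify. The optional `ruleResults` field is left out of the local type.

```ts
// Local copy of JudgeResult without the optional ruleResults field.
interface JudgeResult {
  score: number
  passed: boolean
  reasoning: string
  metadata?: Record<string, unknown>
}

// Hypothetical provider default, used when JudgeConfig.threshold is omitted.
const DEFAULT_THRESHOLD = 0.7

// Hypothetical helper: turns a raw model score into a JudgeResult.
function toJudgeResult(
  score: number,
  reasoning: string,
  threshold: number = DEFAULT_THRESHOLD,
): JudgeResult {
  // Reject scores outside the documented 0 to 1 range.
  if (!Number.isFinite(score) || score < 0 || score > 1) {
    throw new RangeError(`score must be within 0..1, got ${score}`)
  }
  // `passed` is true when the score meets or exceeds the threshold.
  return { score, passed: score >= threshold, reasoning }
}

const acceptedResult = toJudgeResult(0.92, 'Accurate and well-scoped.', 0.8)
const rejectedResult = toJudgeResult(0.6, 'Misses two required points.', 0.8)
```

A score exactly equal to the threshold passes, matching the `score >= threshold` rule in the table above.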

## Usage Examples

### Basic evaluation

```ts
const result = await judge.evaluate(agentOutput, {
  criteria: 'The response must be factually accurate, concise, and free of harmful content.',
  threshold: 0.8,
})

console.log(result.passed)    // true | false
console.log(result.score)     // e.g. 0.92
console.log(result.reasoning) // "The response was accurate and well-scoped..."
```

### Breakdown by criteria using ruleResults

```ts
const result = await judge.evaluate(agentOutput, {
  model: 'claude-opus-4-6',
  criteria: 'Evaluate accuracy, tone, and completeness separately.',
  threshold: 0.75,
})

if (result.ruleResults) {
  for (const rule of result.ruleResults) {
    console.log(rule.ruleName, rule.passed, rule.message)
  }
}
```

### Conditional approval trigger

```ts
const result = await judge.evaluate(agentOutput, {
  criteria: 'Output must not contain PII and must follow the brand voice guide.',
  threshold: 0.9,
})

if (!result.passed) {
  // Route to human approval before publishing
  await approvalGate.request({
    reason: result.reasoning,
    score: result.score,
  })
}
```

## Rules vs Judge

|                   | Rules                                             | Judge                                        |
| ----------------- | ------------------------------------------------- | -------------------------------------------- |
| Evaluation method | Deterministic / programmatic                      | LLM-based qualitative evaluation             |
| Output            | Pass/fail per rule                                | Score (0–1) + reasoning                      |
| Best for          | Schema validation, format checks, required fields | Tone, accuracy, helpfulness, nuanced quality |
| Latency           | Low                                               | Higher (model call required)                 |
| Cost              | None                                              | Model inference cost                         |
| Auditability      | Exact rule match                                  | Natural-language reasoning                   |

Use `Rules` for hard constraints and `Judge` for qualitative grading where human-like judgment is required.

## Integration

* [Rules](/docs/checks/rules) — deterministic checks whose results can be surfaced as `ruleResults` inside a `JudgeResult`
* [Audit](/docs/checks/audit) — `JudgeResult` records are written to the audit log for traceability
* [Approval](/docs/runs/approval) — when `passed` is `false`, evaluation results can trigger a human approval gate before execution continues


---

# Rules


<Callout type="warn">
  Rules is an experimental interface. No implementation exists yet. The API
  described here reflects the current design and is subject to change as the
  protocol evolves.
</Callout>

## Overview

Rules define declarative verification criteria that agent output must satisfy before being accepted. They are composable: rules can be evaluated individually or as a complete set against any content. Provider analogues include ESLint Rules, GitHub Checks, Vercel Deployment Checks, and OpenAI Guardrails.

## Severity Levels

| Severity  | Meaning                                                       |
| --------- | ------------------------------------------------------------- |
| `error`   | The rule failure blocks acceptance. Output must not proceed.  |
| `warning` | The rule failure is notable but does not block acceptance.    |
| `info`    | The rule result is informational only. No action is required. |

## TypeScript API

```ts
import type { RuleSeverity, RuleResult, Rule, Rules } from '@osprotocol/schema/checks/rules'
```

### RuleSeverity

```ts
type RuleSeverity = 'error' | 'warning' | 'info'
```

Indicates how a rule failure should be treated. An `error` blocks acceptance, a `warning` is surfaced without blocking, and `info` is purely observational.

### RuleResult

```ts
interface RuleResult {
  ruleName: string
  passed: boolean
  severity: RuleSeverity
  message: string
  metadata?: Record<string, unknown>
}
```

The result returned after evaluating a single rule against content. `passed` indicates whether the rule was satisfied. `message` provides a human-readable explanation. `metadata` carries any structured diagnostic data the rule chooses to emit.
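
As an illustration of how severity could drive acceptance, the sketch below reduces a list of results to a single verdict. `summarizeRuleResults` and `RuleVerdict` are hypothetical names, not part of the schema. The policy follows the severity table: only failed `error` results block, failed `warning` results are surfaced, and `info` results are ignored.

```ts
type RuleSeverity = 'error' | 'warning' | 'info'

interface RuleResult {
  ruleName: string
  passed: boolean
  severity: RuleSeverity
  message: string
  metadata?: Record<string, unknown>
}

// Hypothetical summary shape for a set of rule results.
interface RuleVerdict {
  accepted: boolean
  blocking: RuleResult[]
  warnings: RuleResult[]
}

// Hypothetical helper: output is accepted unless an error-severity rule failed.
function summarizeRuleResults(results: RuleResult[]): RuleVerdict {
  const failed = results.filter((r) => !r.passed)
  const blocking = failed.filter((r) => r.severity === 'error')
  const warnings = failed.filter((r) => r.severity === 'warning')
  return { accepted: blocking.length === 0, blocking, warnings }
}

// Illustrative results only.
const ruleVerdict = summarizeRuleResults([
  { ruleName: 'no-pii-in-output', passed: true, severity: 'error', message: 'No PII detected' },
  { ruleName: 'max-length', passed: false, severity: 'warning', message: 'Output exceeds 500 words' },
  { ruleName: 'has-summary', passed: false, severity: 'info', message: 'No summary section' },
])
```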

### Rule

```ts
interface Rule {
  name: string
  description: string
  severity: RuleSeverity
  evaluate(content: unknown): Promise<RuleResult>
  metadata?: Record<string, unknown>
}
```

A single verifiable criterion. `evaluate` receives the content to check and returns a `RuleResult`. The `severity` on the `Rule` defines the default severity that should appear in results when the rule fails.

### Rules

```ts
interface Rules {
  get(name: string): Promise<Rule | null>
  list(): Promise<Rule[]>
  evaluate(content: unknown): Promise<RuleResult[]>
}
```

A collection of rules. `list` enumerates all registered rules. `get` retrieves a specific rule by name. `evaluate` runs all rules against the provided content and returns a `RuleResult` for each one.

## Usage Examples

### Evaluate all rules against content

```ts
const results = await rules.evaluate(agentOutput)

for (const result of results) {
  if (!result.passed && result.severity === 'error') {
    throw new Error(`Rule failed: ${result.ruleName} — ${result.message}`)
  }
}
```

### Get a specific rule by name

```ts
const rule = await rules.get('no-pii-in-output')

if (rule) {
  const result = await rule.evaluate(agentOutput)
  console.log(result.passed, result.message)
}
```

### Define a custom rule

```ts
const noPiiRule: Rule = {
  name: 'no-pii-in-output',
  description: 'Ensures agent output does not contain personally identifiable information',
  severity: 'error',
  async evaluate(content: unknown): Promise<RuleResult> {
    const text = typeof content === 'string' ? content : JSON.stringify(content)
    const hasPii = /\b\d{3}-\d{2}-\d{4}\b/.test(text) // SSN pattern example
    return {
      ruleName: 'no-pii-in-output',
      passed: !hasPii,
      severity: 'error',
      message: hasPii ? 'Output contains potential PII' : 'No PII detected',
    }
  },
}
```

## Integration

Rule results produced by `Rules.evaluate` feed into other parts of the checks and runs pipeline:

* [Judge](/docs/checks/judge) — uses rule results alongside other signals to produce a quality verdict
* [Audit](/docs/checks/audit) — records rule results for traceability and post-hoc review
* [Approval](/docs/runs/approval) — an `error`-severity failure can gate a run and trigger a human approval step


---

# Screenshot


<Callout type="warn">
  This interface is experimental. No implementation exists yet.
  The API shape may change before stabilization.
</Callout>

## Overview

Screenshot provides visual capture and baseline comparison for visual regression detection within the checks phase. Providers include Playwright, Puppeteer, Browserbase, and ScreenshotOne. Comparison results feed into the audit trail alongside rule and judge results, giving a complete picture of agent output quality.

## Capture and Compare Flow

<Mermaid
  chart="flowchart LR
  A[capture options] --> B[ScreenshotEntry]
  B --> C{compare actual vs baseline}
  C --> D[ComparisonResult]
  D --> E{passed?}
  E -->|yes| F[Continue]
  E -->|no| G[Flag / block]"
/>

## TypeScript API

```ts
import type {
  ImageFormat,
  ScreenshotOptions,
  ScreenshotEntry,
  ComparisonResult,
  Screenshot,
} from '@osprotocol/schema/checks/screenshot'
```

### ImageFormat

```ts
type ImageFormat = 'png' | 'jpeg' | 'webp'
```

PNG is lossless and best suited for pixel diffing. JPEG and WebP produce smaller payloads when exact pixel fidelity is not required.

### ScreenshotOptions

```ts
interface ScreenshotOptions {
  url?: string
  fullPage?: boolean
  clip?: {
    x: number
    y: number
    width: number
    height: number
  }
  selector?: string
  format?: ImageFormat
  quality?: number
  scale?: number
  omitBackground?: boolean
  metadata?: Record<string, unknown>
}
```

| Field            | Description                                              |
| ---------------- | -------------------------------------------------------- |
| `url`            | Page URL to navigate to before capturing                 |
| `fullPage`       | Capture the full scrollable page instead of the viewport |
| `clip`           | Restrict capture to a bounding box in pixels             |
| `selector`       | CSS selector — captures only the matching element        |
| `format`         | Output image format (`png`, `jpeg`, `webp`)              |
| `quality`        | Compression quality for JPEG and WebP (0–100)            |
| `scale`          | Device pixel ratio multiplier                            |
| `omitBackground` | Make the background transparent (PNG only)               |
| `metadata`       | Arbitrary key-value pairs attached to the entry          |

### ScreenshotEntry

```ts
interface ScreenshotEntry {
  id: string
  data: string
  format: ImageFormat
  width: number
  height: number
  createdAt: number
  metadata?: Record<string, unknown>
}
```

`data` is a base64-encoded image string. `createdAt` is a Unix timestamp in milliseconds.

### ComparisonResult

```ts
interface ComparisonResult {
  passed: boolean
  message: string
  diffPixels: number
  diffRatio: number
  diffImage?: string
  metadata?: Record<string, unknown>
}
```

`passed` and `message` follow the same convention as `RuleResult`, so comparison results compose naturally into the checks audit trail. `diffImage` is an optional base64-encoded visualization of the pixel diff.

### Screenshot

```ts
interface Screenshot {
  capture(options?: ScreenshotOptions): Promise<ScreenshotEntry>
  compare(
    actual: ScreenshotEntry,
    baseline: ScreenshotEntry,
    threshold?: number,
  ): Promise<ComparisonResult>
}
```

`capture` maps to provider-native methods: `page.screenshot` in Playwright and Puppeteer, `Page.captureScreenshot` via CDP in Browserbase, and `GET /take?url=...` in ScreenshotOne. `compare` uses pixel diffing — only Playwright has this built in via `toHaveScreenshot` (Pixelmatch). For all other providers the adapter handles comparison externally.

`threshold` is a ratio between 0 and 1 representing the maximum acceptable pixel difference before `passed` becomes `false`.
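
The threshold rule can be illustrated with a deliberately naive comparison. The sketch assumes both images have already been decoded to raw RGBA bytes of identical dimensions, which an adapter would have to do from the base64 `data` field, and it counts a pixel as different on any channel mismatch. This is exact matching, not the perceptual diff that Pixelmatch performs. `comparePixels` is a hypothetical helper, and the default `threshold` of `0` is an assumption.

```ts
// Local copy of ComparisonResult without the optional fields.
interface ComparisonResult {
  passed: boolean
  message: string
  diffPixels: number
  diffRatio: number
}

// Hypothetical helper: exact-match diff over two RGBA buffers of equal size.
function comparePixels(
  actual: Uint8Array,
  baseline: Uint8Array,
  threshold: number = 0, // assumed default: no differences tolerated
): ComparisonResult {
  if (actual.length !== baseline.length || actual.length % 4 !== 0) {
    throw new Error('Buffers must be RGBA data of identical size')
  }

  const totalPixels = actual.length / 4
  let diffPixels = 0
  for (let i = 0; i < actual.length; i += 4) {
    const differs =
      actual[i] !== baseline[i] ||
      actual[i + 1] !== baseline[i + 1] ||
      actual[i + 2] !== baseline[i + 2] ||
      actual[i + 3] !== baseline[i + 3]
    if (differs) diffPixels++
  }

  const diffRatio = totalPixels === 0 ? 0 : diffPixels / totalPixels
  // `passed` stays true while the ratio does not exceed the threshold.
  const passed = diffRatio <= threshold
  const message = passed
    ? `Within threshold: ${diffPixels} of ${totalPixels} pixels differ`
    : `Exceeds threshold: ${diffPixels} of ${totalPixels} pixels differ`
  return { passed, message, diffPixels, diffRatio }
}

// Four pixels each; only the last pixel differs.
const actualPixels = new Uint8Array([0, 0, 0, 255, 255, 255, 255, 255, 10, 20, 30, 255, 1, 2, 3, 255])
const baselinePixels = new Uint8Array([0, 0, 0, 255, 255, 255, 255, 255, 10, 20, 30, 255, 9, 9, 9, 255])

const strictComparison = comparePixels(actualPixels, baselinePixels, 0.01)
const lenientComparison = comparePixels(actualPixels, baselinePixels, 0.25)
```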

## Usage Examples

### Capture a full-page screenshot

```ts
const entry = await screenshot.capture({
  url: 'https://example.com',
  fullPage: true,
  format: 'png',
})
```

### Visual regression test against a baseline

```ts
// `baseline` is a ScreenshotEntry captured earlier from a known-good state
const actual = await screenshot.capture({ url: 'https://example.com' })

const result = await screenshot.compare(actual, baseline, 0.01)

if (!result.passed) {
  console.log(result.message)
  console.log(`Diff: ${result.diffPixels} pixels (${(result.diffRatio * 100).toFixed(2)}%)`)
}
```

### Capture a specific element

```ts
const entry = await screenshot.capture({
  url: 'https://example.com/dashboard',
  selector: '#revenue-chart',
  format: 'png',
  omitBackground: true,
})
```

## Integration

* [Audit](/docs/checks/audit) — screenshot entries and comparison results are attached to the audit trail
* [Sandbox](/docs/system/sandbox) — browser-based captures run inside isolated sandbox environments
* [Rules](/docs/checks/rules) — rule results and screenshot comparison results compose into a unified checks report


---

# Agent Loop


## Overview

The Agent Loop is the fundamental execution pattern that all agent implementations MUST support. It defines the iterative cycle through which agents gather context, take actions, verify their work, and iterate until completion.

<Mermaid
  chart="flowchart TD
    Gather[Gather Context]
    Action[Take Action]
    Verify[Verify Work]
    Iterate{Complete?}
    
    Gather --> Action
    Action --> Verify
    Verify --> Iterate
    Iterate -->|No| Gather
    Iterate -->|Yes| End([End])"
/>

## The Four Steps

### 1. Gather Context

Agents collect information needed to complete their task through:

* **Agentic search**: File systems, grep, tail, structured queries
* **Semantic search**: Vector embeddings for concept-based queries
* **Subagents**: Isolated context windows for parallel information gathering
* **Context compaction**: Summarization for long-running agents

Learn more: [Context Management](/docs/context)

### 2. Take Action

Agents execute operations using:

* **Tools**: Primary building blocks with clear interfaces
* **Bash/Scripts**: Command execution and automation
* **Code Generation**: Dynamic code creation and execution
* **MCP Integration**: Standardized protocol for external services

Learn more: [Actions](/docs/actions/tools)

### 3. Verify Work

Agents validate outputs through:

* **Rules-based validation**: Defined criteria and constraints
* **Visual feedback**: Screenshots and renders for UI tasks
* **LLM-as-judge**: Model-based evaluation

Learn more: [Checks](/docs/checks/rules)

### 4. Iterate

The loop repeats until:

* Task completion criteria are met
* Iteration limits are reached
* Termination conditions are triggered
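
The four steps can be sketched as a generic driver. `LoopSteps`, `LoopOutcome`, and `runAgentLoop` are hypothetical names rather than protocol interfaces, and the sketch models only two of the exit conditions above: successful verification and an iteration limit.

```ts
// Hypothetical step contract: one function per phase of the loop.
interface LoopSteps<TContext, TOutput> {
  gather(previous: TOutput | undefined): Promise<TContext>
  act(context: TContext): Promise<TOutput>
  verify(output: TOutput): Promise<boolean>
}

interface LoopOutcome<TOutput> {
  completed: boolean
  iterations: number
  output: TOutput | undefined
}

// Runs gather -> act -> verify until verification succeeds
// or the iteration limit is reached.
async function runAgentLoop<TContext, TOutput>(
  steps: LoopSteps<TContext, TOutput>,
  maxIterations: number,
): Promise<LoopOutcome<TOutput>> {
  let output: TOutput | undefined
  let iterations = 0
  while (iterations < maxIterations) {
    iterations++
    const context = await steps.gather(output) // 1. Gather Context
    output = await steps.act(context)          // 2. Take Action
    if (await steps.verify(output)) {          // 3. Verify Work
      return { completed: true, iterations, output }
    }
    // 4. Iterate: fall through and feed the last output into the next gather.
  }
  return { completed: false, iterations, output }
}

// Toy task: count upward until the value reaches 3.
const countingSteps: LoopSteps<number, number> = {
  async gather(previous) {
    return previous ?? 0
  },
  async act(context) {
    return context + 1
  },
  async verify(output) {
    return output >= 3
  },
}
```

With `countingSteps`, calling `runAgentLoop(countingSteps, 10)` completes on the third iteration, while a limit of `2` stops early with `completed: false`.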

## Cognitive Micro-Pattern

The loop maps to an internal cognitive cycle:

1. **Think/Reason** → Plan next action (Gather Context)
2. **Act** → Execute tools (Take Action)
3. **Observe** → Process results (Verify Work)
4. **Reflect** → Evaluate progress (Verify Work)
5. **Decide** → Continue or stop (Iterate)

Reference: [Anthropic: Building agents with the Claude Agent SDK](https://www.anthropic.com/engineering/building-agents-with-the-claude-agent-sdk)

## Next Steps

* Understand how the loop fits into the **[Agent Lifecycle](/docs/concepts/lifecycle)**
* Explore **[Workflow Patterns](/docs/concepts/workflows-taxonomy)** that orchestrate the loop
* Read the full specification in [AGENTS.md Section 2](https://github.com/synerops/osprotocol/blob/main/AGENTS.md#2-core-execution-pattern-agent-loop)


---

# Agentic OS


## Overview

An **Agentic OS** is a design paradigm where the **Large Language Model (LLM)** functions conceptually as the **Kernel** of the system. Unlike traditional operating systems designed for human interaction, an Agentic OS is a **backend infrastructure layer** that manages the lifecycle and resources of autonomous software agents.

This concept is foundational to understanding the Agentic OS Protocol (OSP). OSP defines the standardized interfaces and behaviors that implementations of an Agentic OS must follow—the contract that enables different agent systems to interoperate.

## Resource Abstraction

Just as a traditional kernel abstracts and manages hardware resources (CPU, memory, disk, devices), an Agentic OS abstracts and manages the cognitive resources of AI systems:

<Mermaid
  chart="flowchart LR
    CPU[CPU Cycles] -->|maps to| Inference[Inference / Tokens]
    RAM[RAM Memory] -->|maps to| Context[Context Window]
    Disk[Disk / Filesystem] -->|maps to| Vector[Vector Store / RAG]
    Drivers[Device Drivers] -->|maps to| Tools[Tools / MCP]
    Scheduler[Process Scheduler] -->|maps to| Orchestrator[Agent Orchestrator]"
/>

* **CPU Cycles → Inference / Tokens**: Managing the compute required for reasoning and generation
* **RAM (Memory) → Context Window**: Managing the finite amount of information active in the model's immediate attention
* **Disk / Filesystem → Vector Store / RAG**: Managing long-term retrieval and persistent knowledge
* **Device Drivers → Tools / MCP**: Standardizing interfaces for external interaction (APIs, browsers, code execution)
* **Process Scheduler → Agent Orchestrator**: Determining which agent runs when, and for how long

Learn more: [System Intelligence](/docs/system/registry)

## The "User" of the OS

In this paradigm, **the "User" of the Operating System is the Agent itself**, not the human.

* The **Agent** requests resources from the OS ("I need to read this file", "I need to store this memory")
* The **OS** enforces permissions, manages limits, and provides the requested capabilities
* The **Human** acts as the external administrator or the user of the *application* built on top of the OS, but does not interact with the Agentic OS layer directly

This distinction is critical: an Agentic OS is the invisible infrastructure that enables complex, multi-agent systems to function reliably at scale.

## Scope and Purpose

The Agentic OS solves **Orchestration Complexity**, not User Experience. Its primary goals are:

1. **Context Hygiene:** Preventing context pollution and managing finite window sizes
2. **Process Isolation:** Ensuring agents operate within defined boundaries without interfering with each other
3. **Inter-Process Communication:** Enabling standardized communication between disparate agents

These goals align directly with the challenges outlined in our [Motivation](/docs/motivation): as systems grow, managing context, isolation, and communication becomes increasingly complex. The Agentic OS provides the infrastructure layer that addresses these challenges systematically.

Learn more: [Motivation](/docs/motivation) | [Architecture](/docs/architecture)

## Agentic OS vs OSP

It's important to understand the distinction:

* **Agentic OS** is the conceptual paradigm—the architectural metaphor
* **OSP (Agentic OS Protocol)** is the specification—the standardized contract that implementations must follow

Just as "operating system" describes a category of software (Linux, Windows, macOS), "Agentic OS" describes a category of systems that manage agent resources. OSP defines the protocol specification that different implementations can follow to achieve interoperability.

Think of it this way: Linux and Windows are both operating systems built on different architectures. Likewise, multiple implementations can follow OSP and each be an "Agentic OS" with a different internal design. Because they all follow the same protocol contract, they can still interoperate.

## How OSP Implements the Agentic OS

OSP defines the standardized interfaces and behaviors that make an Agentic OS possible:

* **[System](/docs/system)**: Registry, Environment, Filesystem, Sandbox, Settings, Preferences, Installer, MCP Client — the infrastructure layer
* **[Context](/docs/context)**: System Context, Embeddings, Key-Value — read-only facades for the gather phase
* **[Actions](/docs/actions)**: System Actions, Tools, MCP Servers — write facades for the act phase
* **[Checks](/docs/checks/rules)**: Rules, Judge, Audit, Screenshot — verification and quality assurance

These components work together to provide the resource abstraction, process isolation, and inter-process communication that define an Agentic OS.

Learn more: [Architecture](/docs/architecture)

## Next Steps

* Understand the **[Agent Loop](/docs/concepts/agent-loop)**—the core execution pattern within agents
* Explore the **[Agent Lifecycle](/docs/concepts/lifecycle)**—how the OS manages agent resources
* Review **[Workflow Patterns](/docs/concepts/workflows-taxonomy)**—operational execution patterns


---

# Agent Lifecycle


## Overview

The Agent Lifecycle is a System/Control Workflow that defines how agents are managed within the system. Unlike the [Agent Loop](/docs/concepts/agent-loop) (which describes internal execution), the Lifecycle governs system-level responsibilities: registration, discovery, execution management, and evaluation.

<Mermaid
  chart="flowchart LR
    Reg[Registration]
    Disc[Discovery]
    Exec[Execution]
    Eval[Evaluation]
    
    Reg --> Disc
    Disc --> Exec
    Exec --> Eval"
/>

## The Four Phases

### 1. Registration

Agents declare their capabilities and constraints to the system:

* Capability declaration
* Resource requirements specification
* Constraint definition
* Metadata registration

Learn more: [System Registry](/docs/system/registry)

### 2. Discovery

The system exposes agents for selection and routing:

* Capability-based discovery
* Dynamic service discovery
* Load balancing mechanisms
* Failover protocols

Learn more: [System Registry](/docs/system/registry)
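
Registration and capability-based discovery can be pictured with a small in-memory sketch. Everything in it is hypothetical: `AgentRecord`, `InMemoryRegistry`, `register`, and `discover` are illustrative names chosen for this example, not the System Registry API.

```ts
// Hypothetical shapes for illustration only; the real System Registry API may differ.
interface AgentRecord {
  id: string
  capabilities: string[]
  metadata?: Record<string, unknown>
}

class InMemoryRegistry {
  private agents = new Map<string, AgentRecord>()

  /** Registration: an agent declares its capabilities to the system. */
  register(agent: AgentRecord): void {
    this.agents.set(agent.id, agent)
  }

  /** Discovery: return every registered agent that declares the capability. */
  discover(capability: string): AgentRecord[] {
    return Array.from(this.agents.values()).filter((agent) =>
      agent.capabilities.includes(capability)
    )
  }
}

const registry = new InMemoryRegistry()
registry.register({ id: 'reviewer', capabilities: ['code-review', 'lint'] })
registry.register({ id: 'deployer', capabilities: ['deploy'] })

// Only the agent that declared 'code-review' is returned.
const reviewers = registry.discover('code-review').map((agent) => agent.id)
```

Load balancing and failover would sit on top of `discover`, choosing among the returned candidates; they are left out to keep the sketch short.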

### 3. Execution Management

The OS assigns tasks and monitors progress:

* Task assignment interfaces
* Real-time monitoring
* Error handling
* State management
* Policy enforcement

Learn more: [Runs](/docs/runs/run), [Actions](/docs/actions)

### 4. Evaluation

Outputs, logs, and performance are reviewed:

* Performance monitoring
* Quality assessment
* Compliance verification
* Adaptation mechanisms

Learn more: [Audit](/docs/checks/audit), [Judge](/docs/checks/judge)

## Lifecycle vs Loop vs Workflows

Understanding the distinction is crucial:

| Concept                                            | Layer       | Purpose            | Scope                  |
| -------------------------------------------------- | ----------- | ------------------ | ---------------------- |
| **Lifecycle**                                      | System      | Agent management   | OS/Platform governance |
| **[Loop](/docs/concepts/agent-loop)**              | Cognitive   | Internal execution | Single agent reasoning |
| **[Workflows](/docs/concepts/workflows-taxonomy)** | Operational | Task orchestration | Multi-step processes   |

* **Lifecycle** exists *outside* any specific workflow—it's the system contract
* **Loop** executes *inside* workflows—it's the cognitive engine
* **Workflows** orchestrate *during* Execution/Evaluation phases—they're the macro patterns

Reference: [Anthropic: Building Effective Agents](https://www.anthropic.com/engineering/building-effective-agents)

## Next Steps

* Explore [Workflow Taxonomy](/docs/concepts/workflows-taxonomy) to see operational patterns
* Read the [AGENTS.md](https://github.com/synerops/osprotocol/blob/main/AGENTS.md) knowledge base


---

# Workflows Taxonomy


## Overview

Workflows are operational patterns that define how tasks run during the Execution and Evaluation phases of the [Agent Lifecycle](/docs/concepts/lifecycle). They are macro-level orchestration patterns, distinct from the [Agent Loop](/docs/concepts/agent-loop) (micro-execution) and the Lifecycle (system layer).

<Mermaid
  chart="mindmap
  root((Workflow Categories))
    SystemControl[&#x22;System/Control Workflows&#x22;]
      Lifecycle[&#x22;Agent Lifecycle&#x22;]
    TaskWorkflows[&#x22;Task Workflows&#x22;]
      Routing[&#x22;Routing&#x22;]
      PromptChaining[&#x22;Prompt Chaining&#x22;]
      OrchestratorWorkers[&#x22;Orchestrator-Workers&#x22;]
      Parallelization[&#x22;Parallelization&#x22;]
      EvaluatorOptimizer[&#x22;Evaluator-Optimizer&#x22;]
    QualityWorkflows[&#x22;Quality Workflows&#x22;]
      RulesValidation[&#x22;Rules Validation&#x22;]
      VisualChecks[&#x22;Visual Checks&#x22;]
      LLMJudge[&#x22;LLM-as-Judge&#x22;]
    RecoveryWorkflows[&#x22;Recovery Workflows&#x22;]
      Retries[&#x22;Retries&#x22;]
      Timeouts[&#x22;Timeouts&#x22;]
      Cancellation[&#x22;Cancellation&#x22;]
    HumanInLoop[&#x22;Human-in-the-Loop Workflows&#x22;]
      ApprovalWorkflows[&#x22;Approval Workflows&#x22;]
      ManualDelegation[&#x22;Manual Delegation&#x22;]
    MultiAgent[&#x22;Multi-Agent Workflows&#x22;]
      AgentCoordination[&#x22;Agent Coordination&#x22;]
      DistributedExecution[&#x22;Distributed Execution&#x22;]"
/>

## The Six Categories

### 1. System/Control Workflows

Govern agent management at the platform level. The primary workflow is the [Agent Lifecycle](/docs/concepts/lifecycle): Registration → Discovery → Execution → Evaluation.

### 2. Task Workflows

Operational patterns for executing work:

* **[Routing](/docs/workflows/routing)**: Classify inputs and direct to specialized tasks
* **Prompt Chaining**: Sequential steps with validation gates
* **[Orchestrator-Workers](/docs/workflows/orchestrator-worker)**: Central orchestrator delegates to workers
* **[Parallelization](/docs/workflows/parallelization)**: Simultaneous execution with aggregation
* **[Evaluator-Optimizer](/docs/workflows/evaluator-optimizer)**: Generate-evaluate-refine loops

Reference: [Anthropic: Building Effective Agents](https://www.anthropic.com/engineering/building-effective-agents)

### 3. Quality Workflows

Ensure outputs meet standards:

* **[Rules Validation](/docs/checks/rules)**: Defined criteria and constraints
* **[Visual Checks](/docs/checks/screenshot)**: Screenshots and renders
* **[LLM-as-Judge](/docs/checks/judge)**: Model-based evaluation

### 4. Recovery Workflows

Handle failures and errors:

* **[Retries](/docs/runs/retry)**: Automatic retry mechanisms
* **[Timeouts](/docs/runs/timeout)**: Long-running operation handling
* **[Cancellation](/docs/runs/cancel)**: Graceful termination of running operations

### 5. Human-in-the-Loop Workflows

Integrate human oversight:

* **[Approval Workflows](/docs/runs/approval)**: Human approval before proceeding
* **Manual Delegation**: Human task assignment

### 6. Multi-Agent Workflows

Coordinate multiple agents:

* **Agent Coordination**: Multiple agents working together
* **Distributed Execution**: Tasks distributed across agents

## Key Distinctions

* **Workflows** are macro-level orchestration patterns used *during* Execution/Evaluation
* **[Agent Loop](/docs/concepts/agent-loop)** is the micro-level cognitive cycle *inside* workflows
* **[Lifecycle](/docs/concepts/lifecycle)** is the system-level governance *around* workflows

## Next Steps

* Explore specific [Task Workflows](/docs/workflows/routing)
* Understand [Quality Assurance](/docs/checks/audit) mechanisms
* Learn about [Recovery Patterns](/docs/runs/retry)
* Read the full specification in [AGENTS.md Section 3](https://github.com/synerops/osprotocol/blob/main/AGENTS.md#3-workflow-patterns)


---

# Embeddings


<Callout type="warn">
  This interface is experimental — no production implementation exists yet.
  The API surface may change.
</Callout>

## Overview

Embeddings is the agent-facing interface for semantic search over indexed knowledge. Agents use it to find relevant content by meaning rather than exact keywords. The vector infrastructure underneath is a system concern: vector stores such as Pinecone, Upstash Vector, or Weaviate handle storage and retrieval, and an embedding model such as OpenAI Embeddings turns text into vectors, without the agent needing to know which ones are in use.

## TypeScript API

```ts
import type { Embeddings, EmbeddingEntry, EmbeddingsContext, EmbeddingsActions } from '@osprotocol/schema/context/embeddings'
```

### EmbeddingEntry

A single result returned from a search or get operation.

```ts
interface EmbeddingEntry<T = Record<string, unknown>> {
  id: string
  content: string
  /** Similarity score, 0–1. Present only in search results. */
  score?: number
  metadata?: T
}
```

### EmbeddingsContext

Read-only interface for the context phase of the agent loop. Use this to find relevant entries by meaning or retrieve a known entry by ID.

```ts
interface EmbeddingsContext {
  search<T = Record<string, unknown>>(
    query: string,
    topK: number,
    filter?: Partial<T>
  ): Promise<EmbeddingEntry<T>[]>

  get<T = Record<string, unknown>>(id: string): Promise<EmbeddingEntry<T> | null>
}
```

### EmbeddingsActions

Write interface for the actions phase of the agent loop. Use this to index new content or remove stale entries.

```ts
interface EmbeddingsActions {
  upsert<T = Record<string, unknown>>(
    id: string,
    content: string,
    metadata?: T
  ): Promise<EmbeddingEntry<T>>

  remove(id: string): Promise<boolean>
}
```

### Embeddings

Full interface combining read and write operations.

```ts
interface Embeddings {
  upsert<T = Record<string, unknown>>(
    id: string,
    content: string,
    metadata?: T
  ): Promise<EmbeddingEntry<T>>

  search<T = Record<string, unknown>>(
    query: string,
    topK: number,
    filter?: Partial<T>
  ): Promise<EmbeddingEntry<T>[]>

  get<T = Record<string, unknown>>(id: string): Promise<EmbeddingEntry<T> | null>

  remove(id: string): Promise<boolean>
}
```
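
To make the contract concrete, here is a minimal in-memory sketch that mirrors the four documented methods. It is an assumption-laden stand-in, not a provider: `InMemoryEmbeddings`, `tokenize`, `jaccard`, and `matchesFilter` are hypothetical names, and the score is plain word overlap (Jaccard similarity) rather than real vector similarity.

```ts
// Local copy of the documented entry shape so the sketch is self-contained.
interface EmbeddingEntry<T = Record<string, unknown>> {
  id: string
  content: string
  score?: number
  metadata?: T
}

// Stand-in scorer: overlap of lowercase word sets, NOT a real embedding model.
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter(Boolean))
}

function jaccard(a: Set<string>, b: Set<string>): number {
  let shared = 0
  a.forEach((token) => {
    if (b.has(token)) shared++
  })
  const union = a.size + b.size - shared
  return union === 0 ? 0 : shared / union
}

// Every key present in the filter must match the entry's metadata exactly.
function matchesFilter(metadata: unknown, filter: unknown): boolean {
  if (filter === undefined || filter === null) return true
  const wanted = filter as Record<string, unknown>
  const actual = (metadata ?? {}) as Record<string, unknown>
  return Object.keys(wanted).every((key) => actual[key] === wanted[key])
}

class InMemoryEmbeddings {
  private entries = new Map<string, EmbeddingEntry<any>>()

  async upsert<T = Record<string, unknown>>(
    id: string,
    content: string,
    metadata?: T
  ): Promise<EmbeddingEntry<T>> {
    const entry: EmbeddingEntry<T> = { id, content, metadata }
    this.entries.set(id, entry)
    return entry
  }

  async search<T = Record<string, unknown>>(
    query: string,
    topK: number,
    filter?: Partial<T>
  ): Promise<EmbeddingEntry<T>[]> {
    const queryTokens = tokenize(query)
    return Array.from(this.entries.values())
      .filter((entry) => matchesFilter(entry.metadata, filter))
      .map((entry) => ({ ...entry, score: jaccard(queryTokens, tokenize(entry.content)) }))
      .sort((a, b) => b.score - a.score) // highest score first
      .slice(0, topK)
  }

  async get<T = Record<string, unknown>>(id: string): Promise<EmbeddingEntry<T> | null> {
    return this.entries.get(id) ?? null
  }

  async remove(id: string): Promise<boolean> {
    return this.entries.delete(id)
  }
}
```

A stub like this can be useful in tests, where ranking quality does not matter but the read/write split and the metadata filter do.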

## Usage Examples

### Semantic search with metadata filter

```ts
type DocMeta = { source: string; language: string }

const results = await embeddings.search<DocMeta>(
  'how to handle authentication errors',
  5,
  { language: 'en' }
)

for (const entry of results) {
  console.log(entry.score, entry.content)
  // 0.91  "When a 401 is returned, refresh the token and retry..."
}
```

### Upsert content into the index

```ts
await embeddings.upsert(
  'doc:auth-errors',
  'When a 401 is returned, refresh the token and retry the request.',
  { source: 'runbook', language: 'en' }
)
```

### RAG pattern — retrieve, then generate

```ts
const chunks = await embeddings.search(userQuestion, 3)

const context = chunks.map((c) => c.content).join('\n\n')

const answer = await llm.complete(`Answer using this context:\n\n${context}\n\nQuestion: ${userQuestion}`)
```

## Embeddings vs Key-Value

| Concern    | Embeddings                 | Key-Value (`context/kv`)        |
| ---------- | -------------------------- | ------------------------------- |
| Lookup by  | Meaning / similarity       | Exact key                       |
| Returns    | Ranked results with scores | Single entry or null            |
| Best for   | Knowledge retrieval, RAG   | Session state, config, counters |
| Query type | Natural language query     | Known key string                |

## Integration

Embeddings integrates with:

* **[Key-Value Store](/docs/context/kv)**: Complementary persistence — embeddings for semantic search, kv for exact lookups
* **[System Context](/docs/context/system)**: EmbeddingsContext is part of the read-only system context facade
* **[Filesystem](/docs/system/fs)**: Source documents can be read from fs and indexed into embeddings


---

# Context


## Overview

The Context domain provides application-specific context and data management for agents. It enables agents to access, store, and retrieve information needed for intelligent decision-making and task execution.

Context is one of the three pillars of the agent loop: **Gather Context** → Take Actions → Verify Results.

## Context APIs

| API                                    | Description                                            |
| -------------------------------------- | ------------------------------------------------------ |
| [System Context](/docs/context/system) | Read-only composition of all system Context interfaces |
| [Embeddings](/docs/context/embeddings) | Vector embeddings for semantic search                  |
| [Key-Value Store](/docs/context/kv)    | Key-value persistence for the agent loop               |

## Role in Agent Loop

Context provides the foundation for informed agent behavior:

<Mermaid
  chart="flowchart TD
    subgraph Context[&#x22;Gather Context&#x22;]
        SC[System Context]
        Emb[Embeddings]
        KV[Key-Value]
    end

    Context --> Actions[Take Actions]
    Actions --> Verify[Verify Results]
    Verify --> Update[Update Context]
    Update -.-> Context"
/>

## Usage

Context is accessed through the `context` protocol domain. `SystemContext` composes all system read interfaces into a single entry point. `Embeddings` and `Kv` are agent-facing read/write interfaces for semantic search and key-value persistence respectively.

## Integration

Context integrates with:

* **System**: Accesses system-level information
* **Actions**: Provides context for action execution
* **Checks**: Context informs quality verification
* **Workflows**: Workflows access context during execution


---

# Key-Value Store


<Callout type="warn">
  This interface is experimental — no production implementation exists yet.
  The API surface may change.
</Callout>

## Overview

The key-value store provides flat, direct-access persistence for structured data. Agents use it to store and retrieve data by known keys — session state, user preferences, configuration, counters. Unlike `fs` (hierarchical, file-based) and `embeddings` (semantic search by meaning), `kv` is for exact key lookups.

## TypeScript API

```ts
import type { Kv, KvEntry, KvContext, KvActions } from '@osprotocol/schema/context/kv'
```

### KvEntry

A single key-value entry.

```ts
interface KvEntry<T = unknown> {
  /** Entry key */
  key: string
  /** Entry value */
  value: T
  /** Extensible metadata for provider-specific data */
  metadata?: Record<string, unknown>
}
```

### Kv

Full key-value store interface with read and write operations.

```ts
interface Kv {
  get<T = unknown>(key: string): Promise<KvEntry<T> | null>
  set<T = unknown>(key: string, value: T): Promise<KvEntry<T>>
  remove(key: string): Promise<boolean>
  list(prefix?: string): Promise<string[]>
}
```

### KvContext

Read-only view for the context phase of the agent loop.

```ts
interface KvContext {
  get<T = unknown>(key: string): Promise<KvEntry<T> | null>
  list(prefix?: string): Promise<string[]>
}
```

### KvActions

Write operations for the actions phase of the agent loop.

```ts
interface KvActions {
  set<T = unknown>(key: string, value: T): Promise<KvEntry<T>>
  remove(key: string): Promise<boolean>
}
```
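
The interfaces above are small enough to back with a `Map`. The following is a minimal sketch under that assumption; `InMemoryKv` and `increment` are hypothetical names, and a real provider would persist to an external store.

```ts
// Local copy of the documented entry shape so the sketch is self-contained.
interface KvEntry<T = unknown> {
  key: string
  value: T
  metadata?: Record<string, unknown>
}

// Hypothetical Map-backed store mirroring get, set, remove, and list.
class InMemoryKv {
  private store = new Map<string, KvEntry<unknown>>()

  async get<T = unknown>(key: string): Promise<KvEntry<T> | null> {
    const entry = this.store.get(key)
    return entry ? (entry as KvEntry<T>) : null
  }

  async set<T = unknown>(key: string, value: T): Promise<KvEntry<T>> {
    const entry: KvEntry<T> = { key, value }
    this.store.set(key, entry)
    return entry
  }

  async remove(key: string): Promise<boolean> {
    // Map.delete returns true only if the key existed.
    return this.store.delete(key)
  }

  async list(prefix = ''): Promise<string[]> {
    return Array.from(this.store.keys()).filter((key) => key.startsWith(prefix))
  }
}

// Read-modify-write counter built on get/set.
// Not atomic: concurrent callers can race and lose an update.
async function increment(kv: InMemoryKv, key: string): Promise<number> {
  const current = await kv.get<number>(key)
  const next = (current?.value ?? 0) + 1
  await kv.set(key, next)
  return next
}
```

The counter shows why `get` returning `null` matters: a missing key is treated as zero instead of throwing.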

## Usage Examples

### Store and retrieve session state

```ts
await kv.set('session:abc123', {
  userId: 'user-42',
  startedAt: Date.now(),
  step: 'code-review',
})

const session = await kv.get<{ userId: string; step: string }>('session:abc123')
// session?.value.step → 'code-review' (get resolves to null when the key is missing)
```

### Enumerate keys by prefix

```ts
const keys = await kv.list('session:')
// ['session:abc123', 'session:def456', ...]
```

### Remove expired data

```ts
const removed = await kv.remove('session:abc123')
// true if the entry existed
```

## Agent Persistence Model

The protocol provides three distinct persistence patterns:

| Pattern          | Interface            | Access                       | Use Case                                 |
| ---------------- | -------------------- | ---------------------------- | ---------------------------------------- |
| **Hierarchical** | `system/fs`          | Paths and directories        | Files, configs, artifacts                |
| **Key-value**    | `context/kv`         | Direct key lookup            | Session state, counters, structured data |
| **Semantic**     | `context/embeddings` | Similarity search by meaning | Knowledge retrieval, RAG                 |

## Integration

Key-Value Store integrates with:

* **[Embeddings](/docs/context/embeddings)**: Complementary persistence — kv for exact lookups, embeddings for semantic search
* **[Filesystem](/docs/system/fs)**: Complementary persistence — kv for flat data, fs for hierarchical files
* **[System Context](/docs/context/system)**: KvContext is part of the read-only system context facade


---

# System Context


<Callout type="warn">
  This interface is experimental — no production implementation exists yet.
  The API surface may change.
</Callout>

## Overview

`SystemContext` is the read-only facade that composes all system Context interfaces into a single entry point. It is used during the context (gather) phase of the agent loop, giving agents a unified view of system state without the ability to mutate it. Write operations are handled by the counterpart: [SystemActions](/docs/actions/system).

## Architecture

`SystemContext` is pure composition — it adds no logic of its own, only grouping each system API's read-only Context interface under one namespace. `SystemActions` mirrors this structure for write operations.

<Mermaid
  chart="flowchart LR
    Agent([Agent]) --> SC[SystemContext]

    subgraph ReadOnly[&#x22;Read-only (Context Phase)&#x22;]
        SC --> env[EnvContext]
        SC --> settings[SettingsContext]
        SC --> preferences[PreferencesContext]
        SC --> registry[RegistryContext]
        SC --> fs[FsContext]
        SC --> sandbox[SandboxContext]
        SC --> installer[InstallerContext]
        SC --> mcp[McpContext]
    end

    Agent -.->|write operations| SA[SystemActions]"
/>
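
The composition idea can be sketched with a single, simplified system API. The `Env`, `EnvEntry`, `MiniSystemContext`, and `createMiniSystemContext` names below are assumptions made for illustration; the real interfaces have more members and may have different signatures.

```ts
// Hypothetical, simplified stand-in for one system API.
interface EnvEntry {
  key: string
  value: string
}

interface Env {
  get(key: string): Promise<EnvEntry | null>
  list(): Promise<EnvEntry[]>
  set(key: string, value: string): Promise<EnvEntry>
}

// The read-only view is the full interface minus its write methods.
type EnvContext = Pick<Env, 'get' | 'list'>

interface MiniSystemContext {
  env: EnvContext
}

// Pure composition: forward the read methods, expose nothing else.
function createMiniSystemContext(env: Env): MiniSystemContext {
  return {
    env: {
      get: (key) => env.get(key),
      list: () => env.list(),
    },
  }
}

const vars = new Map<string, string>([['DATABASE_URL', 'postgres://localhost/dev']])

const fullEnv: Env = {
  get: async (key) => {
    const value = vars.get(key)
    return value === undefined ? null : { key, value }
  },
  list: async () => Array.from(vars, ([key, value]) => ({ key, value })),
  set: async (key, value) => {
    vars.set(key, value)
    return { key, value }
  },
}

const system = createMiniSystemContext(fullEnv)

// The facade object has no write method at runtime, not just at the type level.
const canWrite = 'set' in system.env
```

Forwarding methods explicitly, rather than casting the full implementation to the narrower type, keeps write methods unreachable even for callers that bypass the type checker.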

## TypeScript API

```ts
import type { SystemContext } from '@osprotocol/schema/context/system'
```

### SystemContext

Composes all system read-only interfaces.

```ts
interface SystemContext {
  env: EnvContext
  settings: SettingsContext
  preferences: PreferencesContext
  registry: RegistryContext
  fs: FsContext
  sandbox: SandboxContext
  installer: InstallerContext
  mcp: McpContext
}
```

## Composed Interfaces

| Property      | Type                 | Provides                                                       | Docs                                    |
| ------------- | -------------------- | -------------------------------------------------------------- | --------------------------------------- |
| `env`         | `EnvContext`         | Read environment variables (get, list)                         | [Env](/docs/system/env)                 |
| `settings`    | `SettingsContext`    | Read system-wide settings (get, list)                          | [Settings](/docs/system/settings)       |
| `preferences` | `PreferencesContext` | Read per-agent or per-user preferences by scope (get, list)    | [Preferences](/docs/system/preferences) |
| `registry`    | `RegistryContext`    | Discover and look up registered resources (get, list)          | [Registry](/docs/system/registry)       |
| `fs`          | `FsContext`          | Read host filesystem entries (read, list, exists)              | [Fs](/docs/system/fs)                   |
| `sandbox`     | `SandboxContext`     | Inspect existing sandbox environments (get, list)              | [Sandbox](/docs/system/sandbox)         |
| `installer`   | `InstallerContext`   | Inspect installed packages and their status (get, list)        | [Installer](/docs/system/installer)     |
| `mcp`         | `McpContext`         | Inspect MCP server connections and available tools (get, list) | [MCP Client](/docs/system/mcp-client)   |

## Usage Examples

### Check environment and preferences together

An agent reads an environment variable and resolves a user preference in the same context phase before deciding how to act.

```ts
async function resolveOutputConfig(system: SystemContext) {
  const dbUrl = await system.env.get('DATABASE_URL')
  const formatPref = await system.preferences.get('output.format', 'user')

  return {
    databaseUrl: dbUrl?.value ?? null,
    outputFormat: formatPref?.value ?? 'json',
  }
}
```

### Inspect installed packages and MCP connections

An agent audits what capabilities are currently available before deciding whether to proceed with a task.

```ts
async function auditCapabilities(system: SystemContext) {
  const [packages, mcpServers] = await Promise.all([
    system.installer.list(),
    system.mcp.list(),
  ])

  const hasSchemaPackage = packages.some(
    (p) => p.name === '@osprotocol/schema' && p.status === 'installed'
  )

  const connectedServers = mcpServers.filter((s) => s.status === 'connected')

  return { hasSchemaPackage, connectedServers }
}
```

## Integration

* **[SystemActions](/docs/actions/system)**: The write counterpart — same system interfaces, mutation operations
* **[EnvContext](/docs/system/env)**: Environment variable read interface
* **[SettingsContext](/docs/system/settings)**: System-wide settings read interface
* **[PreferencesContext](/docs/system/preferences)**: Scoped preferences read interface
* **[RegistryContext](/docs/system/registry)**: Resource registry read interface
* **[FsContext](/docs/system/fs)**: Host filesystem read interface
* **[SandboxContext](/docs/system/sandbox)**: Sandbox inspection interface
* **[InstallerContext](/docs/system/installer)**: Installed packages read interface
* **[McpContext](/docs/system/mcp-client)**: MCP server connections read interface


---

# Approval


## Overview

The approval system enables human oversight of workflow execution. It provides approval requests, multi-approver workflows, and configurable timeout behavior for critical checkpoints.

<Mermaid
  chart="sequenceDiagram
    participant E as Execution
    participant A as Approval System
    participant H as Human

    E->>A: waitForApproval(message)
    A->>H: Show approval request
    Note over E: Status: awaiting
    alt Approved
        H->>A: Approve (with reason)
        A->>E: Approval { approved: true }
        Note over E: Status: in-progress
    else Rejected
        H->>A: Reject (with reason)
        A->>E: Approval { approved: false }
    else Timeout
        A->>E: Auto-approve or reject
    end"
/>

## TypeScript API

```ts
import type {
  Approval,
  ApprovalConfig,
  ApprovalRequest
} from '@osprotocol/schema/runs/approval'
```

### Approval

Result of an approval request.

```ts
interface Approval {
  /** Whether the action was approved */
  approved: boolean
  /** Optional reason for the decision */
  reason?: string
  /** Identifier of who approved (user ID, email, etc.) */
  approvedBy?: string
  /** When the approval decision was made */
  timestamp: Date
}
```

### ApprovalConfig

Configuration for approval requests.

```ts
interface ApprovalConfig {
  /** Default timeout for approval requests (milliseconds) */
  timeoutMs?: number
  /** Whether to auto-approve after timeout */
  autoApproveOnTimeout?: boolean
  /** List of users who can approve */
  approvers?: string[]
  /** Minimum approvals required (for multi-approval scenarios) */
  requiredApprovals?: number
}
```

### ApprovalRequest

A pending request for human approval.

```ts
interface ApprovalRequest {
  /** Unique identifier for this request */
  id: string
  /** Message describing what needs approval */
  message: string
  /** Execution ID this request belongs to */
  executionId: string
  /** When the request was created */
  createdAt: Date
  /** When the request expires */
  expiresAt?: Date
  /** Current approval responses */
  responses: Approval[]
}
```

## Usage Example

```ts
// Request approval during execution
const approval = await execution.waitForApproval(
  'Deploy to production environment?'
)

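if (approval.approved) {
  // approvedBy and reason are optional, so fall back to placeholders
  console.log(`Approved by ${approval.approvedBy ?? 'unknown'}: ${approval.reason ?? 'no reason given'}`)
  // Continue with deployment
} else {
  console.log(`Denied: ${approval.reason ?? 'no reason given'}`)
  // Handle rejection
}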
```

## Multi-Approval Workflows

For critical operations requiring multiple approvers:

```ts
const config: ApprovalConfig = {
  timeoutMs: 3600000, // 1 hour
  approvers: ['alice@company.com', 'bob@company.com'],
  requiredApprovals: 2,
  autoApproveOnTimeout: false
}
```
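
How the collected responses turn into a final decision is not specified here, so the following is only one possible policy, written as a sketch. `resolveApproval` and `Decision` are hypothetical names; the assumed rules are that any rejection from a permitted approver ends the request, that each approver counts once, and that a timeout falls back to `autoApproveOnTimeout`.

```ts
// Local copies of the documented shapes.
interface Approval {
  approved: boolean
  reason?: string
  approvedBy?: string
  timestamp: Date
}

interface ApprovalConfig {
  timeoutMs?: number
  autoApproveOnTimeout?: boolean
  approvers?: string[]
  requiredApprovals?: number
}

type Decision = 'approved' | 'rejected' | 'pending'

// Hypothetical policy, not part of the protocol.
function resolveApproval(
  responses: Approval[],
  config: ApprovalConfig,
  timedOut = false
): Decision {
  const allowed = config.approvers

  // Ignore responses from anyone outside the approver list, when one is set.
  const valid = responses.filter(
    (r) => !allowed || (r.approvedBy !== undefined && allowed.includes(r.approvedBy))
  )

  if (valid.some((r) => !r.approved)) return 'rejected'

  // Count distinct approvers so one person cannot approve twice.
  // Responses without approvedBy all collapse into a single anonymous approver.
  const distinct = new Set(valid.map((r) => r.approvedBy ?? '')).size
  if (distinct >= (config.requiredApprovals ?? 1)) return 'approved'

  if (timedOut) return config.autoApproveOnTimeout ? 'approved' : 'rejected'
  return 'pending'
}
```

With the configuration above, a single approval from `alice@company.com` leaves the request pending, and a timeout at that point rejects it because `autoApproveOnTimeout` is `false`.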

## Integration

Approval integrates with:

* **Execution**: Pauses execution until approval
* **Timeout**: Approval requests can expire
* **Cancel**: Rejected approvals can trigger cancellation


---

# Cancellation


## Overview

The cancel system provides mechanisms to cancel running workflows gracefully. It supports cancellation hooks, cleanup operations, and configurable grace periods for in-progress work.

## TypeScript API

```ts
import type { Cancel } from '@osprotocol/schema/runs/cancel'
```

### Cancel

Cancel configuration for workflow runs.

```ts
interface Cancel {
  /**
   * Called before cancellation proceeds
   * Return false to prevent cancellation
   */
  beforeCancel?: () => boolean | Promise<boolean>

  /**
   * Called after cancellation completes
   */
  afterCancel?: () => void

  /**
   * Optional reason for cancellation
   */
  reason?: string

  /**
   * Whether to wait for cleanup before resolving
   */
  graceful?: boolean

  /**
   * Timeout for graceful cancellation in milliseconds
   */
  gracefulTimeoutMs?: number
}
```

## Usage Examples

### Simple Cancellation

```ts
// Cancel an execution
await execution.cancel('User requested stop')
```

### Cancellation with Cleanup

```ts
const cancel: Cancel = {
  graceful: true,
  gracefulTimeoutMs: 5000,
  beforeCancel: async () => {
    // Check if safe to cancel
    const canCancel = await checkSafeToCancel()
    return canCancel
  },
  afterCancel: () => {
    // Cleanup resources
    cleanupTempFiles()
    closeConnections()
  }
}
```

### Preventing Cancellation

```ts
const cancel: Cancel = {
  beforeCancel: () => {
    if (criticalOperationInProgress) {
      console.log('Cannot cancel during critical operation')
      return false // Prevent cancellation
    }
    return true
  }
}
```

### Graceful Shutdown

```ts
const cancel: Cancel = {
  graceful: true,
  gracefulTimeoutMs: 10000, // 10 second grace period
  reason: 'System shutdown',
  afterCancel: () => {
    notifyDependentSystems()
  }
}
// Waits up to 10 seconds for graceful cleanup
// Forces cancellation if cleanup exceeds timeout
```

## Cancellation Flow

<Mermaid
  chart="flowchart TD
    Start([cancel called]) --> Before{beforeCancel?}
    Before -->|returns false| Prevented([Cancel prevented])
    Before -->|returns true| Graceful{graceful?}
    Before -->|not defined| Graceful
    Graceful -->|no| Stop[Immediate stop]
    Graceful -->|yes| Wait[Wait for cleanup]
    Wait --> Timeout{timeout?}
    Timeout -->|no| After[afterCancel]
    Timeout -->|yes| Force[Force stop]
    Force --> After
    Stop --> After
    After --> Done([Status: cancelled])"
/>
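
The flow above can be sketched as a single function. This is an illustration under stated assumptions, not the protocol's runner: `runCancel`, `CancelOutcome`, and the `cleanup` parameter are hypothetical, and "force stop" is modeled simply as no longer waiting for cleanup.

```ts
// Local copy of the documented Cancel shape.
interface Cancel {
  beforeCancel?: () => boolean | Promise<boolean>
  afterCancel?: () => void
  reason?: string
  graceful?: boolean
  gracefulTimeoutMs?: number
}

type CancelOutcome = 'prevented' | 'cancelled' | 'forced'

// `cleanup` stands in for whatever in-progress work must wind down.
async function runCancel(
  cancel: Cancel,
  cleanup: () => Promise<void>
): Promise<CancelOutcome> {
  // 1. beforeCancel can veto the cancellation.
  if (cancel.beforeCancel && !(await cancel.beforeCancel())) return 'prevented'

  let outcome: CancelOutcome = 'cancelled'

  // 2. Graceful mode waits for cleanup, bounded by the grace period if one is set.
  if (cancel.graceful) {
    if (cancel.gracefulTimeoutMs === undefined) {
      await cleanup()
    } else {
      let timer!: ReturnType<typeof setTimeout>
      const timeout = new Promise<'timeout'>((resolve) => {
        timer = setTimeout(() => resolve('timeout'), cancel.gracefulTimeoutMs)
      })
      const winner = await Promise.race([
        cleanup().then(() => 'done' as const),
        timeout,
      ])
      clearTimeout(timer)
      if (winner === 'timeout') outcome = 'forced'
    }
  }

  // 3. afterCancel runs on every path except a veto.
  cancel.afterCancel?.()
  return outcome
}
```

Note that a cleanup promise that loses the race keeps running in the background; a real implementation would need a way to abort it.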

## Integration

Cancel integrates with:

* **RunOptions**: Configure cancel behavior for runs
* **Timeout**: Timeouts can trigger cancellation
* **Execution**: Cancel is called via `execution.cancel()`
* **Approval**: Rejected approvals may trigger cancellation


---

# Retry


## Overview

The retry system provides configurable retry behavior for failed operations. It supports multiple backoff strategies, conditional retries, and callbacks for monitoring retry attempts.

<Mermaid
  chart="flowchart LR
    Start([Execute]) --> Result{Success?}
    Result -->|yes| Done([Complete])
    Result -->|no| Check{shouldRetry?}
    Check -->|no| Fail([Failed])
    Check -->|yes| Attempts{attempts left?}
    Attempts -->|no| Fail
    Attempts -->|yes| Wait[Wait delayMs]
    Wait --> Backoff[Apply backoff]
    Backoff --> Retry([Retry])
    Retry --> Result"
/>

## Backoff Strategies

| Strategy      | Description                                              |
| ------------- | -------------------------------------------------------- |
| `none`        | Same delay before every retry (`delayMs`)                |
| `linear`      | Delay grows linearly (`delayMs * attempt`)               |
| `exponential` | Delay doubles each attempt (`delayMs * 2^(attempt - 1)`) |

## TypeScript API

```ts
import type { Retry, Backoff } from '@osprotocol/schema/runs/retry'
```

### Backoff

Available backoff strategies for retry delays.

```ts
type Backoff = 'none' | 'linear' | 'exponential'
```

### Retry

Retry configuration for workflow runs.

```ts
interface Retry {
  /** Maximum number of retry attempts */
  attempts: number
  /** Initial delay between retries in milliseconds */
  delayMs: number
  /** Backoff strategy (default: 'none') */
  backoff?: Backoff
  /** Maximum delay when using backoff (milliseconds) */
  maxDelayMs?: number
  /** Callback on each retry attempt */
  onRetry?: (error: Error, attempt: number) => void
  /** Optional predicate to determine if error is retryable */
  shouldRetry?: (error: Error) => boolean
}
```

## Usage Examples

### Simple Retry

```ts
const retry: Retry = {
  attempts: 3,
  delayMs: 1000
}
// Retries up to 3 times, waiting 1 second before each retry
```

### Exponential Backoff

```ts
const retry: Retry = {
  attempts: 5,
  delayMs: 100,
  backoff: 'exponential',
  maxDelayMs: 10000,
  onRetry: (error, attempt) => {
    console.log(`Retry ${attempt}: ${error.message}`)
  }
}
// Delays: 100ms, 200ms, 400ms, 800ms, 1600ms (the 10000ms cap is never reached here)
```

### Conditional Retry

```ts
const retry: Retry = {
  attempts: 3,
  delayMs: 500,
  shouldRetry: (error) => {
    // Only retry network errors
    return error.name === 'NetworkError'
  }
}
```
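
To show how `attempts`, `onRetry`, and `shouldRetry` interact, here is a minimal retry loop. `withRetry` and `sleep` are hypothetical helpers, not part of the schema. The sketch assumes `attempts` counts retries after the first failure, and it uses a fixed delay, leaving backoff out for brevity.

```ts
// Trimmed local copy of the documented Retry shape (backoff fields omitted).
interface Retry {
  attempts: number
  delayMs: number
  onRetry?: (error: Error, attempt: number) => void
  shouldRetry?: (error: Error) => boolean
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Runs `operation`, retrying up to `retry.attempts` times after the first failure.
async function withRetry<T>(operation: () => Promise<T>, retry: Retry): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation()
    } catch (caught) {
      // Normalize non-Error throws so the callbacks always receive an Error.
      const error = caught instanceof Error ? caught : new Error(String(caught))

      const retryable = retry.shouldRetry ? retry.shouldRetry(error) : true
      // Give up on non-retryable errors or once all retries are used.
      if (!retryable || attempt > retry.attempts) throw error

      retry.onRetry?.(error, attempt)
      await sleep(retry.delayMs)
    }
  }
}
```

With `attempts: 3`, the operation runs at most four times: the initial call plus three retries.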

## Delay Calculation

| Strategy      | Attempt 1 | Attempt 2    | Attempt 3    | Attempt 4    |
| ------------- | --------- | ------------ | ------------ | ------------ |
| `none`        | delayMs   | delayMs      | delayMs      | delayMs      |
| `linear`      | delayMs   | 2 \* delayMs | 3 \* delayMs | 4 \* delayMs |
| `exponential` | delayMs   | 2 \* delayMs | 4 \* delayMs | 8 \* delayMs |
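
The table can be expressed as a small pure function. `computeDelay` and `RetryDelayConfig` are hypothetical names for this sketch; it assumes `attempt` is 1-based, as in the table, and that `maxDelayMs` clamps the result for every strategy.

```ts
type Backoff = 'none' | 'linear' | 'exponential'

// Only the Retry fields that affect the delay.
interface RetryDelayConfig {
  delayMs: number
  backoff?: Backoff
  maxDelayMs?: number
}

// `attempt` is 1-based: attempt 1 always waits `delayMs`.
function computeDelay(config: RetryDelayConfig, attempt: number): number {
  const { delayMs, backoff = 'none', maxDelayMs = Infinity } = config

  let delay: number
  switch (backoff) {
    case 'linear':
      delay = delayMs * attempt
      break
    case 'exponential':
      delay = delayMs * 2 ** (attempt - 1)
      break
    default:
      delay = delayMs
  }

  return Math.min(delay, maxDelayMs)
}

const delays = [1, 2, 3, 4, 5].map((attempt) =>
  computeDelay({ delayMs: 100, backoff: 'exponential', maxDelayMs: 1000 }, attempt)
)
// delays → [100, 200, 400, 800, 1000]
```

The fifth delay would be 1600ms uncapped; `maxDelayMs: 1000` clamps it.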

## Integration

Retry integrates with:

* **RunOptions**: Configure retry behavior for runs
* **Timeout**: Retries respect timeout constraints
* **Cancel**: Pending retries can be cancelled


---

# Run Lifecycle


## Overview

An execution represents an active workflow run with full lifecycle control. The run system provides status tracking, progress monitoring, and execution control through pause, resume, and cancel operations.

Creating a run IS starting it — `workflow.run()` returns an active `Execution` handle directly. This aligns with the [Agent Communication Protocol (ACP)](https://agentcommunicationprotocol.dev/core-concepts/agent-run-lifecycle).

## Lifecycle

<Mermaid
  chart="stateDiagram-v2
    [*] --> pending: workflow.run()
    pending --> in_progress: execution begins
    in_progress --> awaiting: waitForApproval()
    awaiting --> in_progress: approved
    in_progress --> completed: success
    in_progress --> failed: error
    in_progress --> cancelled: cancel()
    awaiting --> cancelled: rejected/timeout
    completed --> [*]
    failed --> [*]
    cancelled --> [*]"
/>

## TypeScript API

```ts
import type {
  RunOptions,
  RunStatus,
  Execution,
  ExecutionProgress
} from '@osprotocol/schema/runs'
```

### RunStatus

The possible states of a workflow execution.

```ts
type RunStatus =
  | 'pending'      // Execution is queued/initializing
  | 'in-progress'  // Execution is actively running
  | 'awaiting'     // Execution is waiting for human input/approval
  | 'completed'    // Execution finished successfully
  | 'failed'       // Execution encountered an error
  | 'cancelled'    // Execution was cancelled
```
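
The transitions in the state diagram can be encoded as a lookup table, which is handy for validating status changes. `transitions`, `canTransition`, and `isTerminal` are hypothetical helpers for this sketch; the diagram shows no paused state, so none is modeled.

```ts
type RunStatus =
  | 'pending'
  | 'in-progress'
  | 'awaiting'
  | 'completed'
  | 'failed'
  | 'cancelled'

// Allowed transitions, transcribed from the state diagram above.
const transitions: Record<RunStatus, readonly RunStatus[]> = {
  pending: ['in-progress'],
  'in-progress': ['awaiting', 'completed', 'failed', 'cancelled'],
  awaiting: ['in-progress', 'cancelled'],
  completed: [],
  failed: [],
  cancelled: [],
}

function canTransition(from: RunStatus, to: RunStatus): boolean {
  return transitions[from].includes(to)
}

// Terminal states have no outgoing transitions.
function isTerminal(status: RunStatus): boolean {
  return transitions[status].length === 0
}
```

For example, `canTransition('awaiting', 'completed')` is `false`: an execution waiting on approval has to return to `in-progress` before it can complete.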

### RunOptions

Options for configuring a workflow run.

```ts
interface RunOptions<Output> {
  /** Timeout configuration */
  timeout?: Timeout
  /** Retry configuration */
  retry?: Retry
  /** Cancel configuration */
  cancel?: Cancel
  /** Callback when run completes successfully */
  onComplete?: (result: Output) => void
  /** Callback when run fails */
  onFailed?: (error: Error) => void
  /** Callback on each status change */
  onStatusChange?: (status: RunStatus) => void
}
```

### Execution

Active execution handle for controlling a running workflow.

```ts
interface Execution<Output> {
  /** Unique identifier for this execution */
  id: string
  /** Current status */
  status: RunStatus
  /** Progress information */
  progress: ExecutionProgress
  /** Execution logs */
  logs: string[]

  /** Pause the execution (if supported) */
  pause(): Promise<void>

  /** Resume a paused execution */
  resume(): Promise<void>

  /** Cancel the execution */
  cancel(reason?: string): Promise<void>

  /** Request human approval before continuing */
  waitForApproval(message?: string): Promise<Approval>

  /** Request input from a human */
  waitForInput<Input>(prompt: string): Promise<Input>

  /** The final result of the execution (resolves when complete) */
  result: Promise<Output>
}
```

### ExecutionProgress

Progress tracking for an execution.

```ts
interface ExecutionProgress {
  /** Current step number */
  current: number
  /** Total number of steps (0 if unknown) */
  total: number
  /** Description of current step */
  message?: string
}
```
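
Because `total` is `0` when the step count is unknown, display code needs to avoid dividing by it. A small sketch of one way to handle that; `formatProgress` is a hypothetical helper, not part of the schema:

```ts
// Local copy of ExecutionProgress so the sketch is self-contained.
interface ExecutionProgress {
  current: number
  total: number
  message?: string
}

/** Hypothetical helper: renders progress, treating total === 0 as "unknown". */
function formatProgress(progress: ExecutionProgress): string {
  const { current, total, message } = progress
  const base =
    total > 0
      ? `${current}/${total} (${Math.round((current / total) * 100)}%)`
      : `step ${current}` // total unknown: show the step counter only
  return message ? `${base}: ${message}` : base
}
```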

## Usage Example

```ts
// Run a workflow and get an active execution
const execution = await workflow.run(prompt, {
  timeout: { ms: 30000 },
  retry: { attempts: 3, delayMs: 1000 },
  onStatusChange: (status) => console.log('Status:', status)
})

// Monitor progress
console.log(`Progress: ${execution.progress.current}/${execution.progress.total}`)

// Wait for result
const result = await execution.result
```

## Integration

Execution integrates with:

* **Timeout**: Enforce time limits on execution
* **Retry**: Automatically retry on failure
* **Cancel**: Graceful cancellation support
* **Approval**: Human-in-the-loop checkpoints


---

# Timeout


## Overview

The timeout system manages time limits for workflow execution. It ensures operations complete within specified durations and provides configurable actions for timeout scenarios.

<Mermaid
  chart="flowchart LR
    Start([Start execution]) --> Timer[Start timer]
    Timer --> Running{Running}
    Running -->|completes| Done([Complete])
    Running -->|timeout| Action{onTimeout}
    Action -->|fail| Failed([Status: failed])
    Action -->|cancel| Cancel([Trigger cancel flow])
    Action -->|continue| Continue([Log & continue])"
/>

## Timeout Actions

| Action     | Description                                 |
| ---------- | ------------------------------------------- |
| `fail`     | Mark the run as failed on timeout (default) |
| `cancel`   | Trigger graceful cancellation on timeout    |
| `continue` | Log timeout but allow execution to continue |

## TypeScript API

```ts
import type { Timeout, TimeoutAction } from '@osprotocol/schema/runs/timeout'
```

### TimeoutAction

Action to take when a timeout occurs.

```ts
type TimeoutAction = 'fail' | 'cancel' | 'continue'
```

### Timeout

Timeout configuration for workflow runs.

```ts
interface Timeout {
  /** Timeout duration in milliseconds */
  ms: number
  /** Action to take when timeout occurs (default: 'fail') */
  onTimeout?: TimeoutAction
  /** Callback function when timeout occurs */
  onTimeoutCallback?: () => void
}
```
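
To make the three actions concrete, here is a rough sketch of how a runner might enforce a `Timeout` around a task. It is an illustration under stated assumptions, not the protocol's implementation: `runWithTimeout` and `TimedOutcome` are hypothetical names, and the `cancel` branch only reports a `cancelled` status where a real runner would trigger the cancel flow.

```ts
// Local copies of the types above so the sketch is self-contained.
type TimeoutAction = 'fail' | 'cancel' | 'continue'

interface Timeout {
  ms: number
  onTimeout?: TimeoutAction
  onTimeoutCallback?: () => void
}

/** Hypothetical outcome type; not part of the schema. */
type TimedOutcome<T> =
  | { status: 'completed'; value: T }
  | { status: 'failed' | 'cancelled'; reason: string }

async function runWithTimeout<T>(
  task: () => Promise<T>,
  timeout: Timeout,
): Promise<TimedOutcome<T>> {
  const action = timeout.onTimeout ?? 'fail' // documented default
  let timer: ReturnType<typeof setTimeout> | undefined

  const work = task().then((value) => ({ status: 'completed' as const, value }))
  const expired = new Promise<'expired'>((resolve) => {
    timer = setTimeout(() => resolve('expired'), timeout.ms)
  })

  try {
    // Whichever settles first wins: the task or the timer.
    const first = await Promise.race([work, expired])
    if (first !== 'expired') return first

    timeout.onTimeoutCallback?.()
    if (action === 'continue') return await work // note the timeout, keep waiting
    const status = action === 'cancel' ? 'cancelled' : 'failed'
    return { status, reason: `Timed out after ${timeout.ms}ms` }
  } finally {
    clearTimeout(timer) // avoid a dangling timer once the outcome is known
  }
}
```

In this sketch the task itself is not interrupted on `fail` or `cancel`; stopping in-flight work is left to the cancel flow.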

## Usage Examples

### Simple Timeout

```ts
const timeout: Timeout = {
  ms: 30000 // 30 seconds
}
// Fails the run if not complete within 30 seconds
```

### Graceful Cancellation

```ts
const timeout: Timeout = {
  ms: 60000, // 1 minute
  onTimeout: 'cancel',
  onTimeoutCallback: () => {
    console.log('Timeout reached, initiating graceful shutdown')
  }
}
// Triggers cancellation flow instead of immediate failure
```

### Warning Without Failure

```ts
const timeout: Timeout = {
  ms: 120000, // 2 minutes
  onTimeout: 'continue',
  onTimeoutCallback: () => {
    sendAlert('Operation exceeding expected duration')
  }
}
// Logs warning but allows execution to continue
```

## Integration

Timeout integrates with:

* **RunOptions**: Configure timeout for workflow runs
* **Cancel**: Timeout can trigger cancellation flow
* **Retry**: Retries reset the timeout clock
* **Execution**: Status changes to failed/cancelled on timeout


---

# Environment


<Callout type="warn">
  This interface is experimental — no production implementation exists yet.
  The API surface may change.
</Callout>

## Overview

The `Env` interface provides kernel-level access to environment variables in the execution environment. It defines a convergent surface for creating, reading, updating, and removing configuration variables across deployment platforms such as Vercel, Cloudflare Workers, and Railway.

Variables can be scoped to specific deployment targets and marked as sensitive to control how they are handled by the platform.

## TypeScript API

```ts
import type {
  Env,
  EnvEntry,
  EnvContext,
  EnvActions,
} from '@osprotocol/schema/system/env'
```

### EnvEntry

A single environment variable, including its value, optional target scopes, and sensitivity flag.

```ts
interface EnvEntry<T = string> {
  key: string
  value: T
  target?: string[]
  sensitive?: boolean
  metadata?: Record<string, unknown>
}
```
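
As an illustration of how `target` and `sensitive` might be consumed, the sketch below filters entries for a deployment target and masks sensitive values before logging. `forTarget` and `redact` are hypothetical helpers, and treating a missing `target` as "applies everywhere" is an assumption rather than something the schema states.

```ts
// Local copy of EnvEntry so the sketch is self-contained.
interface EnvEntry<T = string> {
  key: string
  value: T
  target?: string[]
  sensitive?: boolean
  metadata?: Record<string, unknown>
}

/**
 * Hypothetical helper: entries that apply to `target`.
 * Assumption: an entry without a `target` list applies to every target.
 */
function forTarget(entries: EnvEntry[], target: string): EnvEntry[] {
  return entries.filter((e) => e.target === undefined || e.target.includes(target))
}

/** Hypothetical helper: returns a copy with the value masked when `sensitive` is set. */
function redact(entry: EnvEntry): EnvEntry {
  return entry.sensitive ? { ...entry, value: '[redacted]' } : entry
}
```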

### EnvContext

Read-only access for the context phase of the agent loop.

```ts
interface EnvContext {
  get(key: string): Promise<EnvEntry | null>
  list(): Promise<EnvEntry[]>
}
```

### EnvActions

Write operations for the actions phase of the agent loop.

```ts
interface EnvActions {
  set(entry: Omit<EnvEntry, 'metadata'>): Promise<EnvEntry>
  remove(key: string): Promise<boolean>
}
```

### Env

Full environment management interface for provider implementations.

```ts
interface Env {
  get(key: string): Promise<EnvEntry | null>
  set(entry: Omit<EnvEntry, 'metadata'>): Promise<EnvEntry>
  remove(key: string): Promise<boolean>
  list(): Promise<EnvEntry[]>
}
```
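
For local tests it can help to have a provider that satisfies `Env` without touching a real platform. The following is a minimal in-memory sketch; `MemoryEnv` is a hypothetical name, and the upsert behavior of `set` and the `false` result of `remove` for unknown keys are assumptions.

```ts
// Local copies of the types above so the sketch is self-contained.
interface EnvEntry<T = string> {
  key: string
  value: T
  target?: string[]
  sensitive?: boolean
  metadata?: Record<string, unknown>
}

interface Env {
  get(key: string): Promise<EnvEntry | null>
  set(entry: Omit<EnvEntry, 'metadata'>): Promise<EnvEntry>
  remove(key: string): Promise<boolean>
  list(): Promise<EnvEntry[]>
}

/** Hypothetical in-memory provider for tests; not part of the schema. */
class MemoryEnv implements Env {
  private readonly entries = new Map<string, EnvEntry>()

  async get(key: string): Promise<EnvEntry | null> {
    return this.entries.get(key) ?? null
  }

  async set(entry: Omit<EnvEntry, 'metadata'>): Promise<EnvEntry> {
    // Assumption: set() is an upsert, and the provider owns `metadata`.
    const stored: EnvEntry = { ...entry, metadata: { updatedAt: Date.now() } }
    this.entries.set(entry.key, stored)
    return stored
  }

  async remove(key: string): Promise<boolean> {
    // Assumption: resolves to false when the key did not exist.
    return this.entries.delete(key)
  }

  async list(): Promise<EnvEntry[]> {
    return Array.from(this.entries.values())
  }
}
```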

## Usage Examples

### Read a variable

```ts
const entry = await env.get('DATABASE_URL')

if (entry) {
  console.log(entry.key)       // 'DATABASE_URL'
  console.log(entry.sensitive) // true
}
```

### Set a variable scoped to production

```ts
await env.set({
  key: 'API_SECRET',
  value: 'sk-live-...',
  target: ['production'],
  sensitive: true,
})
```

### Rotate a key

```ts
await env.set({
  key: 'OPENAI_API_KEY',
  value: 'sk-new-...',
  sensitive: true,
})

// Or remove it entirely
await env.remove('OPENAI_API_KEY')
```

## Integration

The `Env` interface integrates with:

* **[Sandbox](/docs/system/sandbox)**: Sandboxes inherit or override environment variables at creation time
* **[Settings](/docs/system/settings)**: Settings may reference environment variable keys for dynamic configuration
* **[Preferences](/docs/system/preferences)**: Preference resolution can fall through to system-level values sourced from environment


---

# Filesystem


<Callout type="warn">
  This interface is experimental — no production implementation exists yet.
  The API surface may change.
</Callout>

## Overview

The filesystem interface provides platform-level access to the host execution environment's file system. Agents use it to read configurations, persist artifacts, and navigate directories across storage backends including local disk, S3, Vercel Blob, and Cloudflare R2. Sandbox environments manage their own internal filesystem independently — `system/fs` operates on the host, not inside isolated execution containers.

## TypeScript API

```ts
import type { Fs, FsEntry, FsContext, FsActions } from '@osprotocol/schema/system/fs'
```

### FsEntry

Represents a file or directory in the filesystem.

```ts
interface FsEntry {
  name: string
  path: string
  type: 'file' | 'directory'
  size?: number
  updatedAt?: number
  metadata?: Record<string, unknown>
}
```

### FsContext

Read-only view for the context (gather) phase of the agent loop.

```ts
interface FsContext {
  read(path: string): Promise<string | null>
  list(path: string): Promise<FsEntry[]>
  exists(path: string): Promise<boolean>
}
```

### FsActions

Write operations for the actions (act) phase of the agent loop.

```ts
interface FsActions {
  write(path: string, content: string): Promise<FsEntry>
  remove(path: string): Promise<boolean>
}
```

### Fs

Combined interface providing full read and write access. Used directly by providers that do not split context and actions phases.

```ts
interface Fs {
  read(path: string): Promise<string | null>
  write(path: string, content: string): Promise<FsEntry>
  remove(path: string): Promise<boolean>
  list(path: string): Promise<FsEntry[]>
  exists(path: string): Promise<boolean>
}
```
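
To show how the five methods fit together, here is a minimal in-memory sketch of an `Fs` backend. It is illustrative only: `MemoryFs` is a hypothetical name, directories are modelled implicitly as path prefixes, and reporting `size` as the UTF-8 byte length is an assumption.

```ts
// Local copies of the types above so the sketch is self-contained.
interface FsEntry {
  name: string
  path: string
  type: 'file' | 'directory'
  size?: number
  updatedAt?: number
  metadata?: Record<string, unknown>
}

interface Fs {
  read(path: string): Promise<string | null>
  write(path: string, content: string): Promise<FsEntry>
  remove(path: string): Promise<boolean>
  list(path: string): Promise<FsEntry[]>
  exists(path: string): Promise<boolean>
}

interface StoredFile {
  content: string
  updatedAt: number
}

// Assumption: `size` is the UTF-8 byte length of the content.
const byteLength = (content: string): number => new TextEncoder().encode(content).length

/** Hypothetical in-memory backend; directories exist implicitly as path prefixes. */
class MemoryFs implements Fs {
  private readonly files = new Map<string, StoredFile>()

  async read(path: string): Promise<string | null> {
    return this.files.get(path)?.content ?? null
  }

  async write(path: string, content: string): Promise<FsEntry> {
    const updatedAt = Date.now()
    this.files.set(path, { content, updatedAt })
    const name = path.slice(path.lastIndexOf('/') + 1)
    return { name, path, type: 'file', size: byteLength(content), updatedAt }
  }

  async remove(path: string): Promise<boolean> {
    // Assumption: resolves to false when nothing was stored at `path`.
    return this.files.delete(path)
  }

  async list(dir: string): Promise<FsEntry[]> {
    const prefix = dir.endsWith('/') ? dir : `${dir}/`
    const children = new Map<string, FsEntry>()
    this.files.forEach((file, path) => {
      if (!path.startsWith(prefix)) return
      // First segment below `dir` is the child; extra segments mean a directory.
      const [name, ...rest] = path.slice(prefix.length).split('/')
      children.set(
        name,
        rest.length === 0
          ? { name, path, type: 'file', size: byteLength(file.content), updatedAt: file.updatedAt }
          : { name, path: prefix + name, type: 'directory' },
      )
    })
    return Array.from(children.values())
  }

  async exists(path: string): Promise<boolean> {
    if (this.files.has(path)) return true
    const prefix = path.endsWith('/') ? path : `${path}/`
    return Array.from(this.files.keys()).some((p) => p.startsWith(prefix))
  }
}
```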

## Usage Examples

### Read a configuration file

```ts
const content = await fs.read('/project/agent.yaml')
if (content === null) {
  throw new Error('agent.yaml not found')
}
const config = parseYaml(content)
```

### Persist an artifact and confirm the write

```ts
const report = generateReport(results)
const entry = await fs.write('/output/report.md', report)
console.log(`Wrote ${entry.size} bytes to ${entry.path}`)
```

### List a directory and filter by type

```ts
const entries = await fs.list('/output')
const files = entries.filter((e) => e.type === 'file')
console.log(`Found ${files.length} files in /output`)
```

## Integration

Filesystem integrates with:

* **[Sandbox](/docs/system/sandbox)**: Sandbox has its own isolated filesystem — use `system/fs` for host-level persistence before or after sandbox execution
* **[KV](/docs/context/kv)**: KV stores short-lived key-value data; filesystem is for structured, durable file content
* **[Environment](/docs/system/env)**: Environment variables may point to filesystem roots or provider credentials


---

# System


import { Database, Settings as SettingsIcon, FolderOpen, Server, Wrench, User, Plug } from 'lucide-react';

## Overview

The System Intelligence layer provides infrastructure services that all agents depend on. Unlike traditional protocols that focus solely on agent-to-agent communication, OSP includes system-level intelligence through the Operating System abstraction.

This layer manages the lifecycle, coordination, and resource management that multi-agent systems require, providing the foundation for reliable, scalable agent orchestration.

## The Operating System Abstraction

Just as traditional operating systems abstract hardware resources (CPU, memory, disk), the System Intelligence layer abstracts cognitive resources (inference, context, knowledge, tools). It provides standardized APIs for:

* **Agent Registry**: Discovery and capability management
* **System Configuration**: Environment and settings management
* **File Operations**: Standardized file system interfaces
* **Protocol Integration**: MCP client and external tool access
* **Installation & Setup**: System deployment and configuration

Learn more: [Agentic OS Concept](/docs/concepts/agentic-os) | [Architecture](/docs/architecture)

## System Components

<Cards>
  <Card icon={<Database />} href="/docs/system/registry" title="Registry">
    Agent registration, discovery, and capability matching for dynamic service allocation.
  </Card>

  <Card icon={<FolderOpen />} href="/docs/system/env" title="Environment">
    Configuration and environment variable management for deployment-specific settings.
  </Card>

  <Card icon={<Server />} href="/docs/system/fs" title="Filesystem">
    Standardized file system operations and management for agent file access.
  </Card>

  <Card icon={<SettingsIcon />} href="/docs/system/settings" title="Settings">
    System and agent settings management for centralized configuration.
  </Card>

  <Card icon={<Plug />} href="/docs/system/mcp-client" title="MCP Client">
    Model Context Protocol client for standardized external tool and resource access.
  </Card>

  <Card icon={<Wrench />} href="/docs/system/installer" title="Installer">
    System installation, setup, and dependency management for deployment.
  </Card>

  <Card icon={<User />} href="/docs/system/preferences" title="Preferences">
    Agent preferences and user settings for customization and personalization.
  </Card>

  <Card icon={<Server />} href="/docs/system/sandbox" title="Sandbox">
    Isolated execution environments for running agent workloads with their own filesystem and commands.
  </Card>
</Cards>

## How System Intelligence Works

The System Intelligence layer operates at the infrastructure level, providing services that agents use rather than defining agent behavior directly:

1. **Registration**: Agents register capabilities with the Registry
2. **Discovery**: The OS matches agents to tasks based on capabilities
3. **Configuration**: Environment and settings provide runtime context
4. **Execution**: Filesystem and MCP Client enable file operations and tool access
5. **Management**: Installer and preferences configure the system

These components work together to provide the resource abstraction, process isolation, and inter-process communication that define an Agentic OS.

## Integration with Other Layers

System Intelligence integrates with:

* **[Context](/docs/context)**: Filesystem and Environment provide context storage
* **[Actions](/docs/actions)**: MCP Client enables standardized tool access
* **[Checks](/docs/checks)**: Settings and Preferences configure check behavior
* **[Workflows](/docs/workflows)**: Registry enables agent discovery for orchestration patterns

## Next Steps

* Explore **[Registry](/docs/system/registry)** to understand agent discovery and matching
* Learn about **[Environment](/docs/system/env)** for configuration management
* Review **[MCP Client](/docs/system/mcp-client)** for external protocol integration
* Understand how System Intelligence fits into the **[Architecture](/docs/architecture)**


---

# Installer


<Callout type="warn">
  This interface is experimental — no production implementation exists yet.
  The API surface may change.
</Callout>

## Overview

The Installer is the kernel's package manager, responsible for adding capabilities to the system at runtime. It installs, updates, and removes skills, tools, and extensions without requiring a system restart. Provider backends include npm, pip, Claude Code Skills, and Homebrew.

## Install Status

<Mermaid
  chart="stateDiagram-v2
    [*] --> installed: install()
    installed --> updating: update()
    updating --> installed: success
    updating --> failed: error
    failed --> [*]"
/>

## TypeScript API

```ts
import type {
  InstallStatus,
  InstallEntry,
  InstallerContext,
  InstallerActions,
  Installer,
} from '@osprotocol/schema/system/installer'
```

### InstallStatus

```ts
type InstallStatus = 'installed' | 'updating' | 'failed'
```

The lifecycle state of a managed package.

| Value       | Description                                 |
| ----------- | ------------------------------------------- |
| `installed` | Package is present and ready to use         |
| `updating`  | An update is in progress                    |
| `failed`    | The last install or update operation failed |

### InstallEntry

```ts
interface InstallEntry {
  name: string
  version: string
  status: InstallStatus
  installedAt: number
  metadata?: Record<string, unknown>
}
```

A record describing a managed package. `installedAt` is a Unix timestamp (milliseconds). `metadata` carries provider-specific data (e.g., checksums, source URLs).

### InstallerContext

```ts
interface InstallerContext {
  get(name: string): Promise<InstallEntry | null>
  list(): Promise<InstallEntry[]>
}
```

Read-only access for the gather phase. Use `InstallerContext` to inspect what is currently installed without triggering side effects.

| Method      | Description                                                       |
| ----------- | ----------------------------------------------------------------- |
| `get(name)` | Returns the `InstallEntry` for `name`, or `null` if not installed |
| `list()`    | Returns all managed `InstallEntry` records                        |

### InstallerActions

```ts
interface InstallerActions {
  install(name: string, version?: string): Promise<InstallEntry>
  uninstall(name: string): Promise<boolean>
  update(name: string, version?: string): Promise<InstallEntry>
}
```

Write access for the act phase. Use `InstallerActions` to mutate the set of installed packages.

| Method                    | Description                                                    |
| ------------------------- | -------------------------------------------------------------- |
| `install(name, version?)` | Installs the package. Resolves to the resulting `InstallEntry` |
| `uninstall(name)`         | Removes the package. Resolves to `true` on success             |
| `update(name, version?)`  | Updates the package, optionally pinning a version              |

### Installer

```ts
interface Installer {
  install(name: string, version?: string): Promise<InstallEntry>
  uninstall(name: string): Promise<boolean>
  get(name: string): Promise<InstallEntry | null>
  list(): Promise<InstallEntry[]>
  update(name: string, version?: string): Promise<InstallEntry>
}
```

The combined interface. `Installer` merges `InstallerContext` and `InstallerActions` into a single object that a kernel implementation exposes to agents.
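
The sketch below shows one way the combined interface could follow the status diagram, using an in-memory store. It is an illustration, not a reference implementation: `MemoryInstaller` and the `FetchPackage` hook are hypothetical, and resolving a failed update with status `'failed'` (rather than rejecting) is an assumption.

```ts
// Local copies of the types above so the sketch is self-contained.
type InstallStatus = 'installed' | 'updating' | 'failed'

interface InstallEntry {
  name: string
  version: string
  status: InstallStatus
  installedAt: number
  metadata?: Record<string, unknown>
}

interface Installer {
  install(name: string, version?: string): Promise<InstallEntry>
  uninstall(name: string): Promise<boolean>
  get(name: string): Promise<InstallEntry | null>
  list(): Promise<InstallEntry[]>
  update(name: string, version?: string): Promise<InstallEntry>
}

/** Hypothetical backend hook: fetches a package and resolves to the installed version. */
type FetchPackage = (name: string, version?: string) => Promise<string>

class MemoryInstaller implements Installer {
  private readonly packages = new Map<string, InstallEntry>()
  private readonly fetchPackage: FetchPackage

  constructor(fetchPackage: FetchPackage) {
    this.fetchPackage = fetchPackage
  }

  async install(name: string, version?: string): Promise<InstallEntry> {
    const resolved = await this.fetchPackage(name, version)
    return this.save({ name, version: resolved, status: 'installed', installedAt: Date.now() })
  }

  async update(name: string, version?: string): Promise<InstallEntry> {
    const current = this.packages.get(name)
    if (!current) throw new Error(`${name} is not installed`)
    this.save({ ...current, status: 'updating' }) // installed --> updating
    try {
      const resolved = await this.fetchPackage(name, version)
      return this.save({ ...current, version: resolved, status: 'installed', installedAt: Date.now() })
    } catch {
      // Assumption: a failed update resolves with 'failed' and keeps the old version.
      return this.save({ ...current, status: 'failed' })
    }
  }

  async uninstall(name: string): Promise<boolean> {
    return this.packages.delete(name)
  }

  async get(name: string): Promise<InstallEntry | null> {
    return this.packages.get(name) ?? null
  }

  async list(): Promise<InstallEntry[]> {
    return Array.from(this.packages.values())
  }

  private save(entry: InstallEntry): InstallEntry {
    this.packages.set(entry.name, entry)
    return entry
  }
}
```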

## Usage Examples

### Install a package

```ts
const entry = await installer.install('@osprotocol/skill-web-search')
console.log(entry.status) // 'installed'
console.log(entry.version) // e.g. '1.2.0'
```

### List installed packages

```ts
const packages = await installer.list()
for (const pkg of packages) {
  console.log(`${pkg.name}@${pkg.version} — ${pkg.status}`)
}
```

### Update a package

```ts
const updated = await installer.update('@osprotocol/skill-web-search', '2.0.0')
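if (updated.status === 'installed') {
  console.log(`Updated to ${updated.version}`)
} else {
  // status is 'updating' or 'failed': report it rather than assuming failure
  console.error(`Update not complete (status: ${updated.status})`)
}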
```

## Integration

* [Registry](/docs/system/registry) — discover available skills and tools before installing
* [MCP Client](/docs/system/mcp-client) — manages installed MCP servers exposed to agents


---

# MCP Client


<Callout type="warn">
  This interface is experimental — no production implementation exists yet.
  The API surface may change.
</Callout>

## Overview

`McpClient` is the kernel's connection manager for external MCP (Model Context Protocol) servers. It handles the infrastructure side of MCP integration: establishing connections, tracking server status, and exposing available tools to the rest of the system.

This is distinct from the agent-facing side. The agent-facing interface for discovering and invoking MCP tools lives in [MCP Servers](/docs/actions/mcp-servers). Think of `McpClient` as the device driver manager — it keeps connections alive so that agents can use them through higher-level APIs.

Provider examples: Claude Code MCP, Cursor MCP, VS Code Copilot MCP.

## Connection Lifecycle

<Mermaid
  chart="stateDiagram-v2
    [*] --> disconnected
    disconnected --> connected : connect()
    connected --> disconnected : disconnect()
    connected --> error : connection lost
    error --> connected : reconnect / connect()"
/>

## TypeScript API

```ts
import type {
  McpServerStatus,
  McpServerEntry,
  McpContext,
  McpActions,
  McpClient,
} from '@osprotocol/schema/system/mcp-client'
```

### McpServerStatus

```ts
type McpServerStatus = 'connected' | 'disconnected' | 'error'
```

Represents the current state of an MCP server connection.

| Value          | Description                                  |
| -------------- | -------------------------------------------- |
| `connected`    | Server is reachable and active               |
| `disconnected` | Server is not connected                      |
| `error`        | Connection attempt failed or was interrupted |

### McpServerEntry

```ts
interface McpServerEntry {
  name: string
  uri: string
  status: McpServerStatus
  tools?: string[]
  metadata?: Record<string, unknown>
}
```

A record describing a registered MCP server. `tools` lists the tool names exposed by the server. `metadata` holds any additional provider-specific information.

### McpContext

```ts
interface McpContext {
  get(name: string): Promise<McpServerEntry | null>
  list(): Promise<McpServerEntry[]>
}
```

Read-only access for the gather phase. Use `McpContext` to inspect the current state of registered servers without modifying connections.

### McpActions

```ts
interface McpActions {
  connect(name: string, uri: string): Promise<McpServerEntry>
  disconnect(name: string): Promise<boolean>
}
```

Write access for the act phase. Use `McpActions` to establish or tear down server connections.

### McpClient

```ts
interface McpClient {
  connect(name: string, uri: string): Promise<McpServerEntry>
  disconnect(name: string): Promise<boolean>
  get(name: string): Promise<McpServerEntry | null>
  list(): Promise<McpServerEntry[]>
}
```

The unified interface combining both context and actions. `McpClient` is the primary surface for code that needs both read and write access to MCP connections.
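
Code that holds the full `McpClient` can combine a read with a write, for example to connect only when needed. The helper below is a sketch under that assumption; `ensureConnected` is a hypothetical name and is not defined by the schema.

```ts
// Local copies of the types above so the sketch is self-contained.
type McpServerStatus = 'connected' | 'disconnected' | 'error'

interface McpServerEntry {
  name: string
  uri: string
  status: McpServerStatus
  tools?: string[]
  metadata?: Record<string, unknown>
}

interface McpClient {
  connect(name: string, uri: string): Promise<McpServerEntry>
  disconnect(name: string): Promise<boolean>
  get(name: string): Promise<McpServerEntry | null>
  list(): Promise<McpServerEntry[]>
}

/**
 * Hypothetical helper: returns a connected entry, calling connect() only
 * when the server is unknown or not currently connected.
 */
async function ensureConnected(
  client: McpClient,
  name: string,
  uri: string,
): Promise<McpServerEntry> {
  const existing = await client.get(name)
  if (existing?.status === 'connected') return existing

  // Covers both the 'disconnected' and 'error' states from the lifecycle diagram.
  const entry = await client.connect(name, uri)
  if (entry.status !== 'connected') {
    throw new Error(`Could not connect to ${name} (status: ${entry.status})`)
  }
  return entry
}
```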

## Usage Examples

### Connect to an MCP server

```ts
const entry = await mcpClient.connect(
  'claude-code',
  'mcp://localhost:3100'
)

console.log(entry.status)  // 'connected'
console.log(entry.tools)   // ['read_file', 'write_file', ...]
```

### List all connected servers and their tools

```ts
const servers = await mcpClient.list()

for (const server of servers) {
  if (server.status === 'connected') {
    console.log(`${server.name}: ${server.tools?.join(', ')}`)
  }
}
```

### Disconnect a server

```ts
const ok = await mcpClient.disconnect('claude-code')
console.log(ok)  // true
```

## System MCP Client vs Actions MCP Servers

|                | `system/mcp-client`                   | `actions/mcp-servers`             |
| -------------- | ------------------------------------- | --------------------------------- |
| Layer          | Infrastructure / kernel               | Agent-facing                      |
| Responsibility | Manage server connections             | Invoke tools on connected servers |
| Who uses it    | System internals, orchestrators       | Agents, skills                    |
| Phase          | Context (read) + Actions (write)      | Actions                           |
| Example        | `connect()`, `disconnect()`, `list()` | `callTool()`, `listTools()`       |

The system client keeps connections alive. The actions interface exposes those connections as callable tools for agents.

## Integration

* [MCP Servers (actions)](/docs/actions/mcp-servers) — agent-facing tool invocation over MCP connections managed here
* [Registry](/docs/system/registry) — discover available MCP servers before connecting
* [Installer](/docs/system/installer) — provision and install MCP server binaries or packages


---

# Preferences


<Callout type="warn">
  This interface is experimental — no production implementation exists yet. The
  API surface may change.
</Callout>

## Overview

The `Preferences` interface provides kernel-level storage for per-agent and per-user configuration that customizes agent behavior without affecting the global system. Unlike system-wide settings, preferences are scoped: each entry belongs to an `agent`, `user`, or `system` scope. Scopes resolve in cascade order, so agent overrides user and user overrides system. Familiar provider analogs include VS Code Settings, Claude Code Scoped Config, and GitHub User Preferences.

## Scope Resolution

When an agent requests a preference value, the kernel checks scopes from most specific to least specific. The first match wins.

<Mermaid
  chart="flowchart TD
    A([preference.get key]) --> B{agent scope?}
    B -- found --> R([return entry])
    B -- not found --> C{user scope?}
    C -- found --> R
    C -- not found --> D{system scope?}
    D -- found --> R
    D -- not found --> N([return null])"
/>

## TypeScript API

```ts
import type {
  Preferences,
  PreferenceEntry,
  PreferenceScope,
  PreferencesContext,
  PreferencesActions,
} from '@osprotocol/schema/system/preferences'
```

### PreferenceScope

The three available scopes for a preference entry.

```ts
type PreferenceScope = 'agent' | 'user' | 'system'
```

### PreferenceEntry

A single preference value with its scope and optional metadata.

```ts
interface PreferenceEntry<T = unknown> {
  key: string
  value: T
  scope: PreferenceScope
  metadata?: Record<string, unknown>
}
```

### PreferencesContext

Read-only access for the context phase of the agent loop.

```ts
interface PreferencesContext {
  get<T = unknown>(key: string, scope: PreferenceScope): Promise<PreferenceEntry<T> | null>
  list(scope: PreferenceScope): Promise<PreferenceEntry[]>
}
```

### PreferencesActions

Write operations for the actions phase of the agent loop.

```ts
interface PreferencesActions {
  set<T = unknown>(key: string, value: T, scope: PreferenceScope): Promise<PreferenceEntry<T>>
  remove(key: string, scope: PreferenceScope): Promise<boolean>
}
```

### Preferences

Full preferences management interface combining context and actions.

```ts
interface Preferences {
  get<T = unknown>(key: string, scope: PreferenceScope): Promise<PreferenceEntry<T> | null>
  set<T = unknown>(key: string, value: T, scope: PreferenceScope): Promise<PreferenceEntry<T>>
  remove(key: string, scope: PreferenceScope): Promise<boolean>
  list(scope: PreferenceScope): Promise<PreferenceEntry[]>
}
```

## Usage Examples

### Read an agent-scoped preference

Agent-scoped preferences take priority over user or system values with the same key.

```ts
const entry = await preferences.get<string>('output.format', 'agent')

if (entry) {
  console.log(entry.key)   // 'output.format'
  console.log(entry.value) // 'json'
  console.log(entry.scope) // 'agent'
}
```

### Set a user-scoped preference

User-scoped preferences apply across agents that have not overridden the key at the agent scope.

```ts
await preferences.set('output.format', 'markdown', 'user')
```

### Implement the cascade manually

Because `get` takes an explicit scope, resolving a value across all scopes means querying each scope in priority order:

```ts
async function resolve<T>(
  preferences: Preferences,
  key: string,
): Promise<T | null> {
  const scopes: PreferenceScope[] = ['agent', 'user', 'system']

  for (const scope of scopes) {
    const entry = await preferences.get<T>(key, scope)
    if (entry !== null) return entry.value
  }

  return null
}

const format = await resolve<string>(preferences, 'output.format')
```

## Integration

* **[Settings](/docs/system/settings)**: System-wide configuration that acts as the baseline before preference scopes are applied
* **[Environment](/docs/system/env)**: Platform-level variables for credentials and deployment targets; preferences handle behavioral customization above that layer
* **[Registry](/docs/system/registry)**: Agent registration metadata may include default preference values that seed the agent scope at registration time


---

# Registry


<Callout type="warn">
  This interface is experimental — no production implementation exists yet.
  The API surface may change.
</Callout>

## Overview

Registry is the kernel's service directory: it registers and discovers any type of resource in the system. The interface is generic (`Registry<T>`) so it works uniformly for agents, skills, MCP servers, or any other resource type. The `RegistryContext`/`RegistryActions` split operates on **named registries** (e.g., `"agents"`, `"skills"`), allowing a single implementation to manage multiple resource namespaces. This pattern is analogous to Google A2A Agent Cards, Microsoft AutoGen Registry, AGENTS.md/AAIF Agent Discovery, and the npm Registry.

## TypeScript API

```ts
import type {
  RegistryEntry,
  RegistryContext,
  RegistryActions,
  Registry,
} from '@osprotocol/schema/system/registry'
```

### RegistryEntry

A single record stored in the registry.

```ts
interface RegistryEntry<T = unknown> {
  name: string
  description: string
  resource: T
  metadata?: Record<string, unknown>
}
```

| Field         | Type                      | Description                                              |
| ------------- | ------------------------- | -------------------------------------------------------- |
| `name`        | `string`                  | Unique identifier within its registry                    |
| `description` | `string`                  | Human-readable summary                                   |
| `resource`    | `T`                       | The actual resource payload (agent, skill, server, etc.) |
| `metadata`    | `Record<string, unknown>` | Optional arbitrary metadata                              |

### RegistryContext

Read-only access to named registries. Used during the **gather phase** to inspect what is available.

```ts
interface RegistryContext {
  get<T = unknown>(registry: string, name: string): Promise<RegistryEntry<T> | null>
  list<T = unknown>(registry: string): Promise<RegistryEntry<T>[]>
}
```

| Method                | Description                                        |
| --------------------- | -------------------------------------------------- |
| `get(registry, name)` | Fetch a single entry by name from a named registry |
| `list(registry)`      | List all entries in a named registry               |

### RegistryActions

Write access to named registries. Used during the **act phase** to mutate what is registered.

```ts
interface RegistryActions {
  register<T = unknown>(registry: string, entry: RegistryEntry<T>): Promise<void>
  unregister(registry: string, name: string): Promise<boolean>
}
```

| Method                       | Description                                   |
| ---------------------------- | --------------------------------------------- |
| `register(registry, entry)`  | Add or update an entry in a named registry    |
| `unregister(registry, name)` | Remove an entry; returns `true` if it existed |

### Registry

The full provider interface for a single typed registry. Combines all operations and adds `find` for criteria-based lookup — this method is only available at the provider level, not on `RegistryContext` or `RegistryActions`.

```ts
interface Registry<T = unknown> {
  register(entry: RegistryEntry<T>): Promise<void>
  unregister(name: string): Promise<boolean>
  get(name: string): Promise<RegistryEntry<T> | null>
  list(): Promise<RegistryEntry<T>[]>
  find(criteria: Partial<T>): Promise<RegistryEntry<T>[]>
}
```

| Method             | Description                                               |
| ------------------ | --------------------------------------------------------- |
| `register(entry)`  | Add or update an entry                                    |
| `unregister(name)` | Remove an entry; returns `true` if it existed             |
| `get(name)`        | Fetch a single entry by name                              |
| `list()`           | List all entries                                          |
| `find(criteria)`   | Search entries by matching fields on the resource payload |
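
The schema does not spell out how `find` compares fields, so the sketch below picks one plausible reading: shallow strict equality on every field present in `criteria`. `MemoryRegistry` is a hypothetical in-memory provider used only to illustrate the shape of the interface.

```ts
// Local copies of the types above so the sketch is self-contained.
interface RegistryEntry<T = unknown> {
  name: string
  description: string
  resource: T
  metadata?: Record<string, unknown>
}

interface Registry<T = unknown> {
  register(entry: RegistryEntry<T>): Promise<void>
  unregister(name: string): Promise<boolean>
  get(name: string): Promise<RegistryEntry<T> | null>
  list(): Promise<RegistryEntry<T>[]>
  find(criteria: Partial<T>): Promise<RegistryEntry<T>[]>
}

/** Hypothetical in-memory provider; not part of the schema. */
class MemoryRegistry<T extends object> implements Registry<T> {
  private readonly entries = new Map<string, RegistryEntry<T>>()

  async register(entry: RegistryEntry<T>): Promise<void> {
    this.entries.set(entry.name, entry) // add or update by name
  }

  async unregister(name: string): Promise<boolean> {
    return this.entries.delete(name)
  }

  async get(name: string): Promise<RegistryEntry<T> | null> {
    return this.entries.get(name) ?? null
  }

  async list(): Promise<RegistryEntry<T>[]> {
    return Array.from(this.entries.values())
  }

  async find(criteria: Partial<T>): Promise<RegistryEntry<T>[]> {
    // Assumption: shallow strict equality on each field given in `criteria`.
    const keys = Object.keys(criteria) as (keyof T)[]
    return Array.from(this.entries.values()).filter((entry) =>
      keys.every((key) => entry.resource[key] === criteria[key]),
    )
  }
}
```

With this reading, an empty `criteria` object matches every entry, and nested objects or arrays in the resource only match by reference.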

## Named Registries

`RegistryContext` and `RegistryActions` accept a `registry` string as the first parameter. This means one implementation can manage multiple independent namespaces:

```ts
// Same context, different namespaces
const agent = await ctx.get('agents', 'summarizer-v2')
const skill = await ctx.get('skills', 'web-search')
const server = await ctx.get('mcp-servers', 'filesystem')
```

The full `Registry<T>` interface, by contrast, is typed per resource type and manages a single namespace — you would instantiate one `Registry<AgentDefinition>` for agents and a separate `Registry<SkillDefinition>` for skills.

## Usage Examples

### Register an agent

```ts
await actions.register('agents', {
  name: 'summarizer-v2',
  description: 'Summarizes long documents into structured output.',
  resource: agentDefinition,
  metadata: { version: '2.0.0', owner: 'platform-team' },
})
```

### Discover resources by criteria

Using a typed `Registry<AgentDefinition>`, find all agents that match a partial resource shape:

```ts
const agentRegistry: Registry<AgentDefinition> = getAgentRegistry()

const matches = await agentRegistry.find({
  domain: 'summarization',
  supportsStreaming: true,
})
```

### Use named registries to cross-reference resources

```ts
async function resolveSkillsForAgent(
  ctx: RegistryContext,
  agentName: string,
): Promise<RegistryEntry[]> {
  const agentEntry = await ctx.get('agents', agentName)
  if (!agentEntry) return []

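  // Assumes the agent entry lists its required skill names under `metadata.skills`
  const requiredSkills = (agentEntry.metadata?.skills as string[] | undefined) ?? []
  const results = await Promise.all(
    requiredSkills.map((skillName) => ctx.get('skills', skillName)),
  )
  // Drop skills that are not registered
  return results.filter((entry): entry is RegistryEntry => entry !== null)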
}
```

## Integration

* [Installer](/docs/system/installer) — installs capabilities that are then registered, making them available for discovery
* [MCP Client](/docs/system/mcp-client) — discovers MCP servers through the registry before establishing connections
* [Env](/docs/system/env) — environment-aware discovery uses registry lookups scoped to the current runtime context


---

# Sandbox


<Callout type="warn">
  This interface is experimental — no production implementation exists yet.
  The API surface may change.
</Callout>

## Overview

The sandbox interface manages isolated execution environments for agent workloads. Agents that write and run code need isolation — a contained filesystem, command execution, optional network access, and automatic cleanup. The protocol defines the convergent surface across providers like Vercel Sandbox, E2B, Cloudflare Workers, and Docker.

<Mermaid
  chart="stateDiagram-v2
    [*] --> pending: create()
    pending --> running: sandbox ready
    running --> stopping: stop()
    running --> failed: error
    stopping --> stopped: cleanup done
    stopped --> [*]
    failed --> [*]"
/>

## TypeScript API

```ts
import type {
  Sandbox,
  SandboxEntry,
  SandboxConfig,
  SandboxStatus,
  SandboxContext,
  SandboxActions,
  CommandResult,
  SandboxFile,
} from '@osprotocol/schema/system/sandbox'
```

### SandboxStatus

Lifecycle states for a sandbox.

```ts
type SandboxStatus = 'pending' | 'running' | 'stopping' | 'stopped' | 'failed'
```
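
The transitions in the state diagram above can be written down as a lookup table, which is one way a provider or client might guard against invalid status changes. This is a sketch only: `TRANSITIONS`, `canTransition`, and `isTerminal` are hypothetical helpers, not part of the schema.

```ts
type SandboxStatus = 'pending' | 'running' | 'stopping' | 'stopped' | 'failed'

// Allowed transitions, transcribed from the state diagram in the Overview.
const TRANSITIONS: Record<SandboxStatus, readonly SandboxStatus[]> = {
  pending: ['running'],
  running: ['stopping', 'failed'],
  stopping: ['stopped'],
  stopped: [],
  failed: [],
}

// True when the diagram contains an edge from `from` to `to`.
function canTransition(from: SandboxStatus, to: SandboxStatus): boolean {
  return TRANSITIONS[from].includes(to)
}

// Terminal states have no outgoing edges (`stopped` and `failed`).
function isTerminal(status: SandboxStatus): boolean {
  return TRANSITIONS[status].length === 0
}
```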

### SandboxEntry

Summary of a sandbox instance.

```ts
interface SandboxEntry {
  /** Unique sandbox identifier */
  id: string
  /** Current lifecycle status */
  status: SandboxStatus
  /** When the sandbox was created (Unix ms) */
  createdAt: number
  /** Remaining time before auto-stop (ms) */
  timeout?: number
  /** Extensible metadata for provider-specific data */
  metadata?: Record<string, unknown>
}
```

### SandboxConfig

Configuration for creating a new sandbox.

```ts
interface SandboxConfig {
  /** Runtime or template identifier (e.g., "node24", "python3.13") */
  runtime?: string
  /** Initial timeout in milliseconds before auto-stop */
  timeout?: number
  /** Environment variables to inject */
  env?: Record<string, string>
  /** Ports to expose for external access */
  ports?: number[]
  /** Extensible metadata for provider-specific data */
  metadata?: Record<string, unknown>
}
```

### CommandResult

Result of executing a command inside a sandbox.

```ts
interface CommandResult {
  /** Process exit code (0 = success) */
  exitCode: number
  /** Standard output */
  stdout: string
  /** Standard error */
  stderr: string
  /** Extensible metadata for provider-specific data */
  metadata?: Record<string, unknown>
}
```
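
Because `exec` reports failure through `exitCode` rather than by throwing, callers may want a small wrapper that turns a non-zero exit into an error. The sketch below assumes that convention; `CommandFailedError` and `expectSuccess` are hypothetical names, not part of the schema.

```ts
// Local copy of the interface above.
interface CommandResult {
  exitCode: number
  stdout: string
  stderr: string
  metadata?: Record<string, unknown>
}

// Hypothetical error type that keeps the full result for later inspection.
class CommandFailedError extends Error {
  readonly result: CommandResult

  constructor(command: string, result: CommandResult) {
    super(`"${command}" exited with code ${result.exitCode}: ${result.stderr.trim()}`)
    this.name = 'CommandFailedError'
    this.result = result
  }
}

// Returns stdout on success and throws on any non-zero exit code.
function expectSuccess(command: string, result: CommandResult): string {
  if (result.exitCode !== 0) {
    throw new CommandFailedError(command, result)
  }
  return result.stdout
}
```

A call site could then read `expectSuccess('npx tsx index.ts', await sandbox.exec(id, 'npx', ['tsx', 'index.ts']))`.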

### SandboxFile

A file within the sandbox filesystem.

```ts
interface SandboxFile {
  /** File path within the sandbox */
  path: string
  /** File contents */
  content: string
}
```

### Sandbox

Full sandbox management interface.

```ts
interface Sandbox {
  create(config?: SandboxConfig): Promise<SandboxEntry>
  get(id: string): Promise<SandboxEntry | null>
  list(): Promise<SandboxEntry[]>
  stop(id: string): Promise<boolean>
  exec(id: string, command: string, args?: string[]): Promise<CommandResult>
  readFile(id: string, path: string): Promise<string | null>
  writeFiles(id: string, files: SandboxFile[]): Promise<void>
  getUrl(id: string, port: number): Promise<string | null>
  extendTimeout(id: string, duration: number): Promise<void>
}
```

### SandboxContext

Read-only view for the context phase of the agent loop.

```ts
interface SandboxContext {
  get(id: string): Promise<SandboxEntry | null>
  list(): Promise<SandboxEntry[]>
}
```

### SandboxActions

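Operations for the actions phase of the agent loop. Alongside the mutating calls (`create`, `stop`, `exec`, `writeFiles`, `extendTimeout`), it also exposes `readFile` and `getUrl`, which read from a specific sandbox.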

```ts
interface SandboxActions {
  create(config?: SandboxConfig): Promise<SandboxEntry>
  stop(id: string): Promise<boolean>
  exec(id: string, command: string, args?: string[]): Promise<CommandResult>
  writeFiles(id: string, files: SandboxFile[]): Promise<void>
  readFile(id: string, path: string): Promise<string | null>
  getUrl(id: string, port: number): Promise<string | null>
  extendTimeout(id: string, duration: number): Promise<void>
}
```

## Usage Examples

### Create a sandbox and run code

```ts
const entry = await sandbox.create({
  runtime: 'node24',
  timeout: 120000,
  env: { NODE_ENV: 'production' },
})

await sandbox.writeFiles(entry.id, [
  { path: 'index.ts', content: 'console.log("hello from sandbox")' },
])

const result = await sandbox.exec(entry.id, 'npx', ['tsx', 'index.ts'])
// result.stdout → 'hello from sandbox'
// result.exitCode → 0
```

### Expose a web server

```ts
const entry = await sandbox.create({
  runtime: 'node24',
  ports: [3000],
})

await sandbox.writeFiles(entry.id, [
  { path: 'server.js', content: 'require("http").createServer((_, res) => res.end("ok")).listen(3000)' },
])

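// Start the server in the background: a foreground `node server.js` never exits,
// so awaiting its CommandResult would block. Assumes the runtime provides `sh`.
await sandbox.exec(entry.id, 'sh', ['-c', 'node server.js > server.log 2>&1 &'])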
const url = await sandbox.getUrl(entry.id, 3000)
// url → 'https://sandbox-abc123.provider.dev'
```

### Extend timeout for long-running tasks

```ts
const entry = await sandbox.create({ timeout: 60000 })

// Task is taking longer than expected
await sandbox.extendTimeout(entry.id, 60000) // +60s

// Clean up when done
await sandbox.stop(entry.id)
```
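
The "clean up when done" step is easy to miss when a task throws. One possible pattern is a wrapper that always calls `stop` in a `finally` block. This is a sketch under assumptions: `withSandbox` and `SandboxLifecycle` are hypothetical, and the local types are trimmed copies of the interfaces above.

```ts
// Trimmed local copies of the types above; only the fields used here.
interface SandboxEntry {
  id: string
}

interface SandboxConfig {
  runtime?: string
  timeout?: number
}

// The two lifecycle methods this helper needs from `Sandbox` or `SandboxActions`.
interface SandboxLifecycle {
  create(config?: SandboxConfig): Promise<SandboxEntry>
  stop(id: string): Promise<boolean>
}

// Creates a sandbox, runs `task` against it, and stops the sandbox afterwards
// whether the task resolved or threw.
async function withSandbox<R>(
  sandbox: SandboxLifecycle,
  config: SandboxConfig,
  task: (entry: SandboxEntry) => Promise<R>,
): Promise<R> {
  const entry = await sandbox.create(config)
  try {
    return await task(entry)
  } finally {
    await sandbox.stop(entry.id)
  }
}
```

The provider-side `timeout` still acts as a backstop if the calling process dies before `finally` runs.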

## Sandbox vs Filesystem

|               | Sandbox (`system/sandbox`)                      | Filesystem (`system/fs`)        |
| ------------- | ----------------------------------------------- | ------------------------------- |
| **Scope**     | Isolated environment with own filesystem        | Host/platform filesystem        |
| **Execution** | Can run commands (`exec`)                       | No execution capability         |
| **Lifecycle** | Created, used, destroyed                        | Always available                |
| **Network**   | Optional port exposure                          | N/A                             |
| **Use case**  | Run untrusted code, test builds, serve previews | Read configs, persist artifacts |

## Integration

Sandbox integrates with:

* **[Filesystem](/docs/system/fs)**: Host fs for persistent artifacts, sandbox fs for ephemeral execution
* **[Environment](/docs/system/env)**: Sandbox inherits or overrides environment variables
* **[Timeout](/docs/runs/timeout)**: Sandbox timeout is independent of run timeout — both can apply
* **[Screenshot](/docs/checks/screenshot)**: Visual verification of sandbox-served web previews


---

# Settings


<Callout type="warn">
  This interface is experimental — no production implementation exists yet.
  The API surface may change.
</Callout>

## Overview

Settings is the kernel interface for managing system-wide configuration that applies to all agents and workflows running on the platform. Unlike [Preferences](/docs/system/preferences), which are scoped to individual agents or users, Settings entries are global: changing a setting affects every agent and workflow in the system. The interface is analogous to Vercel Project Settings, Cloudflare Zone Settings, or AWS SSM Parameter Store.

## TypeScript API

```ts
import type {
  SettingsEntry,
  SettingsContext,
  SettingsActions,
  Settings,
} from '@osprotocol/schema/system/settings'
```

### SettingsEntry

A single configuration entry stored in the system.

```ts
interface SettingsEntry<T = unknown> {
  key: string
  value: T
  description?: string
  readOnly?: boolean
  metadata?: Record<string, unknown>
}
```

| Field         | Type                       | Description                                            |
| ------------- | -------------------------- | ------------------------------------------------------ |
| `key`         | `string`                   | Unique identifier for the setting                      |
| `value`       | `T`                        | The stored value (generic, defaults to `unknown`)      |
| `description` | `string?`                  | Human-readable explanation of the setting              |
| `readOnly`    | `boolean?`                 | When `true`, the entry cannot be modified at runtime   |
| `metadata`    | `Record<string, unknown>?` | Arbitrary additional data (e.g. source, last modified) |

### SettingsContext

Read-only interface used during the gather phase. Lets agents inspect system configuration without mutation rights.

```ts
interface SettingsContext {
  get<T = unknown>(key: string): Promise<SettingsEntry<T> | null>
  list(): Promise<SettingsEntry[]>
}
```

### SettingsActions

Write interface used during the act phase. Restricts callers to mutation operations only.

```ts
interface SettingsActions {
  set<T = unknown>(key: string, value: T): Promise<SettingsEntry<T>>
  remove(key: string): Promise<boolean>
}
```

### Settings

Full interface combining read and write operations. Intended for privileged system-level callers.

```ts
interface Settings {
  get<T = unknown>(key: string): Promise<SettingsEntry<T> | null>
  set<T = unknown>(key: string, value: T): Promise<SettingsEntry<T>>
  remove(key: string): Promise<boolean>
  list(): Promise<SettingsEntry[]>
}
```
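
As an illustration of how the `readOnly` flag might be enforced, here is a minimal in-memory sketch of the `Settings` interface. `InMemorySettings` is a hypothetical class, and rejecting `remove` on read-only entries is an assumption; the interface itself does not say how providers should react.

```ts
// Local copies of the interfaces above.
interface SettingsEntry<T = unknown> {
  key: string
  value: T
  description?: string
  readOnly?: boolean
  metadata?: Record<string, unknown>
}

interface Settings {
  get<T = unknown>(key: string): Promise<SettingsEntry<T> | null>
  set<T = unknown>(key: string, value: T): Promise<SettingsEntry<T>>
  remove(key: string): Promise<boolean>
  list(): Promise<SettingsEntry[]>
}

class InMemorySettings implements Settings {
  private entries = new Map<string, SettingsEntry>()

  constructor(seed: SettingsEntry[] = []) {
    for (const entry of seed) this.entries.set(entry.key, entry)
  }

  async get<T = unknown>(key: string): Promise<SettingsEntry<T> | null> {
    // The stored value is trusted to match the caller's T; nothing validates it.
    return (this.entries.get(key) as SettingsEntry<T> | undefined) ?? null
  }

  async set<T = unknown>(key: string, value: T): Promise<SettingsEntry<T>> {
    const existing = this.entries.get(key)
    if (existing?.readOnly) {
      throw new Error(`Setting "${key}" is read-only and cannot be modified.`)
    }
    // Keep description and metadata from the previous entry, replace the value.
    const entry: SettingsEntry<T> = { ...existing, key, value }
    this.entries.set(key, entry)
    return entry
  }

  async remove(key: string): Promise<boolean> {
    // Assumption: removal counts as a modification, so read-only entries reject it.
    if (this.entries.get(key)?.readOnly) {
      throw new Error(`Setting "${key}" is read-only and cannot be removed.`)
    }
    return this.entries.delete(key)
  }

  async list(): Promise<SettingsEntry[]> {
    return Array.from(this.entries.values())
  }
}
```

Enforcing the flag inside the provider avoids relying on every caller to check `readOnly` before writing.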

## Usage Examples

### Reading a setting

```ts
import type { Settings } from '@osprotocol/schema/system/settings'

async function getMaxConcurrency(settings: Settings) {
  const entry = await settings.get<number>('workflow.max_concurrency')

  if (entry === null) {
    return 4 // default
  }

  return entry.value
}
```

### Handling read-only settings

Some entries are marked `readOnly` and must not be modified at runtime. Check the flag before attempting a write.

```ts
async function updateSetting(settings: Settings, key: string, value: unknown) {
  const existing = await settings.get(key)

  if (existing?.readOnly) {
    throw new Error(`Setting "${key}" is read-only and cannot be modified.`)
  }

  return settings.set(key, value)
}
```

### Typed settings with metadata

```ts
import type { Settings, SettingsEntry } from '@osprotocol/schema/system/settings'

interface RateLimitConfig {
  requestsPerMinute: number
  burstAllowance: number
}

async function configureRateLimit(settings: Settings): Promise<SettingsEntry<RateLimitConfig>> {
  return settings.set<RateLimitConfig>('agent.rate_limit', {
    requestsPerMinute: 60,
    burstAllowance: 10,
  })
}
```

## Settings vs Preferences

|                    | Settings                                       | Preferences                      |
| ------------------ | ---------------------------------------------- | -------------------------------- |
| **Scope**          | System-wide                                    | Per-agent or per-user            |
| **Applies to**     | All agents and workflows                       | A specific agent or user         |
| **Who manages it** | Platform operators                             | Individual agents or users       |
| **Typical values** | Concurrency limits, feature flags, rate limits | Language, persona, output format |
| **Change impact**  | Platform-wide                                  | Isolated to the scope            |

## Integration

* [Preferences](/docs/system/preferences) — per-agent or per-user configuration, scoped rather than global
* [Environment](/docs/system/env) — platform-level variables; settings govern higher-level behavioral configuration


---

# Evaluator-Optimizer


<Callout type="warn">
  This workflow pattern is part of the OS Protocol specification.
  The interfaces below describe the expected contract; implementations
  must honor the scoring range (0.0–1.0) and respect maxIterations
  to avoid infinite refinement loops.
</Callout>

## Overview

The Evaluator-Optimizer workflow runs an iterative generate-evaluate-optimize loop, continuing to refine output until a quality threshold is met or the maximum number of iterations is exhausted. Each cycle produces structured feedback that the optimizer uses to improve the next generation attempt. This pattern is best suited to quality-critical tasks where initial outputs are unlikely to meet standards without targeted refinement.

## Pattern

<Mermaid
  chart="flowchart TD
  A([Input]) --> B[generate]
  B --> C[Output]
  C --> D[evaluate]
  D --> E{passed?}
  E -- yes --> F([Done])
  E -- no --> G[optimize]
  G --> D"
/>

## TypeScript API

```ts
import type {
  Evaluation,
  CriterionResult,
  EvaluationCriterion,
  EvaluatorOptimizerWorkflow,
  EvaluatorOptimizerConfig,
} from "@osprotocol/schema/workflows/evaluator-optimizer"
```

### Evaluation

Result returned by `evaluate()`. The `score` is a float between 0.0 and 1.0. `passed` indicates whether the score meets the configured threshold. `feedback` is a human-readable explanation. `criteria` provides a per-criterion breakdown when multiple evaluation dimensions are configured.

```ts
interface Evaluation {
  score: number
  passed: boolean
  feedback: string
  criteria?: CriterionResult[]
}
```

### CriterionResult

Per-criterion scoring entry inside an `Evaluation`. Mirrors `EvaluationCriterion` but carries the measured `score` and `passed` result for a single generation attempt.

```ts
interface CriterionResult {
  name: string
  score: number
  passed: boolean
  feedback?: string
}
```

### EvaluationCriterion

Declares a single quality dimension used during evaluation. `threshold` sets the minimum acceptable score (0.0–1.0) for this criterion. `weight` controls its relative contribution when computing the aggregate score; weights across all criteria should sum to 1.0.

```ts
interface EvaluationCriterion {
  name: string
  description: string
  threshold: number
  weight?: number
}
```
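
The weighted aggregate described above could be computed roughly as follows. This is a sketch, not the specified algorithm: `aggregateScore` and `toCriterionResults` are hypothetical helpers, and treating a missing `weight` as `1` and then normalizing by the total weight is an assumption (when the weights already sum to 1.0, the normalization changes nothing).

```ts
// Local copies of the interfaces above.
interface EvaluationCriterion {
  name: string
  description: string
  threshold: number
  weight?: number
}

interface CriterionResult {
  name: string
  score: number
  passed: boolean
  feedback?: string
}

// Weighted mean of per-criterion scores, keyed by criterion name.
// Criteria with no recorded score count as 0.
function aggregateScore(
  criteria: EvaluationCriterion[],
  scores: Record<string, number>,
): number {
  let weighted = 0
  let totalWeight = 0
  for (const criterion of criteria) {
    const weight = criterion.weight ?? 1 // assumption: unweighted criteria count equally
    weighted += weight * (scores[criterion.name] ?? 0)
    totalWeight += weight
  }
  return totalWeight === 0 ? 0 : weighted / totalWeight
}

// Builds the per-criterion breakdown by comparing each score with its threshold.
function toCriterionResults(
  criteria: EvaluationCriterion[],
  scores: Record<string, number>,
): CriterionResult[] {
  return criteria.map((criterion) => {
    const score = scores[criterion.name] ?? 0
    return { name: criterion.name, score, passed: score >= criterion.threshold }
  })
}
```

With weights 0.5, 0.3, and 0.2 and scores 0.9, 0.8, and 0.7, the aggregate works out to 0.45 + 0.24 + 0.14 = 0.83 (up to floating-point rounding).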

### EvaluatorOptimizerWorkflow

Extends the base `Workflow<Output>` interface with the three methods that implement the loop. `generate` produces an initial output from the prompt. `evaluate` scores that output and returns structured feedback. `optimize` uses the output and evaluation to produce an improved version.

```ts
interface EvaluatorOptimizerWorkflow<Output> extends Workflow<Output> {
  generate(prompt: string): Promise<Output>
  evaluate(output: Output, prompt: string): Promise<Evaluation>
  optimize(output: Output, evaluation: Evaluation, prompt: string): Promise<Output>
}
```

### EvaluatorOptimizerConfig

Configuration passed when constructing the workflow. `threshold` sets the global pass score (default 0.8). `maxIterations` caps the refinement loop. `criteria` declares the evaluation dimensions. `generatorModel` and `evaluatorModel` allow using different models for generation and evaluation—useful when a smaller, faster model generates and a larger, more critical model evaluates.

```ts
interface EvaluatorOptimizerConfig {
  threshold?: number
  maxIterations?: number
  criteria?: EvaluationCriterion[]
  generatorModel?: string
  evaluatorModel?: string
}
```
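
A driver for the generate-evaluate-optimize loop might look like the sketch below. It is one interpretation, not the reference implementation: `refine`, `LoopSteps`, and `LoopResult` are hypothetical names, the default of 3 iterations is an assumption (the source does not state a default), and `maxIterations` is taken to count `optimize` passes rather than the initial generation.

```ts
// Local copy of the Evaluation interface above, without the optional breakdown.
interface Evaluation {
  score: number
  passed: boolean
  feedback: string
}

// The three loop methods of EvaluatorOptimizerWorkflow, without `run`.
interface LoopSteps<Output> {
  generate(prompt: string): Promise<Output>
  evaluate(output: Output, prompt: string): Promise<Evaluation>
  optimize(output: Output, evaluation: Evaluation, prompt: string): Promise<Output>
}

interface LoopResult<Output> {
  output: Output
  evaluation: Evaluation
  iterations: number // number of optimize passes that ran
}

async function refine<Output>(
  steps: LoopSteps<Output>,
  prompt: string,
  maxIterations = 3, // assumed default
): Promise<LoopResult<Output>> {
  let output = await steps.generate(prompt)
  let evaluation = await steps.evaluate(output, prompt)
  let iterations = 0

  // Stop as soon as the evaluation passes or the iteration budget is spent.
  while (!evaluation.passed && iterations < maxIterations) {
    output = await steps.optimize(output, evaluation, prompt)
    evaluation = await steps.evaluate(output, prompt)
    iterations += 1
  }

  return { output, evaluation, iterations }
}
```

Returning the last evaluation even when it did not pass lets the caller decide whether a best-effort output is acceptable.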

## Usage Examples

### Basic generation loop

Runs the loop until the output passes the global threshold or `maxIterations` is reached.

```ts
const result = await workflow.run("Write a concise executive summary for Q4 results", {
  config: {
    threshold: 0.85,
    maxIterations: 4,
  },
})
```

### Multi-criteria evaluation

Defines separate quality dimensions with individual thresholds and weights. The aggregate score is a weighted sum of criterion scores.

```ts
const result = await workflow.run("Draft a technical proposal for the new caching layer", {
  config: {
    threshold: 0.80,
    maxIterations: 5,
    criteria: [
      {
        name: "technical_accuracy",
        description: "Claims are technically correct and current",
        threshold: 0.90,
        weight: 0.5,
      },
      {
        name: "clarity",
        description: "Language is clear and free of ambiguity",
        threshold: 0.75,
        weight: 0.3,
      },
      {
        name: "completeness",
        description: "All required sections are present and addressed",
        threshold: 0.70,
        weight: 0.2,
      },
    ],
  },
})
```

### Different models for generation and evaluation

Uses a fast model to generate and a more capable model to evaluate, balancing cost against quality.

```ts
const result = await workflow.run("Translate the following legal clause to plain English", {
  config: {
    threshold: 0.90,
    maxIterations: 3,
    generatorModel: "claude-haiku-4-5",
    evaluatorModel: "claude-opus-4-6",
  },
})
```

## Integration

* [Routing](/docs/workflows/routing) — route inputs to the appropriate generator before entering the loop
* [Orchestrator-Workers](/docs/workflows/orchestrator-workers) — use evaluator-optimizer as a worker in a larger orchestration
* [Parallelization](/docs/workflows/parallelization) — run multiple generation candidates in parallel and evaluate each
* [Judge](/docs/checks/judge) — reuse judge checks as evaluation criteria inside this workflow
* [Runs](/docs/runs) — control timeout, retry, and cancellation for the refinement loop


---

# Workflows


## Overview

Workflows define execution patterns for agent tasks. All workflow types extend the base `Workflow` interface. The OS Protocol implements patterns based on [Anthropic's building blocks for agentic systems](https://www.anthropic.com/engineering/building-effective-agents).

## Available Patterns

| Pattern                                                      | Description                       | When to Use                       |
| ------------------------------------------------------------ | --------------------------------- | --------------------------------- |
| [Routing](/docs/workflows/routing)                           | Classify and delegate             | Single specialized handler needed |
| [Orchestrator-Workers](/docs/workflows/orchestrator-workers) | Plan, delegate, synthesize        | Multiple specialized capabilities |
| [Parallelization](/docs/workflows/parallelization)           | Split, execute in parallel, merge | Independent subtasks              |
| [Evaluator-Optimizer](/docs/workflows/evaluator-optimizer)   | Generate, evaluate, refine        | Quality is critical               |

## TypeScript API

```ts
import type { Workflow, InferWorkflowOutput } from '@osprotocol/schema/workflows'
```

### Workflow

Base interface that all workflow patterns implement.

```ts
interface Workflow<Output> {
  /**
   * Execute the workflow with the given prompt
   *
   * @param prompt - The input prompt/task to process
   * @param options - Optional run configuration (timeout, retry, cancel, etc.)
   * @returns Promise resolving to the workflow output
   */
  run(prompt: string, options?: RunOptions<Output>): Promise<Output>
}
```

### InferWorkflowOutput

Utility type to extract the output type from a workflow.

```ts
type InferWorkflowOutput<T> = T extends Workflow<infer Output>
  ? Output
  : never
```

## Usage Example

```ts
import type { Workflow, RunOptions } from '@osprotocol/schema/workflows'

// Define a custom workflow
const myWorkflow: Workflow<string> = {
  async run(prompt, options) {
    // Execute workflow logic
    return `Processed: ${prompt}`
  }
}

// Run with options
const result = await myWorkflow.run('Hello', {
  timeout: { ms: 30000 },
  retry: { attempts: 3, delayMs: 1000 }
})
```

## Composing Workflows

Workflows can be composed to create complex execution patterns:

```ts
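import type { Workflow } from '@osprotocol/schema/workflows'
import type { RoutingWorkflow } from '@osprotocol/schema/workflows/routing'

// Specialized workflows keyed by route name (defined elsewhere)
declare const workflows: Record<string, Workflow<string>>

// Routing delegates to specialized workflows
const router: RoutingWorkflow<string> = {
  async classify(prompt) {
    // Determine which workflow to use
    return prompt.includes('code') ? 'code-assistant' : 'general'
  },
  async run(prompt, options) {
    const route = await this.classify(prompt)
    return workflows[route].run(prompt, options)
  }
}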
```

## Integration

Workflows integrate with:

* **RunOptions**: Configure timeout, retry, and cancel behavior
* **Agents**: Agents declare which workflow patterns they can use
* **Execution**: Workflows create executions with lifecycle control


---

# Orchestrator-Workers


<Callout type="warn">
  This workflow pattern is experimental. Interfaces are subject to change
  as the OS Protocol specification evolves.
</Callout>

## Overview

The Orchestrator-Workers pattern uses a central orchestrator to decompose a complex task into a structured plan, delegate each step to a specialized worker, and synthesize the collected results into a final output. It suits tasks that require multiple specialized capabilities working together. Workers operate independently on their assigned steps, and the orchestrator aggregates their results with full awareness of the original goal.

## Pattern Diagram

<Mermaid
  chart="flowchart TD
  A([Input]) --> B[plan]
  B --> C[PlanStep Array]
  C --> D1[delegate step 1]
  C --> D2[delegate step 2]
  C --> D3[delegate step N]
  D1 --> E[WorkerResult Array]
  D2 --> E
  D3 --> E
  E --> F[synthesize]
  F --> G([Output])"
/>

## TypeScript API

Import from `@osprotocol/schema/workflows/orchestrator-workers`.

### PlanStep

A single unit of work in the execution plan.

```ts
interface PlanStep {
  id: string
  description: string
  worker: string
  input: string
  dependsOn?: string[]
}
```

* `id` — unique identifier for the step
* `description` — human-readable summary of what the step does
* `worker` — name of the worker responsible for this step
* `input` — the prompt or data passed to the worker
* `dependsOn` — optional list of step IDs that must complete before this step runs

### Plan

The full execution plan produced by the orchestrator.

```ts
interface Plan {
  steps: PlanStep[]
  goal: string
}
```

* `steps` — ordered list of steps to execute
* `goal` — the original objective driving the plan

### WorkerResult

The result returned by a worker after executing a step.

```ts
interface WorkerResult<T = unknown> {
  stepId: string
  worker: string
  success: boolean
  data?: T
  error?: string
}
```

* `stepId` — the ID of the step this result corresponds to
* `worker` — name of the worker that produced the result
* `success` — whether the step completed without error
* `data` — the output produced by the worker on success
* `error` — error message if the step failed

### OrchestratorWorkersWorkflow

The main interface extending the base `Workflow`.

```ts
interface OrchestratorWorkersWorkflow<Output> extends Workflow<Output> {
  plan(prompt: string): Promise<Plan>
  delegate(step: PlanStep): Promise<WorkerResult>
  synthesize(results: WorkerResult[], plan: Plan): Promise<Output>
}
```

* `plan` — generates a structured `Plan` from the input prompt
* `delegate` — sends a single `PlanStep` to the assigned worker and returns a `WorkerResult`
* `synthesize` — combines all results with the original plan to produce the final `Output`

### WorkerConfig

Configuration for registering a worker with the orchestrator.

```ts
interface WorkerConfig {
  name: string
  description: string
  workflow: Workflow<unknown>
  capabilities: string[]
}
```

* `name` — unique identifier used in `PlanStep.worker`
* `description` — what this worker is specialized to do
* `workflow` — the underlying workflow the worker executes
* `capabilities` — list of capability tags used for worker selection

## Usage Examples

### Create a Plan and Execute

```ts
const result = await orchestrator.run("Research and summarize AI agent frameworks")

// Or step by step:
const plan = await orchestrator.plan("Research and summarize AI agent frameworks")

const results: WorkerResult[] = []
for (const step of plan.steps) {
  const result = await orchestrator.delegate(step)
  results.push(result)
}

const output = await orchestrator.synthesize(results, plan)
```

### Register Workers

```ts
const workers: WorkerConfig[] = [
  {
    name: "researcher",
    description: "Searches the web and retrieves relevant information",
    workflow: searchWorkflow,
    capabilities: ["search", "retrieve", "browse"],
  },
  {
    name: "analyst",
    description: "Analyzes data and identifies patterns",
    workflow: analysisWorkflow,
    capabilities: ["analyze", "compare", "rank"],
  },
  {
    name: "writer",
    description: "Drafts and edits written content",
    workflow: writingWorkflow,
    capabilities: ["write", "summarize", "edit"],
  },
]
```

### Handle Step Dependencies

Use `dependsOn` to ensure steps run in the correct order when outputs from one step feed into another.

```ts
const plan: Plan = {
  goal: "Produce a competitive analysis report",
  steps: [
    {
      id: "step-1",
      description: "Gather data on competitor A",
      worker: "researcher",
      input: "Find key features and pricing for Competitor A",
    },
    {
      id: "step-2",
      description: "Gather data on competitor B",
      worker: "researcher",
      input: "Find key features and pricing for Competitor B",
    },
    {
      id: "step-3",
      description: "Write comparative analysis",
      worker: "writer",
      input: "Compare the two competitors based on research results",
      dependsOn: ["step-1", "step-2"],
    },
  ],
}

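// Execute steps respecting dependencies. The steps above are already listed
// in dependency order, so a single pass is enough here.
const results: WorkerResult[] = []
for (const step of plan.steps) {
  const deps = step.dependsOn ?? []
  const depsComplete = deps.every((id) =>
    results.some((r) => r.stepId === id && r.success)
  )
  if (!depsComplete) {
    // A dependency failed or has not run yet, so skip instead of running on missing input
    console.warn(`Skipping ${step.id}: unmet dependencies`)
    continue
  }
  results.push(await orchestrator.delegate(step))
}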
```
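
When steps are not guaranteed to arrive in dependency order, one option is to group them into waves first, where each wave only depends on earlier waves. The sketch below shows that idea; `toWaves` is a hypothetical helper, and throwing on cycles or unknown step IDs is an assumption about how a caller would want to fail.

```ts
// Local copy of the PlanStep interface above.
interface PlanStep {
  id: string
  description: string
  worker: string
  input: string
  dependsOn?: string[]
}

// Groups steps into waves: every step in a wave depends only on steps from
// earlier waves, so the steps inside one wave can run concurrently.
function toWaves(steps: PlanStep[]): PlanStep[][] {
  const done = new Set<string>()
  const waves: PlanStep[][] = []
  let remaining = steps.slice()

  while (remaining.length > 0) {
    const ready = remaining.filter((step) =>
      (step.dependsOn ?? []).every((id) => done.has(id)),
    )
    if (ready.length === 0) {
      // Nothing can make progress: a cycle or a dependency on an unknown step.
      const stuck = remaining.map((step) => step.id).join(', ')
      throw new Error(`Unresolvable dependencies: ${stuck}`)
    }
    for (const step of ready) done.add(step.id)
    remaining = remaining.filter((step) => !done.has(step.id))
    waves.push(ready)
  }

  return waves
}
```

For the competitive analysis plan above, this yields two waves: `step-1` and `step-2` together, then `step-3`.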

## Integration

* [Routing](/docs/workflows/routing) — use routing to select which orchestrator handles an incoming task
* [Parallelization](/docs/workflows/parallelization) — combine with parallelization to execute independent steps concurrently
* [Evaluator-Optimizer](/docs/workflows/evaluator-optimizer) — wrap synthesized output in an evaluator loop for iterative refinement
* [Runs](/docs/runs) — configure timeouts, retries, and approval gates on orchestrator runs


---

# Parallelization


<Callout type="warn">
  The Parallelization workflow pattern is part of the OS Protocol specification.
  It is defined in `@osprotocol/schema/workflows/parallelization` and must be
  implemented by agents that declare support for parallel execution.
</Callout>

## Overview

Parallelization splits an incoming prompt into independent subtasks, executes all of them concurrently, and merges their results into a single output. This pattern is best suited for throughput-sensitive workloads where subtasks do not depend on each other and the total work can be decomposed cleanly. It follows the parallelization building block described in Anthropic's "Building Effective Agents."

## Pattern

<Mermaid
  chart="flowchart LR
  A([Input]) --> B[split]
  B --> C([&#x22;Subtask[]&#x22;])
  C --> D[parallel]
  D --> E([&#x22;SubtaskResult[]&#x22;])
  E --> F[merge]
  F --> G([Output])"
/>

## Failure Strategies

The `failureStrategy` option in `ParallelizationConfig` controls how the workflow behaves when one or more subtasks fail.

| Strategy      | Behavior                                                                               |
| ------------- | -------------------------------------------------------------------------------------- |
| `fail-fast`   | Stops execution on the first failed subtask and rejects the run immediately.           |
| `collect-all` | Runs all subtasks regardless of failures and returns every result, including failures. |

Use `fail-fast` when any single failure makes the merged output invalid. Use `collect-all` when partial results are still useful or when you want to surface all errors at once.
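
The two strategies, together with a concurrency cap, could be implemented along these lines. This is a sketch under assumptions: `runParallel` and its `execute` callback are hypothetical, the local types mirror the ones defined in the TypeScript API section, and in-flight subtasks are not cancelled when `fail-fast` triggers.

```ts
interface Subtask {
  id: string
  prompt: string
  metadata?: Record<string, unknown>
}

interface SubtaskResult<T = unknown> {
  id: string
  success: boolean
  data?: T
  error?: string
  durationMs?: number
}

interface ParallelizationConfig {
  maxConcurrency?: number
  failureStrategy?: 'fail-fast' | 'collect-all'
  includeFailures?: boolean
}

async function runParallel<T>(
  subtasks: Subtask[],
  execute: (subtask: Subtask) => Promise<T>,
  config: ParallelizationConfig = {},
): Promise<SubtaskResult<T>[]> {
  const failureStrategy = config.failureStrategy ?? 'fail-fast'
  const requested = config.maxConcurrency ?? subtasks.length
  const laneCount = Math.max(1, Math.min(requested, subtasks.length))
  const results = new Array<SubtaskResult<T>>(subtasks.length)
  let next = 0
  let aborted = false

  // Each lane pulls the next unclaimed subtask until none are left.
  async function lane(): Promise<void> {
    while (!aborted && next < subtasks.length) {
      const index = next++
      const subtask = subtasks[index]
      const started = Date.now()
      try {
        const data = await execute(subtask)
        results[index] = { id: subtask.id, success: true, data, durationMs: Date.now() - started }
      } catch (err) {
        const error = err instanceof Error ? err.message : String(err)
        if (failureStrategy === 'fail-fast') {
          aborted = true // stop other lanes from claiming new subtasks
          throw new Error(`Subtask "${subtask.id}" failed: ${error}`)
        }
        results[index] = { id: subtask.id, success: false, error, durationMs: Date.now() - started }
      }
    }
  }

  await Promise.all(Array.from({ length: laneCount }, () => lane()))
  return results // same order as the input subtasks
}
```

Writing each result at its input index keeps the output aligned with the input order regardless of which subtask finishes first. `includeFailures` is not used here because it concerns what is handed to `merge`.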

## TypeScript API

```ts
import type {
  Subtask,
  SubtaskResult,
  ParallelizationWorkflow,
  ParallelizationConfig,
} from "@osprotocol/schema/workflows/parallelization";
```

### Subtask

```ts
interface Subtask {
  id: string
  prompt: string
  metadata?: Record<string, unknown>
}
```

A single unit of work produced by `split()`. The `id` uniquely identifies the subtask and is echoed back in `SubtaskResult` so results can be correlated to their origin.

### SubtaskResult

```ts
interface SubtaskResult<T = unknown> {
  id: string
  success: boolean
  data?: T
  error?: string
  durationMs?: number
}
```

The outcome of a single subtask. Results are returned in the same order as the input `Subtask[]` array. `durationMs` is available for performance monitoring.

### ParallelizationWorkflow

```ts
interface ParallelizationWorkflow<Output> extends Workflow<Output> {
  split(prompt: string): Promise<Subtask[]>
  parallel(subtasks: Subtask[]): Promise<SubtaskResult[]>
  merge(results: SubtaskResult[]): Promise<Output>
}
```

The core workflow interface. Implementations must provide all three methods:

* `split` — decomposes the incoming prompt into a list of independent subtasks.
* `parallel` — executes all subtasks concurrently and returns their results in input order.
* `merge` — combines `SubtaskResult[]` into the final typed output.

### ParallelizationConfig

```ts
interface ParallelizationConfig {
  maxConcurrency?: number
  failureStrategy?: 'fail-fast' | 'collect-all'
  includeFailures?: boolean
}
```

| Field             | Default       | Description                                                          |
| ----------------- | ------------- | -------------------------------------------------------------------- |
| `maxConcurrency`  | unbounded     | Maximum number of subtasks to run at the same time.                  |
| `failureStrategy` | `'fail-fast'` | How to handle subtask failures (see Failure Strategies).             |
| `includeFailures` | `false`       | When using `collect-all`, whether to pass failed results to `merge`. |

## Usage Examples

### Split and merge

```ts
const subtasks = await workflow.split(
  "Summarize these five documents: doc1, doc2, doc3, doc4, doc5"
);

const results = await workflow.parallel(subtasks);
const output = await workflow.merge(results);
```

### Configure concurrency

```ts
const run = await workflow.run(prompt, {
  config: {
    maxConcurrency: 3,
    failureStrategy: "fail-fast",
  },
});
```

### Handle partial failures with collect-all

```ts
// Assumes the workflow's Output type carries the per-subtask results,
// e.g. ParallelizationWorkflow<{ results: SubtaskResult[] }>.
const output = await workflow.run(prompt, {
  config: {
    failureStrategy: "collect-all",
    includeFailures: true,
  },
});

const failed = output.results.filter((r) => !r.success);
if (failed.length > 0) {
  console.warn(`${failed.length} subtask(s) failed`, failed.map((r) => r.error));
}
```

## Integration

* [Routing](/docs/workflows/routing) — classify and delegate to the right handler before parallelizing.
* [Orchestrator-Workers](/docs/workflows/orchestrator-workers) — combine orchestration with parallel worker execution.
* [Evaluator-Optimizer](/docs/workflows/evaluator-optimizer) — evaluate and refine merged results after parallelization.
* [Runs](/docs/runs) — control timeout, retry, and cancellation for the overall parallel run.


---

# Routing


<Callout type="warn">
  This interface is experimental — no production implementation exists yet. The API surface may change.
</Callout>

## Overview

The Routing workflow classifies an incoming prompt and delegates it to a single specialized workflow based on the classification result. It is the simplest multi-agent pattern — it routes but does not aggregate results across multiple branches. The design follows Anthropic's routing building block from "Building Effective Agents."

## Pattern

<Mermaid
  chart="flowchart LR
  A[Input] --> B[classify]
  B --> C[route key]
  C --> D[Delegated Workflow]
  D --> E[Output]"
/>

## TypeScript API

Import from `@osprotocol/schema/workflows/routing`. The base `Workflow` interface is available from `@osprotocol/schema/workflows`.

### RouteConfig

Describes when a route should be selected and provides optional examples for few-shot classification.

```ts
interface RouteConfig {
  description: string
  whenToUse: string[]
  examples?: string[]
}
```
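
To illustrate how these fields could feed a few-shot classifier, here is a minimal sketch that renders a set of routes into a plain-text prompt. `buildClassifierPrompt` and the prompt wording are assumptions made for illustration; the protocol does not define how an implementation turns a `RouteConfig` into a model prompt.

```ts
// Local copy of the RouteConfig shape documented above.
interface RouteConfig {
  description: string
  whenToUse: string[]
  examples?: string[]
}

// Hypothetical helper (not part of the protocol): renders route definitions
// into a prompt that a classifier model could answer with a route key.
function buildClassifierPrompt(
  routes: Record<string, RouteConfig>,
  input: string,
): string {
  const sections = Object.entries(routes).map(([key, route]) => {
    const lines = [
      `Route: ${key}`,
      `Description: ${route.description}`,
      "Use when:",
      ...route.whenToUse.map((hint) => `- ${hint}`),
    ]
    // Examples are optional, so the few-shot block is emitted only when present.
    if (route.examples && route.examples.length > 0) {
      lines.push("Examples:", ...route.examples.map((ex) => `- "${ex}"`))
    }
    return lines.join("\n")
  })

  return [
    "Pick the single best route for the input. Reply with the route key only.",
    ...sections,
    `Input: ${input}`,
  ].join("\n\n")
}

const classifierPrompt = buildClassifierPrompt(
  {
    billing: {
      description: "Handles billing, invoices, and payment questions.",
      whenToUse: ["User asks about an invoice"],
      examples: ["Why was I charged twice this month?"],
    },
    technical: {
      description: "Handles technical support and troubleshooting.",
      whenToUse: ["User reports a bug or error"],
    },
  },
  "I was charged twice last week.",
)
console.log(classifierPrompt)
```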

### RoutingWorkflow

Extends the base `Workflow` interface with a `classify` method that returns the key of the matched route.

```ts
interface RoutingWorkflow<Output> extends Workflow<Output> {
  classify(prompt: string): Promise<string>
}
```

### RoutingWorkflowConfig

Top-level configuration object. `workflows` is a record keyed by route key, the same string that `classify` returns; each value is a `RoutingWorkflowEntry`.

```ts
interface RoutingWorkflowConfig<Output> {
  model?: string
  workflows: Record<string, RoutingWorkflowEntry<Output>>
}
```

### RoutingWorkflowEntry

Pairs a concrete `Workflow` with its `RouteConfig`. Set `markAsDefault` to `true` on one entry to use it as the fallback when no route matches.

```ts
interface RoutingWorkflowEntry<Output> {
  workflow: Workflow<Output>
  route: RouteConfig
  markAsDefault?: boolean
}
```
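
Since the fallback relies on a single entry carrying `markAsDefault`, an implementation may want to check the configuration up front. The following is a sketch only: `findDefaultRouteKey` and `DefaultableEntry` are hypothetical names, and throwing when more than one default is found is an assumption, because the protocol does not define that case.

```ts
// Only the field this check needs; real entries also carry `workflow` and `route`.
interface DefaultableEntry {
  markAsDefault?: boolean
}

// Hypothetical helper (not part of the protocol): returns the key of the
// default route, or undefined when no entry is marked as default.
function findDefaultRouteKey(
  workflows: Record<string, DefaultableEntry>,
): string | undefined {
  const defaults = Object.entries(workflows)
    .filter(([, entry]) => entry.markAsDefault === true)
    .map(([key]) => key)

  // Assumption: more than one default is treated as a configuration error.
  if (defaults.length > 1) {
    throw new Error(`Multiple default routes: ${defaults.join(", ")}`)
  }
  return defaults[0]
}

console.log(findDefaultRouteKey({ billing: {}, general: { markAsDefault: true } }))
// prints "general"
```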

## Usage Examples

### Configure a routing workflow with multiple routes

```ts
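import type { RoutingWorkflowConfig } from "@osprotocol/schema/workflows/routing"

// billingWorkflow, technicalWorkflow, and generalWorkflow are assumed to be
// Workflow<string> instances defined elsewhere.
const config: RoutingWorkflowConfig<string> = {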
  model: "claude-opus-4-6",
  workflows: {
    billing: {
      workflow: billingWorkflow,
      route: {
        description: "Handles billing, invoices, and payment questions.",
        whenToUse: [
          "User asks about an invoice",
          "User wants to update payment method",
          "User reports a charge they do not recognize",
        ],
        examples: [
          "Why was I charged twice this month?",
          "How do I cancel my subscription?",
        ],
      },
    },
    technical: {
      workflow: technicalWorkflow,
      route: {
        description: "Handles technical support and troubleshooting.",
        whenToUse: [
          "User reports a bug or error",
          "User needs help integrating the API",
          "User asks how a feature works",
        ],
      },
    },
    general: {
      workflow: generalWorkflow,
      route: {
        description: "Handles all other questions.",
        whenToUse: ["Input does not match any other route"],
      },
      markAsDefault: true,
    },
  },
}
```

### Classify a prompt to determine the route

```ts
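// `router` is assumed to be a RoutingWorkflow<string> built from the config above.
const routeKey = await router.classify("I was charged twice last week.")
// Expected: "billing" (the actual key depends on the classifier)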
```

### Run the full routing workflow

```ts
const run = await router.run("I was charged twice last week.")
const result = await run.output()
```

The `run` method internally calls `classify`, selects the matching `RoutingWorkflowEntry`, and delegates execution to its `workflow`. If no route matches, the entry with `markAsDefault: true` is used.
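
The classify, select, and delegate steps described above can be sketched as follows. This is not the protocol's implementation: `SimpleWorkflow`, `SimpleEntry`, `selectEntry`, and `routeAndRun` are simplified, hypothetical stand-ins, the workflow here returns its output directly rather than a run handle, and a keyword match stands in for the model call.

```ts
// Simplified stand-ins for the protocol types (assumptions for this sketch).
interface SimpleWorkflow<Output> {
  run(prompt: string): Promise<Output>
}

interface SimpleEntry<Output> {
  workflow: SimpleWorkflow<Output>
  markAsDefault?: boolean
}

type Classifier = (prompt: string) => Promise<string>

// Looks up the entry for a route key and falls back to the default entry.
function selectEntry<Output>(
  workflows: Record<string, SimpleEntry<Output>>,
  routeKey: string,
): SimpleEntry<Output> {
  const matched = Object.prototype.hasOwnProperty.call(workflows, routeKey)
    ? workflows[routeKey]
    : undefined
  const fallback = Object.values(workflows).find(
    (entry) => entry.markAsDefault === true,
  )
  const selected = matched ?? fallback
  if (!selected) {
    throw new Error(`No route matched "${routeKey}" and no default is configured`)
  }
  return selected
}

// classify -> select -> delegate, mirroring the steps described above.
async function routeAndRun<Output>(
  workflows: Record<string, SimpleEntry<Output>>,
  classify: Classifier,
  input: string,
): Promise<Output> {
  const routeKey = await classify(input)
  return selectEntry(workflows, routeKey).workflow.run(input)
}

// Stub workflows that label their input with the route that handled it.
const echo = (label: string): SimpleWorkflow<string> => ({
  run: async (input) => `${label}: ${input}`,
})

const demoWorkflows: Record<string, SimpleEntry<string>> = {
  billing: { workflow: echo("billing") },
  general: { workflow: echo("general"), markAsDefault: true },
}

// Keyword classifier standing in for a model call.
const keywordClassifier: Classifier = async (input) =>
  /charge|invoice|payment/i.test(input) ? "billing" : "unknown"

// Logs "billing: I was charged twice last week."
routeAndRun(demoWorkflows, keywordClassifier, "I was charged twice last week.").then(console.log)
// Logs "general: What are your opening hours?" (falls back to the default)
routeAndRun(demoWorkflows, keywordClassifier, "What are your opening hours?").then(console.log)
```

Throwing when neither a match nor a default exists is a design choice of this sketch; the protocol text only covers the case where a default entry is present.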

## Integration

* [Orchestrator-Workers](/docs/workflows/orchestrator-workers) — plan, delegate to multiple workers, and synthesize results
* [Parallelization](/docs/workflows/parallelization) — split a task across concurrent workflows
* [Evaluator-Optimizer](/docs/workflows/evaluator-optimizer) — generate, evaluate, and refine output in a loop
* [Runs](/docs/runs) — timeout, retry, cancel, and approval controls for any workflow