# The Agentic OS Protocol

> The standard for multi-agent systems. Define agents, coordinate workflows, and build systems where agents work together.

---

# Apps

This feature is experimental. The App schema and distribution manifest format have no real implementation yet and are subject to change.

## Overview

An app is a distribution of the Agentic OS: a manifest that declares which vendors implement which protocol interfaces, forming a complete agentic system configuration. Think of it like `package.json` for Node.js, `docker-compose.yml` for containers, or `vercel.json` for deployments: a single file that describes how all the pieces fit together.

The format is YAML frontmatter for the structured, machine-readable manifest, combined with a free-form markdown body for human-readable documentation. The filename is up to the author (`SYNER.md`, `APP.md`, `ACADEMY.md`, or anything else). The format is what matters.

## Distribution File

An app file combines a YAML frontmatter manifest with a markdown documentation body:

```yaml
---
name: '@synerops/syner-os'
version: '1.0.0'
description: 'Syner OS — AI-powered development platform'
protocol: '0.1.0'
providers:
  system:
    sandbox: { provider: '@vercel/sandbox' }
    fs: { provider: '@vercel/blob' }
  context:
    embeddings: { provider: '@upstash/vector' }
---

# Syner OS

Syner OS is an AI-powered development platform...
```

The frontmatter declares the distribution metadata and provider bindings. The markdown body is free-form: documentation, usage instructions, architecture notes, or anything else relevant to the distribution.
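As a rough illustration of the two-part layout described above, the sketch below separates a distribution file into its raw frontmatter and its markdown body using plain string handling. The helper name `splitFrontmatter` and the returned field names are hypothetical and not part of the protocol. The sketch only locates the `---` delimiter lines, so a real loader would still pass the frontmatter text to a YAML parser.

```typescript
// Hypothetical helper: separates the `---` delimited frontmatter from the body.
// It does not parse YAML; it only locates the two delimiter lines.
interface SplitResult {
  frontmatter: string // raw YAML text between the delimiters
  body: string // markdown after the closing delimiter
}

function splitFrontmatter(raw: string): SplitResult | null {
  const lines = raw.split(/\r?\n/)
  // The file must open with a delimiter line.
  if (lines[0]?.trim() !== '---') return null
  // Find the closing delimiter somewhere after the first line.
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === '---')
  if (end === -1) return null
  return {
    frontmatter: lines.slice(1, end).join('\n'),
    body: lines.slice(end + 1).join('\n').trim(),
  }
}

// Small sample file, assembled line by line for readability.
const sample = [
  '---',
  "name: '@synerops/syner-os'",
  "version: '1.0.0'",
  '---',
  '',
  '# Syner OS',
].join('\n')

const parts = splitFrontmatter(sample)
```

Returning `null` for a file without both delimiters keeps the caller in charge of deciding whether a missing manifest is an error.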
## TypeScript API

Import types from `@osprotocol/schema/apps`:

```ts
import type { App, AppMetadata, ProviderMap, ProviderEntry } from '@osprotocol/schema/apps'
```

### ProviderEntry

A single vendor binding for a protocol interface:

```ts
interface ProviderEntry {
  provider: string
  version?: string
  enabled?: boolean
  metadata?: Record<string, unknown>
}
```

| Field      | Type                      | Description                                                 |
| ---------- | ------------------------- | ----------------------------------------------------------- |
| `provider` | `string`                  | Package name or identifier of the vendor implementation     |
| `version`  | `string`                  | Optional semver range for the provider                      |
| `enabled`  | `boolean`                 | Whether this binding is active. Defaults to true if omitted |
| `metadata` | `Record<string, unknown>` | Arbitrary provider-specific configuration                   |

### ProviderMap

Maps protocol domains to their provider bindings, one entry per interface:

```ts
interface ProviderMap {
  system?: Record<string, ProviderEntry>
  context?: Record<string, ProviderEntry>
  actions?: Record<string, ProviderEntry>
  checks?: Record<string, ProviderEntry>
}
```

Each key within a domain corresponds to a specific protocol interface (e.g., `sandbox`, `fs`, `embeddings`), not the domain as a whole. Granularity is per interface.
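To make the per-interface granularity concrete, here is a minimal sketch that flattens a provider map into `domain.interface` keys. The types are redeclared locally so the snippet is self-contained. The `Record<string, unknown>` type for `metadata` and the helper name `listActiveBindings` are assumptions for illustration, not part of the published schema.

```typescript
// Local copies of the shapes described above (assumed, not imported).
interface ProviderEntry {
  provider: string
  version?: string
  enabled?: boolean
  metadata?: Record<string, unknown>
}

type Domain = 'system' | 'context' | 'actions' | 'checks'
type ProviderMap = Partial<Record<Domain, Record<string, ProviderEntry>>>

// Hypothetical helper: maps "domain.interface" to the provider name.
// A binding counts as active unless `enabled` is explicitly false.
function listActiveBindings(providers: ProviderMap): Record<string, string> {
  const active: Record<string, string> = {}
  for (const [domain, bindings] of Object.entries(providers)) {
    if (!bindings) continue // domain declared without bindings
    for (const [iface, entry] of Object.entries(bindings)) {
      if (entry.enabled !== false) {
        active[`${domain}.${iface}`] = entry.provider
      }
    }
  }
  return active
}

const providers: ProviderMap = {
  system: {
    sandbox: { provider: '@vercel/sandbox' },
    fs: { provider: '@vercel/blob', enabled: false },
  },
  context: {
    embeddings: { provider: '@upstash/vector' },
  },
}

const activeBindings = listActiveBindings(providers)
```

Keying by `domain.interface` mirrors the rule that a binding targets one interface, so disabling `system.fs` leaves `system.sandbox` untouched.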
### AppMetadata

The structured frontmatter of a distribution file:

```ts
interface AppMetadata {
  name: string
  version: string
  description?: string
  protocol?: string
  providers?: ProviderMap
  metadata?: Record<string, unknown>
}
```

| Field         | Type                      | Description                                              |
| ------------- | ------------------------- | -------------------------------------------------------- |
| `name`        | `string`                  | Distribution name, typically a scoped package identifier |
| `version`     | `string`                  | Semver version of this distribution                      |
| `description` | `string`                  | Human-readable description                               |
| `protocol`    | `string`                  | OS Protocol spec version this distribution targets       |
| `providers`   | `ProviderMap`             | Vendor bindings per domain and interface                 |
| `metadata`    | `Record<string, unknown>` | Arbitrary additional metadata                            |

### App

The parsed representation of a full distribution file:

```ts
interface App {
  metadata: AppMetadata
  content: string
  path: string
}
```

| Field      | Type          | Description                              |
| ---------- | ------------- | ---------------------------------------- |
| `metadata` | `AppMetadata` | Parsed frontmatter                       |
| `content`  | `string`      | Raw markdown body after the frontmatter  |
| `path`     | `string`      | Filesystem path to the distribution file |

## Provider Bindings

The `providers` field in the frontmatter maps each protocol domain's interfaces to concrete vendor implementations. Each domain (`system`, `context`, `actions`, `checks`) contains a record where each key is a specific interface name and each value is a `ProviderEntry`.
```yaml
providers:
  system:
    env:
      provider: '@vercel/env'
    sandbox:
      provider: '@vercel/sandbox'
      version: '^1.0.0'
  context:
    embeddings: { provider: '@upstash/vector' }
  checks:
    screenshot: { provider: 'playwright' }
    judge: { provider: '@braintrust/judge' }
```

This declaration says: for the `system` domain, use `@vercel/env` for environment access and `@vercel/sandbox` for sandboxed execution; for `context`, use `@upstash/vector` for embeddings; for `checks`, use Playwright for screenshots and Braintrust for LLM-as-judge evaluation.

Provider bindings are resolved at runtime by the OS Protocol host. Interfaces with no binding fall back to any default registered by the host environment.

## Usage Examples

### Load an app manifest

```ts
import { readFileSync } from 'fs'
import matter from 'gray-matter'
import type { App, AppMetadata } from '@osprotocol/schema/apps'

function loadApp(filePath: string): App {
  const raw = readFileSync(filePath, 'utf-8')
  const { data, content } = matter(raw)
  return {
    metadata: data as AppMetadata,
    content,
    path: filePath,
  }
}

const app = loadApp('./SYNER.md')
console.log(app.metadata.name) // '@synerops/syner-os'
console.log(app.metadata.version) // '1.0.0'
```

### Validate provider bindings

```ts
import type { ProviderMap } from '@osprotocol/schema/apps'

function getProviderForInterface(
  providers: ProviderMap,
  domain: keyof ProviderMap,
  interfaceName: string
): string | undefined {
  return providers[domain]?.[interfaceName]?.provider
}

const sandboxProvider = getProviderForInterface(
  app.metadata.providers ?? {},
  'system',
  'sandbox'
) // '@vercel/sandbox'
```

### Compare two distributions

```ts
import type { App } from '@osprotocol/schema/apps'

function diffProviders(a: App, b: App): string[] {
  const differences: string[] = []
  const domains = ['system', 'context', 'actions', 'checks'] as const

  for (const domain of domains) {
    const aBindings = a.metadata.providers?.[domain] ?? {}
    const bBindings = b.metadata.providers?.[domain] ?? {}
    const allInterfaces = new Set([...Object.keys(aBindings), ...Object.keys(bBindings)])

    for (const iface of allInterfaces) {
      const aProvider = aBindings[iface]?.provider
      const bProvider = bBindings[iface]?.provider
      if (aProvider !== bProvider) {
        differences.push(`${domain}.${iface}: ${aProvider ?? 'none'} → ${bProvider ?? 'none'}`)
      }
    }
  }

  return differences
}
```

## Cross-References

The app manifest format is analogous to other ecosystem configuration files:

| Format                       | Ecosystem         | Defines                                                |
| ---------------------------- | ----------------- | ------------------------------------------------------ |
| `package.json`               | Node.js / npm     | Package dependencies and scripts                       |
| `docker-compose.yml`         | Docker            | Service definitions and bindings                       |
| `Chart.yaml`                 | Helm / Kubernetes | Chart metadata and dependencies                        |
| `claude_desktop_config.json` | Claude Desktop    | MCP server registrations                               |
| `mcp.json` (Cursor)          | Cursor            | MCP server registrations for Cursor                    |
| `.vscode/mcp.json`           | VS Code           | MCP server registrations for VS Code                   |
| `vercel.json`                | Vercel            | Deployment configuration and routing                   |
| App manifest (`*.md`)        | OS Protocol       | Agentic OS provider bindings and distribution metadata |

The app manifest occupies the same role in the Agentic OS ecosystem that these files occupy in their respective ecosystems: a single source of truth for how a system is configured and what implements each capability.
## Integration Provider bindings in an app manifest map to the following protocol domains: * [System](/docs/system) — environment, sandboxing, storage, file system interfaces * [Context](/docs/context) — memory, embeddings, and knowledge retrieval interfaces * [Actions](/docs/actions) — external integrations and side-effect interfaces * [Checks](/docs/checks) — validation, evaluation, and quality assurance interfaces --- # Architecture ## Overview The Agentic OS Protocol (OSP) is built on a modular architecture that separates concerns and enables flexible composition of agent systems. ## Philosophy and Foundations OSP doesn't create everything from scratch. Instead, it adapts proven philosophies and patterns that have demonstrated effectiveness in production environments, combined with our own contributions and experience in infrastructure design. ### Influences and Adaptations OSP draws inspiration from several key sources: * **[Anthropic: Building Effective Agents](https://www.anthropic.com/engineering/building-effective-agents)**: Workflow patterns (Routing, Prompt Chaining, Orchestrator-Workers, Parallelization, Evaluator-Optimizer) form the foundation of our workflow taxonomy. * **[Agent Communication Protocol](https://agentcommunicationprotocol.dev/core-concepts/agent-run-lifecycle)**: The concept of **Runs**—essential for multi-agent systems—provides the lifecycle management framework. * **[Claude Agent SDK](https://www.anthropic.com/engineering/building-agents-with-the-claude-agent-sdk)**: The **Agent Loop** execution pattern (Gather Context → Take Action → Verify Work → Iterate) defines our core cognitive cycle. These patterns have been adapted and extended to work together in a unified protocol specification that emphasizes interoperability, scalability, and system-level orchestration. 
### The Operating System Concept Where OSP contributes uniquely is in the **Operating System** abstraction—a layer that provides system intelligence through standardized APIs. This OS layer manages the lifecycle, coordination, and resource management that multi-agent systems require. > **Understanding the Metaphor**: Just as traditional operating systems abstract hardware resources (CPU, memory, disk), an Agentic OS abstracts cognitive resources (inference, context, knowledge, tools). The [Agentic OS concept](/docs/concepts/agentic-os) defines this paradigm—OSP is the protocol specification that implementations follow. Just as traditional operating systems provide process management, memory management, and I/O interfaces, OSP's Operating System provides: * **Agent Registry**: Discovery and capability management * **System APIs**: Environment, Filesystem, Settings, Sandbox * **Context Facades**: System Context, Embeddings, Key-Value * **Quality Assurance**: Rules, Audit, Judge, Screenshot ## Architecture Layers The protocol architecture is organized into distinct domains: ### 1. System (Infrastructure) The OS layer provides infrastructure services that all agents depend on. Unlike traditional protocols that focus solely on agent-to-agent communication, OSP includes system-level intelligence: * **Registry**: Manages agent registration, discovery, and capability matching * **Environment**: Handles configuration and environment variable management * **Filesystem**: Provides standardized file system operation interfaces * **Sandbox**: Isolated execution environments for untrusted code * **Settings**: Manages system-level configurations * **Preferences**: User-level preferences and personalization * **Installer**: Package and dependency management * **MCP Client**: Integrates with Model Context Protocol for external tool access ### 2. Context & Actions (Read/Write Facades) OSP separates read-only information gathering from state-changing operations. 
This mirrors the Agent Loop phases: **gather** context, then **take** action.

**Context** (read-only facades for the gather phase):

* **System Context**: Aggregated read-only view of system state (environment, settings, filesystem metadata)
* **Embeddings**: Vector database integration for semantic search
* **Key-Value**: Lightweight key-value storage for agent state

**Actions** (write facades for the act phase):

* **System Actions**: Aggregated write operations (filesystem writes, setting changes)
* **Tools**: Tool registration and invocation interfaces
* **MCP Servers**: External tool access through Model Context Protocol servers

### 3. Checks (Quality Assurance)

Built-in mechanisms for ensuring reliability, compliance, and quality:

* **Rules**: Behavioral constraints and validation frameworks
* **Judge**: LLM-based evaluation of agent outputs and decisions
* **Audit**: Comprehensive monitoring and logging of agent behavior
* **Screenshot**: Visual validation for UI-related tasks

### 4. Workflows (Execution Patterns)

Composable patterns for agent coordination, based on [Anthropic's building blocks for agentic systems](https://www.anthropic.com/engineering/building-effective-agents):

* **Routing**: Classify input and delegate to the appropriate handler
* **Parallelization**: Split work across parallel branches and merge results
* **Orchestrator-Workers**: Plan, delegate to workers, synthesize outputs
* **Evaluator-Optimizer**: Generate, evaluate, refine in a loop

### 5. Runs (Lifecycle Control)

Every agent execution is a **Run** with a well-defined lifecycle and control mechanisms:

* **Timeout**: Time-based execution limits
* **Retry**: Automatic retry with configurable strategies
* **Cancel**: Graceful cancellation with cleanup
* **Approval**: Human-in-the-loop gates for sensitive operations

## Core Execution Model

### The Agent Loop

At the heart of every agent is the **Agent Loop**, the cognitive cycle that drives execution: Gather Context → Take Action → Verify Work → Iterate.

This loop executes within workflows and is managed by the Operating System during the Execution phase of the Agent Lifecycle.

Learn more: [Agent Loop](/docs/concepts/agent-loop)

### The Agent Lifecycle

At the system level, agents follow a **Lifecycle** that spans their entire existence:

* **Registration**: Agents declare capabilities to the Registry
* **Discovery**: The OS matches agents to tasks based on capabilities
* **Execution**: The Agent Loop runs within workflows
* **Evaluation**: Performance, quality, and compliance are assessed

Learn more: [Agent Lifecycle](/docs/concepts/lifecycle)

## Integration Patterns

OSP is designed to integrate with existing protocols and systems:

### MCP Integration

OSP includes native support for the Model Context Protocol, allowing agents to access external tools and resources through standardized MCP servers. The OS provides MCP client functionality that any agent can leverage.

### Workflow Orchestration

The protocol defines workflow patterns that can be composed and combined. Workflows handle coordination; Runs handle lifecycle control (timeouts, retries, cancellation, approval).

Learn more: [Workflows](/docs/workflows) · [Runs](/docs/runs)

### Multi-Agent Coordination

OSP enables agents from different implementations to work together through standardized interfaces. Agents can coordinate tasks, share context (with proper isolation), and participate in distributed workflows.
## Design Philosophy

The architecture prioritizes:

* **Modularity**: Components can be used independently or together
* **Extensibility**: New components and capabilities can be added without breaking existing implementations
* **Interoperability**: Different implementations can work together through standardized contracts
* **Reliability**: Built-in quality assurance through Rules, Audit, and Judge
* **Observability**: Comprehensive monitoring, auditing, and evaluation capabilities
* **Scalability**: Designed to grow from single-agent systems to complex multi-agent environments

## Next Steps

* Learn about **[System](/docs/system)** for infrastructure services
* Understand **[Context](/docs/context)** and **[Actions](/docs/actions)** for the read/write split
* Review **[Checks](/docs/checks)** for quality assurance mechanisms
* Explore **[Workflows](/docs/workflows)** for coordination patterns
* See **[Runs](/docs/runs)** for lifecycle control

---

# Introduction

## What is the Agentic OS Protocol?

The Agentic OS Protocol (OSP) is a specification, a shared contract that defines the interfaces, behaviors, and data formats for orchestrating AI agents at scale. Think of it as the blueprint you implement: it tells you which domains exist (System, Context, Actions, Checks, Workflows, Runs), how they interact, and what "conformant" behavior looks like. It's not a runtime or framework; you build to it.
### Protocol Domains

| Domain                       | Purpose                                                                                  |
| ---------------------------- | ---------------------------------------------------------------------------------------- |
| [System](/docs/system)       | Infrastructure: registry, environment, filesystem, settings                              |
| [Context](/docs/context)     | Read-only facades: system context, embeddings, key-value                                 |
| [Actions](/docs/actions)     | Write facades: system actions, tools, MCP servers                                        |
| [Checks](/docs/checks)       | Quality assurance: rules, judge, audit, screenshot                                       |
| [Workflows](/docs/workflows) | Execution patterns: routing, parallelization, orchestrator-workers, evaluator-optimizer  |
| [Runs](/docs/runs)           | Lifecycle control: timeout, retry, cancel, approval                                      |

## Getting Started

OSP defines the contract for building agent systems that can work together seamlessly, with standardized patterns for coordination, quality assurance, and context management. Below are the key entry points to understand and implement the protocol.

* [See the big picture](/docs/architecture): Explore the architecture and how everything fits together before you dive into details.
* [Agent Loop](/docs/concepts/agent-loop): Gather Context → Take Actions → Verify Results
* [System Intelligence](/docs/system): Registry, Environment, Filesystem, Settings (the infrastructure layer).
* [What is Agentic OS?](/docs/concepts/agentic-os): The architectural paradigm where the LLM functions as the kernel of the system.

---

# Motivation

## The Challenge

As AI agents become more sophisticated and capable, we face new challenges in orchestrating, managing, and executing them at scale. Traditional approaches to agent management often fall short once systems grow beyond a single agent running in isolation.

Picture this: you start with one agent handling a simple task. It works perfectly. Then you need two agents to work together. Still manageable.
But as you add more agents—each with different capabilities, running across different environments, coordinating complex workflows—things quickly become messy. What seemed simple at small scale reveals complexity you didn't anticipate. ## When Systems Grow Orchestration, at its core, is about coordinating multiple agents to work together toward a common goal. When you have a handful of agents, coordination feels straightforward. But scale changes everything. Your agents start running across different environments—some in the cloud, others on edge devices, each with different capabilities and constraints. Coordinating them becomes a challenge in itself. Then workflows get complex: one agent's output becomes another's input, creating chains of dependencies that span multiple agents and environments. A failure in one step can cascade through the entire process. You find yourself managing not just individual agents, but intricate relationships between them—who depends on whom, what happens when something fails, how to retry, how to recover. As your system grows, new questions emerge that traditional approaches struggle to answer. How do you ensure quality when you can't monitor everything manually? How do you maintain context when agents operate independently across sessions? How do you scale from ten agents to a thousand without everything breaking? In distributed systems where agents collaborate, quality is emergent—it's about how agents interact, not just how each performs alone. Context management becomes critical: agents need to share information but also maintain isolation. Without standardized approaches, everyone solves these problems differently. Agent platforms built by different teams can't interoperate. The ecosystem fragments, and innovation slows because everyone is reinventing the same solutions. ## Why a Protocol This is where protocols shine. A protocol defines a shared contract—the interfaces, behaviors, and data formats that enable interoperability. 
Just as HTTP allows any web browser to communicate with any web server, a protocol for agent orchestration would allow different implementations to work together while remaining free to innovate in their specific domains. A protocol, unlike a framework or library, doesn't prescribe implementation details. It defines *what* must be supported and *how* components should interact, but leaves *how* you build it up to you. This flexibility is crucial: teams working in different languages, with different constraints, and different use cases can all implement the same protocol and achieve interoperability. ## What OSP Does OSP provides standardized patterns for agent coordination, quality assurance, and resource management—proven patterns that are reusable but not prescriptive. It means providing infrastructure for common problems: agent discovery, context management, quality monitoring. It means designing for scale from the start, so systems can grow from a single agent to complex multi-agent environments without fundamental redesigns. With a standardized protocol, the entire ecosystem benefits. Developers can build agent systems knowing they'll interoperate with others. Teams can share agents, workflows, and patterns. Agents from different platforms can collaborate on complex tasks. Workflows can span multiple systems. The ecosystem becomes composable—you can combine agents and tools from different sources, knowing they'll work together because they follow the same protocol. This creates the foundation for systems where agents collaborate at unprecedented scale, where workflows span organizations and platforms, where the whole ecosystem is greater than the sum of its parts. ## Building in the Open The Agentic OS Protocol is in active development, maintained by [SynerOps](https://synerops.com), and we're building it in the open. Why? Because the protocol needs to solve real problems, work in real environments, and evolve based on how people actually use it. 
We welcome contributions, feedback, and collaboration. Whether you're implementing the protocol, using it in production, researching agent systems, or just curious about what's possible—your perspective matters. Together, we're not just defining a protocol; we're shaping how agents will work together for years to come. --- # MCP Servers This interface is experimental. No real implementation exists yet. The API surface may change as the MCP ecosystem and OS Protocol integration patterns mature. ## Overview `McpServers` is the agent-facing interface for MCP-specific capabilities that go beyond tool execution. Resources (data and content exposed by a server) and prompts (reusable templates) are accessed through this interface. Tool execution from MCP servers goes through the unified Tools interface, not here. Connection management and server lifecycle are handled at the infrastructure level by `system/mcp-client`. Provider analogues include Anthropic MCP and the AAIF MCP Standard. ## Architecture ## TypeScript API ```ts import type { McpServers, McpResource, McpPrompt } from "@osprotocol/schema/actions/mcp-servers" ``` ### McpResource Represents a data or content resource exposed by an MCP server. ```ts interface McpResource { uri: string name: string mimeType?: string description?: string metadata?: Record } ``` | Field | Type | Description | | ------------- | ------------------------- | -------------------------------------------------- | | `uri` | `string` | Unique resource identifier within the server | | `name` | `string` | Human-readable resource name | | `mimeType` | `string` | Optional MIME type of the resource content | | `description` | `string` | Optional description of what the resource contains | | `metadata` | `Record` | Optional server-defined metadata | ### McpPrompt Represents a reusable prompt template provided by an MCP server. 
```ts interface McpPrompt { name: string description?: string arguments?: object metadata?: Record } ``` | Field | Type | Description | | ------------- | ------------------------- | -------------------------------------------- | | `name` | `string` | Unique prompt name within the server | | `description` | `string` | Optional description of the prompt's purpose | | `arguments` | `object` | Optional argument schema for the prompt | | `metadata` | `Record` | Optional server-defined metadata | ### McpServers The primary interface agents use to interact with MCP server resources and prompts. ```ts interface McpServers { listResources(server: string): Promise readResource(server: string, uri: string): Promise listPrompts(server: string): Promise getPrompt(server: string, name: string, args?: Record): Promise } ``` | Method | Description | | -------------------------------- | ---------------------------------------------------------------- | | `listResources(server)` | List all resources available on the given MCP server | | `readResource(server, uri)` | Read the content of a specific resource by URI | | `listPrompts(server)` | List all prompt templates available on the given MCP server | | `getPrompt(server, name, args?)` | Retrieve a rendered prompt by name, optionally passing arguments | ## Usage Examples ### List and read resources from a server ```ts const resources = await mcp.listResources("knowledge-base") for (const resource of resources) { console.log(`${resource.name} (${resource.mimeType ?? 
"unknown type"})`) } const content = await mcp.readResource("knowledge-base", "docs://api-reference") if (content) { // process the resource content } ``` ### Get a prompt with arguments ```ts const rendered = await mcp.getPrompt("code-assistant", "explain-function", { language: "typescript", context: "async generator", }) if (rendered) { // use the rendered prompt text in an LLM call } ``` ### Discover what an MCP server offers ```ts const [resources, prompts] = await Promise.all([ mcp.listResources("my-server"), mcp.listPrompts("my-server"), ]) console.log(`Resources: ${resources.map((r) => r.name).join(", ")}`) console.log(`Prompts: ${prompts.map((p) => p.name).join(", ")}`) ``` ## Integration * [Tools](/docs/actions/tools) — unified interface for tool execution, including tools from MCP servers * [MCP Client](/docs/system/mcp-client) — infrastructure-level connection management for MCP servers * [SystemActions](/docs/actions/system) — broader system actions context --- # System Actions This interface is experimental — no production implementation exists yet. The API surface may change. ## Overview System Actions composes all system write interfaces into a single entry point for the actions phase of the agent loop. It is a pure facade — each system API owns its own Actions interface, and `SystemActions` re-exports them under a unified namespace. The read counterpart is [System Context](/docs/context/system), which provides read-only access to the same system interfaces. ## TypeScript API ```ts import type { SystemActions } from '@osprotocol/schema/actions/system' ``` ### SystemActions Composes all system write interfaces. 
```ts
interface SystemActions {
  /** Environment variables */
  env: EnvActions
  /** System-wide settings */
  settings: SettingsActions
  /** Scoped preferences */
  preferences: PreferencesActions
  /** Resource registries */
  registry: RegistryActions
  /** Host filesystem */
  fs: FsActions
  /** Sandbox environments */
  sandbox: SandboxActions
  /** Installed packages */
  installer: InstallerActions
  /** MCP server connections */
  mcp: McpActions
}
```

Individual Actions interfaces are also re-exported:

```ts
import type {
  EnvActions,
  SettingsActions,
  PreferencesActions,
  RegistryActions,
  FsActions,
  SandboxActions,
  InstallerActions,
  McpActions,
} from '@osprotocol/schema/actions/system'
```

## Usage Examples

### Mutate system state through the facade

```ts
// Set an environment variable
await system.env.set({ key: 'DATABASE_URL', value: 'postgres://...' })

// Create a sandbox for code execution
const sandbox = await system.sandbox.create({ runtime: 'node24', timeout: 60000 })

// Install a dependency
await system.installer.install({ name: '@osprotocol/schema', version: '^0.2.0' })
```

### Use individual Actions interfaces directly

```ts
// When you only need filesystem write access
async function saveArtifact(fs: FsActions, path: string, content: string) {
  await fs.write(path, content)
}
```

## Design Rationale

The agent loop enforces read/write separation by phase:

* **Context phase** → read-only (`SystemContext` in `context/system.ts`)
* **Actions phase** → write operations (`SystemActions` in `actions/system.ts`)

This zero-trust pattern ensures agents gather all context before mutating state. The facade is pure composition: it adds no logic and only groups the individual Actions interfaces for convenience.
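The pure-composition idea can be sketched with two simplified, hypothetical slices of the facade backed by in-memory maps. The method signatures below (`set`, `write`), the `SystemActionsSketch` name, and the stores are assumptions for illustration only. The real Actions interfaces come from the schema package and may differ.

```typescript
// Simplified, hypothetical slices of the Actions interfaces (assumed signatures).
interface EnvActions {
  set(entry: { key: string; value: string }): Promise<void>
}
interface FsActions {
  write(path: string, content: string): Promise<void>
}

// The facade type only names its parts; it declares no methods of its own.
interface SystemActionsSketch {
  env: EnvActions
  fs: FsActions
}

// In-memory stand-ins so the sketch runs without a host implementation.
const envStore = new Map<string, string>()
const fileStore = new Map<string, string>()

const env: EnvActions = {
  async set({ key, value }) {
    envStore.set(key, value)
  },
}
const fs: FsActions = {
  async write(path, content) {
    fileStore.set(path, content)
  },
}

// Pure composition: the facade holds references and adds no logic.
const system: SystemActionsSketch = { env, fs }

// Code that needs a single capability can accept just that slice.
async function saveArtifact(target: FsActions, path: string, content: string): Promise<void> {
  await target.write(path, content)
}

async function demo(): Promise<void> {
  await system.env.set({ key: 'MODE', value: 'test' })
  await saveArtifact(system.fs, '/tmp/report.md', '# Report')
}
```

Because the facade only stores references, `system.fs` and the standalone `fs` object are the same value, so passing either one to `saveArtifact` behaves identically.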
## Integration System Actions integrates with: * **[System Context](/docs/context/system)**: Read counterpart — same interfaces, read-only view * **[Tools](/docs/actions/tools)**: System mutations can be exposed as agent tools * **[Checks](/docs/checks/audit)**: Audit trails record system mutations for verification --- # Tools This interface is experimental — no production implementation exists yet. The API surface may change. ## Overview `Tools` provides a unified surface for agent tool discovery and execution. Agents call tools without needing to know whether they originate from MCP servers, built-in capabilities, or custom providers — the implementation aggregates all sources transparently. Provider analogues include Anthropic MCP Tools, OpenAI Function Calling, Vercel AI SDK Tools, and LangChain Tools. ## TypeScript API ```ts import type { Tool, ToolResult, Tools } from "@osprotocol/schema/actions/tools" ``` ### Tool Represents a single callable tool with a name, description, parameter schema, and execute function. ```ts interface Tool { name: string description: string parameters?: object execute(params: TParams): Promise metadata?: Record } ``` | Field | Type | Description | | ------------- | --------------------------------------- | --------------------------------------------------- | | `name` | `string` | Unique tool identifier | | `description` | `string` | Human-readable description of what the tool does | | `parameters` | `object` | Optional JSON Schema describing accepted parameters | | `execute` | `(params: TParams) => Promise` | Function that performs the tool action | | `metadata` | `Record` | Optional extra data attached to the tool | ### ToolResult Returned by `Tools.execute`. Wraps the result with success/error state. 
```ts interface ToolResult { toolName: string result: T success: boolean error?: string metadata?: Record } ``` | Field | Type | Description | | ---------- | ------------------------- | ----------------------------------------- | | `toolName` | `string` | Name of the tool that was executed | | `result` | `T` | The value returned by the tool | | `success` | `boolean` | Whether execution completed without error | | `error` | `string` | Error message if `success` is `false` | | `metadata` | `Record` | Optional extra data from the execution | ### Tools The primary interface agents use to discover and invoke tools. ```ts interface Tools { get(name: string): Promise list(): Promise execute(name: string, params?: unknown): Promise> } ``` | Method | Returns | Description | | ------------------------ | ------------------------ | ------------------------------------------------------ | | `get(name)` | `Promise` | Retrieve a tool by name, or `null` if not found | | `list()` | `Promise` | Return all available tools from all registered sources | | `execute(name, params?)` | `Promise>` | Invoke a tool by name with optional parameters | ## Usage Examples ### Execute a tool ```ts const result = await tools.execute("read_file", { path: "/data/config.json" }) if (result.success) { console.log(result.result) } else { console.error(result.error) } ``` ### List available tools and find one ```ts const allTools = await tools.list() const searchTool = allTools.find((t) => t.name === "web_search") if (searchTool) { console.log(searchTool.description) console.log(searchTool.parameters) } ``` ### Handle tool errors ```ts const result = await tools.execute("send_email", { to: "user@example.com", subject: "Hello", body: "Message body", }) if (!result.success) { // Log the error and fall back gracefully console.error(`Tool "${result.toolName}" failed: ${result.error}`) } ``` ## Integration * [MCP Servers](/docs/actions/mcp-servers) — tool sources exposed over the Model Context Protocol * [MCP 
Client](/docs/system/mcp-client): connects to MCP servers and surfaces their tools
* [SystemActions](/docs/actions/system): built-in system-level actions available as tools

---

# Audit

## Overview

Agents generate formal audit reports as markdown files with YAML frontmatter. The frontmatter conforms to the `AuditEntry` schema, enabling machine-parseable compliance records. The schema is aligned with **ISO 27001** audit reporting and **ISACA/ITAF** expression of opinion standards.

**Implementation patterns:** gray-matter, Contentlayer, Fumadocs.

**Consumers:** Drata, Scytale (compliance automation), LangSmith, Langfuse (agent observability).

## Audit Flow

## Schema

```ts
import type {
  AuditEntry,
  AuditOpinion,
  AuditFindings,
  AuditQuery,
  Audit,
} from '@osprotocol/schema/checks/audit'
```

### AuditOpinion

ISACA expression of opinion.

```ts
type AuditOpinion =
  | 'unqualified' // No significant issues, full compliance
  | 'qualified'   // Minor issues that don't affect overall compliance
  | 'adverse'     // Significant issues, non-compliant
  | 'disclaimer'  // Unable to form opinion (insufficient evidence)
```

### AuditFindings

Finding severity counts aligned with ISO 27001 non-conformity classification.

```ts
interface AuditFindings {
  critical: number // Immediate action required
  major: number    // Should be addressed soon
  minor: number    // Low risk, normal course
}
```

### AuditEntry

Schema for the YAML frontmatter in audit report files.

```ts
interface AuditEntry {
  id: string
  createdAt: number
  agentId?: string
  executionId?: string

  // ISO 27001 / ISACA fields
  objectives: string      // What the audit aims to determine
  scope: string[]         // Files, systems, or processes audited
  opinion: AuditOpinion   // Expression of opinion
  findings: AuditFindings // Counts by severity

  // Detailed results (optional)
  ruleResults?: RuleResult[]
  judgeResult?: JudgeResult
  metadata?: Record<string, unknown>
}
```

### AuditQuery

Filter criteria for querying audit entries.
```ts
interface AuditQuery {
  opinion?: AuditOpinion | AuditOpinion[]
  minCritical?: number
  minMajor?: number
  agentId?: string
  executionId?: string
  since?: number // Unix ms
  until?: number // Unix ms
}
```

### Audit

Operations for parsing, writing, and querying audit entries.

```ts
interface Audit {
  parse(content: string): AuditEntry
  // Assumption: the omitted keys are the generated fields `id` and `createdAt`
  write(entry: Omit<AuditEntry, 'id' | 'createdAt'>, body: string): string
  query(query: AuditQuery): Promise<AuditEntry[]>
}
```

* `parse`: Extract an `AuditEntry` from file content with YAML frontmatter
* `write`: Generate file content from an entry and a markdown body
* `query`: Find entries matching the filter criteria

## Agentic Usage

### Prompt

```
Audit the production configuration files
```

### Output

File: `audits/2026-02-21-config-review.md`

```yaml
---
type: audit
date: 2026-02-21
agent: reviewer
objectives: "Verify configuration files follow security best practices"
scope:
  - config/production.yaml
  - config/staging.yaml
status: complete
opinion: qualified
findings:
  critical: 0
  major: 1
  minor: 2
---

## Criteria

- Security best practices
- Secret management
- Environment isolation

## Findings

### Major

**M1: Hardcoded API endpoint**
Production config contains hardcoded URL instead of environment variable...

### Minor

**m1: Missing timeout configuration**
...

## Recommendations

**M1**: Use environment variable for API endpoint
...

## Conclusion

Configuration is mostly secure but contains one hardcoded value that should be externalized. Qualified opinion issued.
```

### Querying Audits

The frontmatter is machine-parseable:

```bash
# Find audits with critical findings
grep -l "critical: [1-9]" audits/*.md

# Find adverse opinions
grep -l "opinion: adverse" audits/*.md
```

Or programmatically via the `Audit` interface:

```ts
const critical = await audit.query({ minCritical: 1 })
const adverse = await audit.query({ opinion: 'adverse' })
```

## Standards Mapping

| AuditEntry Field | ISO 27001 | ISACA/ITAF |
| --- | --- | --- |
| `objectives` | Scope and Objectives | Objectives of the Audit |
| `scope` | Scope and Objectives | Scope of Engagement |
| `opinion` | Audit Conclusion | Expression of Opinion |
| `findings` | Non-Conformities | Findings with Severity |
| `ruleResults` | Evidence | Supporting Data |
| `judgeResult` | Evaluation | Quality Assessment |

## Integration

* [Rules](/docs/checks/rules): `RuleResult[]` can be included in audit entries as evidence.
* [Judge](/docs/checks/judge): `JudgeResult` can be included for quality assessment.
* [Screenshot](/docs/checks/screenshot): Visual comparison results can support findings.

---

# Judge

This interface is **experimental**. No real implementation exists yet. The API shape may change before stabilization.

## Overview

`Judge` is the checks-phase interface for LLM-as-judge evaluation. It uses a model to score agent output against quality criteria, returning a numeric score (0–1), a pass/fail result, and natural-language reasoning. Results can optionally include a per-criterion breakdown via `ruleResults`, feed into audit records, and trigger approval flows when a score falls below the threshold.

Provider analogues: OpenAI Evals, Braintrust Scorers, LangSmith Evaluators, Arize Phoenix.
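The relationship between the score, the threshold, and the pass/fail result can be sketched in a few lines. This is only an illustration under stated assumptions: the two interfaces are simplified local copies of the `JudgeConfig` and `JudgeResult` shapes documented on this page, while `toJudgeResult` and the `0.7` fallback threshold are hypothetical (the protocol leaves the default threshold to the provider).

```ts
// Simplified local copies of the shapes documented on this page.
interface JudgeConfig {
  model?: string
  criteria: string
  threshold?: number
}

interface JudgeResult {
  score: number
  passed: boolean
  reasoning: string
}

// Hypothetical fallback: the protocol says the default is provider-defined.
const ASSUMED_DEFAULT_THRESHOLD = 0.7

// Hypothetical helper: turn a raw model score into a JudgeResult.
function toJudgeResult(rawScore: number, reasoning: string, config: JudgeConfig): JudgeResult {
  // Clamp the score into the documented 0–1 range.
  const score = Math.min(1, Math.max(0, rawScore))
  const threshold = config.threshold ?? ASSUMED_DEFAULT_THRESHOLD
  // `passed` is true when the score meets or exceeds the threshold.
  return { score, passed: score >= threshold, reasoning }
}

const reasoning = 'Accurate but slightly verbose.'
const strict = toJudgeResult(0.85, reasoning, { criteria: 'Be concise.', threshold: 0.9 })
const lenient = toJudgeResult(0.85, reasoning, { criteria: 'Be concise.', threshold: 0.8 })

console.log(strict.passed)  // false
console.log(lenient.passed) // true
```

The same score yields different verdicts depending on the configured threshold, which is why a failed result is a natural trigger for a human approval step rather than a hard rejection.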
## Evaluation Flow

## TypeScript API

```ts
import type { JudgeConfig, JudgeResult, Judge } from '@osprotocol/schema/checks/judge'
```

### JudgeConfig

```ts
interface JudgeConfig {
  model?: string
  criteria: string
  threshold?: number
  metadata?: Record<string, unknown>
}
```

| Field | Type | Description |
| --- | --- | --- |
| `model` | `string` | Model identifier to use as judge. Falls back to provider default when omitted. |
| `criteria` | `string` | Natural-language description of what constitutes a passing result. Required. |
| `threshold` | `number` | Minimum score (0–1) for `passed: true`. Defaults to provider-defined value when omitted. |
| `metadata` | `Record<string, unknown>` | Arbitrary metadata attached to this evaluation run. |

### JudgeResult

```ts
interface JudgeResult {
  score: number
  passed: boolean
  reasoning: string
  ruleResults?: RuleResult[]
  metadata?: Record<string, unknown>
}
```

| Field | Type | Description |
| --- | --- | --- |
| `score` | `number` | Numeric quality score in the range 0–1. |
| `passed` | `boolean` | `true` when `score >= threshold`. |
| `reasoning` | `string` | Model-generated explanation for the score. |
| `ruleResults` | `RuleResult[]` | Optional per-criterion breakdown from linked rules. |
| `metadata` | `Record<string, unknown>` | Arbitrary metadata returned by the judge. |

### Judge

```ts
interface Judge {
  evaluate(content: unknown, config: JudgeConfig): Promise<JudgeResult>
}
```

`evaluate` accepts any `content` value (string, object, or structured output) and a `JudgeConfig`, and returns a `Promise<JudgeResult>`.

## Usage Examples

### Basic evaluation

```ts
const result = await judge.evaluate(agentOutput, {
  criteria: 'The response must be factually accurate, concise, and free of harmful content.',
  threshold: 0.8,
})

console.log(result.passed)    // true | false
console.log(result.score)     // e.g. 0.92
console.log(result.reasoning) // "The response was accurate and well-scoped..."
```

### Breakdown by criteria using ruleResults

```ts
const result = await judge.evaluate(agentOutput, {
  model: 'claude-opus-4-6',
  criteria: 'Evaluate accuracy, tone, and completeness separately.',
  threshold: 0.75,
})

if (result.ruleResults) {
  for (const rule of result.ruleResults) {
    // RuleResult exposes ruleName, passed, severity, and message
    console.log(rule.ruleName, rule.passed, rule.message)
  }
}
```

### Conditional approval trigger

```ts
const result = await judge.evaluate(agentOutput, {
  criteria: 'Output must not contain PII and must follow the brand voice guide.',
  threshold: 0.9,
})

if (!result.passed) {
  // Route to human approval before publishing
  await approvalGate.request({
    reason: result.reasoning,
    score: result.score,
  })
}
```

## Rules vs Judge

| | Rules | Judge |
| --- | --- | --- |
| Evaluation method | Deterministic / programmatic | LLM-based qualitative evaluation |
| Output | Pass/fail per rule | Score (0–1) + reasoning |
| Best for | Schema validation, format checks, required fields | Tone, accuracy, helpfulness, nuanced quality |
| Latency | Low | Higher (model call required) |
| Cost | None | Model inference cost |
| Auditability | Exact rule match | Natural-language reasoning |

Use `Rules` for hard constraints and `Judge` for qualitative grading where human-like judgment is required.

## Integration

* [Rules](/docs/checks/rules): deterministic checks whose results can be surfaced as `ruleResults` inside a `JudgeResult`
* [Audit](/docs/checks/audit): `JudgeResult` records are written to the audit log for traceability
* [Approval](/docs/runs/approval): when `passed` is `false`, evaluation results can trigger a human approval gate before execution continues

---

# Rules

Rules is an experimental interface. No implementation exists yet.
The API described here reflects the current design and is subject to change as the protocol evolves.

## Overview

Rules define declarative verification criteria that agent output must satisfy before being accepted. They are composable: rules can be evaluated individually or as a complete set against any content. Provider analogues include ESLint Rules, GitHub Checks, Vercel Deployment Checks, and OpenAI Guardrails.

## Severity Levels

| Severity | Meaning |
| --- | --- |
| `error` | The rule failure blocks acceptance. Output must not proceed. |
| `warning` | The rule failure is notable but does not block acceptance. |
| `info` | The rule result is informational only. No action is required. |

## TypeScript API

```ts
import type { RuleSeverity, RuleResult, Rule, Rules } from '@osprotocol/schema/checks/rules'
```

### RuleSeverity

```ts
type RuleSeverity = 'error' | 'warning' | 'info'
```

Indicates how a rule failure should be treated. An `error` blocks acceptance, a `warning` is surfaced without blocking, and `info` is purely observational.

### RuleResult

```ts
interface RuleResult {
  ruleName: string
  passed: boolean
  severity: RuleSeverity
  message: string
  metadata?: Record<string, unknown>
}
```

The result returned after evaluating a single rule against content. `passed` indicates whether the rule was satisfied. `message` provides a human-readable explanation. `metadata` carries any structured diagnostic data the rule chooses to emit.

### Rule

```ts
interface Rule {
  name: string
  description: string
  severity: RuleSeverity
  evaluate(content: unknown): Promise<RuleResult>
  metadata?: Record<string, unknown>
}
```

A single verifiable criterion. `evaluate` receives the content to check and returns a `RuleResult`. The `severity` on the `Rule` defines the default severity that should appear in results when the rule fails.

### Rules

```ts
interface Rules {
  get(name: string): Promise<Rule | null>
  list(): Promise<Rule[]>
  evaluate(content: unknown): Promise<RuleResult[]>
}
```

A collection of rules. `list` enumerates all registered rules. `get` retrieves a specific rule by name. `evaluate` runs all rules against the provided content and returns a `RuleResult` for each one.

## Usage Examples

### Evaluate all rules against content

```ts
const results = await rules.evaluate(agentOutput)

for (const result of results) {
  if (!result.passed && result.severity === 'error') {
    throw new Error(`Rule failed: ${result.ruleName} — ${result.message}`)
  }
}
```

### Get a specific rule by name

```ts
const rule = await rules.get('no-pii-in-output')

if (rule) {
  const result = await rule.evaluate(agentOutput)
  console.log(result.passed, result.message)
}
```

### Define a custom rule

```ts
const noPiiRule: Rule = {
  name: 'no-pii-in-output',
  description: 'Ensures agent output does not contain personally identifiable information',
  severity: 'error',
  async evaluate(content: unknown): Promise<RuleResult> {
    const text = typeof content === 'string' ? content : JSON.stringify(content)
    const hasPii = /\b\d{3}-\d{2}-\d{4}\b/.test(text) // SSN pattern example
    return {
      ruleName: 'no-pii-in-output',
      passed: !hasPii,
      severity: 'error',
      message: hasPii ? 'Output contains potential PII' : 'No PII detected',
    }
  },
}
```

## Integration

Rule results produced by `Rules.evaluate` feed into other parts of the checks and runs pipeline:

* [Judge](/docs/checks/judge): uses rule results alongside other signals to produce a quality verdict
* [Audit](/docs/checks/audit): records rule results for traceability and post-hoc review
* [Approval](/docs/runs/approval): an `error`-severity failure can gate a run and trigger a human approval step

---

# Screenshot

This interface is experimental. No implementation exists yet. The API shape may change before stabilization.

## Overview

Screenshot provides visual capture and baseline comparison for visual regression detection within the checks phase. Providers include Playwright, Puppeteer, Browserbase, and ScreenshotOne.
Comparison results feed into the audit trail alongside rule and judge results, giving a complete picture of agent output quality.

## Capture and compare flow

## TypeScript API

```ts
import type {
  ImageFormat,
  ScreenshotOptions,
  ScreenshotEntry,
  ComparisonResult,
  Screenshot,
} from '@osprotocol/schema/checks/screenshot'
```

### ImageFormat

```ts
type ImageFormat = 'png' | 'jpeg' | 'webp'
```

PNG is lossless and best suited for pixel diffing. JPEG and WebP produce smaller payloads when exact pixel fidelity is not required.

### ScreenshotOptions

```ts
interface ScreenshotOptions {
  url?: string
  fullPage?: boolean
  clip?: {
    x: number
    y: number
    width: number
    height: number
  }
  selector?: string
  format?: ImageFormat
  quality?: number
  scale?: number
  omitBackground?: boolean
  metadata?: Record<string, unknown>
}
```

| Field | Description |
| --- | --- |
| `url` | Page URL to navigate to before capturing |
| `fullPage` | Capture the full scrollable page instead of the viewport |
| `clip` | Restrict capture to a bounding box in pixels |
| `selector` | CSS selector; captures only the matching element |
| `format` | Output image format (`png`, `jpeg`, `webp`) |
| `quality` | Compression quality for JPEG and WebP (0–100) |
| `scale` | Device pixel ratio multiplier |
| `omitBackground` | Make the background transparent (PNG only) |
| `metadata` | Arbitrary key-value pairs attached to the entry |

### ScreenshotEntry

```ts
interface ScreenshotEntry {
  id: string
  data: string
  format: ImageFormat
  width: number
  height: number
  createdAt: number
  metadata?: Record<string, unknown>
}
```

`data` is a base64-encoded image string. `createdAt` is a Unix timestamp in milliseconds.
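As a rough illustration of how a consumer might read these two fields, the sketch below decodes `data` into raw bytes and converts `createdAt` into an ISO timestamp. The helper names `decodeEntry` and `looksLikePng` are hypothetical, the interface is a trimmed local copy, and the sample `data` value holds only the 8-byte PNG signature rather than a real image. It assumes a runtime with a global `atob` (modern browsers, recent Node.js).

```ts
// Trimmed local copy of the documented shape (metadata omitted).
interface ScreenshotEntry {
  id: string
  data: string
  format: 'png' | 'jpeg' | 'webp'
  width: number
  height: number
  createdAt: number
}

// Hypothetical helper: decode the base64 payload and format the timestamp.
function decodeEntry(entry: ScreenshotEntry): { bytes: Uint8Array; capturedAt: string } {
  const binary = atob(entry.data) // base64 -> binary string
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0))
  // createdAt is Unix time in milliseconds, which is what Date expects.
  return { bytes, capturedAt: new Date(entry.createdAt).toISOString() }
}

// Every PNG file starts with these eight bytes.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]

// Hypothetical helper: sanity-check that decoded bytes begin with the PNG signature.
function looksLikePng(bytes: Uint8Array): boolean {
  return PNG_SIGNATURE.every((byte, i) => bytes[i] === byte)
}

// Sample entry: `data` is just the PNG signature, not a complete image.
const sampleEntry: ScreenshotEntry = {
  id: 'shot-1',
  data: 'iVBORw0KGgo=',
  format: 'png',
  width: 1,
  height: 1,
  createdAt: 0,
}

const decoded = decodeEntry(sampleEntry)
console.log(decoded.bytes.length)        // 8
console.log(looksLikePng(decoded.bytes)) // true
console.log(decoded.capturedAt)          // 1970-01-01T00:00:00.000Z
```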
### ComparisonResult

```ts
interface ComparisonResult {
  passed: boolean
  message: string
  diffPixels: number
  diffRatio: number
  diffImage?: string
  metadata?: Record<string, unknown>
}
```

`passed` and `message` follow the same convention as `RuleResult`, so comparison results compose naturally into the checks audit trail. `diffImage` is an optional base64-encoded visualization of the pixel diff.

### Screenshot

```ts
interface Screenshot {
  capture(options?: ScreenshotOptions): Promise<ScreenshotEntry>
  compare(
    actual: ScreenshotEntry,
    baseline: ScreenshotEntry,
    threshold?: number,
  ): Promise<ComparisonResult>
}
```

`capture` maps to provider-native methods: `page.screenshot` in Playwright and Puppeteer, `Page.captureScreenshot` via CDP in Browserbase, and `GET /take?url=...` in ScreenshotOne.

`compare` uses pixel diffing. Only Playwright has this built in, via `toHaveScreenshot` (Pixelmatch); for all other providers the adapter handles comparison externally. `threshold` is a ratio between 0 and 1 representing the maximum acceptable pixel difference before `passed` becomes `false`.
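To show how `diffPixels`, `diffRatio`, and `threshold` relate, here is a minimal sketch of an adapter-side comparison over raw RGBA buffers. It is a naive exact-match diff, not Pixelmatch and not any provider's actual algorithm; the function name `comparePixels`, the default threshold of `0`, and the handling of mismatched sizes are all assumptions made for illustration.

```ts
// Trimmed local copy of the documented result shape.
interface ComparisonResult {
  passed: boolean
  message: string
  diffPixels: number
  diffRatio: number
}

// Hypothetical helper: count pixels whose RGBA values differ at all.
function comparePixels(actual: Uint8Array, baseline: Uint8Array, threshold = 0): ComparisonResult {
  if (actual.length !== baseline.length) {
    // Assumption: differently sized images are treated as a full mismatch.
    const diffPixels = Math.max(actual.length, baseline.length) / 4
    return { passed: false, message: 'Image sizes differ', diffPixels, diffRatio: 1 }
  }

  const totalPixels = actual.length / 4 // 4 bytes per RGBA pixel
  let diffPixels = 0
  for (let offset = 0; offset < actual.length; offset += 4) {
    const changed =
      actual[offset] !== baseline[offset] ||
      actual[offset + 1] !== baseline[offset + 1] ||
      actual[offset + 2] !== baseline[offset + 2] ||
      actual[offset + 3] !== baseline[offset + 3]
    if (changed) diffPixels++
  }

  const diffRatio = totalPixels === 0 ? 0 : diffPixels / totalPixels
  // The threshold is the maximum acceptable ratio of differing pixels.
  const passed = diffRatio <= threshold
  const message = passed
    ? `Within threshold (${diffRatio} <= ${threshold})`
    : `Exceeds threshold (${diffRatio} > ${threshold})`
  return { passed, message, diffPixels, diffRatio }
}

// Four-pixel images: the last pixel changes from white to red.
const white = [255, 255, 255, 255]
const red = [255, 0, 0, 255]
const baselinePixels = Uint8Array.from([...white, ...white, ...white, ...white])
const actualPixels = Uint8Array.from([...white, ...white, ...white, ...red])

const strictResult = comparePixels(actualPixels, baselinePixels, 0.01)
const looseResult = comparePixels(actualPixels, baselinePixels, 0.5)

console.log(strictResult.diffPixels, strictResult.diffRatio, strictResult.passed) // 1 0.25 false
console.log(looseResult.passed) // true
```

A production comparison would usually tolerate small per-channel differences (anti-aliasing, compression noise) instead of requiring exact equality, which is one reason lossless PNG is the safer format for diffing.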
## Usage examples

### Capture a full-page screenshot

```ts
const entry = await screenshot.capture({
  url: 'https://example.com',
  fullPage: true,
  format: 'png',
})
```

### Visual regression test against a baseline

```ts
// `baseline` is a previously captured ScreenshotEntry
const actual = await screenshot.capture({ url: 'https://example.com' })
const result = await screenshot.compare(actual, baseline, 0.01)

if (!result.passed) {
  console.log(result.message)
  console.log(`Diff: ${result.diffPixels} pixels (${result.diffRatio * 100}%)`)
}
```

### Capture a specific element

```ts
const entry = await screenshot.capture({
  url: 'https://example.com/dashboard',
  selector: '#revenue-chart',
  format: 'png',
  omitBackground: true,
})
```

## Integration

* [Audit](/docs/checks/audit): screenshot entries and comparison results are attached to the audit trail
* [Sandbox](/docs/system/sandbox): browser-based captures run inside isolated sandbox environments
* [Rules](/docs/checks/rules): rule results and screenshot comparison results compose into a unified checks report

---

# Agent Loop

## Overview

The Agent Loop is the fundamental execution pattern that all agent implementations MUST support. It defines the iterative cycle through which agents gather context, take actions, verify their work, and iterate until completion.

## The Four Steps

### 1. Gather Context

Agents collect information needed to complete their task through:

* **Agentic search**: File systems, grep, tail, structured queries
* **Semantic search**: Vector embeddings for concept-based queries
* **Subagents**: Isolated context windows for parallel information gathering
* **Context compaction**: Summarization for long-running agents

Learn more: [Context Management](/docs/context)

### 2. Take Action

Agents execute operations using:

* **Tools**: Primary building blocks with clear interfaces
* **Bash/Scripts**: Command execution and automation
* **Code Generation**: Dynamic code creation and execution
* **MCP Integration**: Standardized protocol for external services

Learn more: [Actions](/docs/actions/tools)

### 3. Verify Work

Agents validate outputs through:

* **Rules-based validation**: Defined criteria and constraints
* **Visual feedback**: Screenshots and renders for UI tasks
* **LLM-as-judge**: Model-based evaluation

Learn more: [Checks](/docs/checks/rules)

### 4. Iterate

The loop repeats until:

* Task completion criteria are met
* Iteration limits are reached
* Termination conditions are triggered

## Cognitive Micro-Pattern

The loop maps to an internal cognitive cycle:

1. **Think/Reason** → Plan next action (Gather Context)
2. **Act** → Execute tools (Take Action)
3. **Observe** → Process results (Verify Work)
4. **Reflect** → Evaluate progress (Verify Work)
5. **Decide** → Continue or stop (Iterate)

Reference: [Anthropic: Building Agents with Claude Agent SDK](https://www.anthropic.com/engineering/building-agents-with-the-claude-agent-sdk)

## Next Steps

* Understand how the loop fits into the **[Agent Lifecycle](/docs/concepts/lifecycle)**
* Explore **[Workflow Patterns](/docs/concepts/workflows-taxonomy)** that orchestrate the loop
* Read the full specification in [AGENTS.md Section 2](https://github.com/synerops/osprotocol/blob/main/AGENTS.md#2-core-execution-pattern-agent-loop)

---

# Agentic OS

## Overview

An **Agentic OS** is a design paradigm where the **Large Language Model (LLM)** functions conceptually as the **Kernel** of the system. Unlike traditional operating systems designed for human interaction, an Agentic OS is a **backend infrastructure layer** that manages the lifecycle and resources of autonomous software agents.

This concept is foundational to understanding the Agentic OS Protocol (OSP).
OSP defines the standardized interfaces and behaviors that implementations of an Agentic OS must follow: the contract that enables different agent systems to interoperate.

## Resource Abstraction

Just as traditional operating systems abstract hardware resources (CPU, memory, disk, devices), an Agentic OS abstracts the cognitive resources of AI systems and manages them the way a traditional kernel manages physical hardware:

* **CPU Cycles → Inference / Tokens**: Managing the compute required for reasoning and generation
* **RAM (Memory) → Context Window**: Managing the finite amount of information active in the model's immediate attention
* **Disk / Filesystem → Vector Store / RAG**: Managing long-term retrieval and persistent knowledge
* **Device Drivers → Tools / MCP**: Standardizing interfaces for external interaction (APIs, browsers, code execution)
* **Process Scheduler → Agent Orchestrator**: Determining which agent runs when, and for how long

Learn more: [System Intelligence](/docs/system/registry)

## The "User" of the OS

In this paradigm, **the "User" of the Operating System is the Agent itself**, not the human.

* The **Agent** requests resources from the OS ("I need to read this file", "I need to store this memory")
* The **OS** enforces permissions, manages limits, and provides the requested capabilities
* The **Human** acts as the external administrator or the user of the *application* built on top of the OS, but does not interact with the Agentic OS layer directly

This distinction is critical: an Agentic OS is the invisible infrastructure that enables complex, multi-agent systems to function reliably at scale.

## Scope and Purpose

The Agentic OS solves **Orchestration Complexity**, not User Experience. Its primary goals are:

1. **Context Hygiene:** Preventing context pollution and managing finite window sizes
2. **Process Isolation:** Ensuring agents operate within defined boundaries without interfering with each other
3. **Inter-Process Communication:** Enabling standardized communication between disparate agents

These goals align directly with the challenges outlined in our [Motivation](/docs/motivation): as systems grow, managing context, isolation, and communication becomes increasingly complex. The Agentic OS provides the infrastructure layer that addresses these challenges systematically.

Learn more: [Motivation](/docs/motivation) | [Architecture](/docs/architecture)

## Agentic OS vs OSP

It's important to understand the distinction:

* **Agentic OS** is the conceptual paradigm (the architectural metaphor)
* **OSP (Agentic OS Protocol)** is the specification (the standardized contract that implementations must follow)

Just as "operating system" describes a category of software (Linux, Windows, macOS), "Agentic OS" describes a category of systems that manage agent resources. OSP defines the protocol specification that different implementations can follow to achieve interoperability.

Think of it this way: Linux and Windows are both operating systems, but they follow different architectures. Likewise, multiple implementations can follow OSP and each be an "Agentic OS" with a different internal design, yet they all interoperate because they follow the same protocol contract.

## How OSP Implements the Agentic OS

OSP defines the standardized interfaces and behaviors that make an Agentic OS possible:

* **[System](/docs/system)**: Registry, Environment, Filesystem, Sandbox, Settings, Preferences, Installer, MCP Client (the infrastructure layer)
* **[Context](/docs/context)**: System Context, Embeddings, Key-Value (read-only facades for the gather phase)
* **[Actions](/docs/actions)**: System Actions, Tools, MCP Servers (write facades for the act phase)
* **[Checks](/docs/checks/rules)**: Rules, Judge, Audit, Screenshot (verification and quality assurance)

These components work together to provide the resource abstraction, process isolation, and inter-process communication that define an Agentic OS.
Learn more: [Architecture](/docs/architecture)

## Next Steps

* Understand the **[Agent Loop](/docs/concepts/agent-loop)**: the core execution pattern within agents
* Explore the **[Agent Lifecycle](/docs/concepts/lifecycle)**: how the OS manages agent resources
* Review **[Workflow Patterns](/docs/concepts/workflows-taxonomy)**: operational execution patterns

---

# Agent Lifecycle

## Overview

The Agent Lifecycle is a System/Control Workflow that defines how agents are managed within the system. Unlike the [Agent Loop](/docs/concepts/agent-loop) (which describes internal execution), the Lifecycle governs system-level responsibilities: registration, discovery, execution management, and evaluation.

## The Four Phases

### 1. Registration

Agents declare their capabilities and constraints to the system:

* Capability declaration
* Resource requirements specification
* Constraint definition
* Metadata registration

Learn more: [System Registry](/docs/system/registry)

### 2. Discovery

The system exposes agents for selection and routing:

* Capability-based discovery
* Dynamic service discovery
* Load balancing mechanisms
* Failover protocols

Learn more: [System Registry](/docs/system/registry)

### 3. Execution Management

The OS assigns tasks and monitors progress:

* Task assignment interfaces
* Real-time monitoring
* Error handling
* State management
* Policy enforcement

Learn more: [Runs](/docs/runs/run), [Actions](/docs/actions)

### 4. Evaluation

Outputs, logs, and performance are reviewed:

* Performance monitoring
* Quality assessment
* Compliance verification
* Adaptation mechanisms

Learn more: [Audit](/docs/checks/audit), [Judge](/docs/checks/judge)

## Lifecycle vs Loop vs Workflows

Understanding the distinction is crucial:

| Concept | Layer | Purpose | Scope |
| --- | --- | --- | --- |
| **Lifecycle** | System | Agent management | OS/Platform governance |
| **[Loop](/docs/concepts/agent-loop)** | Cognitive | Internal execution | Single agent reasoning |
| **[Workflows](/docs/concepts/workflows-taxonomy)** | Operational | Task orchestration | Multi-step processes |

* **Lifecycle** exists *outside* any specific workflow; it is the system contract
* **Loop** executes *inside* workflows; it is the cognitive engine
* **Workflows** orchestrate *during* Execution/Evaluation phases; they are the macro patterns

Reference: [Anthropic: Building Effective Agents](https://www.anthropic.com/engineering/building-effective-agents)

## Next Steps

* Explore [Workflow Taxonomy](/docs/concepts/workflows-taxonomy) to see operational patterns
* Read the [AGENTS.md](https://github.com/synerops/osprotocol/blob/main/AGENTS.md) knowledge base

---

# Workflows Taxonomy

## Overview

Workflows are operational execution patterns that define how tasks are executed during the Execution/Evaluation phases of the [Agent Lifecycle](/docs/concepts/lifecycle). They are macro-level orchestration patterns, distinct from the [Agent Loop](/docs/concepts/agent-loop) (micro-execution) and the Lifecycle (system layer).

## The Six Categories

### 1. System/Control Workflows

Govern agent management at the platform level. The primary workflow is the [Agent Lifecycle](/docs/concepts/lifecycle): Registration → Discovery → Execution → Evaluation.

### 2. Task Workflows

Operational patterns for executing work:

* **[Routing](/docs/workflows/routing)**: Classify inputs and direct to specialized tasks
* **Prompt Chaining**: Sequential steps with validation gates
* **[Orchestrator-Workers](/docs/workflows/orchestrator-worker)**: Central orchestrator delegates to workers
* **[Parallelization](/docs/workflows/parallelization)**: Simultaneous execution with aggregation
* **[Evaluator-Optimizer](/docs/workflows/evaluator-optimizer)**: Generate-evaluate-refine loops

Reference: [Anthropic: Building Effective Agents](https://www.anthropic.com/engineering/building-effective-agents)

### 3. Quality Workflows

Ensure outputs meet standards:

* **[Rules Validation](/docs/checks/rules)**: Defined criteria and constraints
* **[Visual Checks](/docs/checks/screenshot)**: Screenshots and renders
* **[LLM-as-Judge](/docs/checks/judge)**: Model-based evaluation

### 4. Recovery Workflows

Handle failures and errors:

* **[Retries](/docs/runs/retry)**: Automatic retry mechanisms
* **[Timeouts](/docs/runs/timeout)**: Long-running operation handling
* **[Cancellation](/docs/runs/cancel)**: Graceful termination of running operations

### 5. Human-in-the-Loop Workflows

Integrate human oversight:

* **[Approval Workflows](/docs/runs/approval)**: Human approval before proceeding
* **Manual Delegation**: Human task assignment

### 6. Multi-Agent Workflows

Coordinate multiple agents:

* **Agent Coordination**: Multiple agents working together
* **Distributed Execution**: Tasks distributed across agents

## Key Distinctions

* **Workflows** are macro-level orchestration patterns used *during* Execution/Evaluation
* **[Agent Loop](/docs/concepts/agent-loop)** is the micro-level cognitive cycle *inside* workflows
* **[Lifecycle](/docs/concepts/lifecycle)** is the system-level governance *around* workflows

## Next Steps

* Explore specific [Task Workflows](/docs/workflows/routing)
* Understand [Quality Assurance](/docs/checks/audit) mechanisms
* Learn about [Recovery Patterns](/docs/runs/retry)
* Read the full specification in [AGENTS.md Section 3](https://github.com/synerops/osprotocol/blob/main/AGENTS.md#3-workflow-patterns)

---

# Embeddings

This interface is experimental: no production implementation exists yet. The API surface may change.

## Overview

Embeddings is the agent-facing interface for semantic search over indexed knowledge. Agents use it to find relevant content by meaning rather than exact keywords. The vector database infrastructure underneath is a system concern: providers like Pinecone, Upstash Vector, Weaviate, or OpenAI Embeddings handle storage and retrieval without the agent needing to know which one is in use.

## TypeScript API

```ts
import type {
  Embeddings,
  EmbeddingEntry,
  EmbeddingsContext,
  EmbeddingsActions,
} from '@osprotocol/schema/context/embeddings'
```

### EmbeddingEntry

A single result returned from a search or get operation.

```ts
interface EmbeddingEntry<T = Record<string, unknown>> {
  id: string
  content: string
  /** Similarity score, 0–1. Present only in search results. */
  score?: number
  metadata?: T
}
```

### EmbeddingsContext

Read-only interface for the context phase of the agent loop. Use this to find relevant entries by meaning or retrieve a known entry by ID.
```ts
interface EmbeddingsContext {
  search<T = Record<string, unknown>>(
    query: string,
    topK: number,
    filter?: Partial<T>
  ): Promise<EmbeddingEntry<T>[]>
  get<T = Record<string, unknown>>(id: string): Promise<EmbeddingEntry<T> | null>
}
```

### EmbeddingsActions

Write interface for the actions phase of the agent loop. Use this to index new content or remove stale entries.

```ts
interface EmbeddingsActions {
  upsert<T = Record<string, unknown>>(
    id: string,
    content: string,
    metadata?: T
  ): Promise<EmbeddingEntry<T>>
  // Assumption: returns a boolean, by analogy with the key-value store's `remove`
  remove(id: string): Promise<boolean>
}
```

### Embeddings

Full interface combining read and write operations.

```ts
interface Embeddings {
  upsert<T = Record<string, unknown>>(
    id: string,
    content: string,
    metadata?: T
  ): Promise<EmbeddingEntry<T>>
  search<T = Record<string, unknown>>(
    query: string,
    topK: number,
    filter?: Partial<T>
  ): Promise<EmbeddingEntry<T>[]>
  get<T = Record<string, unknown>>(id: string): Promise<EmbeddingEntry<T> | null>
  // Assumption: returns a boolean, by analogy with the key-value store's `remove`
  remove(id: string): Promise<boolean>
}
```

## Usage Examples

### Semantic search with metadata filter

```ts
type DocMeta = { source: string; language: string }

const results = await embeddings.search<DocMeta>(
  'how to handle authentication errors',
  5,
  { language: 'en' }
)

for (const entry of results) {
  console.log(entry.score, entry.content)
  // 0.91 "When a 401 is returned, refresh the token and retry..."
}
```

### Upsert content into the index

```ts
await embeddings.upsert(
  'doc:auth-errors',
  'When a 401 is returned, refresh the token and retry the request.',
  { source: 'runbook', language: 'en' }
)
```

### RAG pattern: retrieve, then generate

```ts
const chunks = await embeddings.search(userQuestion, 3)
const context = chunks.map((c) => c.content).join('\n\n')
const answer = await llm.complete(`Answer using this context:\n\n${context}\n\nQuestion: ${userQuestion}`)
```

## Embeddings vs Key-Value

| Concern | Embeddings | Key-Value (`context/kv`) |
| --- | --- | --- |
| Lookup by | Meaning / similarity | Exact key |
| Returns | Ranked results with scores | Single entry or null |
| Best for | Knowledge retrieval, RAG | Session state, config, counters |
| Query type | Natural language query | Known key string |

## Integration

Embeddings integrates with:

* **[Key-Value Store](/docs/context/kv)**: Complementary persistence; embeddings for semantic search, kv for exact lookups
* **[System Context](/docs/context/system)**: EmbeddingsContext is part of the read-only system context facade
* **[Filesystem](/docs/system/fs)**: Source documents can be read from fs and indexed into embeddings

---

# Context

## Overview

The Context domain provides application-specific context and data management for agents. It enables agents to access, store, and retrieve information needed for intelligent decision-making and task execution.

Context is one of the three pillars of the agent loop: **Gather Context** → Take Actions → Verify Results.
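To make the position of context in that cycle concrete, here is a minimal sketch of a gather, act, verify loop with an iteration limit. It is an illustration only: `LoopSteps` and `runLoop` are hypothetical names that do not come from the protocol schema, and the steps are synchronous for brevity, whereas the protocol interfaces return promises.

```ts
// Hypothetical shape for the three pillars; not part of the protocol schema.
interface LoopSteps<C, R> {
  gather(): C                 // Gather Context
  act(context: C): R          // Take Actions
  verify(result: R): boolean  // Verify Results
}

interface LoopOutcome<R> {
  result: R | null
  iterations: number
  verified: boolean
}

// Repeat gather -> act -> verify until verification passes or the limit is hit.
function runLoop<C, R>(steps: LoopSteps<C, R>, maxIterations: number): LoopOutcome<R> {
  let result: R | null = null
  for (let i = 1; i <= maxIterations; i++) {
    const context = steps.gather()
    result = steps.act(context)
    if (steps.verify(result)) {
      return { result, iterations: i, verified: true }
    }
  }
  return { result, iterations: maxIterations, verified: false }
}

// Toy example: each pass gathers one more note; the draft passes once it cites three.
let notes: string[] = []
const outcome = runLoop<string[], string>(
  {
    gather: () => {
      notes = [...notes, `note ${notes.length + 1}`]
      return notes
    },
    act: (context) => context.join(', '),
    verify: (draft) => draft.split(', ').length >= 3,
  },
  5,
)

console.log(outcome.iterations) // 3
console.log(outcome.result)     // note 1, note 2, note 3
```

Each iteration re-gathers context before acting, which is why the quality of the context interfaces directly bounds the quality of the actions that follow.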
## Context APIs

| API                                    | Description                                            |
| -------------------------------------- | ------------------------------------------------------ |
| [System Context](/docs/context/system) | Read-only composition of all system Context interfaces |
| [Embeddings](/docs/context/embeddings) | Vector embeddings for semantic search                  |
| [Key-Value Store](/docs/context/kv)    | Key-value persistence for the agent loop               |

## Role in Agent Loop

Context provides the foundation for informed agent behavior: it is gathered first, so that the actions and verification that follow operate on current information.

## Usage

Context is accessed through the `context` protocol domain. `SystemContext` composes all system read interfaces into a single entry point. `Embeddings` and `Kv` are agent-facing read/write interfaces for semantic search and key-value persistence, respectively.

## Integration

Context integrates with:

* **System**: Accesses system-level information
* **Actions**: Provides context for action execution
* **Checks**: Context informs quality verification
* **Workflows**: Workflows access context during execution

---

# Key-Value Store

This interface is experimental: no production implementation exists yet. The API surface may change.

## Overview

The key-value store provides flat, direct-access persistence for structured data. Agents use it to store and retrieve data by known keys, such as session state, user preferences, configuration, and counters. Unlike `fs` (hierarchical, file-based) and `embeddings` (semantic search by meaning), `kv` is for exact key lookups.

## TypeScript API

```ts
import type { Kv, KvEntry, KvContext, KvActions } from '@osprotocol/schema/context/kv'
```

### KvEntry

A single key-value entry.

```ts
interface KvEntry<T = unknown> {
  /** Entry key */
  key: string
  /** Entry value */
  value: T
  /** Extensible metadata for provider-specific data */
  metadata?: Record<string, unknown>
}
```

### Kv

Full key-value store interface with read and write operations.
```ts
interface Kv {
  get<T>(key: string): Promise<KvEntry<T> | null>
  set<T>(key: string, value: T): Promise<KvEntry<T>>
  remove(key: string): Promise<boolean>
  list(prefix?: string): Promise<string[]>
}
```

### KvContext

Read-only view for the context phase of the agent loop.

```ts
interface KvContext {
  get<T>(key: string): Promise<KvEntry<T> | null>
  list(prefix?: string): Promise<string[]>
}
```

### KvActions

Write operations for the actions phase of the agent loop.

```ts
interface KvActions {
  set<T>(key: string, value: T): Promise<KvEntry<T>>
  remove(key: string): Promise<boolean>
}
```

## Usage Examples

### Store and retrieve session state

```ts
await kv.set('session:abc123', {
  userId: 'user-42',
  startedAt: Date.now(),
  step: 'code-review',
})

const session = await kv.get<{ userId: string; step: string }>('session:abc123')
// session?.value.step → 'code-review'
```

### Enumerate keys by prefix

```ts
const keys = await kv.list('session:')
// ['session:abc123', 'session:def456', ...]
```

### Remove expired data

```ts
const removed = await kv.remove('session:abc123')
// true if the entry existed
```

## Agent Persistence Model

The protocol provides three distinct persistence patterns:

| Pattern          | Interface            | Access                       | Use Case                                 |
| ---------------- | -------------------- | ---------------------------- | ---------------------------------------- |
| **Hierarchical** | `system/fs`          | Paths and directories        | Files, configs, artifacts                |
| **Key-value**    | `context/kv`         | Direct key lookup            | Session state, counters, structured data |
| **Semantic**     | `context/embeddings` | Similarity search by meaning | Knowledge retrieval, RAG                 |

## Integration

Key-Value Store integrates with:

* **[Embeddings](/docs/context/embeddings)**: Complementary persistence: kv for exact lookups, embeddings for semantic search
* **[Filesystem](/docs/system/fs)**: Complementary persistence: kv for flat data, fs for hierarchical files
* **[System Context](/docs/context/system)**: KvContext is part of the read-only system context facade

---

# System Context

This interface is experimental: no
production implementation exists yet. The API surface may change. ## Overview `SystemContext` is the read-only facade that composes all system Context interfaces into a single entry point. It is used during the context (gather) phase of the agent loop, giving agents a unified view of system state without the ability to mutate it. Write operations are handled by the counterpart: [SystemActions](/docs/actions/system). ## Architecture `SystemContext` is pure composition — it adds no logic of its own, only grouping each system API's read-only Context interface under one namespace. `SystemActions` mirrors this structure for write operations. ## TypeScript API ```ts import type { SystemContext } from '@osprotocol/schema/context/system' ``` ### SystemContext Composes all system read-only interfaces. ```ts interface SystemContext { env: EnvContext settings: SettingsContext preferences: PreferencesContext registry: RegistryContext fs: FsContext sandbox: SandboxContext installer: InstallerContext mcp: McpContext } ``` ## Composed Interfaces | Property | Type | Provides | Docs | | ------------- | -------------------- | -------------------------------------------------------------- | --------------------------------------- | | `env` | `EnvContext` | Read environment variables (get, list) | [Env](/docs/system/env) | | `settings` | `SettingsContext` | Read system-wide settings (get, list) | [Settings](/docs/system/settings) | | `preferences` | `PreferencesContext` | Read per-agent or per-user preferences by scope (get, list) | [Preferences](/docs/system/preferences) | | `registry` | `RegistryContext` | Discover and look up registered resources (get, list) | [Registry](/docs/system/registry) | | `fs` | `FsContext` | Read host filesystem entries (read, list, exists) | [Fs](/docs/system/fs) | | `sandbox` | `SandboxContext` | Inspect existing sandbox environments (get, list) | [Sandbox](/docs/system/sandbox) | | `installer` | `InstallerContext` | Inspect installed packages and their 
status (get, list) | [Installer](/docs/system/installer) | | `mcp` | `McpContext` | Inspect MCP server connections and available tools (get, list) | [MCP Client](/docs/system/mcp-client) | ## Usage Examples ### Check environment and preferences together An agent reads an environment variable and resolves a user preference in the same context phase before deciding how to act. ```ts async function resolveOutputConfig(system: SystemContext) { const dbUrl = await system.env.get('DATABASE_URL') const formatPref = await system.preferences.get('output.format', 'user') return { databaseUrl: dbUrl?.value ?? null, outputFormat: formatPref?.value ?? 'json', } } ``` ### Inspect installed packages and MCP connections An agent audits what capabilities are currently available before deciding whether to proceed with a task. ```ts async function auditCapabilities(system: SystemContext) { const [packages, mcpServers] = await Promise.all([ system.installer.list(), system.mcp.list(), ]) const hasSchemaPackage = packages.some( (p) => p.name === '@osprotocol/schema' && p.status === 'installed' ) const connectedServers = mcpServers.filter((s) => s.status === 'connected') return { hasSchemaPackage, connectedServers } } ``` ## Integration * **[SystemActions](/docs/actions/system)**: The write counterpart — same system interfaces, mutation operations * **[EnvContext](/docs/system/env)**: Environment variable read interface * **[SettingsContext](/docs/system/settings)**: System-wide settings read interface * **[PreferencesContext](/docs/system/preferences)**: Scoped preferences read interface * **[RegistryContext](/docs/system/registry)**: Resource registry read interface * **[FsContext](/docs/system/fs)**: Host filesystem read interface * **[SandboxContext](/docs/system/sandbox)**: Sandbox inspection interface * **[InstallerContext](/docs/system/installer)**: Installed packages read interface * **[McpContext](/docs/system/mcp-client)**: MCP server connections read interface --- # Approval 
## Overview The approval system enables human oversight of workflow execution. It provides approval requests, multi-approver workflows, and configurable timeout behavior for critical checkpoints. ## TypeScript API ```ts import type { Approval, ApprovalConfig, ApprovalRequest } from '@osprotocol/schema/runs/approval' ``` ### Approval Result of an approval request. ```ts interface Approval { /** Whether the action was approved */ approved: boolean /** Optional reason for the decision */ reason?: string /** Identifier of who approved (user ID, email, etc.) */ approvedBy?: string /** When the approval decision was made */ timestamp: Date } ``` ### ApprovalConfig Configuration for approval requests. ```ts interface ApprovalConfig { /** Default timeout for approval requests (milliseconds) */ timeoutMs?: number /** Whether to auto-approve after timeout */ autoApproveOnTimeout?: boolean /** List of users who can approve */ approvers?: string[] /** Minimum approvals required (for multi-approval scenarios) */ requiredApprovals?: number } ``` ### ApprovalRequest A pending request for human approval. ```ts interface ApprovalRequest { /** Unique identifier for this request */ id: string /** Message describing what needs approval */ message: string /** Execution ID this request belongs to */ executionId: string /** When the request was created */ createdAt: Date /** When the request expires */ expiresAt?: Date /** Current approval responses */ responses: Approval[] } ``` ## Usage Example ```ts // Request approval during execution const approval = await execution.waitForApproval( 'Deploy to production environment?' 
) if (approval.approved) { console.log(`Approved by ${approval.approvedBy}: ${approval.reason}`) // Continue with deployment } else { console.log(`Denied: ${approval.reason}`) // Handle rejection } ``` ## Multi-Approval Workflows For critical operations requiring multiple approvers: ```ts const config: ApprovalConfig = { timeoutMs: 3600000, // 1 hour approvers: ['alice@company.com', 'bob@company.com'], requiredApprovals: 2, autoApproveOnTimeout: false } ``` ## Integration Approval integrates with: * **Execution**: Pauses execution until approval * **Timeout**: Approval requests can expire * **Cancel**: Rejected approvals can trigger cancellation --- # Cancellation ## Overview The cancel system provides mechanisms to cancel running workflows gracefully. It supports cancellation hooks, cleanup operations, and configurable grace periods for in-progress work. ## TypeScript API ```ts import type { Cancel } from '@osprotocol/schema/runs/cancel' ``` ### Cancel Cancel configuration for workflow runs. 
```ts
interface Cancel {
  /**
   * Called before cancellation proceeds
   * Return false to prevent cancellation
   */
  beforeCancel?: () => boolean | Promise<boolean>

  /**
   * Called after cancellation completes
   */
  afterCancel?: () => void

  /**
   * Optional reason for cancellation
   */
  reason?: string

  /**
   * Whether to wait for cleanup before resolving
   */
  graceful?: boolean

  /**
   * Timeout for graceful cancellation in milliseconds
   */
  gracefulTimeoutMs?: number
}
```

## Usage Examples

### Simple Cancellation

```ts
// Cancel an execution
await execution.cancel('User requested stop')
```

### Cancellation with Cleanup

```ts
const cancel: Cancel = {
  graceful: true,
  gracefulTimeoutMs: 5000,
  beforeCancel: async () => {
    // Check if safe to cancel
    const canCancel = await checkSafeToCancel()
    return canCancel
  },
  afterCancel: () => {
    // Cleanup resources
    cleanupTempFiles()
    closeConnections()
  }
}
```

### Preventing Cancellation

```ts
const cancel: Cancel = {
  beforeCancel: () => {
    if (criticalOperationInProgress) {
      console.log('Cannot cancel during critical operation')
      return false // Prevent cancellation
    }
    return true
  }
}
```

### Graceful Shutdown

```ts
const cancel: Cancel = {
  graceful: true,
  gracefulTimeoutMs: 10000, // 10 second grace period
  reason: 'System shutdown',
  afterCancel: () => {
    notifyDependentSystems()
  }
}
// Waits up to 10 seconds for graceful cleanup
// Forces cancellation if cleanup exceeds timeout
```

## Cancellation Flow

1. `beforeCancel` is called, if defined. Returning `false` prevents the cancellation.
2. If `graceful` is set, the run waits for cleanup for up to `gracefulTimeoutMs`; cancellation is forced if cleanup exceeds the timeout.
3. `afterCancel` is called once cancellation completes.

## Integration

Cancel integrates with:

* **RunOptions**: Configure cancel behavior for runs
* **Timeout**: Timeouts can trigger cancellation
* **Execution**: Cancel is called via `execution.cancel()`
* **Approval**: Rejected approvals may trigger cancellation

---

# Retry

## Overview

The retry system provides configurable retry behavior for failed operations. It supports multiple backoff strategies, conditional retries, and callbacks for monitoring retry attempts.
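How a runner might turn such a configuration into concrete wait times can be sketched as follows. `RetryDelayConfig` and `retryDelayMs` are hypothetical helpers introduced for illustration. The field names mirror the `Retry` interface documented in this section, the formulas follow its Delay Calculation table, and attempts are assumed to be numbered from 1.

```ts
type BackoffStrategy = 'none' | 'linear' | 'exponential'

// Local, hypothetical subset of the retry configuration.
interface RetryDelayConfig {
  delayMs: number
  backoff?: BackoffStrategy
  maxDelayMs?: number
}

// Delay to wait before retry number `attempt` (1-based), capped at maxDelayMs.
function retryDelayMs(config: RetryDelayConfig, attempt: number): number {
  const { delayMs, backoff = 'none', maxDelayMs } = config
  let delay: number
  switch (backoff) {
    case 'linear':
      delay = delayMs * attempt
      break
    case 'exponential':
      delay = delayMs * 2 ** (attempt - 1)
      break
    default:
      delay = delayMs
  }
  return maxDelayMs === undefined ? delay : Math.min(delay, maxDelayMs)
}

const config: RetryDelayConfig = { delayMs: 100, backoff: 'exponential', maxDelayMs: 1000 }
const delays = [1, 2, 3, 4, 5].map((attempt) => retryDelayMs(config, attempt))
console.log(delays) // [ 100, 200, 400, 800, 1000 ]
```

The fifth delay would be 1600 ms without the cap; `maxDelayMs` clamps it to 1000 ms.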
## Backoff Strategies

| Strategy      | Description                                             |
| ------------- | ------------------------------------------------------- |
| `none`        | Constant delay between retries (delay)                  |
| `linear`      | Delay increases linearly (delay \* attempt)             |
| `exponential` | Delay doubles each attempt (delay \* 2^(attempt - 1))   |

## TypeScript API

```ts
import type { Retry, Backoff } from '@osprotocol/schema/runs/retry'
```

### Backoff

Available backoff strategies for retry delays.

```ts
type Backoff = 'none' | 'linear' | 'exponential'
```

### Retry

Retry configuration for workflow runs.

```ts
interface Retry {
  /** Maximum number of retry attempts */
  attempts: number
  /** Initial delay between retries in milliseconds */
  delayMs: number
  /** Backoff strategy (default: 'none') */
  backoff?: Backoff
  /** Maximum delay when using backoff (milliseconds) */
  maxDelayMs?: number
  /** Callback on each retry attempt */
  onRetry?: (error: Error, attempt: number) => void
  /** Optional predicate to determine if error is retryable */
  shouldRetry?: (error: Error) => boolean
}
```

## Usage Examples

### Simple Retry

```ts
const retry: Retry = {
  attempts: 3,
  delayMs: 1000
}
// Retries up to 3 times with 1 second between each attempt
```

### Exponential Backoff

```ts
const retry: Retry = {
  attempts: 5,
  delayMs: 100,
  backoff: 'exponential',
  maxDelayMs: 10000,
  onRetry: (error, attempt) => {
    console.log(`Retry ${attempt}: ${error.message}`)
  }
}
// Delays: 100ms, 200ms, 400ms, 800ms, 1600ms (capped at 10000ms)
```

### Conditional Retry

```ts
const retry: Retry = {
  attempts: 3,
  delayMs: 500,
  shouldRetry: (error) => {
    // Only retry network errors
    return error.name === 'NetworkError'
  }
}
```

## Delay Calculation

| Strategy      | Attempt 1 | Attempt 2    | Attempt 3    | Attempt 4    |
| ------------- | --------- | ------------ | ------------ | ------------ |
| `none`        | delayMs   | delayMs      | delayMs      | delayMs      |
| `linear`      | delayMs   | 2 \* delayMs | 3 \* delayMs | 4 \* delayMs |
| `exponential` | delayMs   | 2 \* delayMs | 4 \* delayMs | 8 \* delayMs |

## Integration

Retry integrates with:

* **RunOptions**: Configure retry behavior for runs
* **Timeout**: Retries respect timeout constraints
* **Cancel**: Pending retries can be cancelled

---

# Run Lifecycle

## Overview

An execution represents an active workflow run with full lifecycle control. The run system provides status tracking, progress monitoring, and execution control through pause, resume, and cancel operations.

Creating a run is starting it: `workflow.run()` returns an active `Execution` handle directly. This aligns with the [Agent Communication Protocol (ACP)](https://agentcommunicationprotocol.dev/core-concepts/agent-run-lifecycle).

## Lifecycle

A run is always in exactly one of the states defined by `RunStatus`.

## TypeScript API

```ts
import type { RunOptions, RunStatus, Execution, ExecutionProgress } from '@osprotocol/schema/runs'
```

### RunStatus

The possible states of a workflow execution.

```ts
type RunStatus =
  | 'pending'     // Execution is queued/initializing
  | 'in-progress' // Execution is actively running
  | 'awaiting'    // Execution is waiting for human input/approval
  | 'completed'   // Execution finished successfully
  | 'failed'      // Execution encountered an error
  | 'cancelled'   // Execution was cancelled
```

### RunOptions

Options for configuring a workflow run.

```ts
interface RunOptions<Output> {
  /** Timeout configuration */
  timeout?: Timeout
  /** Retry configuration */
  retry?: Retry
  /** Cancel configuration */
  cancel?: Cancel
  /** Callback when run completes successfully */
  onComplete?: (result: Output) => void
  /** Callback when run fails */
  onFailed?: (error: Error) => void
  /** Callback on each status change */
  onStatusChange?: (status: RunStatus) => void
}
```

### Execution

Active execution handle for controlling a running workflow.
```ts interface Execution { /** Unique identifier for this execution */ id: string /** Current status */ status: RunStatus /** Progress information */ progress: ExecutionProgress /** Execution logs */ logs: string[] /** Pause the execution (if supported) */ pause(): Promise /** Resume a paused execution */ resume(): Promise /** Cancel the execution */ cancel(reason?: string): Promise /** Request human approval before continuing */ waitForApproval(message?: string): Promise /** Request input from a human */ waitForInput(prompt: string): Promise /** The final result of the execution (resolves when complete) */ result: Promise } ``` ### ExecutionProgress Progress tracking for an execution. ```ts interface ExecutionProgress { /** Current step number */ current: number /** Total number of steps (0 if unknown) */ total: number /** Description of current step */ message?: string } ``` ## Usage Example ```ts // Run a workflow and get an active execution const execution = await workflow.run(prompt, { timeout: { ms: 30000 }, retry: { attempts: 3, delayMs: 1000 }, onStatusChange: (status) => console.log('Status:', status) }) // Monitor progress console.log(`Progress: ${execution.progress.current}/${execution.progress.total}`) // Wait for result const result = await execution.result ``` ## Integration Execution integrates with: * **Timeout**: Enforce time limits on execution * **Retry**: Automatically retry on failure * **Cancel**: Graceful cancellation support * **Approval**: Human-in-the-loop checkpoints --- # Timeout ## Overview The timeout system manages time limits for workflow execution. It ensures operations complete within specified durations and provides configurable actions for timeout scenarios. 
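One way to picture the timeout actions documented in this section is as a mapping from the configured action to the status a run ends up in. The sketch below is an illustration under stated assumptions: `statusAfterTimeout` is a hypothetical helper, the local `TimeoutConfig` and `Status` types only mirror the documented `Timeout` and `RunStatus` shapes, and treating already-finished runs as unaffected is an assumption rather than documented behavior.

```ts
type TimeoutAction = 'fail' | 'cancel' | 'continue'
type Status = 'pending' | 'in-progress' | 'awaiting' | 'completed' | 'failed' | 'cancelled'

// Local, hypothetical mirror of the documented Timeout configuration.
interface TimeoutConfig {
  ms: number
  onTimeout?: TimeoutAction
  onTimeoutCallback?: () => void
}

// Returns the status a run would have once its timeout elapses.
// Simplification: 'cancel' is treated as if the cancellation flow succeeds.
function statusAfterTimeout(current: Status, timeout: TimeoutConfig): Status {
  // Assumption: a run that has already finished is not affected.
  if (current === 'completed' || current === 'failed' || current === 'cancelled') {
    return current
  }
  timeout.onTimeoutCallback?.()
  switch (timeout.onTimeout ?? 'fail') {
    case 'cancel':
      return 'cancelled'
    case 'continue':
      return current
    default:
      return 'failed'
  }
}

console.log(statusAfterTimeout('in-progress', { ms: 30000 })) // failed
console.log(statusAfterTimeout('in-progress', { ms: 60000, onTimeout: 'cancel' })) // cancelled
```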
## Timeout Actions | Action | Description | | ---------- | ------------------------------------------- | | `fail` | Mark the run as failed on timeout (default) | | `cancel` | Trigger graceful cancellation on timeout | | `continue` | Log timeout but allow execution to continue | ## TypeScript API ```ts import type { Timeout, TimeoutAction } from '@osprotocol/schema/runs/timeout' ``` ### TimeoutAction Action to take when a timeout occurs. ```ts type TimeoutAction = 'fail' | 'cancel' | 'continue' ``` ### Timeout Timeout configuration for workflow runs. ```ts interface Timeout { /** Timeout duration in milliseconds */ ms: number /** Action to take when timeout occurs (default: 'fail') */ onTimeout?: TimeoutAction /** Callback function when timeout occurs */ onTimeoutCallback?: () => void } ``` ## Usage Examples ### Simple Timeout ```ts const timeout: Timeout = { ms: 30000 // 30 seconds } // Fails the run if not complete within 30 seconds ``` ### Graceful Cancellation ```ts const timeout: Timeout = { ms: 60000, // 1 minute onTimeout: 'cancel', onTimeoutCallback: () => { console.log('Timeout reached, initiating graceful shutdown') } } // Triggers cancellation flow instead of immediate failure ``` ### Warning Without Failure ```ts const timeout: Timeout = { ms: 120000, // 2 minutes onTimeout: 'continue', onTimeoutCallback: () => { sendAlert('Operation exceeding expected duration') } } // Logs warning but allows execution to continue ``` ## Integration Timeout integrates with: * **RunOptions**: Configure timeout for workflow runs * **Cancel**: Timeout can trigger cancellation flow * **Retry**: Retries reset the timeout clock * **Execution**: Status changes to failed/cancelled on timeout --- # Environment This interface is experimental — no production implementation exists yet. The API surface may change. ## Overview The `Env` interface provides kernel-level access to environment variables in the execution environment. 
It defines a convergent surface for creating, reading, updating, and removing configuration variables across deployment platforms such as Vercel, Cloudflare Workers, and Railway. Variables can be scoped to specific deployment targets and marked as sensitive to control how they are handled by the platform. ## TypeScript API ```ts import type { Env, EnvEntry, EnvContext, EnvActions, } from '@osprotocol/schema/system/env' ``` ### EnvEntry A single environment variable, including its value, optional target scopes, and sensitivity flag. ```ts interface EnvEntry { key: string value: T target?: string[] sensitive?: boolean metadata?: Record } ``` ### EnvContext Read-only access for the context phase of the agent loop. ```ts interface EnvContext { get(key: string): Promise list(): Promise } ``` ### EnvActions Write operations for the actions phase of the agent loop. ```ts interface EnvActions { set(entry: Omit): Promise remove(key: string): Promise } ``` ### Env Full environment management interface for provider implementations. 
```ts interface Env { get(key: string): Promise set(entry: Omit): Promise remove(key: string): Promise list(): Promise } ``` ## Usage Examples ### Read a variable ```ts const entry = await env.get('DATABASE_URL') if (entry) { console.log(entry.key) // 'DATABASE_URL' console.log(entry.sensitive) // true } ``` ### Set a variable scoped to production ```ts await env.set({ key: 'API_SECRET', value: 'sk-live-...', target: ['production'], sensitive: true, }) ``` ### Rotate a key ```ts await env.set({ key: 'OPENAI_API_KEY', value: 'sk-new-...', sensitive: true, }) // Or remove it entirely await env.remove('OPENAI_API_KEY') ``` ## Integration The `Env` interface integrates with: * **[Sandbox](/docs/system/sandbox)**: Sandboxes inherit or override environment variables at creation time * **[Settings](/docs/system/settings)**: Settings may reference environment variable keys for dynamic configuration * **[Preferences](/docs/system/preferences)**: Preference resolution can fall through to system-level values sourced from environment --- # Filesystem This interface is experimental — no production implementation exists yet. The API surface may change. ## Overview The filesystem interface provides platform-level access to the host execution environment's file system. Agents use it to read configurations, persist artifacts, and navigate directories across storage backends including local disk, S3, Vercel Blob, and Cloudflare R2. Sandbox environments manage their own internal filesystem independently — `system/fs` operates on the host, not inside isolated execution containers. ## TypeScript API ```ts import type { Fs, FsEntry, FsContext, FsActions } from '@osprotocol/schema/system/fs' ``` ### FsEntry Represents a file or directory in the filesystem. ```ts interface FsEntry { name: string path: string type: 'file' | 'directory' size?: number updatedAt?: number metadata?: Record } ``` ### FsContext Read-only view for the context (gather) phase of the agent loop. 
```ts
interface FsContext {
  read(path: string): Promise<string | null>
  list(path: string): Promise<FsEntry[]>
  exists(path: string): Promise<boolean>
}
```

### FsActions

Write operations for the actions (act) phase of the agent loop.

```ts
interface FsActions {
  write(path: string, content: string): Promise<FsEntry>
  remove(path: string): Promise<boolean>
}
```

### Fs

Combined interface providing full read and write access. Used directly by providers that do not split context and actions phases.

```ts
interface Fs {
  read(path: string): Promise<string | null>
  write(path: string, content: string): Promise<FsEntry>
  remove(path: string): Promise<boolean>
  list(path: string): Promise<FsEntry[]>
  exists(path: string): Promise<boolean>
}
```

## Usage Examples

### Read a configuration file

```ts
const content = await fs.read('/project/agent.yaml')
if (content === null) {
  throw new Error('agent.yaml not found')
}
const config = parseYaml(content)
```

### Persist an artifact and confirm the write

```ts
const report = generateReport(results)
const entry = await fs.write('/output/report.md', report)
console.log(`Wrote ${entry.size} bytes to ${entry.path}`)
```

### List a directory and filter by type

```ts
const entries = await fs.list('/output')
const files = entries.filter((e) => e.type === 'file')
console.log(`Found ${files.length} files in /output`)
```

## Integration

Filesystem integrates with:

* **[Sandbox](/docs/system/sandbox)**: Sandbox has its own isolated filesystem; use `system/fs` for host-level persistence before or after sandbox execution
* **[KV](/docs/context/kv)**: KV stores short-lived key-value data; filesystem is for structured, durable file content
* **[Environment](/docs/system/env)**: Environment variables may point to filesystem roots or provider credentials

---

# System

## Overview

The System Intelligence layer provides infrastructure services that all agents depend on.
Unlike traditional protocols that focus solely on agent-to-agent communication, OSP includes system-level intelligence through the Operating System abstraction. This layer manages the lifecycle, coordination, and resource management that multi-agent systems require, providing the foundation for reliable, scalable agent orchestration.

## The Operating System Abstraction

Just as traditional operating systems abstract hardware resources (CPU, memory, disk), the System Intelligence layer abstracts cognitive resources (inference, context, knowledge, tools). It provides standardized APIs for:

* **Agent Registry**: Discovery and capability management
* **System Configuration**: Environment and settings management
* **File Operations**: Standardized file system interfaces
* **Protocol Integration**: MCP client and external tool access
* **Installation & Setup**: System deployment and configuration

Learn more: [Agentic OS Concept](/docs/concepts/agentic-os) | [Architecture](/docs/architecture)

## System Components

* **[Registry](/docs/system/registry)**: Agent registration, discovery, and capability matching for dynamic service allocation.
* **[Environment](/docs/system/env)**: Configuration and environment variable management for deployment-specific settings.
* **[Filesystem](/docs/system/fs)**: Standardized file system operations and management for agent file access.
* **[Settings](/docs/system/settings)**: System and agent settings management for centralized configuration.
* **[MCP Client](/docs/system/mcp-client)**: Model Context Protocol client for standardized external tool and resource access.
* **[Installer](/docs/system/installer)**: System installation, setup, and dependency management for deployment.
* **[Preferences](/docs/system/preferences)**: Agent preferences and user settings for customization and personalization.
* **[Sandbox](/docs/system/sandbox)**: Isolated execution environments for running agent workloads with their own filesystem and commands.

## How System Intelligence Works

The System Intelligence layer operates at the infrastructure level, providing services that agents use rather than defining agent behavior directly:

1. **Registration**: Agents register capabilities with the Registry
2. **Discovery**: The OS matches agents to tasks based on capabilities
3. **Configuration**: Environment and settings provide runtime context
4. **Execution**: Filesystem and MCP enable tool access and file operations
5. **Management**: Installer and preferences configure the system

These components work together to provide the resource abstraction, process isolation, and inter-process communication that define an Agentic OS.

## Integration with Other Layers

System Intelligence integrates with:

* **[Context](/docs/context)**: Filesystem and Environment provide context storage
* **[Actions](/docs/actions)**: MCP Client enables standardized tool access
* **[Checks](/docs/checks)**: Settings and Preferences configure check behavior
* **[Workflows](/docs/workflows)**: Registry enables agent discovery for orchestration patterns

## Next Steps

* Explore **[Registry](/docs/system/registry)** to understand agent discovery and matching
* Learn about **[Environment](/docs/system/env)** for configuration management
* Review **[MCP Client](/docs/system/mcp-client)** for external protocol integration
* Understand how System Intelligence fits into the **[Architecture](/docs/architecture)**

---

# Installer

This interface is experimental: no production implementation exists yet. The API surface may change.

## Overview

The Installer is the kernel's package manager, responsible for adding capabilities to the system at runtime. It installs, updates, and removes skills, tools, and extensions without requiring a system restart. Provider backends include npm, pip, Claude Code Skills, and Homebrew.
## Install Status ## TypeScript API ```ts import type { InstallStatus, InstallEntry, InstallerContext, InstallerActions, Installer, } from '@osprotocol/schema/system/installer' ``` ### InstallStatus ```ts type InstallStatus = 'installed' | 'updating' | 'failed' ``` The lifecycle state of a managed package. | Value | Description | | ----------- | ------------------------------------------- | | `installed` | Package is present and ready to use | | `updating` | An update is in progress | | `failed` | The last install or update operation failed | ### InstallEntry ```ts interface InstallEntry { name: string version: string status: InstallStatus installedAt: number metadata?: Record } ``` A record describing a managed package. `installedAt` is a Unix timestamp (milliseconds). `metadata` carries provider-specific data (e.g., checksums, source URLs). ### InstallerContext ```ts interface InstallerContext { get(name: string): Promise list(): Promise } ``` Read-only gather phase. Use `InstallerContext` to inspect what is currently installed without triggering side effects. | Method | Description | | ----------- | ----------------------------------------------------------------- | | `get(name)` | Returns the `InstallEntry` for `name`, or `null` if not installed | | `list()` | Returns all managed `InstallEntry` records | ### InstallerActions ```ts interface InstallerActions { install(name: string, version?: string): Promise uninstall(name: string): Promise update(name: string, version?: string): Promise } ``` Write act phase. Use `InstallerActions` to mutate the set of installed packages. | Method | Description | | ------------------------- | -------------------------------------------------------------- | | `install(name, version?)` | Installs the package. Resolves to the resulting `InstallEntry` | | `uninstall(name)` | Removes the package. 
Resolves to `true` on success | | `update(name, version?)` | Updates the package, optionally pinning a version | ### Installer ```ts interface Installer { install(name: string, version?: string): Promise uninstall(name: string): Promise get(name: string): Promise list(): Promise update(name: string, version?: string): Promise } ``` The combined interface. `Installer` merges `InstallerContext` and `InstallerActions` into a single object that a kernel implementation exposes to agents. ## Usage Examples ### Install a package ```ts const entry = await installer.install('@osprotocol/skill-web-search') console.log(entry.status) // 'installed' console.log(entry.version) // e.g. '1.2.0' ``` ### List installed packages ```ts const packages = await installer.list() for (const pkg of packages) { console.log(`${pkg.name}@${pkg.version} — ${pkg.status}`) } ``` ### Update a package ```ts const updated = await installer.update('@osprotocol/skill-web-search', '2.0.0') if (updated.status === 'installed') { console.log('Update succeeded') } else { console.error('Update failed') } ``` ## Integration * [Registry](/docs/system/registry) — discover available skills and tools before installing * [MCP Client](/docs/system/mcp-client) — manages installed MCP servers exposed to agents --- # MCP Client This interface is experimental — no production implementation exists yet. The API surface may change. ## Overview `McpClient` is the kernel's connection manager for external MCP (Model Context Protocol) servers. It handles the infrastructure side of MCP integration: establishing connections, tracking server status, and exposing available tools to the rest of the system. This is distinct from the agent-facing side. The agent-facing interface for discovering and invoking MCP tools lives in [MCP Servers](/docs/actions/mcp-servers). Think of `McpClient` as the device driver manager — it keeps connections alive so that agents can use them through higher-level APIs. 

Provider examples: Claude Code MCP, Cursor MCP, VS Code Copilot MCP.

## Connection Lifecycle

## TypeScript API

```ts
import type {
  McpServerStatus,
  McpServerEntry,
  McpContext,
  McpActions,
  McpClient,
} from '@osprotocol/schema/system/mcp-client'
```

### McpServerStatus

```ts
type McpServerStatus = 'connected' | 'disconnected' | 'error'
```

Represents the current state of an MCP server connection.

| Value          | Description                                  |
| -------------- | -------------------------------------------- |
| `connected`    | Server is reachable and active               |
| `disconnected` | Server is not connected                      |
| `error`        | Connection attempt failed or was interrupted |

### McpServerEntry

```ts
interface McpServerEntry {
  name: string
  uri: string
  status: McpServerStatus
  tools?: string[]
  metadata?: Record<string, unknown>
}
```

A record describing a registered MCP server. `tools` lists the tool names exposed by the server. `metadata` holds any additional provider-specific information.

### McpContext

```ts
interface McpContext {
  get(name: string): Promise<McpServerEntry | null>
  list(): Promise<McpServerEntry[]>
}
```

Read-only gather phase. Used to inspect the current state of registered servers without modifying connections.

### McpActions

```ts
interface McpActions {
  connect(name: string, uri: string): Promise<McpServerEntry>
  disconnect(name: string): Promise<boolean>
}
```

Write/act phase. Used to establish or tear down server connections.

### McpClient

```ts
interface McpClient {
  connect(name: string, uri: string): Promise<McpServerEntry>
  disconnect(name: string): Promise<boolean>
  get(name: string): Promise<McpServerEntry | null>
  list(): Promise<McpServerEntry[]>
}
```

The unified interface combining both context and actions. `McpClient` is the primary surface for code that needs both read and write access to MCP connections.

## Usage Examples

### Connect to an MCP server

```ts
const entry = await mcpClient.connect(
  'claude-code',
  'mcp://localhost:3100'
)

console.log(entry.status) // 'connected'
console.log(entry.tools) // ['read_file', 'write_file', ...]
```

### List all connected servers and their tools

```ts
const servers = await mcpClient.list()

for (const server of servers) {
  if (server.status === 'connected') {
    console.log(`${server.name}: ${server.tools?.join(', ')}`)
  }
}
```

### Disconnect a server

```ts
const ok = await mcpClient.disconnect('claude-code')

console.log(ok) // true
```

## System MCP Client vs Actions MCP Servers

|                | `system/mcp-client`                   | `actions/mcp-servers`             |
| -------------- | ------------------------------------- | --------------------------------- |
| Layer          | Infrastructure / kernel               | Agent-facing                      |
| Responsibility | Manage server connections             | Invoke tools on connected servers |
| Who uses it    | System internals, orchestrators       | Agents, skills                    |
| Phase          | Context (read) + Actions (write)      | Actions                           |
| Example        | `connect()`, `disconnect()`, `list()` | `callTool()`, `listTools()`       |

The system client keeps connections alive. The actions interface exposes those connections as callable tools for agents.

## Integration

* [MCP Servers (actions)](/docs/actions/mcp-servers) — agent-facing tool invocation over MCP connections managed here
* [Registry](/docs/system/registry) — discover available MCP servers before connecting
* [Installer](/docs/system/installer) — provision and install MCP server binaries or packages

---

# Preferences

This interface is experimental — no production implementation exists yet. The API surface may change.

## Overview

The `Preferences` interface provides kernel-level storage for per-agent and per-user configuration that customizes agent behavior without affecting the global system. Unlike system-wide settings, preferences are scoped — each entry belongs to an `agent`, `user`, or `system` scope — and resolved in cascade order: agent overrides user, user overrides system.

Familiar provider analogs include VS Code Settings, Claude Code Scoped Config, and GitHub User Preferences.

## Scope Resolution

When an agent requests a preference value, the kernel checks scopes from most specific to least specific. The first match wins.

## TypeScript API

```ts
import type {
  Preferences,
  PreferenceEntry,
  PreferenceScope,
  PreferencesContext,
  PreferencesActions,
} from '@osprotocol/schema/system/preferences'
```

### PreferenceScope

The three available scopes for a preference entry.

```ts
type PreferenceScope = 'agent' | 'user' | 'system'
```

### PreferenceEntry

A single preference value with its scope and optional metadata.

```ts
interface PreferenceEntry<T = unknown> {
  key: string
  value: T
  scope: PreferenceScope
  metadata?: Record<string, unknown>
}
```

### PreferencesContext

Read-only access for the context phase of the agent loop.

```ts
interface PreferencesContext {
  get<T = unknown>(key: string, scope: PreferenceScope): Promise<PreferenceEntry<T> | null>
  list(scope: PreferenceScope): Promise<PreferenceEntry[]>
}
```

### PreferencesActions

Write operations for the actions phase of the agent loop.

```ts
interface PreferencesActions {
  set<T>(key: string, value: T, scope: PreferenceScope): Promise<PreferenceEntry<T>>
  remove(key: string, scope: PreferenceScope): Promise
}
```

### Preferences

Full preferences management interface combining context and actions.

```ts
interface Preferences {
  get<T = unknown>(key: string, scope: PreferenceScope): Promise<PreferenceEntry<T> | null>
  set<T>(key: string, value: T, scope: PreferenceScope): Promise<PreferenceEntry<T>>
  remove(key: string, scope: PreferenceScope): Promise
  list(scope: PreferenceScope): Promise<PreferenceEntry[]>
}
```

## Usage Examples

### Read an agent-scoped preference

Agent-scoped preferences take priority over user or system values with the same key.

```ts
const entry = await preferences.get('output.format', 'agent')

if (entry) {
  console.log(entry.key) // 'output.format'
  console.log(entry.value) // 'json'
  console.log(entry.scope) // 'agent'
}
```

### Set a user-scoped preference

User-scoped preferences apply across agents that have not overridden the key at the agent scope.

```ts
await preferences.set('output.format', 'markdown', 'user')
```

### Implement the cascade manually

When you need to resolve a value across all scopes in priority order:

```ts
async function resolve<T>(
  preferences: Preferences,
  key: string,
): Promise<T | null> {
  const scopes: PreferenceScope[] = ['agent', 'user', 'system']

  for (const scope of scopes) {
    const entry = await preferences.get<T>(key, scope)
    if (entry !== null) return entry.value
  }

  return null
}

const format = await resolve(preferences, 'output.format')
```

## Integration

* **[Settings](/docs/system/settings)**: System-wide configuration that acts as the baseline before preference scopes are applied
* **[Environment](/docs/system/env)**: Platform-level variables for credentials and deployment targets; preferences handle behavioral customization above that layer
* **[Registry](/docs/system/registry)**: Agent registration metadata may include default preference values that seed the agent scope at registration time

---

# Registry

This interface is experimental — no production implementation exists yet. The API surface may change.

## Overview

Registry is the kernel's service directory: it registers and discovers any type of resource in the system. The interface is generic (`Registry<T>`) so it works uniformly for agents, skills, MCP servers, or any other resource type. The `RegistryContext`/`RegistryActions` split operates on **named registries** (e.g., `"agents"`, `"skills"`), allowing a single implementation to manage multiple resource namespaces.

This pattern is analogous to Google A2A Agent Cards, Microsoft AutoGen Registry, AGENTS.md/AAIF Agent Discovery, and the npm Registry.

## TypeScript API

```ts
import type {
  RegistryEntry,
  RegistryContext,
  RegistryActions,
  Registry,
} from '@osprotocol/schema/system/registry'
```

### RegistryEntry

A single record stored in the registry.

```ts
interface RegistryEntry<T = unknown> {
  name: string
  description: string
  resource: T
  metadata?: Record<string, unknown>
}
```

| Field         | Type                      | Description                                              |
| ------------- | ------------------------- | -------------------------------------------------------- |
| `name`        | `string`                  | Unique identifier within its registry                    |
| `description` | `string`                  | Human-readable summary                                   |
| `resource`    | `T`                       | The actual resource payload (agent, skill, server, etc.) |
| `metadata`    | `Record<string, unknown>` | Optional arbitrary metadata                              |

### RegistryContext

Read-only access to named registries. Used during the **gather phase** to inspect what is available.

```ts
interface RegistryContext {
  get<T = unknown>(registry: string, name: string): Promise<RegistryEntry<T> | null>
  list<T = unknown>(registry: string): Promise<RegistryEntry<T>[]>
}
```

| Method                | Description                                        |
| --------------------- | -------------------------------------------------- |
| `get(registry, name)` | Fetch a single entry by name from a named registry |
| `list(registry)`      | List all entries in a named registry               |

### RegistryActions

Write access to named registries. Used during the **act phase** to mutate what is registered.

```ts
interface RegistryActions {
  register<T>(registry: string, entry: RegistryEntry<T>): Promise
  unregister(registry: string, name: string): Promise<boolean>
}
```

| Method                       | Description                                   |
| ---------------------------- | --------------------------------------------- |
| `register(registry, entry)`  | Add or update an entry in a named registry    |
| `unregister(registry, name)` | Remove an entry; returns `true` if it existed |

### Registry

The full provider interface for a single typed registry. Combines all operations and adds `find` for criteria-based lookup — this method is only available at the provider level, not on `RegistryContext` or `RegistryActions`.

```ts
interface Registry<T = unknown> {
  register(entry: RegistryEntry<T>): Promise
  unregister(name: string): Promise<boolean>
  get(name: string): Promise<RegistryEntry<T> | null>
  list(): Promise<RegistryEntry<T>[]>
  find(criteria: Partial<T>): Promise<RegistryEntry<T>[]>
}
```

| Method             | Description                                               |
| ------------------ | --------------------------------------------------------- |
| `register(entry)`  | Add or update an entry                                    |
| `unregister(name)` | Remove an entry; returns `true` if it existed             |
| `get(name)`        | Fetch a single entry by name                              |
| `list()`           | List all entries                                          |
| `find(criteria)`   | Search entries by matching fields on the resource payload |

## Named Registries

`RegistryContext` and `RegistryActions` accept a `registry` string as the first parameter. This means one implementation can manage multiple independent namespaces:

```ts
// Same context, different namespaces
const agent = await ctx.get('agents', 'summarizer-v2')
const skill = await ctx.get('skills', 'web-search')
const server = await ctx.get('mcp-servers', 'filesystem')
```

The full `Registry` interface, by contrast, is typed per resource type and manages a single namespace — you would instantiate one `Registry` for agents and a separate `Registry` for skills.
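
The named-registry idea can be sketched as one `Map` per namespace. This is a minimal in-memory sketch under stated assumptions, not the protocol's implementation: the class name `InMemoryRegistries` is hypothetical, the `RegistryEntry` type is redeclared locally so the snippet stands alone, and the `Promise<void>` return of `register` is an assumption made for illustration.

```ts
// Local copy of the documented entry shape, so the sketch is self-contained.
interface RegistryEntry<T = unknown> {
  name: string
  description: string
  resource: T
  metadata?: Record<string, unknown>
}

// Hypothetical implementation: each named registry is its own Map keyed by entry name.
class InMemoryRegistries {
  private namespaces = new Map<string, Map<string, RegistryEntry>>()

  // Add or update an entry; the namespace is created on first use.
  // Returning void is an assumption of this sketch.
  async register<T>(registry: string, entry: RegistryEntry<T>): Promise<void> {
    let namespace = this.namespaces.get(registry)
    if (!namespace) {
      namespace = new Map()
      this.namespaces.set(registry, namespace)
    }
    namespace.set(entry.name, entry)
  }

  // Resolves to true only if the entry existed in that namespace.
  async unregister(registry: string, name: string): Promise<boolean> {
    return this.namespaces.get(registry)?.delete(name) ?? false
  }

  // Unknown namespaces and unknown names both resolve to null.
  async get<T = unknown>(registry: string, name: string): Promise<RegistryEntry<T> | null> {
    const entry = this.namespaces.get(registry)?.get(name)
    return (entry as RegistryEntry<T> | undefined) ?? null
  }

  // Lists the entries of one namespace; other namespaces are never touched.
  async list<T = unknown>(registry: string): Promise<RegistryEntry<T>[]> {
    return Array.from(this.namespaces.get(registry)?.values() ?? []) as RegistryEntry<T>[]
  }
}
```

Because the namespace is part of every lookup, the same `name` can exist in `'agents'` and `'skills'` without collision. A production provider would likely add persistence and validation on top of this shape.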

## Usage Examples

### Register an agent

```ts
await actions.register('agents', {
  name: 'summarizer-v2',
  description: 'Summarizes long documents into structured output.',
  resource: agentDefinition,
  metadata: { version: '2.0.0', owner: 'platform-team' },
})
```

### Discover resources by criteria

Using a typed `Registry`, find all agents that match a partial resource shape:

```ts
const agentRegistry: Registry = getAgentRegistry()

const matches = await agentRegistry.find({
  domain: 'summarization',
  supportsStreaming: true,
})
```

### Use named registries to cross-reference resources

```ts
async function resolveSkillsForAgent(
  ctx: RegistryContext,
  agentName: string,
): Promise<RegistryEntry[]> {
  const agentEntry = await ctx.get('agents', agentName)
  if (!agentEntry) return []

  const requiredSkills: string[] = agentEntry.metadata?.skills as string[] ?? []

  return Promise.all(
    requiredSkills.map((skillName) => ctx.get('skills', skillName)),
  ).then((results) => results.filter(Boolean) as RegistryEntry[])
}
```

## Integration

* [Installer](/docs/system/installer) — installs capabilities that are then registered, making them available for discovery
* [MCP Client](/docs/system/mcp-client) — discovers MCP servers through the registry before establishing connections
* [Env](/docs/system/env) — environment-aware discovery uses registry lookups scoped to the current runtime context

---

# Sandbox

This interface is experimental — no production implementation exists yet. The API surface may change.

## Overview

The sandbox interface manages isolated execution environments for agent workloads. Agents that write and run code need isolation — a contained filesystem, command execution, optional network access, and automatic cleanup. The protocol defines the convergent surface across providers like Vercel Sandbox, E2B, Cloudflare Workers, and Docker.

## TypeScript API

```ts
import type {
  Sandbox,
  SandboxEntry,
  SandboxConfig,
  SandboxStatus,
  SandboxContext,
  SandboxActions,
  CommandResult,
  SandboxFile,
} from '@osprotocol/schema/system/sandbox'
```

### SandboxStatus

Lifecycle states for a sandbox.

```ts
type SandboxStatus = 'pending' | 'running' | 'stopping' | 'stopped' | 'failed'
```

### SandboxEntry

Summary of a sandbox instance.

```ts
interface SandboxEntry {
  /** Unique sandbox identifier */
  id: string
  /** Current lifecycle status */
  status: SandboxStatus
  /** When the sandbox was created (Unix ms) */
  createdAt: number
  /** Remaining time before auto-stop (ms) */
  timeout?: number
  /** Extensible metadata for provider-specific data */
  metadata?: Record<string, unknown>
}
```

### SandboxConfig

Configuration for creating a new sandbox.

```ts
interface SandboxConfig {
  /** Runtime or template identifier (e.g., "node24", "python3.13") */
  runtime?: string
  /** Initial timeout in milliseconds before auto-stop */
  timeout?: number
  /** Environment variables to inject */
  env?: Record<string, string>
  /** Ports to expose for external access */
  ports?: number[]
  /** Extensible metadata for provider-specific data */
  metadata?: Record<string, unknown>
}
```

### CommandResult

Result of executing a command inside a sandbox.

```ts
interface CommandResult {
  /** Process exit code (0 = success) */
  exitCode: number
  /** Standard output */
  stdout: string
  /** Standard error */
  stderr: string
  /** Extensible metadata for provider-specific data */
  metadata?: Record<string, unknown>
}
```

### SandboxFile

A file within the sandbox filesystem.

```ts
interface SandboxFile {
  /** File path within the sandbox */
  path: string
  /** File contents */
  content: string
}
```

### Sandbox

Full sandbox management interface.

```ts
interface Sandbox {
  create(config?: SandboxConfig): Promise<SandboxEntry>
  get(id: string): Promise<SandboxEntry | null>
  list(): Promise<SandboxEntry[]>
  stop(id: string): Promise
  exec(id: string, command: string, args?: string[]): Promise<CommandResult>
  readFile(id: string, path: string): Promise
  writeFiles(id: string, files: SandboxFile[]): Promise
  getUrl(id: string, port: number): Promise<string>
  extendTimeout(id: string, duration: number): Promise
}
```

### SandboxContext

Read-only view for the context phase of the agent loop.

```ts
interface SandboxContext {
  get(id: string): Promise<SandboxEntry | null>
  list(): Promise<SandboxEntry[]>
}
```

### SandboxActions

Write operations for the actions phase of the agent loop.

```ts
interface SandboxActions {
  create(config?: SandboxConfig): Promise<SandboxEntry>
  stop(id: string): Promise
  exec(id: string, command: string, args?: string[]): Promise<CommandResult>
  writeFiles(id: string, files: SandboxFile[]): Promise
  readFile(id: string, path: string): Promise
  getUrl(id: string, port: number): Promise<string>
  extendTimeout(id: string, duration: number): Promise
}
```

## Usage Examples

### Create a sandbox and run code

```ts
const entry = await sandbox.create({
  runtime: 'node24',
  timeout: 120000,
  env: { NODE_ENV: 'production' },
})

await sandbox.writeFiles(entry.id, [
  { path: 'index.ts', content: 'console.log("hello from sandbox")' },
])

const result = await sandbox.exec(entry.id, 'npx', ['tsx', 'index.ts'])
// result.stdout → 'hello from sandbox'
// result.exitCode → 0
```

### Expose a web server

```ts
const entry = await sandbox.create({
  runtime: 'node24',
  ports: [3000],
})

await sandbox.writeFiles(entry.id, [
  { path: 'server.js', content: 'require("http").createServer((_, res) => res.end("ok")).listen(3000)' },
])

await sandbox.exec(entry.id, 'node', ['server.js'])

const url = await sandbox.getUrl(entry.id, 3000)
// url → 'https://sandbox-abc123.provider.dev'
```

### Extend timeout for long-running tasks

```ts
const entry = await sandbox.create({ timeout: 60000 })

// Task is taking longer than expected
await sandbox.extendTimeout(entry.id, 60000) // +60s

// Clean up when done
await sandbox.stop(entry.id)
```

## Sandbox vs Filesystem

|               | Sandbox (`system/sandbox`)                      | Filesystem (`system/fs`)        |
| ------------- | ----------------------------------------------- | ------------------------------- |
| **Scope**     | Isolated environment with own filesystem        | Host/platform filesystem        |
| **Execution** | Can run commands (`exec`)                       | No execution capability         |
| **Lifecycle** | Created, used, destroyed                        | Always available                |
| **Network**   | Optional port exposure                          | N/A                             |
| **Use case**  | Run untrusted code, test builds, serve previews | Read configs, persist artifacts |

## Integration

Sandbox integrates with:

* **[Filesystem](/docs/system/fs)**: Host fs for persistent artifacts, sandbox fs for ephemeral execution
* **[Environment](/docs/system/env)**: Sandbox inherits or overrides environment variables
* **[Timeout](/docs/runs/timeout)**: Sandbox timeout is independent of run timeout — both can apply
* **[Screenshot](/docs/checks/screenshot)**: Visual verification of sandbox-served web previews

---

# Settings

This interface is experimental — no production implementation exists yet. The API surface may change.

## Overview

Settings is the kernel interface for managing system-wide configuration that applies to all agents and workflows running on the platform. Unlike [Preferences](/docs/system/preferences), which are scoped to individual agents or users, Settings entries are global — changing a setting affects every agent and workflow in the system.

Analogous to Vercel Project Settings, Cloudflare Zone Settings, or AWS SSM Parameter Store.

## TypeScript API

```ts
import type {
  SettingsEntry,
  SettingsContext,
  SettingsActions,
  Settings,
} from '@osprotocol/schema/system/settings'
```

### SettingsEntry

A single configuration entry stored in the system.

```ts
interface SettingsEntry<T = unknown> {
  key: string
  value: T
  description?: string
  readOnly?: boolean
  metadata?: Record<string, unknown>
}
```

| Field         | Type                       | Description                                            |
| ------------- | -------------------------- | ------------------------------------------------------ |
| `key`         | `string`                   | Unique identifier for the setting                      |
| `value`       | `T`                        | The stored value (generic, defaults to `unknown`)      |
| `description` | `string?`                  | Human-readable explanation of the setting              |
| `readOnly`    | `boolean?`                 | When `true`, the entry cannot be modified at runtime   |
| `metadata`    | `Record<string, unknown>?` | Arbitrary additional data (e.g. source, last modified) |

### SettingsContext

Read-only interface used during the gather phase. Lets agents inspect system configuration without mutation rights.

```ts
interface SettingsContext {
  get<T = unknown>(key: string): Promise<SettingsEntry<T> | null>
  list(): Promise<SettingsEntry[]>
}
```

### SettingsActions

Write interface used during the act phase. Restricts callers to mutation operations only.

```ts
interface SettingsActions {
  set<T>(key: string, value: T): Promise<SettingsEntry<T>>
  remove(key: string): Promise
}
```

### Settings

Full interface combining read and write operations. Intended for privileged system-level callers.

```ts
interface Settings {
  get<T = unknown>(key: string): Promise<SettingsEntry<T> | null>
  set<T>(key: string, value: T): Promise<SettingsEntry<T>>
  remove(key: string): Promise
  list(): Promise<SettingsEntry[]>
}
```

## Usage Examples

### Reading a setting

```ts
import type { Settings } from '@osprotocol/schema/system/settings'

async function getMaxConcurrency(settings: Settings) {
  const entry = await settings.get<number>('workflow.max_concurrency')

  if (entry === null) {
    return 4 // default
  }

  return entry.value
}
```

### Handling read-only settings

Some entries are marked `readOnly` and must not be modified at runtime. Check the flag before attempting a write.

```ts
async function updateSetting(settings: Settings, key: string, value: unknown) {
  const existing = await settings.get(key)

  if (existing?.readOnly) {
    throw new Error(`Setting "${key}" is read-only and cannot be modified.`)
  }

  return settings.set(key, value)
}
```

### Typed settings with metadata

```ts
import type { Settings, SettingsEntry } from '@osprotocol/schema/system/settings'

interface RateLimitConfig {
  requestsPerMinute: number
  burstAllowance: number
}

async function configureRateLimit(settings: Settings): Promise<SettingsEntry<RateLimitConfig>> {
  return settings.set<RateLimitConfig>('agent.rate_limit', {
    requestsPerMinute: 60,
    burstAllowance: 10,
  })
}
```

## Settings vs Preferences

|                    | Settings                                       | Preferences                      |
| ------------------ | ---------------------------------------------- | -------------------------------- |
| **Scope**          | System-wide                                    | Per-agent or per-user            |
| **Applies to**     | All agents and workflows                       | A specific agent or user         |
| **Who manages it** | Platform operators                             | Individual agents or users       |
| **Typical values** | Concurrency limits, feature flags, rate limits | Language, persona, output format |
| **Change impact**  | Platform-wide                                  | Isolated to the scope            |

## Integration

* [Preferences](/docs/system/preferences) — per-agent or per-user configuration, scoped rather than global
* [Environment](/docs/system/env) — platform-level variables; settings govern higher-level behavioral configuration

---

# Evaluator-Optimizer

This workflow pattern is part of the OS Protocol specification. The interfaces below describe the expected contract; implementations must honor the scoring range (0.0–1.0) and respect `maxIterations` to avoid infinite refinement loops.

## Overview

The Evaluator-Optimizer workflow runs an iterative generate-evaluate-optimize loop, continuing to refine output until a quality threshold is met or the maximum number of iterations is exhausted. Each cycle produces structured feedback that the optimizer uses to improve the next generation attempt.
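
The loop described above can be sketched as a small driver function. This is a minimal sketch under stated assumptions rather than the specification's implementation: `refine` and `Steps` are hypothetical names, the `Evaluation` type is a simplified local stand-in for the protocol interface, and the default of three iterations is chosen only for illustration.

```ts
// Simplified local stand-in for the protocol's Evaluation type.
interface Evaluation {
  score: number // expected range: 0.0 to 1.0
  passed: boolean
  feedback: string
}

// Hypothetical bundle of the three loop operations.
interface Steps<Output> {
  generate(prompt: string): Promise<Output>
  evaluate(output: Output, prompt: string): Promise<Evaluation>
  optimize(output: Output, evaluation: Evaluation, prompt: string): Promise<Output>
}

// Generate once, then evaluate and optimize until the evaluation passes
// or the iteration budget is spent. Each evaluation counts as one iteration.
async function refine<Output>(
  steps: Steps<Output>,
  prompt: string,
  maxIterations = 3, // assumed default, not taken from the specification
): Promise<{ output: Output; evaluation: Evaluation; iterations: number }> {
  let output = await steps.generate(prompt)
  let evaluation = await steps.evaluate(output, prompt)
  let iterations = 1

  while (!evaluation.passed && iterations < maxIterations) {
    output = await steps.optimize(output, evaluation, prompt)
    evaluation = await steps.evaluate(output, prompt)
    iterations++
  }

  return { output, evaluation, iterations }
}

// Toy steps for experimentation: every optimize pass appends "!" and the
// score grows with the output length, passing at 0.8.
const toySteps: Steps<string> = {
  generate: async (prompt) => prompt,
  evaluate: async (output) => {
    const score = Math.min(1, output.length / 5)
    return { score, passed: score >= 0.8, feedback: 'make it longer' }
  },
  optimize: async (output) => output + '!',
}
```

Returning the last evaluation alongside the output lets a caller distinguish a run that passed from one that simply ran out of iterations.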
This pattern is best suited to quality-critical tasks where initial outputs are unlikely to meet standards without targeted refinement.

## Pattern

## TypeScript API

```ts
import type {
  Evaluation,
  CriterionResult,
  EvaluationCriterion,
  EvaluatorOptimizerWorkflow,
  EvaluatorOptimizerConfig,
} from "@osprotocol/schema/workflows/evaluator-optimizer"
```

### Evaluation

Result returned by `evaluate()`. The `score` is a float between 0.0 and 1.0. `passed` indicates whether the score meets the configured threshold. `feedback` is a human-readable explanation. `criteria` provides a per-criterion breakdown when multiple evaluation dimensions are configured.

```ts
interface Evaluation {
  score: number
  passed: boolean
  feedback: string
  criteria?: CriterionResult[]
}
```

### CriterionResult

Per-criterion scoring entry inside an `Evaluation`. Mirrors `EvaluationCriterion` but carries the measured `score` and `passed` result for a single generation attempt.

```ts
interface CriterionResult {
  name: string
  score: number
  passed: boolean
  feedback?: string
}
```

### EvaluationCriterion

Declares a single quality dimension used during evaluation. `threshold` sets the minimum acceptable score (0.0–1.0) for this criterion. `weight` controls its relative contribution when computing the aggregate score; weights across all criteria should sum to 1.0.

```ts
interface EvaluationCriterion {
  name: string
  description: string
  threshold: number
  weight?: number
}
```

### EvaluatorOptimizerWorkflow

Extends the base `Workflow` interface with the three methods that implement the loop. `generate` produces an initial output from the prompt. `evaluate` scores that output and returns structured feedback. `optimize` uses the output and evaluation to produce an improved version.

```ts
interface EvaluatorOptimizerWorkflow<Output> extends Workflow<Output> {
  generate(prompt: string): Promise<Output>
  evaluate(output: Output, prompt: string): Promise<Evaluation>
  optimize(output: Output, evaluation: Evaluation, prompt: string): Promise<Output>
}
```

### EvaluatorOptimizerConfig

Configuration passed when constructing the workflow. `threshold` sets the global pass score (default 0.8). `maxIterations` caps the refinement loop. `criteria` declares the evaluation dimensions. `generatorModel` and `evaluatorModel` allow using different models for generation and evaluation—useful when a smaller, faster model generates and a larger, more critical model evaluates.

```ts
interface EvaluatorOptimizerConfig {
  threshold?: number
  maxIterations?: number
  criteria?: EvaluationCriterion[]
  generatorModel?: string
  evaluatorModel?: string
}
```

## Usage Examples

### Basic generation loop

Runs the loop until the output passes the global threshold or `maxIterations` is reached.

```ts
const result = await workflow.run("Write a concise executive summary for Q4 results", {
  config: {
    threshold: 0.85,
    maxIterations: 4,
  },
})
```

### Multi-criteria evaluation

Defines separate quality dimensions with individual thresholds and weights. The aggregate score is a weighted sum of criterion scores.

```ts
const result = await workflow.run("Draft a technical proposal for the new caching layer", {
  config: {
    threshold: 0.80,
    maxIterations: 5,
    criteria: [
      {
        name: "technical_accuracy",
        description: "Claims are technically correct and current",
        threshold: 0.90,
        weight: 0.5,
      },
      {
        name: "clarity",
        description: "Language is clear and free of ambiguity",
        threshold: 0.75,
        weight: 0.3,
      },
      {
        name: "completeness",
        description: "All required sections are present and addressed",
        threshold: 0.70,
        weight: 0.2,
      },
    ],
  },
})
```

### Different models for generation and evaluation

Uses a fast model to generate and a more capable model to evaluate, balancing cost against quality.

```ts
const result = await workflow.run("Translate the following legal clause to plain English", {
  config: {
    threshold: 0.90,
    maxIterations: 3,
    generatorModel: "claude-haiku-4-5",
    evaluatorModel: "claude-opus-4-6",
  },
})
```

## Integration

* [Routing](/docs/workflows/routing) — route inputs to the appropriate generator before entering the loop
* [Orchestrator-Workers](/docs/workflows/orchestrator-workers) — use evaluator-optimizer as a worker in a larger orchestration
* [Parallelization](/docs/workflows/parallelization) — run multiple generation candidates in parallel and evaluate each
* [Judge](/docs/checks/judge) — reuse judge checks as evaluation criteria inside this workflow
* [Runs](/docs/runs) — control timeout, retry, and cancellation for the refinement loop

---

# Workflows

## Overview

Workflows define execution patterns for agent tasks. All workflow types extend the base `Workflow` interface.

The OS Protocol implements patterns based on [Anthropic's building blocks for agentic systems](https://www.anthropic.com/engineering/building-effective-agents).

## Available Patterns

| Pattern                                                      | Description                       | When to Use                       |
| ------------------------------------------------------------ | --------------------------------- | --------------------------------- |
| [Routing](/docs/workflows/routing)                           | Classify and delegate             | Single specialized handler needed |
| [Orchestrator-Workers](/docs/workflows/orchestrator-workers) | Plan, delegate, synthesize        | Multiple specialized capabilities |
| [Parallelization](/docs/workflows/parallelization)           | Split, execute in parallel, merge | Independent subtasks              |
| [Evaluator-Optimizer](/docs/workflows/evaluator-optimizer)   | Generate, evaluate, refine        | Quality is critical               |

## TypeScript API

```ts
import type { Workflow, InferWorkflowOutput } from '@osprotocol/schema/workflows'
```

### Workflow

Base interface that all workflow patterns implement.

```ts
interface Workflow<Output> {
  /**
   * Execute the workflow with the given prompt
   *
   * @param prompt - The input prompt/task to process
   * @param options - Optional run configuration (timeout, retry, cancel, etc.)
   * @returns Promise resolving to the workflow output
   */
  run(prompt: string, options?: RunOptions): Promise<Output>
}
```

### InferWorkflowOutput

Utility type to extract the output type from a workflow.

```ts
type InferWorkflowOutput<T> = T extends Workflow<infer Output> ? Output : never
```

## Usage Example

```ts
import type { Workflow, RunOptions } from '@osprotocol/schema/workflows'

// Define a custom workflow
const myWorkflow: Workflow<string> = {
  async run(prompt, options) {
    // Execute workflow logic
    return `Processed: ${prompt}`
  }
}

// Run with options
const result = await myWorkflow.run('Hello', {
  timeout: { ms: 30000 },
  retry: { attempts: 3, delayMs: 1000 }
})
```

## Composing Workflows

Workflows can be composed to create complex execution patterns:

```ts
// Routing delegates to specialized workflows
const router: RoutingWorkflow = {
  async classify(prompt) {
    // Determine which workflow to use
    return prompt.includes('code') ? 'code-assistant' : 'general'
  },
  async run(prompt, options) {
    const route = await this.classify(prompt)
    return workflows[route].run(prompt, options)
  }
}
```

## Integration

Workflows integrate with:

* **RunOptions**: Configure timeout, retry, and cancel behavior
* **Agents**: Agents declare which workflow patterns they can use
* **Execution**: Workflows create executions with lifecycle control

---

# Orchestrator-Workers

This workflow pattern is experimental. Interfaces are subject to change as the OS Protocol specification evolves.

## Overview

The Orchestrator-Workers pattern uses a central orchestrator to decompose a complex task into a structured plan, delegate each step to a specialized worker, and synthesize the collected results into a final output. It is the most powerful multi-agent pattern for tasks that require multiple specialized capabilities working together.
Workers operate independently on their assigned steps, and the orchestrator aggregates their results with full awareness of the original goal.

## Pattern Diagram

## TypeScript API

Import from `@osprotocol/schema/workflows/orchestrator-workers`.

### PlanStep

A single unit of work in the execution plan.

```ts
interface PlanStep {
  id: string
  description: string
  worker: string
  input: string
  dependsOn?: string[]
}
```

* `id` — unique identifier for the step
* `description` — human-readable summary of what the step does
* `worker` — name of the worker responsible for this step
* `input` — the prompt or data passed to the worker
* `dependsOn` — optional list of step IDs that must complete before this step runs

### Plan

The full execution plan produced by the orchestrator.

```ts
interface Plan {
  steps: PlanStep[]
  goal: string
}
```

* `steps` — ordered list of steps to execute
* `goal` — the original objective driving the plan

### WorkerResult

The result returned by a worker after executing a step.

```ts
interface WorkerResult<T = unknown> {
  stepId: string
  worker: string
  success: boolean
  data?: T
  error?: string
}
```

* `stepId` — the ID of the step this result corresponds to
* `worker` — name of the worker that produced the result
* `success` — whether the step completed without error
* `data` — the output produced by the worker on success
* `error` — error message if the step failed

### OrchestratorWorkersWorkflow

The main interface extending the base `Workflow`.

```ts
interface OrchestratorWorkersWorkflow<Output> extends Workflow<Output> {
  plan(prompt: string): Promise<Plan>
  delegate(step: PlanStep): Promise<WorkerResult>
  synthesize(results: WorkerResult[], plan: Plan): Promise<Output>
}
```

* `plan` — generates a structured `Plan` from the input prompt
* `delegate` — sends a single `PlanStep` to the assigned worker and returns a `WorkerResult`
* `synthesize` — combines all results with the original plan to produce the final `Output`

### WorkerConfig

Configuration for registering a worker with the orchestrator.

```ts
interface WorkerConfig {
  name: string
  description: string
  workflow: Workflow
  capabilities: string[]
}
```

* `name` — unique identifier used in `PlanStep.worker`
* `description` — what this worker is specialized to do
* `workflow` — the underlying workflow the worker executes
* `capabilities` — list of capability tags used for worker selection

## Usage Examples

### Create a Plan and Execute

```ts
const result = await orchestrator.run("Research and summarize AI agent frameworks")

// Or step by step:
const plan = await orchestrator.plan("Research and summarize AI agent frameworks")

const results: WorkerResult[] = []
for (const step of plan.steps) {
  const result = await orchestrator.delegate(step)
  results.push(result)
}

const output = await orchestrator.synthesize(results, plan)
```

### Register Workers

```ts
const workers: WorkerConfig[] = [
  {
    name: "researcher",
    description: "Searches the web and retrieves relevant information",
    workflow: searchWorkflow,
    capabilities: ["search", "retrieve", "browse"],
  },
  {
    name: "analyst",
    description: "Analyzes data and identifies patterns",
    workflow: analysisWorkflow,
    capabilities: ["analyze", "compare", "rank"],
  },
  {
    name: "writer",
    description: "Drafts and edits written content",
    workflow: writingWorkflow,
    capabilities: ["write", "summarize", "edit"],
  },
]
```

### Handle Step Dependencies

Use `dependsOn` to ensure steps run in the correct order when outputs from one step feed into another.
```ts
const plan: Plan = {
  goal: "Produce a competitive analysis report",
  steps: [
    {
      id: "step-1",
      description: "Gather data on competitor A",
      worker: "researcher",
      input: "Find key features and pricing for Competitor A",
    },
    {
      id: "step-2",
      description: "Gather data on competitor B",
      worker: "researcher",
      input: "Find key features and pricing for Competitor B",
    },
    {
      id: "step-3",
      description: "Write comparative analysis",
      worker: "writer",
      input: "Compare the two competitors based on research results",
      dependsOn: ["step-1", "step-2"],
    },
  ],
}

// Execute steps respecting dependencies.
// This single pass assumes each step is listed after the steps it depends on;
// a step whose dependencies failed or have not run yet is skipped.
const results: WorkerResult[] = []

for (const step of plan.steps) {
  const deps = step.dependsOn ?? []
  const depsComplete = deps.every(id =>
    results.some(r => r.stepId === id && r.success)
  )
  if (depsComplete) {
    results.push(await orchestrator.delegate(step))
  }
}
```

## Integration

* [Routing](/docs/workflows/routing) — use routing to select which orchestrator handles an incoming task
* [Parallelization](/docs/workflows/parallelization) — combine with parallelization to execute independent steps concurrently
* [Evaluator-Optimizer](/docs/workflows/evaluator-optimizer) — wrap synthesized output in an evaluator loop for iterative refinement
* [Runs](/docs/runs) — configure timeouts, retries, and approval gates on orchestrator runs

---

# Parallelization

The Parallelization workflow pattern is part of the OS Protocol specification. It is defined in `@osprotocol/schema/workflows/parallelization` and must be implemented by agents that declare support for parallel execution.

## Overview

Parallelization splits an incoming prompt into independent subtasks, executes all of them concurrently, and merges their results into a single output. This pattern is best suited for throughput-sensitive workloads where subtasks do not depend on each other and the total work can be decomposed cleanly. It follows the parallelization building block described in Anthropic's "Building Effective Agents."
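The three phases described in the overview can be sketched with plain functions. This is a minimal illustration, not the protocol's implementation: the types are local stand-ins for the shapes documented on this page, and the comma-based splitting rule and the `runSubtask` worker are hypothetical.

```ts
// Local stand-ins for the documented shapes (not imported from the schema package).
interface Subtask { id: string; prompt: string }
interface SubtaskResult<T> { id: string; success: boolean; data?: T; error?: string }

// split: one subtask per comma-separated item (hypothetical splitting rule).
function split(prompt: string): Subtask[] {
  return prompt
    .split(",")
    .map((item) => item.trim())
    .filter((item) => item.length > 0)
    .map((item, i) => ({ id: `task-${i + 1}`, prompt: item }));
}

// Hypothetical worker: "processes" a subtask by measuring its prompt length.
async function runSubtask(task: Subtask): Promise<SubtaskResult<number>> {
  return { id: task.id, success: true, data: task.prompt.length };
}

// parallel: start every subtask at once; Promise.all keeps the input order.
function parallel(subtasks: Subtask[]): Promise<SubtaskResult<number>[]> {
  return Promise.all(subtasks.map(runSubtask));
}

// merge: fold the successful results into a single output value.
function merge(results: SubtaskResult<number>[]): number {
  return results.reduce((sum, r) => sum + (r.success ? (r.data ?? 0) : 0), 0);
}

// End to end: prompt in, single merged output out.
async function summarize(prompt: string): Promise<number> {
  return merge(await parallel(split(prompt)));
}
```

Because the subtasks share no state, the only coordination point is the `Promise.all` call; that independence is what makes the decomposition clean.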
## Pattern

Prompt → `split` → `parallel` (all subtasks concurrently) → `merge` → `Output`

## Failure Strategies

The `failureStrategy` option in `ParallelizationConfig` controls how the workflow behaves when one or more subtasks fail.

| Strategy      | Behavior                                                                               |
| ------------- | -------------------------------------------------------------------------------------- |
| `fail-fast`   | Stops execution on the first failed subtask and rejects the run immediately.           |
| `collect-all` | Runs all subtasks regardless of failures and returns every result, including failures. |

Use `fail-fast` when any single failure makes the merged output invalid. Use `collect-all` when partial results are still useful or when you want to surface all errors at once.

## TypeScript API

```ts
import type {
  Subtask,
  SubtaskResult,
  ParallelizationWorkflow,
  ParallelizationConfig,
} from "@osprotocol/schema/workflows/parallelization";
```

### Subtask

```ts
interface Subtask {
  id: string
  prompt: string
  metadata?: Record<string, unknown>
}
```

A single unit of work produced by `split()`. The `id` uniquely identifies the subtask and is echoed back in `SubtaskResult` so results can be correlated to their origin.

### SubtaskResult

```ts
interface SubtaskResult<T = unknown> {
  id: string
  success: boolean
  data?: T
  error?: string
  durationMs?: number
}
```

The outcome of a single subtask. Results are returned in the same order as the input `Subtask[]` array. `durationMs` is available for performance monitoring.

### ParallelizationWorkflow

```ts
interface ParallelizationWorkflow<Output> extends Workflow<Output> {
  split(prompt: string): Promise<Subtask[]>
  parallel(subtasks: Subtask[]): Promise<SubtaskResult[]>
  merge(results: SubtaskResult[]): Promise<Output>
}
```

The core workflow interface. Implementations must provide all three methods:

* `split` — decomposes the incoming prompt into a list of independent subtasks.
* `parallel` — executes all subtasks concurrently and returns their results in input order.
* `merge` — combines `SubtaskResult[]` into the final typed output.
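One possible shape for a `parallel` implementation that honors both failure strategies and the input-order guarantee is sketched below. It is an illustration under stated assumptions, not the reference implementation: `runParallel`, `settle`, and the `execute` callback are hypothetical names, and the types are local copies of the shapes above.

```ts
// Local stand-ins mirroring the documented shapes.
interface Subtask { id: string; prompt: string }
interface SubtaskResult<T = unknown> {
  id: string
  success: boolean
  data?: T
  error?: string
  durationMs?: number
}
type FailureStrategy = "fail-fast" | "collect-all"

// Runs one subtask and converts a thrown error into a failed result.
async function settle<T>(
  task: Subtask,
  execute: (task: Subtask) => Promise<T>,
): Promise<SubtaskResult<T>> {
  const started = Date.now()
  try {
    const data = await execute(task)
    return { id: task.id, success: true, data, durationMs: Date.now() - started }
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err)
    return { id: task.id, success: false, error, durationMs: Date.now() - started }
  }
}

async function runParallel<T>(
  subtasks: Subtask[],
  execute: (task: Subtask) => Promise<T>,
  strategy: FailureStrategy = "fail-fast",
): Promise<SubtaskResult<T>[]> {
  const pending = subtasks.map(async (task) => {
    const result = await settle(task, execute)
    // fail-fast: rethrow so Promise.all rejects as soon as the first failure lands.
    if (!result.success && strategy === "fail-fast") {
      throw new Error(`Subtask ${task.id} failed: ${result.error}`)
    }
    return result
  })
  // Promise.all resolves to an array in input order, regardless of completion order.
  return Promise.all(pending)
}
```

Note that in this sketch `fail-fast` only rejects early; subtasks that are already running are not cancelled. A real implementation would likely need a cancellation mechanism for that.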
### ParallelizationConfig

```ts
interface ParallelizationConfig {
  maxConcurrency?: number
  failureStrategy?: 'fail-fast' | 'collect-all'
  includeFailures?: boolean
}
```

| Field             | Default       | Description                                                          |
| ----------------- | ------------- | -------------------------------------------------------------------- |
| `maxConcurrency`  | unbounded     | Maximum number of subtasks to run at the same time.                  |
| `failureStrategy` | `'fail-fast'` | How to handle subtask failures (see Failure Strategies).             |
| `includeFailures` | `false`       | When using `collect-all`, whether to pass failed results to `merge`. |

## Usage Examples

### Split and merge

```ts
const subtasks = await workflow.split(
  "Summarize these five documents: doc1, doc2, doc3, doc4, doc5"
);
const results = await workflow.parallel(subtasks);
const output = await workflow.merge(results);
```

### Configure concurrency

```ts
const run = await workflow.run(prompt, {
  config: {
    maxConcurrency: 3,
    failureStrategy: "fail-fast",
  },
});
```

### Handle partial failures with collect-all

```ts
const run = await workflow.run(prompt, {
  config: {
    failureStrategy: "collect-all",
    includeFailures: true,
  },
});

const failed = run.output.results.filter((r) => !r.success);
if (failed.length > 0) {
  console.warn(`${failed.length} subtask(s) failed`, failed.map((r) => r.error));
}
```

## Integration

* [Routing](/docs/workflows/routing) — classify and delegate to the right handler before parallelizing.
* [Orchestrator-Workers](/docs/workflows/orchestrator-workers) — combine orchestration with parallel worker execution.
* [Evaluator-Optimizer](/docs/workflows/evaluator-optimizer) — evaluate and refine merged results after parallelization.
* [Runs](/docs/runs) — control timeout, retry, and cancellation for the overall parallel run.

---

# Routing

This interface is experimental — no production implementation exists yet. The API surface may change.
## Overview

The Routing workflow classifies an incoming prompt and delegates it to a single specialized workflow based on the classification result. It is the simplest multi-agent pattern — it routes but does not aggregate results across multiple branches. The design follows Anthropic's routing building block from "Building Effective Agents."

## Pattern

Prompt → `classify` → matched workflow (or the default entry) → `Output`

## TypeScript API

Import from `@osprotocol/schema/workflows/routing`. The base `Workflow` interface is available from `@osprotocol/schema/workflows`.

### RouteConfig

Describes when a route should be selected and provides optional examples for few-shot classification.

```ts
interface RouteConfig {
  description: string
  whenToUse: string[]
  examples?: string[]
}
```

### RoutingWorkflow

Extends the base `Workflow` interface with a `classify` method that returns the key of the matched route.

```ts
interface RoutingWorkflow<Output> extends Workflow<Output> {
  classify(prompt: string): Promise<string>
}
```

### RoutingWorkflowConfig

Top-level configuration object. `workflows` is a record keyed by route name, each value being a `RoutingWorkflowEntry`.

```ts
interface RoutingWorkflowConfig<Output = unknown> {
  model?: string
  workflows: Record<string, RoutingWorkflowEntry<Output>>
}
```

### RoutingWorkflowEntry

Pairs a concrete `Workflow` with its `RouteConfig`. Set `markAsDefault` to `true` on one entry to use it as the fallback when no route matches.
```ts
interface RoutingWorkflowEntry<Output = unknown> {
  workflow: Workflow<Output>
  route: RouteConfig
  markAsDefault?: boolean
}
```

## Usage Examples

### Configure a routing workflow with multiple routes

```ts
const config: RoutingWorkflowConfig = {
  model: "claude-opus-4-6",
  workflows: {
    billing: {
      workflow: billingWorkflow,
      route: {
        description: "Handles billing, invoices, and payment questions.",
        whenToUse: [
          "User asks about an invoice",
          "User wants to update payment method",
          "User reports a charge they do not recognize",
        ],
        examples: [
          "Why was I charged twice this month?",
          "How do I cancel my subscription?",
        ],
      },
    },
    technical: {
      workflow: technicalWorkflow,
      route: {
        description: "Handles technical support and troubleshooting.",
        whenToUse: [
          "User reports a bug or error",
          "User needs help integrating the API",
          "User asks how a feature works",
        ],
      },
    },
    general: {
      workflow: generalWorkflow,
      route: {
        description: "Handles all other questions.",
        whenToUse: ["Input does not match any other route"],
      },
      markAsDefault: true,
    },
  },
}
```

### Classify a prompt to determine the route

```ts
const routeKey = await router.classify("I was charged twice last week.")
// routeKey === "billing"
```

### Run the full routing workflow

```ts
const run = await router.run("I was charged twice last week.")
const result = await run.output()
```

The `run` method internally calls `classify`, selects the matching `RoutingWorkflowEntry`, and delegates execution to its `workflow`. If no route matches, the entry with `markAsDefault: true` is used.

## Integration

* [Orchestrator-Workers](/docs/workflows/orchestrator-workers) — plan, delegate to multiple workers, and synthesize results
* [Parallelization](/docs/workflows/parallelization) — split a task across concurrent workflows
* [Evaluator-Optimizer](/docs/workflows/evaluator-optimizer) — generate, evaluate, and refine output in a loop
* [Runs](/docs/runs) — timeout, retry, cancel, and approval controls for any workflow
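The dispatch behavior of the routing `run` method (classify, select the matching entry, fall back to the entry marked as default) could be sketched roughly as follows. Everything here is an assumption made for illustration: a workflow is reduced to an async function, the keyword-based `classify` stands in for a model call, and `RouteEntry`, `selectEntry`, and `route` are hypothetical names that are not part of the schema.

```ts
// Local stand-ins; a "workflow" is reduced to an async function for this sketch.
type Handler = (prompt: string) => Promise<string>

interface RouteEntry {
  workflow: Handler
  route: { description: string; whenToUse: string[]; examples?: string[] }
  markAsDefault?: boolean
}

// Hypothetical classifier: keyword lookup in place of a model call.
function classify(prompt: string, keywords: Record<string, string[]>): string | undefined {
  const text = prompt.toLowerCase()
  const hit = Object.entries(keywords).find(([, words]) =>
    words.some((word) => text.includes(word)),
  )
  return hit?.[0]
}

// Resolve the classified key to an entry, falling back to the default entry.
function selectEntry(routes: Record<string, RouteEntry>, key: string | undefined): RouteEntry {
  const matched = key !== undefined ? routes[key] : undefined
  const fallback = Object.values(routes).find((entry) => entry.markAsDefault === true)
  const entry = matched ?? fallback
  if (!entry) throw new Error("No route matched and no default entry is configured")
  return entry
}

// Classify, select, and delegate to exactly one workflow.
async function route(
  prompt: string,
  routes: Record<string, RouteEntry>,
  keywords: Record<string, string[]>,
): Promise<string> {
  return selectEntry(routes, classify(prompt, keywords)).workflow(prompt)
}
```

Keeping selection separate from classification makes the fallback rule testable without a model in the loop.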