System Model

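Logical Layers

The Agent Service platform consists of the following logical subsystems:

Control Plane
- Manages Organization, Agent, and Tool resources
- Accepts run commands such as create, cancel, notify, and fail
- Handles authentication and authorization, quotas, resource metadata, and policy validation
Orchestrator
- Drives the run state machine
- Selects a provider
- Manages attempts, timeouts, retries, cancellation, and recovery
Runtime Session Gateway
- Manages runtime attach, heartbeat, recover-connections, KV, secrets, and config
- Relays input and output between the runtime and the orchestrator
Telemetry / Trace Plane
- Receives span records exported by the runtime
- Provides trace / span / span event queries, aggregation, and correlated log context
- Owns the context propagation compatibility layer and the governance of captured LLM input and output content
- Does not take part in advancing run state, but supplies facts for diagnostics, performance analysis, and auditing
Provider Layer
- Abstracts execution environments such as local, sandbox, and faas
- Exposes uniform prepare / start / keepAlive / release capabilities
Event Log + Projection
- Persists run facts
- Rebuilds read models such as message, artifact, checkpoint, and webhook
Read Plane
- Provides run, message, artifact, and checkpoint queries, as well as streaming reads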
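
As an illustration of the uniform capabilities listed for the Provider Layer, the following is a minimal sketch and not part of the specification. The method names mirror prepare / start / keepAlive / release; the `attempt_id` parameter, the `LocalProvider` class, and the `run_attempt` helper are hypothetical names introduced only for this example.

```python
from typing import List, Protocol, Tuple


class Provider(Protocol):
    """Hypothetical uniform interface; only the four capability names come from the spec."""

    def prepare(self, attempt_id: str) -> None: ...
    def start(self, attempt_id: str) -> None: ...
    def keep_alive(self, attempt_id: str) -> None: ...
    def release(self, attempt_id: str) -> None: ...


class LocalProvider:
    """Toy in-process provider that only records the calls it receives."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str]] = []

    def prepare(self, attempt_id: str) -> None:
        self.calls.append(("prepare", attempt_id))

    def start(self, attempt_id: str) -> None:
        self.calls.append(("start", attempt_id))

    def keep_alive(self, attempt_id: str) -> None:
        self.calls.append(("keep_alive", attempt_id))

    def release(self, attempt_id: str) -> None:
        self.calls.append(("release", attempt_id))


def run_attempt(provider: Provider, attempt_id: str) -> None:
    """Drive one attempt through the lifecycle; release runs even if start fails."""
    provider.prepare(attempt_id)
    try:
        provider.start(attempt_id)
        provider.keep_alive(attempt_id)
    finally:
        provider.release(attempt_id)


provider = LocalProvider()
run_attempt(provider, "attempt-1")
print([name for name, _ in provider.calls])
```

Because the orchestrator in this sketch depends only on the `Provider` protocol, a sandbox or faas implementation could be substituted without changing `run_attempt`.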

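Normative Principles

All execution facts MUST enter the event stream before any projection is produced.
The control plane MUST NOT write directly to the message read model or the artifact read model.
The runtime and the provider MUST NOT directly update the terminal-state read model of a run; they may affect state only through commands and events.
Any replayable user-visible output MUST be reconstructible from the event stream.
Trace / span records MUST be associated with run facts, but MUST NOT drive the run state machine in reverse.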
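
The "event stream first, projection second" rule can be sketched as follows. This is an illustrative sketch under assumptions: the `Event` fields, the event kind strings such as `"message.appended"`, and the `project_messages` function are hypothetical and not defined by this specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    run_id: str
    seq: int
    kind: str      # hypothetical event kind, e.g. "message.appended"
    payload: str


@dataclass
class EventLog:
    """Append-only list standing in for the Event Log."""

    events: List[Event] = field(default_factory=list)

    def append(self, run_id: str, kind: str, payload: str) -> Event:
        event = Event(run_id, len(self.events), kind, payload)
        self.events.append(event)
        return event


def project_messages(events: List[Event]) -> Dict[str, List[str]]:
    """Pure fold over events: the message read model is derived, never written directly."""
    messages: Dict[str, List[str]] = {}
    for event in events:
        if event.kind == "message.appended":
            messages.setdefault(event.run_id, []).append(event.payload)
    return messages


log = EventLog()
log.append("run-1", "message.appended", "hello")
log.append("run-1", "checkpoint.saved", "not a message")
log.append("run-1", "message.appended", "world")

read_model = project_messages(log.events)
rebuilt = project_messages(list(log.events))  # replay from scratch
print(read_model == rebuilt)
```

Since the projection is a pure function of the log, discarding the read model and replaying the events yields the same result, which is the property the reconstructibility rule asks for.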

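Canonical Write Path in This Specification

The client sends a command to the Control Plane.
The Orchestrator creates or advances the run / attempt.
The Runtime Session Gateway delivers the input to the runtime.
The runtime publishes output events.
The events are written to the Event Log.
Projection produces Message, Artifact, Checkpoint, and WebhookDelivery.
The runtime or tracing exporter exports each AgentSpan once it has ended.
The runtime or tracing exporter exports TraceEvent records and optional GenAI operation details.
The Read Plane serves queries and replay + live tail.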
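
One way to think about "replay + live tail" in the last step is a single sequence cursor shared by both phases. The sketch below is only an assumption about how a reader might behave; `SimpleLog`, `TailReader`, `read_from`, and `poll` are hypothetical names, and a real Read Plane would likely push events over a stream rather than poll.

```python
from typing import List, Tuple


class SimpleLog:
    """In-memory append-only log; the list index serves as the sequence number."""

    def __init__(self) -> None:
        self._events: List[str] = []

    def append(self, payload: str) -> int:
        self._events.append(payload)
        return len(self._events) - 1

    def read_from(self, seq: int) -> List[Tuple[int, str]]:
        """Return (seq, payload) pairs for every event at or after seq."""
        return [(i, self._events[i]) for i in range(seq, len(self._events))]


class TailReader:
    """Replays history and then tails new events using the same cursor."""

    def __init__(self, log: SimpleLog, cursor: int = 0) -> None:
        self._log = log
        self.cursor = cursor

    def poll(self) -> List[Tuple[int, str]]:
        batch = self._log.read_from(self.cursor)
        if batch:
            # Advance past the last event seen so the next poll has no overlap.
            self.cursor = batch[-1][0] + 1
        return batch


log = SimpleLog()
log.append("a")
log.append("b")

reader = TailReader(log)
replayed = reader.poll()   # replay phase: everything already stored
log.append("c")
tailed = reader.poll()     # live tail phase: only what arrived afterwards
print(replayed, tailed)
```

Driving both phases from one cursor means the switch from replay to tail cannot skip or duplicate an event, and a client that reconnects with its last cursor resumes where it stopped.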

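Logical Storage Roles

Resource Store
- Resource definitions such as Organization, Agent, and Tool
Run Store
- Control-plane state snapshots of run / attempt
Event Log
- Append-only run facts
Projection Store
- Read models such as message / artifact / checkpoint
Trace Store
- Trace / span query model
Lease / Cache Store
- Distributed leases, short-lived coordination, and hot state
Blob Store
- Checkpoints, logs, and binary artifacts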

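Non-Goals

This specification does not require the control plane, orchestrator, and projector to be split into separate physical services.
This specification does not mandate a specific choice of message middleware or database.
This specification does not require all transports to share one underlying protocol, but their semantics must be uniform.
This specification does not require tracing and the event log to share the same physical storage, but trace query semantics must be stable.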