Policy Optimization - Sec0

Policy optimization in Sec0 is not a separate training API. It is the practice of emitting consistent runtime signals so you can evaluate policy changes, gate rollouts, and improve orchestration quality without weakening the safety boundary.

Keep the Safety Boundary Fixed

Guardrails, deny rules, and approvals remain authoritative.
Policy optimization should only rank or choose among already-allowed actions.
New policies should move through observe mode, evals, and staged rollout before full enforcement.
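A minimal sketch of that ordering follows. It assumes hypothetical Candidate and Decision shapes, which are not sec0-sdk types. The safety layer's verdict is taken as final, and the optimizer's score only ranks the actions that are already allowed. The sketch treats escalate as "not yet allowed", so an action that still needs approval is never picked on score alone.

```typescript
// Hypothetical shapes for illustration only; these are not sec0-sdk types.
type Decision = "allow" | "deny" | "escalate";

interface Candidate {
  tool: string;       // pinned tool@version, e.g. "search@1.2.0"
  decision: Decision; // verdict already produced by guardrails, deny rules, or approvals
  score: number;      // optimizer preference; higher is better
}

// Choose the highest-scoring candidate, but only among actions the safety
// layer allowed. Denied or escalated candidates are never eligible.
function chooseAction(candidates: Candidate[]): Candidate | undefined {
  const allowed = candidates.filter((c) => c.decision === "allow");
  let best: Candidate | undefined;
  for (const c of allowed) {
    if (best === undefined || c.score > best.score) best = c;
  }
  return best;
}

// Usage: the denied action has the top score but is filtered out first.
const picked = chooseAction([
  { tool: "shell.exec@2.0.0", decision: "deny", score: 0.99 },
  { tool: "search@1.2.0", decision: "allow", score: 0.7 },
  { tool: "fetch@0.9.1", decision: "allow", score: 0.4 },
]);
console.log(picked?.tool); // prints search@1.2.0
```

Because filtering happens before ranking, changing the scoring function cannot widen what the system is permitted to do. It can only change which permitted action runs.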

Signals Developers Should Emit

Stable nodeId and runId values on every hop.
Pinned tool@version names for every invocation.
Objectives, plan state, and hop metadata through AgentManager.
Allow, deny, and escalate reasons from middleware, gateway, decorators, and guard.
Latency, retries, cost proxies, and approval outcomes in audit logs.
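The following sketch shows one way to check those signals before records are used for comparison. The HopRecord field names and the missingSignals helper are hypothetical and are not the sec0-sdk audit schema. The idea is that a record with an empty nodeId, an unpinned tool name, or a decision without a reason cannot be lined up reliably across policy revisions.

```typescript
// Hypothetical record shape; field names are illustrative, not the sec0-sdk schema.
interface HopRecord {
  runId: string;
  nodeId: string;
  tool: string; // expected to be pinned as name@version
  decision: "allow" | "deny" | "escalate";
  reason: string; // why the layer decided as it did
  source: "middleware" | "gateway" | "decorator" | "guard";
  latencyMs: number;
  retries: number;
  costUnits?: number; // optional cost proxy, e.g. tokens
  approval?: "approved" | "rejected" | "timeout";
}

// A tool name counts as pinned if it has exactly one "@" with text on both sides.
const PINNED_TOOL = /^[^@\s]+@[^@\s]+$/;

// List the problems that would make this record hard to compare across revisions.
function missingSignals(r: HopRecord): string[] {
  const problems: string[] = [];
  if (!r.runId) problems.push("runId is empty");
  if (!r.nodeId) problems.push("nodeId is empty");
  if (!PINNED_TOOL.test(r.tool)) problems.push(`tool "${r.tool}" is not pinned as name@version`);
  if (!r.reason) problems.push(`${r.decision} decision has no reason`);
  return problems;
}

// Usage: an unpinned tool and an empty reason are both flagged.
const issues = missingSignals({
  runId: "run-42", nodeId: "planner", tool: "search",
  decision: "allow", reason: "", source: "middleware", latencyMs: 120, retries: 0,
});
console.log(issues.length); // prints 2
```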

SDK Features That Produce Those Signals

sec0-sdk/instrumentation provides hop identity, agent state, and trace linkage.
sec0-sdk/middleware adds per-tool decisions, integrity signals, and scan findings.
sec0-sdk/gateway adds entitlement, quota, idempotency, and AP2 decisions at the network edge.
sec0-sdk/guard records allow, redact, block, and escalate outcomes for app-level actions.
sec0-sdk/audit gives you append-only evidence you can diff across policy revisions.

Rollout Guidance

Start with deny_on: [] so that nothing is denied yet, and capture a clean baseline of audit data.
Add per-node policy scope when different agents have different risk budgets.
Compare policy revisions against the same audit stream before turning on deny paths.
Stage new approval or remote-runtime rules behind a partial rollout.
Keep rollback simple by pinning policy versions per environment.
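Comparing revisions against the same audit stream can be sketched as a replay. This is an illustration only, assuming a policy revision can be modeled as a pure function from an audited hop to a verdict. PolicyRevision, AuditedHop, and diffRevisions are hypothetical names, not sec0-sdk APIs. The replay reports only the hops whose verdict would change, which is the set to review before enabling a deny path.

```typescript
// Hypothetical types for illustration; not part of sec0-sdk.
type Verdict = "allow" | "deny" | "escalate";

interface AuditedHop {
  runId: string;
  nodeId: string;
  tool: string; // pinned name@version
}

interface PolicyRevision {
  version: string;
  decide: (hop: AuditedHop) => Verdict; // assumed to be pure and deterministic
}

interface DecisionChange {
  hop: AuditedHop;
  before: Verdict;
  after: Verdict;
}

// Replay one audit stream through two revisions and keep only changed verdicts.
function diffRevisions(
  stream: AuditedHop[],
  current: PolicyRevision,
  candidate: PolicyRevision,
): DecisionChange[] {
  const changes: DecisionChange[] = [];
  for (const hop of stream) {
    const before = current.decide(hop);
    const after = candidate.decide(hop);
    if (before !== after) changes.push({ hop, before, after });
  }
  return changes;
}

// Usage: v2 would start escalating shell execution on the executor node.
const stream: AuditedHop[] = [
  { runId: "r1", nodeId: "planner", tool: "search@1.2.0" },
  { runId: "r1", nodeId: "executor", tool: "shell.exec@2.0.0" },
];
const v1: PolicyRevision = { version: "v1", decide: () => "allow" };
const v2: PolicyRevision = {
  version: "v2",
  decide: (h) => (h.tool.startsWith("shell.exec@") ? "escalate" : "allow"),
};
for (const c of diffRevisions(stream, v1, v2)) {
  console.log(`${c.hop.nodeId} ${c.hop.tool}: ${c.before} -> ${c.after}`);
}
// prints: executor shell.exec@2.0.0: allow -> escalate
```

With revisions identified by a version string, pinning per environment can be a plain map from environment name to version. Under that assumption, rolling back is a one-entry change rather than a policy edit.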

Remediation Policy Audit & Custody