agent-ecosystem/docs/team-management/opencode-snapshot-first-proof-upgrade-plan.md

113 KiB

OpenCode Snapshot-First Proof Upgrade Plan

Goal

Reduce false or avoidable OpenCode review warnings for new tasks by upgrading metadata-only OpenCode edit, write, and apply_patch changes to verified full-text before/after changes when, and only when, existing OpenCode snapshot evidence proves the exact file state transition.

The implementation must be fail-closed:

  • If proof is complete, store full before/after content and remove manual-only warnings.
  • If proof is incomplete, ambiguous, too large, binary, outside scope, or unavailable, keep the current warning.
  • Never use current disk content as proof for historical before/after.
  • Never broaden attribution outside strict delivery context.

Non-goals

  • Do not change Codex or Anthropic task extraction.
  • Do not change generic review UI semantics.
  • Do not infer diffs from current disk.
  • Do not scan unrelated OpenCode sessions.
  • Do not increase OpenCode snapshot file size limits as part of this work.
  • Do not retroactively "fix" old tasks unless existing ledger backfill has strict delivery and snapshot evidence.

Current System Facts

Desktop repo:

  • ChangeExtractorService requests OpenCode ledger backfill only when delivery context is available.
  • Backfill goes through OpenCodeReadinessBridge.backfillOpenCodeTaskLedger.
  • Imported events already support full before/after content and metadata-only fallbacks.

Orchestrator repo:

  • OpenCodeProfileManager.buildManagedConfig() already sets snapshot: true.
  • OpenCodeLedgerBridgeService.backfill() reconstructs toolpart changes, then calls OpenCodeChangeEvidenceEnricher.enrich().
  • OpenCodeOfflineSessionReader reads OpenCode SQLite history in read-only mode and extracts snapshot windows.
  • OpenCodeSnapshotEvidenceProviderService reads before/after snapshot file contents with strict limits.
  • OpenCodeToolpartChangeReconstructor already creates exact toolpart-chain changes when it has a known baseline.
  • OpenCodeChangeEvidenceEnricher already upgrades some metadata-only changes through snapshot or inverse chain proof.

This plan should strengthen the existing evidence path rather than introduce a new capture subsystem.

Risk Estimate

Recommended implementation:

  • Functional bug risk: 3/10.
  • Performance regression risk: 2/10.
  • Data safety risk: 2/10.
  • Complexity: 7/10.
  • Approximate runtime change size: 220-450 LOC.
  • Approximate total change size with tests and diagnostics: 450-900 LOC.

The low data safety risk depends on preserving fail-closed behavior. If any step starts accepting guesses as proof, data safety risk becomes 7/10 or worse.

Hard Safety Invariants

These invariants are more important than reducing warnings.

  1. A full-text upgrade must be tied to one task, one member, one OpenCode session, one delivery record, one assistant message, one toolpart, and one snapshot window.
  2. The upgrade must be local to OpenCode. Codex, Anthropic, generic task-log parsing, and non-OpenCode review flows must not change.
  3. strict-delivery is required for every snapshot-based full-text upgrade. Compatible attribution may still import metadata-only events, but it must not produce auto-safe before/after content.
  4. Current disk content is never historical evidence. It can be displayed as read-only context by the desktop, but it cannot remove a warning or enable safe reject.
  5. Hash-only evidence is not full-text evidence. A hash can verify text that was already read from a trusted snapshot, but a hash alone is not enough.
  6. Empty string is valid full text. null and undefined mean unavailable.
  7. Large, binary, truncated, path-unsafe, or schema-unsupported content stays metadata-only.
  8. A failed upgrade must preserve the original change event shape as much as possible. It may add diagnostics, but it must not remove warnings or mutate operation/confidence.
  9. Imported event idempotency must remain based on existing source import keys. The upgrade must not create duplicate events for the same toolpart/path.
  10. Any multi-change path chain must be all-or-nothing for unresolved changes in that path/window. Partial upgrades are allowed only for changes that were already independently exact before the chain attempt.
  11. The implementation must never make a previously non-rejectable change rejectable unless both beforeContent and the target after/absence state are proven from the same trusted historical evidence path.
  12. Diagnostics are allowed to become more detailed. They are not allowed to be used as a substitute for proof.

Things That Are Explicitly Not Proof

These signals can be useful diagnostics, but they must not remove warnings or enable safe reject by themselves:

  • Current disk content matching an expected hash.
  • Current disk content matching newString.
  • A file path appearing in OpenCode tool metadata.
  • A file path appearing in a snapshot diff without readable before/after text.
  • A before/after hash without the corresponding text blob.
  • An OpenCode tool status of completed.
  • The absence of an error in the toolpart.
  • The task title mentioning the same directory.
  • A member name matching the expected teammate.
  • A session id matching but no strict delivery record.
  • A snapshot window in the same session but a different assistant message.
  • A snapshot window that overlaps several toolparts ambiguously.
  • A successful manual UI render of current disk preview.

If implementation pressure makes any of these tempting, stop and keep the warning.

Threat Model

The feature is not security-sensitive in the network sense, but it is data-safety-sensitive. The main threat is a false full-text proof that enables safe reject/apply for the wrong historical state.

Bug classes to defend against:

  1. Cross-task contamination:
    • A file change from task A appears in task B review.
    • Main defense: strict delivery, canonical task id, source message/window matching, real-data smoke.
  2. Cross-member contamination:
    • A teammate using the same OpenCode profile is attributed to another member.
    • Main defense: delivery record member/lane/session matching.
  3. Cross-window contamination:
    • A toolpart is matched to the wrong snapshot window in the same message.
    • Main defense: exactly-one window matching and order tests.
  4. False baseline:
    • Current disk or hash-only evidence is treated as historical before text.
    • Main defense: "not proof" list and code review checklist.
  5. Unsafe warning removal:
    • UI stops warning about a file that is still manual-only.
    • Main defense: central warning predicate and negative warning tests.
  6. Duplicate imported events:
    • The same source toolpart appears twice after re-backfill.
    • Main defense: source-key idempotency audit and repeated-backfill tests.
  7. Silent performance regression:
    • Snapshot proof reads too many blobs or times out often.
    • Main defense: proof-needed filtering, existing limits, timing counters.
  8. Unsupported upstream shape:
    • OpenCode changes SQLite/snapshot schema and our parser guesses.
    • Main defense: shape fingerprint, unsupported fallback, abort condition.

For every bug class above, the implementation needs at least one negative test or real-data smoke assertion.

Pre-Implementation Audit Checklist

Before writing runtime code, answer these questions from the current codebase:

  1. Does the task-change ledger importer update, replace, supersede, or append events with the same sourceImportKey?
  2. Does the desktop review bundle dedupe by source key, file path, event id, or a computed change id?
  3. Which exact helper decides whether a file is rejectable?
  4. Which warnings are currently surfaced in TeamChangesSection versus the full review dialog?
  5. Does the OpenCode backfill cache hide an upgraded result for up to 60 seconds after a first metadata-only result?
  6. Does materializeMetadataOnlyChanges preserve evidenceProof, snapshotId, snapshotSource, and warnings exactly?
  7. Can the snapshot provider return beforeState/afterState with hashes but no text, and how is that serialized into task-change events?
  8. Are OpenCode snapshot windows always message-local in current real data?
  9. Are there real examples where a single toolpart touches more than one file?
  10. Are there real examples where apply_patch contains rename or mode-only changes?
  11. Does the snapshot provider ever return duplicate file entries for the same normalized path?
  12. Does the task-change worker cache bundle results independently from the OpenCode backfill cache?
  13. Are task-level warnings derived only from imported events, or can they come from boundary parsing separately?
  14. Does a safe reject require beforeContent, or can beforeState.exists === false plus afterContent be enough for creates?
  15. Is there any existing telemetry/log sink where structured counters can be emitted without leaking file contents?

If any answer is unknown, add a focused diagnostic or unit test before changing behavior. Do not use implementation guesses for these contracts.

Decision Gates

These gates must be passed in order. Do not skip gates to get fewer warnings faster.

Gate Required evidence If not met
G0 contract audit importer, bundle, rejectability, cache behavior known no runtime change
G1 diagnostics-only new diagnostics pass tests with no behavior change fix diagnostics first
G2 shadow proof proof computes stats but imports original changes keep behavior disabled
G3 single-change proof positive and negative single-change tests pass keep apply disabled
G4 real-data single-change smoke OpenCode improves or stays same, non-OpenCode unchanged do not enable default
G5 multi-change proof all chain tests pass, no ambiguous branch accepted keep full unavailable
G6 real-data full smoke no cross-task leakage, budgets pass keep default single-change
G7 rollback check OPENCODE_SNAPSHOT_PROOF_UPGRADE=off restores old behavior do not ship

The implementation should be easy to stop after G4. Single-change mode is a valid ship point; full mode is optional.

Known Unknowns That Block Full Mode

full mode must stay disabled if any of these are still unknown:

  • Whether importer supersedes or appends duplicate sourceImportKey events.
  • Whether real OpenCode data has nested or overlapping snapshot windows.
  • Whether real OpenCode apply_patch parts include rename, chmod, or binary patch shapes.
  • Whether multi-change same-path chains occur often enough to justify the risk.
  • Whether review bundle dedupe can handle upgraded old events without duplicate rows.
  • Whether snapshot proof stats can be collected without logging sensitive content.

Unknowns do not block diagnostics or single-change mode. They block full mode.

Assumption Ledger

Keep an explicit ledger of assumptions. Each assumption needs a validation path and a fallback. Do not leave assumptions implicit in implementation code.

Assumption Validation Fallback if false
OpenCode snapshot windows are message-local unit fixture and real-data diagnostics metadata-only fallback
Source import keys are stable across re-backfill repeated-backfill test new-imports-only, no old rewrite
Review bundle dedupes safely Phase 0 audit and bridge test do not upgrade old events
Empty string survives materialization serialization test do not upgrade empty files
Existing reject helper checks current disk desktop contract test fix shared helper before enabling
Snapshot store objects remain readable long enough retention fixture and diagnostics metadata-only fallback
Part ordering is stable enough for chains ordering unit tests disable full
Warning predicates are complete unit tests naming every removed warning preserve warning
Stats can be emitted without content log review and tests disable stats or redact harder
Non-OpenCode fingerprints stay identical real-data mode comparison keep apply modes disabled

If an assumption has no validation path, it should be moved to "Known Unknowns" and block full mode.

Capability And Version Gates

Do not assume that snapshot: true in managed OpenCode config means snapshot evidence is usable for every session. Treat snapshot proof as a runtime capability that must be observed for the specific session being backfilled.

Required capability checks:

  • OpenCode SQLite schema is supported.
  • Session identity includes project id, directory, worktree, and git VCS.
  • Session worktree matches the expected workspace root.
  • Snapshot windows are present and paired.
  • Snapshot git store reader reports the expected shape fingerprint.
  • Snapshot file evidence can be read under the existing limits.
  • The proof path sees the same normalized relative path in reconstruction and snapshot evidence.

Suggested result type:

type SnapshotProofCapability =
  | {
      supported: true
      shapeFingerprint: string
      sessionId: string
      projectId: string
    }
  | {
      supported: false
      code:
        | 'sqlite-schema-unsupported'
        | 'session-identity-missing'
        | 'workspace-mismatch'
        | 'snapshot-window-missing'
        | 'snapshot-store-unsupported'
        | 'snapshot-store-missing'
      diagnostics: string[]
    }

Rules:

  • Unsupported capability returns metadata-only fallback.
  • Unknown capability returns metadata-only fallback.
  • Capability diagnostics may be emitted in shadow.
  • Capability success alone is not proof. It only allows proof attempts.

Mode Behavior Matrix

The mode must determine both proof computation and proof application.

Mode Compute proof? Apply proof? Import changed events? Intended use
off no no no rollback and baseline comparison
shadow yes no no validate proof quality and performance
single-change yes only one unresolved change per path/window yes safe first rollout
full yes one-change and verified chains yes optional later rollout

If implementation makes shadow import different events from off, it is a bug. If implementation makes off compute snapshot proof, it is a performance bug.

Minimum Safe Scope

The first behavior-changing implementation should intentionally support less than the full theoretical feature.

Allowed in first single-change apply mode:

  • OpenCode only.
  • strict-delivery only.
  • One unresolved change for one normalized path inside one snapshot window.
  • Text files within existing size limits.
  • write create when before absence and after text are proven.
  • write modify when before text, after text, and toolpart after content agree.
  • edit modify when oldString occurs exactly once and produces snapshot after.
  • delete when before text and after absence are proven.

Explicitly excluded from first apply mode:

  • Multi-change same-path chains.
  • apply_patch without parsed hunks.
  • rename, chmod, binary patch, submodule, and mode-only changes.
  • Any case requiring current disk as evidence.
  • Any case requiring line-ending normalization.
  • Any case where snapshot evidence exists but operation semantics are unclear.
  • Any old metadata-only event rewrite unless source-key supersede is proven.

This scope is deliberately conservative. The goal is to prove the pipeline, not to maximize warning reduction in the first implementation.

Lowest-Confidence Areas And Mitigations

The implementation should explicitly address the areas below because they are where mistakes are most likely.

Snapshot Window Matching

Risk: OpenCode history can contain several step-start and step-finish records in the same assistant message. Incorrect ordering could attach a toolpart to the wrong snapshot pair.

Mitigation:

  • Keep the existing requirement that a toolpart must match exactly one window.
  • Keep message-local matching. Do not match a toolpart to a window from another assistant message.
  • Add tests where a toolpart is before the first window, after the last window, and inside two overlapping windows.
  • If window order cannot be proven from rawParts, skip the upgrade.

Multi-Change Chains

Risk: several edits to the same file can produce the same final content through more than one path. This is the easiest place to create a convincing but wrong diff.

Mitigation:

  • Implement single-change upgrades first.
  • Gate multi-change chain upgrades behind a narrow helper and dense tests.
  • Do not allow a write in the middle of a reverse chain unless both sides of that write are independently proven.
  • Abort the whole path/window chain on the first ambiguous step.
  • Add a kill switch that can disable multi-change upgrades while leaving single-change upgrades enabled.

Warning Removal

Risk: broad substring filtering can hide warnings that still matter, especially task-boundary or attribution warnings.

Mitigation:

  • Do not remove warnings by broad terms like manual-only alone.
  • Centralize warning predicates and match only known OpenCode baseline/content warning messages.
  • Preserve all warnings that mention attribution, delivery, boundary, confidence, path scope, binary, too-large, truncated, or unavailable snapshot content.
  • Add tests where a warning contains manual-only but is unrelated to baseline proof.

Snapshot Shape Stability

Risk: OpenCode can change SQLite or snapshot git-store shape. A shape change could make old assumptions invalid.

Mitigation:

  • Keep snapshotShapeFingerprint checks visible in diagnostics.
  • Treat unknown or unsupported shapes as metadata-only fallback.
  • Do not add compatibility shims that guess from partial rows.
  • Add an abort condition for a real-data shape mismatch.

Snapshot Store Retention

Risk: OpenCode SQLite can contain snapshot window hashes while the corresponding git-store objects are missing, pruned, moved, or unreadable. The history then looks promising but cannot prove full text.

Mitigation:

  • Treat missing snapshot objects as metadata-only fallback.
  • Keep a distinct diagnostic for missing store object versus unsupported shape.
  • Do not retry by reading current disk.
  • Do not reconstruct from only one side of the snapshot pair.
  • Add a fixture where the window exists but object read fails.

Performance

Risk: reading snapshot blobs for every task can become expensive on large sessions.

Mitigation:

  • Try snapshot proof only for unresolved OpenCode changes in strict delivery.
  • Pass only unresolved touched paths to the snapshot reader unless a same-path chain requires exact already-proven neighbors.
  • Keep the current snapshot read limits.
  • Add timing diagnostics around snapshot proof attempts.
  • Abort rollout if repeated snapshot timeouts appear in smoke data.

Existing Ledger Events

Risk: a task that was previously imported as metadata-only may later be backfilled with better evidence. If importer semantics are append-only, the UI could show duplicates or stale warnings.

Mitigation:

  • Audit importer behavior before enabling upgrades for old data.
  • Prefer stable source-key replacement/superseding if already supported.
  • If replacement is not supported, limit the behavior change to new backfill imports and leave old events untouched.
  • Add repeated-backfill tests before real-data smoke.

Cache Invalidation

Risk: desktop or worker cache may return an old metadata-only bundle after the orchestrator has imported stronger evidence, making validation confusing or causing stale warnings to persist.

Mitigation:

  • Audit all cache layers in Phase 0.
  • Include the OpenCode ledger fingerprint or imported event count in cache invalidation if an existing mechanism supports it.
  • For tests, clear or bypass caches instead of waiting for TTLs.
  • Do not add broad cache busting for all teams. Keep invalidation scoped to the requested team/task.

Partial Success Semantics

Risk: one file in a task upgrades while another remains metadata-only. Bulk review actions might accidentally assume the task is fully safe.

Mitigation:

  • Keep rejectability file-level.
  • Keep task-level warnings if any file remains manual-only or if boundaries are uncertain.
  • Add a mixed-task desktop test.

Feature Flag And Rollback

Add a runtime guard before changing behavior:

type SnapshotProofUpgradeMode = 'off' | 'shadow' | 'single-change' | 'full'

function getSnapshotProofUpgradeMode(env: NodeJS.ProcessEnv): SnapshotProofUpgradeMode {
  const raw = env.OPENCODE_SNAPSHOT_PROOF_UPGRADE
  if (raw === '0' || raw === 'off') {
    return 'off'
  }
  if (raw === 'shadow') {
    return 'shadow'
  }
  if (raw === 'full') {
    return 'full'
  }
  if (raw === 'single-change') {
    return 'single-change'
  }
  return 'shadow'
}

Recommended rollout:

  • Default to shadow during development and first smoke validation.
  • Move to single-change only after shadow stats show expected upgrades with no behavior changes.
  • Move to full only after multi-change chain tests and real-data smoke pass.
  • Keep off available as an emergency rollback path.
  • If full later becomes the default, that should be a separate rollout change after the implementation has passed real-data smoke in explicit full mode.

If the project already has a central feature-flag/env helper for OpenCode runtime behavior, use that instead of adding a new ad-hoc parser.

shadow mode is intentionally different from off:

  • off does not attempt proof.
  • shadow attempts proof and records stats/diagnostics, but returns the original changes to the importer.
  • single-change applies only one-change path/window upgrades.
  • full applies single-change and multi-change chain upgrades.

This gives a low-risk way to validate proof quality and performance on real data before changing review safety.

Architecture

Use this pipeline:

OpenCode SQLite history
  -> toolpart reconstruction
  -> strict delivery attribution
  -> snapshot window grouping
  -> snapshot file read with limits
  -> proof upgrade per path
  -> validate candidate batch
  -> import task-change events

The upgrade belongs in the orchestrator evidence layer, primarily around:

  • OpenCodeChangeEvidenceEnricher.ts
  • OpenCodeSnapshotEvidenceProvider.ts
  • OpenCodeToolpartChangeReconstructor.ts only if a small helper or extra metadata is needed
  • tests near existing OpenCode evidence and ledger bridge tests

Avoid touching desktop review UI for the proof itself. The desktop should only benefit from better imported event content.

Composite Identity Contract

Every full-text proof must be anchored to a composite identity. Do not rely on any single field alone.

Required identity dimensions:

type SnapshotProofIdentity = {
  teamName: string
  taskId: string
  memberName: string
  laneId?: string
  sessionId: string
  parentUserMessageId?: string
  assistantMessageId: string
  sourceMessageId: string
  sourcePartId: string
  toolUseId: string
  relativePath: string
  snapshotWindowId: string
  fromSnapshot: string
  toSnapshot: string
}

Rules:

  • taskId must be canonical, not display-only.
  • memberName, laneId, and sessionId must come from strict delivery or already trusted session records.
  • sourceMessageId must match the snapshot window message id.
  • sourcePartId must be inside the matched window according to the same message's part order.
  • relativePath must be normalized through the existing OpenCode path helpers.
  • fromSnapshot and toSnapshot must be the exact pair used to read file evidence.

If any identity dimension is missing, the default is metadata-only-fallback.

Ordering Contract

Same-path chains are safe only if toolpart order is stable and proven. Use the existing ordering data from OpenCode SQLite. Do not introduce a new sort.

Preferred order keys, in priority order:

  1. messageTimeCreated
  2. messageIdSort
  3. messagePartOrder
  4. partId

Rules:

  • Do not sort only by partId.
  • Do not sort only by timestamp.
  • Do not merge parts from different sourceMessageId values into one chain.
  • If two parts have indistinguishable order, do not upgrade the chain.
  • If raw part order is unavailable, single-change upgrade may still work, but multi-change mode must skip.

Example guard:

function hasStablePartOrder(parts: SourcePartSortKey[]): boolean {
  const seen = new Set<string>()
  for (const part of parts) {
    const key = [
      part.messageTimeCreated,
      part.messageIdSort,
      part.messagePartOrder,
      part.partId,
    ].join('\0')
    if (seen.has(key)) {
      return false
    }
    seen.add(key)
  }
  return true
}

If the real symbol names differ, keep the same invariant.

Cross-Repo Contract Boundaries

This feature crosses the desktop repo and the orchestrator repo. Keep the contract explicit.

Desktop responsibilities:

  • Request OpenCode backfill only when delivery context exists.
  • Keep cache/in-flight dedupe behavior.
  • Render full-text events as diffs.
  • Render metadata-only events as manual-only warnings.
  • Use existing rejectability checks. Do not special-case OpenCode snapshot events in the UI unless a rendering bug is found.

Orchestrator responsibilities:

  • Read OpenCode history and snapshot evidence.
  • Decide whether proof is strong enough to materialize before/after text.
  • Preserve strict delivery attribution.
  • Preserve source import keys.
  • Emit diagnostics explaining why upgrades were skipped.

Shared contract:

type ReviewSafetyContract = {
  sourceImportKey: string
  evidenceProof: OpenCodeEvidenceProof
  beforeContent: string | null
  afterContent: string | null
  beforeState?: { exists?: boolean; sha256?: string; sizeBytes?: number; unavailableReason?: string }
  afterState?: { exists?: boolean; sha256?: string; sizeBytes?: number; unavailableReason?: string }
  warnings?: string[]
}

Safe reject requires a proven historical baseline:

function hasSafeHistoricalBaseline(change: ReviewSafetyContract): boolean {
  if (change.beforeContent !== null) {
    return true
  }
  return change.beforeState?.exists === false && change.afterContent !== null
}

The exact desktop helper may have a different name. The invariant should match this contract.

Apply/Reject Execution Safety Contract

Snapshot proof can make a review event eligible for normal diff rendering and safe reject consideration. It must not bypass current worktree conflict checks.

Review safety and execution safety are different:

  • Review safety answers: "Do we know the historical before/after for this change?"
  • Execution safety answers: "Can we apply or reject this change against the user's current disk state right now?"

This feature only upgrades review safety. It must not weaken execution safety.

Required rules:

  • Rejecting a modify still requires the current file to match the expected after state, or whatever stricter existing conflict check is already used.
  • Rejecting a create still requires the current file to match the created after state before deletion.
  • Rejecting a delete still requires the current absence/after state to match the expected deleted state before restoring before content.
  • Accepting an OpenCode change must not overwrite unrelated current disk edits.
  • Bulk Reject All must keep per-file conflict checks and skip unsafe files.
  • Current disk mismatch should produce a conflict/manual warning, not a proof downgrade.

Suggested predicate split:

function isReviewSafe(change: ReviewSafetyContract): boolean {
  return isSnapshotReviewSafe(change)
}

function canExecuteReject(input: {
  change: ReviewSafetyContract
  currentDiskState: { exists: boolean; sha256?: string }
}): boolean {
  if (!isReviewSafe(input.change)) {
    return false
  }
  // Use the existing project helper here. This sketch only documents that
  // execution safety is a separate check from proof safety.
  return currentDiskMatchesExpectedAfterState(input.change, input.currentDiskState)
}

Do not implement currentDiskMatchesExpectedAfterState ad hoc if the project already has a conflict/rejectability helper. This plan requires preserving that existing behavior.

Data Model Contract

Do not introduce a new task-change event shape unless absolutely necessary. Prefer filling existing fields:

type UpgradedOpenCodeChangeContract = {
  sourceTool: 'write' | 'edit' | 'apply_patch' | 'snapshot_patch'
  sourceImportKey: string
  evidenceProof: 'opencode-snapshot' | 'inverse-edit-chain' | 'inverse-apply-patch-chain' | 'toolpart-chain'
  confidence: 'high' | 'exact'
  beforeContent: string | null
  afterContent: string | null
  beforeState: {
    exists?: boolean
    sha256?: string
    sizeBytes?: number
    unavailableReason?: never
  }
  afterState: {
    exists?: boolean
    sha256?: string
    sizeBytes?: number
    unavailableReason?: never
  }
  snapshotId?: string
  snapshotSource?: 'opencode'
  warnings: string[]
}

Important:

  • Upgraded full-text events should not carry unavailableReason for the before/after side they claim to prove.
  • Metadata-only events may carry unavailableReason, but then they must remain non-rejectable.
  • confidence: 'high' is acceptable for snapshot proof. Use exact only for truly exact toolpart chains that already have local full text.
  • snapshotId is useful provenance, but it is not required for safety if the proof was otherwise validated. Missing snapshotId should be diagnostic.

Storage And Memory Contract

The feature must not create a second blob storage path or keep large full-text content in memory longer than the existing ledger import requires.

Rules:

  • Reuse the existing task-change ledger content storage.
  • Do not duplicate before/after text in diagnostics, stats, or cache keys.
  • Do not add per-team global caches of snapshot file content.
  • Do not store both snapshot raw blobs and task-change blobs unless the existing snapshot reader already does that internally.
  • Apply the per-file and total byte limits before materializing upgraded events.
  • If a file exceeds the limit, store metadata-only state with a reason.
  • If many small files exceed the total byte budget, skip the excess files as metadata-only instead of raising the limit.
  • Stats should count bytes read and files skipped, but never include content.

Suggested stats additions:

type SnapshotProofStorageStats = {
  bytesRead: number
  bytesMaterialized: number
  skippedByByteLimit: number
  skippedByTotalBudget: number
}

Memory pressure is a reason to keep metadata-only fallback. It is not a reason to increase limits or stream partial text into a diff.

Mutation Rules

When an upgrade is skipped, only diagnostics may change. The returned ReconstructedOpenCodeToolChange must preserve:

  • beforeContent
  • afterContent
  • beforeState
  • afterState
  • operation
  • confidence
  • warnings
  • evidenceProof
  • sourceImportKey

When an upgrade succeeds, only these fields may change:

  • beforeContent
  • afterContent
  • beforeState
  • afterState
  • operation, only when the tool semantics and snapshot operation both prove it
  • confidence
  • warnings, only through the central resolved-warning predicate
  • evidenceProof
  • evidenceDiagnostics
  • snapshotId
  • snapshotSource

No other fields should be rewritten by the proof upgrade. This reduces accidental attribution changes.

Proof Levels

Use explicit proof labels and keep their meaning strict.

type OpenCodeEvidenceProof =
  | 'toolpart-chain'
  | 'opencode-snapshot'
  | 'inverse-edit-chain'
  | 'inverse-apply-patch-chain'
  | 'metadata-only-fallback'

Accepted for auto review:

  • toolpart-chain
  • opencode-snapshot
  • inverse-edit-chain
  • inverse-apply-patch-chain

Not accepted for safe reject/apply:

  • metadata-only-fallback
  • current disk only
  • file path metadata only
  • hash without text
  • text without matching task/window/path proof

Proof Decision Tables

Operation State Table

Tool Snapshot before Snapshot after Tool fields Upgrade? Reason
write create absent text content absent or same as after yes create is fully proven
write modify text text content absent or same as after yes before and after are fully proven
write modify unavailable text any no overwrite baseline is unknown
write modify text text content differs from after no toolpart and snapshot disagree
edit modify text text old/new apply exactly once yes edit transition is proven
edit modify text text old/new ambiguous no multiple valid transitions are possible
apply_patch modify text text hunks verify exactly yes patch transition is proven
apply_patch modify text text hunks missing maybe only if the snapshot window has exact single-file proof and no competing changes
delete text absent delete operation yes delete is fully proven
any binary/large/unavailable any any no full text is not available

Confidence Table

Evidence Confidence Safe reject/apply?
toolpart chain with known previous text exact yes
snapshot before/after with verified transition high yes
inverse edit/apply-patch chain with exact single replacements high yes
snapshot path anchor without verified transition medium no
metadata-only toolpart medium no

Do not upgrade confidence from medium to high unless safe reject/apply would also be valid.

Proof State Machine

Implement the upgrade as a state machine, not as scattered conditionals.

original change
  -> not eligible
  -> candidate
  -> snapshot evidence requested
  -> snapshot evidence matched
  -> transition verified
  -> upgraded change
  -> validated import candidate

Failure from any state returns to the original metadata-only change plus diagnostics.

Allowed transitions:

From To Required condition
original not eligible not OpenCode, exact already, flag off, non-strict delivery
original candidate OpenCode unresolved change, strict delivery, flag permits
candidate snapshot requested non-empty touched path set within limits
snapshot requested snapshot matched exactly one window and one file anchor
snapshot matched transition verified operation-specific before/after proof succeeds
transition verified upgraded state hashes match content and warnings stripped safely
upgraded validated import candidate existing candidate validation accepts it

Forbidden transitions:

  • original -> upgraded
  • candidate -> upgraded
  • snapshot requested -> upgraded
  • snapshot matched -> upgraded without operation-specific verification
  • skipped -> upgraded

Suggested type:

type ProofState =
  | { state: 'not-eligible'; reason: SnapshotUpgradeDiagnosticCode }
  | { state: 'candidate'; change: ReconstructedOpenCodeToolChange }
  | { state: 'snapshot-matched'; change: ReconstructedOpenCodeToolChange; anchor: SnapshotFileAnchor }
  | { state: 'transition-verified'; change: ReconstructedOpenCodeToolChange; before: string | null; after: string | null }
  | { state: 'upgraded'; change: ReconstructedOpenCodeToolChange }
  | { state: 'skipped'; reason: SnapshotUpgradeDiagnosticCode; original: ReconstructedOpenCodeToolChange }

The concrete implementation does not have to use this exact union, but it should preserve the same transitions.

Exhaustiveness And Type Safety

Use exhaustive switches for proof decisions, operation handling, and feature flag modes. Do not add a permissive default branch that silently preserves or upgrades without naming the case.

function assertNever(value: never, context: string): never {
  throw new Error(`Unexpected ${context}: ${String(value)}`)
}

function applyProofDecision(
  decision: SnapshotProofDecision,
  original: ReconstructedOpenCodeToolChange,
): ReconstructedOpenCodeToolChange {
  switch (decision.type) {
    case 'upgraded':
      return decision.change
    case 'skipped':
      return original
    default:
      return assertNever(decision, 'snapshot proof decision')
  }
}

If TypeScript cannot prove exhaustiveness, keep the code more explicit rather than using casts. A cast in proof code should be treated as a review smell.

Default Answers To Uncertainty

Use these defaults when implementation hits an unclear case:

Question Default
Is attribution strict enough? no upgrade
Is toolpart order stable? no multi-change upgrade
Does snapshot text prove the operation? no upgrade
Does warning removal feel broad? preserve warning
Is content text or binary? treat as unavailable
Does old event replacement behavior seem unclear? new imports only
Is cache invalidation unclear? do not rely on cache for proof
Does UI need an OpenCode-specific branch? fix shared helper or stop
Is performance impact unclear? keep flag off or single-change only

These defaults are part of the safety design, not temporary indecision.

Formal Proof Predicates

Implement proof decisions through small predicates that can be unit tested directly. Avoid spreading equivalent checks across several branches.

function isReadableFullText(value: string | null | undefined): value is string {
  return typeof value === 'string'
}

function isKnownAbsent(state: { exists?: boolean } | undefined): boolean {
  return state?.exists === false
}

function hasUnavailableReason(state: { unavailableReason?: string } | undefined): boolean {
  return typeof state?.unavailableReason === 'string' && state.unavailableReason.length > 0
}

function isProvenCreate(change: ReviewSafetyContract): boolean {
  return (
    isKnownAbsent(change.beforeState) &&
    isReadableFullText(change.afterContent) &&
    !hasUnavailableReason(change.afterState)
  )
}

function isProvenModify(change: ReviewSafetyContract): boolean {
  return (
    isReadableFullText(change.beforeContent) &&
    isReadableFullText(change.afterContent) &&
    !hasUnavailableReason(change.beforeState) &&
    !hasUnavailableReason(change.afterState)
  )
}

function isProvenDelete(change: ReviewSafetyContract): boolean {
  return (
    isReadableFullText(change.beforeContent) &&
    change.afterState?.exists === false &&
    !hasUnavailableReason(change.beforeState)
  )
}

function isSnapshotReviewSafe(change: ReviewSafetyContract): boolean {
  return (
    change.evidenceProof === 'opencode-snapshot' ||
    change.evidenceProof === 'inverse-edit-chain' ||
    change.evidenceProof === 'inverse-apply-patch-chain' ||
    change.evidenceProof === 'toolpart-chain'
  ) && (
    isProvenCreate(change) ||
    isProvenModify(change) ||
    isProvenDelete(change)
  )
}

The real implementation can use existing helper names, but tests should cover the predicates above as behavior. In particular, unavailableReason on a side that claims full proof should make the change unsafe.

Atomicity And Failure Semantics

Snapshot proof should behave atomically at three levels.

Per change:

  • Success returns one upgraded change.
  • Failure returns the original change unchanged plus diagnostics.
  • No intermediate state should be visible to importer validation.

Per same-path chain:

  • Success upgrades every unresolved change in the chain.
  • Failure upgrades none of the unresolved changes in the chain.
  • Already exact changes may remain exact, but they must not be rewritten by the failed chain attempt.

Per import batch:

  • Candidate validation runs after proof upgrade.
  • If import fails, review safety must not observe a partially imported safe state.
  • Retry uses stable source import keys.

Implementation pattern:

const original = change
const decision = tryUpgradeChange(change)
if (decision.type === 'skipped') {
  diagnostics.push(decision.reason)
  return original
}
const upgraded = decision.change
if (!isSnapshotReviewSafe(upgraded)) {
  diagnostics.push('snapshot-upgrade-skipped/postcondition-failed')
  return original
}
return upgraded

Do not mutate change in place before postconditions pass.

Postconditions

Every successful upgrade must satisfy these postconditions:

function assertUpgradePostconditions(input: {
  original: ReconstructedOpenCodeToolChange
  upgraded: ReconstructedOpenCodeToolChange
}): boolean {
  const { original, upgraded } = input
  return (
    original.sourceImportKey === upgraded.sourceImportKey &&
    original.taskId === upgraded.taskId &&
    original.teamName === upgraded.teamName &&
    original.memberName === upgraded.memberName &&
    original.sessionId === upgraded.sessionId &&
    original.sourcePartId === upgraded.sourcePartId &&
    original.sourceMessageId === upgraded.sourceMessageId &&
    original.relativePath === upgraded.relativePath &&
    upgraded.evidenceProof !== 'metadata-only-fallback' &&
    isSnapshotReviewSafe(upgraded)
  )
}

If a postcondition fails, keep the original change and emit a diagnostic. A postcondition failure is a bug in proof logic, not a reason to relax safety.

Runtime Assertion Policy

Assertions should catch programmer errors without making production data unsafe.

Rules:

  • In tests, postcondition failures should fail loudly.
  • In production backfill, postcondition failures should skip the upgrade, preserve the original metadata-only change, and emit a diagnostic.
  • Assertions must never catch an error and continue with upgraded content.
  • Assertions must not include file content in thrown messages.
  • Assertions should include stable identifiers such as task id, source part id, source import key, and normalized relative path.

Suggested pattern:

function enforceUpgradePostconditions(input: {
  original: ReconstructedOpenCodeToolChange
  upgraded: ReconstructedOpenCodeToolChange
  diagnostics: string[]
}): ReconstructedOpenCodeToolChange {
  if (assertUpgradePostconditions(input)) {
    return input.upgraded
  }
  input.diagnostics.push(
    `snapshot-upgrade-skipped/postcondition-failed:${input.original.sourceImportKey}`,
  )
  return input.original
}

Do not use runtime assertions to justify looser proof predicates. Assertions are a last guard, not the proof itself.

Implementation Phases

Phase 0 - Audit Contracts Before Behavior Changes

This phase should be completed before any runtime behavior change.

Audit:

  • Ledger import behavior for duplicate sourceImportKey.
  • Review bundle dedupe behavior.
  • Existing rejectability helper.
  • Existing OpenCode backfill cache and in-flight behavior.
  • materializeMetadataOnlyChanges serialization of proof fields.
  • Current real-data snapshot diagnostics for a few OpenCode teams.

Deliverable:

Contract audit:
- sourceImportKey duplicate policy: replace | supersede | append | unknown
- review bundle dedupe key: ...
- rejectability helper: ...
- metadata materialization preserves proof fields: yes | no
- observed snapshot shape fingerprint: ...
- can proceed to Phase 1: yes | no

If any field is unknown, do not proceed to behavior changes.

Phase 1 - Add Targeted Diagnostics

Add diagnostics that explain why a metadata-only change was not upgraded. This makes real-data validation much easier.

Examples:

  • snapshot-upgrade-skipped/no-window
  • snapshot-upgrade-skipped/ambiguous-window
  • snapshot-upgrade-skipped/no-file-anchor
  • snapshot-upgrade-skipped/binary
  • snapshot-upgrade-skipped/too-large
  • snapshot-upgrade-skipped/path-chain-ambiguous
  • snapshot-upgrade-skipped/toolpart-after-mismatch
  • snapshot-upgrade-skipped/current-disk-not-proof
  • snapshot-upgrade-skipped/strict-delivery-required
  • snapshot-upgrade-skipped/unsupported-snapshot-shape
  • snapshot-upgrade-skipped/warning-preserved
  • snapshot-upgrade-skipped/feature-flag-off

These diagnostics should not be user-noisy by default, but they should be available in backfill result diagnostics and tests.

Diagnostics should be structured internally even if the public result remains a string array:

type SnapshotUpgradeDiagnosticCode =
  | 'snapshot-upgrade-skipped/no-window'
  | 'snapshot-upgrade-skipped/ambiguous-window'
  | 'snapshot-upgrade-skipped/no-file-anchor'
  | 'snapshot-upgrade-skipped/operation-mismatch'
  | 'snapshot-upgrade-skipped/toolpart-after-mismatch'
  | 'snapshot-upgrade-skipped/path-chain-ambiguous'
  | 'snapshot-upgrade-skipped/strict-delivery-required'
  | 'snapshot-upgrade-skipped/unsupported-snapshot-shape'

function pushSnapshotDiagnostic(
  diagnostics: string[],
  code: SnapshotUpgradeDiagnosticCode,
  detail: string,
): void {
  diagnostics.push(`${code}: ${detail}`)
}

Using a small typed union makes it harder to accidentally invent inconsistent diagnostics throughout the proof code.

Phase 2 - Make Upgrade Eligibility Explicit

Add a helper that decides whether a change needs snapshot proof. It should skip already exact full-text changes.

function needsSnapshotProof(change: ReconstructedOpenCodeToolChange): boolean {
  if (change.evidenceProof === 'toolpart-chain') {
    return false
  }
  if (change.beforeContent !== null && change.afterContent !== null) {
    return false
  }
  if (change.beforeState?.exists === false && change.afterContent !== null) {
    return false
  }
  if (change.beforeContent !== null && change.afterState?.exists === false) {
    return false
  }
  return (
    change.sourceTool === 'write' ||
    change.sourceTool === 'edit' ||
    change.sourceTool === 'apply_patch' ||
    change.sourceTool === 'snapshot_patch'
  )
}

Use this helper before expensive snapshot work where possible:

const changesNeedingProof = params.changes.filter(needsSnapshotProof)
if (changesNeedingProof.length === 0) {
  return result
}

Important: this helper should reduce work, not reduce safety. If in doubt, include a change in snapshot proof attempt and let the proof logic reject it.

Add a second helper for safety eligibility. It must be stricter than needsSnapshotProof.

function mayUseSnapshotProof(input: {
  attributionMode: OpenCodeLedgerAttributionMode
  change: ReconstructedOpenCodeToolChange
  mode: SnapshotProofUpgradeMode
}): boolean {
  if (input.mode === 'off') {
    return false
  }
  if (input.attributionMode !== 'strict-delivery') {
    return false
  }
  if (!needsSnapshotProof(input.change)) {
    return false
  }
  return input.change.attributionMethod === 'delivery-ledger-taskrefs'
}

This keeps "should we spend time trying?" separate from "is this proof allowed to affect review safety?".

Add a third helper for apply eligibility. shadow may compute proof, but must not apply it.

function mayApplySnapshotProof(input: {
  mode: SnapshotProofUpgradeMode
  changeCountForPathWindow: number
}): boolean {
  if (input.mode === 'off' || input.mode === 'shadow') {
    return false
  }
  if (input.mode === 'single-change') {
    return input.changeCountForPathWindow === 1
  }
  return input.mode === 'full'
}

The call site should look structurally like this:

const decision = tryComputeSnapshotProof(change)
stats.record(decision)
if (!mayApplySnapshotProof({ mode, changeCountForPathWindow })) {
  return originalChange
}
return decision.type === 'upgraded' ? decision.change : originalChange

This prevents diagnostics-only validation from accidentally changing imported review events.

Also make the final proof decision return a typed result instead of a nullable change. Nullable returns tend to hide why an upgrade failed.

type SnapshotProofDecision =
  | {
      type: 'upgraded'
      change: ReconstructedOpenCodeToolChange
      proof: Exclude<OpenCodeEvidenceProof, 'metadata-only-fallback'>
    }
  | {
      type: 'skipped'
      reason: SnapshotUpgradeDiagnosticCode
      preserveOriginal: true
    }

function preserveOriginal(
  reason: SnapshotUpgradeDiagnosticCode,
): SnapshotProofDecision {
  return { type: 'skipped', reason, preserveOriginal: true }
}

Callers should be forced to handle both branches. A skipped decision must return the original change unchanged except for diagnostics collected outside the change object.

Phase 3 - Strengthen Snapshot Anchor Matching

Snapshot anchors should be accepted only when all these conditions hold:

  1. The change belongs to a strict delivery session.
  2. The source toolpart belongs to exactly one snapshot window.
  3. The snapshot window belongs to the same OpenCode message.
  4. The normalized touched path is inside the session worktree.
  5. The snapshot reader returns an anchor for the exact relative path.
  6. Text content is full text, not binary, and within existing limits.
  7. The file operation is compatible with the tool operation.

Add a small validation helper:

type SnapshotAnchorValidation =
  | { ok: true }
  | { ok: false; reason: string }

function validateSnapshotAnchorForChange(input: {
  change: ReconstructedOpenCodeToolChange
  anchor: SnapshotFileAnchor | undefined
}): SnapshotAnchorValidation {
  const { change, anchor } = input
  if (!anchor) {
    return { ok: false, reason: 'snapshot-upgrade-skipped/no-file-anchor' }
  }

  if (change.operation === 'create' && anchor.operation !== 'create') {
    return { ok: false, reason: 'snapshot-upgrade-skipped/operation-mismatch' }
  }

  if (change.operation === 'delete' && anchor.operation !== 'delete') {
    return { ok: false, reason: 'snapshot-upgrade-skipped/operation-mismatch' }
  }

  if (change.operation === 'modify' && anchor.operation === 'create') {
    return { ok: false, reason: 'snapshot-upgrade-skipped/operation-mismatch' }
  }

  return { ok: true }
}

Do not rely only on operation matching. It is a gate, not proof.

The validation helper should also distinguish these concepts:

  • anchor operation: what the snapshot diff says happened to the file.
  • tool operation: what the reconstructed toolpart thinks happened.
  • review operation: what the imported task-change event will expose.

If these disagree, do not silently rewrite the operation unless the snapshot transition and tool semantics both prove the new value. For example, a write with no previous baseline may be reconstructed as modify; if snapshot says create and before is absent, it may be upgraded to create. A reconstructed edit must not become create or delete.

Add source identity checks before path checks:

function isSameSourceWindow(input: {
  change: ReconstructedOpenCodeToolChange
  windowMessageId: string
  windowId: string
  matchedWindowIds: string[]
}): boolean {
  return (
    input.change.sourceMessageId === input.windowMessageId &&
    input.matchedWindowIds.length === 1 &&
    input.matchedWindowIds[0] === input.windowId
  )
}

The exact data shape can differ, but the check must prove message-local and single-window identity before using the snapshot file anchor.

Phase 4 - Upgrade Single-Change Snapshot Proof

For one change touching a file within a snapshot window, upgrade directly if the snapshot anchor proves the full transition.

Rules:

  • write create:
    • accept when beforeState.exists === false and afterContent is full text.
    • if toolpart content exists, require it to equal snapshot after content.
  • write modify:
    • accept when both snapshot before and after are full text.
    • if toolpart content exists, require it to equal snapshot after content.
  • edit modify:
    • accept when both snapshot before and after are full text.
    • require applying oldString -> newString to before to equal after, unless the edit came from a verified snapshot patch.
  • apply_patch:
    • accept when snapshot before and after are full text.
    • if parsed hunks exist, verify before-to-after application or inverse chain.
  • delete:
    • accept when snapshot before is full text and snapshot after is absent.

For phase 4, do not support "maybe" apply_patch upgrades without parsed hunks. Keep those for phase 5 or leave them metadata-only. This reduces the first behavior change to the most provable cases.

Example helper:

function applyEditExactlyOnce(input: {
  before: string
  oldString: string | undefined
  newString: string | undefined
}): string | null {
  if (
    typeof input.oldString !== 'string' ||
    typeof input.newString !== 'string' ||
    input.oldString === input.newString
  ) {
    return null
  }
  if (countOccurrences(input.before, input.oldString) !== 1) {
    return null
  }
  return input.before.replace(input.oldString, input.newString)
}

Example upgrade:

function upgradeEditFromSnapshot(input: {
  change: ReconstructedOpenCodeToolChange
  anchor: SnapshotFileAnchor
}): ReconstructedOpenCodeToolChange | null {
  const before = input.anchor.beforeContent
  const after = input.anchor.afterContent
  if (typeof before !== 'string' || typeof after !== 'string') {
    return null
  }

  const applied = applyEditExactlyOnce({
    before,
    oldString: input.change.oldString,
    newString: input.change.newString,
  })
  if (applied !== after) {
    return null
  }

  return {
    ...input.change,
    beforeContent: before,
    afterContent: after,
    beforeState: contentStateForText(before),
    afterState: contentStateForText(after),
    confidence: 'high',
    evidenceProof: 'opencode-snapshot',
    snapshotId: input.anchor.snapshotId,
    snapshotSource: input.anchor.snapshotId ? 'opencode' : undefined,
    warnings: stripManualOnlyWarnings(input.change.warnings, input.anchor.warnings),
  }
}

Add a generic transition verifier so write/edit/apply_patch decisions share the same state checks:

type VerifiedTransition =
  | { ok: true; beforeContent: string | null; afterContent: string | null; operation: 'create' | 'modify' | 'delete' }
  | { ok: false; reason: SnapshotUpgradeDiagnosticCode }

function verifySnapshotTransition(input: {
  change: ReconstructedOpenCodeToolChange
  anchor: SnapshotFileAnchor
}): VerifiedTransition {
  const before = input.anchor.beforeContent
  const after = input.anchor.afterContent

  if (input.anchor.operation === 'create') {
    return typeof after === 'string'
      ? { ok: true, beforeContent: null, afterContent: after, operation: 'create' }
      : { ok: false, reason: 'snapshot-upgrade-skipped/no-file-anchor' }
  }

  if (input.anchor.operation === 'delete') {
    return typeof before === 'string'
      ? { ok: true, beforeContent: before, afterContent: null, operation: 'delete' }
      : { ok: false, reason: 'snapshot-upgrade-skipped/no-file-anchor' }
  }

  if (typeof before !== 'string' || typeof after !== 'string') {
    return { ok: false, reason: 'snapshot-upgrade-skipped/no-file-anchor' }
  }

  return { ok: true, beforeContent: before, afterContent: after, operation: 'modify' }
}

This function should not be the final proof for edit or apply_patch. It only proves that snapshot text exists for the operation state.

Before returning an upgraded change, verify the emitted states match the emitted content:

function assertStateMatchesContent(input: {
  beforeContent: string | null
  afterContent: string | null
  beforeState: ReconstructedOpenCodeToolChange['beforeState']
  afterState: ReconstructedOpenCodeToolChange['afterState']
}): boolean {
  if (input.beforeContent !== null) {
    const expected = contentStateForText(input.beforeContent)
    if (input.beforeState?.sha256 !== expected.sha256) {
      return false
    }
  }
  if (input.afterContent !== null) {
    const expected = contentStateForText(input.afterContent)
    if (input.afterState?.sha256 !== expected.sha256) {
      return false
    }
  }
  return true
}

If this assertion fails, keep the original metadata-only change and emit a diagnostic. Do not import inconsistent state/content.

Phase 5 - Upgrade Multi-Change Same-Path Chains

When several changes touch the same file inside one snapshot window, only upgrade if the whole chain verifies.

Algorithm:

  1. Start from snapshot afterContent.
  2. Walk changes for that path in reverse source order.
  3. For each change:
    • if it already has full before/after, require its after to equal the cursor.
    • for edit, reverse newString -> oldString exactly once.
    • for apply_patch, reverse parsed hunks exactly once.
    • for write, only allow it as the first/oldest operation if snapshot before matches the previous state or known absent state.
  4. If any step is ambiguous, stop and keep all unresolved warnings.
  5. If the reverse chain reaches snapshot beforeContent, materialize replacements for every unresolved change in the chain.

Pseudo-code:

function upgradeSamePathChain(input: {
  changes: ReconstructedOpenCodeToolChange[]
  anchor: SnapshotFileAnchor
  diagnostics: string[]
}): Map<string, ReconstructedOpenCodeToolChange> {
  const replacements = new Map<string, ReconstructedOpenCodeToolChange>()
  let cursor = input.anchor.afterContent

  if (typeof cursor !== 'string') {
    input.diagnostics.push('snapshot-upgrade-skipped/no-after-anchor')
    return replacements
  }

  for (let index = input.changes.length - 1; index >= 0; index -= 1) {
    const change = input.changes[index]
    if (!change) {
      continue
    }

    const upgraded = reverseOneChangeFromAfter({ change, after: cursor, anchor: input.anchor })
    if (!upgraded) {
      input.diagnostics.push(`snapshot-upgrade-skipped/path-chain-ambiguous:${change.relativePath}`)
      return new Map()
    }

    replacements.set(change.sourceImportKey, upgraded.change)
    cursor = upgraded.beforeContent
  }

  if (typeof input.anchor.beforeContent === 'string' && cursor !== input.anchor.beforeContent) {
    input.diagnostics.push('snapshot-upgrade-skipped/path-chain-boundary-mismatch')
    return new Map()
  }

  return replacements
}

This is the highest-risk section. Keep tests dense here.

If there is any schedule pressure, defer this whole phase. Single-change upgrades are enough to reduce many warnings and are much less risky.

Additional multi-change restrictions:

  • Do not cross snapshot-window boundaries.
  • Do not cross assistant-message boundaries.
  • Do not cross task delivery boundaries.
  • Do not mix changes with different sourceMessageId.
  • Do not mix changes with different normalized relativePath.
  • Do not include changes whose source import key is missing or duplicated.
  • Do not upgrade a chain if any change in the path has an operation that cannot be reversed from the current cursor.
  • Do not upgrade if the final reverse cursor does not exactly equal snapshot beforeContent for modify/delete, or known absence for create.

Add this explicit guard:

function assertSinglePathWindowChain(input: {
  changes: ReconstructedOpenCodeToolChange[]
}): boolean {
  const relativePaths = new Set(input.changes.map(change => change.relativePath))
  const messageIds = new Set(input.changes.map(change => change.sourceMessageId))
  const importKeys = new Set(input.changes.map(change => change.sourceImportKey))
  return (
    relativePaths.size === 1 &&
    messageIds.size === 1 &&
    importKeys.size === input.changes.length
  )
}

Phase 6 - Warning Stripping Must Be Conservative

Only remove warnings that are made false by the new proof.

Safe to remove after verified before/after:

  • OpenCode edit was captured without a proven full-text baseline; apply/reject is manual-only.
  • OpenCode write overwrote an existing file before the bridge had a known baseline; reject is manual-only.
  • OpenCode apply_patch was captured without full before/after text; review is manual-only.
  • OpenCode toolpart content was unavailable or too large; review is manual-only.
  • full review depends on snapshot evidence

Do not remove:

  • attribution warnings
  • low confidence task boundary warnings
  • delivery context warnings
  • path outside session directory warnings
  • large/binary warnings for other files
  • warnings attached to unrelated changes in the same task
  • snapshot unavailable warnings attached to the same file
  • any warning whose text is not in the known resolved warning predicate

Example:

function isResolvedByFullTextProof(warning: string): boolean {
  return (
    warning === 'OpenCode edit was captured without a proven full-text baseline; apply/reject is manual-only.' ||
    warning === 'OpenCode write overwrote an existing file before the bridge had a known baseline; reject is manual-only.' ||
    warning === 'OpenCode apply_patch was captured without full before/after text; review is manual-only.' ||
    warning === 'OpenCode toolpart content was unavailable or too large; review is manual-only.' ||
    warning.includes('full review depends on snapshot evidence')
  )
}

function stripManualOnlyWarnings(
  existing: string[] | undefined,
  snapshotWarnings: string[] | undefined,
): string[] {
  return [
    ...(existing ?? []).filter(warning => !isResolvedByFullTextProof(warning)),
    ...(snapshotWarnings ?? []),
  ].filter(Boolean)
}

If snapshot warnings contain unavailable content for this exact file, the change should probably not have been upgraded. Add a test for that.

Phase 7 - Preserve Performance Limits And Add Budgets

Do not increase these limits by default:

  • maxFiles: 100
  • maxBytesPerTextFile: 1024 * 1024
  • maxTotalBytes: 4 * 1024 * 1024
  • timeoutMs: 3000

Additional guard:

const unresolved = params.changes.filter(needsSnapshotProof)
if (unresolved.length === 0) {
  return result
}

const touchedRelativePaths = [...new Set(unresolved.map(change => change.relativePath))]

Do not pass already exact changes into touchedRelativePaths unless needed for chain verification. This keeps snapshot reads narrow.

Add explicit performance budgets:

  • A no-op backfill with no unresolved OpenCode changes should not invoke the snapshot reader.
  • A strict-delivery task with one unresolved file should read one touched path.
  • Snapshot proof attempt should record elapsed time in diagnostics when it exceeds 500 ms.
  • More than two snapshot timeouts in a real-data smoke run blocks rollout.
  • The broad real-data smoke should not increase total runtime by more than 10% compared with the baseline measured before the change.

Implementation sketch:

const startedAt = performance.now()
const snapshotResult = await readSnapshotEvidence()
const elapsedMs = performance.now() - startedAt
if (elapsedMs > 500) {
  diagnostics.push(`snapshot-upgrade-slow: ${Math.round(elapsedMs)}ms`)
}

Use the local runtime timing primitive already used in the orchestrator if performance.now() is not available in that module.

Add a resource envelope for one backfill call:

type SnapshotProofResourceEnvelope = {
  maxSnapshotReadsPerBackfill: 10
  maxTouchedPathsPerRead: 100
  maxBytesPerTextFile: 1024 * 1024
  maxTotalBytesPerRead: 4 * 1024 * 1024
  maxElapsedMsPerRead: 3000
}

Do not add hidden retries that can multiply these limits. One failed or timed out snapshot read should produce diagnostics and preserve metadata-only changes.

Phase 8 - Idempotency And Existing Ledger Events

The upgrade may change the materialized content for a source event that was previously imported as metadata-only. That needs a clear policy.

Preferred policy:

  1. Keep sourceImportKey stable.
  2. Let the existing ledger importer treat the upgraded event as the same source event, not a new file change.
  3. If the importer is append-only and cannot update a previous event safely, do not attempt to rewrite old ledger data in this feature.
  4. For new tasks, the upgraded evidence should be imported on the first backfill.
  5. For old tasks, a re-backfill can show better evidence only if the existing ledger/import layer already supports replacing or superseding by source key.

Add a test for repeated backfill. It should not duplicate files in the review bundle.

Phase 9 - Desktop Contract Validation

This phase should not add new UI behavior unless tests expose a bug. It validates that the upgraded events are already consumed safely.

Checklist:

  • Full-text upgraded OpenCode event renders through the same path as Codex full-text diffs.
  • Metadata-only OpenCode event still renders the warning banner.
  • Mixed full-text and metadata-only task keeps per-file rejectability.
  • Reject All skips metadata-only files.
  • Current disk preview remains read-only context.
  • Task summary warnings remain visible if attribution or boundary warnings remain.

If any item fails, fix the shared review safety helper rather than adding a separate OpenCode-specific branch in the UI.

Observability And Metrics

Add counters to diagnostics or existing debug output. They should be cheap and safe to expose in test logs.

Suggested counters:

type SnapshotProofStats = {
  attemptedChanges: number
  upgradedChanges: number
  skippedChanges: number
  skippedByReason: Record<string, number>
  snapshotReadCount: number
  snapshotReadTimeouts: number
  snapshotReadElapsedMs: number
  touchedPathCount: number
  exactToolpartChainCount: number
  metadataOnlyFallbackCount: number
}

Use these stats in smoke output:

OpenCode snapshot proof:
- attempted: 12
- upgraded: 7
- skipped: 5
- skipped/no-window: 2
- skipped/path-chain-ambiguous: 1
- skipped/too-large: 2
- snapshot reads: 3
- snapshot read time: 184ms

Metrics must not include file content or secrets. Paths are acceptable only if the existing diagnostics already expose paths in the same context.

Deterministic Output Comparison

Use deterministic fingerprints to compare off, shadow, and apply modes. This catches accidental behavior changes that are hard to see in UI screenshots.

Suggested fingerprint input:

type ReviewBundleFingerprintInput = Array<{
  taskId: string
  relativePath: string
  sourceImportKey: string
  evidenceProof: string | undefined
  operation: string
  beforeSha256?: string
  afterSha256?: string
  warningCount: number
  rejectable: boolean
}>

Rules:

  • off and shadow fingerprints must match except for diagnostics/stats.
  • single-change may change OpenCode entries only.
  • full may change OpenCode entries only.
  • Non-OpenCode entries must have identical fingerprints in every mode.
  • Fingerprints must not include raw file content.

If a mode comparison fails, inspect the structured diff before looking at UI.

Cache And Re-Backfill Policy

The safest initial policy is:

  • New backfills may import upgraded proof.
  • Existing metadata-only events should not be rewritten unless the current importer already has a proven source-key replacement/supersede path.
  • The desktop cache should not be globally invalidated.
  • A task-specific refresh may re-read after successful OpenCode import.
  • If cache behavior is unclear, tests should bypass cache and the rollout should leave old events unchanged.

Pseudo-policy:

type ExistingEventPolicy = 'new-imports-only' | 'supersede-by-source-key'

function chooseExistingEventPolicy(audit: {
  importerSupersedesBySourceKey: boolean
  reviewBundleDedupesBySourceKey: boolean
}): ExistingEventPolicy {
  return audit.importerSupersedesBySourceKey && audit.reviewBundleDedupesBySourceKey
    ? 'supersede-by-source-key'
    : 'new-imports-only'
}

Do not create a third policy that appends upgraded duplicates and relies on UI filtering to hide the old event.

Rollback Runbook

Rollback must be possible without data repair.

Immediate rollback:

OPENCODE_SNAPSHOT_PROOF_UPGRADE=off

Expected behavior after rollback:

  • New OpenCode backfills return to previous metadata-only/manual-only behavior for cases without exact toolpart chains.
  • Existing already-imported upgraded events remain valid historical full-text events. Do not delete them as part of rollback.
  • No new upgraded events should be imported while the flag is off.
  • Desktop review should continue to render previously imported full-text events.

If rollback is needed because upgraded duplicates were imported:

  1. Do not add renderer-side filtering as a permanent fix.
  2. Identify whether duplicates share sourceImportKey.
  3. Fix importer/source-key dedupe.
  4. Add a regression test with the duplicated event fixture.
  5. Only then consider a one-off ledger cleanup, and only with explicit user approval.

If rollback is needed because of performance:

  1. Keep diagnostics.
  2. Disable proof upgrade.
  3. Preserve exact toolpart-chain behavior.
  4. Inspect snapshot read counters and touched path counts.
  5. Re-enable only after reducing reads, not after raising limits.

Implementation Slices

Prefer these slices even if the work lands in one PR. Each slice should compile and have focused tests before the next slice starts.

  1. Diagnostics only:
    • Add typed diagnostic codes.
    • Add stats object.
    • No behavior change.
  2. Eligibility only:
    • Add feature flag parser.
    • Add needsSnapshotProof and mayUseSnapshotProof.
    • Prove off mode has no behavior change.
  3. Shadow proof:
    • Compute proof decisions and stats.
    • Return original changes to importer.
    • Compare shadow and off outputs.
  4. Single-change proof:
    • Implement formal predicates.
    • Implement create/modify/delete proof for one path/window/change.
    • Keep multi-change groups skipped.
  5. Import/idempotency validation:
    • Verify source-key dedupe or choose new-imports-only.
    • Add repeated-backfill tests.
  6. Desktop validation:
    • Verify shared rejectability consumes upgraded events safely.
    • No OpenCode-specific renderer branch unless a shared helper bug is found.
  7. Multi-change proof:
    • Implement only after ordering contract tests pass.
    • Keep behind full.
  8. Default enablement:
    • Enable single-change only after real-data smoke.
    • Enable full only in a separate rollout decision.

Stop points:

  • It is acceptable to stop after slice 3 and ship only shadow.
  • It is acceptable to stop after slice 4 and ship only single-change.
  • It is acceptable to stop after diagnostics if real data shows unsupported snapshot shape.
  • It is not acceptable to ship multi-change proof without real or synthetic chain coverage.

Definition Of Done By Mode

off

  • No behavior change from current metadata-only/full-text decisions.
  • Diagnostics may mention that the feature is disabled.
  • Tests prove no upgraded event appears in this mode.

shadow

Required before any apply mode can be default:

  • Proof attempts run for eligible OpenCode changes.
  • Importer receives the original change list.
  • Stats include would-upgrade and skipped counts.
  • No review diff, rejectability, warning, or file count changes.
  • Real-data smoke shows non-OpenCode teams unchanged.
  • Performance budget passes while proof is computed but not applied.

single-change

Required before this mode can be default:

  • Only one unresolved change for a path/window can upgrade.
  • Multi-change path/window groups are skipped with diagnostics.
  • write create/modify, edit modify, and delete cases have positive and negative tests.
  • Non-OpenCode teams are unchanged in real-data smoke.
  • Metadata-only count for OpenCode tasks decreases or stays equal.
  • No duplicate review rows after repeated backfill.

full

Required before this mode can be default:

  • Every known unknown that blocks full mode is resolved.
  • Same-path multi-change order is proven by tests.
  • Chain upgrades are all-or-nothing for unresolved changes.
  • Real-data smoke includes at least one actual multi-change chain or a synthetic fixture with equivalent shape.
  • full mode can be disabled without changing code.
  • A separate rollout decision enables full; it must not become default as a side effect of implementing single-change mode.

Edge Case Matrix

Attribution and Task Boundaries

  • No delivery context:
    • Do not run strict snapshot upgrade.
    • Keep existing backfill skipped behavior.
  • Delivery context exists but does not include the requested task:
    • Keep no-attribution behavior. Do not use compatible fallback for safe full-text.
  • Compatible attribution mode:
    • Do not upgrade to auto-safe full text.
    • Reason: task ownership is not strict enough.
  • Missing task start boundary:
    • Snapshot proof may prove file content, but task boundary warning remains.
  • Estimated end boundary:
    • Snapshot proof may prove file content, but boundary warning remains.
  • Same OpenCode session contains several tasks:
    • Only strict delivery records for the requested task are eligible.
  • Same member touches same file for two tasks:
    • Do not merge changes across delivery windows.
  • Multiple members share an OpenCode profile:
    • Require the delivery record member/lane/session match. Do not trust profile alone.
  • Runtime delivery ledger was reset after launch:
    • No strict delivery context means no safe upgrade. Keep warnings.
  • Delivery record has task refs but missing observed assistant message:
    • Do not use message-local snapshot proof unless the toolpart can still be tied to the delivered prompt through existing strict delivery matching.
  • Delivery record has a pre-prompt cursor but no post-prompt cursor:
    • Keep strict-delivery matching conservative. Do not widen to the whole session.
  • Task display id matches but canonical task id differs:
    • Use canonical task id for safe upgrades.

Snapshot Windows

  • No snapshot windows:
    • Keep metadata-only warning.
  • Toolpart outside window:
    • Keep metadata-only warning.
  • Toolpart matches multiple windows:
    • Keep metadata-only warning.
  • Window has before hash but no after hash:
    • Keep metadata-only warning.
  • Window has after hash but no before hash:
    • Allow create only if file absence is explicitly proven. Otherwise keep warning.
  • Snapshot diff contains the path but operation is unknown:
    • Keep metadata-only warning.
  • Snapshot diff includes more changed files than reconstructed toolparts:
    • Upgrade only exact reconstructed paths. Add diagnostic for extra snapshot paths.
  • Snapshot diff misses a reconstructed path:
    • Keep that path metadata-only.
  • OpenCode SQLite changed during read:
    • Existing transaction snapshot is okay, but add diagnostic.
  • OpenCode schema changed:
    • Treat as unsupported history shape, no upgrade.
  • Snapshot git store object is missing:
    • Keep metadata-only warning and include the store diagnostic.
  • Snapshot git store read times out:
    • Keep metadata-only warning and include timeout diagnostic.
  • Snapshot window hashes exist but git-store object is pruned:
    • Keep metadata-only warning and include retention diagnostic.
  • Snapshot window hashes exist but point to an object from a different project:
    • Treat as workspace mismatch and skip.
  • Snapshot window contains no reconstructed changes after path filtering:
    • Do not read files for that window.
  • Snapshot reader returns duplicate entries for one relative path:
    • Treat as ambiguous and skip that path.
  • Snapshot reader returns content for a path with different casing:
    • Use existing normalized comparison key. If identity is ambiguous, skip.
  • Snapshot window is valid but OpenCode part JSON was truncated by our reader cap:
    • Treat affected toolparts as metadata-only. Do not combine partial part data with snapshot proof.
  • Snapshot window contains changes from a tool type not modeled by this plan:
    • Keep those changes metadata-only until the tool type has explicit tests.

File Content

  • Text over size limit:
    • Keep warning with too-large.
  • Binary or null-byte:
    • Keep warning with binary.
  • Empty file:
    • Valid text content. Do not confuse empty string with unavailable.
  • Missing file after delete:
    • Valid delete if before content is known.
  • Missing file before create:
    • Valid create if after content is known.
  • File exists before create operation:
    • Operation mismatch, no upgrade.
  • File absent after modify operation:
    • Operation mismatch, no upgrade.
  • Content normalizes differently by line endings:
    • Do not normalize for proof. Exact byte-equivalent UTF-8 text comparison is required.
  • Content has invalid UTF-8:
    • Treat as binary/unavailable.
  • Generated/minified text below limit:
    • It can be upgraded if full text is available, but review UI may still choose to collapse display.
  • File mode-only changes:
    • Do not create a text diff upgrade unless text before/after also changed or mode changes are explicitly modeled.
  • Very small binary file:
    • Size does not make it text. Binary detection still wins.
  • UTF-16 or other non-UTF-8 text:
    • Treat as unavailable unless the existing snapshot reader explicitly decodes and hashes the exact same text representation used by review events.
  • Secrets in file content:
    • Do not log content in diagnostics. Existing ledger storage rules apply to before/after blobs.
  • Git LFS pointer file:
    • Treat the pointer text as the file content if that is what the snapshot contains. Do not dereference LFS objects.
  • Sparse checkout missing working-tree file:
    • Irrelevant for proof. Snapshot evidence may still be valid, but execution safety must handle current disk conflict separately.
  • Submodule path:
    • Do not read inside submodule git data unless existing snapshot reader explicitly models submodules. Treat as metadata-only otherwise.
  • Permission denied reading snapshot object:
    • Keep metadata-only warning and include a permission diagnostic.

Edit Semantics

  • oldString === newString:
    • Skip as no-op as current code does.
  • oldString missing:
    • Cannot prove edit from toolpart alone.
  • newString missing:
    • Cannot prove edit from toolpart alone.
  • oldString appears twice:
    • No upgrade unless snapshot chain proves exact transition through another trusted source.
  • newString appears twice when reversing:
    • No inverse upgrade.
  • Replacement creates same final content through multiple possible paths:
    • No upgrade.
  • replaceAll or multi-replacement edit shape appears:
    • Do not upgrade until that tool shape is explicitly parsed and tested.
  • Edit tool reports success but snapshot before does not contain oldString:
    • No upgrade.
  • Edit tool reports success but snapshot after does not contain newString:
    • No upgrade unless the replacement legitimately deletes the string and the exact transition verifies.
  • Empty oldString:
    • Do not upgrade. Empty search strings are ambiguous.
  • Empty newString:
    • Valid deletion only when oldString occurs exactly once and snapshot after equals the deletion result.
  • Overlapping replacements:
    • Do not upgrade unless exact before-to-after application has one valid path.

Write Semantics

  • Write creates new file:
    • Upgrade only if before absent and after content known.
  • Write overwrites existing file:
    • Upgrade only if snapshot before and snapshot after are known.
  • Toolpart content differs from snapshot after:
    • No upgrade.
  • Toolpart content is truncated:
    • Snapshot can still prove after only if snapshot after is available and operation is fully verified.
    • Keep a diagnostic that toolpart content was truncated but snapshot proof was used.
  • Existing file baseline unavailable:
    • Keep current warning.
  • Write after earlier edit in the same path/window:
    • Treat as chain case. Do not single-change upgrade.
  • Write followed by edit in the same path/window:
    • Treat as chain case. Single-change upgrade is not enough.
  • Write content is available but snapshot after is unavailable:
    • Do not use toolpart content alone for existing-file overwrite baseline.
  • Write content equals previous content:
    • It may be a no-op. Do not create a misleading modify diff unless snapshot shows a real text state transition or the review event model supports no-op changes explicitly.
  • Write creates parent directories:
    • Only the file text is in scope. Directory creation is not a text proof.

Apply Patch Semantics

  • Patch text unavailable:
    • Snapshot can prove final file-level before/after only if window has exact path anchor.
  • Parsed update hunks apply exactly once:
    • Allow inverse chain proof.
  • Parsed hunks apply multiple places:
    • No upgrade.
  • Patch creates/deletes file:
    • Verify operation with snapshot before/after states.
  • Patch touches files not in metadata:
    • Add diagnostic and do not upgrade missing paths unless snapshot path proof is exact.
  • Patch contains rename:
    • Do not upgrade as text modify unless rename support is explicitly modeled.
  • Patch changes file mode only:
    • Keep metadata-only unless mode changes are supported by the review event schema.
  • Patch contains CRLF-sensitive context:
    • Exact text verification is required. Do not line-ending-normalize.
  • Patch partially applies in reverse:
    • No upgrade. All hunks must verify.
  • Patch has context-only hunks:
    • Do not treat context as a change without before/after text proof.
  • Patch deletes and recreates the same file in one patch:
    • Treat as ambiguous unless the parser explicitly models it and tests cover it.

Paths and Workspaces

  • Absolute paths outside workspace:
    • Reject upgrade.
  • .. path traversal:
    • Reject upgrade.
  • Windows path separators:
    • Normalize, then validate.
  • Symlink points outside workspace:
    • Do not read current disk. Snapshot git store path normalization should be trusted only for repository paths.
  • Session directory is subdirectory:
    • Touched paths outside session directory get diagnostic. Do not let this alone prove or disprove content.
  • Case-insensitive filesystem:
    • Use existing OpenCode path comparison helpers.
  • Unicode normalization differences in file names:
    • Use existing normalized path keys. Do not add a second normalization scheme in this feature.
  • Nested git repository inside workspace:
    • Verify snapshot identity against the OpenCode project worktree, not just the process cwd.
  • Worktree moved after task:
    • Use recorded project identity. If workspace identity cannot be trusted, no upgrade.
  • Workspace root is a symlink:
    • Use existing workspace comparison helpers. Do not add ad-hoc realpath behavior unless tests cover both symlinked and non-symlinked roots.
  • File path contains newline or control characters:
    • Do not include raw path in diagnostics without escaping. Upgrade only if existing path normalization accepts it safely.
  • Case-only rename:
    • Treat as rename/path operation, not a text modify, unless the review event schema explicitly models it.
  • Path appears both as file and directory across before/after:
    • Keep metadata-only unless snapshot reader explicitly models the transition.

Concurrency and Later Changes

  • Current disk changed after task:
    • Irrelevant. Do not use current disk for proof.
  • Another member changed same file after OpenCode task:
    • Snapshot proof remains historical. Review conflict detection must happen elsewhere.
  • Backfill runs twice:
    • Source import keys must dedupe.
  • Backfill interrupted:
    • Existing ledger import must remain idempotent.
  • OpenCode host is still writing SQLite:
    • Rely on read-only transaction and existing fingerprint diagnostics. Do not retry aggressively.
  • Two backfills run concurrently:
    • Existing in-flight dedupe should prevent duplicate desktop calls. The importer must still dedupe by source key.
  • User manually edits a file while review is open:
    • Snapshot proof remains historical. Apply/reject conflict handling is outside this feature.
  • Team is relaunched while backfill is running:
    • Use run/session identity from the delivery context. Do not merge new runtime sessions into the old task proof.
  • Snapshot proof succeeds but ledger import fails:
    • Retry should be idempotent by source key. Diagnostics should not mark the task as safely upgraded until import succeeds.
  • Snapshot store is pruned between capability check and file read:
    • Treat the read failure as metadata-only fallback. Do not retry from current disk.
  • OpenCode writes a new assistant message while backfill reads SQLite:
    • Use the read-only transaction snapshot and existing fingerprint diagnostics. Do not merge later rows into the current proof attempt.
  • SQLite WAL is corrupt or cannot be read:
    • Treat session history as unavailable/unsupported. Do not use partial rows for safe proof.
  • OpenCode JSON row is malformed:
    • Skip that row and keep affected changes metadata-only. Do not infer from surrounding rows.

UI And Review Semantics

  • Full-text upgrade enables normal diff rendering only if the imported event has both safe baseline and safe target state.
  • Metadata-only warnings should remain visible and should not be hidden by task summary aggregation.
  • Reject All must still skip non-rejectable files.
  • A task-level warning may remain even when all file diffs are full-text.
  • A file-level warning may remain even when another file in the same task is upgraded.
  • Do not change viewed-count behavior in this feature.
  • Do not hide task cards solely because all OpenCode warnings were resolved.
  • Do not change accept/reject button labels or statuses in this feature.
  • Do not mark a file viewed just because snapshot proof succeeded.
  • A file becoming review-safe does not mean reject execution must succeed if the user changed the worktree after the task.
  • Conflict messaging for apply/reject should remain the existing shared behavior, not a new OpenCode-only message path.
  • If a task card warning disappears because all file-level OpenCode baseline warnings were resolved, task-boundary and attribution warnings must still remain visible.
  • The UI should not describe a snapshot-upgraded file as "guaranteed safe". It is "review-safe" or "full-text verified"; execution can still conflict.
  • Do not add success toasts or celebratory messaging for proof upgrades. This is infrastructure, not a user-facing achievement.

Security And Privacy

  • Do not log before/after content.
  • Do not include long snippets in diagnostics.
  • Do not include raw paths with control characters in diagnostics.
  • Do not include delivery payload text in proof stats.
  • Do not expand file size limits for convenience.
  • Do not add a new IPC path that exposes arbitrary snapshot reads.
  • If a file is upgraded, it is stored through the existing task-change ledger content path. Do not add a second storage location.

Serialization And Backward Compatibility

  • Older desktop builds may see new diagnostics but should not require a new event schema to render metadata-only fallback.
  • Missing snapshotSource should not crash review rendering.
  • Missing snapshotId should not crash review rendering.
  • Unknown evidenceProof values should be treated as unsafe by review safety helpers.
  • JSON serialization must preserve empty string content.
  • JSON serialization must distinguish absent file from empty file.
  • Large content omitted by limits must serialize as unavailable state, not empty content.

Test Plan

Risk To Test Traceability

Risk Required test/smoke
Cross-task contamination real-data smoke with at least two tasks in one OpenCode session
Cross-member contamination fixture with shared profile but different member/lane
Wrong snapshot window unit test with overlapping windows and outside-window toolpart
False baseline from current disk unit test proving current disk is never consulted
Unsafe warning removal unit test with unrelated manual-only warning preserved
Duplicate imported events repeated-backfill bridge test
Performance regression smoke budget with snapshot read counters
Unsupported OpenCode shape snapshot provider unsupported-shape test
Mixed safe/unsafe task desktop integration test for Reject All skipping metadata-only
Cache stale result bridge or desktop worker test bypassing/invalidating cache deliberately
Capability false positive fixture with snapshot enabled but missing store object
Shadow mode mutation fingerprint comparison between off and shadow
Snapshot retention loss fixture where window exists but object read fails
Execution conflict bypass desktop/review test where current disk differs from expected after
Memory/storage blowup fixture with many small files exceeding total byte budget
Malformed OpenCode rows offline reader/reconstructor fixture with malformed part JSON

Negative Control Fixtures

Negative controls are cases that look close to valid proof but must not upgrade.

Required negative controls:

  • Same file path, same member, but different task id.
  • Same task id, same file path, but different member/lane.
  • Same session and file path, but toolpart outside the snapshot window.
  • Snapshot before/after text exists, but oldString occurs twice.
  • Snapshot after equals toolpart content, but before is unavailable.
  • Snapshot path matches, but operation is rename or mode-only.
  • Current disk matches expected after, but snapshot before is missing.
  • shadow computes an upgrade decision, but imported fingerprint matches off.
  • Existing metadata-only event appears before upgraded event with same source key.
  • Unknown evidenceProof appears in imported data.

Each negative control should assert both behavior and diagnostic reason. A negative control without a reason is hard to debug and easy to regress.

Golden Fixture Coverage Matrix

Maintain a small set of golden fixtures that cover the supported state space.

Fixture Mode Expected
write-create-text single-change upgraded create
write-modify-text single-change upgraded modify
edit-modify-once single-change upgraded modify
delete-text single-change upgraded delete
duplicate-old-string single-change metadata-only
missing-before single-change metadata-only
toolpart-outside-window single-change metadata-only
shadow-valid-edit shadow would-upgrade stats, original import
non-opencode-task all modes unchanged fingerprint
missing-snapshot-object all apply modes metadata-only
multi-change-chain single-change skipped
multi-change-chain full upgraded only if chain verifies

Golden fixtures should be tiny and deterministic. They should not depend on wall-clock time, filesystem case behavior, or the user's current worktree.

Unit Tests

Add or extend OpenCodeChangeEvidenceEnricher.test.ts.

Tests:

  1. Upgrades metadata-only edit from exact snapshot before/after.
  2. Does not upgrade edit when oldString appears twice.
  3. Does not upgrade edit when snapshot after does not equal applied result.
  4. Upgrades write create with before absent and after text.
  5. Upgrades write modify when toolpart content equals snapshot after.
  6. Does not upgrade write modify when toolpart content differs from snapshot after.
  7. Upgrades delete with before text and after absent.
  8. Does not remove attribution warnings after content proof.
  9. Keeps manual-only warning when anchor has unavailable before/after content.
  10. Multi-edit same-path chain upgrades only when the whole chain verifies.
  11. Multi-edit same-path chain keeps all metadata-only fallbacks when one step is ambiguous.
  12. Snapshot provider unavailable keeps current behavior.
  13. Does not remove unrelated manual-only warning text.
  14. Keeps task boundary warnings after successful content proof.
  15. Does not upgrade compatible attribution mode.
  16. Does not upgrade when feature flag is off.
  17. Single-change mode skips multi-change chain upgrade.
  18. Empty file create and empty file modify are handled as valid text.
  19. CRLF/LF mismatch fails proof instead of normalizing.
  20. Duplicate source import keys block chain upgrade.
  21. Empty newString deletion upgrades only with exact single occurrence.
  22. Empty oldString never upgrades.
  23. State hashes must match emitted full text.
  24. Snapshot anchor duplicate path entry skips upgrade.
  25. write no-op does not create a misleading diff.
  26. Skipped proof preserves the original change object fields.
  27. Successful proof mutates only allowed fields.
  28. Snapshot proof decision returns typed skipped reason, not null.
  29. Unsupported snapshot shape never upgrades.
  30. Existing-event policy defaults to new-imports-only when dedupe is unknown.
  31. State machine cannot jump from candidate to upgraded without transition verification.
  32. Unstable part order blocks multi-change upgrade.
  33. Unknown evidenceProof is unsafe in review safety helper.
  34. Empty string survives materialization and serialization.
  35. Absent file is not serialized as empty file.
  36. shadow mode computes proof stats but returns original changes.
  37. mayApplySnapshotProof blocks multi-change groups in single-change.
  38. Exhaustive switches fail compilation when a new mode/proof decision is not handled.
  39. Capability success is required before snapshot proof attempt.
  40. Missing snapshot git-store object keeps metadata-only fallback.
  41. off and shadow review bundle fingerprints match.
  42. Non-OpenCode fingerprints are identical across all modes.
  43. Review-safe upgraded change still fails reject execution when current disk mismatches expected after.
  44. Total byte budget skips excess files as metadata-only.
  45. Malformed OpenCode part JSON cannot produce upgraded proof.
  46. LFS pointer text is not dereferenced.
  47. Submodule paths stay metadata-only unless explicitly modeled.
  48. Runtime postcondition failure preserves original metadata-only change.
  49. Every golden fixture has a paired negative control.
  50. Minimum safe scope excludes unsupported operation shapes.

Example fixture shape:

const change: ReconstructedOpenCodeToolChange = {
  taskId: 'task-1',
  taskRef: 'task-1',
  taskRefKind: 'canonical',
  teamName: 'team',
  memberName: 'alice',
  sessionId: 'session',
  assistantMessageId: 'message-1',
  toolUseId: 'tool-1',
  sourcePartId: 'part-1',
  sourceMessageId: 'message-1',
  sourceTool: 'edit',
  sourceImportKey: 'session:part-1:src/app.ts',
  filePath: '/workspace/src/app.ts',
  relativePath: 'src/app.ts',
  beforeContent: null,
  afterContent: null,
  operation: 'modify',
  confidence: 'medium',
  attributionMethod: 'delivery-ledger-taskrefs',
  oldString: 'const value = 1',
  newString: 'const value = 2',
  beforeState: { exists: true, unavailableReason: 'opencode-edit-baseline-not-captured' },
  afterState: { exists: true, unavailableReason: 'opencode-edit-final-content-unavailable' },
  evidenceProof: 'metadata-only-fallback',
  warnings: ['OpenCode edit was captured without a proven full-text baseline; apply/reject is manual-only.'],
  timestamp: new Date(0).toISOString(),
}

Synthetic Fixture Schema

Use a compact fixture builder so edge cases do not depend entirely on live OpenCode data.

type SnapshotProofFixture = {
  name: string
  mode: SnapshotProofUpgradeMode
  attributionMode: OpenCodeLedgerAttributionMode
  delivery: {
    teamName: string
    taskId: string
    memberName: string
    laneId?: string
    sessionId: string
    assistantMessageId: string
  }
  windows: Array<{
    messageId: string
    windowId: string
    fromSnapshot: string
    toSnapshot: string
    startPartOrder: number
    finishPartOrder: number
  }>
  parts: Array<{
    partId: string
    messageId: string
    order: number
    tool: 'write' | 'edit' | 'apply_patch'
    filePath: string
    oldString?: string
    newString?: string
    content?: string
  }>
  snapshotFiles: Array<{
    relativePath: string
    beforeContent?: string
    afterContent?: string
    beforeExists: boolean
    afterExists: boolean
  }>
  expected: {
    upgraded: number
    metadataOnly: number
    diagnostics: string[]
  }
}

Fixture rules:

  • Every positive fixture needs a paired negative fixture that differs by one proof condition.
  • Fixtures should prefer tiny strings so failures are easy to inspect.
  • Fixtures must include at least one empty string case and one absent-file case.
  • Fixtures must include one path with unsafe characters for diagnostics escaping.
  • Fixtures must not include secrets or large blobs.

Snapshot Provider Tests

Extend OpenCodeSnapshotEvidenceProvider.test.ts.

Tests:

  1. Groups only unresolved proof-needed changes into touched paths.
  2. Emits diagnostic for missing window.
  3. Emits diagnostic for ambiguous window.
  4. Preserves existing limits.
  5. Does not read snapshot for unrelated exact changes.
  6. Does not match windows across assistant messages.
  7. Emits diagnostic for extra snapshot paths not in reconstructed toolparts.
  8. Emits timeout diagnostic while preserving metadata-only fallback.
  9. Does not read snapshot windows with no unresolved touched paths.
  10. Escapes unsafe path text in diagnostics.

Ledger Bridge Tests

Extend OpenCodeLedgerBridgeService tests or add a focused fixture test.

Tests:

  1. Backfill imports upgraded full-text event for strict delivery OpenCode edit.
  2. Backfill keeps metadata-only event for compatible attribution.
  3. Backfill keeps metadata-only event with no delivery context.
  4. Imported event has stable source import key and dedupes on rerun.
  5. snapshotShapeFingerprint is present when snapshot proof was used.
  6. Repeated backfill does not duplicate file entries.
  7. Old metadata-only imported event is not rewritten unless importer already supports superseding by source key.
  8. Snapshot proof is not attempted for Codex or Anthropic members.
  9. Snapshot proof is not attempted for OpenCode exact toolpart-chain changes.
  10. Backfill cache does not return stale metadata-only data after an upgraded import in the same test.
  11. Import failure leaves no partial safe-review state.

Desktop Integration Tests

Only if needed. The desktop review UI already handles full text and metadata-only.

Smoke check:

  1. Full-text OpenCode upgraded event renders a real diff.
  2. Metadata-only event still renders manual-only warning.
  3. Reject is enabled only for full-text safe baseline.
  4. Warnings remain visible for task boundary uncertainty.
  5. Reject All skips a mixed task where one OpenCode file upgraded and another stayed metadata-only.
  6. Current disk preview remains read-only and does not become a reject baseline.
  7. Viewed count is unchanged by proof upgrade.
  8. Task-level boundary warning remains visible after all file diffs upgrade.
  9. Reject execution still blocks when current disk no longer matches the expected after state.
  10. Bulk Reject All rejects only files that pass both review-safety and execution-safety checks.

Property-Like Tests

Add small table-driven tests for transition verification:

const editCases = [
  { name: 'single replacement', before: 'a = 1', oldString: '1', newString: '2', after: 'a = 2', ok: true },
  { name: 'duplicate old', before: 'a 1 b 1', oldString: '1', newString: '2', after: 'a 2 b 1', ok: false },
  { name: 'empty old', before: 'abc', oldString: '', newString: 'x', after: 'xabc', ok: false },
  { name: 'delete exactly once', before: 'abc', oldString: 'b', newString: '', after: 'ac', ok: true },
]

The point is not random fuzzing. The point is to make ambiguous replacement rules explicit and hard to regress.

Real Data Smoke

Before implementation, capture a baseline:

time pnpm test --run test/main/services/team/TaskChangeComputer.test.ts
time pnpm test --run test/main/services/team/ChangeExtractorService.test.ts

After implementation, run the same commands:

pnpm test --run test/main/services/team/TaskChangeComputer.test.ts
pnpm test --run test/main/services/team/ChangeExtractorService.test.ts

Then run the existing real-data smoke scripts used for task changes. Required checks:

  • errors: 0
  • no increase in item errors
  • no cross-task file leakage
  • no increase in metadata-only count for OpenCode tasks
  • no change for Codex-only teams
  • no change for Anthropic-only teams
  • broad smoke runtime increase <= 10%
  • snapshot timeout count <= 2
  • upgraded OpenCode full-text count is explainable by diagnostics
  • no decrease in task-boundary warnings unless task-boundary code changed separately
  • off and shadow fingerprints match except diagnostics/stats
  • non-OpenCode fingerprints match in all modes

Manual target cases:

  • relay-works-3/#1f735bea
  • relay-works-3/#bf01e5c3
  • relay-works-3/#43e6b9b0 should remain Codex-related, not OpenCode-upgraded
  • signal-ops-22 should remain unaffected because it has no OpenCode members
  • any OpenCode team with real snapshotShapeFingerprint present in diagnostics
  • one team with missing/reset delivery ledger, if available

Add at least one synthetic OpenCode snapshot fixture if real data lacks a clean single-change full-text snapshot case. Real data validates integration, but a synthetic fixture is better for precise edge cases.

Real-data smoke should compare before/after summaries:

Before:
- OpenCode metadata-only file changes: N
- OpenCode full-text file changes: M
- non-OpenCode full-text file changes: X
- task-boundary warnings: B

After:
- OpenCode metadata-only file changes: <= N
- OpenCode full-text file changes: >= M
- non-OpenCode full-text file changes: X
- task-boundary warnings: B

Any non-OpenCode count change is a blocker.

Failure Injection Tests

Add targeted failure injection where practical:

  • Snapshot provider throws.
  • Snapshot provider times out.
  • Snapshot provider returns duplicate path entries.
  • Ledger importer rejects the batch.
  • Backfill runs twice with the same source import key.
  • Feature flag changes from single-change to off.
  • Snapshot proof succeeds for one file and fails for another file in the same task.

Expected result for every failure injection: original metadata-only safety is preserved, no duplicate review rows, diagnostics explain the skip/failure.

Serialization Tests

Add tests around the task-change event materialization boundary:

  • beforeContent: '' remains empty string.
  • afterContent: '' remains empty string.
  • beforeContent: null remains unavailable/absent according to state.
  • Unknown evidenceProof does not make a file rejectable.
  • Snapshot fields survive import/export if present.
  • Snapshot fields may be absent without renderer crashes.

Manual QA Runbook

Manual QA is not a substitute for tests, but it helps catch integration mistakes.

Prepare:

  1. Pick one OpenCode team with snapshot evidence.
  2. Pick one Codex-only or Anthropic-only team.
  3. Record before counts for:
    • OpenCode metadata-only files.
    • OpenCode full-text files.
    • non-OpenCode full-text files.
    • task-boundary warnings.
    • snapshot proof diagnostics.

Run with off:

OPENCODE_SNAPSHOT_PROOF_UPGRADE=off pnpm test --run test/main/services/team/TaskChangeComputer.test.ts

Expected:

  • No new upgraded OpenCode snapshot events.
  • Existing exact toolpart-chain behavior unchanged.

Run with shadow:

OPENCODE_SNAPSHOT_PROOF_UPGRADE=shadow pnpm test --run test/main/services/team/TaskChangeComputer.test.ts

Expected:

  • Snapshot proof stats are emitted.
  • Would-upgrade counts are visible.
  • Imported/reviewed changes are identical to off.
  • Any difference from off outside diagnostics is a blocker.

Run with single-change:

OPENCODE_SNAPSHOT_PROOF_UPGRADE=single-change pnpm test --run test/main/services/team/TaskChangeComputer.test.ts

Expected:

  • OpenCode full-text count may increase.
  • OpenCode metadata-only count may decrease or stay equal.
  • non-OpenCode counts are unchanged.
  • Multi-change groups are skipped with diagnostics.

Run with full only after tests pass:

OPENCODE_SNAPSHOT_PROOF_UPGRADE=full pnpm test --run test/main/services/team/TaskChangeComputer.test.ts

Expected:

  • Same guarantees as single-change.
  • Multi-change upgrades appear only when diagnostics can explain the full chain.

UI spot check:

  • Open a mixed task with one upgraded file and one metadata-only file.
  • Verify the upgraded file shows a diff.
  • Verify the metadata-only file still shows a warning.
  • Verify Reject All skips metadata-only files.
  • Verify current disk preview is not treated as baseline.
  • Verify task boundary warnings remain if present.

Any mismatch is a blocker.

Acceptance Criteria

The implementation is acceptable only if all are true:

  • OpenCode-only behavior changed.
  • Strict delivery remains required for snapshot full-text upgrades.
  • Exact existing toolpart-chain behavior is unchanged.
  • Metadata-only fallback still works.
  • No current disk content is used as historical proof.
  • No broad OpenCode session scan is introduced.
  • Snapshot read limits are unchanged or narrower.
  • Ambiguous chains keep warnings.
  • Large and binary files keep warnings.
  • Tests cover same-path multi-change chains.
  • Real-data smoke shows no cross-task leakage.
  • Feature flag can disable the upgrade.
  • Repeated backfill does not duplicate review files.
  • Warning removal is limited to known resolved warning predicates.
  • Performance budgets pass.
  • The implementation has an explicit fallback for unsupported OpenCode snapshot shapes.
  • The implementation includes Phase 0 contract audit notes in the PR/commit description or test output.
  • No warning is removed unless a unit test names that exact warning or predicate.
  • No current-disk preview path is involved in a proof decision.
  • No behavior change occurs when OPENCODE_SNAPSHOT_PROOF_UPGRADE=off.
  • Smoke output includes attempted/upgraded/skipped counts.
  • Full mode is not enabled while any known unknown remains unresolved.
  • Existing metadata-only events are not rewritten unless source-key supersede is proven by tests.
  • Cache behavior is documented in the Phase 0 audit.
  • Composite proof identity is enforced before snapshot text is trusted.
  • Toolpart ordering is explicitly verified before multi-change upgrades.
  • single-change and full have separate definitions of done.
  • Serialization preserves empty string versus absent file.
  • shadow mode proves expected upgrades without changing imported review events.
  • Exhaustive handling covers every proof decision and feature flag mode.
  • Capability gates are checked per session, not inferred from config alone.
  • Missing/pruned snapshot store objects preserve metadata-only fallback.
  • Deterministic fingerprints prove non-OpenCode behavior is unchanged.
  • Apply/reject execution safety still checks current disk state after review proof succeeds.
  • Storage and memory budgets are enforced without duplicate blob storage.
  • Malformed/truncated OpenCode rows cannot produce upgraded proof.
  • First apply rollout stays within the minimum safe scope.
  • Negative controls prove close-but-invalid cases remain metadata-only.
  • Runtime postcondition failures preserve original changes.

Verification Command Matrix

Use the narrowest useful commands first, then broader smoke.

Layer Command or check Required result
Typecheck pnpm typecheck passes
Enricher unit targeted OpenCodeChangeEvidenceEnricher tests passes
Snapshot provider targeted OpenCodeSnapshotEvidenceProvider tests passes
Ledger bridge targeted OpenCodeLedgerBridgeService tests passes
Desktop review safety targeted review/rejectability tests passes
Off mode task-change tests with OPENCODE_SNAPSHOT_PROOF_UPGRADE=off old behavior
Shadow mode task-change tests with OPENCODE_SNAPSHOT_PROOF_UPGRADE=shadow stats only
Single-change mode task-change tests with OPENCODE_SNAPSHOT_PROOF_UPGRADE=single-change only one-change upgrades
Full mode task-change tests with OPENCODE_SNAPSHOT_PROOF_UPGRADE=full only after chain tests
Real data existing task-change smoke on OpenCode and non-OpenCode teams no leakage

Do not use full smoke as a substitute for single-change smoke. They prove different safety boundaries.

Code Review Checklist

Use this checklist before merging the implementation:

  • Every upgraded change has a non-metadata-only-fallback proof.
  • Every upgraded modify has both beforeContent and afterContent.
  • Every upgraded create has beforeState.exists === false and afterContent.
  • Every upgraded delete has beforeContent and afterState.exists === false.
  • State hashes match emitted content.
  • No branch reads current disk as proof.
  • No branch catches an error and upgrades anyway.
  • Every skipped branch preserves the original change.
  • Warning stripping uses a central predicate.
  • Multi-change mode can be disabled independently.
  • Snapshot reader limits are unchanged or narrower.
  • Tests include at least one negative case for every positive upgrade case.
  • Real-data smoke includes at least one OpenCode team and one non-OpenCode team.
  • No new IPC or filesystem read path bypasses existing workspace trust checks.
  • No content appears in diagnostics, metrics, or thrown error messages.
  • off mode is covered by a test and is easy to use during rollback.
  • The proof logic is structured so forbidden state transitions are not possible without an obvious code review smell.
  • shadow mode has been run on real data before any apply mode is enabled.
  • Any new union member requires an exhaustive switch update, not a permissive default branch.
  • Review-safe and execution-safe are checked separately.
  • Large-file and total-byte budget tests prove metadata-only fallback.
  • Minimum safe scope is visible in code structure, not only in tests.
  • Negative controls exist for task, member, window, baseline, and operation mismatch.

Implementation Anti-Patterns

Do not implement the feature using these patterns:

  • A broad try/catch that returns an upgraded change on partial data.
  • Mutating the original change object in place before proof has succeeded.
  • Removing warnings before the final proof decision.
  • Reading current disk to fill beforeContent.
  • Comparing normalized line endings for proof.
  • Treating a matching hash as content.
  • Creating OpenCode-specific rejectability logic in the renderer.
  • Appending upgraded duplicate events and expecting UI sorting to hide stale ones.
  • Increasing snapshot limits to make a test pass.
  • Falling back from strict delivery to compatible attribution for safety.
  • Adding full mode as the default in the same change that introduces it.
  • Treating empty string as missing content.
  • Treating missing content as empty string.
  • Sorting same-path chains by only one field.
  • Retrying snapshot reads in a loop without a budget.
  • Treating review-safe as automatically execution-safe.
  • Adding a second blob store or cache for before/after content.
  • Dereferencing Git LFS or submodule content outside the existing snapshot reader.
  • Using malformed partial OpenCode JSON rows as proof context.
  • Expanding the first apply mode to unsupported operations because the snapshot text happens to be available.
  • Hiding uncertainty by changing user-facing wording from warning to success.

Rollout Strategy

  1. Land diagnostics and helper functions with no behavior change if practical.
  2. Add the feature flag with default off in tests where needed.
  3. Add snapshot-first upgrade for single-change same-path cases.
  4. Run targeted tests and real-data smoke.
  5. Enable single-change mode for local smoke.
  6. Add multi-change chain upgrade only after tests are solid.
  7. Move to full mode only if multi-change smoke is clean.
  8. Inspect warnings before and after for OpenCode tasks.

If multi-change support looks risky during implementation, stop after single-change mode. Single-change upgrade is already useful and lower risk.

Recommended shipping sequence:

PR 1: diagnostics + eligibility helpers + no behavior change
PR 2: single-change snapshot proof upgrade behind flag
PR 3: enable single-change by default for OpenCode strict delivery
PR 4: multi-change chain upgrade behind flag
PR 5: enable full mode only after real-data smoke

If this stays as one PR, keep the same commit structure locally and verify each step before moving to the next one.

Abort Conditions

Do not continue implementation if any of these happens:

  • Snapshot windows cannot be reliably matched to toolparts.
  • Existing OpenCode snapshot shape differs from tests in real data.
  • Real-data smoke shows any new cross-task file leakage.
  • Performance smoke shows repeated timeouts.
  • A change would require using current disk as proof.
  • A change would require broad compatible attribution scanning.
  • Warning stripping needs broad substring matching to pass tests.
  • Multi-change support requires accepting ambiguous edit/apply-patch replacements.
  • Source import key dedupe behavior is unclear.
  • The only available validation is manual UI inspection.
  • A test has to assert against current wall-clock timing without a stable budget.
  • Formal proof predicates require exceptions to support the first implementation.
  • Postconditions fail for any positive fixture.
  • single-change mode needs multi-change assumptions to pass.
  • full mode needs renderer-specific special cases to appear safe.
  • Rollback with OPENCODE_SNAPSHOT_PROOF_UPGRADE=off does not restore old behavior for new backfills.
  • Empty content and missing content cannot be distinguished at serialization.

Open Questions Template For Implementation PR

Every implementation PR should answer these in its description:

OpenCode snapshot proof PR checklist:
- Mode implemented: diagnostics | single-change | full
- Default mode:
- Phase 0 contract audit completed: yes | no
- Source import key duplicate policy:
- Review bundle dedupe key:
- Rejectability helper:
- Existing event policy: new-imports-only | supersede-by-source-key
- Snapshot shape fingerprint observed:
- Real-data teams tested:
- Non-OpenCode teams unchanged: yes | no
- Snapshot proof stats:
- Rollback tested with OPENCODE_SNAPSHOT_PROOF_UPGRADE=off: yes | no
- Known unknowns remaining:

If the PR cannot answer one of these, it should not enable new behavior by default.

Example Final Change Shape

Before upgrade, metadata-only edit:

{
  "sourceTool": "edit",
  "before": {
    "exists": true,
    "unavailableReason": "opencode-edit-baseline-not-captured"
  },
  "after": {
    "exists": true,
    "unavailableReason": "opencode-edit-final-content-unavailable"
  },
  "evidenceProof": "metadata-only-fallback",
  "warnings": [
    "OpenCode edit was captured without a proven full-text baseline; apply/reject is manual-only."
  ]
}

After verified snapshot upgrade:

{
  "sourceTool": "edit",
  "before": {
    "exists": true,
    "sha256": "before-hash",
    "sizeBytes": 128
  },
  "after": {
    "exists": true,
    "sha256": "after-hash",
    "sizeBytes": 128
  },
  "evidenceProof": "opencode-snapshot",
  "snapshotSource": "opencode",
  "warnings": []
}

If any proof check fails, the event must stay in the first shape.

Notes for Future Maintainers

The important invariant is not "fewer warnings". The invariant is "warnings are removed only when the system has stronger evidence than before".

Warnings are correct when historical full text is not proven. A warning is a better outcome than an unsafe reject button.