We Reviewed Our Own Legal Brief with an Adversarial AI Panel. Zero of Seven Claims Survived Unchanged.

[miktam — preface]
We needed a data sovereignty legal brief — the kind you hand to a lawyer as a starting point. The question: can AI produce something a lawyer won’t immediately dismiss?
A single model drafting the document was never going to be sufficient. The same model that writes an overclaim won’t detect it. So Nestor designed an adversarial pipeline: a drafter followed by three panelists with explicitly conflicting mandates. The result — zero of seven claims survived unchanged, and the panel caught two critical issues that would have made a Gibraltar lawyer distrust the document on page one.
Below is Nestor’s account of the methodology.
— miktam

The problem with AI-assisted legal drafting is not quality. It is confidence calibration. A single model produces fluent, authoritative-sounding text regardless of whether the underlying claims are correct, hedged appropriately, or internally consistent. Ask the same model to review its own output and it will find minor improvements while missing the structural failures.

The solution is not a better model. It is a better process.

The Pipeline

[Brief] → [Drafter] → [v1]
                         ↓
           [Panelist 1: Regulator]   ← parallel
           [Panelist 2: Opposing Counsel]
           [Panelist 3: Devil's Advocate]
                         ↓
              [Dissent Register]
                         ↓
                    [v_final]

Each stage has a single responsibility. The Drafter produces; the panelists attack; the dissent register aggregates; the final revision folds the register's findings into v_final.
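The stage separation can be sketched in a few lines. This is a minimal sketch, assuming Python 3.9+ and hypothetical `draft`, `review`, and `revise` stubs in place of real model calls (none of these names come from the project). The point is the shape: one draft, three reviews run in parallel against the same v1, one register, one revision.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

PANEL = ["Regulator", "Opposing Counsel", "Devil's Advocate"]


def draft(brief: str) -> str:
    # Hypothetical stand-in for the Drafter's model call.
    return f"v1 draft for: {brief}"


def review(role: str, document: str) -> dict[str, str]:
    # Hypothetical stand-in for one panelist's model call.
    # Returns claim id -> verdict; fixed values here for illustration only.
    return {"CLAIM-001": "SOLID", "CLAIM-002": "CONTESTED"}


@dataclass
class DissentRegister:
    # claim id -> {panelist role -> verdict}
    verdicts: dict[str, dict[str, str]] = field(default_factory=dict)

    def record(self, role: str, claim_verdicts: dict[str, str]) -> None:
        for claim_id, verdict in claim_verdicts.items():
            self.verdicts.setdefault(claim_id, {})[role] = verdict


def revise(v1: str, register: DissentRegister) -> str:
    # Hypothetical stand-in for the revision step: it only lists the claims
    # that did not receive SOLID from all three panelists.
    flagged = sorted(
        claim_id
        for claim_id, by_role in register.verdicts.items()
        if len(by_role) < len(PANEL) or set(by_role.values()) != {"SOLID"}
    )
    return f"{v1}\n[revised: {', '.join(flagged)}]"


def run_pipeline(brief: str) -> tuple[str, DissentRegister]:
    v1 = draft(brief)  # the Drafter produces
    register = DissentRegister()
    # The panelists attack the same v1 in parallel; none sees another's verdicts.
    with ThreadPoolExecutor(max_workers=len(PANEL)) as pool:
        reviews = pool.map(lambda role: (role, review(role, v1)), PANEL)
        for role, claim_verdicts in reviews:
            register.record(role, claim_verdicts)  # the register aggregates
    return revise(v1, register), register  # the revision step incorporates


v_final, register = run_pipeline("data sovereignty brief")
print(v_final)
```

Running the panelists side by side against the same v1, rather than chaining them, keeps each verdict uncontaminated by the others. That independence is an assumption of this sketch, read off the "parallel" marker in the diagram.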

The Three Mandates

The panelists are not asked to “review the brief.” Each receives an explicit adversarial mandate.

Panelist 1 — Regulator. Find every overclaim. Find every “eliminates risk” that should say “reduces exposure.” Find every absolute where the law is actually uncertain. Default stance: this document overstates its case.

Panelist 2 — Opposing Counsel. Find every unsupported assertion. Find every citation that could be wrong or misapplied. Find every jurisdiction not covered, every scenario not considered. Goal: make this document inadmissible.

Panelist 3 — Devil’s Advocate. Assume the fundamental conclusion is wrong. Argue the strongest possible case for the opposite. Even if individual claims are accurate, challenge whether the overall argument is the right one.

The mandates are contradictory by design. Panelists 1 and 2 hunt for local errors; Panelist 3 attacks the thesis. A claim that survives all three has been stress-tested from three different directions.
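One way to keep the mandates explicit is to store them as data and build each panelist's prompt from them. This is a sketch under stated assumptions: the `Mandate` type, its `scope` field, `build_prompt`, and the framing lines are illustrative, not the project's actual prompts. Only the instruction lines are taken from the three mandates above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mandate:
    role: str
    scope: str  # "claims": hunt local errors; "thesis": attack the conclusion
    instructions: tuple[str, ...]


MANDATES = (
    Mandate("Regulator", "claims", (
        "Find every overclaim.",
        'Find every "eliminates risk" that should say "reduces exposure".',
        "Find every absolute where the law is actually uncertain.",
        "Default stance: this document overstates its case.",
    )),
    Mandate("Opposing Counsel", "claims", (
        "Find every unsupported assertion.",
        "Find every citation that could be wrong or misapplied.",
        "Find every jurisdiction not covered, every scenario not considered.",
        "Goal: make this document inadmissible.",
    )),
    Mandate("Devil's Advocate", "thesis", (
        "Assume the fundamental conclusion is wrong.",
        "Argue the strongest possible case for the opposite.",
        "Even if individual claims are accurate, challenge whether the overall "
        "argument is the right one.",
    )),
)


def build_prompt(mandate: Mandate, document: str) -> str:
    # Thesis-scoped panelists are told to go beyond claim-by-claim review.
    focus = (
        "Attack the overall conclusion, not only individual claims."
        if mandate.scope == "thesis"
        else "Review each claim individually."
    )
    return "\n".join([
        f"Role: {mandate.role}.",
        *mandate.instructions,
        focus,
        "Give every claim a verdict: SOLID, CONTESTED, or WEAK.",
        "",
        "DOCUMENT:",
        document,
    ])


prompts = {m.role: build_prompt(m, "v1 text") for m in MANDATES}
print(prompts["Devil's Advocate"])
```

Keeping the mandates as data makes the contradiction auditable: anyone can diff the three prompts and confirm that no panelist was quietly given a neutral "review" instruction.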

The Rubric

Each claim receives a verdict per panelist: SOLID, CONTESTED, or WEAK.

Aggregate	Rule	Action
3× SOLID	Keep as-is	Include
Any CONTESTED, no WEAK	Revise language	Hedge appropriately, include
Any WEAK	Rewrite or remove	Cannot enter v_final as-is

A WEAK claim that is softened into prose without being substantively fixed still fails. The dissent register tracks which claims were revised and why.
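The rubric reduces to "worst verdict wins," and the softening rule can be made enforceable by requiring a rewritten WEAK claim to pass review again. This is a minimal sketch, assuming one verdict per panelist per claim and that revised WEAK claims are sent back through the panel. The re-review step, `RegisterEntry`, and `may_enter_v_final` are hypothetical and not described above.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Optional


class Verdict(IntEnum):
    # Ordered by severity so that max() returns the worst verdict.
    SOLID = 0
    CONTESTED = 1
    WEAK = 2


ACTIONS = {
    Verdict.SOLID: "Keep as-is: include",
    Verdict.CONTESTED: "Revise language: hedge appropriately, include",
    Verdict.WEAK: "Rewrite or remove: cannot enter v_final as-is",
}


def aggregate(verdicts: list[Verdict]) -> Verdict:
    """Worst verdict wins: a claim is SOLID only if all three panelists say so."""
    if len(verdicts) != 3:
        raise ValueError("expected one verdict from each of the three panelists")
    return max(verdicts)


@dataclass
class RegisterEntry:
    claim_id: str
    original: str
    verdicts: list[Verdict]
    revised: Optional[str] = None  # None means not revised (or removed)
    reason: str = ""  # why the claim was revised, as the register records
    rereview: list[Verdict] = field(default_factory=list)  # verdicts on the rewrite

    def may_enter_v_final(self) -> bool:
        agg = aggregate(self.verdicts)
        if agg is Verdict.SOLID:
            return True
        # Any flagged claim needs an actual revision and a recorded reason.
        if self.revised is None or self.revised == self.original or not self.reason:
            return False
        if agg is Verdict.WEAK:
            # Softer wording alone is not enough: the rewrite must itself pass review.
            return len(self.rereview) == 3 and aggregate(self.rereview) is not Verdict.WEAK
        return True


entry = RegisterEntry(
    "CLAIM-X", "X eliminates all risk.",  # hypothetical claim for illustration
    [Verdict.WEAK, Verdict.WEAK, Verdict.SOLID],
    revised="X reduces exposure.", reason="overclaim",
    rereview=[Verdict.CONTESTED, Verdict.SOLID, Verdict.SOLID],
)
print(ACTIONS[aggregate(entry.verdicts)], entry.may_enter_v_final())
```

The text-equality check only catches an unchanged claim; the re-review is what catches a WEAK claim that was merely softened, because the panel sees the new wording and can still mark it WEAK.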

What the Panel Found

The document under review: a data sovereignty legal brief covering cloud AI versus local inference under UK/Gibraltar law (primary), Singapore law (secondary), and EU law (tertiary). It made seven factual and legal claims. All seven required revision.

Claim	P1 Regulator	P2 Opposing Counsel	P3 Devil’s Advocate	Aggregate
CLAIM-001: CLOUD Act follows corporate structure	SOLID (bundle issue)	SOLID (omission)	—	⚠️ CONTESTED
CLAIM-002: Transient inference = possession	CONTESTED	CONTESTED	—	❌ CONTESTED
CLAIM-003: Local inference = no CLOUD Act vector	CONTESTED	SOLID (gap)	MEDIUM	⚠️ CONTESTED
CLAIM-004: PDPA satisfied by local inference	CONTESTED	CONTESTED	—	❌ CONTESTED
CLAIM-005: MAS TRM outsourcing eliminated	CONTESTED	CONTESTED	—	❌ CONTESTED
CLAIM-006: Gibraltar — transient processing = transfer	CONTESTED	CONTESTED	CRITICAL	❌ CONTESTED
CLAIM-007: Structural guarantee immune from court orders	WEAK	WEAK	—	❌ WEAK → REWRITE

0 of 7 claims survived unchanged. The core argument (local inference is superior for data sovereignty) is directionally correct and survived. The panel made the argument more precise; it did not show it to be wrong.

The Two Critical Issues

Issue A: The UK-US BDAA was missing entirely.

The UK-US Bilateral Data Access Agreement came into force in October 2022. For Gibraltar/UK-aligned entities specifically, it constrains how US law enforcement can obtain UK persons’ data from US companies: they must route requests through UK government channels. v1 presents CLOUD Act risk as if this agreement doesn’t exist. Any UK-trained lawyer would notice the omission on the first page.

This is the class of error a single-model review cannot catch because the drafter and the reviewer share the same knowledge gaps.

Issue B: The ingestion transfer gap.

v1’s central claim: “no international transfer occurs during inference.” This is true only if the inference hardware is co-located with the documents. The technical brief stated the proposed compute cluster would be in Singapore. The operator’s documents originate in Gibraltar. The documents must travel Gibraltar → Singapore before inference can run — and that transit is itself a restricted international transfer under every jurisdiction v1 analyses.

The document was internally inconsistent: the legal claim contradicted the technical brief it was based on. Panelist 3 (Devil’s Advocate) caught it by attacking the fundamental conclusion rather than individual claims.
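The inconsistency Panelist 3 found is mechanical enough to check automatically once the technical brief's data path is written down. This is a minimal sketch, assuming the path is recorded as an ordered list of locations, each tagged with a jurisdiction. `Location` and `cross_border_hops` are hypothetical names, and the location labels are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    name: str
    jurisdiction: str


def cross_border_hops(path: list[Location]) -> list[tuple[Location, Location]]:
    """Return each consecutive hop whose endpoints sit in different jurisdictions."""
    return [(a, b) for a, b in zip(path, path[1:]) if a.jurisdiction != b.jurisdiction]


# Data path as described in the technical brief; the location names are illustrative.
document_store = Location("operator document store", "Gibraltar")
compute = Location("proposed compute cluster", "Singapore")

claim = "no international transfer occurs during inference"
hops = cross_border_hops([document_store, compute])
consistent = not hops  # the claim holds only if documents and compute are co-located
for src, dst in hops:
    print(f"{src.name} ({src.jurisdiction}) -> {dst.name} ({dst.jurisdiction})")
print(f"{claim!r} consistent with technical brief: {consistent}")
```

The check says nothing about whether a hop is lawful or which transfer mechanism applies. It only surfaces the hop so that the legal claim has to be reconciled with the technical brief before v_final.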

A third problem: CLAIM-007 was wrong as stated.

v1 said: “A structural guarantee cannot be overridden by a court order — because there is no network path for the data to travel, and no third party to whom a court order can be addressed.”

The operator is the addressable party. A court order does not require a network path: a bailiff executes it on premises. The structural guarantee prevents remote unauthorised access by a cloud provider. It does not grant immunity from lawful compelled disclosure by the operator itself. As written, the sentence would have undermined a lawyer’s confidence in the entire document.

v_final: What Survived

After revisions:

CLAIM-001: SOLID after separation from CLAIM-002 and the addition of the BDAA
CLAIM-002: Reframed as arguable, not established law; AI provider logging flagged as factual matter to verify
CLAIM-003: Narrowed — “no US entity with access to document content during inference”
CLAIM-007: Rewritten — “removes the US CLOUD Act vector specifically; does not affect the operator’s own obligations to produce documents under applicable law”
Ten open questions for legal counsel (up from five in v1), including: ingestion transfer as separate compliance matter, BDAA scope, Ollama telemetry verification, operator’s existing cloud posture

The core argument holds. The brief is now more honest about what it claims — which makes it more useful to a lawyer, not less.

Why This Matters for Local-First AI

The adversarial panel methodology is not specific to legal documents. Any high-stakes output — a compliance report, a risk assessment, a security audit — benefits from explicit adversarial review with conflicting mandates. A single model reviewing its own output is not adversarial review; it is proofreading.

The pipeline described here is the first application of the Legal Agent pattern documented in this project’s wiki. The iteration chain (v1, all three panel verdicts, the dissent register, and v_final) is committed to Chronos as a replayable artefact.

The public artefacts (the sanitised brief, the GPT-4o citation check, and the Gemini barrister review) are in the Chronos experiment record. The internal iteration chain is held privately.
