The CasaSol demo shows a local Gemma 4 26B model redacting a toxic real estate agent note in real time. The implicit claim behind that demo is that the output is GDPR-clean.

An anecdote is not an experiment. One demo run is a marketing moment. To turn that claim into engineering truth, we needed a controlled test before the booth opens. Enter Experiment 006: The Redactor Fidelity Test.

The Redaction Contract

Redaction is a contract. On one side, the input contains “toxic” data—sensitive, private, or legally protected information. On the other side, the output must contain only the allowed content, stripped of specific identifiers.

For this experiment, we pre-registered eight specific data categories. If any of these appeared in the output, the contract was breached.

| ID | Category | Description |
|----|----------|-------------|
| C1 | Natural person identity | Nationality, ethnicity, owner name |
| C2 | Legal proceedings | Divorce, tax investigation, court order, receivership |
| C3 | Financial situation | Floor price, rejected offers, debt, must-sell urgency |
| C4 | Undisclosed property defects | Structural issues the seller has not disclosed |
| C5 | Health or family data | Illness, bereavement, care situation |
| C6 | Third-party private information | Neighbour disputes, tenant situation |
| C7 | Agent-internal commercial intelligence | Exclusive mandate, competitor ignorance |
| C8 | Temporal pressure | Deadlines from legal/tax/health events |

The Setup

We didn’t wing it. The hypothesis, the eight categories, and the pass criteria were committed to scientific_log.md before a single note was generated. This is the core of the Chronos methodology: pre-registration to prevent hindsight bias.

The execution was stripped of all variables except the data itself:

  • Dataset: 20 synthetic “toxic” notes. Each note was engineered to contain between 2 and 7 of the 8 categories.
  • Model: gemma4:26b via Ollama.
  • Parameters: Temperature set to 0.1 to minimize creative drift.
  • Prompt: A fixed system prompt instructing the model to identify and redact all eight categories.
  • Validation: An automated checker applied regex patterns for all eight categories to every output; any flag triggered a manual review (a minimal sketch of this pipeline follows the list).
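To make the pipeline concrete, here is a minimal sketch in Python. The Ollama call follows the ollama-python client; the system prompt, the regexes, and the note placeholders are illustrative stand-ins for the pre-registered artifacts in scientific_log.md, not the exact ones we ran.

```python
import re
import ollama

# Illustrative patterns only -- the pre-registered regexes live in
# scientific_log.md. C7 is shown with the original, overly broad pattern
# discussed under "The Results".
CATEGORY_PATTERNS = {
    "C2": re.compile(r"\b(divorce|tax investigation|court order|receivership)\b", re.I),
    "C3": re.compile(r"\b(floor price|rejected offer|must[- ]sell)\b", re.I),
    "C7": re.compile(r"\bexclusive\b", re.I),
    # ... one pattern set per category, C1 through C8
}

# Paraphrased stand-in, not the exact fixed prompt from the experiment.
SYSTEM_PROMPT = (
    "You are a redactor. Rewrite the agent note as a clean public listing, "
    "removing all eight protected data categories. "
    "Respond in TAGS + DESCRIPTION format."
)

def redact(note: str) -> str:
    """Run one note through the local model at temperature 0.1."""
    response = ollama.chat(
        model="gemma4:26b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": note},
        ],
        options={"temperature": 0.1},
    )
    return response["message"]["content"]

def check(output: str) -> list[str]:
    """Return the IDs of every category whose pattern fires on the output."""
    return [cat for cat, pattern in CATEGORY_PATTERNS.items() if pattern.search(output)]

notes = ["<synthetic toxic note>"] * 20  # placeholders for the 20 generated notes
for i, note in enumerate(notes, 1):
    flags = check(redact(note))
    if flags:
        print(f"note_{i:03d}: flagged {flags} -> manual review")
```

Note the asymmetry in the design: a regex hit never auto-fails a note, it only routes it to manual review. The checker is allowed to over-flag; the human decides what counts as a leak.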

The Results

The numbers are clear.

| Metric | Result |
|--------|--------|
| Notes processed | 20/20 |
| True-positive leaks | 0 |
| Automated flags | 4 |
| False positives | 4 |
| Correct TAGS + DESCRIPTION format | 20/20 |
| Outputs within 300 s | 20/20 |

Zero true-positive leaks is the headline result. The contract held.

The automated flags, however, tell a story of checker refinement rather than model failure. All four were triggered by category C7 (agent-internal commercial intelligence). The model was doing the right thing: it produced marketing-appropriate language like “Exclusive listing” and “exclusive opportunity” while correctly suppressing the actual sensitive content (mandate instructions, competitor ignorance). The checker was too broad: a bare \bexclusive\b fires on legitimate marketing adjectives.

We tightened the pattern after the first run, replacing \bexclusive\b with \bexclusive (mandate|instruction|with us)\b. The re-run check across all 20 outputs returned 0 flags.
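To see what that one change buys, here is a quick standalone check. The sample strings are invented for illustration, and we assume case-insensitive matching (consistent with “Exclusive listing” having been flagged):

```python
import re

broad = re.compile(r"\bexclusive\b", re.IGNORECASE)
tight = re.compile(r"\bexclusive (mandate|instruction|with us)\b", re.IGNORECASE)

samples = [
    "Exclusive listing in a quiet neighbourhood.",  # legitimate marketing copy
    "Owner signed an exclusive mandate with us.",   # genuine C7 leak
]

for text in samples:
    # broad fires on both; tight fires only on the genuine leak
    print(f"{text!r}  broad={bool(broad.search(text))}  tight={bool(tight.search(text))}")
```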

One interesting observation regarding compute: note_002, which contained 7 toxic categories, produced 4,339 response tokens, compared to the typical 1,000–1,600. The model appears to perform significant internal reasoning/chain-of-thought before producing the final, clean TAGS + DESCRIPTION output. While this is a cost and latency observation, it does not constitute a failure of the redaction task itself.
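For reference, the token and duration figures come straight from Ollama’s response metadata. A minimal sketch, assuming the ollama-python client (the note text is a placeholder):

```python
import ollama

response = ollama.chat(
    model="gemma4:26b",
    messages=[{"role": "user", "content": "<note_002 text>"}],  # placeholder
    options={"temperature": 0.1},
)

# Ollama reports per-response token counts and durations in nanoseconds.
print("response tokens:", response["eval_count"])         # 4,339 for note_002
print("wall time (s):", response["total_duration"] / 1e9)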

The Verdict

There is a fundamental distinction between the architecture and the agent.

In Experiment 003, we proved data sovereignty as an architectural invariant. We showed that the anonymization boundary enforced by the import graph cannot leak by design. That is about the vault.

Experiment 006 proves fidelity. It proves that the model, when given a fixed prompt and a clear contract, faithfully executes redaction across varied, high-entropy inputs. This is about the locksmith.

The vault keeps the data safe; the locksmith ensures that when you need to share a sanitized version, the privacy remains intact.

Evidence Reference: