What "Simulation-Aware"
Actually Means

It's not a buzzword. It's a design pattern.

If your AI system processes documents and someone asks "why did it flag this section as safety-critical?" — what do you answer?

"The model said so" is not an answer. Not in construction. Not in healthcare. Not in any industry where documents have legal weight and wrong answers have professional liability.

"Simulation-aware" is a design pattern that answers this question. Every classification is traceable to specific causes. Every output is verifiable against its inputs. Every anomaly is detected before it reaches a human. The pattern is built from real computer science primitives — Merkle trees, causal DAGs, attention budgets, parity checks — not from marketing.

The core idea

Treat every document as a small universe. It has its own rules (mandatory requirements, permissive options), its own physics (structural loads, financial thresholds), its own timeline (deadlines, milestones), and its own entities (standards bodies, parties, jurisdictions).

A simulation-aware system processes this universe with the same rigor that a physics engine applies to a game world.

This is the architecture behind AECai. It's 17 systems distributed across three engine pillars. Here's what they actually do.

System 1: Causal Consistency Networks

Every finding in the pipeline has a causal chain explaining why it was made. Not a confidence score. Not a probability. A directed acyclic graph of specific, traceable causes.

# Unit 5 classified as safety_critical. Why?
explanation = ccn.explain("unit_5")

# Returns:
# {
#   "effect_id": "unit_5",
#   "causal_chain": [
#     {"cause": "keyword:shall", "relationship": "authority_signal"},
#     {"cause": "standard:ACI_318", "relationship": "seismic_reference"},
#     {"cause": "discipline:structural", "relationship": "domain_context"}
#   ]
# }

This matters for E&O insurance defense. When a client asks "why did your AI say this was critical?" you point to the causal chain, not the model weights. The graph has a consistency checker that detects cycles, orphaned findings, and contradictions. If a classification has no causal chain, it's flagged as an orphaned finding — something the system produced but can't explain.
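A minimal sketch of that kind of check, using illustrative names (CausalGraph, add_cause) rather than AECai's actual API: findings are nodes, causes are directed edges, and a finding with no cause edge is an orphan.

from collections import defaultdict

class CausalGraph:
    def __init__(self):
        self.causes = defaultdict(list)   # effect_id -> list of cause ids
        self.findings = set()

    def add_finding(self, effect_id):
        self.findings.add(effect_id)

    def add_cause(self, effect_id, cause_id):
        self.causes[effect_id].append(cause_id)

    def orphaned_findings(self):
        # Findings the system produced but cannot explain
        return {f for f in self.findings if not self.causes.get(f)}

    def has_cycle(self):
        # Depth-first search for a back edge among effect -> cause links
        visiting, done = set(), set()

        def visit(node):
            visiting.add(node)
            for nxt in self.causes.get(node, []):
                if nxt in visiting:
                    return True
                if nxt not in done and visit(nxt):
                    return True
            visiting.discard(node)
            done.add(node)
            return False

        return any(n not in done and visit(n) for n in set(self.causes) | self.findings)

g = CausalGraph()
g.add_finding("unit_5")
g.add_cause("unit_5", "keyword:shall")
g.add_finding("unit_9")               # produced with no recorded cause
print(g.orphaned_findings())          # {'unit_9'} → flagged, not silently trusted
print(g.has_cycle())                  # False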

System 2: Reality Anchors

Some facts in a document are externally verifiable. "ACI 318-19" is a real standard. "January 15, 2026" is a real date. "OSHA" is a real organization. These are anchors — known-true reference points that everything else is measured against.

# Register verifiable facts as anchors
anchors.register_anchor("std_1", "standard", "ACI 318-19", confidence=1.0)
anchors.register_anchor("date_1", "date", "2026-01-15", confidence=1.0)

# Attach findings to their anchors
anchors.attach_to_anchor("finding_12", ["std_1", "date_1"])

# If an anchor is invalidated, all dependents cascade
anchors.invalidate_anchor("std_1", reason="Standard withdrawn")
# → finding_12 confidence drops from 1.0 to 0.0
# → All findings attached to std_1 flagged as suspect

The confidence model uses the geometric mean of anchor confidences. A finding with three verified anchors has confidence ~1.0. A finding with one invalidated anchor drops to ~0.0. A finding with no anchors at all gets baseline 0.5 — the system acknowledges uncertainty rather than guessing.
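A rough sketch of that confidence rule; the function name is hypothetical, but the geometric mean and the 0.5 baseline follow the description above.

import math

def finding_confidence(anchor_confidences):
    """Geometric mean of attached anchor confidences; 0.5 baseline when unanchored."""
    if not anchor_confidences:
        return 0.5                                   # no anchors: admit uncertainty
    product = math.prod(anchor_confidences)
    return product ** (1.0 / len(anchor_confidences))

print(finding_confidence([1.0, 1.0, 1.0]))   # 1.0: three verified anchors
print(finding_confidence([1.0, 1.0, 0.0]))   # 0.0: one invalidated anchor sinks it
print(finding_confidence([]))                # 0.5: unanchored baseline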

This is how the system handles a withdrawn standard, a retracted report, or an amended contract. Invalidate the anchor, and everything that depended on it cascades to "suspect" automatically.

System 3: Temporal Merkle Trees

Every semantic unit the pipeline produces gets hashed into a Merkle tree. The root hash represents the entire output. Any single unit can be verified without downloading the full dataset.

# Build tree from pipeline output
for unit in units:
    merkle.add_leaf(unit["unit_id"], unit)
root = merkle.build_tree()

# Later: verify a single unit
proof = merkle.get_proof(leaf_index=5)
valid = merkle.verify_leaf(5, unit_hash, proof)  # True/False

# If anyone tampers with unit 5, the proof fails.
# Domain separators prevent second-preimage attacks.

This isn't blockchain. There's no distributed consensus, no mining, no chain. It's a standard Merkle tree — the same data structure git uses to verify commits. The difference is that it operates at the semantic unit level, so you can verify that a single paragraph of a 200-page spec hasn't been altered without re-processing the entire document.
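For intuition, here is a condensed, self-contained sketch of the same idea: hash leaves and internal nodes with different one-byte prefixes (the domain separators), then check one leaf against the root using only its sibling hashes. The helper names and prefixes are illustrative, not AECai's implementation.

import hashlib

LEAF, NODE = b"\x00", b"\x01"        # domain separators: leaf vs. internal node

def h(prefix, data):
    return hashlib.sha256(prefix + data).digest()

def build_tree(leaves):
    level = [h(LEAF, leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]               # duplicate the odd node out
        level = [h(NODE, level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return level[0], levels                           # (root, all levels)

def proof_for(levels, index):
    path = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        index //= 2
    return path

def verify(leaf, path, root):
    node = h(LEAF, leaf)
    for sibling, sibling_is_left in path:
        node = h(NODE, sibling + node) if sibling_is_left else h(NODE, node + sibling)
    return node == root

units = [f"unit_{i}".encode() for i in range(6)]
root, levels = build_tree(units)
proof = proof_for(levels, 5)
print(verify(units[5], proof, root))               # True
print(verify(b"tampered paragraph", proof, root))  # False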

Why this matters: when you deliver an AI-processed compliance report to a client, they need to know the output hasn't been modified after processing. The Merkle proof is that guarantee.

System 4: Attention Budgets

The pipeline has a fixed attention budget of 100 units per document. Safety-critical content consumes more. Boilerplate consumes less. The budget prevents the system from spending equal compute on every section.

# Allocate attention based on risk and field strength
scheduler = ConsciousnessScheduler(total_budget=100)
scheduler.allocate("unit_1", field_strength=4.0, risk="safety_critical")  # → 16 units
scheduler.allocate("unit_2", field_strength=1.0, risk="informational")    # → 1 unit

# Attention maps to processing depth:
#   ≥ 8 → "deep"     (full analysis, standards matching, AI enrichment)
#   ≥ 3 → "standard" (normal processing)
#   ≥ 1 → "shallow"  (minimal classification only)
#   < 1 → "skip"     (omit from processing)

This is the same principle behind Decompose's attention scoring, but applied to the full pipeline. In Decompose, attention decides what your agent reads. In AECai, attention decides what processing depth each unit receives: deep analysis for safety-critical content, shallow pass for background, skip for boilerplate.

The budget is finite. When it runs out, remaining units get minimal processing. This is intentional — it forces the system to prioritize. A 200-page spec where every section gets "deep" analysis is a system that doesn't know what matters.
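A toy version of that prioritization, assuming the risk weights and depth thresholds shown in the snippet above; the allocator and the weight table are made up for illustration.

# Hypothetical allocator: fund the riskiest, strongest units first; once the
# budget runs dry, everything left drops to "skip".
RISK_WEIGHT = {"safety_critical": 4.0, "compliance": 2.0, "informational": 1.0}

def allocate(units, total_budget=100):
    ranked = sorted(units, key=lambda u: RISK_WEIGHT[u["risk"]] * u["field_strength"],
                    reverse=True)
    remaining, plan = total_budget, {}
    for u in ranked:
        grant = min(RISK_WEIGHT[u["risk"]] * u["field_strength"], remaining)
        remaining -= grant
        if grant >= 8:
            depth = "deep"
        elif grant >= 3:
            depth = "standard"
        elif grant >= 1:
            depth = "shallow"
        else:
            depth = "skip"
        plan[u["id"]] = depth
    return plan

units = [
    {"id": "unit_1", "risk": "safety_critical", "field_strength": 4.0},
    {"id": "unit_2", "risk": "informational",   "field_strength": 1.0},
]
print(allocate(units))   # {'unit_1': 'deep', 'unit_2': 'shallow'}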

System 5: Multi-Channel Error Correction

Run multiple independent extraction channels on the same content. Where channels agree, confidence is high. Where they disagree, flag for review.

# Channel A: regex extraction (fast, literal)
# Channel B: structural analysis (position-aware)
# Channel C: LLM extraction (semantic, optional)
result = qec.parity_check(
    "unit_5",
    channel_a={"risk_level": "compliance", "discipline": "structural"},
    channel_b={"risk_level": "safety_critical", "discipline": "structural"},
    channel_c={"risk_level": "safety_critical", "discipline": "structural"},
)
# → risk_level: corrected to "safety_critical" (2/3 majority)
# → discipline: unanimous agreement "structural"

This catches OCR errors, misclassifications, and edge cases that any single extraction method would miss. The correction is conservative: unanimous = high confidence, majority = corrected with note, split = flagged for human review. No auto-correction on uncertain data.
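The voting policy itself is simple enough to sketch in a few lines; the function below is illustrative, not the qec module's real signature.

from collections import Counter

def vote(field, channel_a, channel_b, channel_c):
    votes = Counter([channel_a[field], channel_b[field], channel_c[field]])
    value, count = votes.most_common(1)[0]
    if count == 3:
        return {"value": value, "status": "unanimous"}
    if count == 2:
        return {"value": value, "status": "corrected_by_majority"}
    return {"value": None, "status": "flag_for_human_review"}   # all three disagree

a = {"risk_level": "compliance",      "discipline": "structural"}
b = {"risk_level": "safety_critical", "discipline": "structural"}
c = {"risk_level": "safety_critical", "discipline": "structural"}
print(vote("risk_level", a, b, c))  # corrected_by_majority: safety_critical
print(vote("discipline", a, b, c))  # unanimous: structural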

System 6: Anomaly Detection

Documents can contain contradictions, impossible dates, and circular references. The simulation escape detector flags these before they reach a human.

# Check for impossible timelines
escape.check_temporal_consistency([
    {"parsed_date": "2026-01-15", "raw_text": "Notice to proceed"},
    {"parsed_date": "1847-03-01", "raw_text": "Contract execution"},
])
# → temporal_anomaly: events span 65,340 days

# Check for contradictory standards
escape.check_standard_consistency([
    {"body": "ACI", "version": "318-14"},
    {"body": "ACI", "version": "318-19"},
])
# → version_conflict: ACI referenced with versions {'318-14', '318-19'}

A date in 1847 is almost certainly an OCR error or copy-paste mistake. Two different versions of the same standard in the same spec is a real conflict that needs resolution. Both are "simulation escapes" — states that shouldn't exist given the document's internal rules.
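Both checks reduce to small rules over the extracted data. A sketch, assuming the input shapes from the snippet above and an arbitrary 100-year span threshold:

from collections import defaultdict
from datetime import date

def check_temporal_consistency(events, max_span_days=36500):
    # Dates more than ~100 years apart in one contract are treated as an escape
    dates = sorted(date.fromisoformat(e["parsed_date"]) for e in events)
    span = (dates[-1] - dates[0]).days
    return {"anomaly": "temporal_anomaly", "span_days": span} if span > max_span_days else None

def check_standard_consistency(references):
    # The same standards body cited with more than one version is a conflict
    versions = defaultdict(set)
    for ref in references:
        versions[ref["body"]].add(ref["version"])
    return [{"anomaly": "version_conflict", "body": body, "versions": vers}
            for body, vers in versions.items() if len(vers) > 1]

print(check_temporal_consistency([
    {"parsed_date": "2026-01-15"},
    {"parsed_date": "1847-03-01"},
]))
print(check_standard_consistency([
    {"body": "ACI", "version": "318-14"},
    {"body": "ACI", "version": "318-19"},
]))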

The full inventory

Six systems explained above, eleven more running underneath. Here's the complete map, organized by which engine pillar owns each system:

Vanta / Pipeline
- Quantized Message Bus. Deterministic inter-stage message passing with causal ordering and atomic commits.
- Hierarchical Contexts. Nested document processing — when a spec says "per ASTM A615," spawn a child context for that standard.
- Memetic Evolution. Detection patterns evolve: confirmed patterns gain weight, false positives lose it. Safety patterns are immutable.
- Attention Budget. 100-unit budget per document. Safety-critical gets deep analysis. Boilerplate gets skipped.
- Irreducibility Detector. Identifies content that cannot be paraphrased: engineering values, legal mandates, formulas.

Aletheia / Verification
- Causal Consistency. DAG of why every classification was made. Cycle and orphan detection for audit defense.
- Error Correction. Multi-channel extraction with majority vote. Unanimous = trusted. Split = flagged for review.
- Reality Anchors. Verifiable facts as anchor points. Invalidation cascades to all dependent findings.
- Merkle Verification. Unit-level tamper detection. Verify any paragraph without re-processing the document.
- Anomaly Detection. Impossible dates, contradictory standards, circular references. Flagged before delivery.
- Counterfactual Logger. What-if audit trail. "If this pattern had scored differently, what would have changed?"

Daedalus / Data
- Holographic Storage. Erasure-resilient encoding. Lose 30% of units, still reconstruct the document's meaning.
- Data Segregation. PII and client data isolation via topological braiding. No cross-contamination between projects.

Plus four systems in the torsion subsystem (lazy scheduling, spin-curvature field computation, vortex caching, chirality feedback) that handle the initial field physics computations before classification begins.

Why the simulation framing

Fair question. Why call it "simulation-aware" instead of "document processing pipeline"?

Because the framing changes how you design systems. If you think of a document as text to extract data from, you build a pipeline. If you think of a document as a universe to verify, you build something different.

The simulation framing led us to systems we wouldn't have built otherwise. Causal consistency networks exist because we asked "can we trace the causal chain for every finding?" Reality anchors exist because we asked "what are the known-true facts in this document, and what happens when one is wrong?" Merkle trees exist because we asked "can we verify a single paragraph without re-processing 200 pages?"

These questions don't arise from a data extraction mindset. They arise from treating the document as a system with internal rules that can be checked.

What this enables

Three capabilities that a standard document AI can't provide:

1. Audit defense

When a client or regulator asks "why did your system flag this section as safety-critical?", you show the causal chain: keyword "shall" (mandatory authority) + reference to OSHA 1926 (safety standard) + structural discipline context = safety-critical classification. Each link in the chain is a specific, verifiable signal. Not a model confidence score.

2. Incremental verification

A 500-page spec was processed six months ago. Today, section 14 needs re-verification. The Merkle tree provides a proof path for section 14 without re-processing sections 1-13 and 15-500. If the proof validates, section 14 hasn't been tampered with. If it fails, something changed.

3. Cascading trust

ASTM C150-22 gets superseded by C150-23. One anchor invalidation, and every finding in every document that referenced the old standard gets flagged as "suspect" with a clear trail: "This finding was anchored to ASTM C150-22, which has been superseded." No re-processing needed — just an anchor update that cascades through the dependency graph.
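A sketch of how that cascade can work: keep a map from each anchor to its dependent findings, and mark every dependent suspect when the anchor is invalidated. The class and method names are illustrative.

from collections import defaultdict

class AnchorStore:
    def __init__(self):
        self.dependents = defaultdict(set)   # anchor_id -> finding ids
        self.status = {}                     # finding_id -> (state, reason)

    def attach(self, finding_id, anchor_ids):
        for anchor_id in anchor_ids:
            self.dependents[anchor_id].add(finding_id)
        self.status[finding_id] = ("ok", None)

    def invalidate(self, anchor_id, reason):
        # No re-processing: walk the dependency map and mark findings suspect
        for finding_id in self.dependents[anchor_id]:
            self.status[finding_id] = ("suspect", f"anchored to {anchor_id}: {reason}")

store = AnchorStore()
store.attach("finding_12", ["astm_c150_22"])
store.attach("finding_40", ["astm_c150_22"])
store.invalidate("astm_c150_22", "superseded by ASTM C150-23")
print(store.status["finding_12"])
# ('suspect', 'anchored to astm_c150_22: superseded by ASTM C150-23')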

What we open-sourced

Decompose is the open-source version of two of these systems: the attention scorer (System 4 above) and the irreducibility detector (listed under Vanta / Pipeline in the inventory). It runs on pure regex, processes documents in ~14ms on average, and gives any agent the ability to prioritize what matters.

The remaining 15 systems are part of AECai, which runs locally on your hardware and processes AEC documents with the full simulation-aware architecture.

Both are built by Echology. If you're building document intelligence for an industry where wrong answers have consequences, let's talk.