Methodology 2026-02-20

Where Decompose Fits

Decompose is not the whole product. It is the structural step that makes everything else possible.

People find Decompose and think they've found the product. They haven't. Decompose is one step in a much larger system. It's the open-source cognitive primitive, the structural step that happens before reasoning begins.

The full product is Signal. This post explains how they connect, what each piece does, and why we open-sourced the part we did.

The pipeline

Every document that enters Signal flows through five stages. Decompose is Stage 4.

Stage 1. Parse: PDF, DOCX, DXF, images → raw text
Stage 2. Classify: document type, domain, risk indicators
Stage 3. Enrich: standards, PII, timelines, financials, contracts
Stage 4. Decompose (open source): semantic chunking; authority, risk, and attention scoring
Stage 5. Index: vector embeddings → Qdrant → searchable archive

Stages 1–3 prepare the document. Stage 4 gives it structure. Stage 5 makes it searchable. Decompose is the hinge: it's where raw text becomes classified, scored semantic units that everything downstream can reason about.
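The five stages compose naturally as a function pipeline. A minimal sketch: the stage bodies below are placeholders standing in for Signal's real parsers and classifiers; only the shape of the flow comes from the stages above.

```python
# Sketch of the five-stage pipeline as plain function composition.
# Stage bodies are illustrative placeholders, not Signal's implementation.

def parse(raw_bytes):   # Stage 1: file formats -> raw text
    return raw_bytes.decode("utf-8")

def classify(text):     # Stage 2: document type, domain, risk indicators
    return {"text": text, "doc_type": "specification"}

def enrich(doc):        # Stage 3: standards, PII, timelines, financials
    doc["standards"] = []
    return doc

def decompose(doc):     # Stage 4 (open source): classified semantic units
    doc["units"] = [{"text": s, "attention": 0.0}
                    for s in doc["text"].split(". ") if s]
    return doc

def index(doc):         # Stage 5: embeddings -> searchable archive
    doc["indexed"] = True
    return doc

def pipeline(raw_bytes):
    return index(decompose(enrich(classify(parse(raw_bytes)))))

result = pipeline(b"Concrete shall meet ACI 318. Submit shop drawings.")
```

The point of the shape: each stage takes the previous stage's output, and Decompose sits between enrichment and indexing, exactly where raw text becomes structure.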

What Decompose does (and doesn't do)

Decompose is a standalone library. You can pip install decompose-mcp right now and use it without Signal, without Echology, without anything else. It does exactly one thing:

Text in, classified semantic units out. Authority, risk, attention, entities. No LLM. Deterministic.

That's it. It doesn't parse PDFs. It doesn't run OCR. It doesn't search a vector database. It doesn't talk to an LLM. It takes text and returns structure.

Decompose does

  • Split text into semantic units
  • Classify authority (mandatory, directive, permissive)
  • Score risk (safety, financial, compliance)
  • Calculate attention (0–10 priority)
  • Extract entities (standards, dates, dollars)
  • Flag irreducible content
  • Run in ~14ms, deterministically
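As a concrete sketch of those outputs, here is what a single classified unit might look like and how a caller could route it. The field names (`authority`, `risk`, `attention`, `irreducible`, `entities`) follow the generic workflow example later in this post; the exact schema and the sample values are assumptions for illustration.

```python
# A unit as Decompose might return it. The schema here is assumed from
# the generic workflow example in this post, not a guaranteed contract.
unit = {
    "text": "Anchor bolts shall conform to ASTM F1554 Grade 55.",
    "authority": "mandatory",   # mandatory / directive / permissive
    "risk": "safety",           # safety / financial / compliance / informational
    "attention": 7.2,           # 0-10 priority
    "entities": ["ASTM F1554"],
    "irreducible": True,
}

def route(unit):
    """Route a unit by its structural classification."""
    if unit["irreducible"]:
        return "verbatim"       # never paraphrase this content
    if unit["attention"] >= 3.0:
        return "critical"       # send to the model first
    if unit["risk"] != "informational":
        return "review"
    return "skip"

print(route(unit))  # prints: verbatim
```

The routing thresholds mirror the generic workflow shown later; tune them per corpus.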

Signal adds

  • Parse 15+ file formats (PDF, DOCX, DXF, images)
  • AI-powered document classification via local LLM
  • Standards cross-referencing by jurisdiction
  • PII detection and redaction
  • Timeline and financial analysis plugins
  • Vector search across your entire archive
  • Verification, audit trails, and certificates
  • CAD/BIM script generation
  • Workflow orchestration (Temporal)
  • Full API server (FastAPI)

Decompose is the cognitive primitive. Signal is the intelligence platform built on top of it.

The three engines

Signal has three engines. Each one handles a different phase of document intelligence. Decompose lives inside the first one.

Your documents (PDFs, specs, contracts, submittals, RFIs, drawings, anything your firm produces or receives) flow through three engines:

  • Vanta: read & classify
  • Aletheia: verify & audit
  • Daedalus: retrieve & act

Open source, inside Vanta, sits Decompose: the structural step. Every document that Vanta reads gets decomposed into classified semantic units before anything else happens. This is the part we open-sourced.

What comes out is structured intelligence: a searchable archive, verified outputs, audit trails, generated scripts, the actual deliverable.

Vanta is the ingestion engine. It reads documents, classifies them, runs enrichment plugins, and calls Decompose to break them into semantic units. Parse → Classify → Enrich → Decompose → Index. That's the Vanta pipeline.

Aletheia is the verification engine. It checks decomposed units against jurisdiction-specific standards, validates cross-references, runs consistency checks, and issues audit certificates. When Signal says a document passes, Aletheia is why.

Daedalus is the retrieval and action engine. It searches the indexed archive, finds patterns across documents, and generates outputs, including Civil3D and Revit scripts. When an engineer asks "show me every foundation spec that references ACI 318," Daedalus answers.

Why we open-sourced this part

Three reasons.

1. It teaches the architecture. If you understand what Decompose does (splitting text into classified units with authority, risk, and attention scores), you understand the fundamental design pattern behind all of Signal. Structure before reasoning. Classification before generation. Deterministic preprocessing before probabilistic inference.

2. It's useful on its own. You don't need the full Signal platform to benefit from decomposition. If you're building a RAG pipeline, an AI agent, or any system that processes documents, Decompose makes your model work better by giving it structure before it starts thinking.

3. It's the on-ramp. Teams that start with Decompose learn the vocabulary: authority levels, risk categories, attention scores, irreducibility. When they're ready for the full platform (parsing, verification, search, audit), they already speak the language.

The picture

Here's the simplest way to think about it:

Echology (the company)
├── Decompose — open source, free, pip install
└── Signal — commercial platform
    ├── Vanta (read + classify + decompose + index)
    ├── Aletheia (verify + audit + certify)
    └── Daedalus (search + retrieve + generate)

Decompose is the cognitive primitive. Signal is what happens when you take that primitive and build an enterprise document intelligence platform around it. Vanta, Aletheia, and Daedalus are the three engines that make that platform work.

Everything runs locally. No cloud. No data leaves your building. That's not a limitation. For the industries we serve, it's the requirement.

Where it fits in AEC

AEC firms deal with documents that have legal weight. Specifications, contracts, submittals, RFIs: these aren't articles, they're obligations. Decompose turns every one of them into scored, classified units that downstream systems can reason about.

Specification Review

A 200-page structural specification comes in. Decompose finds the 40 mandatory clauses, flags the 12 safety-critical requirements, extracts every ASTM and ACI reference, and scores each section by attention. Your engineer reads 30 pages instead of 200.
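Under the assumed unit schema from this post's generic workflow, that triage is a couple of filters and a sort. The sample units and scores below are illustrative, not real Decompose output.

```python
# Triage a decomposed specification: keep what an engineer must read.
# The unit fields (authority, risk, attention) are assumed per the
# generic workflow example; the clauses and scores are invented.
units = [
    {"text": "Concrete shall achieve f'c = 4000 psi.",
     "authority": "mandatory", "risk": "safety", "attention": 8.5},
    {"text": "The contractor may propose alternates.",
     "authority": "permissive", "risk": "informational", "attention": 1.0},
    {"text": "Welding shall comply with AWS D1.1.",
     "authority": "mandatory", "risk": "safety", "attention": 7.8},
]

mandatory = [u for u in units if u["authority"] == "mandatory"]
safety_critical = [u for u in mandatory if u["risk"] == "safety"]

# Highest-attention clauses first: this is the engineer's reading list.
reading_list = sorted(safety_critical, key=lambda u: -u["attention"])
```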

Contract Analysis

Liquidated damages, retainage, indemnification, insurance minimums, buried in boilerplate. Decompose surfaces every financial obligation and flags irreducible clauses that cannot be paraphrased. Your PM sees the money on day one.
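A rough sketch of that surfacing, using a dollar-and-percentage regex as a stand-in for Decompose's entity extraction. The clauses and the regex are illustrative, not the library's internals.

```python
import re

# Surface financial obligations from decomposed contract units.
# The regex below is a stand-in for Decompose's entity extraction,
# and the sample clauses are invented.
units = [
    {"text": "Liquidated damages of $2,500 per calendar day of delay.",
     "irreducible": True},
    {"text": "Retainage of 10% shall be withheld from each payment.",
     "irreducible": True},
    {"text": "Submittals are reviewed within 14 days.",
     "irreducible": False},
]

MONEY = re.compile(r"\$[\d,]+(?:\.\d{2})?|\b\d+(?:\.\d+)?%")

# Every unit that mentions money, with the amounts pulled out.
obligations = [(u["text"], MONEY.findall(u["text"]))
               for u in units if MONEY.search(u["text"])]

# Clauses that must never be paraphrased.
verbatim = [u["text"] for u in units if u["irreducible"]]
```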

Submittal Processing

Product data sheets, shop drawings, material certs. Decompose classifies which sections carry mandatory compliance requirements vs. informational marketing content. Route the compliance sections to your engineer. Skip the rest.

RFI Triage

An RFI references three specification sections and asks about a substitution. Decompose breaks the referenced sections into units, scores them, and surfaces the mandatory constraints the substitution needs to satisfy. Your response is grounded, not guessed.

Standards Compliance

Feed your internal standards into Decompose. Feed incoming documents into Decompose. Compare the structured units. Every gap between "what we require" and "what they submitted" becomes visible and auditable.
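One way to sketch that comparison is matching on extracted standards references. The entity-set matching below is an illustrative strategy under the assumed unit schema, not Signal's actual gap-detection logic.

```python
# Compare "what we require" against "what they submitted", unit by unit.
# Matching on extracted standards entities is illustrative; the units
# are invented, not real Decompose output.
required = [
    {"text": "Rebar shall conform to ASTM A615 Grade 60.",
     "entities": {"ASTM A615"}},
    {"text": "Anchors shall conform to ASTM F1554.",
     "entities": {"ASTM F1554"}},
]
submitted = [
    {"text": "Rebar: ASTM A615 Grade 60 deformed bars.",
     "entities": {"ASTM A615"}},
]

# Which required standards does the submittal actually cover?
covered = {e for u in submitted for e in u["entities"]}

# Every requirement whose references are not covered is a visible gap.
gaps = [u["text"] for u in required if not (u["entities"] <= covered)]
```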

Model Fine-Tuning

Every decomposed unit is a labeled training sample: authority, risk, attention, irreducibility. Build a corpus of AEC documents, decompose all of them, and you have structured training data for domain-specific AI. No manual labeling.
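A sketch of that corpus-building step: each unit becomes one JSONL record. The field names follow the labels named above; the export format itself is an assumption, not a fixed Decompose feature.

```python
import json

# Turn decomposed units into labeled training samples, one JSONL
# record per unit. The schema is assumed; the units are invented.
units = [
    {"text": "Welds shall be inspected per AWS D1.1.",
     "authority": "mandatory", "risk": "safety",
     "attention": 7.5, "irreducible": False},
    {"text": "General notes are provided for reference.",
     "authority": "permissive", "risk": "informational",
     "attention": 0.4, "irreducible": False},
]

records = [
    json.dumps({
        "input": u["text"],
        "labels": {k: u[k] for k in
                   ("authority", "risk", "attention", "irreducible")},
    })
    for u in units
]

corpus = "\n".join(records)  # write this out as a .jsonl file
```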

Where it fits beyond AEC

Decompose was built for AEC, but the architecture is universal. Any industry that processes documents with obligations, risk, and compliance requirements can use it. Here are the workflows we're seeing.

Legal

Law firms process contracts, regulations, case filings, and compliance documentation. Decompose classifies authority levels ("shall" vs. "should" vs. "may"), extracts every obligation and deadline, and flags clauses that must be preserved verbatim. Workflow: Intake document → Decompose → route mandatory clauses to attorney review, skip informational sections → generate obligation summary with audit trail.

Insurance

Underwriters read policies, endorsements, and claim documents all day. Decompose surfaces every coverage limit, exclusion, and conditional clause with attention scores. Workflow: Policy document → Decompose → extract financial terms and conditions → flag exclusions and limitations → compare against claim submissions for coverage gaps.

Healthcare & Life Sciences

Clinical protocols, regulatory submissions, and compliance documentation carry safety-critical requirements. Decompose identifies mandatory procedures, flags dosage and threshold values as irreducible, and scores risk by section. Workflow: Protocol document → Decompose → flag safety-critical sections → preserve verbatim dosages → route to compliance review.

Government & Defense

Federal acquisitions, SOWs, and regulatory guidance follow strict authority language (FAR, DFARS, NIST). Decompose classifies every requirement by authority level and extracts standards references. Workflow: RFP or SOW → Decompose → extract every "shall" requirement → map to compliance checklist → identify gaps in proposal response.
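A minimal sketch of the "shall" extraction step, using a plain keyword pass in place of Decompose's authority classification. The SOW sentences are invented.

```python
import re

# Map every "shall" requirement in an SOW to a compliance checklist.
# A keyword pass stands in here for Decompose's authority
# classification; the text is illustrative.
sow = (
    "The contractor shall maintain NIST SP 800-171 compliance. "
    "The offeror may propose alternative staffing. "
    "Deliverables shall be submitted within 30 days."
)

sentences = [s.strip() for s in sow.split(". ") if s.strip()]

# Only binding requirements go on the checklist; "may" clauses do not.
checklist = [
    {"requirement": s, "met": False}
    for s in sentences if re.search(r"\bshall\b", s)
]
```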

Financial Services

Loan documents, regulatory filings, and audit reports contain obligations buried in dense prose. Decompose extracts every dollar amount, percentage, and deadline, flags financial risk sections, and scores attention. Workflow: Regulatory filing → Decompose → extract financial entities → flag compliance obligations → feed structured data to risk models.

Any RAG Pipeline

If you're building retrieval-augmented generation for any industry, Decompose is the preprocessing step that makes it work better. Instead of embedding raw text chunks, embed classified semantic units. Your vector search returns mandatory clauses and safety requirements instead of random paragraphs.

```python
# Generic workflow for any industry

# 1. Decompose the document
from decompose import decompose_text
units = decompose_text(document_text)["units"]

# 2. Route by priority
critical = [u for u in units if u["attention"] >= 3.0]
review = [u for u in units if u["risk"] != "informational"]
skip = [u for u in units if u["attention"] < 0.5]

# 3. Protect what can't be changed
verbatim = [u for u in units if u["irreducible"]]

# 4. Send only what matters to your LLM
for unit in critical:
    response = llm.analyze(unit["text"], unit["metadata"])
```

The pattern is always the same: decompose first, route by structure, protect what's irreducible, and only send high-attention content to the model. The industry changes. The architecture doesn't.

Try it

Start with Decompose. See what structure looks like.

```
pip install decompose-mcp

# Decompose any text
from decompose import decompose_text
result = decompose_text(open("contract.md").read())

# Or run as an MCP tool for your AI agent
python -m decompose --serve
```

When you're ready for the full pipeline (parsing, verification, search, audit, script generation), that's Signal.