Where Decompose Fits
Decompose is not the whole product. It is the structural step that makes everything else possible.
People find Decompose and think they've found the product. They haven't. Decompose is one step in a much larger system. It's the open-source cognitive primitive, the structural step that happens before reasoning begins.
The full product is Signal. This post explains how they connect, what each piece does, and why we open-sourced the part we did.
The pipeline
Every document that enters Signal flows through five stages. Decompose is Stage 4.
Stages 1–3 prepare the document. Stage 4 gives it structure. Stage 5 makes it searchable. Decompose is the hinge: it's where raw text becomes classified, scored semantic units that everything downstream can reason about.
What Decompose does (and doesn't do)
Decompose is a standalone library. You can pip install decompose-mcp right now and use it without Signal, without Echology, without anything else. It does exactly one thing:
Text in, classified semantic units out. Authority, risk, attention, entities. No LLM. Deterministic.
That's it. It doesn't parse PDFs. It doesn't run OCR. It doesn't search a vector database. It doesn't talk to an LLM. It takes text and returns structure.
Decompose does
- Split text into semantic units
- Classify authority (mandatory, directive, permissive)
- Score risk (safety, financial, compliance)
- Calculate attention (0–10 priority)
- Extract entities (standards, dates, dollars)
- Flag irreducible content
- Run in ~14ms, deterministically
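The deterministic, no-LLM classification above can be sketched in plain Python. The keyword-driven authority tiers, the additive attention scoring, and the `Unit` shape below are illustrative assumptions for this post, not Decompose's actual rules or schema:

```python
import re
from dataclasses import dataclass

@dataclass
class Unit:
    text: str
    authority: str   # "mandatory" | "directive" | "permissive" | "informational"
    attention: int   # 0-10 priority

# First matching tier wins; patterns are deterministic, so the same
# input always produces the same classification.
AUTHORITY_RULES = [
    ("mandatory",  re.compile(r"\b(shall|must|required)\b", re.I)),
    ("directive",  re.compile(r"\b(should|recommended)\b", re.I)),
    ("permissive", re.compile(r"\b(may|can|optional)\b", re.I)),
]

def classify(sentence: str) -> Unit:
    authority = "informational"
    for level, pattern in AUTHORITY_RULES:
        if pattern.search(sentence):
            authority = level
            break
    # Attention: base score from authority, bumped by risk keywords.
    base = {"mandatory": 7, "directive": 5, "permissive": 3, "informational": 1}[authority]
    if re.search(r"\b(safety|hazard|failure)\b", sentence, re.I):
        base = min(10, base + 3)
    return Unit(sentence, authority, base)

units = [classify(s) for s in [
    "Anchor bolts shall conform to ASTM F1554.",
    "Contractor may substitute equivalent materials.",
]]
```

Because there's no model in the loop, the output is reproducible and fast enough to run inline on every document.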
Signal adds
- Parse 15+ file formats (PDF, DOCX, DXF, images)
- AI-powered document classification via local LLM
- Standards cross-referencing by jurisdiction
- PII detection and redaction
- Timeline and financial analysis plugins
- Vector search across your entire archive
- Verification, audit trails, and certificates
- CAD/BIM script generation
- Workflow orchestration (Temporal)
- Full API server (FastAPI)
Decompose is the cognitive primitive. Signal is the intelligence platform built on top of it.
The three engines
Signal has three engines. Each one handles a different phase of document intelligence. Decompose lives inside the first one.
Vanta is the ingestion engine. It reads documents, classifies them, runs enrichment plugins, and calls Decompose to break them into semantic units. Parse → Classify → Enrich → Decompose → Index. That's the Vanta pipeline.
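The five-stage Vanta flow is easiest to picture as plain function composition over a document state. The stage bodies below are placeholders for illustration only, not Vanta's real interfaces:

```python
from typing import Callable

# Hypothetical stage signatures: each stage takes the document state
# dict and returns an enriched copy. Names mirror the pipeline order.
def parse(doc):     return {**doc, "text": doc["raw"].decode()}
def classify(doc):  return {**doc, "doc_type": "specification"}
def enrich(doc):    return {**doc, "metadata": {"pages": 1}}
def decompose(doc): return {**doc, "units": doc["text"].split(". ")}
def index(doc):     return {**doc, "indexed": True}

PIPELINE: list[Callable] = [parse, classify, enrich, decompose, index]

def run(raw: bytes) -> dict:
    doc = {"raw": raw}
    for stage in PIPELINE:
        doc = stage(doc)
    return doc

result = run(b"Bolts shall be galvanized. Welds may be field-inspected.")
```

The point of the sketch is the ordering: decomposition happens after parsing and enrichment but before indexing, so the index only ever sees structured units.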
Aletheia is the verification engine. It checks decomposed units against jurisdiction-specific standards, validates cross-references, runs consistency checks, and issues audit certificates. When Signal says a document passes, Aletheia is why.
Daedalus is the retrieval and action engine. It searches the indexed archive, finds patterns across documents, and generates outputs, including Civil3D and Revit scripts. When an engineer asks "show me every foundation spec that references ACI 318," Daedalus answers.
Why we open-sourced this part
Three reasons.
1. It teaches the architecture. If you understand what Decompose does (splitting text into classified units with authority, risk, and attention scores), you understand the fundamental design pattern behind all of Signal. Structure before reasoning. Classification before generation. Deterministic preprocessing before probabilistic inference.
2. It's useful on its own. You don't need the full Signal platform to benefit from decomposition. If you're building a RAG pipeline, an AI agent, or any system that processes documents, Decompose makes your model work better by giving it structure before it starts thinking.
3. It's the on-ramp. Teams that start with Decompose learn the vocabulary: authority levels, risk categories, attention scores, irreducibility. When they're ready for the full platform (parsing, verification, search, audit), they already speak the language.
The picture
Here's the simplest way to think about it:
Decompose is the cognitive primitive. Signal is what happens when you take that primitive and build an enterprise document intelligence platform around it. Vanta, Aletheia, and Daedalus are the three engines that make that platform work.
Everything runs locally. No cloud. No data leaves your building. That's not a limitation. For the industries we serve, it's the requirement.
Where it fits in AEC
AEC firms deal with documents that have legal weight. Specifications, contracts, submittals, RFIs: these aren't articles, they're obligations. Decompose turns every one of them into scored, classified units that downstream systems can reason about.
Specification Review
A 200-page structural specification comes in. Decompose finds the 40 mandatory clauses, flags the 12 safety-critical requirements, extracts every ASTM and ACI reference, and scores each section by attention. Your engineer reads 30 pages instead of 200.
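The triage step reduces to a one-line filter once the units are classified. The record shape and thresholds below are illustrative assumptions, not Decompose's actual output schema:

```python
# Illustrative unit records, shaped the way Decompose's output might be.
units = [
    {"text": "Concrete shall reach 4000 psi at 28 days.",  "authority": "mandatory",     "attention": 8},
    {"text": "Refer to the general conditions.",           "authority": "informational", "attention": 2},
    {"text": "Formwork should be inspected before pours.", "authority": "directive",     "attention": 5},
]

# Triage: read only mandatory clauses and anything scored 7 or above.
must_read = [u for u in units if u["authority"] == "mandatory" or u["attention"] >= 7]
```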
Contract Analysis
Liquidated damages, retainage, indemnification, insurance minimums: all buried in boilerplate. Decompose surfaces every financial obligation and flags irreducible clauses that cannot be paraphrased. Your PM sees the money on day one.
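Surfacing the money can be as simple as pattern extraction over each clause. These two regexes are a minimal sketch; Decompose's own entity extractors may be more sophisticated:

```python
import re

clause = ("Liquidated damages of $2,500 per calendar day apply; "
          "retainage of 10% is withheld until substantial completion.")

# Simple entity patterns for dollar amounts and percentages.
dollars  = re.findall(r"\$[\d,]+(?:\.\d{2})?", clause)
percents = re.findall(r"\d+(?:\.\d+)?%", clause)
```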
Submittal Processing
Product data sheets, shop drawings, material certs. Decompose classifies which sections carry mandatory compliance requirements vs. informational marketing content. Route the compliance sections to your engineer. Skip the rest.
RFI Triage
An RFI references three specification sections and asks about a substitution. Decompose breaks the referenced sections into units, scores them, and surfaces the mandatory constraints the substitution needs to satisfy. Your response is grounded, not guessed.
Standards Compliance
Feed your internal standards into Decompose. Feed incoming documents into Decompose. Compare the structured units. Every gap between "what we require" and "what they submitted" becomes visible and auditable.
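Once both sides are structured, gap analysis reduces to a set difference over extracted standards references. The extraction shape here is an illustrative assumption:

```python
# Standards references extracted from each side of the comparison.
required  = {"ASTM A615", "ACI 318", "ASTM C150"}   # from internal standards
submitted = {"ASTM A615", "ACI 318"}                 # from the incoming document

# Everything we require that the submission never addresses.
gaps = required - submitted
```

Because the gap is computed from classified units rather than free text, every missing reference is traceable back to the clause that required it.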
Model Fine-Tuning
Every decomposed unit is a labeled training sample: authority, risk, attention, irreducibility. Build a corpus of AEC documents, decompose all of them, and you have structured training data for domain-specific AI. No manual labeling.
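Exporting the corpus is a straight serialization pass, since each unit already carries its labels. Field names below are illustrative, not Decompose's exact schema:

```python
import json

# Each decomposed unit doubles as a labeled training sample.
units = [
    {"text": "Rebar shall be epoxy-coated.", "authority": "mandatory",
     "risk": "safety", "attention": 8, "irreducible": False},
    {"text": "Submit samples for approval.", "authority": "directive",
     "risk": "compliance", "attention": 5, "irreducible": False},
]

# JSON Lines: one labeled sample per line, ready for fine-tuning pipelines.
jsonl = "\n".join(json.dumps(u) for u in units)
```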
Where it fits beyond AEC
Decompose was built for AEC, but the architecture is universal. Any industry that processes documents with obligations, risk, and compliance requirements can use it. Here are the workflows we're seeing.
Legal
Law firms process contracts, regulations, case filings, and compliance documentation. Decompose classifies authority levels ("shall" vs. "should" vs. "may"), extracts every obligation and deadline, and flags clauses that must be preserved verbatim. Workflow: Intake document → Decompose → route mandatory clauses to attorney review, skip informational sections → generate obligation summary with audit trail.
Insurance
Underwriters read policies, endorsements, and claim documents all day. Decompose surfaces every coverage limit, exclusion, and conditional clause with attention scores. Workflow: Policy document → Decompose → extract financial terms and conditions → flag exclusions and limitations → compare against claim submissions for coverage gaps.
Healthcare & Life Sciences
Clinical protocols, regulatory submissions, and compliance documentation carry safety-critical requirements. Decompose identifies mandatory procedures, flags dosage and threshold values as irreducible, and scores risk by section. Workflow: Protocol document → Decompose → flag safety-critical sections → preserve verbatim dosages → route to compliance review.
Government & Defense
Federal acquisitions, SOWs, and regulatory guidance follow strict authority language (FAR, DFARS, NIST). Decompose classifies every requirement by authority level and extracts standards references. Workflow: RFP or SOW → Decompose → extract every "shall" requirement → map to compliance checklist → identify gaps in proposal response.
Financial Services
Loan documents, regulatory filings, and audit reports contain obligations buried in dense prose. Decompose extracts every dollar amount, percentage, and deadline, flags financial risk sections, and scores attention. Workflow: Regulatory filing → Decompose → extract financial entities → flag compliance obligations → feed structured data to risk models.
Any RAG Pipeline
If you're building retrieval-augmented generation for any domain, Decompose is the preprocessing step that makes it work better. Instead of embedding raw text chunks, embed classified semantic units. Your vector search returns mandatory clauses and safety requirements instead of random paragraphs.
The pattern is always the same: decompose first, route by structure, protect what's irreducible, and only send high-attention content to the model. The industry changes. The architecture doesn't.
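That routing pattern fits in a few lines once units are classified. The thresholds and field names are illustrative assumptions, not part of Decompose:

```python
def route(unit: dict) -> str:
    """Route a classified unit: protect irreducible content, escalate
    mandatory or high-attention units to the model, archive the rest."""
    if unit.get("irreducible"):
        return "preserve-verbatim"
    if unit["authority"] == "mandatory" or unit["attention"] >= 7:
        return "send-to-model"
    return "archive"

triaged = {u["text"]: route(u) for u in [
    {"text": "dosage", "authority": "mandatory",     "attention": 9, "irreducible": True},
    {"text": "spec",   "authority": "mandatory",     "attention": 8, "irreducible": False},
    {"text": "intro",  "authority": "informational", "attention": 1, "irreducible": False},
]}
```

Swapping industries means swapping the classification vocabulary feeding this function, not the function itself.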
Try it
Start with Decompose. See what structure looks like.
When you're ready for the full pipeline (parsing, verification, search, audit, script generation), that's Signal.