Decompose is the missing cognitive primitive for AI agents. Text in, classified structured units out. No LLM. No setup. One function call.
Before / After
Real output from the MCP Transport Specification.
attention — 0–10 priority score. Unit 3 scores 4.5 (security risk + directive authority). The overview scores 0.0. Your agent knows where to focus.
risk: security — "attackers", "DNS rebinding", "authentication" trigger security risk detection. The overview and guidelines carry no risk signal.
authority — MUST/MUST NOT = mandatory/prohibitive. SHOULD = directive. Plain prose = informational. Your agent knows what's binding vs. advisory.
actionable — Unit 3 requires action: validate Origin headers, bind to localhost, implement auth. The overview requires nothing.
source — This is real output. Run it yourself: `curl spec.modelcontextprotocol.io | python -m decompose`
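The full output schema isn't reproduced here, but based on the fields described above, a single decomposed unit looks roughly like this (field names and values illustrative, drawn from the labels in this README):

```json
{
  "heading_path": ["MCP Transport Specification", "Security Warning"],
  "text": "Servers MUST validate the Origin header ...",
  "authority": "mandatory",
  "risk": "security",
  "attention": 4.5,
  "actionable": true,
  "irreducible": true
}
```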
What it does
Mandatory, prohibitive, directive, permissive, informational, conditional. Knows the difference between "shall," "should," and "may."
Safety-critical, compliance, financial, contractual, advisory. Each chunk gets scored and labeled by risk category.
Standards, dates, dollar amounts, percentages. Deterministic regex. No hallucinations. No API calls.
Detects content that must be preserved verbatim — legal mandates, threshold values, safety limits. Tells your model what it cannot summarize.
Header-aware Markdown chunking. Sentence-boundary text splitting. Each chunk preserves its heading path and structural context.
Every unit gets an attention score from 0–10. Your agent knows which chunks matter most without reading all of them.
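The classification idea above can be sketched in a few lines. This is not decompose's actual implementation — just a minimal, illustrative version of keyword-based authority detection, regex entity matching, and attention scoring (the keyword lists, multipliers, and score values are assumptions):

```python
import re

# Illustrative sketch only — not decompose's real implementation.
# Authority labels are checked most-specific first so that
# "MUST NOT" is caught before "MUST".
AUTHORITY_KEYWORDS = [
    ("prohibitive", r"\b(MUST NOT|SHALL NOT)\b"),
    ("mandatory",   r"\b(MUST|SHALL)\b"),
    ("directive",   r"\bSHOULD(?: NOT)?\b"),
    ("permissive",  r"\bMAY\b"),
]

# Deterministic entity patterns: dollar amounts and percentages.
MONEY_RE   = re.compile(r"\$\d[\d,]*(?:\.\d+)?")
PERCENT_RE = re.compile(r"\d+(?:\.\d+)?%")

def classify_authority(text: str) -> str:
    for label, pattern in AUTHORITY_KEYWORDS:
        if re.search(pattern, text):
            return label
    return "informational"

def attention_score(text: str) -> float:
    # Stronger authority raises the base score; a detected risk
    # signal (here, money/percent entities) multiplies it.
    base = {"prohibitive": 4.0, "mandatory": 4.0,
            "directive": 2.0, "permissive": 0.5,
            "informational": 0.0}[classify_authority(text)]
    if MONEY_RE.search(text) or PERCENT_RE.search(text):
        base *= 1.5  # illustrative risk multiplier
    return min(base, 10.0)
```

Because everything is keyword and regex driven, the output is deterministic: the same text always gets the same labels, with no model calls involved.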
Install
OpenClaw / ClawHub
Install the skill for any OpenClaw-compatible agent:
MCP Integration
Add one block to your MCP config. Your agent gets two tools: decompose_text and decompose_url.
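The exact invocation isn't shown here, but a typical MCP config block has this shape (the `command`, `args`, and any flags are illustrative — check the project's docs for the real server command):

```json
{
  "mcpServers": {
    "decompose": {
      "command": "python",
      "args": ["-m", "decompose", "--mcp"]
    }
  }
}
```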
Benchmarks
11 documents, 162,107 characters, run on Apple Silicon.
How it trains your model
Decompose doesn't just help agents read — it produces the structured labels that make models smarter over time.
irreducible: true — The financial calculations ($10,000 × 1.06^5 = $13,382.25) contain exact values. A model trained on this label learns: never paraphrase dollar amounts, formulas, or threshold values. This is how you prevent hallucinated numbers.
irreducible: false — The "Why let Claude think?" section is advisory prose. A model trained on this label learns: safe to summarize, reword, or compress. This is how you save tokens without losing meaning.
risk: financial — Decompose detected dollar amounts and investment calculations. A model fine-tuned on these labels learns to flag financial content for human review — even when the surrounding text looks like a tutorial.
attention score — Unit 7 scores 1.5 (financial risk multiplier). Unit 3 scores 0.1 (permissive + informational). When building RAG or curriculum-weighted training, attention tells you which samples to oversample and which to skip.
Use attention scores to weight training samples. High-attention units get oversampled. Informational filler gets downsampled. Your model learns to prioritize what matters.
Each unit is a natural (input, label) pair. Input: the raw text. Labels: authority, risk, actionable, irreducible. Fine-tune a model to classify documents the way decompose does.
Units flagged PRESERVE_VERBATIM teach the model which content it must never paraphrase: exact figures, legal mandates, threshold values.
Instead of stuffing entire documents into context, feed only units above your attention threshold. The model sees 1 unit instead of 10, with metadata explaining why it matters.
heading_path gives your agent document topology without reading the whole thing. Route security units to a safety chain, financial units to an audit chain, informational units to /dev/null.
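Turning units into training data can be sketched as follows. The unit field names (`text`, `authority`, `risk`, `irreducible`, `attention`) are assumptions based on the labels this README describes, and the minimum weight is illustrative:

```python
import random

def to_training_pairs(units):
    """Each unit is a natural (input, label) pair: raw text in,
    decompose's classification labels out.
    NOTE: field names are assumed from this README, not a real schema."""
    return [
        (u["text"], {"authority": u["authority"],
                     "risk": u["risk"],
                     "irreducible": u["irreducible"]})
        for u in units
    ]

def weighted_sample(units, k, rng=random):
    """Oversample high-attention units, downsample filler.
    A small floor weight keeps low-attention units reachable."""
    weights = [max(u["attention"], 0.1) for u in units]
    return rng.choices(units, weights=weights, k=k)
```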
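Context filtering and routing might look like this sketch (same assumed unit fields as above; the chain names are hypothetical):

```python
def select_context(units, threshold=2.0):
    """Feed only units above the attention threshold into context,
    instead of stuffing the whole document in."""
    return [u for u in units if u["attention"] >= threshold]

def route(unit):
    """Route units by risk category. Chain names are hypothetical."""
    if unit["risk"] == "security":
        return "safety_chain"
    if unit["risk"] == "financial":
        return "audit_chain"
    if unit["authority"] == "informational":
        return "discard"
    return "default_chain"
```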
All of this runs locally in ~6ms per document. No API calls. No GPU. No tokens consumed. Structure your data before it ever touches a model.
Your model is only as good as what you feed it. Feed it structure.