Here's a six-section construction specification. Your agent needs to read it and figure out what matters. Go:
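(Abridged here to the lines that matter for the walkthrough below; the full document runs 2,045 characters across six sections.)

```text
SECTION 1: GENERAL
All work shall conform to the contract drawings and to the soils report.

SECTION 2: SCOPE OF WORK
[...]

SECTION 3: STRUCTURAL REQUIREMENTS
Soil bearing capacity: 4,000 psf minimum, per the geotechnical report.

SECTION 4: SAFETY
The contractor shall comply with all applicable OSHA regulations.
Site-specific training must be completed before work begins.
Protective equipment is required in all work areas.

SECTION 5: SCHEDULE
Liquidated damages of $2,500 per calendar day.

SECTION 6: BACKGROUND
The architect of record is Smith & Associates. Similar projects in
the region have typically required 8-12 months.
```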
A raw LLM will read all six sections with equal attention. It will spend the same amount of compute on "The architect of record is Smith & Associates" as it does on "Liquidated damages of $2,500 per calendar day." It will give the background section the same weight as the safety section.
This is the fundamental problem. Your agent reads everything equally because it has no structure to tell it what matters.
What a cognitive primitive is
A CPU doesn't reason about data. It has primitives: ADD, COMPARE, LOAD, STORE. High-level thinking is composed from these low-level operations.
AI agents work the same way. Your orchestration layer — LangChain, CrewAI, raw API calls, whatever — composes from primitives. The usual ones: retrieve, summarize, generate, classify. But there's one missing from the stack.
Decomposition: splitting raw text into classified, prioritized semantic units before reasoning begins.
This isn't summarization. Summarization loses information. It's not retrieval. Retrieval requires a query. It's not classification in the ML sense. There's no model, no training data, no probabilities.
It's structural analysis. The same document goes in. Classified units come out. Every time. In 4 milliseconds.
What decomposition looks like
That six-section spec above? One function call:
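A sketch, assuming a Python API along these lines (the `decompose` name and call shape are assumptions, not the library's confirmed interface):

```python
from decompose import decompose   # hypothetical import path

spec_text = open("spec.txt").read()   # the six-section spec above
units = decompose(spec_text)          # one call: ~4 ms, no network, no model
```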
Here's what your agent sees instead of 2,045 characters of raw text:
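Illustrative output, with the scores discussed below:

```python
for u in units:
    print(u["section"], u["attention"], u["authority"])

# SECTION 4: SAFETY        8.0   mandatory
# SECTION 3: STRUCTURAL    0.6   directive
# ...
# SECTION 6: BACKGROUND    0.1   informational
```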
Section 4 scored 8.0. Section 6 scored 0.1. Your agent now knows which 30% of the document to send to the LLM and which 70% to skip.
Why Safety scored 8.0
Four words triggered it: shall (mandatory authority), must (mandatory), required (mandatory), and comply (compliance risk). The attention formula is straightforward:
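A sketch of its shape, with weights chosen here so the example's numbers work out (the library's actual term lists and constants may differ):

```python
import re

# Hypothetical term lists and weights -- illustrative, not the library's.
MANDATORY = re.compile(r"\b(shall|must|required)\b", re.IGNORECASE)
RISK      = re.compile(r"\b(comply|compliance)\b", re.IGNORECASE)

def attention_score(text: str) -> float:
    mandatory_hits = len(MANDATORY.findall(text))
    risk_hits = len(RISK.findall(text))
    # Safety section: 3 mandatory hits + 1 risk hit -> 2.5*3 + 0.5*1 = 8.0
    return max(0.1, 2.5 * mandatory_hits + 0.5 * risk_hits)
```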
No machine learning. No embeddings. The word "shall" has meant "mandatory" since RFC 2119 was published in 1997. It means the same thing in a building spec, a software spec, and a procurement contract. Regex is the right tool here.
Why Background scored 0.1
No mandatory keywords. No risk indicators. One "permissive" match from generic language. The attention formula bottoms out:
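Continuing the sketch, with the same assumed weights:

```python
attention_score("The architect of record is Smith & Associates.")
# -> 0.1: zero mandatory hits, zero risk hits, so the max() floor holds.
# (The library evidently reads "have typically required 8-12 months"
#  as descriptive rather than mandatory -- context handling this toy
#  regex doesn't attempt.)
```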
"Similar projects in the region have typically required 8-12 months." That's useful context, but it's not an obligation, not a risk, and not something your agent needs to act on. Attention: 0.1.
What else the primitive extracts
Attention is the headline, but each unit carries a full classification:
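The shape, illustrated with the structural-requirements unit (field names are the ones listed under "Try it" below; the exact values and layout here are illustrative):

```python
{
    "text":        "Soil bearing capacity: 4,000 psf minimum, per the geotechnical report.",
    "authority":   "directive",
    "risk":        "low",
    "attention":   0.6,
    "actionable":  True,
    "irreducible": True,           # PRESERVE_VERBATIM: "4,000 psf" must not be paraphrased
    "entities":    ["4,000 psf"],  # flagged as an engineering value
    "dates":       [],
    "financial":   [],
}
```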
Notice the structural requirements unit: low attention score (0.6) but marked PRESERVE_VERBATIM. Your agent might skip this unit for initial triage, but if it ever summarizes the document, it knows: do not paraphrase "4,000 psf." That number is irreducible. Getting it wrong kills people.
How this changes agent architecture
Without a decomposition primitive, your agent pipeline looks like this:
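```text
raw document (2,045 chars) ----------------------> LLM --> output
                            every section, every
                            token, weighted equally
```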
With decomposition:
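```text
raw document --> decompose() --> classified, scored units
                 |-- attention >= threshold --> LLM
                 |-- PRESERVE_VERBATIM units --> passed through untouched
                 '-- low-attention units ------> skipped
```

Both flows are sketches of the shape, not exact wiring; the threshold and the routing are yours to choose.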
The difference isn't just efficiency. It's correctness. Three things change:
1. Attention allocation
Instead of the LLM deciding what to focus on (which it does poorly for technical documents), the attention scores pre-allocate compute. Safety-critical content gets processed. Background doesn't. Your agent acts like an experienced engineer who knows which sections of a spec to read first.
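In code, the pre-allocation is a single filter (the threshold is yours to tune):

```python
high_signal = [u for u in units if u["attention"] >= 1.0]   # Safety makes the cut
skipped     = [u for u in units if u["attention"] <  1.0]   # Background doesn't
```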
2. Irreducibility awareness
When a unit is marked PRESERVE_VERBATIM, your agent knows not to summarize or paraphrase it. "4,000 psf" stays "4,000 psf" — it doesn't become "about 4,000 psf" or "approximately 4 kips per square foot." This is the difference between a useful tool and a liability.
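A sketch of summarization that respects the flag (`llm_summarize` is a placeholder for your model call):

```python
parts = []
for u in units:
    if u["irreducible"]:
        parts.append(u["text"])                  # verbatim: "4,000 psf" survives intact
    else:
        parts.append(llm_summarize(u["text"]))   # safe to compress
summary = "\n".join(parts)
```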
3. Entity-aware routing
Decompose extracts every standards reference (ASTM, ASCE, IBC, OSHA), every date, every dollar amount. Your agent can route by entity type: send OSHA references to a safety compliance chain, send financial values to a payment audit chain, send ACI references to a structural analysis chain. No embedding similarity search required. Just pattern matching on structured metadata.
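Sketched, with placeholder chain names:

```python
for u in units:
    entities = u["entities"]
    if any(e.startswith("OSHA") for e in entities):
        safety_chain.run(u)        # safety compliance review
    if any(e.startswith(("ACI", "ASCE")) for e in entities):
        structural_chain.run(u)    # structural analysis
    if u["financial"]:
        payment_chain.run(u)       # e.g. the $2,500/day liquidated damages
```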
What this is not
Decompose is not a replacement for LLMs. It's a preprocessor. Some things it explicitly cannot do:
- Nuance. "The contractor should consider" and "The contractor should comply" both classify as directive. An LLM knows these have different implications.
- Cross-reference. If section 3 says "per the geotechnical report" and section 1 references a different report, Decompose doesn't catch the conflict. That's what your model is for.
- Intent. Decompose classifies structure, not meaning. It knows "shall" means mandatory. It doesn't know what the sentence is trying to accomplish.
- Domain reasoning. "4,000 psf" is flagged as an engineering value. Decompose doesn't know whether 4,000 psf is reasonable for this soil type. A domain expert (human or LLM) does.
This is the point. Your LLM handles nuance, cross-referencing, intent, and domain reasoning. Decompose handles everything else — the mechanical work of splitting, classifying, scoring, and extracting — so the LLM can focus on what it's actually good at.
Why it has to be deterministic
Imagine an agent processing safety compliance documents. Run it Monday, it flags section 4 as safety-critical. Run it Wednesday, it classifies the same section as "advisory" because the LLM was slightly more creative that day.
That's not a tool. That's a coin flip.
Decompose returns the same output for the same input. Every time. There's no temperature knob, no sampling variance, no model drift. If "shall" matched mandatory yesterday, it matches mandatory today. If section 4 scored 8.0 in production, it scores 8.0 in your test suite. You can write assertions against it. You can audit it.
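A sketch of what that buys you in a test suite (pytest-style; `SPEC_TEXT` stands in for the spec at the top of this post):

```python
def test_safety_section_stays_critical():
    units = decompose(SPEC_TEXT)
    safety = next(u for u in units if "SAFETY" in u["section"])
    assert safety["attention"] == 8.0        # exact equality -- no tolerance needed
    assert decompose(SPEC_TEXT) == units     # same input, same output, every run
```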
For industries where documents have legal weight — construction, healthcare, defense, finance — determinism isn't a nice-to-have. It's table stakes.
The primitive pattern
The broader point: agents need more primitives, not more parameters.
The current trajectory of AI tooling is: make the model bigger, give it more context, add more retrieval. This works, until it doesn't. A 200K context window doesn't help if 70% of the tokens are background noise. RAG doesn't help if the embeddings can't distinguish a mandatory requirement from an informational note.
Primitives help because they operate at a different layer. They don't compete with the LLM — they feed it. A well-decomposed document makes every model work better: smaller context, higher signal, structured metadata to route on.
Decompose is one primitive. There should be more. Entity resolution, temporal ordering, reference graph construction, obligation tracking — these are all deterministic operations that agents currently outsource to the model. Every one of them could be a regex-and-heuristics library that runs in milliseconds and gives the LLM structured inputs instead of raw text.
We're building these at Echology. Decompose is the first one we open-sourced.
Try it
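A minimal run; the package and function names here are assumptions, so check the repo for the published interface:

```python
from decompose import decompose   # hypothetical import path

with open("spec.txt") as f:
    units = decompose(f.read())

# Highest-attention units first: the spec's own triage order.
for u in sorted(units, key=lambda u: u["attention"], reverse=True):
    print(f"{u['attention']:>5}  {u['section']}")
```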
Every unit comes with authority, risk, attention, actionable, irreducible, entities, dates, financial. No API key. No setup. Runs offline.
Your LLM handles reasoning. Let something else handle reading.