How we measure
What the reference task behind the headline numbers actually is, what was compared, and why these are reference figures rather than per-session guarantees.
The short version: the headline numbers on this site come from one reference task. We ran a knowledge-heavy edit on a real codebase twice, once with Pathrule delivering the team's context and once without it, and recorded what the assistant did each time. This page tells you what that task was, what was compared, and where the numbers do and do not apply.
We would rather you understand one honest comparison than trust a round number with no story behind it.
The numbers we cite
These figures appear across the site, including on the context layer explainer and in the machine-readable summary at /llms.txt. They all trace to the same comparison.
| Metric | With Pathrule | Without Pathrule | Effect |
|---|---|---|---|
| Input tokens | a small fraction | a large context window | about 85% fewer |
| Wall-clock time | seconds | minutes | 5 to 8 times faster |
| Tool calls | a few | many | about 5 times fewer |
| Files read | one | several to many | about 10 times fewer |
| Cost per task | low | higher | about 80% lower |
The token and cost reductions are linked: fewer files read and fewer tool calls mean a smaller input window, and cost mostly tracks the size of that window.
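As a rough illustration of that link, the sketch below prices two runs under assumed numbers. The token counts, the per-million-token prices, and the helper names `task_cost` and `percent_reduction` are all hypothetical; none of them come from the reference task. It only shows why an 85% cut in input tokens can land near, but not exactly at, an 80% cut in cost when the output stays the same size.

```python
# Illustrative only: token counts and prices are assumptions,
# not figures recorded from the reference task.

def task_cost(input_tokens: int, output_tokens: int,
              price_in_per_m: float, price_out_per_m: float) -> float:
    """Dollar cost of one task, with prices quoted per million tokens."""
    return (input_tokens * price_in_per_m
            + output_tokens * price_out_per_m) / 1_000_000


def percent_reduction(baseline: float, reduced: float) -> float:
    """How much smaller `reduced` is than `baseline`, as a percentage."""
    return 100.0 * (baseline - reduced) / baseline


PRICE_IN, PRICE_OUT = 3.0, 15.0  # hypothetical dollars per million tokens
OUTPUT_TOKENS = 1_000            # assume the written change is the same size in both runs

without_cost = task_cost(100_000, OUTPUT_TOKENS, PRICE_IN, PRICE_OUT)
with_cost = task_cost(15_000, OUTPUT_TOKENS, PRICE_IN, PRICE_OUT)

token_cut = percent_reduction(100_000, 15_000)
cost_cut = percent_reduction(without_cost, with_cost)
print(f"input tokens: {token_cut:.0f}% fewer, cost: {cost_cut:.0f}% lower")
# prints "input tokens: 85% fewer, cost: 81% lower"
```

Under these assumptions the cost falls a little less than the input tokens do, because the output tokens are paid for in both runs. Different prices or a larger output would move the gap.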
What the reference task is
A developer asks an AI coding assistant to make a change to an existing codebase. The change is small in lines but heavy in context: the right answer depends on a decision the team made earlier that is not written in the source files.
The concrete shape is the coupon example we use elsewhere on the site. On this codebase a discount attaches to the line item, never to the order total. One engineer learned that the hard way; it is a team convention, not something the code states. A capable assistant reading the repo cannot derive it, because it is not there to derive.
That is the whole point of the task. It is chosen to expose the gap between what a code scan can find and what a team already knows.
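To make the convention concrete, here is a minimal sketch of what "attaches to the line item, never to the order total" could look like in code. The types `LineItem` and `Order`, the function `apply_coupon`, and the sample prices are hypothetical; the reference codebase's actual schema is not shown on this page. The comments are annotations for the reader, and the premise of the task is that the real repo carries no such explanation.

```python
from dataclasses import dataclass, field
from decimal import Decimal

# Hypothetical types for illustration; not the reference codebase's schema.


@dataclass
class LineItem:
    sku: str
    unit_price: Decimal
    quantity: int
    discount: Decimal = Decimal("0")  # the convention: discounts live on the line item

    def total(self) -> Decimal:
        return self.unit_price * self.quantity - self.discount


@dataclass
class Order:
    items: list[LineItem] = field(default_factory=list)
    # Note for the reader: there is no order-level discount field here.
    # In a real repo nothing would explain why, which is the gap the task exposes.

    def total(self) -> Decimal:
        return sum((item.total() for item in self.items), Decimal("0"))


def apply_coupon(order: Order, sku: str, amount: Decimal) -> None:
    """Attach the coupon amount to the matching line item."""
    for item in order.items:
        if item.sku == sku:
            item.discount += amount
            return
    raise ValueError(f"no line item with sku {sku!r}")


order = Order([
    LineItem("tee", Decimal("20.00"), 2),
    LineItem("mug", Decimal("8.00"), 1),
])
apply_coupon(order, "tee", Decimal("5.00"))
print(order.total())  # prints 43.00
```

An assistant that has only this file to go on could just as plausibly add a `discount` field to `Order` and subtract it in `Order.total()`. That change would run, and it would still be the wrong one for this team.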
What was compared
Two runs of the same task, same assistant, same prompt.
- Without Pathrule. The assistant opens the relevant module, reads the schema, follows imports into related code, and reasons its way toward a plausible change. It often lands on the wrong attachment point, because the convention it needs is not in any file it can read. A reviewer catches the mistake later.
- With Pathrule. The path-scoped slice of the team's memories and rules arrives at hook time, before the first tool call. The convention is already in context, so the assistant makes the correct change on the first pass and reads far fewer files to get there.
The difference is not a smarter assistant; it is the same assistant in both runs. One run started from what the team had already learned instead of rediscovering it from scratch. For how that delivery works, see How hooks work.
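The idea of a path-scoped slice can be pictured with a toy example. Everything in it is an assumption made for illustration: the `Note` type, the `slice_for_path` and `build_context` functions, the directory names, and the simple ancestor-directory match. It is not Pathrule's API and says nothing about how the product selects or ranks knowledge; it only shows notes tied to a directory being placed ahead of the prompt before any file is read.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical sketch: names, fields, and the matching rule are assumptions,
# not Pathrule's API or selection logic.


@dataclass(frozen=True)
class Note:
    scope: str  # directory the note applies to, e.g. "src/checkout"
    text: str


def slice_for_path(notes: list[Note], target: str) -> list[str]:
    """Return the text of notes whose scope is the target path or an ancestor of it."""
    target_path = PurePosixPath(target)
    return [
        note.text
        for note in notes
        if target_path.is_relative_to(PurePosixPath(note.scope))
    ]


def build_context(notes: list[Note], target: str, prompt: str) -> str:
    """Place the matching notes ahead of the prompt, before any tool call happens."""
    return "\n".join([*slice_for_path(notes, target), prompt])


notes = [
    Note("src/checkout", "Discounts attach to the line item, never to the order total."),
    Note("src/billing", "Placeholder note for an unrelated directory."),
]
context = build_context(notes, "src/checkout/coupons.py", "Add coupon support.")
print(context)
```

Here only the `src/checkout` note reaches the assistant, so the convention is in context from the first token and the unrelated note costs nothing.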
What the numbers do not mean
These are reference figures from one comparison. They are not a per-session guarantee, and we do not present them as an average across all work.
- They are task-dependent. The reference task is deliberately knowledge-heavy. A change where the answer is already in the file you have open will not show an 85% token reduction, because there was little context to save in the first place.
- They are codebase-dependent. The savings scale with how much tribal knowledge your team has captured and how much the assistant would otherwise have to scan to re-derive it.
- They are model-dependent and tool-dependent. Different assistants read files and call tools at different rates, so the absolute numbers move with the tool you run.
- They describe input, not output. The reductions are about the context the assistant has to take in, not about the size of the change it writes.
If a single number has to stand in for the set, we use "about 85% fewer input tokens" and pair it with "on a reference task". We use that phrasing deliberately and consistently.
Why a code scan does not close the gap
The slow run is not a tooling failure. The assistant did the reasonable thing: it read the code and reasoned about it. The miss is structural. The fact it needed is a team decision, not a property of the source. No amount of additional scanning surfaces a decision that the files do not record.
This is the part the numbers are really measuring: the cost of rediscovering, every session, knowledge the team already has. Pathrule carries that knowledge to the path where it applies so the assistant starts from it. See How knowledge compounds for what that looks like over many sessions.
What we will and will not publish
We publish what we measured and why it matters: the task, the comparison, the observed figures, and their limits.
We do not publish how the system decides which knowledge to surface. The ranking that picks the path-scoped slice is part of the product, not part of this page. If you want to understand the behavior rather than the internals, How retrieval finds the right knowledge describes what you can observe.
When you run your own knowledge-heavy task with and without Pathrule, you should expect the same shape of result: fewer files read, fewer tool calls, a smaller input window, and a change that reflects what your team already knows. The exact multiples will be yours, not ours.
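If you record the same four quantities for your own pair of runs, the comparison is simple arithmetic. The sketch below is one way to compute it. The `RunStats` type, the `compare` function, and the sample values are placeholders, not measurements from the reference task, and the function assumes every recorded count in both runs is greater than zero.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    input_tokens: int
    wall_seconds: float
    tool_calls: int
    files_read: int


def compare(without: RunStats, with_ctx: RunStats) -> dict[str, float]:
    """Express the run with context relative to the run without it.

    Assumes all fields are nonzero in both runs.
    """
    return {
        "input_tokens_pct_fewer":
            100.0 * (without.input_tokens - with_ctx.input_tokens) / without.input_tokens,
        "times_faster": without.wall_seconds / with_ctx.wall_seconds,
        "times_fewer_tool_calls": without.tool_calls / with_ctx.tool_calls,
        "times_fewer_files_read": without.files_read / with_ctx.files_read,
    }


# Placeholder values; substitute the numbers from your own two runs.
baseline = RunStats(input_tokens=60_000, wall_seconds=120.0, tool_calls=20, files_read=8)
with_context = RunStats(input_tokens=12_000, wall_seconds=30.0, tool_calls=5, files_read=2)

for name, value in compare(baseline, with_context).items():
    print(f"{name}: {value:.1f}")
```

Keeping the assistant, the prompt, and the task fixed between the two runs matters more than the arithmetic: if any of those change, the ratios no longer isolate the effect of the delivered context.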