Has your GEO agency told you about TurboQuant?

99% of GEO agencies are unaware of this compression algorithm.

In March 2026, Google publicly released TurboQuant, a compression algorithm that changed how AI retrieves and processes information. Shares of publicly traded RAM makers dropped 5–10% on release day. The GEO industry has barely noticed. WLDM's TurboQuant delivery workflow was already in place, with its simulators in QA.

5–10%

Drop in publicly traded RAM company stocks on TurboQuant’s release, March 2026

80% → 30%

Share of AI Overview answers sourced from the classical top-10, before and after TurboQuant

45%

Share of Google searches now answered by AI Overviews

WHAT CHANGED

Google stopped retrieving documents. It started computing entities.

Until TurboQuant, AI search worked like this: a buyer types a question, the model finds the most relevant document and summarises it. That process ended. The model now extracts the entities from a query and computes an answer directly, creating new answers from structured facts rather than retrieving and regurgitating.

If your site is built around keywords and long content, it is still optimised for the old behaviour. The model is running a different process now.

For technical teams

How TurboQuant compresses the model's working memory

When an AI model processes a sequence of tokens, it stores a Key vector and a Value vector for every token already processed, in every attention layer. This structure is called the KV cache, the model's working memory. The KV cache is also the biggest cost driver in long-context inference, because its memory grows with every token of context. Two structural problems made long contexts increasingly costly: attention compute is quadratic in sequence length (O(N²)), so doubling context roughly quadruples attention cost; and even with large context windows, models attend unevenly, favouring the start and end and skimming the middle. Brute-force context expansion was a dead end. Compression was the only viable path.
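To make the cost argument concrete, here is a back-of-envelope sketch. The model dimensions (32 layers, 8 KV heads, head size 128, 16-bit values) are hypothetical placeholders, not figures for any Google model; the point is the shape of the growth: cache memory scales linearly with context, attention work quadratically.

```python
# Back-of-envelope KV cache sizing. All model dimensions are hypothetical
# placeholders, not the configuration of any real production model.

def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_value: float) -> float:
    """Memory for one sequence's KV cache: one Key and one Value vector
    per token, per layer, per KV head."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return seq_len * per_token

def attention_score_ops(seq_len: int) -> int:
    """Query-key dot products in full self-attention: every token against
    every token, so the count grows as N squared."""
    return seq_len * seq_len

model = dict(n_layers=32, n_kv_heads=8, head_dim=128)
fp16 = kv_cache_bytes(100_000, bytes_per_value=2, **model)
print(f"KV cache at 100K tokens, 16-bit values: {fp16 / 2**30:.1f} GiB")

# Doubling the context doubles cache memory but quadruples attention work.
print(kv_cache_bytes(200_000, bytes_per_value=2, **model) / fp16)
print(attention_score_ops(200_000) / attention_score_ops(100_000))
```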

TurboQuant solves this in two stages. PolarQuant (stage 1) re-encodes the cache vectors in polar coordinates: a radius and angles instead of Cartesian coordinates. For attention computation, direction matters more than magnitude, and angles cluster predictably across tokens, so they compress cheaply onto a small fixed codebook. QJL (stage 2, Quantized Johnson-Lindenstrauss) adds one extra bit per dimension that corrects the small bias left by stage 1, making the inner-product estimates the model uses for attention scores provably unbiased: mathematically guaranteed not to drift on average. Strictly, this is not lossless compression, since any single estimate carries a small random error; what the guarantee gives is that, on average, the compressed estimate matches the uncompressed value exactly.
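The two-stage idea can be illustrated with a toy NumPy sketch. It is a simplification under stated assumptions, not Google's implementation: stage 1 here snaps the angle of each coordinate pair to a uniform 3-bit grid (the actual PolarQuant transform is more elaborate), and stage 2 keeps only the sign of a Gaussian random projection of the leftover residual, QJL-style, one bit per dimension. All function names are hypothetical. Averaging many estimates shows the unbiasedness property: each single estimate is noisy, but the mean approaches the exact dot product.

```python
import numpy as np

def polar_quantize_pairs(x: np.ndarray, angle_bits: int = 3) -> np.ndarray:
    """Simplified stage 1: view coordinates as (a, b) pairs, keep each pair's
    radius, snap its angle to a fixed grid of 2**angle_bits levels, decode."""
    pairs = x.reshape(-1, 2)
    radius = np.hypot(pairs[:, 0], pairs[:, 1])
    angle = np.arctan2(pairs[:, 1], pairs[:, 0])
    step = 2 * np.pi / 2 ** angle_bits
    angle_q = np.round(angle / step) * step          # data-independent grid
    decoded = np.stack([radius * np.cos(angle_q), radius * np.sin(angle_q)], axis=1)
    return decoded.reshape(-1)

def qjl_encode(residual: np.ndarray, S: np.ndarray):
    """Stage 2: one sign bit per row of the random projection S, plus the residual norm."""
    return np.sign(S @ residual), np.linalg.norm(residual)

def qjl_inner_product(q: np.ndarray, signs: np.ndarray, norm: float, S: np.ndarray) -> float:
    """Unbiased estimate of <q, residual> when S has i.i.d. standard normal entries."""
    m = S.shape[0]
    return float(np.sqrt(np.pi / 2) / m * norm * ((S @ q) @ signs))

def estimate_score(q: np.ndarray, k: np.ndarray, S: np.ndarray) -> float:
    """Attention-style score <q, k> computed from the two-stage compressed key."""
    k_hat = polar_quantize_pairs(k)
    signs, norm = qjl_encode(k - k_hat, S)
    return float(q @ k_hat) + qjl_inner_product(q, signs, norm, S)

rng = np.random.default_rng(0)
d = 64                                   # even, so coordinates pair up
q, k = rng.standard_normal(d), rng.standard_normal(d)
estimates = [estimate_score(q, k, rng.standard_normal((d, d))) for _ in range(2000)]
print("exact:", float(q @ k), "mean of estimates:", float(np.mean(estimates)))
```

Keeping the radius exact in stage 1 means the only stage-1 error is directional, which is the error stage 2 then corrects in expectation.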

Metric	Result	Source
KV cache footprint	4.5× smaller	Google Research
Attention speed (H100)	8× faster (4-bit vs 32-bit)	Google Research
Evidence per query	100K → 450K tokens	Google Research
Candidate sources per query	5 → 20	Google Research
Retraining required	None — drop-in on existing models	Google Research
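The 100K → 450K evidence row appears to be the 4.5× footprint reduction applied to an unchanged memory budget. A minimal sketch of that arithmetic, assuming a fixed per-query budget and a placeholder uncompressed size per token (both hypothetical):

```python
BASELINE_TOKENS = 100_000      # evidence per query before, from the table
COMPRESSION_RATIO = 4.5        # KV cache footprint reduction, from the table

def tokens_that_fit(budget_bytes: float, bytes_per_token: float) -> int:
    """How many tokens' worth of KV cache fit in a fixed memory budget."""
    return round(budget_bytes / bytes_per_token)

bytes_per_token = 131_072.0                  # hypothetical uncompressed size
budget = BASELINE_TOKENS * bytes_per_token   # budget sized for 100K tokens
compressed = bytes_per_token / COMPRESSION_RATIO

print(tokens_that_fit(budget, bytes_per_token))
print(tokens_that_fit(budget, compressed))
```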

The proof?

The mechanism is Google's. The numbers are documented.

TurboQuant is a published Google Research algorithm. The compression figures come from Google Research's own publication. The case for building computation-grade entity coverage rests on those numbers, not on WLDM's claims.

4.5×

smaller working memory footprint per query (Google Research)

8×

faster computation on current production hardware (Google Research)

+29.6%

retrieval accuracy from structured entity pages vs. HTML content (WordLift, arXiv:2603.10700)

"It's no longer retrieving and regurgitating. It's computing entities, creating new answers."

Brie Moreau, Founder, WLDM

For technical teams

The Quantization Landscape — why TurboQuant is a breakthrough, not an increment

Prior compression algorithms required trade-offs that made them unsafe for production retrieval pipelines. TurboQuant is the first algorithm to combine all four properties required for reliable, high-performance deployment at web scale.

Algorithm	Unbiased	Codebook-free	GPU-native	Data-oblivious
PQ / OPQ	no	no	no	no
ScaNN	no	no	partial	no
RaBitQ	no	no	yes	no
TurboQuant	yes	yes	yes	yes

Unbiased: attention-score estimates are guaranteed not to drift on average. Codebook-free: no per-dataset codebook training or calibration, so it deploys without retraining. GPU-native: runs at hardware speed. Data-oblivious: no prior exposure to the data distribution required. Source: Google Research blog.

What the structure research shows

Three independent findings from WordLift (Andrea Volpini) establish the empirical case for structured entity pages over prose content. These are WordLift's findings, not WLDM's.

• +29.6% AI retrieval accuracy from dedicated entity pages with RDF structured data, over plain HTML and over HTML-plus-schema-without-RDF. (arXiv:2603.10700)

• 71% win rate for graph traversal (RLM-on-KG) over Microsoft GraphRAG on complex, scattered-evidence reasoning, which is the real-world case for most brands. (arXiv:2604.17056)

• 7% of AI citations go to pages outside the classical top-10 SERP, reached via entity links, not rankings. Empirical proof that AI citation has already diverged from classical ranking.

Schema presence (has_schema) is also an independent variable in WLDM's own 11M+ AI citations dataset: WLDM measures schema as a citation signal rather than simply asserting its value.

Honest gap

No WLDM client result is yet tied specifically to TurboQuant Optimisation. A WLDM schema + TurboQuant data study is in progress. The proof today is the verified Google mechanism and the WordLift research above.

The Difference

What changes when you stop being built to be read and start being built to be computed.

	Current approach	TurboQuant Optimisation
What AI encounters	Prose it has to read and parse	Structured facts it can compute directly
What gets measured	Rankings and traffic	How often your brand appears in AI answers
What the work is	Content and keyword targeting	Fact coverage, structured data, consistency
What accountability looks like	Content published	AI retrieval accuracy, before and after

The gap between what your agency is measuring and what AI now rewards is the gap between your investment and your results.

For technical teams

The three eras of AI search

The unit of optimisation has shifted three times. Each shift made the previous era’s techniques not just suboptimal but counterproductive. Optimising for keyword density in the embedding era is noise. Optimising for vector proximity in the graph-traversal era misses the signal that now decides visibility.

	GPT-3 era	GPT-4 era	GPT-5.4 era
Method	Pattern match	Semantic similarity	Graph traversal
Unit	Token	Embedding	Entity
What matters	Keyword density	Vector proximity	Structural connectivity
Content’s job	Be present	Be similar	Be reachable
Failure mode	Not indexed	Low similarity	Disconnected from graph
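The "be reachable" and "disconnected from graph" rows describe a graph property. As a toy illustration only (it makes no claim about how any production model traverses its knowledge), breadth-first search over a hypothetical entity graph shows the difference between a brand that traversal can reach and one that is disconnected. All entity names are invented.

```python
from collections import deque

# Hypothetical toy entity graph: an edge means one entity links to another
# (schema reference, sameAs link, third-party mention).
GRAPH = {
    "managed WordPress hosting": ["ExampleHost", "CompetitorHost"],
    "ExampleHost": ["Edge caching", "Uptime SLA"],
    "CompetitorHost": [],
    "Edge caching": [],
    "Uptime SLA": [],
    "OrphanBrand": ["Edge caching"],   # links out, but nothing links in
}

def reachable(graph: dict, start: str, target: str) -> bool:
    """Breadth-first search: can traversal from `start` arrive at `target`?"""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

print(reachable(GRAPH, "managed WordPress hosting", "ExampleHost"))
print(reachable(GRAPH, "managed WordPress hosting", "OrphanBrand"))
```

Note that OrphanBrand's outbound links do not help it: reachability from the query depends on inbound connections.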

"You can't optimise for GPT-5.4 using GPT-4 intuitions."

Andrea Volpini, WordLift

The Gap

The GEO industry is still optimising for document retrieval. The model moved on.

The standard GEO playbook (keyword targeting, content volume, embedding proximity) was built for the era before TurboQuant. Agencies that do not know TurboQuant exists cannot help you prepare for the environment it created.

When Brie opens a pitch with “Have you heard of TurboQuant?”, the answer from almost every agency and in-house team is no. That gap is where the advantage lives. For now.

For technical teams

Why the failure is silent

Before provably-unbiased compression, aggressive KV cache compression degraded retrieval accuracy without any visible signal: no error, no ranking drop notification, no warning. The right pages simply stopped appearing in AI answers and no one knew why.
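Silent drift only becomes visible when compressed results are compared with an uncompressed reference. A hedged sketch of one such check: top-k overlap between exact and compressed similarity scores, using random placeholder embeddings and a deliberately crude 1-bit compression with no bias correction. This is an illustrative monitoring idea, not a description of how any AI system or WordLift measures drift.

```python
import numpy as np

def top_k(scores: np.ndarray, k: int) -> set:
    """Indices of the k highest scores."""
    return set(np.argsort(-scores)[:k].tolist())

def top_k_overlap(exact_scores: np.ndarray, compressed_scores: np.ndarray, k: int = 5) -> float:
    """Share of the exact top-k that survives under compressed scores (1.0 = none lost)."""
    return len(top_k(exact_scores, k) & top_k(compressed_scores, k)) / k

rng = np.random.default_rng(1)
docs = rng.standard_normal((1000, 64))   # hypothetical document embeddings
query = rng.standard_normal(64)

exact = docs @ query
naive = np.sign(docs) @ query            # crude 1-bit compression, no correction

print(top_k_overlap(exact, exact))
print(top_k_overlap(exact, naive))
```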

"Bad Embedding Compression Destroys Rankings. Similarity drifts and rankings degrade silently. This is how content disappears from AI answers without anyone noticing."

Andrea Volpini, WordLift

Because the failure mode was silent, AI systems compressed cautiously and stayed narrow, which is why the 5-source candidate budget persisted for so long. TurboQuant's unbiasedness guarantee ended that constraint: compression can now be aggressive and faithful simultaneously, which is what unlocks the 5 → 20 candidate expansion.

The structural consequence: if a brand is absent from the entity graph, the failure is invisible. No content quality metric surfaces it. The fix is always structural, never editorial.

The Service

TurboQuant Optimisation structures your brand as entities AI can compute.

Under entity computation, the AI makes a simple resource decision: parse a 6,000-word document, or compute 20 clean entities on a structured page. It picks the entities every time, because they take a fraction of the resources.

TurboQuant Optimisation is the work of making your brand those 20 entities. Two pillars: coverage across every question your buyers ask, and consistency across every place your facts appear.

Four steps from audit to computation-ready coverage.

Step 01 — Audit

Map every fact your brand should be known for against what AI can currently find. Most brands cover fewer than 20% of the questions their buyers actually ask AI.

Step 02 — Identify gaps

Pinpoint where coverage is missing, inconsistent, or in a form AI cannot compute without additional parsing effort.

Step 03 — Rebuild

Structured fact pages, stable identifiers, and consistent fact grounding across third-party sources, so the model can compute your brand cheaply and reach it reliably.

Step 04 — Measure

AI retrieval accuracy before and after, independently verifiable. Not our dashboard. Your data.
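A before-and-after measure can be kept as simple as a shared sheet the client owns. The sketch below uses citation rate (the share of tracked buyer questions whose AI answer cites the brand) as a stand-in metric; the questions and outcomes are invented, and the exact metric WLDM reports is not specified here.

```python
# Hypothetical tracking sheet: question -> (cited before rebuild, cited after rebuild).
RESULTS = {
    "best managed hosting for agencies": (False, True),
    "ExampleHost vs CompetitorHost": (True, True),
    "how does edge caching work": (False, False),
    "ExampleHost uptime guarantee": (False, True),
}

def citation_rate(results: dict, phase: int) -> float:
    """Share of tracked questions citing the brand; phase 0 = before, 1 = after."""
    return sum(outcome[phase] for outcome in results.values()) / len(results)

before, after = citation_rate(RESULTS, 0), citation_rate(RESULTS, 1)
print(f"before {before:.0%}, after {after:.0%}")
```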

The GEO program is the foundation. TurboQuant Optimisation is the specialised tier that takes your entity coverage to computation grade.

For technical teams

The technical toolkit

Four categories of work take a brand's entity coverage to computation grade.

Schema skyscrapers

Comprehensive structured-data pages built so the model computes the brand’s entities without parsing prose. Maximum computable density: every entity the brand should be known for, represented in structured markup with stable identifiers, so the model can answer a broad range of queries from a single page traversal.
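As an illustration only, here is a fragment of the JSON-LD such a page might carry, generated in Python. The domain, brand name, and entities are hypothetical, and this is not WLDM's template; it shows the pattern of several typed schema.org entities on one page, each with a stable @id, referencing one another by @id instead of repeating names.

```python
import json

BASE = "https://www.example.com"   # hypothetical domain

def node_id(slug: str) -> str:
    """Build each entity's canonical @id the same way every time."""
    return f"{BASE}/#{slug}"

graph = [
    {"@type": "Organization", "@id": node_id("organization"),
     "name": "ExampleHost", "url": BASE},
    {"@type": "Service", "@id": node_id("managed-hosting"),
     "name": "Managed WordPress hosting",
     "provider": {"@id": node_id("organization")}},   # reference, not a copy
    {"@type": "FAQPage", "@id": node_id("faq"),
     "mainEntity": [{
         "@type": "Question",
         "name": "Does ExampleHost include edge caching?",
         "acceptedAnswer": {"@type": "Answer", "text": "Yes, on every plan."},
     }]},
]

page = {"@context": "https://schema.org", "@graph": graph}
print(json.dumps(page, indent=2))
```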

Article IDs and @id stability

Every schema entity, and every reference to it across the site, resolves to one canonical @id. Character-for-character consistency. Computation-grade requirement, not a best practice.
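Character-for-character consistency can be checked mechanically. A minimal sketch, assuming the site's JSON-LD has already been extracted into Python dicts: any @id that is referenced but never defined is flagged. It treats a node containing only @id as a reference and any node with further properties as a definition, which is a simplifying assumption. The example URLs are hypothetical and contain a single case mismatch, which is enough to break the link.

```python
def collect_ids(node, defined: set, referenced: set) -> None:
    """Walk a JSON-LD tree, sorting every @id into definitions or references."""
    if isinstance(node, dict):
        if "@id" in node:
            target = referenced if set(node) == {"@id"} else defined
            target.add(node["@id"])
        for value in node.values():
            collect_ids(value, defined, referenced)
    elif isinstance(node, list):
        for item in node:
            collect_ids(item, defined, referenced)

def dangling_references(doc) -> list:
    """@ids that are referenced somewhere but never defined."""
    defined, referenced = set(), set()
    collect_ids(doc, defined, referenced)
    return sorted(referenced - defined)

doc = {"@graph": [
    {"@type": "Organization", "@id": "https://www.example.com/#organization",
     "name": "ExampleHost"},
    {"@type": "Service", "@id": "https://www.example.com/#hosting",
     "provider": {"@id": "https://www.example.com/#Organization"}},  # case mismatch
]}
print(dangling_references(doc))
```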

Wikidata entity engineering

The brand’s Wikidata QID anchors its entity identity across AI systems. Most enterprise QIDs exist but are thin: missing category-level and relationship properties. The work is property-completeness and on-page sameAs anchoring to the correct QID.
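A sketch of the on-page half of that work: checking that each page's brand entity anchors, via sameAs, to exactly one expected Wikidata QID. The QIDs and pages are hypothetical placeholders; the pattern accepts both the concept URI form (wikidata.org/entity/…) and the page URL form (wikidata.org/wiki/…).

```python
import re

WIKIDATA_QID = re.compile(r"^https?://www\.wikidata\.org/(?:entity|wiki)/(Q[1-9]\d*)$")

def anchored_qids(entity: dict) -> set:
    """All Wikidata QIDs an entity's sameAs points to (sameAs may be a string or list)."""
    same_as = entity.get("sameAs", [])
    if isinstance(same_as, str):
        same_as = [same_as]
    return {m.group(1) for url in same_as if (m := WIKIDATA_QID.match(url))}

def check_anchor(pages: dict, expected_qid: str) -> dict:
    """Per page: does the brand entity anchor to the expected QID, and only that QID?"""
    return {page: anchored_qids(entity) == {expected_qid} for page, entity in pages.items()}

pages = {
    "/": {"@id": "https://www.example.com/#organization",
          "sameAs": ["https://www.wikidata.org/wiki/Q123456"]},
    "/about": {"@id": "https://www.example.com/#organization",
               "sameAs": "http://www.wikidata.org/entity/Q654321"},   # wrong QID
    "/pricing": {"@id": "https://www.example.com/#organization"},     # no anchor
}
print(check_anchor(pages, "Q123456"))
```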

Off-page entity stacks

Consistent entity facts across third-party sources, with deliberate wording variation to validate the cluster. Coordinated with the AI Citations service.

Boundary with GEO

GEO delivers the broad on-page foundation: entity graph, @id consistency, crawler compliance, content structure. TurboQuant Optimisation is the specialised tier that pushes entity coverage to computation grade across the full question surface and enforces stability as a hard requirement.

Internal tooling

WLDM’s TurboQuant simulators and TurboCon tool are in QA. The schema and entity consulting described here is production-ready; the deepest proprietary research tooling is forthcoming.

Coverage

Coverage across every question your buyers ask AI.

Entity breadth means your brand is computable for head terms, comparisons, methodologies, and use cases: the full range of questions buyers ask before they ever reach your site.

Most brands cover the obvious questions. TurboQuant Optimisation maps the full question surface and builds structured fact coverage for the questions your brand is currently absent from.

For technical teams

Mapping the full question surface

Entity breadth is the computation-grade extension of GEO’s entity inventory. The principle: for every question a buyer asks AI in the category, the model should be able to compute a brand-relevant answer from structured facts without falling back to parsing prose.

Most enterprise brands have strong coverage for head-term queries (brand name, primary product, core use case) and near-zero coverage for the long tail: comparisons, methodology questions, use-case variations, integration queries. That long tail is where much of the computation-era citation concentrates, because those are the questions buyers ask AI before they have a shortlist.

The audit maps the full question surface for the category, scores current entity coverage against it, and identifies the gaps where a competitor or a generic source is being computed instead of the brand. The rebuild prioritises by the queries with the highest buyer-intent signal and the lowest current brand coverage.
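The scoring-and-prioritisation step can be sketched as follows. The questions, intent scores, and coverage figures are invented, and ranking by intent × (1 − coverage) is one plausible way to encode "highest buyer intent, lowest current coverage", not necessarily the formula WLDM uses.

```python
from dataclasses import dataclass

@dataclass
class Question:
    text: str
    intent: float     # buyer-intent signal, 0..1 (hypothetical scoring)
    coverage: float   # share of needed facts the brand currently supplies, 0..1

def overall_coverage(questions: list) -> float:
    """Average coverage across the mapped question surface."""
    return sum(q.coverage for q in questions) / len(questions)

def rebuild_priority(questions: list) -> list:
    """High buyer intent and low current coverage first."""
    return sorted(questions, key=lambda q: q.intent * (1 - q.coverage), reverse=True)

surface = [
    Question("ExampleHost pricing", intent=0.9, coverage=0.8),
    Question("ExampleHost vs CompetitorHost", intent=0.95, coverage=0.1),
    Question("how to migrate WordPress without downtime", intent=0.7, coverage=0.0),
    Question("what is object caching", intent=0.3, coverage=0.0),
]
print(f"coverage: {overall_coverage(surface):.0%}")
for q in rebuild_priority(surface):
    print(q.text)
```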

Schema skyscrapers are the architecture for breadth at scale: a single structured-data page designed to answer a cluster of related buyer questions from computed entities, so the model never needs to fall back to a 6,000-word article for questions in that cluster.

Consistency

The same facts, everywhere they appear.

Entity stability is NAP (name, address, phone) consistency applied to facts. If your brand resolves three different ways across your site, third-party sources, and structured data, the model gets three weak signals instead of one strong one.

TurboQuant Optimisation enforces consistent entity identity on-page and across the web. That is the level of consistency the model needs to compute your brand reliably.

For technical teams

@id discipline — the computation requirement

For a model to compute with confidence, every reference to the same entity must resolve to the same canonical identifier, character for character. “Acme Corp”, “ACME Corp”, and “Acme Corporation” are three separate entities to a model performing entity computation. The correct canonical @id, applied consistently across every instance of the entity in schema markup, internal links, and structured data, is what consolidates three weak signals into one strong one. This is the @id consistency rule from GEO raised from a best practice to a non-negotiable computation requirement.
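The Acme example above, in code: under exact-string identity every spelling stays a separate entity until each mention is mapped to one canonical @id. The canonical URL, the alias table, and the extra "Acme Inc." variant are hypothetical; the useful by-product is the list of mentions no alias covers yet.

```python
CANONICAL_ID = "https://www.example.com/#organization"   # hypothetical canonical @id

# Alias table: every surface form seen on the site, mapped to the one canonical @id.
ALIASES = {
    "Acme Corp": CANONICAL_ID,
    "ACME Corp": CANONICAL_ID,
    "Acme Corporation": CANONICAL_ID,
}

mentions = ["Acme Corp", "ACME Corp", "Acme Corporation", "Acme Corp", "Acme Inc."]

def resolve(mentions: list, aliases: dict):
    """Map mentions to canonical @ids; also return mentions no alias covers."""
    resolved = [aliases[m] for m in mentions if m in aliases]
    unmapped = sorted({m for m in mentions if m not in aliases})
    return resolved, unmapped

resolved, unmapped = resolve(mentions, ALIASES)
print(len(set(mentions)))   # distinct strings, i.e. distinct entities under exact matching
print(len(set(resolved)))   # distinct entities after resolution
print(unmapped)             # still needs an alias or an on-page fix
```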

Wikidata grounding

“Wikipedia is the human version; Wikidata is the data version for LLMs.”

A brand’s Wikidata QID is the anchor point AI systems use to resolve brand references across the web. Most enterprise brands have a QID that resolves the brand name but is missing the category-level properties, relationship properties, and industry context that make it computable for buyer questions. The work is property-completeness: ensuring the QID carries the properties AI systems actively query, and on-page sameAs anchoring so every relevant entity page links back to the correct QID.
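Property-completeness can be audited against a checklist. A minimal sketch over a hypothetical local snapshot of an item's claims (no live API call): P31, P452, P856, P159 and P571 are real Wikidata property IDs, but which ones a given brand needs, and the claim values shown, are illustrative assumptions.

```python
# Checklist of properties assumed (for illustration) to matter for buyer questions.
REQUIRED = {
    "P31": "instance of",
    "P452": "industry",
    "P856": "official website",
    "P159": "headquarters location",
    "P571": "inception",
}

# Hypothetical snapshot of a thin brand item: property ID -> values (illustrative).
claims = {
    "P31": ["Q4830453"],
    "P856": ["https://www.example.com"],
}

def missing_properties(claims: dict, required: dict = REQUIRED) -> dict:
    """Checklist properties with no values on the item."""
    return {pid: label for pid, label in required.items() if not claims.get(pid)}

print(missing_properties(claims))
```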

Off-page entity stacks

The same facts, names, and properties need to appear in third-party sources across the web, with deliberate wording variation to prevent the cluster from collapsing to a single source.

The variation validates the cluster; the consistency validates the entity. As Brie puts it: “We create entity stacks across the internet — consistently we want to change the wording a little bit — ‘Kinsta is an enterprise-level WordPress hosting platform that focuses on XYZ’, then variate it. Once you start clustering those entities together with a little variation, you’re validating.” This work is coordinated with the AI Citations service.
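A rough way to check both halves of that rule, assuming plain-text copies of the third-party descriptions: every description must contain the required facts (consistency), and no two may be near-verbatim (variation). Token-set Jaccard similarity and the 0.9 threshold are stand-ins chosen for this sketch, not WLDM's method; the brand and sentences are invented.

```python
import re

def tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two descriptions, 0..1."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)

REQUIRED_FACTS = ["ExampleHost", "WordPress hosting"]
stack = [
    "ExampleHost is an enterprise-level WordPress hosting platform focused on speed.",
    "For enterprise sites, ExampleHost offers managed WordPress hosting built for speed.",
    "ExampleHost is an enterprise-level WordPress hosting platform focused on speed.",
]

def audit(stack: list, required_facts: list, max_similarity: float = 0.9):
    """Return (every description states the facts?, pairs too similar to count as variation)."""
    consistent = all(all(fact in s for fact in required_facts) for s in stack)
    near_duplicates = [(i, j) for i in range(len(stack)) for j in range(i + 1, len(stack))
                       if jaccard(stack[i], stack[j]) > max_similarity]
    return consistent, near_duplicates

print(audit(stack, REQUIRED_FACTS))
```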

The Window

The brands that build for TurboQuant now will own the positions that form over the next 18 months.

Full adoption of TurboQuant across AI systems sits on approximately an 18-month horizon. The entity graph you build today is what the model reaches for as adoption completes. This is not a future problem. It is a present window.

Working with WLDM now means operating in the entity-computation environment while 99% of your competitors are still figuring out what changed in March.

For technical teams

The timing model

TurboQuant is deployed now and its effect on retrieval behaviour is measurable now. The variable you control is not the compression algorithm, which runs inside the model. The variable you control is the quality, breadth, and consistency of your entity graph.

Full adoption across all AI systems sits on approximately an 18-month horizon. The entity coverage built during that window is what the model reaches for as adoption completes.

"Context windows will keep growing. Models will keep getting cheaper. The variable that compounds is your data connectivity."

Andrea Volpini, WordLift

The floor is set. TurboQuant is arriving regardless of whether a brand prepares for it. The compounding asset is the entity graph, and unlike the model, it is entirely within a brand’s control to build.

Talk to the team that is already operating in the TurboQuant environment.

A strategy session with Brie takes 30 minutes. You leave with a clear read on where your brand sits in the entity-computation era and what it would take to be ahead of the 99% still catching up to March.

Book a strategy session

The window is open. Most of your competitors haven't heard of TurboQuant yet.