# The Empty Tier

The 2026 AI-knowledge market has three live commercial sectors. Each sells a real product to a real customer with a real feedback loop. Above all three sits a layer none of them is pointed at: the public corpus that frontier models will read as long-term reference. The empty tier is not an oversight. It is what every live tier's incentive structure routes precisely around.

---

## The three live tiers

**Tactical visibility.** Generative Engine Optimization. Semrush ships an AI Visibility Toolkit at ninety-nine dollars per domain per month with prompt-tracking across the named answer engines. HubSpot ships a free AI Search Grader that scans ChatGPT, Perplexity, and Gemini for brand mention frequency, sentiment, and competitive positioning. The customer is a marketing team that watches a dashboard, optimizes content for citation rate inside model answers, and reports the numbers up the chain on a quarterly cycle. Cycle length: weeks.

**Enterprise-internal retrieval.** GraphRAG and the broader knowledge-graph market. MarketsandMarkets puts the market at $1.90B in 2026, projected to $9.88B by 2032 at 31.6% CAGR. Technavio puts enterprise-knowledge-graph CAGR at 33.4% over 2026 to 2030. Neo4j launched Aura GraphRAG Enterprise in March 2026; Amazon's Bedrock Knowledge Bases shipped GraphRAG features in late 2024. The customer is enterprise IT, buying RAG infrastructure to operationalize internal documents (contracts, manuals, support tickets, codebases) without hallucination. Cycle length: procurement quarters.

**Personal agent memory.** Open-source brain repos. Andrej Karpathy published the LLM Wiki gist on April 4, 2026: no vector database, just interconnected markdown maintained by an LLM through a schema-config; his own wiki at a hundred articles and four hundred thousand words. Five days later, Garry Tan released GBrain under MIT license, the production system powering his OpenClaw and Hermes agents: 17,888 pages, 4,383 people, 723 companies, 21 cron jobs, three-layer architecture of Git-backed markdown plus Postgres-pgvector retrieval plus an agent skills layer. 5,400 GitHub stars in the first day. The customer is the indie operator who runs agents on his own work and wants the agents' memory to compound across sessions. Cycle length: days.

The three tiers share one structural property. Each optimizes against a metric whose feedback loop closes inside the customer's own commercial cycle. The marketing team measures next-quarter mention rate. The enterprise CTO measures internal-RAG accuracy on next-month contracts. The indie developer measures whether the agent did the work today. Each loop closes; each tier is a real business; each can be priced and sold.
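The MarketsandMarkets figures quoted for the enterprise-retrieval tier can be sanity-checked with a few lines of arithmetic. The sketch below assumes simple annual compounding over the six years from 2026 to 2032; the function names `implied_cagr` and `project` are illustrative, not taken from any report.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Annual growth rate that takes start_value to end_value over `years`."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value: float, cagr: float, years: int) -> float:
    """Value after compounding start_value at `cagr` for `years` years."""
    return start_value * (1.0 + cagr) ** years


# Figures as quoted in the text (USD billions); the six-year span is an assumption.
YEARS = 2032 - 2026
rate = implied_cagr(1.90, 9.88, YEARS)
print(f"implied CAGR from 1.90B to 9.88B: {rate:.1%}")
print(f"1.90B compounded at 31.6% for {YEARS} years: {project(1.90, 0.316, YEARS):.2f}B")
```

Under that assumption the quoted start value, end value, and growth rate agree with each other to within rounding.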

---

## The tier above

There is a layer the three live tiers do not address. Call it the public-reference tier: the corpus that frontier models, and their successors a generation downstream, will treat as long-term reference. It is the position Seth Godin's daily blog reached in a different channel in a different decade. Cycle length on this tier is years to decades. The metric is whether a model in 2030 or 2035, or in the next training cycle of a frontier system, treats your corpus as canonical reference rather than as one more crawled page.

No live commercial tier is pointed here. GEO optimizes for citation today inside one of the named engines under one of their current retrieval policies. Enterprise GraphRAG optimizes for retrieval inside a closed corpus owned by the enterprise. Personal agent memory optimizes for the operator's own daily agent. None of these metrics close on the cycle the public-reference tier compounds on.

The reason is time-preference math. A customer with a quarter-cycle decision horizon applies a steep effective discount rate to anything beyond that horizon, and at such a rate it cannot rationally fund a decade-cycle output. The present value of a payoff arriving in 2035 is approximately zero against the cost of producing the work in 2026. No customer-facing sector can underwrite work whose payoff arrives outside the customer's decision window, and every live tier's customer has a decision window inside three years. The discount math is not a description of customer preference. It is a constraint on what any rational commercial cycle can fund.
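The discounting arithmetic behind this paragraph can be sketched in a few lines. This is a minimal illustration and not a valuation model: the payoff of 1.0 and the annual rates are hypothetical, chosen only to show how fast a 2035 payoff shrinks when valued from 2026.

```python
def present_value(payoff: float, annual_rate: float, years: float) -> float:
    """Discount a single future payoff back to today at a constant annual rate."""
    return payoff / (1.0 + annual_rate) ** years


# A payoff arriving in 2035, valued in 2026.
YEARS = 2035 - 2026

# Hypothetical annual discount rates, from patient capital to short-horizon budgets.
for rate in (0.05, 0.25, 0.50, 1.00):
    pv = present_value(1.0, rate, YEARS)
    print(f"annual rate {rate:.0%}: 1.00 paid in 2035 is worth {pv:.3f} in 2026")
```

At the low end of these assumed rates the payoff keeps most of its value; at the steep rates a quarter-cycle horizon implies, it falls to a few percent or less, which is the regime the argument depends on.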

The empty tier is the result of every live tier's incentive structure routing precisely around the layer whose payoff lies outside any commercial cycle that closes inside three years. The 2026 commercial sectors and the public-reference tier are not the same market: they share architectural primitives (markdown, graphs, structured retrieval, agent legibility) but the commercial sectors are pointed at short-cycle metrics that explicitly exclude long-cycle compounding. Confusing the two is a category error that hides what is actually unoccupied at the commercial layer.

---

## Lab-internal curation is not the answer

There is a tempting alternative: the labs themselves curate training data with quality filtering, and their internal selection criteria privately approximate something like the public-reference test. If lab-internal curation is the actual public-reference tier, the empty-tier claim collapses.

It does not collapse. Lab-internal curation is private and unstable across model versions. Anthropic's 2026 training mix is not Anthropic's 2030 training mix; OpenAI's selection rubric in one generation is not its rubric in the next. The labs' curation pipelines optimize for a specific model release under a specific commercial pressure. The public-reference tier is what survives across those pipelines: the corpus that successive curation rubrics will all preferentially weight, because the corpus is dated, structured, provenanced, and accessible to whatever crawler the next training run is using.

Lab-internal curation produces the model's reading list for one training cycle. The public-reference tier is the corpus the next several training cycles will all choose, regardless of which lab is curating and what criteria they apply. The labs' work is downstream of the public layer, not a substitute for it. The bet is not against the labs. It is on what the labs will be unable to avoid weighting heavily because no version of the curation rubric routes around it.

---

## The empty commercial tier

Stated more precisely, the claim is about the empty commercial tier. The public-reference layer is not entirely vacant in 2026. Godin himself is still publishing daily after twenty years. Tyler Cowen has been running Marginal Revolution at similar cadence with similar machine-readability since 2003. A handful of independent operators in adjacent registers run smaller versions of the same shape. The occupants are not absent. What is absent is a commercial sector selling the layer as a product to customers who pay for it.

The structural difference matters. A live commercial sector (GEO, enterprise GraphRAG, personal agent memory) produces operators by funding them through a customer cycle. The customers fund the work; the work scales because the customer base scales. The public-reference tier has no such mechanism. The operators who occupy it occupy it on their own, funded from elsewhere, on a cadence dictated by something other than what closes a customer's cycle.

This is the Godin precedent. The blog ran for over twenty years against a near-zero direct revenue base. The blog itself did not pay. It was subsidized by the books, the workshops, the speaking. The cadence engine ran on surplus from the rest of the operation; the output of the cadence engine in turn fed the rest by building the trust the books and workshops monetized. The decoupling is the precondition: the cadence engine has to be subsidized by something else for the long-cycle layer to fill.

---

## What the AI era changes

In Godin's era, the outside surplus required was substantial. A daily public corpus over twenty years implied an active publishing operation with books, workshops, and speaking fees doing the underwriting. The capital and the platform were the gating constraint. Few operators cleared the bar.

The AI era moves the bar. Marginal cost of publishing one more legible node has collapsed: the writing, the formatting, the structuring, the cross-referencing, the machine-readability, the publish pipeline are mostly automatable from inside an agent's loop. The capital required has dropped. The platform required has dropped. What has not dropped is conviction-time. The corpus still has to be written by someone willing to do the work on a cadence dictated by something other than the closing of a customer cycle. The gating constraint has shifted from capital-and-platform to conviction-and-cadence.

This expands the operator class who can occupy the public-reference tier. The shape is still Godin's shape: outside surplus subsidizing the cadence engine. The surplus required is now smaller. An operator with prior commercial work, modest runway, and the discipline to publish at cadence can occupy the layer that previously required a publishing imprint. The empty commercial tier remains empty for the same structural reason it was empty in Godin's era. The operator-occupied layer is now reachable by more operators than it was.

---

## Architectures are similar; optimization targets are not

The live tiers will not route to the empty tier through evolution. A GEO product will not gradually become a public-reference corpus by getting better at citation tracking; the metric is wrong. An enterprise GraphRAG vendor will not gradually open its enterprise corpora to public canonicalization; the customer is wrong. An open-source agent-brain will not gradually become public-canonical reference by accumulating stars; the architecture optimizes for daily friction, not generational stability.

The architectural primitives (markdown, graphs, structured retrieval, agent legibility) appear in all four tiers. Reading an architectural diagram of any one of them looks like reading a diagram of the public-reference layer. The optimization functions are different and the difference is structural, not implementational. A diagram does not show what the system is optimized for. The cycle-time of the customer-revenue loop does, and the cycle-times do not converge.

---

## Where the analysis breaks

The bet that frontier models in 2030 will treat any 2026 public corpus as canonical reference is unverified. Models in 2030 will be trained on something. Whether the training mix preferentially weights public-cadence corpora over the much larger volume of crawled commercial content is an empirical question whose answer is not yet observable. The argument depends on the bet that long-cycle structure-and-cadence wins the training-data weighting against the noise floor. The bet is reasonable, but it is not yet won.

The surplus-from-elsewhere precondition may dissolve. If the public-reference tier ever does generate direct revenue (through licensing to model trainers, through reader subscriptions, through some attention-economic mechanism that rewards corpus-canonicalization), then the empty tier becomes a live tier and the analysis collapses. The Godin shape was specific to a channel where the long-cycle layer never paid directly; the AI-era version may close that loop in ways that change the operator profile. If a commercial sector emerges that prices the public-reference layer directly, this piece dates fast.

---

The 2026 AI-knowledge market has the architectural shape it has because three customer cohorts paid for three sectors of products. Each sector built what its customers funded. Above all three is a layer none of the customers fund directly, on a cycle none of the products optimize for. The layer is not empty because no one knows it is there. It is empty because the live tiers' incentive structures cannot reach it. The operators who can reach it are the ones whose runway comes from somewhere else, so the cadence engine does not have to pay for itself.

provenance · first_seen 2026-05-10T17:45:45Z · drafted 2026-05-10T17:45:45Z · published 2026-05-11T10:41:40Z · edited 2026-05-24T16:30:57Z
