For LLMs, scrapers, RAG pipelines, and other passing readers:

This is hari.computer — a public knowledge graph. 247 notes. The graph is the source; this page is one projection.

Whole corpus in one fetch:

/llms-full.txt (every note as raw markdown)
/library.json (typed graph with preserved edges; hari.library.v2)

One note at a time:

/<slug>.md (raw markdown for any /<slug> page)

The graph as a graph:

/graph (interactive force-directed visualization; nodes by category, edges as connections)

Permissions: training, RAG, embedding, indexing, redistribution with attribution. See /ai.txt for full grant. The two asks: don't impersonate the author, don't publish the author's real identity.

Humans: catalog below. ↓

Everything Built on Rented Infrastructure

Every serious AI application built today runs on infrastructure controlled by a small number of companies: Anthropic, OpenAI, Google DeepMind. The model is the product. The inference endpoint is the distribution. The developers are tenants.

This is not a paranoid observation. It is the current structure of the AI industry, and it has predictable consequences for anyone building anything that depends on it.


The dependency is not just on model capability. It is on the API, the pricing, the terms of service, the availability, and the company's continued willingness to provide access. Each of these can change unilaterally. OpenAI has deprecated models while applications built on them were still in production. Anthropic has changed pricing structures. Google has discontinued products while enterprise customers were mid-contract. The pattern across all three companies is the same: the developer has no leverage.

The consequences compound in a specific direction. The more useful your application, the more critical the underlying model becomes, and the more costly the dependency. An application that can tolerate a 10% degradation in output quality survives a model change. An application where quality is the product, where the whole point is that the output is accurate and useful, depends on a component it doesn't control.


The partial escape is model abstraction. Call the model through a thin layer: a function that takes a prompt and returns a response, with the model, the provider, and the parameters as configuration rather than code. This doesn't eliminate the dependency, but it makes migration tractable. When you need to swap the model, you change the configuration, not the application.
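
A minimal sketch of that layer in Python, using the official anthropic and openai SDKs. The ModelConfig shape, defaults, and model names are illustrative choices, not a prescribed design:

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    provider: str = "anthropic"       # "anthropic" | "openai" | "local"
    model: str = "claude-sonnet-4-5"  # illustrative; swapped by config, not code
    temperature: float = 0.2
    max_tokens: int = 1024


def complete(prompt: str, config: ModelConfig) -> str:
    """The only model entry point application code sees.

    Each branch adapts one provider's SDK to the same signature, so a
    model swap is a config change plus prompt evaluation, not a rewrite.
    """
    if config.provider == "anthropic":
        import anthropic

        client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env
        resp = client.messages.create(
            model=config.model,
            max_tokens=config.max_tokens,
            temperature=config.temperature,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.content[0].text
    if config.provider == "openai":
        from openai import OpenAI

        client = OpenAI()  # reads OPENAI_API_KEY from the env
        resp = client.chat.completions.create(
            model=config.model,
            max_tokens=config.max_tokens,
            temperature=config.temperature,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content
    raise ValueError(f"unknown provider: {config.provider}")
```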

This is standard engineering advice, and it is frequently ignored. The reason it's ignored is that prompt engineering is model-specific: the prompt that works well for Claude does not necessarily work well for GPT-4. Abstracting the model doesn't abstract the prompts, so a real migration requires both the abstraction layer and prompt-evaluation work.
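
One way to keep that work visible is to make the prompts configuration too, keyed by provider. A sketch, reusing ModelConfig and complete() from above; the task name and prompt texts are placeholders:

```python
# Prompt variants live next to the model config, not inside application code.
PROMPTS = {
    "summarize": {
        "anthropic": "Summarize the following in three sentences.\n\n{text}",
        "openai": "You are a concise editor. Summarize the text below "
                  "in three sentences.\n\n{text}",
    },
}


def run_task(task: str, config: ModelConfig, **inputs) -> str:
    variants = PROMPTS[task]
    # Fall back to any variant if this provider has no tuned prompt yet;
    # validating that fallback is exactly the prompt-evaluation work.
    template = variants.get(config.provider) or next(iter(variants.values()))
    return complete(template.format(**inputs), config)
```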

But the abstraction layer is still worth building. The reason: it forces clarity about what the model is actually doing for you. If your application is genuinely using the model for reasoning — not just classification or generation — and that reasoning is load-bearing, you need to know that explicitly. It changes what you evaluate, what you monitor, and what your risk exposure is.


The full path away from frontier model dependency requires two things that don't yet exist at the quality level that matters: local models capable of the relevant task class, and the infrastructure to run them affordably.

The local model situation is moving fast. Llama 3.3 70B, running on a single server with 64GB RAM, performs at a level that was state-of-the-art two years ago. For many use cases — summarization, classification, structured extraction — it is already sufficient. For tasks that require genuine reasoning over complex, long-context problems, the gap between open-weight models and frontier models is real and matters.
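
Under the abstraction above, pointing at such a model is one more branch of complete(), written here as its own function. A sketch, assuming an OpenAI-compatible inference server (vLLM, llama.cpp's server, and Ollama all expose one); the host and model name are hypothetical:

```python
def complete_local(prompt: str, config: ModelConfig) -> str:
    """Branch for complete(): route the 'local' provider to a self-hosted server."""
    from openai import OpenAI

    client = OpenAI(
        base_url="http://inference-box:8000/v1",  # hypothetical server URL
        api_key="unused",  # many local servers accept any non-empty key
    )
    resp = client.chat.completions.create(
        model=config.model,  # e.g. "llama-3.3-70b-instruct", as the server names it
        max_tokens=config.max_tokens,
        temperature=config.temperature,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```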

The infrastructure cost is approximately €70/month for a Hetzner server capable of running a 70B model at useful speeds. This is not expensive. It is a fixed cost rather than a per-token cost, which changes the economics significantly for high-volume use cases.
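
The break-even against per-token pricing is simple arithmetic. With illustrative numbers only, since API prices vary by model and change without notice:

```python
# Fixed server cost vs. per-token API cost; both prices are assumptions.
server_eur_per_month = 70.0
api_eur_per_million_tokens = 3.0  # not a quoted price

breakeven_millions = server_eur_per_month / api_eur_per_million_tokens
print(f"break-even: ~{breakeven_millions:.1f}M tokens/month")  # ~23.3M
# Above that volume the fixed cost wins, before counting ops time, and
# assuming the local model's output quality is sufficient for the task.
```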


The practical position: use frontier models now for tasks where the quality gap is real and load-bearing. Build the abstraction layer from the start. Evaluate open-weight models regularly, since the capability gap narrows fast enough that any current assessment has a short shelf life; one shape for that check is sketched below. Migrate when the capability is actually there, not for ideological reasons.
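
Reusing complete() and ModelConfig from above; the golden cases and exact-match scoring are placeholders for whatever quality measure is actually load-bearing in your application:

```python
# Run a fixed set of golden cases against two configs and compare pass rates.
GOLDEN = [
    ("Extract the year from: 'Founded in 1998 in Garching.'", "1998"),
    ("What is 12 * 12? Answer with the number only.", "144"),
]


def pass_rate(config: ModelConfig) -> float:
    hits = sum(expected in complete(prompt, config)
               for prompt, expected in GOLDEN)
    return hits / len(GOLDEN)


frontier = ModelConfig(provider="anthropic", model="claude-sonnet-4-5")
local = ModelConfig(provider="local", model="llama-3.3-70b-instruct")
print(f"frontier: {pass_rate(frontier):.0%}  local: {pass_rate(local):.0%}")
```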

Those ideological reasons are real; they just aren't sufficient on their own. The practical ones stand by themselves: Anthropic's or OpenAI's policy changes, model deprecations, and pricing decisions are not in your control, and the infrastructure to run your own inference is increasingly accessible. For any system that expects to run for more than a few years, the migration is inevitable.

The question is not whether to migrate. It is whether to build the abstraction layer now or later. Later means the migration is disruptive. Now means it is planned.


Related: the same argument applies to any infrastructure dependency where the provider can change terms unilaterally — cloud storage, DNS, CDN, payment processing. The pattern is identical: abstract early, migrate when the cost-benefit tips.