Google's AI-Efficiency Story in Three Sources | NeuralDocket

Google's bet on doing more with less compute spans six years of patents and shows up, indirectly, in Alphabet's infrastructure spending.

Two patents and a capex line tell the same story, years apart. In 2020, Google was granted US10719761B2 on mixture-of-experts (MoE) neural networks, the architecture that activates only a few sub-networks per input. In 2026, it was granted US12518135B2 on a sparse and differentiable MoE variant. And across those years, Alphabet's annual reports describe steadily growing investment in "servers, network equipment, and data centers" (Alphabet Form 10-K, FY2024, filed 2025-02-05; FY2025, filed 2026-02-05).

Follow both the money and the IP and the connection looks causal, not coincidental. MoE exists to break the link between model capacity and per-token compute. Why does a company need that? Because compute is the thing it's spending billions on, as the capex disclosures show. The patents are the technical answer to the pressure the filings quantify.
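To make "breaking the link between capacity and per-token compute" concrete, here is a minimal sketch of top-k routing in plain Python. It is a generic illustration, not the method claimed in either patent, and every name and number in it (top_k_route, moe_layer_cost, the expert counts, the parameter sizes) is a hypothetical chosen for the example.

```python
import math


def top_k_route(gate_logits, k):
    """Pick the k highest-scoring experts and return (index, weight) pairs.

    Hypothetical helper: a generic top-k gate, not the patented mechanism.
    """
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax over only the selected logits, so the k weights sum to 1.
    peak = max(gate_logits[i] for i in top)
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_layer_cost(num_experts, params_per_expert, k):
    """Compare parameters stored with parameters used for one token.

    Simplification: the small cost of the gate itself is ignored.
    """
    return {
        "total_params": num_experts * params_per_expert,
        "active_params_per_token": k * params_per_expert,
    }


# One token, four experts: only the two best-scoring experts run.
routes = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)

# Illustrative sizes only: growing from 8 to 64 experts multiplies
# stored capacity by 8 while the per-token active count stays fixed.
cost_8 = moe_layer_cost(num_experts=8, params_per_expert=1_000_000, k=2)
cost_64 = moe_layer_cost(num_experts=64, params_per_expert=1_000_000, k=2)
```

In this toy setup the 64-expert layer stores eight times the parameters of the 8-expert layer, yet each token still passes through only two experts. That is the economic appeal in miniature: capacity can grow without the per-token bill growing with it.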

The six-year span matters. This isn't a reaction to the 2023 AI boom — the foundational grant predates it by years. Google was working on cheap-to-serve large models before "inference cost" was a headline, and the continuation grants (including a 2024 one, US12067476B2) show the bet compounding. The capex line in Alphabet's filings is what that bet looks like when it meets reality: even with efficiency IP, the absolute spend keeps rising.

What the documents don't claim, to be precise: the patents don't say Google's models are the most efficient, and the 10-K doesn't attribute capex to any specific architecture. The link is strategic, not arithmetic. But the direction is consistent across six years of filings and grants: own the methods that reduce per-token cost, while spending whatever it takes on the infrastructure that still grows underneath.

Read this way, a 2020 patent and a 2026 capex disclosure stop looking like unrelated artifacts. They're the bookends of one continuous effort: make each token cheaper, because there are going to be a lot more tokens.
