These two grants tell the same story as a product launch. When an AI lab demos an agent that "uses your computer" (moving the cursor, clicking buttons, filling forms), there is usually intellectual property underneath. Anthropic's grant US12619815B2, "Magnitude invariant multimodal agent for efficient image-text interface automation" (issued May 5, 2026), is that layer, alongside the earlier US12387036B1 from August 2025 in the same family.
Connect the dots and the mechanism is a loop. The agent takes a screenshot (the image), reads any text on it, decides on an action (click here, type this), executes it, then looks again. "Multimodal" because it fuses pixels and text; "interface automation" because the output is UI actions rather than just words. The "magnitude invariant" language in the title points to making that perception robust to scale and resolution differences across screens; sensitivity to those differences is exactly the kind of brittleness that breaks naive screen-driving agents.
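To make the loop concrete, here is a minimal Python sketch of a screenshot-to-action cycle. It illustrates the general pattern only and is not the method claimed in either patent. Every name in it (`Observation`, `Action`, `to_pixels`, `run_agent`) is hypothetical. The use of normalized 0-to-1 coordinates is an assumption: one common way to make a click target independent of screen resolution, which may or may not resemble what "magnitude invariant" means in the claims.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Observation:
    """What the agent perceives: screen size plus text read off the screenshot."""
    width: int
    height: int
    text: str


@dataclass(frozen=True)
class Action:
    """A UI action. Coordinates are normalized to [0, 1] (an assumption of this sketch)."""
    kind: str          # "click", "type", or "done"
    x: float = 0.0
    y: float = 0.0
    text: str = ""


def to_pixels(action: Action, obs: Observation) -> Tuple[int, int]:
    """Map normalized coordinates onto the current screen, clamped to its bounds."""
    px = min(obs.width - 1, max(0, int(action.x * obs.width)))
    py = min(obs.height - 1, max(0, int(action.y * obs.height)))
    return px, py


def run_agent(
    capture: Callable[[], Observation],
    decide: Callable[[Observation, List[Action]], Action],
    execute: Callable[[Action, Observation], None],
    max_steps: int = 10,
) -> List[Action]:
    """Look, decide, act, look again; stop on "done" or after max_steps."""
    history: List[Action] = []
    for _ in range(max_steps):
        obs = capture()                  # take a screenshot and read its text
        action = decide(obs, history)    # pick the next UI action
        if action.kind == "done":
            break
        execute(action, obs)             # perform it on the real screen
        history.append(action)
    return history


if __name__ == "__main__":
    # The same normalized click lands on the same relative spot at two resolutions.
    click = Action("click", 0.5, 0.5)
    print(to_pixels(click, Observation(1280, 720, "Submit")))   # (640, 360)
    print(to_pixels(click, Observation(1920, 1080, "Submit")))  # (960, 540)
```

In a real system, `capture` would wrap a screenshot and text-recognition step, `decide` would call the multimodal model, and `execute` would drive the mouse and keyboard. Here they are plain callables, so the loop can be exercised with stand-ins.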
Why this is a both-lanes story for the AI sector: the capability is a product (agents that do tasks), a research artifact (the perception-action method in the claims), and a competitive instrument (a patent family). The same development shows up as a keynote, a paper-shaped patent, and a line item on the cloud bill of whoever runs the agent.
What the patent does not tell you is how well the method works in the wild: claims describe a method, not a benchmark. And screen-driving agents remain famously fragile. But the existence of a granted family signals that Anthropic treats computer use as core enough to protect, not a demo-day novelty.
For readers tracking the agent wave, this is the grounding move: when a lab says "our model can use a computer," ask what the mechanism is. Here, the documents answer — a multimodal perception loop turning screenshots into actions, with the robustness tricks claimed explicitly.