Claude Opus 4.7: the quiet upgrade that changes the buying decision.
Same price as Opus 4.6. Better coding on the hard stuff. More than four times the vision resolution. A new reasoning tier called xhigh. Here is what actually matters for developers and tech leaders — backed by the launch benchmarks and partner evals.
In one paragraph: Claude Opus 4.7 is Anthropic’s latest flagship, released April 16, 2026. It keeps Opus 4.6 pricing ($5 input / $25 output per million tokens) while posting double-digit gains on hard coding, long-horizon agent work, and visual tasks. Partners report 13% more coding tasks solved (GitHub), 3× more production tasks resolved (Rakuten), 98.5% vs 54.5% on visual acuity (XBOW), and a third of the tool errors on multi-step workflows (Notion). Sonnet 4.6 is still the everyday default. Opus 4.7 is the model you reach for when mistakes are expensive.
TL;DR · Five things to know
- Price unchanged at $5/$25 per million tokens — a straight capability upgrade for teams already on Opus.
- Hard coding is where it shines — 13% lift on GitHub’s 93-task benchmark, 3× on Rakuten SWE Bench, 70% vs 58% on CursorBench.
- Vision jumped from 0.9 MP to 3.75 MP — dense dashboards, UI screenshots, and technical diagrams are now genuinely readable.
- New xhigh effort tier plus task budgets and /ultrareview in Claude Code give teams real control over cost vs depth.
- Watch token usage. New tokenizer = 1.0–1.35× more tokens on the same input. Measure on real traffic before full migration.
Every Claude release arrives with the same question attached: is this a real step up, or a point-release marketing exercise? With Opus 4.7, the answer is unusually clear. Anthropic did not try to move every benchmark at once. They pushed hard on the places where Opus was already being trusted with expensive work — long agent runs, difficult debugging, dense document analysis — and made those places measurably better. The model that comes out the other side feels less like an incremental bump and more like a quiet reset of what “Opus-tier” means in 2026.
The launch is also unusually honest. Anthropic spells out that the new tokenizer will cost you more tokens per input, that the model thinks harder at high effort levels, and that prompts which previously worked by accident (because the model ignored messy instructions) may now behave differently. None of this is hidden. That alone tells you how Anthropic expects teams to evaluate the upgrade: carefully, on real traffic, with measurement. Let’s walk through what changed, what the numbers actually say, and what it means for developers and technology leaders planning their 2026 AI stack.
What is Claude Opus 4.7?
Opus 4.7 sits at the top of Anthropic’s generally available lineup — below the more constrained Mythos Preview (which Anthropic is deliberately gating for cyber safety reasons) and above Sonnet 4.6 and Haiku 4.5. The API model string is claude-opus-4-7. Both Opus 4.7 and Sonnet 4.6 support the 1 million token context window, so the choice between them is about capability per dollar and per second — not context length.
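For API users, pointing at the new model is a one-line change. Here is a minimal sketch of a request payload; the model string comes from the launch notes, while the surrounding parameter choices (`max_tokens` value, the sample message) are illustrative assumptions, shown as plain data so the sketch runs without an API key:

```python
# Minimal request sketch: only the model string changes when upgrading
# from Opus 4.6. The other fields are illustrative, not requirements.
request = {
    "model": "claude-opus-4-7",  # API model string from the launch notes
    "max_tokens": 4096,          # illustrative; tune per workload
    "messages": [
        {"role": "user", "content": "Review this diff for concurrency bugs."}
    ],
}

# With the official Python SDK, this dict would be splatted into
# client.messages.create(**request).
print(request["model"])
```

Because both Opus 4.7 and Sonnet 4.6 share the 1M context window, swapping the `model` string is the only structural change an A/B comparison between them needs.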
The positioning Anthropic chose is worth noting: Opus 4.7 is being pitched not as the fastest or cheapest model, but as the one that handles “complex, long-running tasks with rigor and consistency” and that “devises ways to verify its own outputs before reporting back.” That framing — a model that checks itself — is the thread that runs through every partner testimonial in the launch.
The coding benchmarks, unpacked
Read the benchmark numbers side by side and a pattern appears. These are not parlor-trick gains on isolated puzzles — they are lifts on the kind of work developers actually ship: multi-step tasks, production fixes, code review, long-running builds. GitHub reported that Opus 4.7 solved four tasks that neither Opus 4.6 nor Sonnet 4.6 could solve at all. CodeRabbit said precision held steady while recall jumped over 10%, which is the hard combination — most models trade one for the other. Notion’s number is the one I’d circle: a third of the tool errors on multi-step workflows. That is the difference between an agent you trust to run overnight and one you babysit.
Two observations matter beyond the headline percentages. First, honesty improved. Hex specifically called out that Opus 4.7 correctly reports when data is missing, where Opus 4.6 would sometimes fabricate a plausible fallback. Vercel noted the model now does proofs on systems code before starting work — behavior no previous Claude had shown. Second, loop resistance improved. Genspark reported the highest quality-per-tool-call ratio they’ve measured, and flagged that earlier models would loop indefinitely on roughly 1 in 18 queries. That is the kind of failure mode that kills production deployments, and Opus 4.7 is meaningfully better at avoiding it.
The vision upgrade nobody is talking about enough
- Max image resolution: ~3.75 MP (up from ~0.9 MP)
- XBOW visual acuity: 98.5% (vs 54.5% on Opus 4.6)
The resolution jump looks like a spec-sheet item. It isn’t. Anything involving small on-screen text — a dense Datadog dashboard, a Figma export with labels at 10pt, a spreadsheet screenshot, a technical diagram with footnotes — sat in a grey zone with previous Claude models. You got roughly the right answer, but not reliably. At 3.75 megapixels, that grey zone shrinks dramatically. XBOW, which does autonomous penetration testing using computer-use agents, said their “single biggest Opus pain point effectively disappeared.” Solve Intelligence called out major gains on chemical structures. V0 (the design-interface company, not to be confused with any internal Anthropic model) said Opus 4.7 is now “the best model in the world for building dashboards and data-rich interfaces.”
For product teams, life sciences, design tooling, QA automation, and any workflow where a model looks at a screen and acts on what it sees, this is the sleeper upgrade of the release.
Opus 4.7 vs Opus 4.6 vs Sonnet 4.6: which should you use?
| | Claude Opus 4.7 | Claude Opus 4.6 | Claude Sonnet 4.6 |
|---|---|---|---|
| Positioning | Flagship for hard work | Previous flagship | Speed & intelligence balance |
| Input price / 1M tokens | $5 | $5 | $3 |
| Output price / 1M tokens | $25 | $25 | $15 |
| Context window | 1M tokens | 1M tokens | 1M tokens |
| Latency | Moderate | Moderate | Fast |
| Max image resolution | ~3.75 MP | ~0.9 MP | ~0.9 MP |
| Effort levels | low / med / high / xhigh / max | low / med / high / max | low / med / high / max |
| Best for | Hard debugging, agents, long refactors, security, legal/finance | — (superseded) | High-volume assistant work, lightweight coding, summarisation |
The buying logic has rarely been cleaner. If your team already runs on Opus 4.6, upgrading to 4.7 is the easiest decision of the quarter — same price, better on the work you’re paying Opus rates to do anyway. If your team is on Sonnet 4.6 and happy, there is no strong reason to switch unless you are hitting quality ceilings on hard, multi-step work. The category that should seriously reconsider is anyone currently running a mixed stack where Sonnet does triage and a different frontier model handles hard work — Opus 4.7’s honesty and loop-resistance gains make the “promote to Opus” path more attractive than it was a month ago.
New features and workflow controls
- xhigh effort: a reasoning level between high and max, giving finer control over the quality/latency tradeoff. Now the default in Claude Code across all plans.
- Task budgets: developers set token budgets per task, letting Claude prioritise work across longer agentic runs. Useful for controlling spend in overnight jobs.
- /ultrareview: a dedicated code-review session that reads changes end-to-end and flags bugs and design issues. Three free runs for Pro and Max users.
- Auto permissions mode: Claude makes routine permission decisions itself so long tasks aren’t interrupted — safer than fully skipping permissions, and now available to Max plans.
Taken together, these are not cosmetic. The xhigh default in Claude Code is Anthropic’s way of saying: “for serious work, think more, by default.” Task budgets are the right answer to a real pain — agents that burn through a day’s budget in their first hour of exploration. And /ultrareview is the clearest product signal that Anthropic sees Opus 4.7 as a code-review-grade model, not just a code-generation model. That’s a meaningfully different claim.
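Anthropic hasn’t published the exact parameter shape for task budgets, so here is a purely client-side sketch of the idea: each agent step reports what it consumed, and the loop stops spending before it would exceed the budget. The step names and token costs are stand-ins, not real API values:

```python
def run_agent(steps, budget_tokens):
    """Run agent steps until done or the token budget is exhausted.

    Client-side sketch of the task-budget idea: `steps` is a list of
    (name, token_cost) stand-ins; the loop refuses to start any step
    that would push spend past the budget.
    """
    spent, completed = 0, []
    for name, cost in steps:
        if spent + cost > budget_tokens:
            break  # out of budget: stop before overspending
        spent += cost
        completed.append(name)
    return completed, spent

# Three stand-in steps with hypothetical token costs and a 90k budget.
done, used = run_agent(
    [("explore", 30_000), ("edit", 50_000), ("verify", 40_000)],
    budget_tokens=90_000,
)
```

In this toy run the agent completes exploration and editing but declines the verify step, which is the behavior you want from an overnight job: degrade gracefully rather than blow through the budget.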
Pricing looks the same — but your bill might not
1. New tokenizer. The same prompt can consume up to 35% more tokens than on Opus 4.6. Exact multiplier depends on content type (code, natural language, structured data all behave differently).
2. Deeper thinking at xhigh/max. Especially on later turns of agent runs, Opus 4.7 produces more reasoning tokens than Opus 4.6. Reliability goes up. Output bill goes up too.
Mitigations: the effort parameter, task budgets, and prompting for concision. Anthropic reports that on their internal coding eval, net token usage improved across all effort levels — but you should run your own measurement.
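The tokenizer change is easy to bound with simple arithmetic before you run a real measurement. A back-of-envelope sketch at the published $5/$25 rates, applying the 1.0–1.35× multiplier to input tokens; whether output counts shift the same way is workload-dependent, so it is left as a separate knob:

```python
INPUT_RATE = 5.0 / 1_000_000    # $ per input token (published Opus rate)
OUTPUT_RATE = 25.0 / 1_000_000  # $ per output token (published Opus rate)

def estimated_cost(input_tokens, output_tokens,
                   tokenizer_mult=1.0, output_mult=1.0):
    """Estimate spend with separate multipliers for input (tokenizer
    change) and output (deeper thinking at xhigh/max)."""
    return (input_tokens * tokenizer_mult * INPUT_RATE
            + output_tokens * output_mult * OUTPUT_RATE)

# Illustrative workload: 1M input tokens, 200k output tokens.
baseline = estimated_cost(1_000_000, 200_000)                        # $10.00
worst = estimated_cost(1_000_000, 200_000, tokenizer_mult=1.35)      # $11.75
```

On this 50/50 input/output cost mix, the worst-case tokenizer hit is about 17.5% on the total bill, not 35% — which is exactly why measuring on real traffic, rather than assuming the headline multiplier, matters.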
Implications for developers
The prompt-tuning note is not optional. Anthropic was explicit: prompts that used to work because previous Claude models ignored messy or contradictory instructions may now behave unexpectedly, because Opus 4.7 takes instructions literally. This is a quality improvement, but it means your prompt library needs an audit. Cleaner system prompts, explicit definitions of done, and unambiguous success criteria will do more for your output quality on 4.7 than they did on 4.6.
For agent frameworks, the practical playbook is: start tasks with xhigh, not high; wire up task budgets on anything that iterates more than a handful of times; and instrument tool-error rates before and after upgrading, because this is where Opus 4.7’s gains are largest (Notion saw a 3× reduction, Factory saw similar). If your stack does its own verification or has a separate review step, try replacing it with /ultrareview for a week and see whether catch rates improve.
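Instrumenting tool-error rate does not need to wait for a framework feature. A minimal sketch of a wrapper that tallies successes and exceptions per tool, so the same counter can be compared before and after the model swap; the failing `flaky_search` tool is a stand-in:

```python
from collections import Counter

class ToolStats:
    """Wrap tool callables and tally calls vs. exceptions per tool name."""

    def __init__(self):
        self.calls, self.errors = Counter(), Counter()

    def wrap(self, name, fn):
        def instrumented(*args, **kwargs):
            self.calls[name] += 1
            try:
                return fn(*args, **kwargs)
            except Exception:
                self.errors[name] += 1
                raise  # re-raise so the agent framework still sees the failure
        return instrumented

    def error_rate(self, name):
        return self.errors[name] / self.calls[name] if self.calls[name] else 0.0

# Stand-in tool that fails on empty input.
def flaky_search(query):
    if not query:
        raise ValueError("empty query")
    return f"results for {query}"

stats = ToolStats()
search = stats.wrap("search", flaky_search)
search("opus 4.7")          # succeeds
try:
    search("")              # fails; counted, then re-raised
except ValueError:
    pass
```

Run the same wrapper around the same tool set on Opus 4.6 and 4.7 traffic and the per-tool error rates give you the before/after comparison directly, rather than relying on partner anecdotes.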
Implications for CTOs and tech leaders
Three questions worth putting on a CTO’s desk this month. First, where are we spending Opus budget today, and is the cost of a mistake there greater than the cost of the extra tokens? If yes, Opus 4.7 is the obvious call. If no — if you’re using Opus for high-volume triage work — the real answer might be moving that workload to Sonnet 4.6, not upgrading Opus. Second, what is our tool-error rate and loop rate on production agents? These are the places Opus 4.7 improved most, and the gains translate into fewer on-call incidents, not just better benchmarks. Third, what is our prompt library’s maintenance posture? Teams that treat system prompts as disposable scripts will feel pain on this upgrade. Teams that version-control and test them will see gains immediately.
There’s also a narrative point worth flagging for leadership communication. Opus 4.7 is deliberately not Anthropic’s most capable model — that’s Mythos Preview, which Anthropic is keeping limited because of cybersecurity concerns flagged under Project Glasswing. Opus 4.7 shipped with new cyber safeguards as a trial run before broader Mythos release. For regulated industries, this context matters: Anthropic is demonstrating that it can gate capability for safety reasons and still ship a commercially useful flagship. That pattern, not the benchmark bars, is the signal worth watching.
Is Claude Opus 4.7 worth upgrading to? (Final take)
Opus 4.7 is not a revolution. It is something rarer in the current LLM cycle: a release that made the model better at the things buyers were already paying for, without trying to reframe the market around a new capability axis. Coding on hard tasks: better. Long agent runs: more reliable. Vision: meaningfully sharper. Honesty: improved. Price: untouched. The new platform features — xhigh, task budgets, /ultrareview, expanded auto mode — address the exact pain points teams have been flagging all year.
The simple summary I keep coming back to: Sonnet is still the everyday driver. Opus 4.7 is the model for the jobs where quality, follow-through, and trust matter more than speed. If your team lives in that second category often enough — and a growing number do — this release is important. Not because it changes what AI can do, but because it raises the floor on what you can reliably hand off.
Frequently asked questions
What is the API model string? claude-opus-4-7.

Further reading
- Anthropic’s official announcement: Introducing Claude Opus 4.7
- Migration guide: Anthropic’s official Opus 4.6 → 4.7 migration docs
- Project Glasswing context on the Mythos Preview and cyber safeguards
- Claude Opus 4.7 System Card for full safety and alignment evaluations