Claude Opus 4.6: When Paying $75/M Actually Saves You Money
Opus costs 5× Sonnet and 268× DeepSeek. Here's the decision framework for when that's the right call — and the real cost of getting it wrong.
The standard advice is to use Opus sparingly. That's correct, but incomplete. The real question isn't how often you use it — it's whether you can afford not to on the tasks that matter.
Claude Opus 4.6 runs at roughly $75 per million output tokens on OpenRouter. Claude Sonnet 4.6 is $15. DeepSeek V3.2 is $0.28. That's a 5× gap between Opus and Sonnet, and a 268× gap between Opus and DeepSeek.
If you treat Opus as "Sonnet but more expensive," you'll always decide it's not worth it. That's the wrong frame.
The right frame: cost of failure
Opus earns its cost when the cost of a wrong or mediocre answer exceeds the price difference.
Sonnet handles 80% of reasoning tasks well. The remaining 20% — the ones where Sonnet hedges, misses a dependency, or produces plausible-but-wrong output — are the ones that bite you.
The question is: what does a wrong answer cost on this specific task?
If the answer is "I'll re-run it" → Sonnet is fine.
If the answer is "I'll spend two hours debugging" → Opus starts looking cheap.
If the answer is "this goes to a customer or shapes a decision" → Sonnet is the risky option.
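The same trade-off can be written as a quick expected-cost comparison. A sketch with illustrative numbers — the failure probabilities (20% vs 5%) and the $50/hour recovery cost are assumptions, not measurements:

```python
def expected_cost(call_cost, failure_prob, failure_cost):
    """Expected total cost of one call: the call itself, plus the
    chance of a wrong answer times what recovering from it costs."""
    return call_cost + failure_prob * failure_cost

# Illustrative: ~500 output tokens per call at OpenRouter prices.
sonnet_call = 500 / 1_000_000 * 15    # ≈ $0.0075
opus_call   = 500 / 1_000_000 * 75    # ≈ $0.0375

# "Two hours debugging" at an assumed $50/hour = $100 of failure cost.
debug_sonnet = expected_cost(sonnet_call, 0.20, 100.0)   # ≈ $20.01
debug_opus   = expected_cost(opus_call,   0.05, 100.0)   # ≈ $5.04

print(debug_sonnet > debug_opus)  # True — Opus is cheaper in expectation
```

The token prices wash out entirely here; once failure cost is in the tens of dollars, the only numbers that matter are the failure probabilities.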
The Tier C decision tree
```
Is the output irreversible or customer-facing?
├─ NO  → Does this involve more than 3 interdependent reasoning steps?
│        ├─ NO  → Sonnet (Tier B)
│        └─ YES → Is a wrong answer recoverable in under 30 minutes?
│                 ├─ YES → Sonnet (Tier B)
│                 └─ NO  → Opus (Tier C)
└─ YES → Is tone, precision, or trust the primary success criterion?
         ├─ NO  → Sonnet (Tier B)
         └─ YES → Opus (Tier C)
```
The key gate is recoverability. Opus is for tasks where Sonnet's failure mode costs more than the $0.06–0.15 per call you're saving.
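The whole tree collapses to a few lines of code. A sketch — the argument names are my own shorthand for the tree's questions, and the returned strings are just labels to wire into your router:

```python
def pick_tier(irreversible_or_customer_facing: bool,
              interdependent_steps: int,
              recoverable_under_30_min: bool,
              tone_precision_trust_primary: bool = False) -> str:
    """Walk the Tier C decision tree and return a tier label."""
    if irreversible_or_customer_facing:
        # Right branch: tone/precision/trust is the deciding question.
        return "opus" if tone_precision_trust_primary else "sonnet"
    if interdependent_steps <= 3:
        return "sonnet"
    # Deep reasoning chain: recoverability is the key gate.
    return "sonnet" if recoverable_under_30_min else "opus"

print(pick_tier(False, 5, False))  # opus
```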
Where Opus earns its cost
Architecture decisions with downstream lock-in. If you're asking the model to design a data schema, choose a library, or spec an integration — and you'll be living with that choice for months — Sonnet's tendency to produce locally-coherent-but-globally-inconsistent answers is a real risk. Opus holds the full context better over long reasoning chains.
Complex debugging across multiple files. Sonnet can lose the thread on bugs that span more than two or three files. It finds plausible causes and stops. Opus is more likely to keep pulling the thread until it hits the actual root cause.
High-stakes external communications. A pitch deck summary, a client proposal, a sensitive message. These aren't expensive in tokens — they're expensive if they're wrong. Opus's judgment on register, implication, and what not to say is meaningfully better.
Strategy documents that shape direction. Not research drafts — those are Tier A work. Final synthesis: weighing options, identifying the actual constraint, making a recommendation. If someone will act on this, use Opus.
Financial or contractual analysis. When the task is "review this and tell me what I'm agreeing to," the model needs to catch things that aren't obvious. Sonnet misses subtle implications in dense legal or financial language more often than Opus does.
Where Opus wastes money
Anything with a clear success criterion. Classification, extraction, summarisation, formatting — these are Tier A tasks. Routing them to Opus because the subject matter feels important is a common mistake. The complexity of the topic doesn't determine the complexity of the task.
Long but repetitive work. Processing 50 similar documents? Tier A for each. The per-document reasoning is simple even if the volume is large.
Drafts you're going to edit anyway. If you're writing the first version and you plan to revise, Sonnet's draft is good enough. Use Opus for the final pass if precision matters — not the first draft.
Anything where you haven't tried Sonnet first. Before you escalate to Opus, run the task on Sonnet. You'll often find it's sufficient. Opus is an escalation, not a default.
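"Escalation, not a default" can be wired in directly: run Sonnet first and only escalate when a cheap check flags the draft as insufficient. A sketch — `call_model` and `looks_sufficient` are hypothetical placeholders, not a real client API:

```python
def answer_with_escalation(prompt, call_model, looks_sufficient):
    """Try Sonnet first; escalate to Opus only when the draft fails
    a cheap sufficiency check (hedging, missing steps, etc.).
    Returns (answer, tier_used)."""
    draft = call_model("anthropic/claude-sonnet-4-6", prompt)
    if looks_sufficient(draft):
        return draft, "sonnet"
    return call_model("anthropic/claude-opus-4-6", prompt), "opus"
```

The sufficiency check can be as crude as a regex for hedging phrases or a one-line call to the Tier A model; even a check that only catches half the bad drafts moves you off "Opus as default."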
Practical cost maths
On a typical OpenClaw deployment (4,000 requests/month, averaging ~500 output tokens each):
| Tier mix | Monthly cost |
|---|---|
| 100% Sonnet | ~$30/month |
| 68% DeepSeek / 28% Sonnet / 4% Opus | ~$17/month |
| 68% DeepSeek / 24% Sonnet / 8% Opus | ~$22/month |
The difference between 4% Opus and 8% Opus is $5/month. If those 160 extra Opus calls prevent two debugging sessions or one bad client output, it's paid for itself.
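The table can be sanity-checked from output tokens alone. A sketch — this counts only output tokens, which is presumably why it lands slightly under the mixed-tier figures above (real bills include input tokens too), but the 100%-Sonnet row and the ~$5 gap both reproduce:

```python
REQUESTS = 4_000
TOKENS_PER_REQ = 500
PRICES = {"deepseek": 0.28, "sonnet": 15.0, "opus": 75.0}  # $/M output tokens

def monthly_cost(mix):
    """Blended monthly output-token cost for a tier mix (shares sum to 1)."""
    total_m_tokens = REQUESTS * TOKENS_PER_REQ / 1_000_000  # 2M tokens/month
    return sum(share * total_m_tokens * PRICES[m] for m, share in mix.items())

print(round(monthly_cost({"sonnet": 1.0}), 2))  # 30.0
low  = monthly_cost({"deepseek": 0.68, "sonnet": 0.28, "opus": 0.04})
high = monthly_cost({"deepseek": 0.68, "sonnet": 0.24, "opus": 0.08})
print(round(high - low, 2))  # 4.8 — the ~$5/month gap
```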
The failure case isn't using Opus too much. It's using it indiscriminately — sending it work that Sonnet handles fine — while also being too conservative on the tasks where it actually matters.
The OpenClaw routing config
```json
{
  "routing": {
    "tiers": [
      {
        "name": "fast",
        "model": "deepseek/deepseek-chat-v3-2",
        "conditions": ["task_type:lookup", "task_type:summarise", "task_type:classify"]
      },
      {
        "name": "main",
        "model": "anthropic/claude-sonnet-4-6",
        "conditions": ["task_type:reason", "task_type:code", "task_type:plan"]
      },
      {
        "name": "deep",
        "model": "anthropic/claude-opus-4-6",
        "conditions": [
          "flags:irreversible",
          "flags:customer_facing",
          "flags:financial",
          "complexity:high AND recoverability:low"
        ]
      }
    ]
  }
}
```
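For intuition, here is one way the `conditions` strings could be evaluated against a request's attributes. A sketch only — the attribute shape and the `AND` handling are my assumptions, not OpenClaw's actual matcher:

```python
def condition_matches(condition, attrs):
    """True if every 'key:value' clause (joined by ' AND ') is present
    in the request's attributes, e.g. {"complexity": "high"}."""
    return all(
        attrs.get(key) == value
        for clause in condition.split(" AND ")
        for key, value in [clause.split(":", 1)]
    )

def route(tiers, attrs):
    """Return the model of the last (most specific) tier whose
    conditions match, falling back to the first tier."""
    chosen = tiers[0]["model"]
    for tier in tiers:
        if any(condition_matches(c, attrs) for c in tier["conditions"]):
            chosen = tier["model"]
    return chosen
```

Listing tiers cheapest-first and letting the last match win means a task that qualifies as both `main` and `deep` routes to Opus — the high-cost-of-failure flag always dominates.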
The flags conditions are set manually or by a lightweight classifier that runs before the main call. The security checklist guide covers how to vet the classifier itself — it's a real attack surface: compromise it so everything routes to Tier A, and your high-stakes calls quietly land on the weakest model.
The actual mistake
Most setups under-use Opus on a handful of high-value tasks and over-use Sonnet on everything else. The fix isn't a blanket "use Opus more" — it's being explicit about which tasks have a high cost-of-failure, and routing those deliberately.
Treat Opus as insurance on the calls where being wrong is expensive. At those prices, it's usually cheap insurance.
The routing config above is a simplified version. The full config with classifier prompts and fallback chains is in the hybrid routing guide.