Claude 3 is my new favourite model

Anthropic released Claude 3 last week. Three models: Haiku (fast, small), Sonnet (balanced), Opus (powerful). I’ve been testing them against the work I usually give to GPT-4, and I’m doing more work in them. That’s not a complete change, but it’s worth writing down, because it changes the conversation about AI tooling in an agency.

I’ve been an OpenAI user for a year and a half. GPT-4 is genuinely good at what it does. Code generation, complex reasoning, writing. I didn’t have a compelling reason to switch. Then Claude 3 dropped and I spent a few hours on on it.

The first test was a code review task. I gave it a PR for a WordPress plugin change, asked it to look for issues, think about edge cases, consider performance. GPT-4’s output was good. Claude’s output was more careful. It flagged things GPT-4 missed and was more explicit about the trade-offs it was making. That’s the difference between “this is good” and “this is good and here’s why I’m confident about that”.

Second test was content scaffolding. I gave it a brief for a guide we’re writing, asked it to outline the structure. Both models produced outlines. Claude’s was more conservative. It didn’t pad with filler sections. It pushed back gently on one aspect of the brief where it thought the structure didn’t quite work. That’s useful feedback, not just execution.

Third was straightforward code generation for a utility script. Claude was faster. Haiku and Sonnet both outperformed GPT-4’s speed on something that didn’t need Opus-level reasoning. That matters for the cost model if you’re running this at scale.

So why does this matter beyond “new model is good, try it”?

Inside Filter, we use OpenAI because it’s the best available. That’s still true, but “best” is context-dependent. GPT-4 is better at certain kinds of reasoning. Claude is better at others. They have different cost curves, different latency profiles, different behaviour.

Twelve months ago, if you were betting on language models, OpenAI was the obvious choice. The only choice, really. If Anthropic proves itself as a reliable provider and the models keep getting better, then we’re likely to switch to that for the future.

It will be interesting to see what happens with Google in the next twelve months though – can they catchup? Claude 3 is the winner right now, but it could all change again soon.