Anthropic is leading the way when it comes to models for programming. As a developer myself, I follow these models closely.
I recommend reading their official post:
One thing that’s super interesting to me is how close all of the other models are in comparison. Wouldn’t that be extremely stressful, knowing that your competitors are only a few percentage points away from overtaking you, not just in your area of expertise but across the whole ecosystem?
It’s an interesting time in AI: an all-out arms race, and one that will be won by the company that best addresses user needs. How do you determine that? It always comes down to the context and the problem.
In the example shown, the measures that matter could be:
Time to resolution
Number of steps the user must take
How many clarifying questions are needed
Whether the user understands the next action
User frustration signals (backtracking, repeated complaints)
But those don’t necessarily show up in model benchmarks. Each model has its own strengths.
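To make that concrete, here is a rough sketch of how the signals listed above could be pulled out of a session log. Everything in it is an assumption for illustration: the `Event` record, its `kind` labels, and the `session_metrics` helper are hypothetical, not taken from any real product or benchmark. "Whether the user understands the next action" is left out because it would need a survey or a human label rather than a simple count.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One entry in a hypothetical session log (schema is an assumption)."""
    t: float    # seconds since the session started
    actor: str  # "user" or "assistant"
    kind: str   # "message", "clarifying_question", "backtrack", "complaint", "resolved"


def session_metrics(events):
    """Summarize one session into user-facing signals."""
    # Time of the first "resolved" event, or None if the session never resolved.
    resolved_at = next((e.t for e in events if e.kind == "resolved"), None)
    return {
        "time_to_resolution": resolved_at,
        # Every user action except the final "resolved" marker counts as a step.
        "user_steps": sum(
            1 for e in events if e.actor == "user" and e.kind != "resolved"
        ),
        "clarifying_questions": sum(
            1 for e in events
            if e.actor == "assistant" and e.kind == "clarifying_question"
        ),
        "frustration_signals": sum(
            1 for e in events if e.kind in ("backtrack", "complaint")
        ),
    }


# A made-up session, purely for illustration.
events = [
    Event(0, "user", "message"),
    Event(5, "assistant", "clarifying_question"),
    Event(20, "user", "message"),
    Event(30, "assistant", "message"),
    Event(45, "user", "backtrack"),
    Event(60, "assistant", "message"),
    Event(90, "user", "resolved"),
]

metrics = session_metrics(events)
print(metrics)
```

Running the same summary over sessions handled by two different models would give a side-by-side comparison on the things users actually feel, which a leaderboard score does not capture.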
As time goes on, I’m seeing a shift toward smaller OSS models.
Cheaper, faster, open source.
I think it’s gonna take a lot longer before AGI is achieved, so more than likely we’re going to have a world of specialized models that tackle different verticals across the economy.