Anthropic's Latest: The Fight Toward AGI and How They're Measuring It

Anthropic is leading the way when it comes to programming. As a developer myself, I keep a close eye on these models.

I recommend reading their official post:

One thing that’s super interesting to me is how close all of the other models are in comparison. Wouldn’t that be extremely stressful? Knowing that competitors are only a few percentage points away from overtaking not just your area of expertise, but your whole ecosystem?

I’m curious about when these incremental changes turn into an innovative, large leap forward, and what work that might consist of.

The other interesting thing is how they’re measuring.

They reference this in their benchmarks: sierra-research/tau2-bench (τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment).

Are any of these measurement tools something other orgs should be using to identify gaps in their own tooling? Or are they only relevant for the core models?

That’s what competition is for, @ben - it keeps you engaged and pedal to the metal. Where’s the fun without that? :winking_face_with_tongue:

It’s an interesting time in AI… an all-out arms race, and one that will be won by the company that best addresses user needs… how do you determine that? It always comes down to context and the problem at hand.

In the example shown, the metrics could be:

  • Time to resolution
  • Number of steps the user must take
  • How many clarifying questions are needed
  • Whether the user understands the next action
  • User frustration signals (backtracking, repeated complaints)
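The metrics above could be sketched as a simple scoring pass over a conversation transcript. This is just an illustration of the idea, not how any benchmark actually works; all the type and field names here are hypothetical:

```python
# Minimal sketch: scoring a transcript against user-experience metrics
# like those listed above. All names and heuristics are hypothetical.
from dataclasses import dataclass

@dataclass
class Turn:
    role: str         # "user" or "agent"
    text: str
    timestamp: float  # seconds since conversation start

def score_conversation(turns: list[Turn]) -> dict:
    """Compute rough UX metrics from a transcript."""
    user_turns = [t for t in turns if t.role == "user"]
    agent_turns = [t for t in turns if t.role == "agent"]
    # Crude proxy: agent turns ending in "?" count as clarifying questions.
    clarifying = sum(1 for t in agent_turns if t.text.rstrip().endswith("?"))
    # Crude frustration signal: the user repeating the same message.
    repeats = sum(1 for a, b in zip(user_turns, user_turns[1:])
                  if a.text.strip().lower() == b.text.strip().lower())
    return {
        "time_to_resolution_s": turns[-1].timestamp - turns[0].timestamp,
        "user_steps": len(user_turns),
        "clarifying_questions": clarifying,
        "repeated_user_messages": repeats,
    }
```

Signals like "whether the user understands the next action" are harder; in practice those probably need a human rater or a judge model rather than surface heuristics like these.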

But those don’t necessarily show up in the model benchmarks. Each model has its own strengths.


As time goes on, I’m seeing a shift toward smaller OSS models starting to arise.

Cheaper, faster, open source.

I think it’s gonna take a lot longer before AGI is achieved, so more than likely we’re going to have a world of specialized models that tackle different verticals across the economy.

No single winner…