Google's AI Metrics - How Google is Defining Them

Howdy to a fantastic Tuesday, Glarians!

I’ve come here to find an answer to a simple question:

What are some AI Metrics that make sense to track?

Here’s an interesting find from one of Google’s blogs. These are some of the key metrics they highlight (a quick scoring sketch follows the list):

  1. Coherence: Measures the model’s ability to generate a coherent response based on the prompt.
  2. Fluency: Measures the model’s language mastery based on the prompt.
  3. Safety: Measures the level of harmlessness in a response.
  4. Groundedness: Measures the ability to provide or reference information included only in the prompt.
  5. Instruction following: Assesses a model’s ability to follow instructions provided in the prompt.
  6. Verbosity: Measures a model’s conciseness and its ability to provide sufficient detail without being too wordy or brief.
  7. Text quality: Measures how well a model’s responses convey clear, accurate, and engaging information that directly addresses the prompt.
  8. Summarization quality: Measures the overall ability of a model to summarize text.
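
Most of these read like rubric scores from an LLM judge. Here’s a minimal sketch of what scoring one of them (Coherence) might look like, assuming a 1-5 rubric; the `call_judge_model` function and the rubric wording are hypothetical stand-ins, not Google’s actual templates or API.

```python
# Minimal sketch of a rubric-based "coherence" metric scored by an LLM
# judge. The rubric text, the 1-5 scale, and call_judge_model are
# illustrative assumptions, not Google's actual templates or API.
import re

COHERENCE_RUBRIC = """\
Rate the coherence of the response on a scale of 1 to 5, where
1 = incoherent and 5 = fully coherent and logically organized.

Prompt: {prompt}
Response: {response}

Reply with only the number."""


def call_judge_model(judge_prompt: str) -> str:
    """Hypothetical judge-model call; wire this to your LLM provider."""
    raise NotImplementedError


def score_coherence(prompt: str, response: str) -> int | None:
    """Ask the judge model for a 1-5 coherence score and parse it."""
    raw = call_judge_model(COHERENCE_RUBRIC.format(prompt=prompt, response=response))
    match = re.search(r"[1-5]", raw)
    return int(match.group()) if match else None
```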

What are some other ones that make sense to add to this list? Are these the most important ones to define?


I like a few of these metric ideas for measuring AI, like Coherence and Instruction Following (which could use a snappier name); others, like Groundedness or Verbosity, seem like a bit of a stretch.

Here are a few metrics for AI performance that we surfaced from a survey of 90+ product leaders, UX designers, and researchers (a rough sketch of how these might be computed from session logs follows the list):

  • Trust: the rate at which participants accept the first answer provided

  • Repetition: the number of times users have to re-ask a prompt in a session to get the desired answer

  • Frustration: the amount of negative language used by participants in their follow-up prompts to the AI

You can check out the link to the survey and the participant responses here: Helio | Design Insights
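
Purely as an illustration (my assumption about the setup, not Helio’s actual methodology), here’s how those three metrics might be computed from session logs. The log fields and the negative-term list are invented for the sketch.

```python
# Sketch of computing Trust, Repetition, and Frustration from session
# logs. The log fields and NEGATIVE_TERMS are assumptions made for
# illustration, not the survey's actual methodology.
NEGATIVE_TERMS = {"wrong", "no", "not", "bad", "useless", "again"}


def trust_rate(sessions: list[dict]) -> float:
    """Share of sessions where the first answer was accepted."""
    return sum(1 for s in sessions if s["first_answer_accepted"]) / len(sessions)


def avg_repetition(sessions: list[dict]) -> float:
    """Average number of prompts per session before the desired answer."""
    return sum(s["prompts_until_accepted"] for s in sessions) / len(sessions)


def avg_frustration(sessions: list[dict]) -> float:
    """Average count of negative terms in follow-up prompts per session."""
    total = 0
    for s in sessions:
        words = " ".join(s["followup_prompts"]).lower().split()
        total += sum(1 for w in words if w in NEGATIVE_TERMS)
    return total / len(sessions)
```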


Pairwise metrics involve comparing the responses of two models and picking the better one to create a win rate. This is often used when comparing a candidate model with the baseline model. These metrics work well in cases where it’s difficult to define a scoring rubric and preference is sufficient for evaluation.
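
For concreteness, the arithmetic behind a win rate is simple. The "candidate"/"baseline"/"tie" label format below is just an assumed shape for the judgments, and counting ties as half a win is one common convention.

```python
# Win rate for a candidate model vs. a baseline, from pairwise
# judgments. Each judgment is "candidate", "baseline", or "tie";
# this label format is an assumption for the example.
def win_rate(judgments: list[str]) -> float:
    """Fraction of comparisons the candidate wins, counting ties as half."""
    wins = sum(1 for j in judgments if j == "candidate")
    ties = sum(1 for j in judgments if j == "tie")
    return (wins + 0.5 * ties) / len(judgments)


# Example: 6 wins, 2 ties, 2 losses -> 0.70
print(win_rate(["candidate"] * 6 + ["tie"] * 2 + ["baseline"] * 2))
```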

This seems to align with our conversation the other day: identifying an example of an ideal response to compare against the AI’s first pass allows for a round of refinement.
