What does human-in-the-loop really mean in practice? (Q&A)

@patrizia_bertini you’re dropping some gold in here :joy:

“I personally think of LLMs as less than a junior. I see them as hyper educated interns from Mars. They are eager to please, never contradict you, and incredibly good at producing convincing outputs. But those outputs can look great until you actually engage with them and test them.”

Very accurate (and funny) analogy. I have to bring up my (now 2nd) favorite description of AI provided by @ben :

“AI is a highly efficient gap filler. The bigger the gap, the worse the result is.”

1 Like

Some name ideas for the thrill of receiving a deliverable in minutes that will then require hours of QC and assessment:

  1. Rapid Assembly Malaise
  2. Rapid Euphoric Dissonance
  3. Euphoric Assessment Delivery

@ben you got more?

1 Like

YES. Somebody has to. What’s interesting is that companies and large governmental organizations run into this problem as well. Who is responsible when large amounts of toxins are found in paper receipts?

Awesome article and great thread.

A while ago he had a thread on HITL, HATL, HOTL, and HBTL (human in/at/on/behind the loop).

Has anyone found any good operating models/decision models for when and where to put the human?

1 Like

Seems like a fertile area to explore, especially when bringing in metrics to support judgment.

Most framework documents are either simple like this, or going more into the technical flow charting. @menno, are you thinking about the actual human reasoning at each step?

2 Likes

No, I was thinking more about the “in the loop”, “on the loop”, “above the loop” etc. distinctions…

Like if trust is high, focus on “on or above” or even “behind” the loop.

Or the larger an individual task, the more likely the human should be “in” the loop.

So a decision model for “where to put the human”. This may well vary per line of business and/or use case, but there must be generic principles: when the tasks go into the tens of thousands, you want to see a report; when there are few but “expensive” tasks, you want to be close, etc…

So, if volume goes up, human necessity goes down.

If risk goes up, human necessity goes up.
If trust is high, human need goes down, at least until a certain threshold?
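The heuristics above could be sketched as a tiny routing function. This is only an illustration of the idea; the function name, oversight labels, and all thresholds (0.7 risk, 10,000 tasks, 0.8 trust) are hypothetical, not an established model:

```python
# Minimal sketch of the "where to put the human" heuristics described above.
# All names and threshold values here are illustrative assumptions.

def place_human(volume: int, risk: float, trust: float) -> str:
    """Pick an oversight mode from task volume, risk (0-1), and trust (0-1)."""
    # High risk always pulls the human closer to each individual task.
    if risk >= 0.7:
        return "in the loop"      # review each item before it ships
    # At very high volume, per-item review is unrealistic: report instead.
    if volume >= 10_000:
        return "behind the loop"  # periodic summary reports and audits
    # High trust lets the human step back to monitoring by exception.
    if trust >= 0.8:
        return "on the loop"      # monitor, intervene only on exceptions
    return "in the loop"

print(place_human(volume=50, risk=0.9, trust=0.9))      # high risk wins
print(place_human(volume=50_000, risk=0.2, trust=0.6))  # volume pushes human back
```

Even a toy version like this makes the trade-offs explicit and debatable per line of business, which is arguably the point of having a decision model at all.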

1 Like

Here are a couple of research-oriented approaches, though I’m not seeing practical examples:

https://repository.lsu.edu/cgi/viewcontent.cgi?article=1464&context=honors_etd

2 Likes

Cheers and thanks for this topic!
The distinction between outputs and outcomes feels critical here. AI can flood teams with plausible outputs, but outcomes only emerge once someone applies judgment, context, and consequence thinking. When that layer is underdefined, responsibility is likely to quietly evaporate.

What’s tricky is scale. As volume increases, it becomes unrealistic to put humans “in the loop” for everything. That suggests the real design challenge isn’t whether humans are involved, but where judgment is most valuable and where it meaningfully changes risk.

I’m increasingly convinced we need clearer models for:

  • Which decisions are reversible vs irreversible

  • Where bias or harm would be most costly

  • What signals justify slowing the system down

Without that, HITL risks becoming theater — comforting, but ineffective.
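The three checks above could be expressed as a gate in front of an automated action. A minimal sketch, assuming hypothetical names and thresholds throughout (`Decision`, `harm_cost`, the 0.7 and 0.5 cut-offs are all invented for illustration):

```python
# Hypothetical gate implementing the three checks above: reversibility,
# cost of harm, and signals that justify slowing the system down.
from dataclasses import dataclass

@dataclass
class Decision:
    reversible: bool      # can this action be undone if it turns out wrong?
    harm_cost: float      # estimated cost of bias/harm if wrong, 0-1
    anomaly_score: float  # signal strength that something is off, 0-1

def requires_human_review(d: Decision, slow_down_threshold: float = 0.5) -> bool:
    """Escalate to a human when a decision is irreversible, costly, or anomalous."""
    if not d.reversible:
        return True  # irreversible decisions always get a human
    if d.harm_cost >= 0.7:
        return True  # costly harm gets a human even if reversible
    # Otherwise, only slow down when an anomaly signal justifies it.
    return d.anomaly_score >= slow_down_threshold
```

The design choice here is that reversibility acts as a hard override while anomaly signals are tunable, which keeps the human out of the high-volume reversible path without turning HITL into theater.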

5 Likes

Good stuff @jonathon.thomason_m2! Went digging a bit more to see which frameworks align with judgment and trade-offs. These are not specific to AI, but overlap from other disciplines.

Leader Character Framework
Defines effective leadership as rooted in a set of core character dimensions that shape how leaders think, decide and act to achieve sustained excellence. Not specifically AI, but a core group of ideas to support leadership.

Human-Machine Teaming (HMT)
Refers to the formation of collaborative partnerships between humans and artificially intelligent systems, characterized by shared goals, coactive problem-solving, mutual awareness, and synergistic adaptation.

The 30% Rule of AI: Automate a Third, Amplify the Rest
Guideline in AI integration where 30% of routine work is automated, and 70% is left for human judgment to handle tasks requiring creativity, context, and empathy. Not sure on this one… is this a balance that makes sense now?

Comparative Judgement (CJ)
Used primarily in education; combines the strengths of human judgment with the speed of Artificial Intelligence.

AI Debate (Adversarial) Framework
A method where AI systems argue opposing sides of a topic to surface mistakes, bias, and weak reasoning.

The Seven-Eyed Model
A supervision model for counsellors and psychotherapists with seven areas of focus.

Ultimately, it seems we will need to create new patterns @menno, but there are a lot of great areas to pull from.

3 Likes