Humans in the loop

You can describe the four-layer model Anthropic uses to reason about trustworthy agents, and the practical question that follows it: where do humans actually sit in a long-horizon agent workflow?

Long-horizon autonomy does not mean "remove the human." It means "decide which humans, at which moments, looking at what." The interesting design work in 2026-2027 is the checkpoint, not the absence of one.

Anthropic's Trustworthy Agents in practice (2026) gives the field a shared vocabulary: a four-layer model — intentions, knowledge, recovery, and accountability. Intentions cover whether the agent is trying to do what you asked. Knowledge covers whether it has accurate context. Recovery covers whether it can detect and back out of mistakes. Accountability covers what happens when it cannot. The framing matters because in a 14-hour run, "do you trust this agent?" is the wrong question. The right one is "which of the four layers are you trusting right now, and how do you know?"
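One way to make that question concrete is to record, at each checkpoint, what evidence the reviewer actually has for each layer. The sketch below is illustrative only: the `Layer` enum mirrors the four layers named above, but the `Checkpoint` class, its fields, and the sample evidence strings are hypothetical and are not taken from the Anthropic work.

```python
from dataclasses import dataclass, field
from enum import Enum


class Layer(Enum):
    INTENTIONS = "intentions"
    KNOWLEDGE = "knowledge"
    RECOVERY = "recovery"
    ACCOUNTABILITY = "accountability"


@dataclass
class Checkpoint:
    """Hypothetical record of what a reviewer verified at one moment in a run."""
    label: str
    evidence: dict[Layer, str] = field(default_factory=dict)

    def unverified(self) -> list[Layer]:
        # Layers with no recorded evidence are being trusted without a stated reason.
        return [layer for layer in Layer if not self.evidence.get(layer)]


# Illustrative checkpoint partway through a long run (all values are made up).
checkpoint = Checkpoint(
    label="hour 6: migration plan drafted",
    evidence={
        Layer.INTENTIONS: "plan matches the ticket's acceptance criteria",
        Layer.RECOVERY: "every step has a tested rollback",
    },
)

print([layer.value for layer in checkpoint.unverified()])
# prints ['knowledge', 'accountability']
```

Here the reviewer has evidence for intentions and recovery, so the answer to "which layers are you trusting right now" is knowledge and accountability, and the answer to "how do you know" is, for the moment, "you do not."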

A related, smaller datum from the same Anthropic work has become field shorthand: in Claude Code's auto mode, developers approve 93% of permission prompts. When nearly every prompt is approved, the prompt has stopped functioning as a review. Per-action approval is not control. It is decision fatigue. The replacement now ubiquitous in orchestration products is Plan Mode, in which the human reviews the entire plan upfront and lets the agent execute against it, paired with hard stops on a small set of high-stakes actions.
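The plan-plus-hard-stops pattern can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not how any particular product implements Plan Mode: the `Action` type, the `HARD_STOPS` set, the action names, and the choice to block off-plan actions are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical action kinds that always require a human, even inside an approved plan.
HARD_STOPS = {"delete_data", "deploy_production", "send_payment"}


@dataclass(frozen=True)
class Action:
    kind: str
    target: str


def decide(action: Action, approved_plan: set[Action]) -> str:
    """Return 'ask', 'run', or 'block' for a single proposed action."""
    if action.kind in HARD_STOPS:
        return "ask"    # hard stop: prompt a human even if the plan listed it
    if action in approved_plan:
        return "run"    # covered by the upfront plan review, so no prompt
    return "block"      # outside the plan: assumed here to require re-planning


# The plan a human reviewed once, before execution started (made-up example).
plan = {
    Action("edit_file", "billing/invoice.py"),
    Action("run_tests", "billing/"),
    Action("deploy_production", "billing"),
}

print(decide(Action("edit_file", "billing/invoice.py"), plan))  # prints run
print(decide(Action("deploy_production", "billing"), plan))     # prints ask
print(decide(Action("edit_file", "auth/login.py"), plan))       # prints block
```

The human is consulted once for the plan and once for the deploy, rather than once per action. Blocking off-plan actions is one possible design choice; a system could instead prompt for them.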

This chapter walks through three lessons.