Wrapper vs. Scaffolding vs. Harness: What's Actually the Difference?

Spend five minutes in any AI engineering conversation right now and you’ll hear all three.

“We built a wrapper around GPT-4.” “It’s just some scaffolding.” “The harness is what makes it real.”

They sound interchangeable. They’re not.

And the difference isn’t just semantic. It’s architectural. Confuse them and you’ll underestimate what you’re building, or worse, underdesign it.

👉 What exactly is a wrapper? 👉 What does scaffolding actually do that a wrapper doesn’t? 👉 And what is a harness, and why is it suddenly everywhere?

Let’s unpack all three.

🔌 A Wrapper Is Just Glue

A wrapper is the thinnest layer you can put around an AI model. It makes an API call easier to use. It handles authentication, formats the request, catches errors, and hands back a clean response.

That’s it.

Authentication: Manages API keys so you don’t repeat that code everywhere
Request formatting: Shapes your input into what the model expects
Response parsing: Gives you back something usable
Basic error handling: Retries a failed call, surfaces a clean error

Think of it like putting a handle on a knife. The knife doesn’t change. You’ve just made it easier to hold.

A wrapper is useful. It’s often necessary. But it’s passive. It doesn’t shape what the model does, just how you talk to it.

🏗 Scaffolding Gives the Model a Shape

Scaffolding sits a level above. It provides the structural support for a model to operate: prompt templates, conversation state, retry logic, basic tool-calling loops.

Where a wrapper handles the call, scaffolding handles the context around the call.

Prompt templates: Pre-structured inputs that guide what the model does
Conversation state: Keeping track of what’s been said across turns
Tool-calling loops: Letting the model invoke a function and continue
Retry logic: Deciding when to try again and how

Think of scaffolding like the frame of a building under construction. It holds things in place and gives the model a shape to work within. But it doesn’t check the work. It’s still relatively passive. It sets up the environment without actively managing outcomes.

“A wrapper handles the call. Scaffolding handles the context. The harness handles the consequences.”

🔧 A Harness Is the Whole System

A harness isn’t just infrastructure around a model. It’s the infrastructure that compensates for the model’s weaknesses. It catches errors the model makes. It enforces constraints the model would ignore. It creates feedback loops so the model can actually iterate and improve.

The cleanest definition I’ve come across:

If an agent = LLM + loop with tool calls, the harness is everything except the LLM itself.

A well-designed harness includes:

Tool access: File system, terminal, browser, APIs the model can call
Memory and context management: Keeping relevant state across long tasks
Feedback loops: Run the code, check if tests pass, feed the result back
Guardrails and constraints: Linters, structural rules, enforced deterministically, not by the model’s judgment
Observability: Logs, traces, and metrics the agent can see and use to self-correct

The key insight: the harness actively manages outcomes. It’s not setting up an environment and stepping back. It’s continuously shaping what the model does next.

🏋️ Everyday Analogy

The clearest way to feel the difference:

Wrapper: Giving an athlete the right shoes. Makes movement easier, doesn’t change the athlete.
Scaffolding: A training schedule and a coach’s game plan. Structure to work within.
Harness: The full training system. Feedback after every session, corrected form, tracked progress, adjusted plan.

📊 How They Relate

Term	Scope	Active or Passive	Real-World Example
Wrapper	Thin API layer	Passive	A Python class around the OpenAI API
Scaffolding	Templates + loops	Passive	A prompt chain with retry logic
Harness	Full agent runtime	Active	Cursor, Claude Code, OpenAI’s Codex system

🌐 Why “Harness” Is Having a Moment

OpenAI recently coined the term “Harness Engineering” (source): the idea that human engineers stop writing code directly and instead design the harness, specifying intent, constraints, and feedback mechanisms while agents do the implementation.

Anthropic has written about this too (source), noting that the harness often determines real-world effectiveness more than the model itself.

That line is worth sitting with.

We spend enormous energy debating which model is smarter. GPT-4 vs Claude vs Gemini, benchmarks, token costs, reasoning ability. And meanwhile, the harness, the system that actually determines whether your agent ships working code or hallucinates into the void, is treated as an afterthought.

“We keep upgrading the brain and ignoring the body. The harness is the body.”

🎯 Why the Distinction Matters

When you call something a “wrapper” that’s actually a harness, you underestimate what you’ve built and fail to invest in it properly.

When you call a harness “just scaffolding,” you miss the active ingredient: the feedback loop, the guardrails, the observability that makes an agent reliable rather than impressive in demos and broken in production.

The vocabulary matters because the vocabulary shapes the design decisions.

A wrapper engineer asks: How do I make this API easier to call? A scaffolding engineer asks: How do I give this model structure to work within? A harness engineer asks: How do I build the environment that catches failures, enforces correctness, and lets this model iterate toward the right answer?

Three different questions. Three different systems. Three very different outcomes.

“The model is the brain in the jar. The harness is everything that makes the brain useful in the real world.”

The next time someone says “we built a wrapper,” ask them: does it enforce constraints? Does it have feedback loops? Does it catch errors and feed them back to the model?

If yes, they built a harness. They just didn’t know to call it that.

👉 Which of these are you actually building — a wrapper, scaffolding, or a harness? And does naming it correctly change how you’d design it?