Why we keep AI in the loop, not on autopilot
AI ships fast. AI ships wrong. The trick is using it where it accelerates judgment instead of replacing it. Here is how we draw the line at Baboons.
Generative models are extraordinary at producing plausible work. They are also extraordinary at producing plausibly wrong work, with no internal alarm bell. The whole game, for a studio shipping real software, is learning where to trust the model and where to keep a human's hand on the wheel.
Three places we let AI move first
We treat the model as a fast intern, not a senior engineer. It moves first on tasks where the cost of being wrong is low and the cost of being slow is high:
- Drafting boilerplate. CRUD endpoints, form schemas, test scaffolds. Anything where the correct answer is a matter of shape, not insight.
- Translating intent across formats. A Figma frame into a typed component interface, a SQL schema into a Zod validator, a PRD into a checklist.
- Sketching alternatives. Three layouts, three copy variants, three architectures, in the time it used to take to produce one.
Three places we pull it back
We take the wheel back the moment a decision becomes load-bearing. The line is whether a wrong answer would compound:
If catching the mistake costs more than the work the model saved, it doesn't belong on the model's desk.
That principle keeps the model out of three categories: data-model decisions, anything touching auth or money, and copy going out under a client's name. Not because the model can't try, but because verifying every output as carefully as those decisions demand would cost more than doing the work ourselves.
How we wire it in
Every project gets a thin shim that hides the model behind a typed interface. The shim is where the policy lives: retries, allowed prompts, output validation. The application code only sees the validated result, never the raw response.
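import { z } from "zod";
import { ai } from "./ai"; // the project's shim (path is illustrative)

// The shim validates the model's output against the schema before returning it.
export async function suggestSlug(title: string): Promise<string> {
  const candidate = await ai.generate({
    // Accept only lowercase letters, digits, and hyphens.
    schema: z.object({ slug: z.string().regex(/^[a-z0-9-]+$/) }),
    prompt: `Slugify: ${title}`,
  });
  return candidate.slug;
}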
For a one-line utility, that looks like overkill, and for the utility itself, it is. But when forty of these are scattered across a codebase, having one place to swap models, tune prompts, and log failures is what turns AI from a party trick into a tool.
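To make the policy side concrete, here is a rough sketch of how retries, an allow-list of prompts, and output validation could sit in one function. It is a dependency-free illustration, not the shim from any real project: `ModelClient`, `ShimPolicy`, `generateValidated`, and `validateSlug` are hypothetical names, and the model call is stubbed as a plain function so that a real SDK call could be swapped in.

```typescript
// All names here are hypothetical and chosen for illustration only.

/** Anything that turns a prompt into raw text; a real SDK call would go here. */
type ModelClient = (prompt: string) => Promise<string>;

/** Parses and validates raw output; throws when the output is unacceptable. */
type Validator<T> = (raw: string) => T;

interface ShimPolicy {
  maxAttempts: number;                 // total tries, including the first
  allowedPrompts: readonly string[];   // allow-list of prompt names
  onFailure?: (err: unknown, attempt: number) => void; // logging hook
}

async function generateValidated<T>(
  client: ModelClient,
  policy: ShimPolicy,
  promptName: string,
  prompt: string,
  validate: Validator<T>,
): Promise<T> {
  // Refuse prompts that are not on the allow-list before calling the model.
  if (!policy.allowedPrompts.includes(promptName)) {
    throw new Error(`Prompt "${promptName}" is not on the allow-list`);
  }
  let lastError: unknown;
  for (let attempt = 1; attempt <= policy.maxAttempts; attempt++) {
    try {
      const raw = await client(prompt);
      return validate(raw); // callers only ever see validated output
    } catch (err) {
      lastError = err;
      if (policy.onFailure) policy.onFailure(err, attempt);
    }
  }
  throw new Error(
    `No valid output after ${policy.maxAttempts} attempts: ${String(lastError)}`,
  );
}

// Validator mirroring the slug pattern used in suggestSlug.
const validateSlug: Validator<string> = (raw) => {
  const slug = raw.trim();
  if (!/^[a-z0-9-]+$/.test(slug)) throw new Error(`Invalid slug: ${slug}`);
  return slug;
};
```

With something like this in place, a utility such as suggestSlug reduces to a single call with a prompt name, a prompt, and a validator, while retry counts and logging change in one spot.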
Want to see how we apply this on real client work? The easiest way is to book a call. We'll walk you through the actual shim from a project we shipped this quarter.