Every frontier model you have heard of is, under the hood, doing the same thing: predicting the next token (word fragment) given everything that came before it. A "reasoning" model isn't doing something fundamentally different. It's doing the same thing, but it's been trained — and is allowed at inference time — to generate a long internal scratchpad of intermediate steps before producing the answer the user sees.

That's the whole trick. The model talks to itself first.

When you toggle "deep think" or "reasoning effort: high" on a provider's API, you're telling the model: spend more tokens thinking before responding. The intermediate tokens are sometimes hidden from the user, sometimes shown. The output is the same kind of generated text, but produced after the model has rehearsed multiple drafts of its own internal monologue.

Why this matters for legal work

The difference shows up on tasks where the answer requires combining several constraints.

The pattern: routine paraphrase and extraction tasks don't benefit from reasoning. Multi-constraint analysis, ambiguous classification, and tasks where the model needs to argue against its own first answer — those benefit.

What I use reasoning for on this site

This site is a real product I run, not a thought experiment. Here's where reasoning is actually wired in:

Where reasoning isn't worth it

It's tempting to turn the reasoning toggle on for everything because the outputs feel more thorough. Don't.

The mental model I keep in my head

Reasoning is the model thinking out loud before it speaks. For routine tasks, it's the verbal equivalent of someone clearing their throat for a long time before saying "okay." For multi-constraint tasks, it's the model arguing with itself before it commits, which is exactly what you want when the question is hard.

The provider docs make this sound more exotic than it is. Anthropic's "extended thinking", OpenAI's "reasoning effort", Google's "thinking budget", xAI's "reasoning effort" — same idea, different knob names. The right question for any task isn't "should I use a reasoning model?" but "is this task one where the model needs to talk to itself first?"

A practical close

If you're picking models for a workflow, start without reasoning. Watch where the output is weakest — typically multi-constraint legal questions, ambiguous classification, anything where the model should be arguing with its first answer. Toggle reasoning on for those. Leave it off for everything else.

That's the rule I apply when I'm wiring a new agent into this site. Most things stay on the fast lane. The hard problems get the slow lane. The user toggle exists because some users know which is which, and the system shouldn't second-guess them.