Aaron Brick · 9 min read

AI agent app stack: 6 pieces, 1 weekend

An AI agent app is six moving parts: a Next.js shell, a model SDK, a tool-calling layer, a durable job runner for long tasks, a vector store for memory, and a streaming UI. Wire those and you can ship something genuinely useful in a weekend. The hard part isn't the model — it's the plumbing around it that keeps a slow, flaky call from taking your whole app down. Here's the stack and the four traps that eat the most time.

Warm abstract neural-network style lights against a dark background

Most "AI app" tutorials show you one API call and call it a day. Real agent apps fall over on everything around that call: the slow runs, the tool failures, the memory that has to persist between turns. The model is the easy part now. The plumbing is the work.

Here's the six-piece stack we'd use to ship a genuinely useful agent app over a weekend — and the traps that eat the most time.

The six pieces


Piece          | Role               | Our default
Shell          | Routing, auth, UI  | Next.js (App Router)
Model SDK      | Talk to the LLM    | A first-party model SDK with streaming
Tool layer     | Let the agent act  | Typed function-calling handlers
Job runner     | Survive long runs  | Durable background jobs
Memory         | Retrieve context   | Vector search over your docs
Streaming UI   | Feel fast          | Server-streamed tokens

Why this stack

Start with a boring shell

The shell is just a web app: routing, auth, a place to store conversations. Next.js covers it. Crucially, its route handlers give you a server boundary where your model keys live. Never call a model from the browser: your key is your bill.

The model SDK and the tool layer

Pick a model SDK that supports streaming and structured tool calls natively. Define your tools as typed handlers — searchDocs, createTicket, sendEmail — and let the model choose. The discipline that matters: every tool validates its own inputs and returns a clean error string the model can read, never a thrown exception that kills the run.
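Here is a minimal sketch of that discipline for a single tool. It is in TypeScript to match the Next.js shell. The handler validates its own input and returns a readable error instead of throwing.

The following are illustrative assumptions, not any SDK's API:
- the `ToolResult` type
- the `searchDocs` input shape (`query`, `limit`)
- the two hard-coded docs

```typescript
// Sketch only: ToolResult, SearchDocsInput and DOCS are hypothetical names.

// Every tool returns this shape instead of throwing.
type ToolResult =
  | { ok: true; content: string }
  | { ok: false; error: string };

interface SearchDocsInput {
  query: string;
  limit: number;
}

// Validate untrusted, model-generated input before doing any work.
function parseSearchDocsInput(raw: unknown): SearchDocsInput | string {
  if (typeof raw !== "object" || raw === null) {
    return "input must be an object";
  }
  const { query, limit = 5 } = raw as Record<string, unknown>;
  if (typeof query !== "string" || query.trim() === "") {
    return "query must be a non-empty string";
  }
  if (typeof limit !== "number" || !Number.isInteger(limit) || limit < 1 || limit > 20) {
    return "limit must be an integer between 1 and 20";
  }
  return { query: query.trim(), limit };
}

// Hypothetical stand-in for a real search backend.
const DOCS = ["Refunds are processed within 5 days.", "Support is open 9-5 UTC."];

function searchDocs(raw: unknown): ToolResult {
  const parsed = parseSearchDocsInput(raw);
  if (typeof parsed === "string") {
    // A plain-language error the model can read and correct on its next turn.
    return { ok: false, error: `searchDocs: ${parsed}` };
  }
  const needle = parsed.query.toLowerCase();
  const hits = DOCS.filter((d) => d.toLowerCase().includes(needle)).slice(0, parsed.limit);
  return { ok: true, content: hits.length > 0 ? hits.join("\n") : "no matches" };
}
```

A bad call such as `searchDocs({ query: "" })` comes back as data: `{ ok: false, error: "searchDocs: query must be a non-empty string" }`. The run keeps going.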

Stack pick

Typed tool handlers behind one dispatcher. Keep every tool in one folder with a shared signature — validate in, structured result out. The agent loop stays tiny and you can add a tool without touching the loop. This is the single highest-leverage piece of structure in an agent app.
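A sketch of the one-registry, one-dispatcher shape follows. It assumes the same hypothetical result type. `ToolHandler`, `dispatch` and both tools are illustrative names, not a library API. The dispatcher is the only function the agent loop calls. It turns unknown tool names and thrown exceptions into error results.

```typescript
// Sketch only: tool names and bodies are hypothetical examples.

// Shared result shape: tools report failure as data, not exceptions.
type ToolResult = { ok: true; content: string } | { ok: false; error: string };

// Shared signature: untrusted input in, structured result out.
type ToolHandler = (input: unknown) => Promise<ToolResult>;

// The registry. Adding a tool means adding one entry; the loop is untouched.
const tools = new Map<string, ToolHandler>([
  [
    "createTicket",
    async (input) => {
      const title = (input as { title?: unknown } | null)?.title;
      if (typeof title !== "string" || title.trim() === "") {
        return { ok: false, error: "createTicket: title must be a non-empty string" };
      }
      return { ok: true, content: `ticket created: ${title}` };
    },
  ],
  [
    "sendEmail",
    async () => {
      // Simulates an unexpected failure deep inside a tool.
      throw new Error("SMTP timeout");
    },
  ],
]);

// Never throws: every failure path becomes an error result the model can read.
async function dispatch(name: string, input: unknown): Promise<ToolResult> {
  const handler = tools.get(name);
  if (!handler) {
    return { ok: false, error: `unknown tool: ${name}` };
  }
  try {
    return await handler(input);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: `${name} failed: ${message}` };
  }
}
```

The `Map` avoids a subtle bug that a plain object lookup would have. With a plain object, a model asking for a tool named `toString` would resolve to an inherited method.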

Durable jobs: the piece everyone skips

An agent run is a loop — think, act, observe, repeat — and it can run for a minute or more. That's too long to block an HTTP request. Push the run into a durable job, return a job id immediately, and let the client stream or poll. This one decision is the difference between an app that feels solid and one that times out under any real load.

The first version of nearly every agent app we review tries to do the whole loop inside one request handler. It demos fine and dies the moment a tool is slow. Move the loop into a job on day one.
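A minimal sketch of the enqueue-and-return pattern follows. One big assumption: an in-memory `Map` plays the durable job runner so the example runs on its own. A real runner persists jobs and survives restarts. `runAgent`, `enqueue` and the `job_N` id format are hypothetical.

```typescript
// Sketch: "enqueue the run, return a job id immediately".
// ASSUMPTION: this in-memory Map stands in for a durable job runner.
// It is NOT durable: a process restart loses every job.

type JobStatus =
  | { state: "queued" }
  | { state: "running" }
  | { state: "done"; output: string }
  | { state: "failed"; error: string };

const jobs = new Map<string, JobStatus>();
let nextJobId = 0;

// Hypothetical stand-in for the real agent loop (model + tools).
async function runAgent(prompt: string): Promise<string> {
  await new Promise((resolve) => setTimeout(resolve, 50)); // pretend this is slow
  return `answer to: ${prompt}`;
}

function enqueue(prompt: string): string {
  const id = `job_${++nextJobId}`;
  jobs.set(id, { state: "queued" });
  // Start the run without awaiting it, so the HTTP response is not blocked.
  void (async () => {
    jobs.set(id, { state: "running" });
    try {
      jobs.set(id, { state: "done", output: await runAgent(prompt) });
    } catch (err) {
      jobs.set(id, { state: "failed", error: String(err) });
    }
  })();
  return id;
}

// Shaped like an App Router route handler (e.g. app/api/chat/route.ts).
export async function POST(req: Request): Promise<Response> {
  let prompt: unknown;
  try {
    prompt = ((await req.json()) as { prompt?: unknown } | null)?.prompt;
  } catch {
    prompt = undefined; // body was not valid JSON
  }
  if (typeof prompt !== "string" || prompt === "") {
    return new Response(JSON.stringify({ error: "prompt is required" }), {
      status: 400,
      headers: { "content-type": "application/json" },
    });
  }
  // 202 Accepted: the work is queued, not finished.
  return new Response(JSON.stringify({ jobId: enqueue(prompt) }), {
    status: 202,
    headers: { "content-type": "application/json" },
  });
}
```

A status endpoint for the client to poll would read from the same job store. On serverless hosts, an un-awaited promise like the one in `enqueue` can be frozen or dropped once the response is sent. That is exactly the gap a durable runner closes.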

Memory

Give the agent a searchDocs tool backed by vector search over your own content. For a few thousand chunks you don't need a dedicated vector database; vector search inside your existing data layer is fine. Add a specialized store only when you've measured a recall or latency problem.
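To make "vector search" concrete, here is what the search computes. It ranks stored chunks by cosine similarity to the query's embedding and keeps the top k.

This brute-force version is a sketch with toy 3-dimensional vectors. Real embeddings come from an embedding model. In practice you would let your data layer run the ranking; Postgres with the pgvector extension is one example. `Chunk` and `topK` are hypothetical names.

```typescript
// Sketch: brute-force cosine-similarity search over embedded chunks.

interface Chunk {
  id: string;
  text: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query embedding and keep the k best.
function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosine(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```

This is linear in the number of chunks. That is one reason a few thousand chunks rarely need a dedicated store.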

Streaming UI

Stream tokens to the client. A ten-second answer that starts appearing after one second feels fast; the same answer delivered as a single block feels broken. The App Router makes a streaming response a few lines of code, and it's the cheapest win in the whole stack.
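A sketch of a streamed route handler follows. It uses only the web-standard `ReadableStream` and `Response`, which App Router route handlers can return. `fakeModelTokens` is a hypothetical stand-in for your model SDK's streaming iterator; real SDKs expose their own stream objects.

```typescript
// Sketch: stream tokens from a route handler as they are produced.

// Hypothetical token source standing in for a model SDK stream.
async function* fakeModelTokens(prompt: string): AsyncGenerator<string> {
  for (const word of `You asked: ${prompt}`.split(" ")) {
    await new Promise((resolve) => setTimeout(resolve, 10)); // simulated latency
    yield word + " ";
  }
}

// Shaped like an App Router route handler (e.g. app/api/chat/route.ts).
export async function POST(req: Request): Promise<Response> {
  const raw = ((await req.json()) as { prompt?: unknown } | null)?.prompt;
  const prompt = typeof raw === "string" ? raw : "";
  const tokens = fakeModelTokens(prompt);
  const encoder = new TextEncoder();

  // pull() is called only when the stream's internal queue has room, so the
  // generator is not drained faster than the response is consumed.
  const stream = new ReadableStream<Uint8Array>({
    async pull(controller) {
      const next = await tokens.next();
      if (next.done) {
        controller.close();
      } else {
        controller.enqueue(encoder.encode(next.value));
      }
    },
  });

  return new Response(stream, {
    headers: { "content-type": "text/plain; charset=utf-8" },
  });
}
```

On the client, reading `response.body` with a reader lets you render each chunk as it arrives. Awaiting `response.text()` would wait for the whole answer.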

Wiring it together

 User prompt
    |
    v
 Next.js /api/chat ──▶ enqueue durable job ──▶ return job id
                                  |
                                  v
                         agent loop: model ◀──▶ tools
                                  |          |
                                  |          +──▶ vector search (memory)
                                  v
                         stream tokens back to UI
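The "agent loop" box in that diagram can be sketched as a single function.

Assumptions:
- `ModelTurn`, `CallModel` and `Dispatch` are hypothetical shapes; real model SDKs define their own message and tool-call types.
- `maxSteps` is the iteration cap, which also bounds how many model calls one run can make.

```typescript
// Sketch of the agent loop: think, act, observe, repeat, with a hard cap.

// What the model decides on each turn (hypothetical shape).
type ModelTurn =
  | { type: "tool_call"; name: string; input: unknown }
  | { type: "final"; text: string };

type Message = { role: "user" | "assistant" | "tool"; content: string };

type CallModel = (history: Message[]) => Promise<ModelTurn>;
type Dispatch = (name: string, input: unknown) => Promise<string>;

async function runAgentLoop(
  prompt: string,
  callModel: CallModel,
  dispatch: Dispatch,
  maxSteps = 8,
): Promise<string> {
  const history: Message[] = [{ role: "user", content: prompt }];
  for (let step = 0; step < maxSteps; step++) {
    const turn = await callModel(history); // think
    if (turn.type === "final") {
      return turn.text;
    }
    history.push({
      role: "assistant",
      content: `call ${turn.name}(${JSON.stringify(turn.input)})`,
    });
    const observation = await dispatch(turn.name, turn.input); // act
    history.push({ role: "tool", content: observation }); // observe
  }
  // Hard stop: a capped loop cannot run up an unbounded number of model calls.
  return `stopped after ${maxSteps} steps without a final answer`;
}
```

Because the model and the dispatcher are passed in, you can unit-test the loop with a scripted fake model before wiring up a real SDK.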

Stack risks

  • Runaway cost. A looping agent can call the model dozens of times. Cap the loop iterations and log token usage per run from the first commit, or your bill will teach you the hard way.
  • Tool failures crashing runs. A thrown exception inside a tool ends the whole agent run. Wrap every tool so it returns an error string the model can reason about instead of dying.
  • Prompt injection through retrieved content. If the agent reads user-supplied documents, treat that text as untrusted. Never let retrieved content silently override your system instructions.
  • Premature vector-store optimization. Standing up a dedicated vector DB before you have data to put in it is a classic weekend-killer. Defer it.
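For the runaway-cost risk, a per-run ledger is enough to start. This is a sketch. It assumes your model API returns input and output token counts with each response, which model APIs generally do under SDK-specific field names. Map those counts into the `Usage` shape here. `RunBudget` and the budget number are hypothetical.

```typescript
// Sketch: per-run token accounting with a hard budget and one log line per call.

interface Usage {
  inputTokens: number;
  outputTokens: number;
}

class RunBudget {
  private inputTokens = 0;
  private outputTokens = 0;
  private calls = 0;
  private readonly runId: string;
  private readonly maxTotalTokens: number;

  constructor(runId: string, maxTotalTokens: number) {
    this.runId = runId;
    this.maxTotalTokens = maxTotalTokens;
  }

  get total(): number {
    return this.inputTokens + this.outputTokens;
  }

  // Record one model call, log it, and report whether the run may continue.
  record(usage: Usage): boolean {
    this.calls += 1;
    this.inputTokens += usage.inputTokens;
    this.outputTokens += usage.outputTokens;
    console.log(
      JSON.stringify({ runId: this.runId, call: this.calls, ...usage, total: this.total }),
    );
    return this.total <= this.maxTotalTokens;
  }
}
```

Call `record()` after every model response inside the agent loop, and end the run as soon as it returns false.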

Your move:

Build the loop as a durable job behind a streaming endpoint first, with one real tool, before you add a second model feature — the plumbing is what makes or breaks it.


Frequently asked questions

Do I need a dedicated vector database to start?

Not on day one. For a few thousand documents, vector search inside your existing data layer is plenty. Reach for a dedicated vector DB only when recall quality or latency at scale actually becomes a measured problem, not before.

Why do I need a job runner if the model call is fast?

Agent runs aren't single calls: they loop (think, call a tool, observe, repeat). A run can take anywhere from 30 seconds to two minutes, far longer than a request should block. A durable job runner lets the run survive timeouts and lets the UI poll or stream progress.

Should I stream tokens to the UI or wait for the full answer?

Stream. Perceived latency is most of the battle in an AI app. Streaming the first tokens within a second makes a ten-second answer feel responsive, and it's a few lines with the App Router's streaming responses.