AI agent app stack: 6 pieces, 1 weekend
An AI agent app is six moving parts: a Next.js shell, a model SDK, a tool-calling layer, a durable job runner for long tasks, a vector store for memory, and a streaming UI. Wire those and you can ship something genuinely useful in a weekend. The hard part isn't the model — it's the plumbing around it that keeps a slow, flaky call from taking your whole app down. Here's the stack and the four traps that eat the most time.
Most "AI app" tutorials show you one API call and call it a day. Real agent apps fall over on everything around that call: the slow runs, the tool failures, the memory that has to persist between turns. The model is the easy part now. The plumbing is the work.
Here's the six-piece stack we'd use to ship a genuinely useful agent app over a weekend — and the traps that eat the most time.
The six pieces
| Piece | Role | Our default |
|---|---|---|
| Shell | Routing, auth, UI | Next.js (App Router) |
| Model SDK | Talk to the LLM | A first-party model SDK with streaming |
| Tool layer | Let the agent act | Typed function-calling handlers |
| Job runner | Survive long runs | Durable background jobs |
| Memory | Retrieve context | Vector search over your docs |
| Streaming UI | Feel fast | Server-streamed tokens |
Why this stack
Start with a boring shell
The shell is just a web app: routing, auth, a place to store conversations. Next.js covers it, and crucially its server-side route handlers (API routes) give you a server boundary where your model keys live. Never call a model from the browser: a key shipped to the client is a key anyone can spend, and your key is your bill.
The model SDK and the tool layer
Pick a model SDK that supports streaming and structured tool calls natively. Define your tools as typed handlers — searchDocs, createTicket, sendEmail — and let the model choose. The discipline that matters: every tool validates its own inputs and returns a clean error string the model can read, never a thrown exception that kills the run.
Stack pick
Typed tool handlers behind one dispatcher. Keep every tool in one folder with a shared signature — validate in, structured result out. The agent loop stays tiny and you can add a tool without touching the loop. This is the single highest-leverage piece of structure in an agent app.
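A minimal sketch of that structure, assuming a shared result type and a registry keyed by tool name. `ToolResult`, `ToolHandler`, `dispatchTool`, and the stubbed `searchDocs` body are illustrative names and shapes, not any particular SDK's API.

```typescript
// Every tool returns this shape, so the model reads `error` instead of the run dying.
type ToolResult = { ok: true; data: unknown } | { ok: false; error: string };

// Shared signature: unvalidated input in, structured result out.
type ToolHandler = (input: unknown) => Promise<ToolResult>;

// Hypothetical tool: a real one would query your document store.
const searchDocs: ToolHandler = async (input) => {
  const query = (input as { query?: unknown } | null)?.query;
  if (typeof query !== "string" || query.trim() === "") {
    return { ok: false, error: "searchDocs: 'query' must be a non-empty string" };
  }
  return { ok: true, data: [`stub result for "${query}"`] };
};

// Registry: adding a tool means adding an entry here, not touching the loop.
const tools = new Map<string, ToolHandler>([["searchDocs", searchDocs]]);

// Single dispatcher: unknown tools and thrown exceptions both become readable errors.
async function dispatchTool(name: string, input: unknown): Promise<ToolResult> {
  const handler = tools.get(name);
  if (!handler) return { ok: false, error: `unknown tool: ${name}` };
  try {
    return await handler(input);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: `${name} failed: ${message}` };
  }
}
```

The agent loop then only ever calls `dispatchTool(name, input)` and feeds the result back to the model, whether the tool succeeded or not.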
Durable jobs: the piece everyone skips
An agent run is a loop — think, act, observe, repeat — and it can run for a minute or more. That's too long to block an HTTP request. Push the run into a durable job, return a job id immediately, and let the client stream or poll. This one decision is the difference between an app that feels solid and one that times out under any real load.
The first version of nearly every agent app we review tries to do the whole loop inside one request handler. It demos fine and dies the moment a tool is slow. Move the loop into a job on day one.
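The shape of "enqueue, return an id, let the client poll" can be sketched as follows. This uses an in-memory `Map` purely to show the control flow; it is not durable, and a real job runner would persist the job record so it survives restarts. `enqueueAgentRun`, `getJob`, and the `Job` type are hypothetical names.

```typescript
type JobStatus = "queued" | "running" | "done" | "failed";

interface Job {
  id: string;
  status: JobStatus;
  output?: string;
  error?: string;
}

// In-memory stand-in for a job store. NOT durable: shown only for the control flow.
const jobs = new Map<string, Job>();
let nextJobNumber = 1;

// Registers the run and returns its id immediately; the run itself happens later.
function enqueueAgentRun(
  prompt: string,
  run: (prompt: string) => Promise<string>,
): string {
  const id = `job_${nextJobNumber++}`;
  const job: Job = { id, status: "queued" };
  jobs.set(id, job);

  // Deferred start: the caller gets the id back before the agent loop begins.
  void Promise.resolve().then(async () => {
    job.status = "running";
    try {
      job.output = await run(prompt);
      job.status = "done";
    } catch (err) {
      job.error = err instanceof Error ? err.message : String(err);
      job.status = "failed";
    }
  });

  return id;
}

// What a polling endpoint would read to report progress.
function getJob(id: string): Job | undefined {
  return jobs.get(id);
}
```

The request handler would call `enqueueAgentRun` and respond with the id right away; a second endpoint would call `getJob(id)` so the client can poll or attach a stream.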
Memory via vector search
Give the agent a searchDocs tool backed by vector search over your own content. For a few thousand chunks you don't need a dedicated vector database — search inside your existing data layer is fine. Add a specialized store only when you've measured a recall or latency problem.
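To illustrate why a few thousand chunks don't need special infrastructure, here is a brute-force cosine-similarity search, a minimal sketch that assumes embeddings are already computed and stored alongside each chunk. The `Chunk` shape and `topK` helper are illustrative; producing the embeddings is out of scope here.

```typescript
interface Chunk {
  id: string;
  text: string;
  embedding: number[]; // assumed precomputed by whatever embedding model you use
}

// Cosine similarity: dot product divided by the product of vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Scores every chunk against the query and returns the k most similar.
function topK(queryEmbedding: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```

This is a linear scan, so cost grows with the number of chunks; that growth is the latency you would measure before deciding a dedicated store is worth it.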
Streaming UI
Stream tokens to the client. A ten-second answer that starts appearing in one second feels fast; the same answer delivered as a single block feels broken. With the App Router, a streaming response takes only a few lines of code, and it's the cheapest win in the whole stack.
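As a rough sketch of what those few lines look like, the helper below wraps any async token source in a standard `ReadableStream` and returns it as a `Response`, which is the kind of object an App Router route handler returns. `fakeModelStream` is a hypothetical stand-in for a model SDK's streaming call, and `streamResponse` is an illustrative name.

```typescript
// Hypothetical token source standing in for a model SDK's streaming API.
async function* fakeModelStream(answer: string): AsyncGenerator<string> {
  for (const word of answer.split(" ")) {
    yield word + " ";
  }
}

// Wraps an async token source in a streamed HTTP response.
function streamResponse(tokens: AsyncIterable<string>): Response {
  const encoder = new TextEncoder();
  const iterator = tokens[Symbol.asyncIterator]();

  const body = new ReadableStream<Uint8Array>({
    // Called whenever the consumer wants more data: forward one token at a time.
    async pull(controller) {
      const { value, done } = await iterator.next();
      if (done) {
        controller.close();
      } else {
        controller.enqueue(encoder.encode(value));
      }
    },
  });

  return new Response(body, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}

// In a route handler file (for example app/api/chat/route.ts) the usage would be roughly:
//   export async function POST(req: Request) {
//     return streamResponse(fakeModelStream("..."));
//   }
```

On the client, reading `response.body` with a reader and appending each decoded chunk to the page gives the "starts appearing in one second" effect.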
Wiring it together
User prompt
     |
     v
Next.js /api/chat ──▶ enqueue durable job ──▶ return job id
                              |
                              v
                  agent loop: model ◀──▶ tools
                              |            |
                              |            +──▶ vector search (memory)
                              v
                  stream tokens back to UI
Stack risks
- Runaway cost. A looping agent can call the model dozens of times. Cap the loop iterations and log token usage per run from the first commit, or your bill will teach you the hard way.
- Tool failures crashing runs. A thrown exception inside a tool ends the whole agent run. Wrap every tool so it returns an error string the model can reason about instead of dying.
- Prompt injection through retrieved content. If the agent reads user-supplied documents, treat that text as untrusted. Never let retrieved content silently override your system instructions.
- Vector store premature optimization. Standing up a dedicated vector DB before you have data to put in it is a classic weekend-killer. Defer it.
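The runaway-cost guard from the first risk can be sketched as an agent loop with a hard iteration cap and a token budget. The `CallModel` signature, the `ModelTurn` shape, and the default limits are assumptions made for illustration; a real SDK reports usage in its own format.

```typescript
// Hypothetical shape of one model turn; real SDKs report usage differently.
interface ModelTurn {
  done: boolean;      // true when the model has produced a final answer
  text: string;
  tokensUsed: number; // tokens consumed by this single call
}

type CallModel = (history: string[]) => Promise<ModelTurn>;

interface RunResult {
  answer: string | null;
  iterations: number;
  totalTokens: number;
  stoppedBy: "done" | "max_iterations" | "token_budget";
}

// Runs the think/act/observe loop, but never past the iteration cap or token budget.
async function runAgent(
  prompt: string,
  callModel: CallModel,
  limits = { maxIterations: 8, maxTokens: 20_000 }, // illustrative defaults
): Promise<RunResult> {
  const history = [prompt];
  let totalTokens = 0;
  let result: RunResult = {
    answer: null,
    iterations: limits.maxIterations,
    totalTokens: 0,
    stoppedBy: "max_iterations",
  };

  for (let i = 1; i <= limits.maxIterations; i++) {
    const turn = await callModel(history);
    totalTokens += turn.tokensUsed;
    history.push(turn.text);

    if (turn.done) {
      result = { answer: turn.text, iterations: i, totalTokens, stoppedBy: "done" };
      break;
    }
    if (totalTokens >= limits.maxTokens) {
      result = { answer: null, iterations: i, totalTokens, stoppedBy: "token_budget" };
      break;
    }
  }

  result.totalTokens = totalTokens;
  // Per-run usage log, so cost is visible from the first commit.
  console.log(`agent run: ${result.stoppedBy}, ${result.iterations} calls, ${totalTokens} tokens`);
  return result;
}
```

Returning `stoppedBy` alongside the answer lets the UI distinguish "the agent finished" from "we cut it off", which matters when you tune the limits.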
Your move:
Build the loop as a durable job behind a streaming endpoint first, with one real tool, before you add a second model feature — the plumbing is what makes or breaks it.
Sources
- Anthropic tool-use & streaming documentation (2026) — function calling and streamed responses.
- Next.js App Router streaming responses, Vercel (2026).
- "Patterns for building reliable LLM agents," engineering write-ups (2025).
- ShipGarden internal teardown notes, "weekend agent app" (2026).
Written by
Aaron Brick
Frequently asked questions
Do I need a dedicated vector database to start?
Not on day one. For a few thousand documents, vector search inside your existing data layer is plenty. Reach for a dedicated vector DB only when recall quality or latency at scale actually becomes a measured problem, not before.
Why do I need a job runner if the model call is fast?
Agent runs aren't single calls — they loop: think, call a tool, observe, repeat. That can take 30 seconds or two minutes, far longer than a request should block. A durable job runner lets the run survive timeouts and lets the UI poll or stream progress.
Should I stream tokens to the UI or wait for the full answer?
Stream. Perceived latency is most of the battle in an AI app. Streaming the first tokens within a second makes a ten-second answer feel responsive, and it's a few lines with the App Router's streaming responses.