Agent system design edition
"Design a customer support agent. 50K users, 200 req/hr, LLM bills per token."
Engineer talks about models, tools, retrieval...
I ask: "What breaks first?"
Long pause. "Probably... latency?"
That's when I knew they hadn't carried this system in production.
"Design an agent system for [domain]. You have [scale constraint]. Your LLM provider [cost/latency constraint]. What breaks first?"
The last part is the entire interview.
"I'd use GPT-4 for quality. Tool calls for ticket lookup. RAG for knowledge base. Stream responses for lower latency."
Correct. Thorough. Missing the signal.
"First thing that breaks is cost. 200 req/hr × 2.5K tokens avg = 12M tokens/day = $270/day before retries. Need per-request and per-user token caps before I build anything else."
Production-minded engineers think in failures. Demo builders think in features.
Production-minded engineers don't try to optimize all three.
200 requests/hour × 2,000 tokens in × 500 tokens out = 500K tokens/hour = 12M tokens/day Input: 12M × 0.6 × $0.01/1K = $72/day Output: 12M × 0.4 × $0.03/1K = $144/day Total: $216/day = $6,480/month Before retries. Before spikes.
If you didn't do this math, you haven't shipped this.
Start with what can't work.
Architecture fits inside constraints.
Not the other way around.
"I'd build a multi-agent system with a planner, retrieval agent, and response generator."
"Latency budget is tight. Multi-agent orchestration adds 300ms. Using single agent with tool calls. Trade-off: lose modularity, gain latency. Acceptable because..."
Engineers who've operated production systems don't wait for the failure question.
They've debugged this at 2am.
Building the happy path, then retrofitting failure handling.
This signals: demo experience, not production experience.
Engineers with a production mindset design guardrails first, features second.
Not "the database." Not "scalability."
Agent-specific failure modes.
Production-focused interviews test: Have you operated this?
The signal:
"I've built this" ≠ "I've operated this"
More production-pattern thinking
When you're ready for structured learning