Message Arrives
Store in Redis (1-2 ms, 1-hour TTL)
↓
Retrieve Short-term Context
Exact lookup from Redis
↓
Agent Processes
With short-term + long-term context
↓
Conversation Ends
Generate summary (200 tokens)
↓
Relevance Check
Should we keep this?
↓
Embed Summary
Store in vector DB (summary only, not the full transcript)
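
The flow above can be sketched end to end in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual implementation: an in-memory dictionary with expiry stands in for Redis, a plain list stands in for the vector DB, and the names ShortTermStore, LongTermStore, summarize, is_relevant, embed, and end_conversation are all hypothetical. The summarizer, the relevance rule, and the embedding are toy placeholders for whatever model or service a real system would call.

```python
import hashlib
import time
from dataclasses import dataclass, field

TTL_SECONDS = 3600        # 1-hour TTL from the diagram
SUMMARY_MAX_TOKENS = 200  # summary budget from the diagram


class ShortTermStore:
    """In-memory stand-in for Redis: exact key lookup with per-key expiry."""

    def __init__(self, ttl=TTL_SECONDS, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._data = {}  # conversation id -> (expires_at, messages)

    def append(self, conv_id, message):
        messages = self.get(conv_id)
        messages.append(message)
        # Every write refreshes the expiry for the whole conversation.
        self._data[conv_id] = (self.clock() + self.ttl, messages)

    def get(self, conv_id):
        entry = self._data.get(conv_id)
        if entry is None or entry[0] <= self.clock():
            self._data.pop(conv_id, None)  # drop expired entries lazily
            return []
        return list(entry[1])


def summarize(messages, max_tokens=SUMMARY_MAX_TOKENS):
    """Placeholder summarizer: keeps the first max_tokens whitespace tokens."""
    tokens = " ".join(messages).split()
    return " ".join(tokens[:max_tokens])


def is_relevant(summary, min_tokens=5):
    """Hypothetical relevance rule: skip near-empty conversations."""
    return len(summary.split()) >= min_tokens


def embed(text, dim=16):
    """Toy hashed bag-of-words vector; not a real embedding model."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    return vec


@dataclass
class LongTermStore:
    """List-backed stand-in for a vector DB."""
    records: list = field(default_factory=list)

    def add(self, conv_id, summary):
        # Only the summary and its vector are stored, never the transcript.
        self.records.append((conv_id, summary, embed(summary)))


def end_conversation(conv_id, short_term, long_term):
    """Summarize, run the relevance check, then embed and store the summary."""
    summary = summarize(short_term.get(conv_id))
    if not is_relevant(summary):
        return None
    long_term.add(conv_id, summary)
    return summary


short_term, long_term = ShortTermStore(), LongTermStore()
short_term.append("c1", "My order 1234 arrived damaged and I want a refund")
short_term.append("c1", "Refund issued to the original payment method")
end_conversation("c1", short_term, long_term)
```

Keeping the relevance check ahead of the embedding step means conversations that fail the check never incur an embedding call or a vector DB write, which matches the order shown in the diagram.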