Most people default to one model for everything. Default to a frontier model and you overpay on simple tasks; default to a small one and quality collapses on hard ones. The right architecture routes each task to a model based on complexity, cost, and speed. This is the routing logic running in production at Ask Patrick: it cuts API costs 60–70% without sacrificing quality on anything that matters.
| Tier | Model | Use When |
|---|---|---|
| Tier 1 — Fast | Haiku / Flash / GPT-4o mini | Classification, routing, simple extraction, yes/no decisions |
| Tier 2 — Balanced | Sonnet / GPT-4o | Research, content drafting, code gen, analysis |
| Tier 3 — Deep | Opus / o1 / GPT-4 Turbo | Complex reasoning, high-stakes decisions, architecture planning |
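One way to pin the table down in code is a small tier config. This is a sketch: the model names are the shorthand from the table, not exact provider model IDs, and `models_for` is a hypothetical helper.

```python
# Illustrative tier config; swap in your provider's current model IDs.
TIERS = {
    1: {"models": ["haiku", "flash", "gpt-4o-mini"],
        "use": "classification, routing, simple extraction"},
    2: {"models": ["sonnet", "gpt-4o"],
        "use": "research, content drafting, code gen, analysis"},
    3: {"models": ["opus", "o1", "gpt-4-turbo"],
        "use": "complex reasoning, high-stakes decisions"},
}

def models_for(tier: int) -> list[str]:
    """Return the candidate models for a tier (first entry = preferred)."""
    return TIERS[tier]["models"]
```

Keeping the tiers in data rather than scattered through the code makes the later patterns (fallback chains, budget routing) a lookup instead of a rewrite.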
- Is the task under 100 input tokens AND a simple extraction or classification? → Tier 1 (Haiku)
- Does it require multi-step reasoning, code generation, or research synthesis? → Tier 2 (Sonnet), unless the stakes are high
- Is it an irreversible action, a major strategy call, or complex debugging? → Tier 3 (Opus)
- Is a human waiting synchronously for the response? → Tier 1 or 2 only (latency matters)
- Is it an async background job? → Any tier; optimize for quality
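The checklist above can be sketched as a rule-based pre-router. This is a minimal sketch, not the article's production code: the word-count token estimate and the high-stakes keyword list are illustrative assumptions you'd tune for your own traffic.

```python
import re

# Keywords flagging high-stakes work; purely illustrative, tune per domain.
HIGH_STAKES = re.compile(r"\b(deploy|delete|migrate|refund|strategy)\b",
                         re.IGNORECASE)

def pre_route(task: str, human_waiting: bool = True) -> int:
    """Rule-of-thumb tier picker mirroring the checklist above.

    Tokens are approximated as words * 1.3; use your tokenizer's real
    count in production.
    """
    approx_tokens = int(len(task.split()) * 1.3)
    if HIGH_STAKES.search(task):
        # High-stakes work goes deep, unless someone is blocked waiting
        # on the answer (latency rule: Tier 1 or 2 for sync requests).
        return 3 if not human_waiting else 2
    if approx_tokens < 100:
        return 1  # short extraction / classification: fast tier
    return 2      # everything else: balanced tier
```

`pre_route("Extract the invoice total from this email")` returns 1; the same keywords in an async job escalate to 3.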
Use a cheap Tier 1 model to classify the request, then route to the appropriate worker.
```python
ROUTING_PROMPT = """Classify this task into exactly one category.
Return ONLY the category name, nothing else.

Categories:
- simple_extraction (get a specific piece of data from text)
- classification (categorize into predefined options)
- content_generation (write something new)
- code_task (write, review, or debug code)
- complex_reasoning (multi-step analysis or strategy)

Task: {task}"""

def route_task(task: str) -> str:
    category = haiku(ROUTING_PROMPT.format(task=task)).strip()
    routing_map = {
        "simple_extraction": "haiku",
        "classification": "haiku",
        "content_generation": "sonnet",
        "code_task": "sonnet",
        "complex_reasoning": "opus",
    }
    return routing_map.get(category, "sonnet")  # unrecognized label: default to sonnet
```
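Once the classifier returns a model name, dispatch is a dictionary lookup. A sketch with stubbed clients — the `call_*` functions here are hypothetical stand-ins for thin wrappers around your real API client:

```python
# Stubbed workers; in production each wraps a real chat-completion call.
def call_haiku(prompt: str) -> str:  return f"[haiku] {prompt}"
def call_sonnet(prompt: str) -> str: return f"[sonnet] {prompt}"
def call_opus(prompt: str) -> str:   return f"[opus] {prompt}"

WORKERS = {"haiku": call_haiku, "sonnet": call_sonnet, "opus": call_opus}

def dispatch(task: str, model_name: str) -> str:
    # model_name comes from route_task(); unknown names fall back to sonnet,
    # mirroring the router's own default.
    worker = WORKERS.get(model_name, call_sonnet)
    return worker(f"Complete this task: {task}")
```

Falling back to the balanced tier on an unknown label keeps a misbehaving classifier from silently sending everything to the cheapest or most expensive model.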
Cost: ~$0.00025 per routing decision (Haiku). It pays for itself on the first Opus-to-Sonnet redirect.
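A back-of-envelope check on that claim. The per-token prices below are assumptions (roughly Claude 3-era list prices), not figures from this article; check your provider's current rate card:

```python
# Assumed per-token prices in dollars (illustrative, not current pricing).
HAIKU_IN = 0.25 / 1_000_000
SONNET_IN, SONNET_OUT = 3.00 / 1_000_000, 15.00 / 1_000_000
OPUS_IN, OPUS_OUT = 15.00 / 1_000_000, 75.00 / 1_000_000

route_cost = 1_000 * HAIKU_IN  # ~1k-token routing prompt -> $0.00025

# A typical 2k-in / 1k-out task, priced on Opus vs. Sonnet:
opus_call = 2_000 * OPUS_IN + 1_000 * OPUS_OUT
sonnet_call = 2_000 * SONNET_IN + 1_000 * SONNET_OUT
savings_per_redirect = opus_call - sonnet_call

print(f"routing: ${route_cost:.5f}  saved per redirect: ${savings_per_redirect:.3f}")
```

Under these assumed prices one redirect saves hundreds of times what the routing call costs, so the router only needs to be right occasionally to break even.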
Draft with the mid tier, QA-review with the cheap tier, escalate to the deep tier only on failure.
```python
def draft_with_review(task: str, quality_threshold: float = 0.8) -> str:
    # Step 1: draft with Sonnet
    draft = sonnet(f"Complete this task: {task}")

    # Step 2: self-review with Haiku (cheap QA pass)
    review_prompt = f"""Rate the quality of this response on a scale of 0.0 to 1.0.
Consider: accuracy, completeness, clarity.
Return ONLY a float. Nothing else.

Task: {task}
Response: {draft}"""
    try:
        score = float(haiku(review_prompt).strip())
    except ValueError:
        score = 0.0  # unparseable review: treat as a failed QA pass

    if score >= quality_threshold:
        return draft

    # Step 3: escalate to Opus only when Sonnet fails QA
    return opus(f"""Improve this response. Make it accurate and complete.

Task: {task}
Draft (score {score}/1.0): {draft}

Return the improved version only.""")
```
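One production hardening worth adding: `float(haiku(...).strip())` raises whenever the reviewer wraps the number in prose, despite the "Return ONLY a float" instruction. A defensive parser is a sketch like this — the regex and the clamp-to-[0, 1] policy are choices, not requirements:

```python
import re

def parse_score(raw: str, default: float = 0.0) -> float:
    """Pull the first number out of the reviewer's reply and clamp it.

    Models sometimes answer 'Score: 0.85' or add commentary; a failed
    parse returns `default` (0.0), which forces an escalation rather
    than silently accepting an unreviewed draft.
    """
    m = re.search(r"\d*\.?\d+", raw)
    if not m:
        return default
    return max(0.0, min(1.0, float(m.group())))

print(parse_score("Score: 0.85"))  # -> 0.85
print(parse_score("great!"))       # -> 0.0
```

Defaulting a failed parse to 0.0 is deliberately conservative: it costs an extra Opus call instead of shipping a draft that was never actually scored.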
The rest of this item covers Pattern 3 (context-aware model selection), Pattern 4 (token budget routing), Pattern 5 (fallback chains), and the exact cost breakdown from 30 days of production data.