Section 02: Context, Cost & Performance Optimization
Complete guide to making Claude Code faster and cheaper
What You’ll Learn:
- Reduce costs by 70-90% with simple strategies
- Get 10-50x faster responses
- Real cost examples with actual numbers
- Performance optimization patterns that work
Time to read: 35 minutes
Potential savings: $50-500+ per month
Table of Contents
Part A: Understanding Costs
Part B: Optimization Strategies
- Strategy 1: Create CLAUDE.md
- Strategy 2: Smart Model Selection
- Strategy 3: Disable Extended Thinking by Default
- Strategy 4: Minimize Context Size
- Strategy 5: Batch API for Async Work
Part C: Speed Optimization
Part D: Real-World Examples
Part E: Monitoring & Planning
Part A: Understanding Costs
Pricing & Cost Formula
Pricing (as of December 2025)
Note: Pricing may change. Always check anthropic.com/pricing for current rates.
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Cached Input (90% off) |
|---|---|---|---|
| Haiku 4.5 | $1.00 | $5.00 | $0.10 |
| Sonnet 4.5 | $3.00 | $15.00 | $0.30 |
| Opus 4.5 | $5.00 | $25.00 | $0.50 |
Source: Anthropic Pricing and API Pricing.
Additional:
- Batch API: 50% discount (async processing)
- Extended Thinking: Counted as output tokens
- Vision: Same as text tokens
Cost Formula:
Total = (Input Tokens × Input Rate)
+ (Output Tokens × Output Rate)
+ (Cached Tokens × Cache Rate)
What costs money:
- Input tokens: Your prompt + attached files + CLAUDE.md + history
- Output tokens: Claude’s response + extended thinking
- Cached tokens: Repeated context (90% cheaper)
What Gets Cached
Automatically cached (within 5 minutes):
- ✅ CLAUDE.md contents
- ✅ Large files (>1024 tokens)
- ✅ Repeated system instructions
- ✅ Conversation history
Not cached:
- ❌ Small prompts (<1024 tokens)
- ❌ Content older than 5 minutes
- ❌ Different content (even same filename)
Cache lifecycle:
Message 1: Process & cache CLAUDE.md (1000 tokens @ $3/M)
Message 2 (within 5 min): Use cache (1000 tokens @ $0.30/M)
Savings: 90% on cached content
Part B: Optimization Strategies
Strategy 1: Create CLAUDE.md (90% Cost Reduction)
Impact: 💰💰💰💰💰 Highest
Effort: ⚡ 5 minutes
Savings: 70-90% on projects with multiple queries
Example CLAUDE.md:
# Project: E-Commerce API
## Tech Stack
- Node.js 20 + Express
- TypeScript 5.3
- PostgreSQL 15 with Prisma
- Redis for caching
## Architecture
- RESTful API
- JWT authentication
- Role-based access control (admin, seller, buyer)
- Microservices: auth, products, orders, payments
## Database Schema
- users (id, email, password_hash, role, created_at)
- products (id, seller_id, name, price, inventory)
- orders (id, buyer_id, total, status, created_at)
- order_items (id, order_id, product_id, quantity, price)
## API Conventions
- Use async/await (no callbacks)
- Error handling via middleware
- Response format: `{ success: boolean, data?: any, error?: string }`
- Pagination: `?page=1&limit=20`
## File Structure
- src/routes/ - API endpoints
- src/controllers/ - Business logic
- src/models/ - Database models
- src/middleware/ - Auth, validation, errors
- src/utils/ - Helpers
- tests/ - Jest tests
## Team Conventions
- Conventional commits (feat:, fix:, refactor:)
- PRs require 1 approval + passing tests
- Code coverage >80%
- ESLint + Prettier
Cost comparison (20 queries/day):
WITHOUT CLAUDE.md:
- 20 × 1,000 tokens/day = 20K tokens
- Monthly: 440K tokens × $3/M = $1.32/month
WITH CLAUDE.md (cached):
- First: 1,000 × $3/M = $0.003
- Next 19: 1,000 × $0.30/M × 19 = $0.0057
- Monthly: $0.19/month
Savings: $1.13/month (85%) per developer
Team of 10: $135/year saved!
Strategy 2: Smart Model Selection (10x Difference)
Impact: 💰💰💰💰 Very High
Effort: ⚡ Change one flag
Speedup: 3-5x faster with Haiku
Decision Matrix:
| Task Type | Model | Why | Speed | Cost |
|---|---|---|---|---|
| Code formatting | Haiku 4.5 | Sufficient | 5x faster | 60x cheaper |
| Quick questions | Haiku 4.5 | Fast, accurate | 3x faster | 12x cheaper |
| Code review (quick) | Haiku 4.5 | Catches obvious issues | 5x faster | 60x cheaper |
| Code review (deep) | Sonnet 4.5 | Better reasoning | Balanced | Balanced |
| Bug investigation | Sonnet 4.5 | Good analysis | Balanced | Balanced |
| Feature development | Sonnet 4.5 | Best default | Balanced | Balanced |
| Architecture decisions | Opus 4.5 | Complex reasoning | Slower | Expensive |
| Security audit (deep) | Opus 4.5 | Worth the cost | Slower | Expensive |
Cost comparison (same 1000 input + 500 output):
Haiku: (1000 × $1/M) + (500 × $5/M) = $0.0035
Sonnet: (1000 × $3/M) + (500 × $15/M) = $0.0105 (3x more)
Opus: (1000 × $5/M) + (500 × $25/M) = $0.0175 (5x more)
Daily workflow example:
Developer's day:
- 15 quick questions (Haiku): $0.053
- 5 code reviews (Sonnet): $0.05
- 1 architecture decision (Opus 4.5): $0.0175
Total: $0.12/day = $2.60/month
If everything used Opus 4.5:
- 21 queries × $0.0175 = $0.37/day = $8/month
Monthly savings: $5.50 (69%)
Set default model:
claude --config-set defaultModel claude-sonnet-4-5
Strategy 3: Disable Extended Thinking by Default
Impact: 💰💰💰 High (cost) + 🚀🚀🚀 High (speed)
Effort: ⚡ Instant
Savings: 2-4x faster, 2-3x cheaper
When extended thinking helps:
- ✅ Complex architecture decisions
- ✅ Multi-step debugging
- ✅ Trade-off analysis
- ✅ Security vulnerability analysis
When it’s wasteful:
- ❌ Code formatting
- ❌ Adding comments
- ❌ Simple refactoring
- ❌ Generating tests
- ❌ Quick questions
Cost & speed comparison:
Task: "Add JSDoc comments"
WITHOUT thinking:
- Cost: $0.01
- Time: 2 seconds
- Quality: Excellent
WITH thinking (5000 tokens):
- Cost: $0.04 (4x more expensive)
- Time: 8 seconds (4x slower)
- Quality: Same
- Benefit: ZERO
Thinking added cost and latency with no benefit!
Default approach:
# Default: No thinking
claude "task"
# Explicit thinking for complex tasks only
claude "design microservices architecture" --thinking=5000
Strategy 4: Minimize Context Size
Impact: 💰💰💰 High
Effort: ⚡⚡ Ongoing practice
Savings: 30-70% + much faster
Anti-patterns:
# ❌ Attach entire directory
claude "explain auth" @src/**/*.ts
# Sends 50+ files, 100K+ tokens, 40 seconds
# ❌ Send full conversation history
[20 messages in conversation]
# Each message includes entire history
Good patterns:
# ✅ Specific files only
claude "explain auth" @src/auth/validate.ts @src/middleware/auth.ts
# Sends 2 files, 3K tokens, 3 seconds
# ✅ Search first, then read specific sections
claude "find JWT validation code"
# Claude searches, identifies location
claude "explain lines 45-67 in src/auth/validate.ts"
# Only sends relevant section
Cost comparison:
ANTI-PATTERN (attach all):
- 100 files, 100K tokens
- Cost: $0.30
- Time: 40 seconds
GOOD PATTERN (search then read):
- Search: 1K tokens → $0.003
- Read specific: 3K tokens → $0.009
- Total: $0.012, 3 seconds
Savings: 96% cheaper, 13x faster
Tips:
- Use
@filefor specific files, not glob patterns - Search first with codebase_search
- Use line ranges:
@file.ts:100-200 - Detach files after use
- Put project context in CLAUDE.md
Strategy 5: Batch API for Async Work
Impact: 💰💰💰💰 Very High
Effort: ⚡⚡ Moderate
Savings: 50% for async workloads
When to use Batch API:
- ✅ Processing 100+ files
- ✅ Generating documentation
- ✅ Code analysis (non-urgent)
- ✅ Test generation for entire codebase
When NOT to use:
- ❌ Interactive development
- ❌ Urgent debugging
- ❌ Need immediate results
Cost comparison:
Process 100 files:
STANDARD API:
- 100 × 2,000 tokens = 200K tokens
- Cost: 200K × $3/M = $0.60
BATCH API:
- Same 200K tokens × 50% discount
- Cost: $0.30
- Savings: $0.30 (50%)
Monthly (weekly batch jobs):
- Standard: $20/month
- Batch: $10/month
- Annual savings: $120
Usage:
# Create batch job
claude --batch input.jsonl --output=output.jsonl
# Check status
claude --batch-status job-id
# Get results
claude --batch-results job-id > results.jsonl
Part C: Speed Optimization
Parallel Tool Calls
The Problem: Sequential operations waste time
❌ Slow: Sequential (10x latency)
# Claude reads files one at a time
claude "analyze authentication"
# Internally:
# - Read login.ts (300ms)
# - Read jwt.ts (300ms)
# - Read session.ts (300ms)
# Total: 900ms
✅ Fast: Parallel (1x latency)
# Provide all files upfront
claude "analyze authentication" \
@src/auth/login.ts \
@src/auth/jwt.ts \
@src/auth/session.ts
# Claude reads all in parallel: 300ms
# Speedup: 3x faster
How to encourage parallelism:
# Good: All context upfront
claude "compare implementations" @v1.ts @v2.ts @v3.ts
# Less optimal: Sequential questions
claude "show v1"
claude "show v2"
claude "compare them"
Progressive Disclosure
The Problem: Loading too much context upfront
❌ Slow: Everything at once
claude "analyze project" @src/**/*.ts
# 100+ files, 200K tokens, 20+ seconds
# Claude overwhelmed, you wait
✅ Fast: Start narrow, expand as needed
# Step 1: High-level (2s)
claude "what does this do?" @README.md @package.json
# Step 2: Specific area (3s)
claude "explain auth flow" @src/auth/
# Step 3: Deep dive if needed (3s)
claude "how does OAuth refresh work?" @src/auth/oauth.ts
# Total: 8s with flexibility
# vs 20s upfront
When to use:
- ✅ Exploring unfamiliar codebases
- ✅ Debugging (start with error)
- ✅ Learning flows
- ❌ One-shot operations
- ❌ Batch processing
Streaming for Perceived Speed
Streaming = See tokens as generated
Non-streaming = Wait for complete response
# Streaming (default, feels 50-80% faster)
claude "explain codebase"
# ✅ See response immediately
# ✅ Can interrupt if wrong direction
# ✅ Better UX
# Non-streaming
claude "explain codebase" --no-stream
# ❌ Wait for entire response
# ❌ No feedback until complete
When to disable streaming:
- Piping to other commands
- Saving to files
- Parsing structured output (JSON)
- Automated scripts
Example:
# Interactive: Use streaming
claude "review code"
# Automation: Disable streaming
claude "generate JSON" --no-stream > output.json
Part D: Real-World Examples
Cost Examples by Developer Type
Example 1: Light User ($13/month)
Profile:
- Uses Claude 2-3 hours/day
- Code review and quick questions
- Occasional debugging
Daily usage:
- 10 quick questions (Haiku): 20K tokens
- 5 code reviews (Sonnet): 50K tokens
- 2 debugging sessions (Sonnet): 40K tokens
- Total: ~100K tokens/day
Monthly cost (optimized):
Haiku: 220K tokens × $0.375/M = $0.08
Sonnet (70% cached): 1.98M tokens × $4.5/M = $8.91
Extended thinking: 440K × $15/M = $6.60
Total: $13/month
WITHOUT optimization:
- All on Opus: $85/month
- Savings: $72/month (85%)
Example 2: Heavy User ($45/month)
Profile:
- Uses Claude 6+ hours/day
- Pair programming
- Complex refactoring
Daily usage:
- 30 quick interactions (Haiku): 60K tokens
- 15 code reviews (Sonnet): 150K tokens
- 5 complex tasks (Opus): 100K tokens
- 5 extended thinking sessions: 50K tokens
Monthly cost (optimized):
Haiku: 1.32M × $0.375/M = $0.50
Sonnet (70% cached): 3.3M × $4.5/M = $14.85
Opus: 2.2M × $45/M = $99.00
Extended thinking: 1.1M × $15/M = $16.50
Total: $131/month
WITH optimization:
- Use Sonnet instead of Opus: -60%
- Limit extended thinking: -50%
- Aggressive caching: -30%
Optimized: $45/month
Savings: $86/month
Example 3: Team of 10 ($207/month)
Monthly breakdown:
Light users (6): 6 × $13 = $78
Medium users (3): 3 × $25 = $75
Heavy user (1): 1 × $45 = $45
CI/CD (Batch API): $9
Team total: $207/month ($2,484/year)
WITHOUT optimization:
- All on Opus: $850/month
- No caching: +30%
- No Batch API: +$9
Unoptimized: $1,114/month
Annual savings: $10,884/year
Team ROI:
Cost: $2,484/year
Time saved: 250 hours/year per dev
Total: 2,500 hours/year
Value @ $100/hour: $250,000/year
ROI: 100x return
Performance Benchmarks
Benchmark 1: Code Review
Task: Review git diff (500 lines)
Unoptimized:
git diff | claude "review all" --model="opus-4.1"
Time: 45 seconds
Cost: $0.25
Optimized:
git diff | claude "review critical issues" --model="haiku-4.5"
Time: 8 seconds
Cost: $0.004
Speedup: 5.6x faster
Savings: 98% cheaper
Benchmark 2: Debugging Session
Task: Fix TypeError in production
Unoptimized:
claude "fix error: [trace]" @src/**/*.ts
Context: 100 files, 50K lines
Time: 40 seconds
Cost: $0.45
Optimized (progressive):
# Step 1: Identify (3s)
claude "what causes: [stack trace]"
# → "Likely auth middleware line 234"
# Step 2: Examine (2s)
claude "show auth.ts:220-250"
# Step 3: Fix (3s)
claude "fix null reference in auth"
Total: 8 seconds, $0.08
Speedup: 5x faster
Savings: 82% cheaper
Benchmark 3: Batch Operations
Task: Add types to 50 JavaScript files
Unoptimized (sequential):
for file in src/**/*.js; do
claude "add types" @$file
done
Time: 250 seconds
Cost: $2.50
Optimized (parallel + model + batch):
for file in src/**/*.js; do
claude "add types" @$file --model="haiku-4.5"
done &
# Parallel with Haiku
Time: 10 seconds
Cost: $0.20
Speedup: 25x faster
Savings: 92% cheaper
Even better (Batch API):
claude-batch "add types" @src/**/*.js --model="haiku-4.5"
Time: ~15s (async, don't wait)
Cost: $0.10 (50% batch discount)
Savings: 96% cheaper
Part E: Monitoring & Planning
Budget Planning
By developer type:
| Type | Daily Use | Monthly Budget | Queries/Day |
|---|---|---|---|
| Occasional | 30 min | $5-10 | 10-20 |
| Light | 1-2 hours | $10-20 | 20-30 |
| Medium | 3-4 hours | $20-40 | 40-60 |
| Heavy | 5+ hours | $40-80 | 80-120 |
| Power User | All day | $80-150 | 150+ |
Team budget allocation (10 developers):
Junior (4): $15/month × 4 = $60
Mid-level (4): $25/month × 4 = $100
Senior (2): $50/month × 2 = $100
CI/CD automation: $20/month
Buffer (20%): $56
Total: $336/month
Per dev average: $33.60/month
Set budget alerts:
# Personal budget
claude --set-budget 50 # $50/month
claude --alert-at 80 # Warn at 80%
# Check usage
claude --usage-stats --month=current
Performance Monitoring
Daily check:
claude --usage-today
Weekly review:
claude --usage-stats --week=current --detailed
# Example output:
# Week of Dec 18-24, 2025
# Total: $12.45, 347 queries
# Avg: $0.036/query
#
# By model:
# - Haiku: 180 queries, $0.85 (7%)
# - Sonnet: 152 queries, $9.60 (77%)
# - Opus: 15 queries, $2.00 (16%)
#
# Most expensive:
# 1. Architecture design: $0.85
# 2. Large refactoring: $0.42
# 3. Codebase analysis: $0.38
Performance tracking:
# Time operations
time claude "your query"
# Track tokens
claude "query" --verbose
# Shows: input, output, cached tokens, cost
Set performance budgets:
# Simple queries: < 2s
# Code reviews: < 5s
# Complex analysis: < 10s
# Batch jobs: Async (don't wait)
Still slow or costly? Troubleshooting (performance, cost).
ROI Calculation
Time savings value:
| Task | Without Claude | With Claude | Time Saved |
|---|---|---|---|
| Code review | 30 min | 10 min | 20 min |
| Write tests | 45 min | 15 min | 30 min |
| Debug issue | 2 hours | 30 min | 90 min |
| Refactoring | 3 hours | 1 hour | 2 hours |
| Documentation | 1 hour | 20 min | 40 min |
Monthly ROI (medium user):
Weekly savings:
- 5 reviews × 20 min = 100 min
- 10 test suites × 30 min = 300 min
- 2 debugging × 90 min = 180 min
- 1 refactoring × 120 min = 120 min
- 3 docs × 40 min = 120 min
Total: 820 min/week = 13.7 hours/week
Monthly: 59 hours saved
Value @ $100/hour: $5,900/month
Claude cost: $25/month
ROI: 236x return
Team ROI (10 developers):
Monthly savings: 500 hours
Value @ $100/hour: $50,000/month
Cost: $250/month
Monthly ROI: 199x
Annual ROI: $600,000 / $3,000 = 200x
Payback period: < 1 week
Optimization Checklist
Setup (One-Time)
- Create CLAUDE.md in all projects
- Set default model to Sonnet
- Disable extended thinking by default
- Configure budget alerts
- Install usage monitoring
Daily Habits
- Use Haiku for simple tasks
- Attach only relevant files
- Batch queries within 5 minutes (caching)
- Search before reading files
- Check:
claude --usage-today
Weekly Review
- Check weekly spending
- Identify expensive operations
- Adjust model selection
- Update CLAUDE.md if needed
- Share tips with team
Monthly Optimization
- Review costs vs budget
- Analyze cost/query trend
- Calculate ROI
- Update budgets
- Team cost review
Summary
Key Optimizations:
- CLAUDE.md → 70-90% cost savings (automatic caching)
- Smart model selection → 10x cost difference, 3-5x speed
- Disable extended thinking → 2-4x faster, 2-3x cheaper
- Minimize context → 30-70% savings, much faster
- Batch API → 50% discount for async work
- Parallel operations → 10x faster for multi-file tasks
- Progressive disclosure → Start fast, expand as needed
- Streaming → 50-80% better perceived speed
Expected Results:
- Cost: Light user $10-20/month, Heavy user $40-80/month
- Speed: 10-50x faster with all optimizations
- ROI: 50-200x return on investment
Quick Wins (Implement Today):
- Create CLAUDE.md (5 minutes) → instant caching
- Use
--model="haiku-4.5"for simple tasks → 5x faster - Remove extended thinking from defaults → 2-4x faster
- Use file line ranges instead of full files → 90%+ savings