Section 02: Context, Cost & Performance Optimization

Complete guide to making Claude Code faster and cheaper

What You’ll Learn:

Reduce costs by 70-90% with simple strategies
Get 10-50x faster responses
Real cost examples with actual numbers
Performance optimization patterns that work

Time to read: 35 minutes
Potential savings: $50-500+ per month

Part A: Understanding Costs

Pricing & Cost Formula
What Gets Cached

Part B: Optimization Strategies

Strategy 1: Create CLAUDE.md
Strategy 2: Smart Model Selection
Strategy 3: Disable Extended Thinking by Default
Strategy 4: Minimize Context Size
Strategy 5: Batch API for Async Work

Part C: Speed Optimization

Parallel Tool Calls
Progressive Disclosure
Streaming for Perceived Speed

Part D: Real-World Examples

Cost Examples by Developer Type
Performance Benchmarks

Part E: Monitoring & Planning

Budget Planning
Performance Monitoring
ROI Calculation

Part A: Understanding Costs

Pricing & Cost Formula

Pricing (as of December 2025)

Note: Pricing may change. Always check anthropic.com/pricing for current rates.

Model	Input (per 1M tokens)	Output (per 1M tokens)	Cached Input (90% off)
Haiku 4.5	$1.00	$5.00	$0.10
Sonnet 4.5	$3.00	$15.00	$0.30
Opus 4.5	$5.00	$25.00	$0.50

Source: Anthropic Pricing and API Pricing.

Additional:

Batch API: 50% discount (async processing)
Extended Thinking: Counted as output tokens
Vision: Same as text tokens

Cost Formula:

Total = (Input Tokens × Input Rate) 
      + (Output Tokens × Output Rate)
      + (Cached Tokens × Cache Rate)

What costs money:

Input tokens: Your prompt + attached files + CLAUDE.md + history
Output tokens: Claude’s response + extended thinking
Cached tokens: Repeated context (90% cheaper)

What Gets Cached

Automatically cached (within 5 minutes):

✅ CLAUDE.md contents
✅ Large files (>1024 tokens)
✅ Repeated system instructions
✅ Conversation history

Not cached:

❌ Small prompts (<1024 tokens)
❌ Content older than 5 minutes
❌ Different content (even same filename)

Cache lifecycle:

Message 1: Process & cache CLAUDE.md (1000 tokens @ $3/M)
Message 2 (within 5 min): Use cache (1000 tokens @ $0.30/M)
Savings: 90% on cached content

Part B: Optimization Strategies

Strategy 1: Create CLAUDE.md (90% Cost Reduction)

Impact: 💰💰💰💰💰 Highest
Effort: ⚡ 5 minutes
Savings: 70-90% on projects with multiple queries

Example CLAUDE.md:

# Project: E-Commerce API

## Tech Stack
- Node.js 20 + Express
- TypeScript 5.3
- PostgreSQL 15 with Prisma
- Redis for caching

## Architecture
- RESTful API
- JWT authentication
- Role-based access control (admin, seller, buyer)
- Microservices: auth, products, orders, payments

## Database Schema
- users (id, email, password_hash, role, created_at)
- products (id, seller_id, name, price, inventory)
- orders (id, buyer_id, total, status, created_at)
- order_items (id, order_id, product_id, quantity, price)

## API Conventions
- Use async/await (no callbacks)
- Error handling via middleware
- Response format: `{ success: boolean, data?: any, error?: string }`
- Pagination: `?page=1&limit=20`

## File Structure
- src/routes/ - API endpoints
- src/controllers/ - Business logic
- src/models/ - Database models
- src/middleware/ - Auth, validation, errors
- src/utils/ - Helpers
- tests/ - Jest tests

## Team Conventions
- Conventional commits (feat:, fix:, refactor:)
- PRs require 1 approval + passing tests
- Code coverage >80%
- ESLint + Prettier

Cost comparison (20 queries/day):

WITHOUT CLAUDE.md:
- 20 × 1,000 tokens/day = 20K tokens
- Monthly: 440K tokens × $3/M = $1.32/month

WITH CLAUDE.md (cached):
- First: 1,000 × $3/M = $0.003
- Next 19: 1,000 × $0.30/M × 19 = $0.0057
- Monthly: $0.19/month

Savings: $1.13/month (85%) per developer
Team of 10: $135/year saved!

Strategy 2: Smart Model Selection (10x Difference)

Impact: 💰💰💰💰 Very High
Effort: ⚡ Change one flag
Speedup: 3-5x faster with Haiku

Decision Matrix:

Task Type	Model	Why	Speed	Cost
Code formatting	Haiku 4.5	Sufficient	5x faster	60x cheaper
Quick questions	Haiku 4.5	Fast, accurate	3x faster	12x cheaper
Code review (quick)	Haiku 4.5	Catches obvious issues	5x faster	60x cheaper
Code review (deep)	Sonnet 4.5	Better reasoning	Balanced	Balanced
Bug investigation	Sonnet 4.5	Good analysis	Balanced	Balanced
Feature development	Sonnet 4.5	Best default	Balanced	Balanced
Architecture decisions	Opus 4.5	Complex reasoning	Slower	Expensive
Security audit (deep)	Opus 4.5	Worth the cost	Slower	Expensive

Cost comparison (same 1000 input + 500 output):

Haiku:   (1000 × $1/M) + (500 × $5/M) = $0.0035
Sonnet:  (1000 × $3/M) + (500 × $15/M) = $0.0105     (3x more)
Opus:    (1000 × $5/M) + (500 × $25/M) = $0.0175     (5x more)

Daily workflow example:

Developer's day:
- 15 quick questions (Haiku): $0.053
- 5 code reviews (Sonnet): $0.05
- 1 architecture decision (Opus 4.5): $0.0175
Total: $0.12/day = $2.60/month

If everything used Opus 4.5:
- 21 queries × $0.0175 = $0.37/day = $8/month
Monthly savings: $5.50 (69%)

Set default model:

claude --config-set defaultModel claude-sonnet-4-5

Strategy 3: Disable Extended Thinking by Default

Impact: 💰💰💰 High (cost) + 🚀🚀🚀 High (speed)
Effort: ⚡ Instant
Savings: 2-4x faster, 2-3x cheaper

When extended thinking helps:

✅ Complex architecture decisions
✅ Multi-step debugging
✅ Trade-off analysis
✅ Security vulnerability analysis

When it’s wasteful:

❌ Code formatting
❌ Adding comments
❌ Simple refactoring
❌ Generating tests
❌ Quick questions

Cost & speed comparison:

Task: "Add JSDoc comments"

WITHOUT thinking:
- Cost: $0.01
- Time: 2 seconds
- Quality: Excellent

WITH thinking (5000 tokens):
- Cost: $0.04 (4x more expensive)
- Time: 8 seconds (4x slower)
- Quality: Same
- Benefit: ZERO

Thinking added cost and latency with no benefit!

Default approach:

# Default: No thinking
claude "task"

# Explicit thinking for complex tasks only
claude "design microservices architecture" --thinking=5000

Strategy 4: Minimize Context Size

Impact: 💰💰💰 High
Effort: ⚡⚡ Ongoing practice
Savings: 30-70% + much faster

Anti-patterns:

# ❌ Attach entire directory
claude "explain auth" @src/**/*.ts
# Sends 50+ files, 100K+ tokens, 40 seconds

# ❌ Send full conversation history
[20 messages in conversation]
# Each message includes entire history

Good patterns:

# ✅ Specific files only
claude "explain auth" @src/auth/validate.ts @src/middleware/auth.ts
# Sends 2 files, 3K tokens, 3 seconds

# ✅ Search first, then read specific sections
claude "find JWT validation code"
# Claude searches, identifies location
claude "explain lines 45-67 in src/auth/validate.ts"
# Only sends relevant section

Cost comparison:

ANTI-PATTERN (attach all):
- 100 files, 100K tokens
- Cost: $0.30
- Time: 40 seconds

GOOD PATTERN (search then read):
- Search: 1K tokens → $0.003
- Read specific: 3K tokens → $0.009
- Total: $0.012, 3 seconds

Savings: 96% cheaper, 13x faster

Tips:

Use @file for specific files, not glob patterns
Search first with codebase_search
Use line ranges: @file.ts:100-200
Detach files after use
Put project context in CLAUDE.md

Strategy 5: Batch API for Async Work

Impact: 💰💰💰💰 Very High
Effort: ⚡⚡ Moderate
Savings: 50% for async workloads

When to use Batch API:

✅ Processing 100+ files
✅ Generating documentation
✅ Code analysis (non-urgent)
✅ Test generation for entire codebase

When NOT to use:

❌ Interactive development
❌ Urgent debugging
❌ Need immediate results

Cost comparison:

Process 100 files:

STANDARD API:
- 100 × 2,000 tokens = 200K tokens
- Cost: 200K × $3/M = $0.60

BATCH API:
- Same 200K tokens × 50% discount
- Cost: $0.30
- Savings: $0.30 (50%)

Monthly (weekly batch jobs):
- Standard: $20/month
- Batch: $10/month
- Annual savings: $120

Usage:

# Create batch job
claude --batch input.jsonl --output=output.jsonl

# Check status
claude --batch-status job-id

# Get results
claude --batch-results job-id > results.jsonl

Part C: Speed Optimization

Parallel Tool Calls

The Problem: Sequential operations waste time

❌ Slow: Sequential (10x latency)

# Claude reads files one at a time
claude "analyze authentication"
# Internally:
# - Read login.ts (300ms)
# - Read jwt.ts (300ms)
# - Read session.ts (300ms)
# Total: 900ms

✅ Fast: Parallel (1x latency)

# Provide all files upfront
claude "analyze authentication" \
  @src/auth/login.ts \
  @src/auth/jwt.ts \
  @src/auth/session.ts
# Claude reads all in parallel: 300ms
# Speedup: 3x faster

How to encourage parallelism:

# Good: All context upfront
claude "compare implementations" @v1.ts @v2.ts @v3.ts

# Less optimal: Sequential questions
claude "show v1"
claude "show v2"
claude "compare them"

Progressive Disclosure

The Problem: Loading too much context upfront

❌ Slow: Everything at once

claude "analyze project" @src/**/*.ts
# 100+ files, 200K tokens, 20+ seconds
# Claude overwhelmed, you wait

✅ Fast: Start narrow, expand as needed

# Step 1: High-level (2s)
claude "what does this do?" @README.md @package.json

# Step 2: Specific area (3s)
claude "explain auth flow" @src/auth/

# Step 3: Deep dive if needed (3s)
claude "how does OAuth refresh work?" @src/auth/oauth.ts

# Total: 8s with flexibility
# vs 20s upfront

When to use:

✅ Exploring unfamiliar codebases
✅ Debugging (start with error)
✅ Learning flows
❌ One-shot operations
❌ Batch processing

Streaming for Perceived Speed

Streaming = See tokens as generated
Non-streaming = Wait for complete response

# Streaming (default, feels 50-80% faster)
claude "explain codebase"
# ✅ See response immediately
# ✅ Can interrupt if wrong direction
# ✅ Better UX

# Non-streaming
claude "explain codebase" --no-stream
# ❌ Wait for entire response
# ❌ No feedback until complete

When to disable streaming:

Piping to other commands
Saving to files
Parsing structured output (JSON)
Automated scripts

Example:

# Interactive: Use streaming
claude "review code"

# Automation: Disable streaming
claude "generate JSON" --no-stream > output.json

Part D: Real-World Examples

Cost Examples by Developer Type

Example 1: Light User ($13/month)

Profile:

Uses Claude 2-3 hours/day
Code review and quick questions
Occasional debugging

Daily usage:

10 quick questions (Haiku): 20K tokens
5 code reviews (Sonnet): 50K tokens
2 debugging sessions (Sonnet): 40K tokens
Total: ~100K tokens/day

Monthly cost (optimized):

Haiku: 220K tokens × $0.375/M = $0.08
Sonnet (70% cached): 1.98M tokens × $4.5/M = $8.91
Extended thinking: 440K × $15/M = $6.60

Total: $13/month

WITHOUT optimization:
- All on Opus: $85/month
- Savings: $72/month (85%)

Example 2: Heavy User ($45/month)

Profile:

Uses Claude 6+ hours/day
Pair programming
Complex refactoring

Daily usage:

30 quick interactions (Haiku): 60K tokens
15 code reviews (Sonnet): 150K tokens
5 complex tasks (Opus): 100K tokens
5 extended thinking sessions: 50K tokens

Monthly cost (optimized):

Haiku: 1.32M × $0.375/M = $0.50
Sonnet (70% cached): 3.3M × $4.5/M = $14.85
Opus: 2.2M × $45/M = $99.00
Extended thinking: 1.1M × $15/M = $16.50

Total: $131/month

WITH optimization:
- Use Sonnet instead of Opus: -60%
- Limit extended thinking: -50%
- Aggressive caching: -30%

Optimized: $45/month
Savings: $86/month

Example 3: Team of 10 ($207/month)

Monthly breakdown:

Light users (6): 6 × $13 = $78
Medium users (3): 3 × $25 = $75
Heavy user (1): 1 × $45 = $45
CI/CD (Batch API): $9

Team total: $207/month ($2,484/year)

WITHOUT optimization:
- All on Opus: $850/month
- No caching: +30%
- No Batch API: +$9
Unoptimized: $1,114/month

Annual savings: $10,884/year

Team ROI:

Cost: $2,484/year
Time saved: 250 hours/year per dev
Total: 2,500 hours/year
Value @ $100/hour: $250,000/year
ROI: 100x return

Performance Benchmarks

Benchmark 1: Code Review

Task: Review git diff (500 lines)

Unoptimized:

git diff | claude "review all" --model="opus-4.1"
Time: 45 seconds
Cost: $0.25

Optimized:

git diff | claude "review critical issues" --model="haiku-4.5"
Time: 8 seconds
Cost: $0.004
Speedup: 5.6x faster
Savings: 98% cheaper

Benchmark 2: Debugging Session

Task: Fix TypeError in production

Unoptimized:

claude "fix error: [trace]" @src/**/*.ts
Context: 100 files, 50K lines
Time: 40 seconds
Cost: $0.45

Optimized (progressive):

# Step 1: Identify (3s)
claude "what causes: [stack trace]"
# → "Likely auth middleware line 234"

# Step 2: Examine (2s)
claude "show auth.ts:220-250"

# Step 3: Fix (3s)
claude "fix null reference in auth"

Total: 8 seconds, $0.08
Speedup: 5x faster
Savings: 82% cheaper

Benchmark 3: Batch Operations

Task: Add types to 50 JavaScript files

Unoptimized (sequential):

for file in src/**/*.js; do
  claude "add types" @$file
done
Time: 250 seconds
Cost: $2.50

Optimized (parallel + model + batch):

for file in src/**/*.js; do
  claude "add types" @$file --model="haiku-4.5"
done &
# Parallel with Haiku
Time: 10 seconds
Cost: $0.20
Speedup: 25x faster
Savings: 92% cheaper

Even better (Batch API):

claude-batch "add types" @src/**/*.js --model="haiku-4.5"
Time: ~15s (async, don't wait)
Cost: $0.10 (50% batch discount)
Savings: 96% cheaper

Part E: Monitoring & Planning

Budget Planning

By developer type:

Type	Daily Use	Monthly Budget	Queries/Day
Occasional	30 min	$5-10	10-20
Light	1-2 hours	$10-20	20-30
Medium	3-4 hours	$20-40	40-60
Heavy	5+ hours	$40-80	80-120
Power User	All day	$80-150	150+

Team budget allocation (10 developers):

Junior (4): $15/month × 4 = $60
Mid-level (4): $25/month × 4 = $100
Senior (2): $50/month × 2 = $100
CI/CD automation: $20/month
Buffer (20%): $56

Total: $336/month
Per dev average: $33.60/month

Set budget alerts:

# Personal budget
claude --set-budget 50  # $50/month
claude --alert-at 80    # Warn at 80%

# Check usage
claude --usage-stats --month=current

Performance Monitoring

Daily check:

claude --usage-today

Weekly review:

claude --usage-stats --week=current --detailed

# Example output:
# Week of Dec 18-24, 2025
# Total: $12.45, 347 queries
# Avg: $0.036/query
#
# By model:
# - Haiku: 180 queries, $0.85 (7%)
# - Sonnet: 152 queries, $9.60 (77%)
# - Opus: 15 queries, $2.00 (16%)
#
# Most expensive:
# 1. Architecture design: $0.85
# 2. Large refactoring: $0.42
# 3. Codebase analysis: $0.38

Performance tracking:

# Time operations
time claude "your query"

# Track tokens
claude "query" --verbose
# Shows: input, output, cached tokens, cost

Set performance budgets:

# Simple queries: < 2s
# Code reviews: < 5s
# Complex analysis: < 10s
# Batch jobs: Async (don't wait)

Still slow or costly? Troubleshooting (performance, cost).

ROI Calculation

Time savings value:

Task	Without Claude	With Claude	Time Saved
Code review	30 min	10 min	20 min
Write tests	45 min	15 min	30 min
Debug issue	2 hours	30 min	90 min
Refactoring	3 hours	1 hour	2 hours
Documentation	1 hour	20 min	40 min

Monthly ROI (medium user):

Weekly savings:
- 5 reviews × 20 min = 100 min
- 10 test suites × 30 min = 300 min
- 2 debugging × 90 min = 180 min
- 1 refactoring × 120 min = 120 min
- 3 docs × 40 min = 120 min

Total: 820 min/week = 13.7 hours/week
Monthly: 59 hours saved

Value @ $100/hour: $5,900/month
Claude cost: $25/month
ROI: 236x return

Team ROI (10 developers):

Monthly savings: 500 hours
Value @ $100/hour: $50,000/month

Cost: $250/month
Monthly ROI: 199x
Annual ROI: $600,000 / $3,000 = 200x
Payback period: < 1 week

Optimization Checklist

Setup (One-Time)

Daily Habits

Use Haiku for simple tasks
Attach only relevant files
Batch queries within 5 minutes (caching)
Search before reading files
Check: claude --usage-today

Weekly Review

Monthly Optimization

Summary

Key Optimizations:

CLAUDE.md → 70-90% cost savings (automatic caching)
Smart model selection → 10x cost difference, 3-5x speed
Disable extended thinking → 2-4x faster, 2-3x cheaper
Minimize context → 30-70% savings, much faster
Batch API → 50% discount for async work
Parallel operations → 10x faster for multi-file tasks
Progressive disclosure → Start fast, expand as needed
Streaming → 50-80% better perceived speed

Expected Results:

Cost: Light user $10-20/month, Heavy user $40-80/month
Speed: 10-50x faster with all optimizations
ROI: 50-200x return on investment

Quick Wins (Implement Today):

Create CLAUDE.md (5 minutes) → instant caching
Use --model="haiku-4.5" for simple tasks → 5x faster
Remove extended thinking from defaults → 2-4x faster
Use file line ranges instead of full files → 90%+ savings

← Back: Prompt Engineering

Next: Part 6 →