PR Review Guide for AI-Generated Code

Why AI Code Review is Different

Traditional code review often spends much of its attention on formatting, naming conventions, and style consistency. AI-generated code typically “looks clean” on those dimensions but can hide logic bugs, so it calls for different review priorities.

Key statistics:

Academic studies from 2024-2025 report security flaws in over 40% of AI-generated code solutions
Missing input validation is the #1 flaw in LLM-generated code
AI code introduces both familiar vulnerabilities (injection, auth bypass) and novel risks (hallucinated dependencies, architectural drift)

AI models are trained on open-source code—both good and bad. They inherit not just best practices but also insecure patterns that were prevalent in training data.

Many review issues stem from Anti-Patterns.

1. Security Vulnerabilities

Missing input validation is the most common flaw. AI often omits validation unless explicitly prompted.
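
A minimal sketch of what explicit validation might look like. The helper name `parse_user_id` and the upper bound are assumptions chosen for illustration, not part of any framework:

```python
def parse_user_id(raw: object, max_id: int = 2**31 - 1) -> int:
    """Validate an untrusted value and return it as a positive integer ID.

    Hypothetical helper; the accepted range is an assumption.
    """
    if not isinstance(raw, str):
        raise ValueError("user id must be a string")
    text = raw.strip()
    # isascii() + isdigit() rejects signs, decimals, SQL fragments, and non-ASCII digits
    if not (text.isascii() and text.isdigit()):
        raise ValueError("user id must contain only digits")
    value = int(text)
    if not 1 <= value <= max_id:
        raise ValueError("user id out of range")
    return value
```

When reviewing, check that a step like this runs before the value reaches a query, a file path, or a shell command.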

SQL Injection

# BAD: AI often generates string interpolation
query = f"SELECT * FROM users WHERE id = {user_id}"

# GOOD: Parameterized queries
cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))
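
To see the difference in practice, here is a small self-contained sketch using Python's built-in `sqlite3` module. The table, the rows, and the `find_user` helper are made up for the demonstration:

```python
import sqlite3

def find_user(conn: sqlite3.Connection, user_id: str) -> list:
    # The ? placeholder sends user_id as data, never as SQL text
    cur = conn.execute("SELECT id, name FROM users WHERE id = ?", (user_id,))
    return cur.fetchall()

users_conn = sqlite3.connect(":memory:")
users_conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
users_conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

payload = "1 OR 1=1"

# Interpolation lets the payload rewrite the WHERE clause, so every row matches
leaked = users_conn.execute(
    f"SELECT id, name FROM users WHERE id = {payload}"
).fetchall()

# Bound as a parameter, the same payload is compared as a plain value and matches nothing
safe = find_user(users_conn, payload)
```

Here `leaked` contains both rows while `safe` is empty. Placeholder syntax varies by driver (`?` for sqlite3, `%s` for some others), so check the one the project actually uses.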

Missing Authentication/Authorization

AI-generated endpoints frequently bypass auth entirely when prompts don’t explicitly require it:

# BAD: No auth check - common in AI output
@app.route('/admin/users')
def list_users():
    return db.get_all_users()

# GOOD: Explicit authorization
@app.route('/admin/users')
@require_role('admin')
def list_users():
    return db.get_all_users()
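
The snippet above does not show how `require_role` is defined. A framework-free sketch follows, assuming the current user is passed in as a dict with a `roles` list; both the calling convention and the `Forbidden` exception are assumptions for illustration:

```python
from functools import wraps

class Forbidden(Exception):
    """Raised when the caller lacks the required role (hypothetical)."""

def require_role(role):
    def decorator(func):
        @wraps(func)
        def wrapper(user, *args, **kwargs):
            # Deny by default: a missing user and a missing role both fail
            if user is None or role not in user.get("roles", ()):
                raise Forbidden(f"role {role!r} required")
            return func(user, *args, **kwargs)
        return wrapper
    return decorator

@require_role("admin")
def list_users(user):
    return ["alice", "bob"]  # stand-in for db.get_all_users()
```

In a real web framework the user would usually come from the session or request context rather than an argument. The point to verify in review is that the check fails closed.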

Hardcoded Secrets

// BAD: AI may hardcode credentials from training data patterns
const API_KEY = "sk-1234567890abcdef";

// GOOD: Environment variables
const API_KEY = process.env.API_KEY;

Dangerous Function Calls

Watch for: eval(), exec(), shell_exec(), innerHTML, document.write(), dangerouslySetInnerHTML
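
For the Python case, one common replacement for `eval()` on untrusted text is `ast.literal_eval`, which only accepts literals. A minimal sketch (the wrapper name `parse_literal` is an assumption):

```python
import ast

def parse_literal(text: str):
    """Parse a Python literal (numbers, strings, lists, dicts) without executing code."""
    try:
        return ast.literal_eval(text)
    except (ValueError, TypeError, SyntaxError) as exc:
        # Function calls, attribute access, and other expressions are rejected
        raise ValueError("input is not a plain literal") from exc
```

`ast.literal_eval` can still use a lot of memory or hit recursion limits on very large or deeply nested input, so size limits remain worth checking.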

Command Injection

import os
import subprocess

# BAD: User input is interpolated into a shell command
os.system(f"convert {filename} output.png")

# GOOD: Pass an argument list so no shell parses the input
subprocess.run(["convert", filename, "output.png"], check=True)

2. Invariants (What Must Not Change)

Invariants are properties that must remain true throughout system execution and evolution. AI doesn’t understand your system’s invariants—you must verify they’re preserved.

API Contracts

Request/response shapes match existing documentation
HTTP status codes follow established patterns
Error response formats remain consistent
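
One way to make a contract check mechanical is to compare a response payload against the documented shape. This is a sketch only: `EXPECTED_USER_FIELDS` is an invented contract and `contract_violations` is a hypothetical helper.

```python
# Assumed contract for illustration; take the real one from your API documentation
EXPECTED_USER_FIELDS = {"id": int, "name": str, "email": str}

def contract_violations(payload: dict, expected: dict) -> list:
    """Return a list of human-readable differences between payload and contract."""
    problems = []
    for field, expected_type in expected.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            problems.append(f"wrong type for {field}: {type(payload[field]).__name__}")
    # Extra fields may be harmless, but they are still a contract change worth reviewing
    for field in sorted(payload.keys() - expected.keys()):
        problems.append(f"unexpected field: {field}")
    return problems
```

Running a check like this in a test against both the old and new handler makes silent shape changes visible in the PR.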

Database Schema Assumptions

Foreign key relationships preserved
NOT NULL constraints respected
Column types match expectations

Error Codes and Their Meanings

Error codes mean the same thing as before
New error codes don’t conflict with existing ones

Public Function Signatures

Parameters haven’t changed in incompatible ways
Return types are consistent
Thrown exceptions match interface contracts

Configuration File Formats

Environment variable names unchanged
Config file structure preserved
Default values maintained

Event Names and Payload Structures

Event names match subscribers’ expectations
Payload fields present and correctly typed
Breaking changes flagged for migration

Example from real systems:

“TigerBeetle doesn’t allocate memory after startup. This simple invariant affects every bit of code—whatever you do, you must manage with existing, pre-allocated data structures.” — matklad

3. Corner Cases and Boundaries

AI often handles the “happy path” correctly but misses edge cases. Use the 0-1-Many testing pattern: exercise each input with zero, one, and many elements.

Null/Undefined/Empty Inputs

// Does the code handle:
processItems(null);      // null input
processItems(undefined); // undefined input
processItems([]);        // empty array
processItems("");        // empty string

Array with 0, 1, Many Items

# Test all three cases
calculate_average([])           # 0 items - division by zero?
calculate_average([42])         # 1 item - special case handling?
calculate_average([1, 2, 3])    # many items - normal operation
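
A sketch of a `calculate_average` that survives all three cases. Returning `None` for an empty list is one possible convention, assumed here; raising an error would be equally valid if the callers expect it.

```python
from typing import Optional, Sequence

def calculate_average(values: Sequence[float]) -> Optional[float]:
    """Return the arithmetic mean, or None when there is nothing to average."""
    if not values:
        return None  # 0 items: avoids ZeroDivisionError
    return sum(values) / len(values)  # 1 item and many items share this path
```

Whatever convention is chosen, the reviewer should confirm that callers handle it.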

Numeric Boundaries

Zero (often special-cased incorrectly)
Negative numbers (sign errors)
MAX_INT / MIN_INT (overflow)
Floating point precision (0.1 + 0.2 ≠ 0.3)
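
The floating point and overflow items can be illustrated in a few lines of Python. The `fits_int32` helper is an assumption, standing in for whatever fixed-width limit a database column or wire format imposes:

```python
import math
from decimal import Decimal

# Binary floats cannot represent 0.1, 0.2, or 0.3 exactly
naive_equal = (0.1 + 0.2 == 0.3)                 # False
# Compare with a tolerance instead of ==
tolerant_equal = math.isclose(0.1 + 0.2, 0.3)    # True
# Decimal values built from strings keep exact decimal digits
decimal_equal = Decimal("0.1") + Decimal("0.2") == Decimal("0.3")  # True

def fits_int32(value: int) -> bool:
    """Python ints never overflow, but a 32-bit column or field downstream can."""
    return -2**31 <= value <= 2**31 - 1
```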

Empty Strings vs Null

// These are different!
user.name === ""     // empty string - user cleared their name
user.name === null   // null - name was never set

Unicode and Special Characters

Multi-byte characters (emoji, CJK)
RTL text (Arabic, Hebrew)
Null bytes in strings
SQL/HTML special characters
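
A few Python probes that expose common Unicode mistakes. The helper names are assumptions for illustration:

```python
import unicodedata

def byte_length(text: str) -> int:
    """Length in UTF-8 bytes, which is what most storage limits measure."""
    return len(text.encode("utf-8"))

def normalize_for_compare(text: str) -> str:
    """NFC-normalize so visually identical strings compare equal."""
    return unicodedata.normalize("NFC", text)

def has_null_byte(text: str) -> bool:
    return "\x00" in text

thumbs_up = "\U0001F44D"   # one code point, four UTF-8 bytes
composed = "\u00e9"        # "é" as a single code point
decomposed = "e\u0301"     # "e" followed by a combining acute accent
```

`len(thumbs_up)` is 1 while `byte_length(thumbs_up)` is 4, and `composed != decomposed` until both are normalized. Length checks and equality checks in AI-generated code often assume neither case exists.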

Concurrent Access Scenarios

Race conditions (time-of-check to time-of-use, TOCTOU)
Deadlocks
Lost updates
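
Lost updates usually come from an unprotected read-modify-write. A minimal sketch of the guarded version, using a hypothetical `Counter` class and `run_workers` driver:

```python
import threading

class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The read-modify-write happens under the lock, so no update is lost
        with self._lock:
            self.value += 1

def run_workers(counter: Counter, workers: int = 4, per_worker: int = 5000) -> int:
    def work():
        for _ in range(per_worker):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value
```

With the lock in place the final value always equals `workers * per_worker`. In review, look for shared state that is read and then written without a lock, transaction, or atomic operation.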

4. Error and Failure Handling

AI-generated code often has optimistic error handling or exposes sensitive information in error messages.

Network Timeouts and Retries

# BAD: No timeout, so the call can hang indefinitely
response = requests.get(url)

# GOOD: Explicit timeout
response = requests.get(url, timeout=30)

Database Connection Failures

What happens when the database is down?
Is there retry logic with backoff?
Are connections properly pooled and released?
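
The retry-with-backoff question can be made concrete with a sketch like the following. `retry_with_backoff`, its defaults, and the choice of retryable exceptions are all assumptions; a real project may already have a library or policy for this.

```python
import random
import time

def retry_with_backoff(operation, attempts=4, base_delay=0.5, max_delay=8.0,
                       retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call operation(), retrying transient failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries from many clients hitting the same outage
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter exists so tests can inject a fake. When reviewing generated retry code, check that it is bounded, that it only retries errors that are safe to retry, and that the retried operation is idempotent.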

Partial Failures in Batch Operations

# BAD: No transaction, so a failure midway leaves a partial batch saved
for item in items:
    db.save(item)  # What if item 50 fails?

# GOOD: Transaction with clear rollback
with db.transaction():
    for item in items:
        db.save(item)
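
The `db.transaction()` call above is schematic. A runnable equivalent using Python's built-in `sqlite3` module, with an invented `items` table and `save_all` helper, shows the rollback behavior:

```python
import sqlite3

def save_all(conn: sqlite3.Connection, names) -> None:
    # "with conn" commits if the block succeeds and rolls back if it raises
    with conn:
        for name in names:
            conn.execute("INSERT INTO items (name) VALUES (?)", (name,))

items_conn = sqlite3.connect(":memory:")
items_conn.execute("CREATE TABLE items (name TEXT NOT NULL UNIQUE)")
items_conn.commit()

try:
    save_all(items_conn, ["a", "b", "a"])  # the third insert violates UNIQUE
except sqlite3.IntegrityError:
    pass

# The two successful inserts were rolled back together with the failed one
remaining = items_conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

After the failed batch `remaining` is 0. If partial success is actually desired, the code should say so explicitly and report which items failed.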

Resource Cleanup

# BAD: Resource leak on error
file = open("data.txt")
data = file.read()
process(data)  # If this throws, file stays open
file.close()

# GOOD: Context manager ensures cleanup
with open("data.txt") as file:
    data = file.read()
    process(data)

Error Message Information Leakage

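# BAD: Returns the raw exception text, which can expose SQL, file paths, or other internals
try:
    run_operation()  # placeholder for the call that may fail
except Exception as e:
    return {"error": str(e)}

# GOOD: Generic message, detailed logging server-side
try:
    run_operation()  # placeholder for the call that may fail
except Exception:
    logger.exception("Operation failed")  # logs the message with the full traceback
    return {"error": "An error occurred. Please retry."}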
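
A generic message is easier to support if it carries a correlation ID that also appears in the server log. A sketch, assuming a hypothetical `safe_error_response` helper and an `error_id` field that is not part of any existing API contract:

```python
import logging
import uuid

logger = logging.getLogger(__name__)

def safe_error_response(exc: Exception) -> dict:
    """Log full details server-side and return only a generic message plus an ID."""
    error_id = uuid.uuid4().hex
    # exc_info attaches the traceback to the log record; nothing from exc reaches the client
    logger.error("Operation failed [error_id=%s]", error_id, exc_info=exc)
    return {"error": "An error occurred. Please retry.", "error_id": error_id}
```

A user can quote the ID to support, and the details stay in the logs.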

Recovery Paths After Failure

Can the system recover from a crash mid-operation?
Is state left consistent after failures?
Are partial writes cleaned up?

What NOT to Focus On

AI-generated code typically excels at:

✅ Consistent formatting
✅ Reasonable variable naming
✅ Comment style
✅ Code structure

Don’t spend review time on surface-level “clean code” concerns. The real bugs hide in logic, security, and edge cases—areas where LLMs systematically underperform.

AI-Specific Pitfalls

Watch for these AI-unique failure modes:

Hallucinated APIs

AI may suggest methods/functions that don’t exist:

// AI might generate this, but `safeParseJSON` isn't a real method
const data = JSON.safeParseJSON(input);
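
For Python code, the same kind of check can be scripted. The sketch below uses a hypothetical `api_exists` helper that imports a module and walks an attribute path. Importing executes module code, so run it only in an environment where that is acceptable.

```python
import importlib

def api_exists(module_name: str, attr_path: str) -> bool:
    """Return True if module_name imports and attr_path resolves on it."""
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        return False
    for part in attr_path.split("."):
        if not hasattr(obj, part):
            return False
        obj = getattr(obj, part)
    return True
```

For example, `api_exists("json", "loads")` is true while `api_exists("json", "safe_loads")` is false. Existence is only the first step: the signature and behavior still need checking against the official documentation.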

Hallucinated Dependencies

AI may suggest packages that don’t exist (“slopsquatting” risk):

// Verify every dependency actually exists before installing
"dependencies": {
  "react-safe-utils": "^1.0.0"  // Does this package exist?
}
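
A first step toward verifying dependencies is listing which ones a PR actually adds. This sketch diffs two `package.json` texts. The `new_dependencies` helper is an assumption, and it does not contact any registry, so each reported name still needs to be looked up by hand.

```python
import json

def new_dependencies(base_manifest: str, pr_manifest: str) -> list:
    """Return package names present in the PR manifest but not in the base one."""
    def names(text: str) -> set:
        data = json.loads(text)
        return set(data.get("dependencies", {})) | set(data.get("devDependencies", {}))
    return sorted(names(pr_manifest) - names(base_manifest))
```

For every name returned, confirm that the package exists in the registry, is the one intended (watch for near-miss spellings), and has a plausible publish history.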

Ignored Constraints

AI may ignore requirements mentioned in the prompt—verify all requirements are actually implemented.

Tests Deleted Instead of Fixed

# AI might "fix" failing tests by removing them!
# Before: 50 tests, 2 failing
# After AI "fix": 48 tests, 0 failing  ← RED FLAG

Architectural Drift

Watch for subtle design changes that break security invariants while remaining syntactically valid. Examples: swapping crypto libraries, removing access control checks.

Quick Checklist

Use this during every AI-generated PR review:

Security

All user input validated server-side
Parameterized queries (no string interpolation in SQL)
Authentication required on protected endpoints
Authorization checks at each access point
No hardcoded secrets or credentials
No dangerous functions (eval, exec, innerHTML)
Output encoding appropriate for context (HTML, JS, URL)

Invariants

API contracts unchanged (or intentionally versioned)
Database schema assumptions preserved
Public interfaces backward compatible
Error codes/messages consistent
Config formats unchanged

Edge Cases

Handles null/undefined/empty input
Handles 0, 1, and many items
Numeric boundaries tested (0, negative, MAX)
Special characters handled
Concurrent access considered

Error Handling

Network calls have timeouts
Resources cleaned up on failure (files, connections)
Errors logged server-side with context
User-facing errors don’t leak implementation details
Partial failures handled gracefully

AI-Specific

All suggested APIs/methods actually exist
All dependencies verified in package registry
No tests mysteriously deleted
All prompt requirements actually implemented
Security controls not subtly removed