diff --git a/examples/commands/review-plan.md b/examples/commands/review-plan.md
new file mode 100644
index 0000000..d8085af
--- /dev/null
+++ b/examples/commands/review-plan.md
@@ -0,0 +1,92 @@
+---
+name: review-plan
+description: "Structured plan review across 4 axes before writing any code (inspired by Garry Tan's workflow)"
+---
+
+# Review Plan Before Implementation
+
+Review the current plan thoroughly before making any code changes. For every issue or recommendation, explain the concrete tradeoffs, give an opinionated recommendation, and ask for user input before assuming a direction.
+
+## Engineering Preferences
+
+Use these to guide your recommendations (override with project-specific CLAUDE.md preferences if they exist):
+
+- DRY is important: flag repetition aggressively
+- Well-tested code is non-negotiable: prefer too many tests over too few
+- Code should be "engineered enough": not under-engineered (fragile, hacky) and not over-engineered (premature abstraction, unnecessary complexity)
+- Err on the side of handling more edge cases, not fewer
+- Bias toward explicit over clever; thoughtfulness over speed
+
+## Review Pipeline
+
+Work through each section sequentially. After each section, pause and ask for feedback before moving on.
+
+### 1. Architecture Review
+
+Evaluate:
+- Overall system design and component boundaries
+- Dependency graph and coupling concerns
+- Data flow patterns and potential bottlenecks
+- Scaling characteristics and single points of failure
+- Security architecture (auth, data access, API boundaries)
+
+### 2. Code Quality Review
+
+Evaluate:
+- Code organization and module structure
+- DRY violations (be aggressive here)
+- Error handling patterns and missing edge cases (call these out explicitly)
+- Technical debt hotspots
+- Areas that are over-engineered or under-engineered relative to the engineering preferences above
+
+### 3. Test Review
+
+Evaluate:
+- Test coverage gaps (unit, integration, e2e)
+- Test quality and assertion strength
+- Missing edge case coverage (be thorough)
+- Untested failure modes and error paths
+
+### 4. Performance Review
+
+Evaluate:
+- N+1 queries and database access patterns
+- Memory usage concerns
+- Caching opportunities
+- Slow or high-complexity code paths
+
+## Issue Reporting Format
+
+For every specific issue found (bug, smell, design concern, or risk):
+
+1. Describe the problem concretely, with file and line references
+2. Present 2-3 options, including "do nothing" where that's reasonable
+3. For each option, specify: implementation effort, risk, impact on other code, and maintenance burden
+4. Give your recommended option and explain why, mapped to the engineering preferences above
+5. Ask explicitly whether the user agrees or wants to choose a different direction before proceeding
+
+## Workflow
+
+- Do not assume priorities on timeline or scale
+- After each section, pause and ask for feedback before moving on
+- Use AskUserQuestion for structured option selection
+
+## Before Starting
+
+Ask the user to choose one of two modes:
+
+1. **BIG CHANGE**: Work through this interactively, one section at a time (Architecture → Code Quality → Tests → Performance) with at most 4 top issues in each section
+2. **SMALL CHANGE**: Work through this interactively, with ONE question per review section
+
+## Tips
+
+- Combine with `.claude/rules/` files for project-specific review criteria
+- Engineering preferences above can be overridden by your project's CLAUDE.md
+- For deeper analysis, run this command with an Opus model
+
+## Sources
+
+- Inspired by [Garry Tan's Plan Mode prompt](https://garrytan.com/) (Feb 2026)
+- Adapted for Claude Code's native config system
+
+$ARGUMENTS
diff --git a/examples/rules/architecture-review.md b/examples/rules/architecture-review.md
new file mode 100644
index 0000000..d8ca12f
--- /dev/null
+++ b/examples/rules/architecture-review.md
@@ -0,0 +1,33 @@
+---
+description: "Architecture review criteria for plan and code reviews"
+---
+
+# Architecture Review Criteria
+
+When reviewing architecture (plans or code), evaluate these dimensions:
+
+## System Design
+- Are component boundaries clear and well-defined?
+- Does each component have a single, well-understood responsibility?
+- Are interfaces between components minimal and well-documented?
+
+## Dependencies
+- Is the dependency graph acyclic and manageable?
+- Are there circular dependencies that need breaking?
+- Are external dependencies justified and up-to-date?
+
+## Data Flow
+- Is data ownership clear (which component is source of truth)?
+- Are there potential bottlenecks in the data pipeline?
+- Is data transformation happening at the right layer?
+
+## Scaling
+- What are the single points of failure?
+- Where will the system break under 10x load?
+- Are stateless and stateful components properly separated?
+
+## Security
+- Are authentication and authorization properly layered?
+- Is data access controlled at the right boundaries?
+- Are API boundaries validated (input sanitization, rate limiting)?
+- Are secrets properly managed (no hardcoded values)?
diff --git a/examples/rules/code-quality-review.md b/examples/rules/code-quality-review.md
new file mode 100644
index 0000000..4dbe1c5
--- /dev/null
+++ b/examples/rules/code-quality-review.md
@@ -0,0 +1,33 @@
+---
+description: "Code quality review criteria for plan and code reviews"
+---
+
+# Code Quality Review Criteria
+
+When reviewing code quality, evaluate these dimensions:
+
+## Organization
+- Is the module structure logical and consistent?
+- Are files in the right directories?
+- Is the naming convention consistent across the codebase?
+
+## DRY Violations
+- Flag any duplicated logic (be aggressive)
+- Identify copy-paste patterns that should be abstracted
+- Check for repeated configuration or magic values
+
+## Error Handling
+- Are errors handled at the right level (not swallowed, not over-caught)?
+- Are edge cases explicitly handled or documented as out-of-scope?
+- Do error messages provide enough context for debugging?
+- Are there silent failures (empty catch blocks, ignored return values)?
+
+## Technical Debt
+- Which areas have the highest maintenance burden?
+- Are there TODO/FIXME comments that should be addressed now?
+- Is there dead code that should be removed?
+
+## Engineering Balance
+- Are there areas that are over-engineered (premature abstraction, unnecessary complexity)?
+- Are there areas that are under-engineered (fragile, hacky, missing validation)?
+- Does the complexity match the actual requirements?
diff --git a/examples/rules/performance-review.md b/examples/rules/performance-review.md
new file mode 100644
index 0000000..09625bc
--- /dev/null
+++ b/examples/rules/performance-review.md
@@ -0,0 +1,29 @@
+---
+description: "Performance review criteria for plan and code reviews"
+---
+
+# Performance Review Criteria
+
+When reviewing performance, evaluate these dimensions:
+
+## Database Access
+- Are there N+1 query patterns (loop with individual queries)?
+- Are queries using appropriate indexes?
+- Is data fetched at the right granularity (not over-fetching)?
+- Are bulk operations used where possible?
+
+## Memory
+- Are large datasets streamed rather than loaded entirely in memory?
+- Are there potential memory leaks (event listeners, unclosed connections)?
+- Is object allocation minimized in hot paths?
+
+## Caching
+- What data is expensive to compute and stable enough to cache?
+- Are cache invalidation strategies defined?
+- Is caching applied at the right layer (application, database, CDN)?
+
+## Complexity
+- Are there O(n^2) or worse algorithms that could be optimized?
+- Are hot paths identified and optimized?
+- Is unnecessary work being done (redundant computations, unused data transforms)?
+- Are expensive operations deferred or lazy-loaded where possible?
diff --git a/examples/rules/test-review.md b/examples/rules/test-review.md
new file mode 100644
index 0000000..e7551e0
--- /dev/null
+++ b/examples/rules/test-review.md
@@ -0,0 +1,29 @@
+---
+description: "Test review criteria for plan and code reviews"
+---
+
+# Test Review Criteria
+
+When reviewing tests, evaluate these dimensions:
+
+## Coverage Gaps
+- Are there untested public functions or API endpoints?
+- Is there unit, integration, AND e2e coverage where appropriate?
+- Are critical paths (auth, payments, data mutations) fully tested?
+
+## Test Quality
+- Do assertions test behavior, not implementation details?
+- Are test descriptions clear about what they verify?
+- Do tests fail for the right reasons (not brittle/flaky)?
+- Is each test independent (no shared mutable state)?
+
+## Edge Cases
+- Are boundary values tested (empty, null, max, negative)?
+- Are error paths tested (network failures, invalid input, timeouts)?
+- Are race conditions and concurrent access scenarios covered?
+
+## Failure Modes
+- What happens when external services are unavailable?
+- Are retry and fallback mechanisms tested?
+- Do tests verify graceful degradation?
+- Are error messages and status codes correct for each failure?