GenerativeModels.ai
Code Review Process Overview

Code Reviews


“Programs must be written for people to read, and only incidentally for machines to execute.” Harold Abelson and Gerald Jay Sussman, Structure and Interpretation of Computer Programs, 2nd ed., MIT Press, 1996, Preface.

🎯 Why Code Reviews Matter


Code reviews aren’t just about catching bugs. They’re about:

  • Knowledge sharing: Every review is a chance to level each other up.
  • Long-term velocity: Fast code is good. Maintainable code is better.
  • Protecting the user: Especially with LLMs and agents, bad assumptions = hallucinations, bugs, or trust failures.

✅ What to Look For

1. Clarity & Intent

  • Is it obvious what this code does and why it was added?
  • Are variable names, function names, and structure easy to follow?
  • Are there inline comments or Notion links explaining non-obvious logic?

2. Atomic Commits

  • Is this pull request focused on one thing?
  • If not, it should be split. No catch-all “cleanup” PRs unless discussed.

3. Evaluation-Ready

  • If the code touches model logic, prompt flow, or user-facing AI output:
    • Does it log outputs?
    • Is there an eval set or at least a test input set?
    • Are results tracked somewhere (ClickHouse, Notion, markdown)?
      • [TODO] Require specific approach for this

4. Observability

  • Is the feature debug-friendly? Look for:
    • Logs (with clear log levels)
    • Error tracking/reporting
    • Analytics hooks if user-facing

5. Security & Privacy

  • Any hardcoded secrets, access tokens, or unsafe evals?
  • Are we respecting user data boundaries?
  • Any potential for prompt injection or LLM misuse?

6. Consistency

  • Does it follow our stack’s conventions?
  • Does it use shared libraries/utilities instead of reinventing?

7. Tests

  • Are there meaningful unit/integration tests?
    • [TODO] Add test pyramid
  • Are eval prompts tested where relevant?
  • If no tests, is there a reason (e.g. exploratory code, not prod-bound yet)?

🚨 Red Flags

  • Logic buried in prompts with no version control or testing
  • Silent failures or except: pass patterns
  • Feature flagging missing for experimental code
  • Pushing without a corresponding design doc or Notion task
  • PR > 500 LOC without a strong reason or breakdown

🙋 Reviewer Mindset

  • Be kind. Be curious. Ask clarifying questions instead of assuming mistakes.
  • Review the why, not just the what. Does this fit the direction of the product, not just the codebase?
  • Don’t block unless necessary. If something’s not ideal but isn’t critical, suggest + approve.
  • Leave clear, thoughtful, and useful feedback. Avoid sarcasm, personal attacks, etc.

🤖 AI-Specific Code Review Tips

  • Prompts are code. Version them. Document them. Review them like logic.
  • LLM calls need fallback. Always check: What happens if the model fails or returns junk?
  • Data > assumptions. Encourage logging real outputs, not just theoretical flows.

📋 Code Review Checklist (for Authors)

Before requesting a review, make sure:

  • The PR has a clear title and description
  • The PR links to a Notion task or goal
  • You’ve tested it locally or on staging
  • Prompt logic is isolated and versioned
  • Eval/logging is included if it touches AI
  • You’ve written a Loom walkthrough (if >300 LOC or user-facing)

🧠 Final Note: Build With Trust

Every line of code is a commitment to the team, the product, and our users. Reviews are how we protect velocity without losing quality. The goal is not to gatekeep—it’s to raise the floor for everyone.

📖 Required Reading

Make sure you to read Google’s The Standard of Code Review, and the corresponding Author’s Guide and Reviewer Guide.