GenerativeModels.ai
Engineering Guidelines

🎯 Purpose

This document outlines engineering best practices, values, and decision-making principles at GenerativeModels.ai. It exists to help us move fast without breaking trust, build resilient systems, and stay aligned across contributors—even as we scale.


🧠 Our Engineering Philosophy

We are building foundational infrastructure for generative AI systems, which we and others use to build applications at scale. Our software should be:

  • Composable: Each part can be improved or replaced without rewriting everything.
  • Evaluated: If we can’t measure it, we don’t ship it.
  • Reliable: Errors should be explainable, observable, and correctable.
  • Lean: Build only what’s needed—make it work, then make it elegant.
  • Agent-ready: Every piece of software should be easy for both humans and agents to interact with.

🔁 How We Build

1. Start with the End (User) in Mind

  • Whether it’s a UI or an API, identify who’s using it (internally or externally) and what value it delivers.
  • Our users include engineers, non-technical PMs, researchers, clients, and future agents.

2. Iterate in Public

  • Every repo should be tied to a corresponding Notion or GitHub project entry.
  • Use feature flags, sandbox deployments, and short feedback loops.
  • When in doubt, ship internally, demo early, collect real-world feedback.

3. Build for Evaluation

  • Every feature must include basic observability hooks—especially anything involving LLMs or AI logic (see the sketch after this list).
    • [TODO] Link to observability guidelines
  • Create eval sets for new model features or prompts.
  • Track key metrics: latency, accuracy, hallucination rate, user thumbs up/down, etc.
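
As a concrete illustration, here is a minimal sketch of what such an observability hook could look like. log_llm_event, the field names, and summarize_v1 are hypothetical placeholders, not an existing internal API:

```python
import functools
import time
import uuid


def log_llm_event(event: dict) -> None:
    """Placeholder sink: in practice, write to the ClickHouse events table
    described under AI-Specific Guidelines below."""
    print(event)


def observed(prompt_name: str):
    """Decorator that records latency, errors, and a trace id for any
    function that wraps an LLM call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            trace_id = str(uuid.uuid4())
            start = time.perf_counter()
            error = None
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                error = repr(exc)
                raise
            finally:
                log_llm_event({
                    "trace_id": trace_id,
                    "prompt_name": prompt_name,
                    "latency_ms": round((time.perf_counter() - start) * 1000, 2),
                    "error": error,
                })
        return wrapper
    return decorator


@observed(prompt_name="summarize_v1")
def summarize(text: str) -> str:
    ...  # the actual model call goes here
```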

📦 Stack and Tooling Principles

✅ Preferred Tech

  • Infra: Docker Compose + Terraform for local + cloud parity.
  • Data: ClickHouse (analytics), PostgreSQL (app state), Qdrant (semantic search), Redis (cache/state), MinIO (assets).
  • Frontend: React + Tailwind + TipTap/Lexical for structured content.
  • Backend: Python/FastAPI preferred; TypeScript/Node okay when needed.
  • LLM Layer: Prefer internal prompt libraries over LangChain-style wrappers. RAG-first. No hardcoded prompts in code (see the sketch after this list).
    • [TODO] Link to prompt engineering principles
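
For example, a prompt can live as a versioned template file next to the code that uses it and be loaded at runtime. The prompts/ layout and helper below are assumptions for illustration, not our actual prompt library:

```python
from pathlib import Path

# Hypothetical layout: prompts/<name>/v<N>.txt, versioned by Git like any code.
PROMPT_DIR = Path(__file__).parent / "prompts"


def load_prompt(name: str, version: str = "latest") -> str:
    """Read a prompt template from the repo instead of hardcoding it."""
    if version == "latest":
        candidates = sorted(
            PROMPT_DIR.glob(f"{name}/v*.txt"),
            key=lambda p: int(p.stem.lstrip("v")),  # so v10 sorts after v2
        )
        if not candidates:
            raise FileNotFoundError(f"no prompt files for {name!r}")
        return candidates[-1].read_text()
    return (PROMPT_DIR / name / f"{version}.txt").read_text()


template = load_prompt("summarize", version="v2")
prompt = template.format(document="...")  # fill in template variables
```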

🧩 Integrations

  • Build reusable data ingestion APIs for Google Drive, Notion, web scraping, etc.
  • Every tool must be agent-compatible: think POST /action, not just a GUI (see the sketch below).
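
For instance, an ingestion tool exposed for agents might look like this minimal FastAPI sketch; the /action route and request schema here are illustrative, not a fixed contract:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class ActionRequest(BaseModel):
    """Structured input an agent (or a human script) can POST."""
    source: str       # e.g. "notion", "google_drive", "web"
    resource_id: str  # document id or URL to ingest


class ActionResponse(BaseModel):
    status: str
    detail: str


@app.post("/action", response_model=ActionResponse)
def run_action(req: ActionRequest) -> ActionResponse:
    # Dispatch to the matching ingestion pipeline here.
    return ActionResponse(
        status="accepted",
        detail=f"ingesting {req.resource_id} from {req.source}",
    )
```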

📏 Code Quality & Process

1. Pull Request Checklist

  • Linked to Notion spec/task
  • Clear title and purpose
  • Has test coverage if logic-based
  • Linted and formatted
  • Includes observability hooks if it touches AI logic
  • Includes a short Loom walkthrough if it’s a major feature

2. Versioning Prompts like Code

  • Prompts live alongside app logic (same repo or repo-per-prompt module)
  • Use Git for versioning, Notion for notes, and output tracking for performance evaluation (an example layout follows)
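
One possible layout, purely illustrative, that keeps prompts and eval data under Git alongside the code that uses them:

```
my-service/
├── app/
│   └── summarize.py        # loads prompts at runtime, never inlines them
├── prompts/
│   └── summarize/
│       ├── v1.txt
│       └── v2.txt
└── evals/
    └── summarize/
        ├── metadata.json   # creator, goal, metrics (see next section)
        └── cases.jsonl
```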

3. Data as a First-Class Citizen

  • Labeling, eval sets, and datasets must live in versioned folders
    • [TODO] Guidelines for large, multimodal data
  • Use metadata.json for eval sets (creator, goal, metrics; example after this list)
  • Prefer few-shot or system prompt injection over retraining where possible
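
A minimal metadata.json might look like the following. creator, goal, and metrics come from the bullet above; the remaining fields and values are suggestions, not a fixed schema:

```json
{
  "creator": "jane@generativemodels.ai",
  "goal": "Regression-test summarization quality after prompt changes",
  "metrics": ["latency_ms", "accuracy", "hallucination_rate"],
  "created_at": "2025-01-15",
  "dataset": "cases.jsonl"
}
```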

🤖 AI-Specific Guidelines

  • Never trust a model blindly. Always include fallback logic or verification for anything user-facing (see the sketch after this list).
  • No model without evaluation. Every new LLM, embedding, or generation logic must have a documented eval.
  • Observability first. Track prompt performance, hallucination triggers, API failures, and user feedback in ClickHouse.
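
As a sketch of the first rule, a user-facing call can be wrapped with verification and a fallback. call_model and looks_grounded are hypothetical stand-ins for the real client and verification logic:

```python
def call_model(prompt: str) -> str:
    """Hypothetical LLM call; replace with the real client."""
    raise NotImplementedError


def looks_grounded(answer: str, sources: list[str]) -> bool:
    """Naive verification stand-in: require some overlap with the sources."""
    return any(snippet.lower() in answer.lower() for snippet in sources)


FALLBACK = "I'm not confident in an answer here; routing to a human."


def answer_user(prompt: str, sources: list[str]) -> str:
    """Never surface unverified model output to a user."""
    try:
        answer = call_model(prompt)
    except Exception:
        return FALLBACK  # API failure: degrade gracefully (and log upstream)
    if looks_grounded(answer, sources):
        return answer
    return FALLBACK  # verification failed: fall back instead of guessing
```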

📚 Documentation

  • Every repo must include:
    • README.md: Install, run, purpose
    • docs/: Usage examples, API schema, evaluation reports
  • Every internal tool or feature has a corresponding Notion page with:
    • Goal
    • Design doc or brainstorm
    • Link to PR(s)
    • Eval results (if AI-related)

🔓 Open Source and IP

  • We believe in transparency. When practical, we open source.
  • Keep core IP (e.g. eval tools, agent architecture, prompt strategy) internal unless specifically cleared for public use by @Ali Alavi.

🙋 Decision-making Heuristics

  • Optimize for learning velocity, not just output.
  • If it helps us build a better agent or reduce hallucinations, prioritize it.
  • Use existing tools if they meet 80% of the need—but know when to build.
  • Prefer ownership over perfection. Done is better than perfect.

🎓 Continuous Learning

This field changes fast, and you need to stay on top of the latest developments. But that doesn’t mean you can skip the basics! Make sure you learn the fundamentals really well: fluency in your programming language, plus proficiency with foundational tools such as the terminal and version control.

A good resource for learning these foundational tools is MIT’s Missing Semester. Watching the whole series is worthwhile, but you are especially encouraged to watch the following:

[TODO] Add further education links here. @Rosemary Liao @Simon Vutov Please add other resources you’ve found useful for upping your engineering skills here.