GenerativeModels.ai
Python Engineering Guidelines Overview


🎯 Why This Matters

Python is the backbone of our backend systems, AI agents, and eval infrastructure. This guideline helps us:

  • Maintain a clean, composable, and observable codebase
  • Align across all contributors and teams
  • Move fast without sacrificing testability or reliability
  • Build tools usable by both humans and LLMs

🧱 Use the Official Template

All new projects must be based on our template:

👉 github.com/GenerativeModels-ai/python-template

It includes:

  • src/ structure for clean imports
  • uv for dependency management
  • pytest + hypothesis testing setup
  • Logging, eval tracking, LLM prompt scaffolding
  • CLI scaffolding with Typer

💻 Use Cursor as Your IDE

Use Cursor as the default editor:

  • Fast, AI-native dev environment
  • Seamless GitHub and LLM integration
  • Excellent for working on prompt-driven, version-controlled workflows
  • Works well with our src/ pattern and Python standards

πŸ“ Project Layout (src/ pattern)

project_name/
├── src/
│   └── app/
│       ├── models/
│       ├── services/
│       ├── llm/
│       ├── routes/
│       └── utils/
├── tests/
├── scripts/
├── evals/
├── requirements.txt
├── pyproject.toml
└── README.md

🧰 Tooling Stack

Category             Preferred Tool
Dependency Manager   uv
Test Framework       pytest + hypothesis
Type Checking        mypy, pyright
Formatter            black
Linter               ruff
Web Framework        FastAPI
Data Models          Pydantic
Logging              loguru
CLI Tooling          Typer
Eval/LLM Logging     ClickHouse

✅ Code Quality & Style

  • Follow PEP 8
  • Format code with black
  • Lint with ruff
  • Use mypy or pyright for static type checking
  • All public APIs, routes, and LLM functions must be fully typed
  • Use semantic logging with loguru, including structured context (user, request_id, etc.)

🔄 Dependency Management with uv

  • Use uv for all environment setup and dependency installation

  • Install with:

    uv pip install <package>
    uv pip freeze > requirements.txt

  • Use uv pip sync requirements.txt to reproduce environments

  • Only commit (TODO: Review):

    • pyproject.toml
    • requirements.txt

🧪 Testing Principles

  • Use pytest + hypothesis
  • Structure tests mirroring your src/ structure
  • Add @pytest.mark.slow for eval-heavy or long-running tests
  • Use fixtures and mocks for I/O, LLMs, and services
  • Every business logic function must have a test
  • Minimum coverage target: 80% overall, 100% for eval code and agents
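
The fixture, mock, and slow-marker rules above might look like the sketch below. The `summarize` function and the client's `complete` method are hypothetical stand-ins for real business logic and a real LLM client; only pytest and the standard library's `unittest.mock` are used.

```python
from unittest.mock import Mock

import pytest

# Hypothetical template; real prompts would live under src/app/llm/prompts/.
PROMPT_TEMPLATE = "Summarize: {text}"


def summarize(client, text: str) -> str:
    """Hypothetical business-logic function that delegates to an LLM client."""
    if not text.strip():
        raise ValueError("text must not be empty")
    return client.complete(prompt=PROMPT_TEMPLATE.format(text=text)).strip()


def make_fake_llm(reply: str = " a short summary ") -> Mock:
    """Build a stand-in LLM client so tests never call a real model."""
    client = Mock()
    client.complete.return_value = reply
    return client


@pytest.fixture
def fake_llm() -> Mock:
    return make_fake_llm()


def test_summarize_strips_reply(fake_llm: Mock) -> None:
    assert summarize(fake_llm, "long text") == "a short summary"
    fake_llm.complete.assert_called_once_with(prompt="Summarize: long text")


@pytest.mark.slow  # register "slow" in the pytest config to avoid marker warnings
def test_summarize_on_eval_slice(fake_llm: Mock) -> None:
    # Stands in for an eval-heavy test that walks a larger dataset.
    for text in ["first document", "second document", "third document"]:
        assert summarize(fake_llm, text) == "a short summary"
    assert fake_llm.complete.call_count == 3
```

Running `pytest -m "not slow"` skips the marked test during quick local runs.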

πŸ” Prompt / LLM Development

  • Place all prompts in src/app/llm/prompts/
  • Use structured builders or Jinja, not raw f-strings
  • Version prompts using Git
  • Track evals, feedback, and versions in ClickHouse or eval logs
  • Always log:
    • Prompt inputs
    • Output
    • Model/version
    • Metadata (user, task, etc.)
    • Latency
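
One way to capture the five logged fields above is a small record type filled in by a wrapper around the model call. This is a sketch under assumptions: `LLMCallRecord`, `call_and_record`, and the prompt text are hypothetical, and the standard library's `string.Template` stands in for Jinja or a structured builder to keep the example dependency-free.

```python
import json
import time
from dataclasses import asdict, dataclass
from string import Template
from typing import Callable

# Stand-in for a template stored under src/app/llm/prompts/ (content is hypothetical).
SUMMARY_PROMPT = Template("Summarize the following text in $max_words words:\n$text")


@dataclass
class LLMCallRecord:
    """Everything the guideline asks to log for one LLM call."""

    prompt_inputs: dict[str, str]
    output: str
    model: str
    metadata: dict[str, str]
    latency_ms: float


def call_and_record(
    llm: Callable[[str], str],
    model: str,
    prompt_inputs: dict[str, str],
    metadata: dict[str, str],
) -> LLMCallRecord:
    # substitute() raises KeyError on a missing variable, so incomplete prompts fail fast.
    prompt = SUMMARY_PROMPT.substitute(prompt_inputs)
    start = time.perf_counter()
    output = llm(prompt)
    latency_ms = (time.perf_counter() - start) * 1000
    return LLMCallRecord(prompt_inputs, output, model, metadata, latency_ms)


if __name__ == "__main__":
    # A fake model keeps the sketch runnable without network access.
    record = call_and_record(
        llm=lambda prompt: "A short summary.",
        model="example-model-v1",
        prompt_inputs={"max_words": "20", "text": "Some long text."},
        metadata={"user": "alice", "task": "summarize"},
    )
    print(json.dumps(asdict(record)))
```

The serialized record can then be written to ClickHouse or an eval log by whatever sink the project uses.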

📈 Evaluation Guidelines

  • Eval sets live in evals/
  • Use JSONL, CSV, or Pydantic datasets
  • Always include:
    • Model version
    • Prompt used
    • Data slice
    • Output logs
    • Metrics (hallucination, accuracy, etc.)
  • Reproducible evals must use CLI tooling (see below)
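
A possible shape for a JSONL eval result that carries every required field is sketched below. The `EvalResult` type, its field names, and the file name are assumptions for illustration, not an established schema.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class EvalResult:
    """One line of a JSONL results file (hypothetical schema)."""

    model_version: str
    prompt: str
    data_slice: str
    output: str
    metrics: dict[str, float]


def write_results(path: Path, results: list[EvalResult]) -> None:
    """Write one JSON object per line."""
    with path.open("w", encoding="utf-8") as fh:
        for result in results:
            fh.write(json.dumps(asdict(result)) + "\n")


def read_results(path: Path) -> list[EvalResult]:
    """Load results back, skipping blank lines."""
    with path.open(encoding="utf-8") as fh:
        return [EvalResult(**json.loads(line)) for line in fh if line.strip()]


if __name__ == "__main__":
    example = EvalResult(
        model_version="example-model-v1",
        prompt="summary_prompt",
        data_slice="short-docs",
        output="A short summary.",
        metrics={"accuracy": 0.9, "hallucination": 0.0},
    )
    with tempfile.TemporaryDirectory() as tmp:
        results_path = Path(tmp) / "results.jsonl"
        write_results(results_path, [example])
        print(read_results(results_path))
```

Keeping the record flat makes it easy to diff two runs line by line or load them into a table.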

🧡 CLI Tooling with Typer

Use Typer for all internal scripts and eval tools:

  • Type-safe, zero-boilerplate, modern
  • Built on Click but works like FastAPI
  • Add commands to scripts/cli.py:
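    import typer

    app = typer.Typer()


    @app.command()
    def evaluate(
        # typer.Option exposes these as --model/--dataset flags; a bare
        # `model: str` would be a positional argument instead.
        model: str = typer.Option(..., help="Model identifier to evaluate."),
        dataset: str = typer.Option("default", help="Eval dataset name."),
    ) -> None:
        """Run an eval for a model against a dataset."""
        ...


    @app.command()
    def ingest(file: str) -> None:
        """Ingest a data file (passed as a positional argument)."""
        ...


    if __name__ == "__main__":
        app()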
  • Run with:

    python scripts/cli.py evaluate --model gpt-4
    

📚 Documentation Expectations

  • Every module must have a module-level docstring
  • Functions and classes must have Google- or NumPy-style docstrings
  • CLI commands must include clear --help text via Typer
  • Markdown documentation lives in docs/ or on Notion
  • Major features must include a Notion link and Loom walkthrough
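
For reference, a module-level docstring plus a Google-style function docstring could look like this. The module and the `accuracy` helper are hypothetical examples, not part of the template.

```python
"""Scoring helpers for eval runs (hypothetical module used for illustration)."""


def accuracy(predictions: list[str], labels: list[str]) -> float:
    """Compute the fraction of predictions that match their labels.

    Args:
        predictions: Model outputs, one per example.
        labels: Expected outputs, aligned with ``predictions``.

    Returns:
        A value between 0.0 and 1.0.

    Raises:
        ValueError: If the inputs are empty or differ in length.
    """
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    matches = sum(pred == label for pred, label in zip(predictions, labels))
    return matches / len(labels)
```

Typer reuses a command function's docstring as its help text, so the same habit also covers the CLI help requirement.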

πŸ” Security Tips

  • Never log secrets or raw user inputs sent to LLMs
  • Sanitize user inputs before inserting them into prompts, to guard against prompt injection
  • Validate all data using Pydantic
  • Log user IDs and request trace IDs for all sensitive paths
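
The validation and no-secrets-in-logs rules above can be sketched as follows. The `ChatRequest` model, its field limits, and the secret pattern are all assumptions for illustration; a real service would tune them to its own inputs and key formats.

```python
import re
from typing import Optional

from pydantic import BaseModel, Field, ValidationError

# Hypothetical pattern for API-key-like tokens; adjust to the secrets actually in use.
SECRET_PATTERN = re.compile(r"\b(sk|key|token)-[A-Za-z0-9]{8,}\b")


def redact(text: str) -> str:
    """Mask secret-looking tokens before the text reaches a log sink."""
    return SECRET_PATTERN.sub("[REDACTED]", text)


class ChatRequest(BaseModel):
    """Hypothetical request schema; the limits are illustrative."""

    user_id: str = Field(min_length=1)
    request_id: str = Field(min_length=1)
    message: str = Field(min_length=1, max_length=2000)


def parse_request(data: dict) -> Optional[ChatRequest]:
    """Return a validated request, or None if the payload is malformed."""
    try:
        return ChatRequest(**data)
    except ValidationError:
        return None


if __name__ == "__main__":
    request = parse_request({"user_id": "u1", "request_id": "r1", "message": "hi"})
    print(request)
    print(redact("calling API with sk-abcdef123456"))
```

Rejecting malformed payloads at the boundary means downstream prompt code only ever sees validated, length-bounded fields.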

🧠 Final Note: Python That Thinks

Write Python that's composable, observable, and reliable, for people and agents alike.

Start from the template. Use Cursor. Use Typer. Build responsibly.