# Python Engineering Guidelines
## 🎯 Why This Matters
Python is the backbone of our backend systems, AI agents, and eval infrastructure. These guidelines help us:
- Maintain a clean, composable, and observable codebase
- Align across all contributors and teams
- Move fast without sacrificing testability or reliability
- Build tools usable by both humans and LLMs
## 🧱 Use the Official Template
All new projects must be based on our template:
🔗 github.com/GenerativeModels-ai/python-template
It includes:
- `src/` structure for clean imports
- `uv` for dependency management
- `pytest` + `hypothesis` testing setup
- Logging, eval tracking, and LLM prompt scaffolding
- CLI scaffolding with Typer
## 💻 Use Cursor as Your IDE
Use Cursor as the default editor:
- Fast, AI-native dev environment
- Seamless GitHub and LLM integration
- Excellent for working on prompt-driven, version-controlled workflows
- Works well with our `src/` pattern and Python standards
## 📁 Project Layout (`src/` pattern)
```text
project_name/
├── src/
│   └── app/
│       ├── models/
│       ├── services/
│       ├── llm/
│       ├── routes/
│       └── utils/
├── tests/
├── scripts/
├── evals/
├── requirements.txt
├── pyproject.toml
└── README.md
```
## 🧰 Tooling Stack
| Category | Preferred Tool |
| --- | --- |
| Dependency Manager | `uv` |
| Test Framework | `pytest` + `hypothesis` |
| Type Checking | `mypy`, `pyright` |
| Formatter | `black` |
| Linter | `ruff` |
| Web Framework | FastAPI |
| Data Models | Pydantic |
| Logging | `loguru` |
| CLI Tooling | Typer |
| Eval/LLM Logging | ClickHouse |
## ✅ Code Quality & Style
- Follow PEP 8
- Format code with `black`
- Lint with `ruff`
- Use `mypy` or `pyright` for static type checking
- All public APIs, routes, and LLM functions must be fully typed
- Use semantic logging with `loguru`, including structured context (user, request_id, etc.); see the sketch below
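
A minimal sketch of what that looks like in practice, using loguru's `bind()` and JSON serialization; `get_user_profile` is a hypothetical function, not part of the template:

```python
import sys

from loguru import logger

# Emit JSON lines so the bound context survives into downstream log storage
logger.remove()
logger.add(sys.stderr, serialize=True)

def get_user_profile(user_id: str, request_id: str) -> dict:
    # bind() attaches structured context to every record logged via `log`
    log = logger.bind(user=user_id, request_id=request_id)
    log.info("fetching profile")
    profile = {"id": user_id}  # stand-in for a real lookup
    log.info("profile fetched")
    return profile
```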
## 📦 Dependency Management with uv
- Use `uv` for all environment setup and dependency installation
- Install with `uv pip install <package>`, then pin with `uv pip freeze > requirements.txt`
- Use `uv pip sync` to reproduce environments
- Only commit `pyproject.toml` and `requirements.txt`
## 🧪 Testing Principles
- Use `pytest` + `hypothesis` (see the sketch after this list)
- Structure tests to mirror your `src/` layout
- Add `@pytest.mark.slow` for eval-heavy or long-running tests
- Use fixtures and mocks for I/O, LLMs, and services
- Every business-logic function must have a test
- Minimum coverage target: 80% overall, 100% for eval code and agents
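
A minimal sketch of these conventions; `normalize_score` is a hypothetical stand-in for real business logic, and the `slow` marker is assumed to be registered in `pyproject.toml`:

```python
# tests/test_scoring.py -- mirrors a hypothetical src/app/services/scoring.py
import pytest
from hypothesis import given, strategies as st

def normalize_score(raw: float) -> float:
    """Stand-in for the function under test; clamps a score to [0, 1]."""
    return min(max(raw, 0.0), 1.0)

@given(st.floats(allow_nan=False, allow_infinity=False))
def test_normalize_score_is_bounded(raw: float) -> None:
    # hypothesis generates many raw values; the property must hold for all
    assert 0.0 <= normalize_score(raw) <= 1.0

@pytest.mark.slow
def test_full_eval_run() -> None:
    ...  # eval-heavy path; deselect in CI with `pytest -m "not slow"`
```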
## 🤖 Prompt / LLM Development
- Place all prompts in `src/app/llm/prompts/`
- Use structured builders or Jinja, not raw f-strings
- Version prompts using Git
- Track evals, feedback, and versions in ClickHouse or eval logs
- Always log (a sketch follows this list):
  - Prompt inputs
  - Output
  - Model/version
  - Metadata (user, task, etc.)
  - Latency
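
A hedged sketch of a Jinja-based builder that logs all five fields; the template name (`qa.j2`) and the `call_model` stub are illustrative assumptions, not a fixed API:

```python
import time

from jinja2 import Environment, FileSystemLoader
from loguru import logger

# Templates live in the prompts directory described above
env = Environment(loader=FileSystemLoader("src/app/llm/prompts"))

def call_model(model: str, prompt: str) -> str:
    """Stand-in for your real model client."""
    raise NotImplementedError

def run_prompt(user: str, task: str, question: str, model: str = "gpt-4") -> str:
    prompt = env.get_template("qa.j2").render(question=question)  # hypothetical template
    start = time.perf_counter()
    output = call_model(model, prompt)
    latency = time.perf_counter() - start
    # One structured record per call: inputs, output, model, metadata, latency
    logger.bind(
        user=user, task=task, model=model,
        prompt=prompt, output=output, latency=latency,
    ).info("llm call complete")
    return output
```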
## 📊 Evaluation Guidelines
- Eval sets live in `evals/`
- Use JSONL, CSV, or Pydantic datasets (one possible record shape is sketched below)
- Always include:
  - Model version
  - Prompt used
  - Data slice
  - Output logs
  - Metrics (hallucination, accuracy, etc.)
- Reproducible evals must use CLI tooling (see below)
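
One way to satisfy the checklist above is a Pydantic record per eval row, serialized as a JSONL line; this sketch assumes Pydantic v2, and the field names are illustrative:

```python
from pydantic import BaseModel, ConfigDict

class EvalRecord(BaseModel):
    """One row of an eval set in evals/, stored as a JSONL line."""
    # allow the model_* field name without Pydantic's protected-namespace warning
    model_config = ConfigDict(protected_namespaces=())

    model_version: str
    prompt: str          # or a pointer to the versioned prompt, e.g. "qa.j2@<git-sha>"
    data_slice: str
    output: str
    metrics: dict[str, float]  # e.g. {"accuracy": 0.92, "hallucination": 0.03}

# Round-trip one JSONL line
row = EvalRecord(
    model_version="gpt-4-0613",
    prompt="qa.j2",
    data_slice="support-tickets",
    output="...",
    metrics={"accuracy": 0.92},
)
assert EvalRecord.model_validate_json(row.model_dump_json()) == row
```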
## 🧵 CLI Tooling with Typer
Use Typer for all internal scripts and eval tools:
- Type-safe, zero-boilerplate, modern
- Built on Click but works like FastAPI
- Add commands to `scripts/cli.py`:
```python
import typer

app = typer.Typer()

@app.command()
def evaluate(
    # Declared as options so the documented `--model` flag works
    model: str = typer.Option(..., help="Model to evaluate, e.g. gpt-4"),
    dataset: str = typer.Option("default", help="Eval dataset to run"),
):
    ...

@app.command()
def ingest(file: str):
    ...

if __name__ == "__main__":
    app()
```
- Run with:

```bash
python scripts/cli.py evaluate --model gpt-4
```
## 📄 Documentation Expectations
- Every module must have a module-level docstring
- Functions and classes must have Google- or NumPy-style docstrings (example below)
- CLI commands must include clear `--help` text via Typer
- Markdown documentation lives in `docs/` or on Notion
- Major features must include a Notion link and a Loom walkthrough
## 🔐 Security Tips
- Never log secrets or raw user inputs from LLMs
- Sanitize user inputs before they reach a prompt, to limit prompt injection
- Validate all data using Pydantic (a sketch follows this list)
- Log user IDs and request trace IDs for all sensitive paths
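
A hedged sketch combining the last three points; the route name, field limits, and sanitization rule are illustrative assumptions (Pydantic v2):

```python
from pydantic import BaseModel, Field, field_validator

class AskRequest(BaseModel):
    """Validated inbound payload for a hypothetical /ask route."""
    user_id: str = Field(min_length=1)
    request_id: str = Field(min_length=1)  # trace ID, logged on sensitive paths
    question: str = Field(min_length=1, max_length=2000)

    @field_validator("question")
    @classmethod
    def strip_control_chars(cls, value: str) -> str:
        # crude sanitization before the text ever reaches a prompt template
        return "".join(ch for ch in value if ch.isprintable() or ch == "\n")
```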
## 🧠 Final Note: Python That Thinks
Write Python that's composable, observable, and reliable, for people and agents alike.
Start from the template. Use Cursor. Use Typer. Build responsibly.