# Python Engineering Guidelines
## 🎯 Why This Matters
Python is the backbone of our backend systems, AI agents, and eval infrastructure. These guidelines help us:
- Maintain a clean, composable, and observable codebase
- Align across all contributors and teams
- Move fast without sacrificing testability or reliability
- Build tools usable by both humans and LLMs
## 🧱 Use the Official Template
All new projects must be based on our template:
🔗 github.com/GenerativeModels-ai/python-template
It includes:
- `src/` structure for clean imports
- `uv` for dependency management
- `pytest` + `hypothesis` testing setup
- Logging, eval tracking, and LLM prompt scaffolding
- CLI scaffolding with Typer
## 💻 Use Cursor as Your IDE
Use Cursor as the default editor:
- Fast, AI-native dev environment
- Seamless GitHub and LLM integration
- Excellent for working on prompt-driven, version-controlled workflows
- Works well with our `src/` pattern and Python standards
## 📁 Project Layout (`src/` pattern)
```text
project_name/
├── src/
│   └── app/
│       ├── models/
│       ├── services/
│       ├── llm/
│       ├── routes/
│       └── utils/
├── tests/
├── scripts/
├── evals/
├── requirements.txt
├── pyproject.toml
└── README.md
```
## 🧰 Tooling Stack
| Category | Preferred Tool |
| --- | --- |
| Dependency Manager | `uv` |
| Test Framework | `pytest` + `hypothesis` |
| Type Checking | `mypy`, `pyright` |
| Formatter | `black` |
| Linter | `ruff` |
| Web Framework | FastAPI |
| Data Models | Pydantic |
| Logging | `loguru` |
| CLI Tooling | Typer |
| Eval/LLM Logging | ClickHouse |
## ✅ Code Quality & Style
- Follow PEP 8
- Format code with `black`
- Lint with `ruff`
- Use `mypy` or `pyright` for static type checking
- All public APIs, routes, and LLM functions must be fully typed
- Use semantic logging with `loguru`, including structured context (user, request_id, etc.); see the sketch below
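
A minimal sketch of what that looks like in practice, using loguru's `bind()` and JSON serialization; `get_user_profile` is a hypothetical function, not part of the template:

```python
import sys

from loguru import logger

# Emit JSON lines so the bound context survives into downstream log storage
logger.remove()
logger.add(sys.stderr, serialize=True)

def get_user_profile(user_id: str, request_id: str) -> dict:
    # bind() attaches structured context to every record logged via `log`
    log = logger.bind(user=user_id, request_id=request_id)
    log.info("fetching profile")
    profile = {"id": user_id}  # stand-in for a real lookup
    log.info("profile fetched")
    return profile
```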
## 📦 Dependency Management with uv
- Use `uv` for all environment setup and dependency installation
- Install with `uv pip install <package>`, then pin with `uv pip freeze > requirements.txt`
- Use `uv pip sync` to reproduce environments
- Only commit `pyproject.toml` and `requirements.txt`
## 🧪 Testing Principles
- Use `pytest` + `hypothesis` (see the sketch after this list)
- Structure tests to mirror your `src/` layout
- Add `@pytest.mark.slow` for eval-heavy or long-running tests
- Use fixtures and mocks for I/O, LLMs, and services
- Every business-logic function must have a test
- Minimum coverage target: 80% overall, 100% for eval code and agents
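
A minimal sketch of these conventions; `normalize_score` is a hypothetical stand-in for real business logic, and the `slow` marker is assumed to be registered in `pyproject.toml`:

```python
# tests/test_scoring.py -- mirrors a hypothetical src/app/services/scoring.py
import pytest
from hypothesis import given, strategies as st

def normalize_score(raw: float) -> float:
    """Stand-in for the function under test; clamps a score to [0, 1]."""
    return min(max(raw, 0.0), 1.0)

@given(st.floats(allow_nan=False, allow_infinity=False))
def test_normalize_score_is_bounded(raw: float) -> None:
    # hypothesis generates many raw values; the property must hold for all
    assert 0.0 <= normalize_score(raw) <= 1.0

@pytest.mark.slow
def test_full_eval_run() -> None:
    ...  # eval-heavy path; deselect in CI with `pytest -m "not slow"`
```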
## 🤖 Prompt / LLM Development
- Place all prompts in `src/app/llm/prompts/`
- Use structured builders or Jinja, not raw f-strings
- Version prompts using Git
- Track evals, feedback, and versions in ClickHouse or eval logs
- Always log (a sketch follows this list):
  - Prompt inputs
  - Output
  - Model/version
  - Metadata (user, task, etc.)
  - Latency
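
A hedged sketch of a Jinja-based builder that logs all five fields; the template name (`qa.j2`) and the `call_model` stub are illustrative assumptions, not a fixed API:

```python
import time

from jinja2 import Environment, FileSystemLoader
from loguru import logger

# Templates live in the prompts directory described above
env = Environment(loader=FileSystemLoader("src/app/llm/prompts"))

def call_model(model: str, prompt: str) -> str:
    """Stand-in for your real model client."""
    raise NotImplementedError

def run_prompt(user: str, task: str, question: str, model: str = "gpt-4") -> str:
    prompt = env.get_template("qa.j2").render(question=question)  # hypothetical template
    start = time.perf_counter()
    output = call_model(model, prompt)
    latency = time.perf_counter() - start
    # One structured record per call: inputs, output, model, metadata, latency
    logger.bind(
        user=user, task=task, model=model,
        prompt=prompt, output=output, latency=latency,
    ).info("llm call complete")
    return output
```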
## 📊 Evaluation Guidelines
- Eval sets live in `evals/`
- Use JSONL, CSV, or Pydantic datasets (one possible record shape is sketched below)
- Always include:
  - Model version
  - Prompt used
  - Data slice
  - Output logs
  - Metrics (hallucination, accuracy, etc.)
- Reproducible evals must use CLI tooling (see below)
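
One way to satisfy the checklist above is a Pydantic record per eval row, serialized as a JSONL line; this sketch assumes Pydantic v2, and the field names are illustrative:

```python
from pydantic import BaseModel, ConfigDict

class EvalRecord(BaseModel):
    """One row of an eval set in evals/, stored as a JSONL line."""
    # allow the model_* field name without Pydantic's protected-namespace warning
    model_config = ConfigDict(protected_namespaces=())

    model_version: str
    prompt: str          # or a pointer to the versioned prompt, e.g. "qa.j2@<git-sha>"
    data_slice: str
    output: str
    metrics: dict[str, float]  # e.g. {"accuracy": 0.92, "hallucination": 0.03}

# Round-trip one JSONL line
row = EvalRecord(
    model_version="gpt-4-0613",
    prompt="qa.j2",
    data_slice="support-tickets",
    output="...",
    metrics={"accuracy": 0.92},
)
assert EvalRecord.model_validate_json(row.model_dump_json()) == row
```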
## 🧵 CLI Tooling with Typer
Use Typer for all internal scripts and eval tools:
- Type-safe, zero-boilerplate, modern
- Built on Click but works like FastAPI
- Add commands to `scripts/cli.py`:
```python
import typer

app = typer.Typer()

@app.command()
def evaluate(
    # Declared as options so the documented `--model` flag works
    model: str = typer.Option(..., help="Model to evaluate, e.g. gpt-4"),
    dataset: str = typer.Option("default", help="Eval dataset to run"),
):
    ...

@app.command()
def ingest(file: str):
    ...

if __name__ == "__main__":
    app()
```
- Run with:

```bash
python scripts/cli.py evaluate --model gpt-4
```
## 📄 Documentation Expectations
- Every module must have a module-level docstring
- Functions and classes must have Google- or NumPy-style docstrings (example below)
- CLI commands must include clear `--help` text via Typer
- Markdown documentation lives in `docs/` or on Notion
- Major features must include a Notion link and a Loom walkthrough
## 🔐 Security Tips
- Never log secrets or raw user inputs from LLMs
- Sanitize user inputs before they reach a prompt, to limit prompt injection
- Validate all data using Pydantic (a sketch follows this list)
- Log user IDs and request trace IDs for all sensitive paths
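
A hedged sketch combining the last three points; the route name, field limits, and sanitization rule are illustrative assumptions (Pydantic v2):

```python
from pydantic import BaseModel, Field, field_validator

class AskRequest(BaseModel):
    """Validated inbound payload for a hypothetical /ask route."""
    user_id: str = Field(min_length=1)
    request_id: str = Field(min_length=1)  # trace ID, logged on sensitive paths
    question: str = Field(min_length=1, max_length=2000)

    @field_validator("question")
    @classmethod
    def strip_control_chars(cls, value: str) -> str:
        # crude sanitization before the text ever reaches a prompt template
        return "".join(ch for ch in value if ch.isprintable() or ch == "\n")
```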
## 🧠 Final Note: Python That Thinks
Write Python that's composable, observable, and reliable, for people and agents alike.
Start from the template. Use Cursor. Use Typer. Build responsibly.