Production Python
Apply every rule below whenever writing or modifying Python code.
When to Apply
- Writing new Python modules, classes, or functions
- Creating Pydantic models or SQLAlchemy ORM models
- Setting up project structure (imports, logging, config, paths)
- Writing or updating tests
- Reviewing Python code for production readiness
- Scaffolding FastAPI endpoints, repositories, or pipelines
Quick Reference
| Rule | Pattern |
|---|---|
| Module header | """Module Name: filename.py\nDescription: ...""" |
| Section markers | 77-dash lines with # SECTION: Name |
| End of module | # END OF MODULE block — always last |
| Imports | 3 groups: stdlib / third-party / local, alphabetical |
| Logger | logger = setup_logger(__name__) — never print() |
| Type hints | All args + return, Optional[X] not X | None |
| Docstrings | Google style, class docstring on class not __init__ |
| Naming | snake_case funcs, PascalCase classes, UPPER_SNAKE constants |
| Formatting | 100-char target, trailing commas, 2 blanks between top-level |
| Strings | f-strings only, pathlib.Path for all paths |
| Errors | Specific exceptions, from e chaining, log before raise |
| Testing | test_<func>_<scenario>, Arrange/Act/Assert, mock all I/O |
| Architecture | Repository pattern, centralized paths, registry for dispatch |
| Pydantic | ConfigDict, Field(description=...), separate Create/Response |
| SQLAlchemy | Mapped[T] + mapped_column(), back_populates |
| Package mgmt | uv add, uv sync --frozen in CI |
1. Module Header
Every .py file starts with a module docstring — no exceptions:
"""
Module Name: filename.py
Description:
Clear, concise description of the module's purpose and role
in the overall system. Can span multiple lines.
"""
No author, date, or version — that belongs in version control.
2. Section Markers
Wrap every logical section in 77-character dash lines with blank lines before and after:
# --------------------------------------------------------------------------
# SECTION: Imports
# --------------------------------------------------------------------------
Standard order (include only what's needed): Imports -> Logger Initialization -> Constants -> Type Aliases -> [content sections]
Class sub-sections use shorter inline markers:
class ExampleClass:
# --- Constructor ---
def __init__(self, param: str) -> None: ...
# --- Public API ---
def process(self, data: List[Dict]) -> pd.DataFrame: ...
# --- Private Helpers ---
def _validate(self) -> None: ...
3. End-of-Module Marker
The very last thing in every .py file — no code or comments after it:
# --------------------------------------------------------------------------
# END OF MODULE
# --------------------------------------------------------------------------
4. Import Organization
Three groups with comment headers, separated by blank lines:
# Standard library imports
import logging
import os
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple
# Third-party imports
import pandas as pd
from pydantic import BaseModel
# Local application imports
from src.logger import setup_logger
from src.utils.helpers import load_prompt
Rules:
- Alphabetical within each group — bare
import Xbeforefrom X import Y - Group multiple from same module:
from typing import Dict, List, Optional - Absolute imports only — never relative (
from .module import x) - Never
import *
5. Logger
Every project needs a central logger.py (see references/REFERENCE.md for full implementation). In every module, one line after imports:
logger = setup_logger(__name__)
Log level guide:
logger.debug(f"Chunk {i}/{total} processed — tokens: {tokens}")
logger.info(f"Processing complete — {count} records in {elapsed:.2f}s")
logger.warning(f"Unexpected mode '{mode}', falling back to default")
logger.error(f"Chain execution failed: {e}")
logger.critical("Database connection lost — shutting down")
Never use print() in production code.
6. Type Hints
Required on all function signatures — no exceptions for public APIs:
def load_prompt(file_path: str) -> str: ...
def process(items: List[str], limit: int = 10) -> Optional[Dict[str, int]]: ...
Multi-line signatures — one param per line, trailing comma:
def process_chunks(
chunks: List[Dict[str, Any]],
prompts: List[Dict[str, str]],
run_sequentially: bool = False,
) -> Tuple[List[Dict[str, Any]], int, int]:
Class attributes — document in the class docstring Attributes: section.
Rules:
Optional[X]notX | None(Python 3.9 compat)- Import
Dict,List,Tuple,Optionalfromtyping - Always annotate return type — even
-> None - Use
Anysparingly
7. Docstrings — Google Style
Class docstring — on the class, never on __init__:
class BaseAgent:
"""
Base class providing shared methods for all LLM-based agents.
Attributes:
model_name (str): LLM model identifier.
response_schema: Pydantic model for parsing output.
"""
Function docstring:
def fetch_data(source: str, timeout: int = 30) -> List[Dict]:
"""
Fetch structured data from source.
Args:
source: URL or file path to fetch from.
timeout: Request timeout in seconds (default: 30).
Returns:
List of record dicts. Empty list if nothing found.
Raises:
ValueError: If source is empty or malformed.
"""
Pydantic fields — always use Field(description=...):
row_index: int = Field(description="Row index in the original data.")
confidence: float = Field(description="Confidence score 0.0-1.0.", default=1.0)
8. Naming Conventions
| Kind | Convention | Example |
|---|---|---|
| Variables, functions, modules | snake_case | user_id, parse_resume() |
| Private methods / attributes | _prefix | _validate(), _cache |
| Classes | PascalCase | MatchResult, BaseAgent |
| Constants | UPPER_SNAKE_CASE | MAX_RETRIES, DEFAULT_TIMEOUT |
| Type aliases | PascalCase | ChunkType = Dict[str, Any] |
Descriptive, unambiguous names. Avoid abbreviations except standard ones (url, id, db, api).
Class patterns: [Domain]Agent, [Domain]Manager, [Domain]Repository, descriptive nouns for models.
9. Formatting Standards
Target line length: 100 characters (hard limit: 120).
Blank lines: 2 between top-level definitions, 1 between methods, 1 before/after section markers.
Long signatures — one param per line, trailing comma:
def process_data(
input_path: Path,
output_dir: Path,
chunk_size: int = 1000,
) -> List[Dict[str, Any]]:
String continuation — parenthesized implicit concatenation:
error_message = (
f"Failed to process file {file_path}. "
f"Error: {error_details}."
)
Method chains — one call per line:
result = (
df.query("status == 'active'")
.groupby("category")
.agg(total=("amount", "sum"))
.reset_index()
)
Comments: 2 spaces before # for inline; block comments above code preferred.
10. Strings & Paths
f-strings exclusively — never .format() or %:
logger.info(f"Processing {count} records for job {job_id}")
pathlib.Path always — never string concatenation:
output_path = Path(base_dir) / "results" / f"{job_id}.json"
11. Error Handling
Catch specific exceptions. Never bare except:. Log before raising:
try:
result = parse_document(file_path)
except FileNotFoundError:
logger.error(f"File not found: {file_path}")
raise
except ValueError as e:
logger.warning(f"Parse failed for {file_path}: {e}")
return None
Exception chaining — always from e to preserve traceback:
except json.JSONDecodeError as e:
raise ExtractionError(f"Invalid JSON in {file_path}") from e
finally for cleanup:
try:
conn = get_connection()
result = conn.execute(query)
finally:
conn.close()
Retry with Tenacity for transient failures:
from tenacity import retry, stop_after_attempt, wait_exponential
@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, max=10))
def call_llm(prompt: str) -> str:
"""Call LLM API with automatic retry on transient failures."""
return client.chat(prompt).content
Full traceback logging: logger.error(f"Failed: {e}\n{traceback.format_exc()}")
Custom exceptions for domain errors: class ExtractionError(Exception): ...
12. Architecture Patterns
Repository Pattern — all DB access through repository classes:
class JobRepository:
"""Encapsulates all job-related database operations."""
def __init__(self, session: Session) -> None:
self.session = session
def get_by_id(self, job_id: uuid.UUID) -> Optional[Job]:
return self.session.get(Job, job_id)
def create(self, **kwargs) -> Job:
job = Job(**kwargs)
self.session.add(job)
self.session.commit()
self.session.refresh(job)
return job
Centralized Paths — single source of truth in config/paths.py:
from pathlib import Path
PROJECT_ROOT = Path(__file__).resolve().parent.parent
DATA_DIR = PROJECT_ROOT / "data"
CONFIG_DIR = PROJECT_ROOT / "config"
LOGS_DIR = PROJECT_ROOT / "logs"
Registry Pattern — extensible dispatch:
PROCESSOR_REGISTRY: Dict[str, Callable] = {}
def register_processor(name: str, func: Callable) -> None:
PROCESSOR_REGISTRY[name] = func
def process(name: str, data: Dict) -> Dict:
return PROCESSOR_REGISTRY.get(name, process_default)(data)
Configuration — YAML in config/, never hardcoded values.
13. Testing Patterns
Naming: test_<module>.py in tests/ mirroring src/. Functions: test_<func>_<scenario>.
Structure — Arrange / Act / Assert:
def test_process_chunks_returns_expected_count() -> None:
"""Verify process_chunks returns one result per input chunk."""
# Arrange
chunks = [{"text": "hello"}, {"text": "world"}]
pipeline = ProcessingPipeline(model_name="test")
# Act
results = pipeline.process_chunks(chunks)
# Assert
assert len(results) == 2
assert all("output" in r for r in results)
Fixtures — @pytest.fixture for shared setup, conftest.py for cross-module:
@pytest.fixture
def db_session(tmp_path: Path) -> Generator[Session, None, None]:
engine = create_engine(f"sqlite:///{tmp_path / 'test.db'}")
Base.metadata.create_all(engine)
with Session(engine) as session:
yield session
Mocking — prefer monkeypatch over unittest.mock.patch:
def test_fetch_data_handles_timeout(monkeypatch: pytest.MonkeyPatch) -> None:
monkeypatch.setattr(requests, "get", Mock(side_effect=Timeout))
assert fetch_data("https://api.example.com") is None
Rules: one behavior per test, pytest.raises for expected exceptions, mock all I/O.
14. Pydantic Models
Separate models for API boundaries:
class CandidateCreate(BaseModel):
"""Request schema for creating a candidate."""
name: str = Field(description="Full legal name.")
email: str = Field(description="Primary email address.")
class CandidateResponse(BaseModel):
"""Response schema — never expose ORM objects directly."""
candidate_id: uuid.UUID
name: str
created_at: datetime
model_config = ConfigDict(from_attributes=True)
Structured LLM output:
class ExtractedField(BaseModel):
field_name: str = Field(description="Name of the extracted field.")
value: str = Field(description="Extracted value.")
confidence: float = Field(description="Confidence score 0.0-1.0.")
class ExtractionResult(BaseModel):
fields: List[ExtractedField]
Rules: ConfigDict (not deprecated class Config), Field(description=...) on all fields, one model per concept.
15. SQLAlchemy Models (2.0)
class Job(Base):
"""ORM model for the jobs table."""
__tablename__ = "jobs"
job_id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
title: Mapped[str] = mapped_column(String(200), nullable=False)
status: Mapped[str] = mapped_column(String(50), default="draft")
created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=func.now())
candidates: Mapped[List["Candidate"]] = relationship(back_populates="job")
def __repr__(self) -> str:
return f"Job(job_id={self.job_id!r}, title={self.title!r})"
Rules:
Mapped[T]+mapped_column()— never legacyColumn()- UUID primary keys, all timestamps
timezone=True,server_default=func.now() back_populates(neverbackref),__repr__on every model- No business logic in models — pure data containers
- Never
max()on UUID columns — userow_number()window functions
16. Anti-Patterns
try: ...
except: ... # Bare except — hides bugs
print("done") # Use logger, never print()
def add(items=[]): # Mutable default — shared state bug
items.append(1)
from module import * # Wildcard — pollutes namespace
path = "/base/" + subdir + "/f.txt" # Use pathlib
msg = "Hello %s" % name # Use f-strings
from .utils import helper # Use absolute imports
API_KEY = "sk-abc123" # Use environment variables
Also avoid: logic in __init__.py (re-exports only), god functions (> ~50 lines).
17. Pre-Commit Checklist
- Module docstring with
Module Name:andDescription: - Section markers (77-dash lines) around logical sections
-
# END OF MODULEas the very last line - Imports in 3 groups, sorted alphabetically
-
logger = setup_logger(__name__)present - Type hints on all public function args + return
- Google-style docstrings on public functions/classes
- No
print(), no bareexcept:, no mutable defaults - f-strings throughout,
pathlib.Pathfor file paths - Exception chaining uses
from e - Retry decorators on external API calls
- Public functions have at least one test
- No hardcoded paths — use
config/paths.py
18. CHANGELOG.md
Follow Keep a Changelog + Semantic Versioning. Full template in references/REFERENCE.md.
Rules: reverse chronological, YYYY-MM-DD dates, sections Added/Changed/Fixed/Removed/Security (omit empty ones). MAJOR = breaking, MINOR = features, PATCH = fixes.
19. README.md
Required sections: Title -> Stack -> Architecture -> Quick Start -> Environment Variables -> Development -> Project Structure -> Docs
Full template in references/REFERENCE.md. Start broad (what + why), get specific (how). Quick Start < 5 steps.
20. Jupyter Notebooks
Notebooks in notebooks/, snake_case names (no dates). Cell order: title -> imports -> config -> processing -> results. Use setup_logger(__name__). Clear outputs before committing. Full conventions in references/REFERENCE.md.
21. Package Management with uv
uv add fastapi # Add dependency
uv add pandas==2.2.3 # Pin exact version for production
uv add --dev pytest ruff black # Dev dependency
uv sync # Install from lock file
uv sync --no-dev --frozen # CI/CD — production only
Rules: pin exact versions in production, commit both pyproject.toml and uv.lock, dev tools under [project.optional-dependencies].
Dockerfile: COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv then uv sync --system --no-dev --frozen.
For full templates, complete code examples, and additional resources, see references/REFERENCE.md.