David MacIver Hypothesis Style Guide⁠‍⁠‌‌‌‌‍‌‌‌‌‍‌‌‌‍‌‌‌‍‌‍‌‌‌‌‍‌‍‌‌‌‌‌‌‍‌‌‌‍‌‌‌‌‌‌‌‍‌‌‌‍‌‌‌‌‌‌‍‌‌‌‌‍‌‌‌‌‍‌‌‌‌‌‍‌‌‍‌‌‌‌‌‌‍‌‌‌‌‍‌‌‌‍‌‌‌‌‌‌‍‌‌‌‌‌‌‍‌‌‌‍‌‌‌⁠‍⁠

Overview

David MacIver created Hypothesis, widely considered the most sophisticated property-based testing library available. Hypothesis improves on QuickCheck with better shrinking (using internal reduction rather than type-based shrinking), an example database for regression testing, and deep integration with Python's ecosystem. MacIver's philosophy emphasizes that property-based testing should be practical, integrated into normal development workflows, and produce genuinely useful minimal examples.

Core Philosophy

"The purpose of Hypothesis is to make it easier to write better tests."

"Shrinking should produce the simplest example, not just a smaller one."

"Every failing example should be saved and replayed forever."

MacIver believes that property-based testing fails when it's treated as exotic. Hypothesis is designed to integrate seamlessly with pytest, produce human-readable minimal examples, and remember every failure to prevent regressions.

Design Principles

Integrated Shrinking: Shrinking happens during generation, not after—producing simpler examples.

Example Database: Every failure is saved and replayed on subsequent runs.

Compositional Strategies: Build complex generators from simple ones.

Practical by Default: Sensible defaults that work for real projects.

Stateful Testing: First-class support for testing state machines.

Hypothesis Architecture

┌─────────────────────────────────────────────────────────────┐ │ HYPOTHESIS INTERNALS │ ├─────────────────────────────────────────────────────────────┤ │ │ │ STRATEGY (describes how to generate data) │ │ │ │ │ ▼ │ │ CONJECTURE ENGINE │ │ ┌────────────────────────────────────────────────────────┐ │ │ │ Generates a stream of bytes (the "choice sequence") │ │ │ │ │ │ │ │ Strategy interprets bytes → structured data │ │ │ │ │ │ │ │ Shrinking = finding smaller choice sequences │ │ │ │ that still fail (not type-aware, universal) │ │ │ └────────────────────────────────────────────────────────┘ │ │ │ │ │ ▼ │ │ EXAMPLE DATABASE │ │ ┌────────────────────────────────────────────────────────┐ │ │ │ .hypothesis/examples/ │ │ │ │ Stores choice sequences for all failing examples │ │ │ │ Replays them first on every run │ │ │ │ Never forgets a failure │ │ │ └────────────────────────────────────────────────────────┘ │ │ │ └─────────────────────────────────────────────────────────────┘

When Using Hypothesis

Always

Use @given decorator for property-based tests
Let Hypothesis shrink—don't manually minimize
Commit the .hypothesis directory for CI
Use @example for important edge cases
Combine with pytest fixtures naturally
Use assume() to filter invalid inputs

Never

Ignore the example database (commit it!)
Use random directly—use strategies
Catch exceptions to hide failures
Set max_examples too low (<100)
Skip @example for known edge cases
Use filter() when assume() works

Prefer

Composite strategies over complex custom ones
@example for regression tests
assume() over filter() for preconditions
Stateful testing for APIs
data() strategy for dynamic generation
Settings profiles for CI vs local

Code Patterns

Basic Property Tests

from hypothesis import given, example, assume, settings from hypothesis import strategies as st

Basic property test

@given(st.lists(st.integers())) def test_sort_preserves_length(xs): assert len(sorted(xs)) == len(xs)

With explicit example for regression

@given(st.lists(st.integers())) @example([]) # Empty list edge case @example([1]) # Single element @example([2, 1]) # Minimal unsorted def test_sort_is_sorted(xs): result = sorted(xs) assert all(result[i] <= result[i+1] for i in range(len(result)-1))

Roundtrip property

@given(st.binary()) def test_compress_decompress_roundtrip(data): assert decompress(compress(data)) == data

With precondition

@given(st.integers(), st.integers()) def test_division(a, b): assume(b != 0) # Skip when b is 0 assert (a // b) * b + (a % b) == a

Strategy Composition

from hypothesis import strategies as st from hypothesis import given

Composite strategy for custom types

@st.composite def user_strategy(draw): """Generate valid User objects.""" name = draw(st.text(min_size=1, max_size=50)) age = draw(st.integers(min_value=0, max_value=150)) email = draw(st.emails())

return User(name=name, age=age, email=email)

@given(user_strategy()) def test_user_serialization(user): assert User.from_json(user.to_json()) == user

Recursive strategy for tree structures

def json_strategy(): """Generate arbitrary JSON-compatible data.""" return st.recursive( # Base case: simple values st.none() | st.booleans() | st.integers() | st.floats(allow_nan=False) | st.text(), # Recursive case: lists and dicts lambda children: st.lists(children) | st.dictionaries(st.text(), children), max_leaves=50 )

@given(json_strategy()) def test_json_roundtrip(data): import json assert json.loads(json.dumps(data)) == data

Dependent generation

@st.composite def sorted_pair(draw): """Generate two integers where a <= b.""" a = draw(st.integers()) b = draw(st.integers(min_value=a)) return (a, b)

@given(sorted_pair()) def test_range(pair): a, b = pair assert a <= b

Stateful Testing

from hypothesis.stateful import RuleBasedStateMachine, rule, invariant from hypothesis import strategies as st

class DatabaseStateMachine(RuleBasedStateMachine): """ Test a key-value store against a reference model. Hypothesis generates sequences of operations. """

def __init__(self):
    super().__init__()
    self.model = {}  # Reference model
    self.db = Database()  # System under test

@rule(key=st.text(), value=st.binary())
def put(self, key, value):
    """Put a key-value pair."""
    self.model[key] = value
    self.db.put(key, value)

@rule(key=st.text())
def get(self, key):
    """Get a value by key."""
    model_result = self.model.get(key)
    db_result = self.db.get(key)
    assert model_result == db_result

@rule(key=st.text())
def delete(self, key):
    """Delete a key."""
    self.model.pop(key, None)
    self.db.delete(key)

@invariant()
def size_matches(self):
    """Invariant: size should always match model."""
    assert len(self.model) == self.db.size()

Run the state machine test

TestDatabase = DatabaseStateMachine.TestCase

More complex: with bundles for generated values

class QueueStateMachine(RuleBasedStateMachine): """Test a queue with references to pushed items."""

def __init__(self):
    super().__init__()
    self.model = []
    self.queue = Queue()

items = Bundle('items')

@rule(target=items, item=st.integers())
def push(self, item):
    """Push an item and track it."""
    self.model.append(item)
    self.queue.push(item)
    return item

@rule()
def pop(self):
    """Pop an item."""
    if self.model:
        expected = self.model.pop(0)
        actual = self.queue.pop()
        assert expected == actual

@rule(item=items)
def contains(self, item):
    """Check if a previously-pushed item is present."""
    expected = item in self.model
    actual = self.queue.contains(item)
    assert expected == actual

Settings and Profiles

from hypothesis import settings, Verbosity, Phase, HealthCheck from hypothesis import given from hypothesis import strategies as st

Custom settings for a single test

@settings( max_examples=500, deadline=None, # No time limit suppress_health_check=[HealthCheck.too_slow], ) @given(st.lists(st.integers(), min_size=1000)) def test_large_lists(xs): assert sorted(xs) == sorted(xs)

Register profiles for different environments

settings.register_profile("ci", max_examples=1000) settings.register_profile("dev", max_examples=100) settings.register_profile("debug", max_examples=10, verbosity=Verbosity.verbose)

Load profile from environment

HYPOTHESIS_PROFILE=ci pytest

Deadline handling

@settings(deadline=200) # 200ms max per example @given(st.binary(min_size=1000)) def test_with_deadline(data): process(data)

Control phases

@settings( phases=[ Phase.explicit, # Run @example first Phase.reuse, # Run from database Phase.generate, # Generate new examples Phase.shrink, # Shrink failures ] ) @given(st.integers()) def test_all_phases(n): assert n < 1000000

Advanced Strategies

from hypothesis import strategies as st from hypothesis import given

Fixed dictionaries

@given(st.fixed_dictionaries({ 'name': st.text(min_size=1), 'age': st.integers(0, 150), 'active': st.booleans(), })) def test_user_dict(user): assert 'name' in user assert 0 <= user['age'] <= 150

One of several strategies

@given(st.one_of( st.none(), st.integers(), st.text(), )) def test_nullable_values(value): # Value is None, int, or str assert value is None or isinstance(value, (int, str))

Data strategy for dynamic generation

from hypothesis import given from hypothesis.strategies import data

@given(data()) def test_dynamic_generation(data): # Generate based on runtime conditions n = data.draw(st.integers(min_value=1, max_value=10)) xs = data.draw(st.lists(st.integers(), min_size=n, max_size=n))

assert len(xs) == n

Sampled from

@given(st.sampled_from(['red', 'green', 'blue'])) def test_colors(color): assert color in ['red', 'green', 'blue']

Builds for dataclasses/namedtuples

from dataclasses import dataclass

@dataclass class Point: x: float y: float

@given(st.builds(Point, x=st.floats(), y=st.floats())) def test_point(point): assert hasattr(point, 'x') assert hasattr(point, 'y')

From type annotations (inferred)

@given(st.from_type(Point)) def test_point_from_type(point): assert isinstance(point, Point)

Shrinking Examples

Hypothesis shrinking in action

BAD: This will shrink to something like [0, 0, 1]

@given(st.lists(st.integers())) def test_no_duplicates_bad(xs): assert len(xs) == len(set(xs)) # Fails for [0, 0]

Shrunk output: Falsifying example: test_no_duplicates_bad(xs=[0, 0])

Note: Hypothesis found the MINIMAL failing case automatically

The shrinking process:

1. Initial failure: [42, -17, 0, 999, 42, 3]

2. Try smaller list: [42, -17, 0, 999, 42] - still fails? no (no dup)

3. Try: [42, -17, 42] - fails? yes (has dup)

4. Try: [42, 42] - fails? yes

5. Try: [0, 0] - fails? yes (simpler values)

6. Try: [0] - fails? no

7. Minimal: [0, 0]

Custom shrinking with map

@given(st.integers().map(lambda x: x * 2)) def test_even_numbers(n): assert n % 2 == 0 # Always passes, but shrinks to 0

Filter affects shrinking

@given(st.integers().filter(lambda x: x > 10)) def test_large_numbers(n): assert n < 100 # Shrinks to 11 (smallest that passes filter)

Integration with Pytest

import pytest from hypothesis import given, settings from hypothesis import strategies as st

Combine with fixtures

@pytest.fixture def database(): db = Database() yield db db.close()

@given(key=st.text(), value=st.binary()) def test_database_roundtrip(database, key, value): database.put(key, value) assert database.get(key) == value

Parametrize + hypothesis

@pytest.mark.parametrize("operation", ["add", "subtract", "multiply"]) @given(a=st.integers(), b=st.integers()) def test_operations(operation, a, b): result = calculate(operation, a, b) # Verify properties based on operation if operation == "add": assert result == a + b

Mark slow tests

@pytest.mark.slow @settings(max_examples=10000) @given(st.binary(min_size=10000)) def test_large_data(data): process(data)

Skip in certain conditions

@pytest.mark.skipif(condition, reason="...") @given(st.integers()) def test_conditional(n): pass

Reproducing Failures

from hypothesis import given, reproduce_failure, settings from hypothesis import strategies as st

Hypothesis prints reproduction code on failure:

Falsifying example: test_example(x=[0, 0])

You can add @reproduce_failure(...) to force this example:

@reproduce_failure('6.100.0', b'AAAB')

Force a specific failing example for debugging

@reproduce_failure('6.100.0', b'AAAB') # Version and blob @given(st.lists(st.integers())) def test_reproduction(xs): assert len(xs) == len(set(xs))

Example database location

.hypothesis/examples/<test_function_name>

Commit this to version control!

Every CI run will replay all known failures first

Explicit seed for determinism

@settings(database=None) # Disable database @given(st.integers()) def test_with_seed(n): assert n != 42

Run with: pytest --hypothesis-seed=12345

Testing APIs

from hypothesis import given, assume from hypothesis import strategies as st import requests

Generate valid HTTP requests

@st.composite def http_request(draw): method = draw(st.sampled_from(['GET', 'POST', 'PUT', 'DELETE'])) path = '/' + draw(st.text( alphabet='abcdefghijklmnopqrstuvwxyz/', min_size=1, max_size=50 ))

if method in ('POST', 'PUT'):
    body = draw(st.dictionaries(
        st.text(min_size=1),
        st.text() | st.integers() | st.booleans()
    ))
else:
    body = None

return {'method': method, 'path': path, 'body': body}

@given(http_request()) def test_api_doesnt_crash(request): """Property: API should never crash (return 5xx).""" response = make_request( request['method'], request['path'], json=request['body'] )

# 4xx is fine (client error), 5xx is not
assert response.status_code &#x3C; 500

Schemathesis-style: generate from OpenAPI spec

@given(st.from_schema(openapi_schema)) def test_schema_conformance(request): response = client.request(**request) validate_response(response, openapi_schema)

Mental Model

MacIver approaches testing by asking:

What properties should hold? Think invariants and roundtrips
What's the simplest failure? Trust Hypothesis shrinking
Am I saving failures? Commit the example database
Can I compose strategies? Build complex from simple
Is it stateful? Use RuleBasedStateMachine

The Hypothesis Checklist

□ Use @given with appropriate strategies □ Add @example for known edge cases □ Use assume() for preconditions □ Commit .hypothesis/ directory □ Set appropriate max_examples (100+) □ Configure CI profile with more examples □ Use composite strategies for complex types □ Use stateful testing for APIs □ Trust the shrinking—don't manually minimize □ Check deadline settings for slow tests

Signature MacIver Moves

Integrated shrinking (choice sequence reduction)
Example database (never forget a failure)
Composite strategies (@st.composite)
Stateful testing (RuleBasedStateMachine)
Settings profiles (CI vs dev)
Type-based inference (st.from_type)
Recursive strategies (st.recursive)
Health checks for performance issues