cohort-analysis-builder

Build, validate, and interpret cohort retention analyses for SaaS / subscription / e-commerce / consumer products. Covers cohort definition (signup cohort vs activation cohort vs first-purchase cohort vs feature-adoption cohort), the right time grain (daily for product, weekly for early SaaS, monthly for mature SaaS, quarterly for enterprise), retention curves (W1/M1, M3, M6, M12, M24), the "smile curve" (re-engagement after churn — usually a data error, not a real product trend), gross retention vs net revenue retention, cohort layering (compare cohorts by acquisition source / pricing tier / product version / geography), the "retention shape" (declining, flat-after-N-periods, smile, cliff), and common pitfalls (mid-period accounting, late-binding revenue, data leakage, survivorship bias). Use when analyst says "build cohort analysis", "M1 retention", "retention curve", "logo retention vs net retention", "expansion cohort", "weekly cohorts", "cohort analysis SQL", "Looker cohort", "Mode Analytics cohort", "Amplitude cohort", "Mixpanel cohort", "Heap cohort". Triggers on phrases like "cohort analysis", "retention cohort", "user retention", "subscription cohort", "M1 retention", "churn cohort", "cohort SQL", "smile curve", "cohort layering".

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "cohort-analysis-builder" with this command: npx skills add charlie-morrison/cohort-analysis-builder

cohort-analysis-builder

Build cohort analyses that hold up under scrutiny. Most cohort charts presented in board decks are wrong — bad cohort definitions, mid-period accounting drift, survivorship bias, or comparing cohorts of different ages. This skill walks through cohort definition, building the analysis, validating it, and interpreting the resulting retention shapes correctly.

The work order is always: define → query → validate → interpret. Skipping validation is how "looking good in the deck" diverges from "actually good for the business".

When to engage

Trigger when the user asks:

  • For a cohort analysis: "build a cohort", "retention cohort", "M1 retention", "M3 / M6 / M12", "weekly cohorts", "monthly cohorts"
  • About retention shapes: "is this normal?", "smile curve", "cliff at week 4", "retention is flatlining"
  • For specific tools: SQL cohort query, Mode / Looker / Tableau cohort, Amplitude / Mixpanel / Heap cohort, Excel / Google Sheets cohort
  • About cohort comparisons: "are paid users retaining better than free?", "how does v2 compare to v1?", "geo cohort", "channel cohort"
  • For NRR / GRR analysis: "net revenue retention by cohort", "expansion cohort", "logo retention by cohort"
  • For investor / board prep: "investor wants M12 retention", "QofE cohort review", "due diligence cohort question"

Do not engage for: pure churn-prediction ML models (different skill), customer segmentation analyses (different — use customer-segmentation-coach), or LTV calculation alone (related but distinct).

Define the cohort first — this is half the work

Most analytical mistakes are at the definition step.

Cohort axis: what binds the cohort together

Different products demand different cohort definitions:

  • Signup cohort: all users who signed up in period X. Most common; useful for top-of-funnel evaluation. Risk: signup is not activation; many signups never use the product.
  • Activation cohort: all users who completed an "activation event" (e.g., first save, first invite, first purchase) in period X. Better for product-led products; aligns with "real" usage.
  • First-purchase cohort: all users who made their first paid transaction in period X. Best for paid SaaS / subscription / e-commerce.
  • First-paid-month cohort: all customers whose first paid invoice was in period X. Standard for SaaS where free trial precedes paid conversion.
  • Feature-adoption cohort: all users who first used feature F in period X. For feature-specific retention analysis.
  • Geography / segment cohort: all customers from segment S in period X. Combine with one of the above.

Default: for SaaS, use first-paid-month cohort (matches investor / board / financial framing). For product analytics, use activation cohort (matches actual user behavior).

Time grain: how to slice the cohort

  • Daily cohorts: useful only for very-high-volume consumer products (>1K signups/day) where daily-grain noise averages out
  • Weekly cohorts: standard for consumer products and PLG SaaS in early days
  • Monthly cohorts: standard for B2B SaaS and most subscription products at scale
  • Quarterly cohorts: for enterprise / contract-based businesses where customers come in seasonal waves

Don't mix grains in one chart. If you have weekly cohorts and only 8 weeks of data, that's an 8-week weekly chart; don't relabel it as two months of "monthly retention".

Retention period grain — DIFFERENT from cohort grain

You can have monthly cohorts measured at weekly retention granularity (W1, W2, W3...). The cohort axis (how cohorts are formed) and the retention axis (how often you measure them after) are independent decisions.

For SaaS with monthly billing: monthly cohorts × monthly retention measure (M1, M2, M3, ...) = standard. For consumer apps: weekly cohorts × daily retention measure (D1, D7, D14, D30) = standard.
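The retention index is just calendar arithmetic on the two truncated dates. A minimal sketch in Python (stdlib only; function names are illustrative, not from any particular library):

```python
from datetime import date

def month_index(cohort_month: date, activity_month: date) -> int:
    """Whole calendar months between the cohort month and the activity month."""
    return (activity_month.year - cohort_month.year) * 12 + (
        activity_month.month - cohort_month.month)

def week_index(cohort_week: date, activity_week: date) -> int:
    """Whole weeks between two week-truncated dates."""
    return (activity_week - cohort_week).days // 7

# A customer who first paid in Jan 2024 and was active in Apr 2024 is at M3
print(month_index(date(2024, 1, 1), date(2024, 4, 1)))  # → 3
print(week_index(date(2024, 1, 1), date(2024, 1, 15)))  # → 2
```

The two functions are independent on purpose: the first implements the cohort/retention axis for monthly grains, the second for weekly, and mixing them (monthly cohorts, weekly measurement) is just a matter of which one you apply to which date.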

Retention event: what does "retained" mean

  • Logo retention: the customer is still paying. Easiest to measure; misses expansion and contraction.
  • Revenue retention: dollars from cohort N at time T / dollars from cohort N at time 0. Captures expansion, churn, and contraction.
  • Net revenue retention (NRR): the cohort's current revenue / the cohort's original revenue, expansion included. Captures upsell, churn, and contraction.
  • Gross revenue retention (GRR): the cohort's current revenue / the cohort's original revenue, with each customer capped at 100% of their original revenue (no upsell counted). Measures churn and contraction only.
  • Activity retention: did the user perform action X in period N? For consumer products and engagement metrics.
  • Per-user revenue retention: ARPU evolution per cohort.

Choose one and stick with it. Don't switch mid-analysis. If the audience needs both NRR and GRR, build both as separate cohort tables and present together.
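The NRR/GRR distinction above is easy to get wrong in a query, so it helps to see it as code. A minimal sketch (pure Python; the per-customer revenue dicts are hypothetical):

```python
def nrr(original: dict, current: dict) -> float:
    """Net revenue retention: all current revenue from the original cohort,
    expansion included. Churned customers contribute 0."""
    base = sum(original.values())
    return sum(current.get(c, 0.0) for c in original) / base

def grr(original: dict, current: dict) -> float:
    """Gross revenue retention: each customer capped at their original revenue,
    so expansion is excluded and only churn/contraction moves the number."""
    base = sum(original.values())
    return sum(min(current.get(c, 0.0), original[c]) for c in original) / base

m0 = {"a": 100.0, "b": 100.0, "c": 100.0}  # cohort revenue at M0
m12 = {"a": 250.0, "b": 80.0}              # a expanded, b contracted, c churned
print(nrr(m0, m12))  # 330/300 = 1.10
print(grr(m0, m12))  # (100 + 80 + 0)/300 = 0.60
```

Note that NRR > 100% can coexist with heavy logo churn (here one of three customers is gone), which is exactly why the section recommends presenting both tables together.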

Building the cohort table — SQL pattern

The canonical cohort retention SQL:

-- Step 1: define cohort assignment
WITH cohort_assignment AS (
  SELECT
    customer_id,
    DATE_TRUNC('month', first_paid_date) AS cohort_month,
    first_paid_revenue
  FROM customers
  WHERE first_paid_date IS NOT NULL
),

-- Step 2: define monthly active / paying state
monthly_state AS (
  SELECT
    customer_id,
    DATE_TRUNC('month', invoice_date) AS month,
    SUM(invoice_amount) AS revenue_in_month,
    1 AS is_active
  FROM invoices
  WHERE status = 'paid'
  GROUP BY 1, 2
),

-- Step 3: compute cohort × retention period
cohort_retention AS (
  SELECT
    c.cohort_month,
    -- NOTE: date-diff syntax varies by warehouse: DATEDIFF('month', a, b) in
    -- Snowflake, DATE_DIFF(b, a, MONTH) in BigQuery, date-part arithmetic in Postgres
    DATE_DIFF('month', c.cohort_month, m.month) AS month_index,
    COUNT(DISTINCT c.customer_id) AS active_customers,
    SUM(m.revenue_in_month) AS cohort_revenue
  FROM cohort_assignment c
  LEFT JOIN monthly_state m
    ON c.customer_id = m.customer_id
    AND m.month >= c.cohort_month
  GROUP BY 1, 2
)

-- Step 4: pivot for the chart (conditional aggregation on month_index)
SELECT
  cohort_month,
  MAX(CASE WHEN month_index = 0 THEN active_customers END) AS cohort_size,
  MAX(CASE WHEN month_index = 1 THEN active_customers END) AS m1_active,
  MAX(CASE WHEN month_index = 3 THEN active_customers END) AS m3_active
  -- ...
FROM cohort_retention
GROUP BY 1
ORDER BY 1;

Adapt for: weekly cohorts (use DATE_TRUNC('week', ...)), feature retention (replace is_active with did_feature_X), revenue retention (use revenue_in_month instead of distinct count).
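For a one-off cross-check outside the warehouse, the same four steps translate to pandas. A sketch, assuming an `invoices` frame with `customer_id`, `invoice_date`, and `invoice_amount` (column names are illustrative):

```python
import pandas as pd

invoices = pd.DataFrame({
    "customer_id": ["a", "a", "a", "b", "b", "c"],
    "invoice_date": pd.to_datetime(
        ["2024-01-05", "2024-02-03", "2024-03-02",
         "2024-01-20", "2024-03-15", "2024-02-10"]),
    "invoice_amount": [100, 100, 100, 50, 50, 80],
})

invoices["month"] = invoices["invoice_date"].dt.to_period("M")
# Step 1: cohort assignment = each customer's first paid month
invoices["cohort_month"] = invoices.groupby("customer_id")["month"].transform("min")
# Step 3: month index per (customer, month); Period subtraction yields an offset
invoices["month_index"] = (invoices["month"] - invoices["cohort_month"]).apply(lambda d: d.n)
# Step 4: pivot to the cohort table — distinct customers per (cohort, index) cell
table = invoices.pivot_table(index="cohort_month", columns="month_index",
                             values="customer_id", aggfunc="nunique", fill_value=0)
print(table)
```

This is the same shape the SQL produces: the Jan 2024 cohort has 2 customers at M0, 1 at M1 (b skipped February), and 2 at M2, which also previews the monotonicity issue discussed in the validation section.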

Common SQL mistakes

  • Using LEFT JOIN then WHERE m.month IS NOT NULL: turns it into an INNER JOIN, drops cohorts with zero retention
  • Forgetting to align cohort_month: comparing customer's first-paid-date to a fiscal calendar that doesn't match
  • Counting customer_id from invoices, not from cohort_assignment: misses customers who churned in M1
  • Using last_invoice_date instead of monthly invoice presence: misclassifies long-term customers who paid in M1 and skipped M2

Validation — before showing anyone

Run these checks. Most cohort errors are caught here, not at presentation.

Cohort-size sanity check

  • Each cohort row should have a customer count that matches your CRM / billing-system count for that period
  • If your "Jan 2024 cohort" has 47 customers but your finance system shows 50 paid customers acquired in Jan 2024, find the 3 missing — partial-month signups, returns, refunds, edge cases

Retention-vs-aggregate sanity check

  • M0 of every cohort = 100% by definition (a customer is active in M0 by virtue of being in the cohort); if it isn't, your cohort-assignment and activity tables disagree
  • Sum across cohorts should match overall active customer count at each calendar month
  • If overall active customers = 1000 in Jan 2025, sum of all (cohort, month_index) where calendar = Jan 2025 should also = 1000
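The aggregate check above is an anti-diagonal sum over the cohort table: cohort i measured at month_index j lands on calendar month i + j. A sketch (pure Python; the table is a ragged list of retained-customer counts):

```python
def actives_in_calendar_month(cohort_table, calendar_index):
    """Sum the (cohort, month_index) cells that land on one calendar month.

    cohort_table[i][j] = active customers from cohort i at month_index j,
    where cohort i starts at calendar month i. The cells for a single
    calendar month form an anti-diagonal: j = calendar_index - i.
    """
    return sum(row[calendar_index - i]
               for i, row in enumerate(cohort_table)
               if 0 <= calendar_index - i < len(row))

table = [
    [100, 80, 70],  # cohort 0 (e.g., Jan): M0, M1, M2
    [120, 90],      # cohort 1 (Feb): M0, M1
    [110],          # cohort 2 (Mar): M0
]
# Calendar month 2 (Mar) = cohort 0 at M2 + cohort 1 at M1 + cohort 2 at M0
print(actives_in_calendar_month(table, 2))  # 70 + 90 + 110 = 270
```

If this number disagrees with the overall active-customer count from your source of truth for the same month, some customers are either double-assigned to cohorts or missing from cohort assignment entirely.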

Monotonicity check (for logo retention)

  • Logo retention curve should be monotonically non-increasing — if M2 retention > M1 retention for the same cohort, you have a re-engagement counted as new retention (data leakage)
  • If you see a "smile curve" (M3 < M6 < M9 retention), 95% of the time it's an analytical bug, not a real product trend
  • Common cause: re-subscribing after churn is being counted as "still retained" — fix the SQL to count only continuous retention
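The monotonicity check is mechanical enough to automate before any chart ships. A minimal sketch (pure Python; the curve is a list of retained-customer counts by period):

```python
def monotonicity_violations(curve):
    """Return the period indices where logo retention rises vs the prior period.

    A non-empty result usually means re-engagement is being counted as
    continuous retention (data leakage), not a real product trend.
    """
    return [i for i in range(1, len(curve)) if curve[i] > curve[i - 1]]

clean = [1000, 520, 390, 360, 350]
smile = [1000, 500, 300, 250, 300, 350]
print(monotonicity_violations(clean))  # → []
print(monotonicity_violations(smile))  # → [4, 5]
```

Run it over every cohort row; any index it returns is a cell to investigate before the deck goes out.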

Edge case: customer left and returned

  • Should they be counted as retained in months they were paused / churned?
  • Default: NO. They're a NEW cohort if they came back.
  • If you do count them as retained-throughout, surface this explicitly.

Survivorship bias check

  • Does the cohort table include cohorts whose retention you can't yet measure to the same depth?
  • Example: if you're computing M12 retention and your most recent cohort is 6 months old, that cohort has incomplete M12 data. Either: drop incomplete cohorts, or clearly mark them.
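Masking the cells a cohort hasn't aged into can also be scripted rather than done by eye. A sketch (pure Python; months are (year, month) tuples, and `as_of` is assumed to be the last complete calendar month):

```python
def observable_depth(cohort, as_of):
    """Number of complete retention months observable for a cohort as of `as_of`."""
    (cy, cm), (ay, am) = cohort, as_of
    return (ay - cy) * 12 + (am - cm)

def mask_incomplete(row, cohort, as_of):
    """Replace retention cells the cohort hasn't aged into with None."""
    depth = observable_depth(cohort, as_of)
    return [v if i <= depth else None for i, v in enumerate(row)]

# A Jul-2024 cohort observed as of Jan 2025 has only M0..M6; M7+ is not measurable
row = [100, 60, 50, 45, 42, 40, 39, 38, 37, 36, 35, 34, 33]  # M0..M12
print(mask_incomplete(row, (2024, 7), (2025, 1))[7:])  # → [None, None, None, None, None, None]
```

Presenting the masked cells as blank (rather than zero or a stale value) is what keeps an M12 heatmap honest about which cohorts actually have M12 data.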

Cross-validation

  • Run the same cohort logic in a different tool (e.g., compute in SQL, validate in spreadsheet for one cohort)
  • Compare to your finance / accounting system — total revenue per cohort over time should match invoice totals

Interpretation — what retention shapes mean

Cliff retention (sharp drop in early periods, then flat)

  • Pattern: 100% → 50% → 30% → 28% → 27% → 27%...
  • Typical of: products with a strong "evergreen user" segment after early activation
  • Healthy when: the flat tail is at a high enough percentage to support unit economics
  • Concerning when: cliff is too steep (e.g., 50% drop in M1 = poor activation / onboarding)

Linear-decline retention

  • Pattern: 100% → 90% → 80% → 70% → 60%...
  • Typical of: subscription products with predictable annual / multi-year churn
  • Concerning if: the slope steepens over time (cohorts are getting worse)
  • Healthy if: the slope is flat across cohorts (consistent retention) and acceptable level (e.g., 90% per year)

S-curve / leveled retention

  • Pattern: 100% → 60% → 40% → 35% → 33% → 32% → 32%...
  • Typical of: products with strong product-market fit; customers either churn early or stay forever
  • Healthy when: tail level is high enough; cohort sizes are growing

Smile curve (re-engagement)

  • Pattern: 100% → 50% → 30% → 25% → 30% → 35%...
  • Almost always a data error: re-subscribers being counted as continuously retained
  • Real smile curves exist but are rare — typically only on consumer products with seasonal / lifecycle reasons (fitness apps in January, tax software in April)
  • ALWAYS validate the smile curve before presenting

Aging / cohort-version retention

  • Plot multiple cohorts on the same chart (Jan 2024, Feb 2024, Mar 2024, ...)
  • Healthy: more recent cohorts retain at least as well as older ones (slope of retention curves is consistent)
  • Concerning: more recent cohorts retain WORSE than older cohorts ("retention is degrading") — often a sign of growth-driven loosening of ICP, or competitor pressure
  • Healthy: more recent cohorts retain BETTER than older cohorts (often a sign of product improvement)

Cohort layering — the high-leverage analysis

Single-cohort analysis answers "what's our retention". Layered cohort analysis answers "what's CAUSING our retention".

Layer by acquisition channel

  • Compare retention curves of customers from organic, paid Google, paid Meta, content, referrals, sales-led
  • High-leverage finding: paid channels often retain worse than organic; if so, CAC payback math should adjust

Layer by pricing tier

  • Compare retention of Starter, Pro, Business, Enterprise
  • Common: lower tiers churn more; helps justify investment in upmarket motion

Layer by product version / feature adoption

  • Compare cohorts who adopted Feature X vs cohorts who didn't
  • Finding: "users who adopted Feature X in M1 retain at 90% in M6 vs 50% for those who didn't"
  • Caution: correlation, not causation — Feature X adoption may be a proxy for engagement, not a driver

Layer by sales rep / customer success owner

  • Compare retention of customers across reps / CSMs
  • Surfaces individual performance differences

Layer by industry / company size

  • Compare SMB / mid-market / enterprise cohorts
  • Surfaces ICP fit gaps

Layer by geography

  • Compare US / EU / APAC cohorts
  • Surfaces market-fit differences

Layer by signup-date period

  • Compare cohorts pre-/post- a major product change
  • Cleanest test of whether a feature / pricing / messaging change actually improved retention

Common pitfalls

Mid-period accounting drift

  • Cohort defined by start-of-month signup, but the customer actually started mid-month (e.g., after a free trial)
  • Their cohort month ("M0") includes only ~15 days of post-signup activity on average, which skews every subsequent ratio
  • Fix: define cohorts by exact dates, then bucket; or accept the drift as systematic and note it

Survivorship bias

  • M12 retention chart that omits cohorts with insufficient age (only computed for cohorts ≥12 months old)
  • Audience interprets as current-state; actually it's a historic-state view
  • Fix: clearly label "cohorts with sufficient age" + footnote

Currency / pricing changes

  • If you charge in local currency and report in USD, FX swings show up as "expansion" or "contraction" that isn't real
  • Fix: report constant-currency NRR / GRR

Plan migrations

  • Customer downgrades from Pro to Starter, or moves between plans
  • Counted as expansion / contraction depending on direction
  • Fix: track plan changes separately; surface migration as its own metric

Refunds / chargebacks

  • Customer paid in M1, refunded in M2 — should they count as M1 retained?
  • Default: count their full original cohort revenue, then subtract in subsequent months as a contraction
  • Fix: be explicit about treatment

Payment method changes

  • Customer's credit card expires, fails to renew, then re-enters payment 30 days later
  • Counted as churn-and-return or as continuous retention?
  • Default: continuous if return is < 30 days (grace period)
  • Fix: define grace period explicitly
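The grace-period rule is interval merging over paid periods. A sketch (pure Python; the 30-day window is the default suggested above, and the intervals are hypothetical):

```python
from datetime import date, timedelta

def merge_paid_intervals(intervals, grace_days=30):
    """Merge (start, end) paid intervals whose gap is within the grace period.

    A lapsed card that recovers within `grace_days` stays one continuous
    subscription; a longer gap is churn-and-return (a new cohort by default).
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and (start - merged[-1][1]) <= timedelta(days=grace_days):
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

paid = [(date(2024, 1, 1), date(2024, 3, 31)),   # card failed after March
        (date(2024, 4, 20), date(2024, 8, 31)),  # recovered in 20 days → continuous
        (date(2024, 11, 1), date(2025, 1, 31))]  # 62-day gap → churn-and-return
print(len(merge_paid_intervals(paid)))  # → 2
```

Whatever `grace_days` you choose, apply the same value in the cohort SQL and in any dunning or finance reporting, or the tables will disagree by exactly the customers in the grace window.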

Zombie customers

  • Customers paying $0 (free / trialing forever) included in cohort
  • Distorts revenue retention but not logo retention
  • Fix: define paying customer explicitly (e.g., "any customer with > $0 invoice in last 90 days")

Tool-specific cohort building

SQL (Postgres / Snowflake / BigQuery / Redshift)

  • Use the canonical pattern above
  • Window functions help: ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY month) for cohort assignment

Amplitude / Mixpanel

  • Built-in cohort visualization; risk is overconfidence in the tool's defaults
  • Always check: cohort definition (event-based vs property-based), retention type (Returning / Bracket / Unbounded)

Looker / LookML

  • LookML-based; can build retention as a named metric
  • Watch for: persistent derived tables that don't refresh; aggregate awareness conflicts

dbt + Mode / Tableau / Metabase

  • Build the cohort table in dbt as a materialized model
  • Visualize in any BI tool
  • Best for reproducibility and audit trail

Spreadsheet (Excel / Google Sheets)

  • For one-off analysis or first cohort
  • Use INDEX/MATCH or pivot table; risky for production / repeated analysis

Output to user after diagnostic

After understanding the user's product, scale, and goal:

  1. Cohort definition recommendation: cohort axis (signup / activation / first-paid), cohort grain (weekly / monthly / quarterly), retention period grain
  2. Retention metric: logo / GRR / NRR / activity — match to the question being asked
  3. SQL or tool template: canonical query adapted to the user's schema
  4. Validation checklist: specific checks to run before presenting (cohort sizes, monotonicity, aggregate match)
  5. Interpretation framework: what retention shape to expect, what would be concerning vs healthy
  6. Layering plan: which 2-3 dimensions to layer first for maximum insight
  7. Pitfalls to avoid: tied to the user's specific business model (e.g., currency for international, plan migration for tiered SaaS)
  8. Output format: heatmap (cohort × period), unsmoothed curve overlay, table for board

A solid cohort analysis is one of the highest-leverage diagnostic tools in B2B and consumer subscription. A flawed cohort analysis is one of the most common ways founders / ops teams convince themselves of false retention narratives. This skill builds the discipline to do it right, every time.

