# Subscription Killer
Analyse a bank transactions CSV file and surface every recurring subscription, ranked by cancellation / downgrade priority with estimated monthly savings.
## Skill overview
This skill takes a single CSV file of bank transactions and:
- Sniffs the CSV structure to detect column names automatically.
- Normalises merchant names using a curated alias dictionary plus fuzzy matching (`difflib.SequenceMatcher` ratio > 0.82).
- Detects recurring charges by cadence (monthly / quarterly / annual) and amount consistency (±2 % tolerance).
- Scores each subscription by confidence (0–100).
- Ranks subscriptions by a composite priority score: spend × inactivity risk × price-creep factor × duplication penalty.
- Outputs a structured report with clear cancellation and downgrade actions.
## Input format
The skill accepts any CSV export from a personal bank account. Common formats are auto-detected:
| Bank / Provider | Typical columns |
|---|---|
| Monzo | Date, Name, Amount, Category |
| Starling | Date, Counter Party, Amount, Balance |
| Revolut | Started Date, Description, Amount, Currency |
| Standard OFX export | Date, Description, Debit Amount |
| Generic | date, merchant / description, amount (minimum required) |
Column detection priority:
- **Date:** looks for columns named `date`, `transaction date`, `started date`, `posted date` (case-insensitive). Falls back to the first column that parses as ISO 8601 or DD/MM/YYYY.
- **Amount:** looks for `amount`, `debit amount`, `debit`. Credit / top-up rows are ignored (positive values when the CSV uses a signed, negative-debit amount column).
- **Merchant:** looks for `name`, `description`, `merchant`, `counter party`, `payee`. Falls back to the longest non-numeric column.
## Core logic

### Step 1 — Parse CSV
```python
import csv, re, difflib
from datetime import datetime, timedelta
from collections import defaultdict

def sniff_columns(header: list[str]) -> dict:
    """Return {'date': col, 'amount': col, 'merchant': col} best guesses."""
    header_lower = [h.lower().strip() for h in header]
    DATE_HINTS = ['date', 'transaction date', 'started date', 'posted date']
    AMOUNT_HINTS = ['amount', 'debit amount', 'debit']
    MERCHANT_HINTS = ['name', 'description', 'merchant', 'counter party',
                      'payee', 'narrative']

    def first_match(hints):
        for hint in hints:
            if hint in header_lower:
                return header[header_lower.index(hint)]
        return None

    return {
        'date': first_match(DATE_HINTS) or header[0],
        'amount': first_match(AMOUNT_HINTS) or header[1],
        'merchant': first_match(MERCHANT_HINTS) or header[2],
    }
```
```python
def parse_amount(raw: str) -> float | None:
    """Return a positive charge amount, or None for zero / unparseable values."""
    cleaned = re.sub(r'[£$€,\s]', '', raw.strip())
    if not cleaned:
        return None
    try:
        val = float(cleaned)
    except ValueError:
        return None
    # Sign conventions vary: signed 'Amount' columns (most UK banks) list
    # debits as negative, while dedicated 'Debit Amount' columns list them
    # as positive. Normalise to a positive value here; credit rows are
    # filtered upstream based on which convention the sniffed column uses.
    return abs(val) if val != 0 else None
```
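Downstream steps operate on transactions grouped per merchant. A self-contained sketch of that grouping, using invented rows in the signed-Amount convention (column names and values are illustrative):

```python
import csv, io
from collections import defaultdict
from datetime import datetime

# Illustrative sample (invented rows); signed Amount column, debits negative.
SAMPLE = """Date,Description,Amount
2025-01-05,NETFLIX.COM,-9.99
2025-02-05,NETFLIX.COM,-9.99
2025-01-20,SALARY ACME LTD,2500.00
"""

by_merchant = defaultdict(lambda: ([], []))  # merchant -> (dates, amounts)
for row in csv.DictReader(io.StringIO(SAMPLE)):
    val = float(row['Amount'])
    if val >= 0:
        continue  # signed-Amount convention: positives are credits / top-ups
    dates, amounts = by_merchant[row['Description']]
    dates.append(datetime.strptime(row['Date'], '%Y-%m-%d'))
    amounts.append(-val)  # normalise the debit to a positive amount

# Only the recurring NETFLIX.COM debits survive the credit filter.
```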
### Step 2 — Normalise merchant names

Use a two-pass approach:

**Pass 1 — Known alias dictionary (curated, highest-confidence)**
```python
MERCHANT_ALIASES = {
    # Streaming
    r'netflix': 'Netflix',
    r'spotify': 'Spotify',
    r'apple\.com/bill': 'Apple Subscriptions',
    r'itunes': 'Apple Subscriptions',
    r'amazon prime': 'Amazon Prime',
    r'amzn\s?prime': 'Amazon Prime',
    r'disney\+|disneyplus': 'Disney+',
    r'youtube premium': 'YouTube Premium',
    r'hbo|max\.com': 'Max (HBO)',
    r'paramount': 'Paramount+',
    r'dazn': 'DAZN',
    # Music
    r'tidal': 'Tidal',
    r'deezer': 'Deezer',
    r'soundcloud': 'SoundCloud',
    # Productivity / SaaS
    r'github': 'GitHub',
    r'notion': 'Notion',
    r'slack': 'Slack',
    r'dropbox': 'Dropbox',
    r'google\s?(one|storage|workspace)': 'Google One/Workspace',
    r'microsoft 365|office 365|msft': 'Microsoft 365',
    r'adobe': 'Adobe Creative Cloud',
    r'figma': 'Figma',
    r'zoom': 'Zoom',
    r'1password|lastpass|bitwarden': 'Password Manager',
    r'nordvpn|expressvpn|surfshark': 'VPN Service',
    # News / Reading
    r'medium': 'Medium',
    r'substack': 'Substack',
    r'kindle unlimited': 'Kindle Unlimited',
    r'audible': 'Audible',
    r'the times|thetimes': 'The Times',
    r'financial times|ft\.com': 'Financial Times',
    r'economist': 'The Economist',
    # Fitness
    r'peloton': 'Peloton',
    r'strava': 'Strava',
    r'myfitnesspal': 'MyFitnessPal',
    r'calm|headspace': 'Meditation App',
    # Utilities / services
    r'amazon web services|aws': 'AWS',
    r'google cloud|gcp': 'Google Cloud',
    r'digitalocean': 'DigitalOcean',
    r'cloudflare': 'Cloudflare',
    r'railway\.app': 'Railway',
    r'openai': 'OpenAI',
    r'anthropic': 'Anthropic',
}
```
```python
def normalize_merchant(raw: str) -> str:
    """Return a clean, canonical merchant name."""
    s = raw.lower().strip()
    for pattern, canonical in MERCHANT_ALIASES.items():
        if re.search(pattern, s, re.IGNORECASE):
            return canonical
    # Pass 2 — strip noise tokens (card numbers, references, country codes)
    cleaned = re.sub(
        r'\b(www|com|co\.uk|ltd|limited|uk|us|gb|inc|gmbh)\b'
        r'|\d{4,}'       # card / ref numbers
        r'|\*+\S*'       # asterisk-prefixed tokens
        r'|[^a-z0-9 ]',  # special chars
        '', s
    ).strip().title()
    return cleaned or raw.title()
```
**Pass 2 — Fuzzy deduplication across the transaction list**

After normalisation, cluster remaining merchant names by string similarity
(`difflib.SequenceMatcher` ratio > 0.82). The most-frequent variant in each
cluster becomes the canonical name.
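A sketch of that clustering pass (the 0.82 threshold is from the text; the greedy single-pass assignment and the most-frequent-variant pick are assumptions):

```python
import difflib
from collections import Counter

def cluster_merchants(names: list[str], threshold: float = 0.82) -> dict:
    """Greedily cluster names by SequenceMatcher ratio; map variant -> canonical."""
    clusters: list[list[str]] = []  # each cluster is a list of raw variants
    for name in names:
        for cluster in clusters:
            ratio = difflib.SequenceMatcher(
                None, name.lower(), cluster[0].lower()).ratio()
            if ratio > threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    mapping = {}
    for cluster in clusters:
        canonical = Counter(cluster).most_common(1)[0][0]  # most-frequent variant
        for variant in cluster:
            mapping[variant] = canonical
    return mapping

mapping = cluster_merchants(['Netflix', 'Netflix', 'Netflx', 'Peloton'])
# 'Netflx' joins the Netflix cluster; 'Peloton' stays separate.
```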
### Step 3 — Detect recurring charges
```python
def detect_cadence(dates: list[datetime]) -> str | None:
    """
    Return 'monthly', 'quarterly', 'annual', 'annual_candidate', or None.
    Requires >= 2 transactions for a cadence; a single occurrence returns
    'annual_candidate' for the caller to confirm (e.g. if the amount is large).
    Dates are expected in ascending order.
    """
    if len(dates) < 2:
        return 'annual_candidate'
    gaps = sorted((dates[i + 1] - dates[i]).days for i in range(len(dates) - 1))
    median_gap = gaps[len(gaps) // 2]
    if 25 <= median_gap <= 35:
        return 'monthly'
    if 85 <= median_gap <= 100:
        return 'quarterly'
    if 340 <= median_gap <= 380:
        return 'annual'
    return None
```
```python
def amounts_consistent(amounts: list[float], tolerance: float = 0.02) -> bool:
    """True if all amounts are within ±tolerance of the median (2 % by default)."""
    if not amounts:
        return False
    median = sorted(amounts)[len(amounts) // 2]
    return all(abs(a - median) / median <= tolerance for a in amounts)
```
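Worked through on invented data, three charges a month apart at a near-stable price pass both checks with the same thresholds as the helpers above:

```python
from datetime import datetime

# Invented charges: monthly cadence, one small price wobble (~1 %)
dates = [datetime(2025, 1, 5), datetime(2025, 2, 5), datetime(2025, 3, 5)]
amounts = [9.99, 9.99, 10.09]

gaps = sorted((b - a).days for a, b in zip(dates, dates[1:]))  # [28, 31]
median_gap = gaps[len(gaps) // 2]
median_amount = sorted(amounts)[len(amounts) // 2]

is_monthly = 25 <= median_gap <= 35
is_consistent = all(abs(a - median_amount) / median_amount <= 0.02
                    for a in amounts)
```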
### Step 4 — Confidence scoring (0–100)
| Factor | Max points | Logic |
|---|---|---|
| Cadence regularity | 40 | All gaps within ±3 days of median |
| Amount consistency | 30 | All amounts within ±2 % |
| Known merchant match | 20 | Exact alias dictionary hit |
| Transaction count | 10 | log₂(count) × 3, capped at 10 |
- Confidence ≥ 70 → **Confirmed** subscription
- Confidence 40–69 → **Probable** subscription
- Confidence < 40 → **Possible** / flag for review
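The factor table translates into a scoring function along these lines (illustrative; treating each factor as all-or-nothing, with partial credit only for transaction count, is an assumption):

```python
import math

def confidence_score(gaps: list[int], amounts: list[float],
                     known_merchant: bool, count: int) -> int:
    """0-100 confidence per the factor table (all-or-nothing factors assumed)."""
    score = 0.0
    if gaps:
        median_gap = sorted(gaps)[len(gaps) // 2]
        if all(abs(g - median_gap) <= 3 for g in gaps):
            score += 40  # cadence regularity: all gaps within ±3 days of median
    if amounts:
        median = sorted(amounts)[len(amounts) // 2]
        if all(abs(a - median) / median <= 0.02 for a in amounts):
            score += 30  # amount consistency: all within ±2 % of median
    if known_merchant:
        score += 20      # exact alias dictionary hit
    score += min(10.0, math.log2(max(count, 1)) * 3)  # transaction count
    return round(score)

# e.g. three regular monthly charges at a stable price from a known merchant
confidence_score([30, 31], [9.99, 9.99, 9.99], True, 3)
```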
### Step 5 — Priority ranking
Each confirmed/probable subscription gets a composite priority score:
```
priority = monthly_cost
         × price_creep_multiplier   # max(amounts) / min(amounts); 1.0 = stable
         × duplication_penalty      # 1.5 if same category has 2+ subscriptions
         × inactivity_risk          # 1.3 if last charge > 45 days ago
```
Higher priority score = stronger cancellation candidate.
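Under those multipliers, the composite score can be sketched as follows (the 1.5 / 1.3 / 45-day constants come from the formula above; the function shape and parameter names are assumptions):

```python
def priority_score(amounts: list[float], monthly_cost: float,
                   dup_in_category: bool, days_since_last: int) -> float:
    """Composite cancellation-priority score per the formula above."""
    price_creep = max(amounts) / min(amounts)           # 1.0 = stable price
    duplication = 1.5 if dup_in_category else 1.0       # 2+ subs in category
    inactivity = 1.3 if days_since_last > 45 else 1.0   # stale subscription
    return monthly_cost * price_creep * duplication * inactivity

# A price-crept but recently-used, non-duplicated subscription
priority_score([49.99, 54.99], 54.99, False, 10)
```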
Special flags:
- Trial trap: single charge in the last 7 days from a previously-unseen merchant, followed by a recurring pattern. Flag as "check for auto-renew".
- Price creep: amount has increased > 5 % over the observation window. Flag with the exact amount delta.
- Annual renewal due: annual-cadence subscription whose last charge was 300–365 days ago. Flag with estimated renewal date.
- Unknown merchant: low-confidence recurring charge with no alias match. Flag for manual review.
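Three of the four flags can be computed directly from per-merchant aggregates; a sketch with thresholds taken from the list above (the trial-trap flag needs the full transaction history and is omitted; the function shape is an assumption):

```python
from datetime import date

def special_flags(amounts: list[float], last_charge: date, today: date,
                  cadence: str, known_merchant: bool, confidence: int) -> list:
    """Return (flag, detail) pairs for the per-merchant special flags."""
    flags = []
    # Price creep: > 5 % increase over the observation window
    if min(amounts) > 0 and (max(amounts) - min(amounts)) / min(amounts) > 0.05:
        flags.append(('price_creep', round(max(amounts) - min(amounts), 2)))
    # Annual renewal due: annual cadence, last charged 300-365 days ago
    days = (today - last_charge).days
    if cadence == 'annual' and 300 <= days <= 365:
        flags.append(('annual_renewal_due', None))
    # Unknown merchant: low confidence, no alias match
    if not known_merchant and confidence < 40:
        flags.append(('unknown_merchant', None))
    return flags

special_flags([49.99, 54.99], date(2025, 3, 1), date(2025, 4, 1),
              'monthly', True, 95)
```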
## Output schema
The skill returns a structured report. Print to stdout as JSON for downstream agent consumption, or render the human-readable summary below.
```json
{
  "summary": {
    "observation_window_days": 90,
    "total_subscriptions": 12,
    "confirmed": 9,
    "probable": 2,
    "flagged_for_review": 1,
    "monthly_spend_gbp": 187.43,
    "annual_spend_gbp": 2249.16,
    "potential_monthly_saving_gbp": 54.20
  },
  "subscriptions": [
    {
      "merchant": "Adobe Creative Cloud",
      "cadence": "monthly",
      "median_amount": 54.99,
      "last_charge": "2025-04-01",
      "confidence": 95,
      "priority_score": 89.2,
      "flags": ["price_creep"],
      "price_creep_delta": 5.00,
      "cancel_url": "https://account.adobe.com/plans",
      "actions": ["cancel", "downgrade"]
    }
  ],
  "top_cancellation_targets": ["Adobe Creative Cloud", "..."],
  "unknown_merchants": ["TRSF 48291 REF77"],
  "annual_renewals_due": []
}
```
## Human-readable summary (terminal / Telegram)
```
╔══════════════════════════════════════════╗
║         SUBSCRIPTION KILLER v1.0         ║
╠══════════════════════════════════════════╣
║  12 subscriptions • £187/mo • £2,249/yr  ║
╚══════════════════════════════════════════╝

TOP CANCELLATION TARGETS
1. Adobe Creative Cloud   £54.99/mo   ↑ price crept +£5.00
2. Calm                   £39.99/yr   last used 3 months ago
3. LinkedIn Premium       £29.99/mo   duplicate: 2 job-search tools

POTENTIAL SAVING: £54/mo • £648/yr

UNKNOWN MERCHANTS (review manually)
• TRSF 48291 REF77 — £12.99/mo (confidence 38%)
```
## Cancel guidance (top 50 services)
The skill bundles a cancel_urls.json file mapping canonical merchant names
to their self-serve cancellation pages. This removes friction for the user.
```json
{
  "Netflix": "https://www.netflix.com/cancelplan",
  "Spotify": "https://www.spotify.com/account/subscription/cancel",
  "Adobe Creative Cloud": "https://account.adobe.com/plans",
  "Amazon Prime": "https://www.amazon.co.uk/mc/pipeline/cancelEndBenefit",
  "Disney+": "https://www.disneyplus.com/account/subscription",
  "Microsoft 365": "https://account.microsoft.com/services",
  "GitHub": "https://github.com/settings/billing/subscriptions",
  "Dropbox": "https://www.dropbox.com/account/plan",
  "Zoom": "https://zoom.us/billing",
  "LinkedIn Premium": "https://www.linkedin.com/premium/products",
  "Google One/Workspace": "https://myaccount.google.com/payments-and-subscriptions",
  "OpenAI": "https://platform.openai.com/account/billing"
}
```
## Invocation
```bash
# Analyse a CSV file
python3 subscription_killer.py --file transactions.csv

# Override currency display
SUBSCRIPTION_KILLER_CURRENCY=EUR python3 subscription_killer.py --file transactions.csv

# Output raw JSON for piping to another skill
python3 subscription_killer.py --file transactions.csv --json
```
The script reads SUBSCRIPTION_KILLER_CURRENCY from the environment for display
only; all internal calculations use the amounts as-found in the CSV.
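A minimal argument-parsing skeleton consistent with these invocations (a sketch; the flag names match the commands above, everything else is assumed):

```python
import argparse
import os

def parse_args(argv=None):
    """Parse the CLI flags used by the invocation examples above."""
    parser = argparse.ArgumentParser(prog='subscription_killer')
    parser.add_argument('--file', required=True,
                        help='bank transactions CSV to analyse')
    parser.add_argument('--json', action='store_true',
                        help='emit raw JSON instead of the human-readable report')
    return parser.parse_args(argv)

args = parse_args(['--file', 'transactions.csv', '--json'])
# Display-only currency override; amounts are used as found in the CSV.
currency = os.environ.get('SUBSCRIPTION_KILLER_CURRENCY', 'GBP')
```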
## Roadmap (future skills in this suite)
This skill is intentionally scoped to subscription detection. Planned companion skills that share the CSV parsing layer:
| Skill | Description |
|---|---|
| `savings-rate-analyser` | Track income vs spend, compute savings rate, compare to benchmarks |
| `deposit-rate-scout` | Pull live UK/EU savings account rates via open banking APIs |
| `emergency-fund-checker` | Assess months of runway based on average monthly spend |
| `investment-nudger` | Identify investable surplus after subscriptions + essentials |
The bank-csv-parser module within this skill will be extracted into a shared
utility skill once the suite matures.
## Security & privacy
- All processing is local. No transaction data leaves the machine.
- The skill does not write any files outside the working directory.
- No API keys required for the core subscription detection flow.
- The `cancel_urls.json` file is a static lookup table; no network calls are made unless the user explicitly requests live rate data (future skill).