window-function-builder

SQL Window Function Builder

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "window-function-builder" with this command: npx skills add dengineproblem/agents-monorepo/dengineproblem-agents-monorepo-window-function-builder

SQL Window Function Builder

Эксперт по SQL window functions и аналитическим запросам для бизнес-аналитики.

Core Function Categories

Function Overview

window_functions: ranking: - ROW_NUMBER(): "Unique sequential numbers" - RANK(): "Ranking with gaps for ties" - DENSE_RANK(): "Ranking without gaps" - NTILE(n): "Divide into n buckets"

aggregate: - SUM(): "Running/cumulative totals" - AVG(): "Moving averages" - COUNT(): "Running counts" - MIN(): "Running minimum" - MAX(): "Running maximum"

offset: - LAG(col, n): "Access previous row" - LEAD(col, n): "Access next row" - FIRST_VALUE(): "First value in frame" - LAST_VALUE(): "Last value in frame" - NTH_VALUE(col, n): "Nth value in frame"

statistical: - PERCENT_RANK(): "Relative rank as percentage" - CUME_DIST(): "Cumulative distribution" - PERCENTILE_CONT(): "Interpolated percentile" - PERCENTILE_DISC(): "Discrete percentile"

Basic Syntax

OVER Clause Structure

function_name(expression) OVER ( [PARTITION BY partition_expression, ...] [ORDER BY sort_expression [ASC|DESC], ...] [frame_clause] )

Frame Specifications

-- Frame clause syntax ROWS | RANGE BETWEEN frame_start AND frame_end

-- Frame bounds UNBOUNDED PRECEDING -- From first row of partition n PRECEDING -- n rows before current CURRENT ROW -- Current row n FOLLOWING -- n rows after current UNBOUNDED FOLLOWING -- To last row of partition

-- Common frame patterns ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW -- Running total (default for ORDER BY) ROWS BETWEEN 6 PRECEDING AND CURRENT ROW -- 7-day rolling ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING -- 3-row centered RANGE BETWEEN INTERVAL '30' DAY PRECEDING AND CURRENT ROW -- 30-day range

Ranking Functions

ROW_NUMBER, RANK, DENSE_RANK

-- Basic ranking comparison SELECT employee_id, department, salary, ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num, RANK() OVER (ORDER BY salary DESC) AS rank, DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank FROM employees;

-- Results with tied salaries:

--employee_idsalaryrow_numrankdense_rank
--101100000111
--102100000211
--10390000332

Ranking Within Groups

-- Top 3 salaries per department WITH ranked AS ( SELECT employee_id, employee_name, department, salary, DENSE_RANK() OVER ( PARTITION BY department ORDER BY salary DESC ) AS dept_rank FROM employees ) SELECT * FROM ranked WHERE dept_rank <= 3;

NTILE for Percentiles

-- Divide customers into quartiles by revenue SELECT customer_id, customer_name, total_revenue, NTILE(4) OVER (ORDER BY total_revenue DESC) AS revenue_quartile, CASE NTILE(4) OVER (ORDER BY total_revenue DESC) WHEN 1 THEN 'Top 25%' WHEN 2 THEN '25-50%' WHEN 3 THEN '50-75%' WHEN 4 THEN 'Bottom 25%' END AS segment FROM customer_revenue;

Running Calculations

Running Totals

-- Running total of sales SELECT order_date, order_amount, SUM(order_amount) OVER ( ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS running_total FROM orders;

-- Running total per customer SELECT customer_id, order_date, order_amount, SUM(order_amount) OVER ( PARTITION BY customer_id ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS customer_running_total FROM orders;

Moving Averages

-- 7-day moving average SELECT date, daily_revenue, AVG(daily_revenue) OVER ( ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW ) AS moving_avg_7d, -- Also track the count for partial windows COUNT(*) OVER ( ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW ) AS days_in_window FROM daily_sales;

-- Centered moving average (3 days before, current, 3 days after) SELECT date, daily_revenue, AVG(daily_revenue) OVER ( ORDER BY date ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING ) AS centered_avg_7d FROM daily_sales;

Cumulative Percentage

-- Cumulative percentage of total SELECT product_category, revenue, SUM(revenue) OVER (ORDER BY revenue DESC) AS cumulative_revenue, ROUND( 100.0 * SUM(revenue) OVER (ORDER BY revenue DESC) / SUM(revenue) OVER (), 2 ) AS cumulative_pct FROM category_sales ORDER BY revenue DESC;

LAG and LEAD

Period-over-Period Comparison

-- Month-over-month growth SELECT month, revenue, LAG(revenue, 1) OVER (ORDER BY month) AS prev_month_revenue, revenue - LAG(revenue, 1) OVER (ORDER BY month) AS mom_change, ROUND( 100.0 * (revenue - LAG(revenue, 1) OVER (ORDER BY month)) / NULLIF(LAG(revenue, 1) OVER (ORDER BY month), 0), 2 ) AS mom_growth_pct FROM monthly_revenue;

-- Year-over-year comparison SELECT date, revenue, LAG(revenue, 365) OVER (ORDER BY date) AS yoy_revenue, revenue - LAG(revenue, 365) OVER (ORDER BY date) AS yoy_change FROM daily_revenue;

Gap Analysis

-- Find gaps between orders SELECT customer_id, order_date, LAG(order_date) OVER ( PARTITION BY customer_id ORDER BY order_date ) AS previous_order_date, order_date - LAG(order_date) OVER ( PARTITION BY customer_id ORDER BY order_date ) AS days_since_last_order FROM orders;

-- Identify churned customers (no order > 90 days) WITH order_gaps AS ( SELECT customer_id, order_date, LEAD(order_date) OVER ( PARTITION BY customer_id ORDER BY order_date ) AS next_order_date, LEAD(order_date) OVER ( PARTITION BY customer_id ORDER BY order_date ) - order_date AS gap_days FROM orders ) SELECT DISTINCT customer_id FROM order_gaps WHERE gap_days > 90 OR (next_order_date IS NULL AND order_date < CURRENT_DATE - INTERVAL '90 days');

Lead for Future Values

-- Forecast vs actual comparison SELECT forecast_date, predicted_value, LEAD(actual_value, 7) OVER (ORDER BY forecast_date) AS actual_7d_later, LEAD(actual_value, 7) OVER (ORDER BY forecast_date) - predicted_value AS forecast_error FROM forecasts;

FIRST_VALUE and LAST_VALUE

First/Last in Group

-- First and last order per customer SELECT customer_id, order_id, order_date, order_amount, FIRST_VALUE(order_date) OVER ( PARTITION BY customer_id ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS first_order_date, LAST_VALUE(order_date) OVER ( PARTITION BY customer_id ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS last_order_date FROM orders;

-- Note: LAST_VALUE requires explicit frame to see all rows!

Baseline Comparison

-- Compare each day to first day of month SELECT date, revenue, FIRST_VALUE(revenue) OVER ( PARTITION BY DATE_TRUNC('month', date) ORDER BY date ) AS first_day_revenue, revenue - FIRST_VALUE(revenue) OVER ( PARTITION BY DATE_TRUNC('month', date) ORDER BY date ) AS diff_from_first_day FROM daily_revenue;

Statistical Functions

Percentile Calculations

-- Calculate percentiles SELECT product_id, price, PERCENT_RANK() OVER (ORDER BY price) AS percent_rank, CUME_DIST() OVER (ORDER BY price) AS cumulative_dist, NTILE(100) OVER (ORDER BY price) AS percentile FROM products;

-- Median calculation (50th percentile) SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary, PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary_discrete, AVG(salary) AS mean_salary FROM employees;

-- Multiple percentiles at once SELECT department, PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY salary) AS p25, PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY salary) AS median, PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY salary) AS p75, PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY salary) AS p90 FROM employees GROUP BY department;

Business Analytics Patterns

Customer Lifecycle Analysis

-- Customer order sequence and lifecycle metrics WITH customer_orders AS ( SELECT customer_id, order_id, order_date, order_amount, ROW_NUMBER() OVER ( PARTITION BY customer_id ORDER BY order_date ) AS order_number, FIRST_VALUE(order_date) OVER ( PARTITION BY customer_id ORDER BY order_date ) AS first_order_date, SUM(order_amount) OVER ( PARTITION BY customer_id ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS lifetime_value, LAG(order_date) OVER ( PARTITION BY customer_id ORDER BY order_date ) AS previous_order_date FROM orders ) SELECT customer_id, order_id, order_number, order_amount, lifetime_value, order_date - first_order_date AS days_since_first_order, order_date - previous_order_date AS days_since_last_order, CASE WHEN order_number = 1 THEN 'New' WHEN order_date - previous_order_date > 90 THEN 'Reactivated' ELSE 'Repeat' END AS customer_status FROM customer_orders;

Cohort Analysis

-- Monthly cohort retention WITH user_cohorts AS ( SELECT user_id, DATE_TRUNC('month', first_activity_date) AS cohort_month, DATE_TRUNC('month', activity_date) AS activity_month FROM user_activity ), cohort_sizes AS ( SELECT cohort_month, COUNT(DISTINCT user_id) AS cohort_size FROM user_cohorts GROUP BY cohort_month ), monthly_activity AS ( SELECT cohort_month, activity_month, COUNT(DISTINCT user_id) AS active_users, EXTRACT(MONTH FROM AGE(activity_month, cohort_month)) AS month_number FROM user_cohorts GROUP BY cohort_month, activity_month ) SELECT ma.cohort_month, ma.month_number, cs.cohort_size, ma.active_users, ROUND(100.0 * ma.active_users / cs.cohort_size, 2) AS retention_rate FROM monthly_activity ma JOIN cohort_sizes cs ON ma.cohort_month = cs.cohort_month ORDER BY ma.cohort_month, ma.month_number;

Sales Performance

-- Sales rep performance with rankings and targets WITH sales_performance AS ( SELECT sales_rep_id, rep_name, region, SUM(deal_amount) AS total_sales, COUNT(*) AS deal_count, AVG(deal_amount) AS avg_deal_size FROM sales_deals WHERE close_date >= DATE_TRUNC('quarter', CURRENT_DATE) GROUP BY sales_rep_id, rep_name, region ) SELECT sales_rep_id, rep_name, region, total_sales, deal_count, avg_deal_size, RANK() OVER (ORDER BY total_sales DESC) AS overall_rank, RANK() OVER (PARTITION BY region ORDER BY total_sales DESC) AS region_rank, total_sales - AVG(total_sales) OVER () AS vs_company_avg, total_sales - AVG(total_sales) OVER (PARTITION BY region) AS vs_region_avg, PERCENT_RANK() OVER (ORDER BY total_sales) AS percentile FROM sales_performance;

Inventory Analysis

-- Inventory movement analysis SELECT product_id, transaction_date, transaction_type, quantity, SUM( CASE WHEN transaction_type = 'IN' THEN quantity WHEN transaction_type = 'OUT' THEN -quantity ELSE 0 END ) OVER ( PARTITION BY product_id ORDER BY transaction_date, transaction_id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS running_inventory, AVG( CASE WHEN transaction_type = 'OUT' THEN quantity END ) OVER ( PARTITION BY product_id ORDER BY transaction_date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW ) AS avg_daily_usage_30d FROM inventory_transactions;

Platform-Specific Features

PostgreSQL

-- FILTER clause with window functions SELECT date, category, amount, SUM(amount) OVER (ORDER BY date) AS total_running, SUM(amount) FILTER (WHERE category = 'A') OVER (ORDER BY date) AS category_a_running FROM transactions;

-- GROUPS frame type SELECT date, amount, SUM(amount) OVER ( ORDER BY date GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING ) AS sum_adjacent_groups FROM transactions;

SQL Server

-- STRING_AGG with OVER (SQL Server 2017+) -- Not directly supported, use workaround: SELECT DISTINCT department, STRING_AGG(employee_name, ', ') WITHIN GROUP (ORDER BY employee_name) OVER (PARTITION BY department) AS employees FROM employees;

-- OFFSET with frames (SQL Server 2022+) SELECT date, value, AVG(value) OVER ( ORDER BY date ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING ) AS previous_week_avg FROM daily_metrics;

BigQuery

-- QUALIFY clause (filter on window function results) SELECT customer_id, order_date, order_amount FROM orders QUALIFY ROW_NUMBER() OVER ( PARTITION BY customer_id ORDER BY order_date DESC ) = 1;

-- Equivalent to CTE approach: -- WITH ranked AS ( -- SELECT *, ROW_NUMBER() OVER (...) AS rn FROM orders -- ) -- SELECT * FROM ranked WHERE rn = 1;

Snowflake

-- QUALIFY clause SELECT * FROM sales QUALIFY DENSE_RANK() OVER ( PARTITION BY region ORDER BY revenue DESC ) <= 5;

-- CONDITIONAL_TRUE_EVENT for sessionization SELECT user_id, event_time, CONDITIONAL_TRUE_EVENT( DATEDIFF('minute', LAG(event_time) OVER ( PARTITION BY user_id ORDER BY event_time ), event_time) > 30 ) OVER ( PARTITION BY user_id ORDER BY event_time ) AS session_id FROM user_events;

Performance Optimization

Indexing Strategy

-- Composite index for window function -- Index columns: PARTITION BY columns first, then ORDER BY columns CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date);

-- For running totals across all data CREATE INDEX idx_orders_date_amount ON orders (order_date, order_amount);

Query Optimization Tips

optimization_tips:

  • tip: "Add indexes on PARTITION BY + ORDER BY columns" reason: "Reduces sort operations"

  • tip: "Use CTEs to compute window once, reference multiple times" reason: "Avoids redundant calculations"

  • tip: "Limit rows before applying window functions when possible" reason: "Smaller dataset = faster windows"

  • tip: "Be explicit about frame clause" reason: "Prevents unexpected defaults"

  • tip: "Use QUALIFY when supported" reason: "Cleaner and often faster than subquery"

CTE Pattern for Clarity

-- Use CTEs for complex window queries WITH -- Step 1: Calculate raw metrics daily_metrics AS ( SELECT date, SUM(amount) AS daily_total FROM transactions GROUP BY date ), -- Step 2: Add window calculations with_windows AS ( SELECT date, daily_total, SUM(daily_total) OVER (ORDER BY date) AS running_total, AVG(daily_total) OVER ( ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW ) AS rolling_7d_avg, LAG(daily_total, 7) OVER (ORDER BY date) AS same_day_last_week FROM daily_metrics ) -- Step 3: Final calculations SELECT date, daily_total, running_total, rolling_7d_avg, daily_total - same_day_last_week AS wow_change FROM with_windows WHERE date >= CURRENT_DATE - INTERVAL '30 days';

Лучшие практики

  • Явно указывай frame clause — дефолтное поведение может быть неожиданным

  • Используй CTE для сложных запросов — улучшает читаемость и maintainability

  • Создавай composite индексы — на PARTITION BY + ORDER BY колонки

  • Проверяй NULL handling — LAG/LEAD возвращают NULL для граничных строк

  • Тестируй edge cases — первая/последняя строка, одна строка в партиции

  • Используй QUALIFY где поддерживается — чище чем subquery

  • Учитывай особенности СУБД — синтаксис и функции различаются

  • Оптимизируй сначала WHERE — фильтруй до применения window functions

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Automation

social-media-marketing

No summary provided by upstream source.

Repository SourceNeeds Review
Automation

video-marketing

No summary provided by upstream source.

Repository SourceNeeds Review
Automation

frontend-design

No summary provided by upstream source.

Repository SourceNeeds Review