17. Window Functions & Analytics Queries

17. Window Functions & Analytics Queries

Learning Objectives

  • Understand window function concepts and differences from regular aggregate functions
  • Master OVER clause, partitions, and frame concepts
  • Utilize ranking functions (ROW_NUMBER, RANK, DENSE_RANK)
  • Utilize analytical functions (LEAD, LAG, FIRST_VALUE)
  • Improve practical analytical query writing skills

Table of Contents

  1. Window Function Basics
  2. Ranking Functions
  3. Analytical Functions
  4. Aggregate Window Functions
  5. Frame Details
  6. Practical Use Patterns
  7. Practice Problems

1. Window Function Basics

1.1 What are Window Functions?

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                 Regular Aggregate vs Window Functions            β”‚
β”‚                                                                 β”‚
β”‚   Regular Aggregate (GROUP BY)     Window Function (OVER)      β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”               β”‚
β”‚   β”‚ A | B | SUM   β”‚            β”‚ A | B | val | SUM             β”‚
β”‚   β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€            β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€           β”‚
β”‚   β”‚ X |   | 150   β”‚            β”‚ X | 1 | 50  | 150 β”‚           β”‚
β”‚   β”‚ Y |   | 120   β”‚            β”‚ X | 2 | 100 | 150 β”‚           β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜            β”‚ Y | 3 | 70  | 120 β”‚           β”‚
β”‚   (rows collapsed to groups)   β”‚ Y | 4 | 50  | 120 β”‚           β”‚
β”‚                                β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜           β”‚
β”‚                                (all rows kept + aggregate)      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

1.2 Basic Syntax

-- Window function basic structure
function_name() OVER (
    [PARTITION BY column]    -- Group division (optional)
    [ORDER BY column]        -- Ordering (optional)
    [frame_clause]           -- Range specification (optional)
)

-- Example
SELECT
    department,
    employee_name,
    salary,
    SUM(salary) OVER (PARTITION BY department) AS dept_total,
    AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;

1.3 Sample Data

-- Create test table
CREATE TABLE sales (
    id SERIAL PRIMARY KEY,
    salesperson VARCHAR(50),
    region VARCHAR(20),
    sale_date DATE,
    amount NUMERIC(10,2)
);

INSERT INTO sales (salesperson, region, sale_date, amount) VALUES
    ('Alice', 'East', '2024-01-15', 1000),
    ('Alice', 'East', '2024-01-20', 1500),
    ('Alice', 'East', '2024-02-10', 2000),
    ('Bob', 'East', '2024-01-18', 800),
    ('Bob', 'East', '2024-02-15', 1200),
    ('Charlie', 'West', '2024-01-10', 900),
    ('Charlie', 'West', '2024-01-25', 1100),
    ('Charlie', 'West', '2024-02-20', 1300),
    ('Diana', 'West', '2024-01-30', 700),
    ('Diana', 'West', '2024-02-05', 1600);

2. Ranking Functions

2.1 ROW_NUMBER, RANK, DENSE_RANK Comparison

-- Compare ranking functions
SELECT
    salesperson,
    amount,
    ROW_NUMBER() OVER (ORDER BY amount DESC) AS row_num,
    RANK() OVER (ORDER BY amount DESC) AS rank,
    DENSE_RANK() OVER (ORDER BY amount DESC) AS dense_rank
FROM sales;
Example result (tie handling difference):
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ salespersonβ”‚ amount β”‚ row_num β”‚ rank β”‚ dense_rank β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Alice      β”‚ 2000   β”‚ 1       β”‚ 1    β”‚ 1          β”‚
β”‚ Diana      β”‚ 1600   β”‚ 2       β”‚ 2    β”‚ 2          β”‚
β”‚ Alice      β”‚ 1500   β”‚ 3       β”‚ 3    β”‚ 3          β”‚
β”‚ Charlie    β”‚ 1300   β”‚ 4       β”‚ 4    β”‚ 4          β”‚
β”‚ Bob        β”‚ 1200   β”‚ 5       β”‚ 5    β”‚ 5          β”‚
β”‚ Charlie    β”‚ 1100   β”‚ 6       β”‚ 6    β”‚ 6          β”‚
β”‚ Alice      β”‚ 1000   β”‚ 7       β”‚ 7    β”‚ 7          β”‚  -- no ties
β”‚ Charlie    β”‚  900   β”‚ 8       β”‚ 8    β”‚ 8          β”‚
β”‚ Bob        β”‚  800   β”‚ 9       β”‚ 9    β”‚ 9          β”‚
β”‚ Diana      β”‚  700   β”‚ 10      β”‚ 10   β”‚ 10         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

When there are ties:
β”‚ A          β”‚ 1000   β”‚ 1       β”‚ 1    β”‚ 1          β”‚
β”‚ B          β”‚ 1000   β”‚ 2       β”‚ 1    β”‚ 1          β”‚  -- tie!
β”‚ C          β”‚  900   β”‚ 3       β”‚ 3    β”‚ 2          β”‚
                      (sequential) (skip) (sequential)

2.2 Ranking with PARTITION BY

-- Regional sales ranking
SELECT
    region,
    salesperson,
    amount,
    RANK() OVER (
        PARTITION BY region
        ORDER BY amount DESC
    ) AS region_rank
FROM sales;
Result:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ region β”‚ salespersonβ”‚ amount β”‚ region_rank β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ East   β”‚ Alice      β”‚ 2000   β”‚ 1           β”‚
β”‚ East   β”‚ Alice      β”‚ 1500   β”‚ 2           β”‚
β”‚ East   β”‚ Bob        β”‚ 1200   β”‚ 3           β”‚
β”‚ East   β”‚ Alice      β”‚ 1000   β”‚ 4           β”‚
β”‚ East   β”‚ Bob        β”‚  800   β”‚ 5           β”‚
β”‚ West   β”‚ Diana      β”‚ 1600   β”‚ 1           β”‚  ← partition reset
β”‚ West   β”‚ Charlie    β”‚ 1300   β”‚ 2           β”‚
β”‚ West   β”‚ Charlie    β”‚ 1100   β”‚ 3           β”‚
β”‚ West   β”‚ Charlie    β”‚  900   β”‚ 4           β”‚
β”‚ West   β”‚ Diana      β”‚  700   β”‚ 5           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

2.3 NTILE - Assign Quantiles

-- Divide into 4 quartiles
SELECT
    salesperson,
    amount,
    NTILE(4) OVER (ORDER BY amount DESC) AS quartile
FROM sales;

-- Use case: identify top 25% customers
SELECT *
FROM (
    SELECT
        customer_id,
        total_purchase,
        NTILE(4) OVER (ORDER BY total_purchase DESC) AS quartile
    FROM customer_summary
) sub
WHERE quartile = 1;  -- top 25%

2.4 Top-N Queries

-- Extract top 2 sales per region
SELECT *
FROM (
    SELECT
        region,
        salesperson,
        sale_date,
        amount,
        ROW_NUMBER() OVER (
            PARTITION BY region
            ORDER BY amount DESC
        ) AS rn
    FROM sales
) ranked
WHERE rn <= 2;

3. Analytical Functions

3.1 LAG and LEAD

-- LAG: reference previous row value
-- LEAD: reference next row value
SELECT
    salesperson,
    sale_date,
    amount,
    LAG(amount) OVER (
        PARTITION BY salesperson
        ORDER BY sale_date
    ) AS prev_amount,
    LEAD(amount) OVER (
        PARTITION BY salesperson
        ORDER BY sale_date
    ) AS next_amount
FROM sales
ORDER BY salesperson, sale_date;
Result:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ salespersonβ”‚ sale_date  β”‚ amount β”‚ prev_amount β”‚ next_amount β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Alice      β”‚ 2024-01-15 β”‚ 1000   β”‚ NULL        β”‚ 1500        β”‚
β”‚ Alice      β”‚ 2024-01-20 β”‚ 1500   β”‚ 1000        β”‚ 2000        β”‚
β”‚ Alice      β”‚ 2024-02-10 β”‚ 2000   β”‚ 1500        β”‚ NULL        β”‚
β”‚ Bob        β”‚ 2024-01-18 β”‚  800   β”‚ NULL        β”‚ 1200        β”‚
β”‚ Bob        β”‚ 2024-02-15 β”‚ 1200   β”‚  800        β”‚ NULL        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

3.2 Calculate Growth Rate

-- Month-over-month growth rate
SELECT
    salesperson,
    sale_date,
    amount,
    LAG(amount) OVER (
        PARTITION BY salesperson
        ORDER BY sale_date
    ) AS prev_amount,
    ROUND(
        (amount - LAG(amount) OVER (
            PARTITION BY salesperson ORDER BY sale_date
        )) * 100.0 /
        NULLIF(LAG(amount) OVER (
            PARTITION BY salesperson ORDER BY sale_date
        ), 0),
        2
    ) AS growth_pct
FROM sales
ORDER BY salesperson, sale_date;

3.3 FIRST_VALUE, LAST_VALUE, NTH_VALUE

-- First/last value within partition
SELECT
    salesperson,
    sale_date,
    amount,
    FIRST_VALUE(amount) OVER (
        PARTITION BY salesperson
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS first_sale,
    LAST_VALUE(amount) OVER (
        PARTITION BY salesperson
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_sale,
    NTH_VALUE(amount, 2) OVER (
        PARTITION BY salesperson
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS second_sale
FROM sales;

4. Aggregate Window Functions

4.1 SUM, AVG, COUNT

-- Window aggregate functions
SELECT
    salesperson,
    sale_date,
    amount,
    -- Total by salesperson
    SUM(amount) OVER (PARTITION BY salesperson) AS person_total,
    -- Average by salesperson
    AVG(amount) OVER (PARTITION BY salesperson) AS person_avg,
    -- Percentage of total
    ROUND(amount * 100.0 / SUM(amount) OVER (), 2) AS pct_of_total
FROM sales
ORDER BY salesperson, sale_date;

4.2 Running Total

-- Cumulative sum
SELECT
    sale_date,
    amount,
    SUM(amount) OVER (
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM sales
ORDER BY sale_date;

-- Cumulative sum by salesperson
SELECT
    salesperson,
    sale_date,
    amount,
    SUM(amount) OVER (
        PARTITION BY salesperson
        ORDER BY sale_date
    ) AS cumulative_sales
FROM sales
ORDER BY salesperson, sale_date;

4.3 Moving Average

-- Moving average of last 3 records
SELECT
    sale_date,
    amount,
    AVG(amount) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg_3
FROM sales
ORDER BY sale_date;

-- Centered moving average (including 1 before and after)
SELECT
    sale_date,
    amount,
    AVG(amount) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
    ) AS centered_avg
FROM sales
ORDER BY sale_date;

5. Frame Details

5.1 Frame Syntax

ROWS | RANGE | GROUPS BETWEEN start_point AND end_point

start_point/end_point:
- UNBOUNDED PRECEDING  -- partition start
- n PRECEDING          -- n rows before
- CURRENT ROW          -- current row
- n FOLLOWING          -- n rows after
- UNBOUNDED FOLLOWING  -- partition end

5.2 ROWS vs RANGE

-- Test with tie data to see difference
CREATE TABLE test_frame (
    id INT,
    val INT
);
INSERT INTO test_frame VALUES (1, 100), (2, 100), (3, 200), (4, 200), (5, 300);

-- ROWS: physical row-based
SELECT
    id, val,
    SUM(val) OVER (ORDER BY val ROWS UNBOUNDED PRECEDING) AS rows_sum
FROM test_frame;

-- RANGE: logical value-based (same values grouped)
SELECT
    id, val,
    SUM(val) OVER (ORDER BY val RANGE UNBOUNDED PRECEDING) AS range_sum
FROM test_frame;
Result comparison:
ROWS:                          RANGE:
β”Œβ”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”       β”Œβ”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ id β”‚ val β”‚ rows_sum β”‚       β”‚ id β”‚ val β”‚ range_sum β”‚
β”œβ”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€       β”œβ”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 1  β”‚ 100 β”‚ 100      β”‚       β”‚ 1  β”‚ 100 β”‚ 200       β”‚ ← two 100s
β”‚ 2  β”‚ 100 β”‚ 200      β”‚       β”‚ 2  β”‚ 100 β”‚ 200       β”‚ ← same
β”‚ 3  β”‚ 200 β”‚ 400      β”‚       β”‚ 3  β”‚ 200 β”‚ 600       β”‚ ← two 200s
β”‚ 4  β”‚ 200 β”‚ 600      β”‚       β”‚ 4  β”‚ 200 β”‚ 600       β”‚ ← same
β”‚ 5  β”‚ 300 β”‚ 900      β”‚       β”‚ 5  β”‚ 300 β”‚ 900       β”‚
β””β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜       β””β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

5.3 GROUPS (PostgreSQL 11+)

-- GROUPS: same ORDER BY values as one group
SELECT
    id, val,
    SUM(val) OVER (
        ORDER BY val
        GROUPS BETWEEN 1 PRECEDING AND CURRENT ROW
    ) AS groups_sum
FROM test_frame;

5.4 EXCLUDE Clause (PostgreSQL 11+)

-- Exclude specific rows from frame
SELECT
    id, val,
    SUM(val) OVER (
        ORDER BY val
        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
        EXCLUDE CURRENT ROW  -- exclude current row
    ) AS sum_excluding_current
FROM test_frame;

-- EXCLUDE options:
-- EXCLUDE NO OTHERS (default)
-- EXCLUDE CURRENT ROW
-- EXCLUDE GROUP (current row and same values)
-- EXCLUDE TIES (same values except current row)

6. Practical Use Patterns

6.1 Daily Cumulative Sales and Target Achievement

SELECT
    sale_date,
    amount,
    SUM(amount) OVER (
        ORDER BY sale_date
    ) AS cumulative_sales,
    ROUND(
        SUM(amount) OVER (ORDER BY sale_date) * 100.0 / 10000,
        2
    ) AS target_pct  -- target: 10,000
FROM sales
ORDER BY sale_date;

6.2 Outlier Detection

-- Data beyond average Β± 2 standard deviations
WITH stats AS (
    SELECT
        salesperson,
        amount,
        AVG(amount) OVER (PARTITION BY salesperson) AS avg_amount,
        STDDEV(amount) OVER (PARTITION BY salesperson) AS stddev_amount
    FROM sales
)
SELECT *
FROM stats
WHERE amount > avg_amount + 2 * stddev_amount
   OR amount < avg_amount - 2 * stddev_amount;

6.3 Consecutive Record Analysis

-- Calculate consecutive sales days
WITH daily_sales AS (
    SELECT
        salesperson,
        sale_date,
        sale_date - (ROW_NUMBER() OVER (
            PARTITION BY salesperson
            ORDER BY sale_date
        ))::int AS grp
    FROM sales
)
SELECT
    salesperson,
    MIN(sale_date) AS streak_start,
    MAX(sale_date) AS streak_end,
    COUNT(*) AS streak_length
FROM daily_sales
GROUP BY salesperson, grp
ORDER BY salesperson, streak_start;

6.4 Row/Column Comparison Without Pivot

-- Current vs previous month vs same month last year
SELECT
    salesperson,
    DATE_TRUNC('month', sale_date) AS month,
    SUM(amount) AS monthly_total,
    LAG(SUM(amount)) OVER (
        PARTITION BY salesperson
        ORDER BY DATE_TRUNC('month', sale_date)
    ) AS prev_month,
    LAG(SUM(amount), 12) OVER (
        PARTITION BY salesperson
        ORDER BY DATE_TRUNC('month', sale_date)
    ) AS same_month_last_year
FROM sales
GROUP BY salesperson, DATE_TRUNC('month', sale_date)
ORDER BY salesperson, month;

6.5 Sessionization

-- New session if gap exceeds 30 minutes
WITH events AS (
    SELECT
        user_id,
        event_time,
        LAG(event_time) OVER (
            PARTITION BY user_id ORDER BY event_time
        ) AS prev_event_time
    FROM user_events
),
session_flags AS (
    SELECT
        user_id,
        event_time,
        CASE
            WHEN prev_event_time IS NULL THEN 1
            WHEN event_time - prev_event_time > INTERVAL '30 minutes' THEN 1
            ELSE 0
        END AS is_new_session
    FROM events
)
SELECT
    user_id,
    event_time,
    SUM(is_new_session) OVER (
        PARTITION BY user_id
        ORDER BY event_time
    ) AS session_id
FROM session_flags;

6.6 Gap Filling

-- Generate date series and LEFT JOIN
WITH date_series AS (
    SELECT generate_series(
        '2024-01-01'::date,
        '2024-01-31'::date,
        '1 day'::interval
    )::date AS date
),
daily_totals AS (
    SELECT sale_date, SUM(amount) AS total
    FROM sales
    GROUP BY sale_date
)
SELECT
    ds.date,
    COALESCE(dt.total, 0) AS daily_total,
    SUM(COALESCE(dt.total, 0)) OVER (ORDER BY ds.date) AS running_total
FROM date_series ds
LEFT JOIN daily_totals dt ON ds.date = dt.sale_date
ORDER BY ds.date;

6.7 Percentile Calculation

-- Calculate percentiles
SELECT
    salesperson,
    amount,
    PERCENT_RANK() OVER (ORDER BY amount) AS pct_rank,
    CUME_DIST() OVER (ORDER BY amount) AS cume_dist,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) OVER () AS median
FROM sales;

-- Median by group
SELECT DISTINCT
    region,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount)
        OVER (PARTITION BY region) AS median_by_region
FROM sales;

7. Practice Problems

Exercise 1: Sales Performance Analysis

Analyze each salesperson's sales amount and calculate: - Rank by salesperson - Percentage of total - Change from previous sale

-- Example answer
SELECT
    salesperson,
    sale_date,
    amount,
    RANK() OVER (ORDER BY amount DESC) AS overall_rank,
    RANK() OVER (
        PARTITION BY salesperson
        ORDER BY amount DESC
    ) AS personal_rank,
    ROUND(amount * 100.0 / SUM(amount) OVER (), 2) AS pct_of_total,
    amount - LAG(amount) OVER (
        PARTITION BY salesperson ORDER BY sale_date
    ) AS change_from_prev
FROM sales
ORDER BY salesperson, sale_date;

Exercise 2: Moving Sum

Calculate moving sum for the last 7 days.

-- Example answer
SELECT
    sale_date,
    amount,
    SUM(amount) OVER (
        ORDER BY sale_date
        RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW
    ) AS rolling_7day_sum
FROM sales
ORDER BY sale_date;

Exercise 3: Find Target Achievement Date

Find the first date when cumulative sales reached 5000.

-- Example answer
SELECT sale_date, cumulative
FROM (
    SELECT
        sale_date,
        SUM(amount) OVER (ORDER BY sale_date) AS cumulative,
        LAG(SUM(amount) OVER (ORDER BY sale_date)) OVER (ORDER BY sale_date) AS prev_cumulative
    FROM sales
) sub
WHERE cumulative >= 5000
  AND (prev_cumulative IS NULL OR prev_cumulative < 5000)
LIMIT 1;

Next Steps

References

to navigate between lessons