Window Functions: Calculations Without Collapsing
Concept. Window functions compute aggregates across a partition defined by OVER, returning one result per input row instead of collapsing them like GROUP BY. The OVER PARTITION BY syntax adds derived metrics while preserving row granularity.
Intuition. AVG(rating) OVER (PARTITION BY user_id) computes Mickey's average alongside every one of Mickey's listen rows without collapsing them. The Listens table comes back at 9 rows, plus one new column.
GROUP BY vs PARTITION BY: The Key Difference
GROUP BY: Collapses Rows
GROUP BY collapses detailed entries into aggregate summaries, to return the mathematical output.
-- One row per user; the per-row Listens detail is gone.
SELECT user_id, AVG(rating) AS avg_rating
FROM Listens
GROUP BY user_id;
| user_id | avg_rating | Notes | |
|---|---|---|---|
| 1 | 4.2 | AVG(4.5, 4.2, 3.9) = 12.6/3 | |
| 2 | 4.4 | AVG(4.7, 4.6, 3.9) = 13.2/3 | |
| 3 | 3.9 | AVG(2.9, 4.9, NULL) = 7.8/2 (NULL ignored) |
Result: 9 rows → 3 rows
PARTITION BY: Keep All Rows
PARTITION BY isolates data ranges for aggregate mathematical calculation while strictly preserving the underlying query results.
-- All 9 listen rows preserved, with each user's avg attached as a new column.
SELECT user_id, song_id, rating,
AVG(rating) OVER (PARTITION BY user_id) AS user_avg
FROM Listens
ORDER BY user_id, song_id;
| user_id | song_id | rating | user_avg | Notes | |
|---|---|---|---|---|---|
| 1 | 1 | 4.5 | 4.2 | Partition 1: AVG(4.5, 4.2, 3.9) | |
| 1 | 2 | 4.2 | 4.2 | Partition 1: AVG(4.5, 4.2, 3.9) | |
| 1 | 6 | 3.9 | 4.2 | Partition 1: AVG(4.5, 4.2, 3.9) | |
| 2 | 2 | 4.7 | 4.4 | Partition 2: AVG(4.7, 4.6, 3.9) | |
| 2 | 7 | 4.6 | 4.4 | Partition 2: AVG(4.7, 4.6, 3.9) | |
| 2 | 8 | 3.9 | 4.4 | Partition 2: AVG(4.7, 4.6, 3.9) | |
| 3 | 1 | 2.9 | 3.9 | Partition 3: AVG(2.9, 4.9, NULL) = 3.9 | |
| 3 | 2 | 4.9 | 3.9 | Partition 3: AVG(2.9, 4.9, NULL) = 3.9 | |
| 3 | 6 | NULL | 3.9 | Partition 3: AVG ignores NULL |
Result: 9 rows → 9 rows (all kept)
Ranking Functions: ROW_NUMBER, RANK, DENSE_RANK
Window functions let you rank data without collapsing it. Here's the query we'll trace:
-- Rank each user's listens by rating, highest first.
SELECT
user_id,
song_id,
rating,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY rating DESC) AS rn,
RANK() OVER (PARTITION BY user_id ORDER BY rating DESC) AS rk,
DENSE_RANK() OVER (PARTITION BY user_id ORDER BY rating DESC) AS drk
FROM Listens;
Walk it in three steps:
Step 1: Partition by user_id
| user_id | song_id | rating | Notes | |
|---|---|---|---|---|
| 1 | 1 | 4.5 | Partition 1 | |
| 1 | 2 | 4.2 | Partition 1 | |
| 1 | 6 | 3.9 | Partition 1 | |
| 2 | 2 | 4.7 | Partition 2 | |
| 2 | 7 | 4.6 | Partition 2 | |
| 2 | 8 | 3.9 | Partition 2 | |
| 3 | 1 | 2.9 | Partition 3 | |
| 3 | 2 | 4.9 | Partition 3 | |
| 3 | 6 | NULL | Partition 3 |
Step 2: Order by rating DESC within each partition
| user_id | song_id | rating | Notes | |
|---|---|---|---|---|
| 1 | 1 | 4.5 | Highest in partition 1 | |
| 1 | 2 | 4.2 | Second in partition 1 | |
| 1 | 6 | 3.9 | Third in partition 1 | |
| 2 | 2 | 4.7 | Highest in partition 2 | |
| 2 | 7 | 4.6 | Second in partition 2 | |
| 2 | 8 | 3.9 | Third in partition 2 | |
| 3 | 2 | 4.9 | Highest in partition 3 | |
| 3 | 1 | 2.9 | Second in partition 3 | |
| 3 | 6 | NULL | NULL sorts last with DESC |
Step 3: Apply ranking functions
| user_id | song_id | rating | rn | rk | drk | Notes | |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 4.5 | 1 | 1 | 1 | Highest | |
| 1 | 2 | 4.2 | 2 | 2 | 2 | Second | |
| 1 | 6 | 3.9 | 3 | 3 | 3 | Third | |
| 2 | 2 | 4.7 | 1 | 1 | 1 | Highest | |
| 2 | 7 | 4.6 | 2 | 2 | 2 | Second | |
| 2 | 8 | 3.9 | 3 | 3 | 3 | Third | |
| 3 | 2 | 4.9 | 1 | 1 | 1 | Highest | |
| 3 | 1 | 2.9 | 2 | 2 | 2 | Second | |
| 3 | 6 | NULL | 3 | 3 | 3 | NULL gets rank 3 |
Key Differences:
-
ROW_NUMBER: Always unique (1, 2, 3, 4)
-
RANK: Ties share, gaps after (1, 2, 2, 4)
-
DENSE_RANK: Ties share, no gaps (1, 2, 2, 3)
What if there were a tie?
Our Listens data has no duplicate ratings, so all three functions land on the same numbers above. To see them diverge, imagine User 1 also had a 4th listen, song 9, rated 4.5 (a tie with their existing 4.5). Sorting User 1's partition by rating DESC now gives:
| user_id | song_id | rating | rn | rk | drk | Notes | |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 4.5 | 1 | 1 | 1 | Tied for 1st | |
| 1 | 9 | 4.5 | 2 | 1 | 1 | Tied for 1st (hypothetical) | |
| 1 | 2 | 4.2 | 3 | 3 | 2 | RANK skips to 3, DENSE_RANK is 2 | |
| 1 | 6 | 3.9 | 4 | 4 | 3 | rn counts; rk=4, drk=3 |
ROW_NUMBER doesn't care about ties. RANK shares the tied rank but leaves a gap (no #2). DENSE_RANK shares and stays packed (#2 still gets used).
Key Rules
-
No Row Reduction: Unlike GROUP BY, all rows survive.
-
PARTITION BY: Defines groups (optional - omit for whole table).
-
ORDER BY: Sets sequence within partitions (required for some functions).
-
NULL Handling: NULLs group together in PARTITION BY, sort first/last in ORDER BY.
Common Patterns
-
Ranking: RANK(), ROW_NUMBER(), DENSE_RANK()
-
Running Totals: SUM() OVER (ORDER BY...)
-
Moving Averages: AVG() OVER (ROWS BETWEEN...)
-
Lead/Lag: Compare to previous/next rows
-
Percentiles: NTILE(), PERCENT_RANK()
Common Mistakes
The Missing PARTITION BY Bug
Without PARTITION BY, you get a global ranking, which might not be what you intended.
Wrong ORDER BY Direction
ASC vs DESC changes the ranking meaning, so double-check your order.
Forgetting NULLs in ORDER BY
NULL sorting varies by database (PostgreSQL: NULLs first, MySQL: NULLs last), which can lead to unexpected results.