SQL Problem Solving: Reading Complex Queries
Understand and break down complex SQL before you start writing your own.
Example 1: Finding Top Genres Per User
-- Goal: For each user, find genres they have 2+ songs in.
-- Read in execution order: FROM/JOIN → WHERE → GROUP BY → HAVING → SELECT → ORDER BY
SELECT user_id, genre, -- Step 5: choose output columns
COUNT(*) AS song_count,
AVG(rating) AS avg_rating
FROM Listens l -- Step 1a: input table
JOIN Songs s ON l.song_id = s.song_id -- Step 1b: join for genre info
WHERE genre IS NOT NULL -- Step 2: row filter (drop NULL-genre songs)
GROUP BY user_id, genre -- Step 3: form (user, genre) groups
HAVING COUNT(*) >= 2 -- Step 4: keep groups with 2+ songs
ORDER BY user_id, avg_rating DESC; -- Step 5a: sort final output
The Mental Model
Step 1
FROM + JOIN → Gather all data (9 listens × song info)
Input: All rows from joined tables
Step 2
WHERE → Filter rows (eliminate NULL genres)
Row-level filtering before any grouping
Step 3
GROUP BY → Form groups (user-genre combinations)
Collapse rows into groups
Step 4
HAVING → Filter groups (retain groups with 2+ songs)
Group-level filtering after aggregation
Step 5
SELECT → Choose columns & calculate
Decide what appears in output
Step 5a
ORDER BY → Sort results
Final presentation order
Remember: Read queries in execution order (FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY), not written order!
Example 2: Finding Power Users
📋 The Query
-- Goal: Find the top-half users by average rating.
-- Three layers: per-user metrics (CTE 1), percentile rank (CTE 2), final filter.
WITH user_stats AS ( -- CTE 1: per-user metrics
SELECT user_id,
COUNT(*) AS listen_count,
AVG(rating) AS avg_rating
FROM Listens
GROUP BY user_id
),
ranked_users AS ( -- CTE 2: percentile rank
SELECT user_id, listen_count, avg_rating,
PERCENT_RANK() OVER (ORDER BY avg_rating DESC) AS rating_rank
FROM user_stats
)
SELECT u.name, -- final layer: name + filter
r.listen_count,
r.avg_rating
FROM ranked_users r
JOIN Users u ON r.user_id = u.user_id
WHERE r.rating_rank < 0.5 -- top half
ORDER BY r.avg_rating DESC;
How to Read This Query
Building Understanding Step by Step
- Start Here
Final SELECT
• SELECT name, listen_count, avg_rating
• FROM ranked_users
• WHERE rating_rank < 0.5
• JOIN with Users table
• Get user names and their stats
• From something called ranked_users
• Only the top half (rating_rank < 0.5)
• Join Users to get actual names
→ So we need to understand how rating_rank is built
- Then Read
ranked_users CTE
• PERCENT_RANK() OVER
• ORDER BY avg_rating DESC
• FROM user_stats
• Rank each user by their avg rating
• Highest avg rating gets rank 0
• Built from user_stats table
→ rating_rank = where you fall in the rating distribution
- Finally
user_stats CTE
• COUNT(*) listens per user
• AVG(rating) per user
• GROUP BY user_id
• FROM Listens
• Count listens per user
• Compute their average rating
• One row per user
• Source: the Listens table
→ Foundation: raw listens become user-level metrics
Putting It Together
By reading backwards, we discovered the query's purpose: Find the top half of users by average rating. Each CTE builds on the previous one, transforming raw listens → per-user metrics → percentile ranks → final filtered list.