Spotify End-to-End: How Everything Connects
Concept. A single click on Spotify's play button triggers a coordinated cascade across multiple subsystems (auth, recommendation, search, playback). Each is backed by a different storage and indexing strategy chosen for its specific workload.
Intuition. When Mickey hits play, Postgres authenticates him, an LSM-backed event store records the click, a vector-search index returns recommended next songs, a B+Tree-indexed catalog fetches the audio file, and a CDN streams it. All within 2 seconds.
One Play Button, Every Algorithm
When you hit play on "Anti-Hero," that single click sets off a chain reaction through every algorithm we've covered in this course.
- **1 click** · user action
- **3 systems** · OLTP → ETL → OLAP
- **8 algorithms** · from this course
- **2 seconds** · to CEO dashboard
The Journey
Figure: one click triggers three systems. Act 1 is OLTP (LSM tree write), Act 2 is ETL (Spark hash partition, BigSort, broadcast join, columnar write), and Act 3 is OLAP (star-schema query with broadcast and columnar scan).
Act 1 · The Click (OLTP)
You tap play. In ~8 ms, this happens:
INSERT INTO listens (user_id, song_id, timestamp)
VALUES (123456, 789, NOW());
The write lands in an in-memory MemTable, returns instantly, and is flushed to disk later as part of an SSTable. No random disk I/O on the hot path. That's why Spotify can sustain 100 K writes/sec per node.
→ Algorithm: LSM Tree (lsm-indexes.html)
8 ms response · 100K writes/sec · 8.6B events/day
Act 2 · The Transform (ETL)
At 2 AM, Spark kicks in: yesterday's 8.6 billion events need to be processed into the analytics warehouse.
from pyspark.sql.functions import broadcast

(listens
 # 1. Hash Partition → spread across 1000 nodes
 .repartition(1000, "user_id")
 # 2. BigSort → sort within each partition
 .sortWithinPartitions("timestamp")
 # 3. Broadcast Join → small dim tables to every node
 .join(broadcast(songs), "song_id")
 # 4. Write Columnar → compress 100 TB → 10 TB
 .write.parquet("s3://warehouse/"))
Each step is one of the algorithms from M3:
| Step | Algorithm | Page |
|---|---|---|
| Repartition by user_id | Hash Partitioning | hash-partitioning.html |
| Sort within partitions | BigSort | big-sort.html |
| Join dim tables | Broadcast Join | distributed-query.html |
| Write Parquet | Columnar storage | storage-layout.html |
3 hours total · 1000-node Spark cluster · 100 TB processed per day
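The first three steps of that pipeline can be mimicked without a cluster. The sketch below is a single-machine stand-in for `repartition`, `sortWithinPartitions`, and `broadcast(...)`: the function names and the tiny sample data are invented for illustration, and a stable CRC32 hash stands in for Spark's partitioner.

```python
import zlib

def hash_partition(rows, num_partitions, key):
    """Route each row to a partition by hashing its key (like repartition)."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        p = zlib.crc32(str(row[key]).encode()) % num_partitions
        partitions[p].append(row)
    return partitions

def sort_within_partitions(partitions, key):
    """Sort each partition independently (like sortWithinPartitions):
    no data moves between partitions."""
    return [sorted(part, key=lambda r: r[key]) for part in partitions]

def broadcast_join(partitions, small_table, key):
    """Copy the small table to every partition and join locally: no shuffle."""
    lookup = {row[key]: row for row in small_table}   # the 'broadcast' copy
    return [[{**r, **lookup[r[key]]} for r in part if r[key] in lookup]
            for part in partitions]

listens = [{"user_id": u, "song_id": 789, "timestamp": t}
           for t, u in enumerate([3, 1, 3, 2, 1, 3])]
songs = [{"song_id": 789, "title": "Anti-Hero"}]

parts = hash_partition(listens, 4, "user_id")
parts = sort_within_partitions(parts, "timestamp")
joined = broadcast_join(parts, songs, "song_id")
assert sum(len(p) for p in joined) == len(listens)   # every listen got its song
```

The structure mirrors why the broadcast join wins for small dimension tables: the big `listens` side never moves after the initial partition, while the tiny `songs` table is cheap to copy everywhere.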
Act 3 · The Dashboard (OLAP)
CEO opens the mobile app. The "Top Artists" dashboard renders in 2 seconds:
-- "Top Artists by Country This Month"
SELECT artist, country, COUNT(*) AS plays
FROM fact_plays f
JOIN dim_artists a ON f.artist_id = a.artist_id -- broadcast (10 MB)
WHERE f.date >= '2024-01-01'
GROUP BY artist, country
ORDER BY plays DESC
LIMIT 100;
The 10 MB dim_artists table is broadcast to all 1000 nodes, so no shuffle is needed. The 1 PB fact_plays table is scanned in columnar format: only artist_id, country, and date are read out of its 20 columns.
→ Algorithms: Broadcast Join + Columnar Scan (distributed-query.html, storage-layout.html)
2-second query · 3 of 20 columns scanned · 1 PB warehouse
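A toy comparison makes the "3 of 20 columns" point concrete. In the sketch below the extra column names, the sample data, and the function are all invented; it shows only that a column-oriented layout lets the query touch the three arrays it needs, while a row store would drag all 20 fields of every record through memory.

```python
# Row layout: every read touches whole records, all 20 fields.
rows = [
    {"artist_id": i % 3, "country": "US", "date": "2024-01-05",
     **{f"col_{j}": j for j in range(17)}}    # 17 columns the query never uses
    for i in range(1000)
]

# Columnar layout: one contiguous array per column.
columns = {name: [r[name] for r in rows] for name in rows[0]}

def top_artists_columnar(columns, since):
    """Aggregate plays per (artist, country), reading only 3 columns."""
    counts = {}
    for artist, country, date in zip(
            columns["artist_id"], columns["country"], columns["date"]):
        if date >= since:
            key = (artist, country)
            counts[key] = counts.get(key, 0) + 1
    return sorted(counts.items(), key=lambda kv: -kv[1])

top = top_artists_columnar(columns, "2024-01-01")
assert top[0][1] == 334   # artist 0 appears 334 times among 1000 rows
```

In a real warehouse the win compounds: each column array compresses far better than mixed-type rows (all dates together, all country codes together), which is how the document's 100 TB → 10 TB Parquet compression figure becomes plausible.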
The Big Picture
| Act | System type | Hot algorithm | Latency budget |
|---|---|---|---|
| 1 · Click | OLTP | LSM tree write | < 10 ms |
| 2 · Transform | ETL batch | Hash partition · BigSort · Broadcast join · Columnar write | 3 hours |
| 3 · Dashboard | OLAP | Broadcast join · Columnar scan | 2 seconds |
The pattern: opposite optimizations at opposite ends of the pipeline. Act 1 needs blazing writes (LSM, no random I/O). Act 3 needs blazing scans (columnar, compressed). Act 2 (ETL) does the format conversion in between. You don't pick one. You pick the right tool for each stage.
Next: Module 3 Capstone takes the same "right tool for the job" mindset and applies it to seven open-ended Spotify-style design problems.