B+Tree Indexes
Concept. A B+Tree index keeps keys sorted in a balanced tree, supporting O(log N) point lookups and range scans (WHERE rating BETWEEN 4.0 AND 5.0) by walking the leaf pages in order.
Intuition. When you query "find Listens with rating between 4.0 and 5.0," a B+Tree on rating walks down the tree to the first 4.0, then reads leaf pages in order until it passes 5.0. That is one descent followed by a scan, not a tree walk per row.
Prerequisites. Index Fundamentals introduces (key, page#) pairs and pointers.
What a B+Tree Stores
A B+Tree sits on top of the table's data pages. Three layers of index nodes (root, internal, leaf) route a lookup down to the right page, and the leaves are linked left-to-right so a range scan can stream across them without going back to the root.
Every layer is one IO. A billion-row table with fanout around 1,000 fits in three levels (root, internal, leaf). Add one more IO to read the data page itself, and a point lookup is roughly 4 IOs regardless of table size.
Walking a Range Query
The B+Tree finds the starting leaf in O(log N) IOs, then streams across leaves until it passes the end key. The trace below walks 12000 ≤ release_date ≤ 14000.
The result is mostly sequential IO once the descent reaches the starting leaf, which is what makes range scans cheap on a B+Tree.
Why B+Trees Are Shallow: Fanout
Fanout is how many children an internal node has, and it's set by how many (key, pointer) pairs fit on one page. High fanout means a shallow tree, and a shallow tree means few IOs per lookup.
Fanout = ⌊page size / (key size + pointer size)⌋
Height = ⌈log_fanout(number of keys)⌉
A 4 KB page with 4-byte keys and 4-byte pointers gives a fanout of 512. An 8 KB page gives roughly 1,024. At fanout 1,000, a billion keys need only three levels. That is why B+Tree lookups feel constant-time even as tables grow.
How Page and Key Size Drive Tree Height
One trillion keys stored four different ways. Read down the Page-size column to see the effect of moving from a 64 KB to a 64 MB page.
| Page size | Key size | Fanout | Height | IOs / lookup |
|---|---|---|---|---|
| 64 KB | 8 B | 4,096 | 4 | 4 |
| 64 KB | 1 KB | 63 | 7 | 7 |
| 64 MB | 8 B | 4,194,304 | 2 | 2 |
| 64 MB | 1 KB | 65,027 | 3 | 3 |
A trillion-key index with 1 KB search keys goes from 7 IOs on a 64 KB page to just 3 IOs on a 64 MB page. Same tree structure, different arithmetic. The tradeoff is that larger pages use more memory per node, but they give you a far shallower tree. Real systems pick the page size that fits their workload: PostgreSQL ships with 8 KB, MySQL InnoDB with 16 KB, and analytical engines like BigQuery and Snowflake use MB-sized columnar blocks.
What B+Trees Handle Well, and What They Don't
Handles well
WHERE user_id = 42
WHERE rating BETWEEN 4.0 AND 5.0
WHERE name LIKE 'Ali%'
ORDER BY created_at
Equality, ranges, prefix matching, and sorted scans. Anything that benefits from keys being in order.
Doesn't help
WHERE LOWER(name) = 'alice'
WHERE name LIKE '%son'
WHERE age != 30
Functions on the indexed column (the index is on name, not LOWER(name)), suffix matches, and non-equality !=. The tree can't narrow the search space, so it falls back to a scan.
Key Takeaways
-
Fanout is the whole story. Fanout of 1,000 means
log₁,₀₀₀(N)levels, which is three for a billion rows. -
One descent, then a scan. Range queries walk the linked leaves, with no per-row tree traversal.
-
Match the index to the query shape. B+Trees shine on
=,BETWEEN,LIKE 'prefix%', andORDER BY. They can't help when you wrap the column in a function.
Next
SSTables → The immutable sorted file format that write-heavy stores use instead of a B+Tree on disk.