If you’ve ever wondered how databases like MySQL or PostgreSQL can find data so fast—even with millions of records—the answer is often B-Trees.
This guide builds your understanding step by step:
- Binary Search → B-Tree intuition
- Insert / Search / Update / Delete
- Disk I/O and performance
We’ll use a simple bank account example throughout.
🧠 1. From Binary Search to B-Tree
🔍 Binary Search (Foundation)
Binary search works on sorted data:
[1000, 2000, 3000, 4000, 5000]
To find 4000:
- Check middle → 3000
- Go right
- Find 4000
👉 Time complexity: O(log n)
🚧 Problem with Binary Search on Disk
Binary search assumes:
- Data is in memory (RAM)
- Fast random access
👉 But databases store data on disk → accessing disk is slow
🌳 2. B-Tree = Binary Search + Tree + Disk Optimization
👉 B-Tree extends binary search:
Instead of 1 middle value → many keys per node
[2000 | 4000 | 6000 | 8000]
👉 This splits data into multiple ranges:
<2000 | 2000–4000 | 4000–6000 | ...
💡 Key Idea
B-Tree performs binary search inside each node, then uses pointers to jump to the next node
📦 3. Core Terminology
| Term | Meaning |
|---|---|
| Node | A container holding multiple keys |
| Key | Value used for comparison |
| Children | Pointers to other nodes |
| Root | Entry point |
| Leaf | Node with no children |
| Order (m) | Max number of children |
📏 4. B-Tree Properties
For order m:
- Max children = m
- Min children = ⌈m/2⌉ (except root)
- All leaves at same level
- Keys sorted
- children = keys + 1
🏦 5. Example Tree (Bank Accounts)
[4000 | 8000] / | \ [2000] [6000] [10000] / \ / \ / \ [1000] [3000] [5000] [7000] [9000] [11000]
🔍 6. Search (Query)
Example: Find 7000
Step-by-step:
-
Root:
[4000 | 8000]→ 7000 is between → go middle
-
Node:
[6000]→ 7000 > 6000 → go right
-
Leaf:
[7000]→ Found 🎉
💡 Key insight
- Each node: binary search inside
- Between nodes: pointer traversal
✏️ 7. Insert
Rule:
Insert into leaf → if overflow → split
Example
[9000 | 10000] + 11000 → overflow
Split:
- Middle = 10000
- Left = [9000]
- Right = [11000]
- Push 10000 up
🔥 Cascade Split
If parent overflows → split again → may create new root
🔄 8. Update
👉 Update is simple:
UPDATE users SET account = 7001 WHERE account = 7000
Two cases:
✔ Key unchanged (non-index column)
- Just update value
- No tree change
⚠ Key changed (indexed column)
👉 Equivalent to:
DELETE old key + INSERT new key
❌ 9. Delete
Deletion is harder than insert.
Step 1: Remove key
Example:
Delete 7000
Step 2: Fix underflow
If node has too few keys:
Option 1: Borrow from sibling
[5000] [7000] [9000] → borrow from neighbor
Option 2: Merge nodes
[5000] + [7000] → merge
→ pull key down from parent
🔥 Cascade Merge
Like split, merge can propagate upward
💽 10. Disk Read (I/O)
👉 1 disk read = load 1 node (page) into RAM
Example search:
Root → Internal → Leaf
👉 ~3 disk reads
In reality:
- Root cached
- Internal nodes cached
👉 ~1–2 disk reads
🚀 11. Why B-Tree is Fast
🔥 Reason 1: High Fan-out
Each node contains many keys:
~100–1000 keys
🔥 Reason 2: Very Small Height
| Records | Height |
|---|---|
| 1K | 2 |
| 1M | 3 |
| 1B | 4 |
🔥 Reason 3: Fewer Disk Reads
👉 That’s the real goal
📐 12. How Order (m) is Chosen
👉 Based on disk page size
Example:
- Page = 4KB
- Key = 8 bytes
- Pointer = 8 bytes
m ≈ 257
🧠 13. Big Picture
| Operation | What happens |
|---|---|
| Search | Traverse tree using binary search + pointers |
| Insert | Add → split if overflow |
| Update | Delete + insert (if key changes) |
| Delete | Remove → borrow or merge if underflow |
🎯 Final Takeaway
B-Tree = Binary search inside nodes + tree structure + disk optimization
It minimizes:
- Tree height
- Disk I/O
And that’s why databases can query millions of records in milliseconds 🚀
