Buffer Pool
What It Is
The Buffer Pool is a cache that keeps frequently accessed pages in memory. Instead of reading from disk every time, we check the buffer pool first.
The Problem It Solves
Disk I/O is slow:
Operation         Time
─────────────────────────────────────
CPU instruction   ~1 ns
RAM access        ~100 ns
SSD read          ~100,000 ns (100 μs)
HDD read          ~10,000,000 ns (10 ms)
RAM is 1,000-100,000x faster than disk. If we can keep hot pages in RAM, performance improves dramatically.
Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Buffer Pool │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Page Table │ │
│ │ (PageId → FrameId mapping) │ │
│ │ │ │
│ │ PageId 5 ──► Frame 2 │ │
│ │ PageId 12 ─► Frame 0 │ │
│ │ PageId 3 ──► Frame 7 │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ │
│ │Frame 0│ │Frame 1│ │Frame 2│ │Frame 3│ │Frame 4│ │Frame 5│ │
│ │───────│ │───────│ │───────│ │───────│ │───────│ │───────│ │
│ │pg: 12 │ │pg: -- │ │pg: 5 │ │pg: -- │ │pg: 8 │ │pg: 1 │ │
│ │pin: 2 │ │pin: 0 │ │pin: 1 │ │pin: 0 │ │pin: 0 │ │pin: 3 │ │
│ │dirty:Y│ │dirty:N│ │dirty:N│ │dirty:N│ │dirty:Y│ │dirty:N│ │
│ │use: 3 │ │use: 0 │ │use: 1 │ │use: 0 │ │use: 2 │ │use: 5 │ │
│ │[data] │ │[data] │ │[data] │ │[data] │ │[data] │ │[data] │ │
│ └───────┘ └───────┘ └───────┘ └───────┘ └───────┘ └───────┘ │
│ │
│ Free list: [1, 3] (frames not holding any page) │
│ Clock hand: ──────────────────────► │
└─────────────────────────────────────────────────────────────────┘
The Frame
Each frame holds one page:
pub const BufferFrame = struct {
    page_id: PageId,                    // Which page is here (NULL_PAGE if empty)
    data: []align(4096) u8,             // The actual 4KB page data
    pin_count: std.atomic.Value(u32),   // Reference count
    dirty: bool,                        // Modified since read?
    usage_count: u8,                    // For Clock eviction
    latch: RwLatch,                     // Reader-writer lock
};
Pin Count
The pin count is a reference count:
fetchPage() ──► pin_count += 1 "I'm using this page"
unpinPage() ──► pin_count -= 1 "I'm done with this page"
pin_count > 0 ──► Page is in use, cannot be evicted
pin_count = 0 ──► Page can be evicted if needed
Dirty Flag
Read page ──► dirty = false "Matches disk"
Modify page ──► dirty = true "Different from disk"
Write to disk ─► dirty = false "Matches disk again"
Dirty pages MUST be written to disk before eviction. Otherwise we lose data!
Reader-Writer Latch
Multiple readers OR one writer
Read latch: "I'm reading, others can read too"
Write latch: "I'm modifying, exclusive access"
Fetching a Page
pub fn fetchPage(self: *Self, page_id: PageId, mode: LatchMode) !*BufferFrame {
    self.mutex.lock();
    defer self.mutex.unlock();

    // 1. Check if already in buffer pool
    if (self.page_table.get(page_id)) |frame_id| {
        const frame = &self.frames[frame_id];
        frame.pin_count.fetchAdd(1, .monotonic);
        frame.usage_count = @min(frame.usage_count + 1, 255);
        acquireLatch(frame, mode);
        return frame;
    }

    // 2. Not in pool - need to load from disk
    const frame = try self.findVictimFrame();
    const frame_id: FrameId = @intCast(
        (@intFromPtr(frame) - @intFromPtr(self.frames.ptr)) / @sizeOf(BufferFrame),
    );

    // 3. If victim is dirty, flush it first
    if (frame.dirty) {
        try self.pm.writePage(frame.page_id, frame.data);
        frame.dirty = false;
    }

    // 4. Update page table
    if (frame.page_id != NULL_PAGE) {
        _ = self.page_table.remove(frame.page_id);
    }
    try self.page_table.put(page_id, frame_id);

    // 5. Read page from disk
    try self.pm.readPage(page_id, frame.data);

    // 6. Set up frame
    frame.page_id = page_id;
    frame.pin_count.store(1, .monotonic);
    frame.dirty = false;
    frame.usage_count = 1;
    acquireLatch(frame, mode);
    return frame;
}
The Clock Eviction Algorithm
When the buffer pool is full, we need to evict a page to make room. We use the Clock algorithm (also called "second chance").
Why Clock?
- LRU (Least Recently Used) tracks recency exactly but is expensive - it requires updating a timestamp or list position on every access
- Clock approximates LRU cheaply using a usage bit
How It Works
Imagine the frames arranged in a circle with a clock hand:
┌─────┐
┌────│ 3 │────┐
╱ │use:1│ ╲
┌─────┐ └─────┘ ┌─────┐
│ 2 │ │ 4 │
│use:0│ │use:2│
└─────┘ └─────┘
╲ ┌─────┐ ╱
└────│ 1 │────┘
│use:0│◄──── clock hand
│pin:0│
└─────┘
To find a victim:
fn findVictimFrame(self: *Self) !*BufferFrame {
    // Try free list first
    if (self.free_list.pop()) |frame_id| {
        return &self.frames[frame_id];
    }

    // Clock sweep
    var attempts: usize = 0;
    while (attempts < self.frame_count * 2) {
        const frame = &self.frames[self.clock_hand];

        // Move hand
        self.clock_hand = (self.clock_hand + 1) % self.frame_count;
        attempts += 1;

        // Skip pinned pages
        if (frame.pin_count.load(.monotonic) > 0) {
            continue;
        }

        // Second chance: if used recently, clear and skip
        if (frame.usage_count > 0) {
            frame.usage_count -= 1;
            continue;
        }

        // Found victim!
        return frame;
    }

    return error.BufferPoolFull; // All pages pinned
}
The key insight: Usage count gives pages a "second chance". Recently used pages survive one clock sweep. Only pages that haven't been used in a full rotation get evicted.
Unpinning Pages
When done with a page:
pub fn unpinPage(self: *Self, frame: *BufferFrame, dirty: bool) void {
    // Mark dirty if modified (while we still hold the latch)
    if (dirty) {
        frame.dirty = true;
    }

    // Release latch
    frame.latch.release();

    // Decrement pin count
    _ = frame.pin_count.fetchSub(1, .monotonic);
}
Always unpin! Failure to unpin causes:
- Pages stuck in memory forever
- Buffer pool eventually fills with pinned pages
- BufferPoolFull errors
Flushing Pages
Writing dirty pages to disk:
// Flush one page
pub fn flushPage(self: *Self, page_id: PageId) !void {
    const frame = self.getFrame(page_id) orelse return;
    if (frame.dirty) {
        try self.pm.writePage(frame.page_id, frame.data);
        frame.dirty = false;
    }
}

// Flush all dirty pages
pub fn flushAll(self: *Self) !void {
    for (self.frames) |*frame| {
        if (frame.page_id != NULL_PAGE and frame.dirty) {
            try self.pm.writePage(frame.page_id, frame.data);
            frame.dirty = false;
        }
    }
}
Thread Safety
The buffer pool is thread-safe:
- Mutex protects page_table, free_list, clock_hand
- Per-frame latches protect page data
- Atomic pin_count for safe reference counting
Thread 1: fetchPage(5) Thread 2: fetchPage(5)
│ │
├─► mutex.lock() ├─► mutex.lock() [WAIT]
│ look up page 5 │
│ pin_count++ │
│ mutex.unlock() ─────────────┼─► mutex.lock() [GOT IT]
│ │ look up page 5 [FOUND]
├─► frame.latch.read() │ pin_count++
│ ... read data ... │ mutex.unlock()
│ │
│ ├─► frame.latch.read()
│ │ ... read data ...
│ │
├─► unpinPage() ├─► unpinPage()
Multiple threads can read the same page concurrently (shared latch).
Memory Alignment
Page buffers are 4KB-aligned:
const data = try allocator.alignedAlloc(u8, 4096, page_size);
Why?
- Direct I/O: Some systems require aligned buffers for O_DIRECT
- SIMD: Aligned data enables vectorized operations
- Cache lines: Better CPU cache utilization
Sizing the Buffer Pool
// 64MB buffer pool = 16,384 pages
var bp = try BufferPool.init(allocator, &pm, 64 * 1024 * 1024);
Guidelines:
- More is better (to a point)
- Working set: Should fit frequently accessed pages
- Available RAM: Leave room for OS and other processes
- Typical: 25-75% of available RAM
Usage Pattern
var bp = try BufferPool.init(allocator, &pm, pool_size);
defer bp.deinit(); // Flushes dirty pages

// Read a page
{
    const frame = try bp.fetchPage(page_id, .shared);
    defer bp.unpinPage(frame, false);
    const value = readValueFromPage(frame.data);
}

// Modify a page
{
    const frame = try bp.fetchPage(page_id, .exclusive);
    defer bp.unpinPage(frame, true); // true = dirty
    modifyPage(frame.data);
}
Key Invariants
- Pin before access: Never access page data without pinning
- Unpin when done: Every fetchPage must have matching unpinPage
- Mark dirty: If you modified the page, set dirty=true when unpinning
- Flush before close: deinit() flushes, or call flushAll() explicitly