Virtual File System (VFS)
What It Is
The Virtual File System is an abstraction layer over file I/O operations. Instead of calling OS functions directly, all file operations go through a VFS interface.
Why We Need It
1. Testability
With a VFS, we can create an in-memory implementation for testing:
   Production:           Testing:
   ┌─────────────┐       ┌─────────────┐
   │   B+Tree    │       │   B+Tree    │
   └──────┬──────┘       └──────┬──────┘
          │                     │
          ▼                     ▼
   ┌─────────────┐       ┌─────────────┐
   │  PosixVfs   │       │  MemoryVfs  │
   └──────┬──────┘       └──────┬──────┘
          │                     │
          ▼                     ▼
   ┌─────────────┐       ┌─────────────┐
   │  Disk I/O   │       │   HashMap   │
   └─────────────┘       └─────────────┘
Tests run fast because no real disk I/O takes place.
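To make the testing side concrete, here is a minimal sketch of the per-file storage a MemoryVfs might use, assuming one growable byte buffer per file. The name `MemoryFile` and its methods are hypothetical; a full MemoryVfs would also keep a map from path to these buffers. Writes past the end grow the buffer and zero-fill the gap, matching how a write past end of file behaves on a real file system, and reads at or past the end return 0.

```zig
const std = @import("std");

/// Hypothetical in-memory file: a growable byte buffer addressed by offset.
/// Only positioned read/write and size are sketched here.
const MemoryFile = struct {
    allocator: std.mem.Allocator,
    data: []u8 = &[_]u8{},

    fn deinit(self: *MemoryFile) void {
        self.allocator.free(self.data);
    }

    /// Copies bytes starting at `offset`; returns 0 at or past end of file.
    fn read(self: *MemoryFile, offset: u64, buf: []u8) usize {
        if (offset >= self.data.len) return 0;
        const start: usize = @intCast(offset);
        const n = @min(buf.len, self.data.len - start);
        @memcpy(buf[0..n], self.data[start .. start + n]);
        return n;
    }

    /// Writes `bytes` at `offset`, growing the buffer and zero-filling any gap.
    fn write(self: *MemoryFile, offset: u64, bytes: []const u8) !void {
        const start: usize = @intCast(offset);
        const end = start + bytes.len;
        if (end > self.data.len) {
            const old_len = self.data.len;
            self.data = try self.allocator.realloc(self.data, end);
            @memset(self.data[old_len..], 0);
        }
        @memcpy(self.data[start..end], bytes);
    }

    fn size(self: *const MemoryFile) u64 {
        return self.data.len;
    }
};
```

Because the whole "disk" is one allocation owned by the test, every test starts from a clean state and `std.testing.allocator` can check that nothing leaked.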
2. Portability
Different operating systems have different file APIs:
- Linux: pread, pwrite, fsync
- Windows: ReadFile, WriteFile, FlushFileBuffers
- Embedded: custom flash drivers
The VFS hides these differences. We implement one VFS per platform, and the rest of the database doesn't care.
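One way to realize "one VFS per platform" is to choose the backend at compile time from the target OS. The sketch below shows only the selection step; the `Backend` enum and its tags are hypothetical placeholders for real implementations such as PosixVfs.

```zig
const std = @import("std");
const builtin = @import("builtin");

/// Hypothetical backend tags; each would correspond to one VFS implementation.
const Backend = enum { posix, windows, custom };

/// Maps a target OS to a backend.
fn backendFor(os: std.Target.Os.Tag) Backend {
    return switch (os) {
        .linux, .macos, .freebsd, .netbsd, .openbsd => .posix,
        .windows => .windows,
        else => .custom, // e.g. freestanding targets with their own flash drivers
    };
}

/// Container-level constants are evaluated at compile time, so the rest of
/// the database never branches on the platform at runtime.
pub const default_backend: Backend = backendFor(builtin.os.tag);
```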
3. Injection of Failures
For testing crash recovery, we need to simulate failures:
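const FaultyFile = struct {
    inner: File, // the real file being wrapped (write is a File operation)
    fail_after_n_writes: usize,
    write_count: usize = 0,

    const Self = @This();

    pub fn write(self: *Self, offset: u64, data: []const u8) !void {
        self.write_count += 1;
        // The first `fail_after_n_writes` writes succeed; every later one fails.
        if (self.write_count > self.fail_after_n_writes) {
            return error.SimulatedCrash;
        }
        return self.inner.write(offset, data);
    }
};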
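The same idea as a self-contained, runnable sketch. Because `write` lives on the File interface, the wrapper sits at the file level. `RecordingFile` is a hypothetical fixed-size stand-in for the wrapped file, so the test can inspect exactly which writes reached "disk" before the simulated crash.

```zig
const std = @import("std");

/// Hypothetical stand-in for the wrapped file: records bytes into a 64-byte buffer.
const RecordingFile = struct {
    buf: [64]u8 = [_]u8{0} ** 64,

    fn write(self: *RecordingFile, offset: u64, data: []const u8) !void {
        const start: usize = @intCast(offset);
        if (start + data.len > self.buf.len) return error.DiskFull;
        @memcpy(self.buf[start .. start + data.len], data);
    }
};

/// Lets the first `fail_after_n_writes` writes through, then fails every write.
const FaultyFile = struct {
    inner: *RecordingFile,
    fail_after_n_writes: usize,
    write_count: usize = 0,

    fn write(self: *FaultyFile, offset: u64, data: []const u8) !void {
        self.write_count += 1;
        if (self.write_count > self.fail_after_n_writes) return error.SimulatedCrash;
        return self.inner.write(offset, data);
    }
};
```

A recovery test would run the workload with increasing `fail_after_n_writes` values and, after each simulated crash, reopen the underlying data to check that recovery restores a consistent state.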
The Interface
File Operations
pub const File = struct {
    ptr: *anyopaque,
    vtable: *const VTable,

    pub const Error = VfsError; // defined under "Error Handling"

    pub const VTable = struct {
        read: *const fn (ptr: *anyopaque, offset: u64, buf: []u8) Error!usize,
        write: *const fn (ptr: *anyopaque, offset: u64, data: []const u8) Error!void,
        sync: *const fn (ptr: *anyopaque) Error!void,
        size: *const fn (ptr: *anyopaque) Error!u64,
        close: *const fn (ptr: *anyopaque) void,
        truncate: *const fn (ptr: *anyopaque, size: u64) Error!void,
    };

    // Convenience methods that forward to the vtable
    pub fn read(self: File, offset: u64, buf: []u8) Error!usize {
        return self.vtable.read(self.ptr, offset, buf);
    }
    // ... etc
};
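A trimmed-down, runnable sketch of how a concrete backend plugs into this interface: it exposes a `file()` method that erases its own type into `ptr` and points `vtable` at a static table of its functions. `TinyFile`, its 16-byte "disk", and the two-entry VTable are hypothetical simplifications; the interface above has more entries.

```zig
const std = @import("std");

const Error = error{ IoError, DiskFull };

/// Reduced File interface: type-erased pointer plus a table of function pointers.
const File = struct {
    ptr: *anyopaque,
    vtable: *const VTable,

    const VTable = struct {
        read: *const fn (ptr: *anyopaque, offset: u64, buf: []u8) Error!usize,
        write: *const fn (ptr: *anyopaque, offset: u64, data: []const u8) Error!void,
    };

    fn read(self: File, offset: u64, buf: []u8) Error!usize {
        return self.vtable.read(self.ptr, offset, buf);
    }
    fn write(self: File, offset: u64, data: []const u8) Error!void {
        return self.vtable.write(self.ptr, offset, data);
    }
};

/// A concrete backend: a fixed 16-byte "disk".
const TinyFile = struct {
    bytes: [16]u8 = [_]u8{0} ** 16,

    /// Returns the type-erased interface; `self` must outlive the File.
    fn file(self: *TinyFile) File {
        return .{ .ptr = self, .vtable = &vtable };
    }

    const vtable = File.VTable{ .read = readImpl, .write = writeImpl };

    fn readImpl(ptr: *anyopaque, offset: u64, buf: []u8) Error!usize {
        const self: *TinyFile = @ptrCast(@alignCast(ptr)); // recover the concrete type
        if (offset >= self.bytes.len) return 0;
        const start: usize = @intCast(offset);
        const n = @min(buf.len, self.bytes.len - start);
        @memcpy(buf[0..n], self.bytes[start .. start + n]);
        return n;
    }

    fn writeImpl(ptr: *anyopaque, offset: u64, data: []const u8) Error!void {
        const self: *TinyFile = @ptrCast(@alignCast(ptr));
        if (offset > self.bytes.len) return error.DiskFull;
        const start: usize = @intCast(offset);
        if (data.len > self.bytes.len - start) return error.DiskFull;
        @memcpy(self.bytes[start .. start + data.len], data);
    }
};
```

Code that receives a `File` only sees `ptr` and `vtable`, so swapping `TinyFile` for another backend needs no change on the caller's side.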
VFS Operations
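pub const Vfs = struct {
    ptr: *anyopaque,
    vtable: *const VTable,

    pub const Error = VfsError;

    pub const VTable = struct {
        open: *const fn (ptr: *anyopaque, path: []const u8, flags: OpenFlags) Error!File,
        delete: *const fn (ptr: *anyopaque, path: []const u8) Error!void,
        exists: *const fn (ptr: *anyopaque, path: []const u8) bool,
    };

    // Convenience methods (open, delete, exists) forward to the vtable, as in File
};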
Open Flags
pub const OpenFlags = struct {
    read: bool = true, // Open for reading
    write: bool = false, // Open for writing
    create: bool = false, // Create if it doesn't exist
    exclusive: bool = false, // Fail if it already exists (only meaningful with create)
};
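Some flag combinations make no sense, and a VFS may want to reject them before reaching the OS. A sketch of such a check, where `validate` and `error.InvalidFlags` are hypothetical. The exclusive-without-create rule follows POSIX, which leaves O_EXCL without O_CREAT undefined.

```zig
const std = @import("std");

const OpenFlags = struct {
    read: bool = true,
    write: bool = false,
    create: bool = false,
    exclusive: bool = false,
};

/// Hypothetical pre-check run before translating flags for the OS.
fn validate(flags: OpenFlags) error{InvalidFlags}!void {
    // A file must be opened for reading, writing, or both.
    if (!flags.read and !flags.write) return error.InvalidFlags;
    // `exclusive` only has defined meaning together with `create`.
    if (flags.exclusive and !flags.create) return error.InvalidFlags;
}
```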
Key Operations
Positioned Read/Write
Unlike streaming I/O, which reads and writes at a shared file cursor, every call takes an explicit byte offset:
// Read 4096 bytes starting at byte offset 8192
const bytes_read = try file.read(8192, buffer[0..4096]);
// Write 4096 bytes starting at byte offset 8192
try file.write(8192, data[0..4096]);
This maps directly to page-based access (with 4096-byte pages):
- Page 0 is at offset 0
- Page 1 is at offset 4096
- Page N is at offset N * 4096
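The page-to-offset mapping is a single multiplication, but a corrupt page number could overflow the 64-bit offset. A sketch that checks for that, assuming the 4096-byte page size used above (`pageOffset` is a hypothetical helper):

```zig
const std = @import("std");

/// Assumed page size, matching the examples above.
const page_size: u64 = 4096;

/// Byte offset of page `n`; fails instead of wrapping if the product overflows u64.
fn pageOffset(n: u64) error{Overflow}!u64 {
    return std.math.mul(u64, n, page_size);
}
```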
Sync (fsync)
The most important operation for durability:
try file.sync();
This forces all pending writes to physical storage. Without it, data might sit in OS buffers and be lost on power failure.
Application writes:              After fsync:
┌─────────────┐                  ┌─────────────┐
│ Application │                  │ Application │
└──────┬──────┘                  └─────────────┘
       │ write()
       ▼
┌─────────────┐                  ┌─────────────┐
│  OS Buffer  │                  │  OS Buffer  │
│  (in RAM)   │ ◄── Data         │   (empty)   │
└─────────────┘     sits         └──────┬──────┘
       ╳            here!               │ fsync forces
     Power                              │ write to disk
    failure                             ▼
   loses it!                 ┌─────────────────────┐
                             │  Disk (persistent)  │
                             │    Data is SAFE     │
                             └─────────────────────┘
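The write-then-sync ordering can be sketched directly against Zig's `std.fs.File`, which provides `pwriteAll` and `sync` (on POSIX systems `sync` calls fsync). `writeDurably` is a hypothetical helper, not part of the VFS interface; the point is that the database should treat a page as committed only after the sync returns.

```zig
const std = @import("std");

/// Writes `data` at `offset`, then forces it to stable storage before returning.
fn writeDurably(file: std.fs.File, offset: u64, data: []const u8) !void {
    try file.pwriteAll(data, offset); // lands in the OS buffer cache
    try file.sync(); // returns only after the OS reports the data persisted
}
```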
The POSIX Implementation
For Unix-like systems (Linux, macOS):
const std = @import("std");
const Allocator = std.mem.Allocator;

pub const PosixVfs = struct {
    allocator: Allocator,

    const Self = @This();

    pub fn open(self: *Self, path: []const u8, flags: OpenFlags) !File {
        // In Zig 0.12+, open(2) flags are the packed struct std.posix.O
        // rather than an integer built from O_* constants.
        var o: std.posix.O = .{};
        o.ACCMODE = if (flags.read and flags.write)
            .RDWR
        else if (flags.write)
            .WRONLY
        else
            .RDONLY;
        o.CREAT = flags.create;
        o.EXCL = flags.exclusive;

        const fd = try std.posix.open(path, o, 0o644);
        // ... wrap fd in File struct
    }
};
Read Implementation
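Uses pread for positioned reading. pread takes the offset as an argument and never moves the file cursor, so threads can share one descriptor without seeking. It may return fewer bytes than requested (0 at end of file), so callers must check the returned count: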
fn read(self: *PosixFile, offset: u64, buf: []u8) !usize {
return std.posix.pread(self.fd, buf, offset);
}
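Because pread may return fewer bytes than requested, a caller that needs a full page typically loops. A sketch of such a loop, written against any value with a `pread(buf, offset)` method. `readFull` and the `ChunkyFile` stub, which returns at most 3 bytes per call to force short reads, are hypothetical.

```zig
const std = @import("std");

/// Repeats a positioned read until `buf` is full or end of file is reached.
/// `file` is any value with a `pread(buf, offset) !usize` method (duck-typed).
fn readFull(file: anytype, offset: u64, buf: []u8) !usize {
    var total: usize = 0;
    while (total < buf.len) {
        const n = try file.pread(buf[total..], offset + total);
        if (n == 0) break; // end of file: return what we have
        total += n;
    }
    return total;
}

/// Test stub whose pread returns at most 3 bytes per call.
const ChunkyFile = struct {
    contents: []const u8,

    fn pread(self: *const ChunkyFile, buf: []u8, offset: u64) !usize {
        if (offset >= self.contents.len) return 0;
        const start: usize = @intCast(offset);
        const n: usize = @min(buf.len, 3, self.contents.len - start);
        @memcpy(buf[0..n], self.contents[start .. start + n]);
        return n;
    }
};
```

A short count from `readFull` means the page lies partly past end of file, which the caller can treat as zero-filled.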
Write Implementation
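Uses pwrite for positioned writing. A single pwrite call may write fewer bytes than requested, so the loop continues until the whole buffer is written; a return of 0 is treated as a full disk: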
fn write(self: *PosixFile, offset: u64, data: []const u8) !void {
var written: usize = 0;
while (written < data.len) {
const n = try std.posix.pwrite(self.fd, data[written..], offset + written);
if (n == 0) return error.DiskFull;
written += n;
}
}
Sync Implementation
fn sync(self: *PosixFile) !void {
try std.posix.fsync(self.fd);
}
Error Handling
VFS operations can fail in various ways:
pub const VfsError = error{
FileNotFound, // Path doesn't exist
PermissionDenied, // No access rights
DiskFull, // No space left
IoError, // Hardware failure
FileLocked, // Another process has it
InvalidPath, // Bad path string
AlreadyExists, // Exclusive create failed
};
These are translated from OS-specific error codes (errno values on POSIX, exposed in Zig as std.posix.E):
fn mapPosixError(err: std.posix.E) VfsError {
    return switch (err) {
        .NOENT => VfsError.FileNotFound,
        .ACCES, .PERM => VfsError.PermissionDenied,
        .NOSPC => VfsError.DiskFull,
        .IO => VfsError.IoError,
        // ...
        else => VfsError.IoError, // fallback for codes not mapped above
    };
}
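In practice Zig's `std.posix` and `std.fs` wrappers already turn errno into Zig error names such as `error.AccessDenied` or `error.NoSpaceLeft`, so a VFS built on them may map those names instead of raw codes. A sketch under that assumption; `mapStdError` is hypothetical, and the exact std error names vary somewhat between Zig versions.

```zig
const std = @import("std");

const VfsError = error{
    FileNotFound,
    PermissionDenied,
    DiskFull,
    IoError,
    FileLocked,
    InvalidPath,
    AlreadyExists,
};

/// Maps error names returned by std wrappers onto the VFS error set.
fn mapStdError(err: anyerror) VfsError {
    return switch (err) {
        error.FileNotFound => error.FileNotFound,
        error.AccessDenied, error.PermissionDenied => error.PermissionDenied,
        error.NoSpaceLeft => error.DiskFull,
        error.PathAlreadyExists => error.AlreadyExists,
        error.BadPathName, error.NameTooLong => error.InvalidPath,
        else => error.IoError, // includes error.InputOutput and anything unmapped
    };
}
```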
Usage Pattern
// Create VFS
var posix_vfs = PosixVfs.init(allocator);
const vfs = posix_vfs.vfs(); // Get interface
// Open file
const file = try vfs.open("database.db", .{
.read = true,
.write = true,
.create = true,
});
defer file.close();
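// Read page 5 (a freshly created file is shorter than that, so the read may come back short)
var buf: [4096]u8 = undefined;
const n = try file.read(5 * 4096, &buf);
@memset(buf[n..], 0); // treat bytes past end of file as zeros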
// Modify and write back
buf[0] = 42;
try file.write(5 * 4096, &buf);
// Ensure durability
try file.sync();
Why Not Just Use std.fs?
Zig's standard library has file operations, but:
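- Positioned I/O is only one option - std.fs.File does offer pread and pwrite, but it also exposes cursor-based read, write, and seek; the VFS exposes only positioned operations, so no caller can depend on a shared file cursor
- No vtable pattern - std.fs.File is a concrete type, so an in-memory or fault-injecting implementation can't be swapped in
- Different error types - We want database-specific errors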
The VFS is a thin layer, but it gives us control where we need it.