"MMAP" System Call for DBMS

Apr 13, 2025 - 09:19

MMAP, short for memory mapping, allows a process to map files into its virtual address space. This can make loading data faster, because frequently used data can stay in primary memory (RAM), which is significantly faster than secondary storage such as an HDD or SSD.

It is typically achieved using the mmap() system call provided by most modern operating systems.

How MMAP Works

When a file is memory mapped, the operating system decides which parts of it to load into memory and which parts to leave on disk.

The operating system does not load the entire file into memory immediately. Instead, the file is associated with a range of virtual memory addresses, and its contents are loaded on demand.

As the program reads or writes to a memory-mapped file:

  • If the page is already loaded in memory - Success ✅
  • If not, a page fault occurs and control is transferred to the OS page fault handler, which proceeds as follows:
    1. The OS determines which file and offset correspond to the faulting address.
    2. It allocates a physical memory page (from RAM).
    3. It reads the corresponding file block from disk into that page.
    4. It updates the page table to map the virtual page to the newly loaded physical page.
    5. It marks the page read-only or writable based on the PROT_* flags.
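
The effect of the fault path above can be observed from user space. The following is a minimal sketch, assuming Linux with glibc, where mincore() reports whether a page of a mapping is currently resident. The helper names page_is_resident and resident_after_touch and the temporary file template are made up for this illustration.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* exposes mincore() on glibc */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if the page containing addr is resident in RAM, 0 if it is
 * not, and -1 on error. addr must lie inside an existing mapping. */
int page_is_resident(const void *addr) {
    assert(addr != NULL);
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    /* mincore() requires a page-aligned start address. */
    void *start = (void *)((uintptr_t)addr & ~(page - 1));
    unsigned char vec = 0;
    if (mincore(start, (size_t)page, &vec) == -1)
        return -1;
    return vec & 1;              /* low bit set => page is in memory */
}

/* Hypothetical demo: map a fresh two-page temp file, touch the first
 * page, and report whether that page is resident afterwards. */
int resident_after_touch(void) {
    char path[] = "/tmp/mmap_demo_XXXXXX";
    size_t len = 2 * (size_t)sysconf(_SC_PAGESIZE);
    int fd = mkstemp(path);
    if (fd == -1) return -1;
    unlink(path);                /* file goes away once fd and mapping are gone */
    if (ftruncate(fd, (off_t)len) == -1) { close(fd); return -1; }

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                   /* the mapping remains valid after close() */
    if (p == MAP_FAILED) return -1;

    p[0] = 'x';                  /* first access faults the page in */
    int resident = page_is_resident(p);
    munmap(p, len);
    return resident;
}
```

Calling resident_after_touch() should report the touched page as resident. The second page of the mapping was never accessed, so the kernel has no reason to have loaded it, although read-ahead may still bring it in.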

Note, however, that memory may contain dirty pages, i.e. pages that have been modified in primary memory but not yet written back to disk. Before the OS can reuse such a page for other data, it must write the changes back to disk, so it needs a flush policy that determines when and how dirty pages are written back.
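
An application does not have to wait for the OS flush policy; it can force writeback of just the pages it dirtied. Below is a minimal sketch under the assumption of a MAP_SHARED mapping whose base address came from mmap(). The names flush_dirty_range and demo_flush are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for pread(), ftruncate(), mkstemp() */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Flushes only the pages covering [offset, offset + len) of a MAP_SHARED
 * mapping that starts at base. Returns 0 on success, -1 on error. */
int flush_dirty_range(char *base, size_t offset, size_t len) {
    assert(base != NULL && len > 0);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t start = offset & ~(page - 1);       /* round down to a page boundary */
    size_t span = (offset - start) + len;      /* bytes from that boundary to the end */
    return msync(base + start, span, MS_SYNC); /* blocks until writeback completes */
}

/* Hypothetical demo: dirty a few bytes in the third page of a four-page
 * temp file, flush that range, and read it back through the file descriptor. */
int demo_flush(void) {
    char path[] = "/tmp/mmap_flush_XXXXXX";
    const size_t size = 4 * (size_t)sysconf(_SC_PAGESIZE);
    const size_t off = size / 2 + 100;
    int fd = mkstemp(path);
    if (fd == -1) return -1;
    unlink(path);
    if (ftruncate(fd, (off_t)size) == -1) { close(fd); return -1; }

    char *m = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (m == MAP_FAILED) { close(fd); return -1; }

    memcpy(m + off, "dirty", 5);               /* the page is now dirty */
    int rc = flush_dirty_range(m, off, 5);

    char buf[5] = {0};
    if (rc == 0 && (pread(fd, buf, 5, (off_t)off) != 5 ||
                    memcmp(buf, "dirty", 5) != 0))
        rc = -1;
    munmap(m, size);
    close(fd);
    return rc;
}
```

The pread() check only confirms that the file view contains the new bytes; it cannot prove they reached the storage device, which is what the MS_SYNC flag is for.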

MMAP Diagram

C Example for mmap

Here's a C example with detailed comments that demonstrates:

  • Opening a file
  • Mapping it into memory
  • Reading from the mapped memory
  • Writing to it
  • Flushing changes to disk using msync()
  • Unmapping and closing the file

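#include <stdio.h>      // For printf(), perror()
#include <stdlib.h>     // For EXIT_SUCCESS, EXIT_FAILURE
#include <fcntl.h>      // For open()
#include <sys/mman.h>   // For mmap(), msync(), munmap()
#include <sys/stat.h>   // For struct stat, fstat()
#include <unistd.h>     // For close(), ftruncate()
#include <string.h>     // For memcpy(), strlen()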

#define FILENAME "example_mmap.txt"
#define FILESIZE 4096  // 4KB

int main() {
    int fd;
    char *mapped;
    struct stat st;

    // Step 1: Open (or create) the file
    fd = open(FILENAME, O_RDWR | O_CREAT, 0666);
    if (fd == -1) {
        perror("open");
        return EXIT_FAILURE;
    }

    // Step 2: Set the file size to exactly FILESIZE bytes (this also truncates a longer file)
    if (ftruncate(fd, FILESIZE) == -1) {
        perror("ftruncate");
        close(fd);
        return EXIT_FAILURE;
    }

    // Step 3: Memory-map the file
    mapped = mmap(NULL, FILESIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mapped == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return EXIT_FAILURE;
    }

    printf("File successfully memory-mapped.\n");

    // Step 4: Read the first few bytes (all zero bytes if the file was just created)
    printf("Initial contents: \"%.32s\"\n", mapped);

    // Step 5: Write something to the mapped region
    const char *msg = "Hello via mmap!\n";
    memcpy(mapped, msg, strlen(msg));  // A plain memory write; no write() call is needed

    printf("Modified contents in memory: \"%.32s\"\n", mapped);

    // Step 6: Flush changes to disk
    if (msync(mapped, FILESIZE, MS_SYNC) == -1) {
        perror("msync");
    } else {
        printf("Changes successfully flushed to disk.\n");
    }

    // Step 7: Unmap memory and close the file
    if (munmap(mapped, FILESIZE) == -1) {
        perror("munmap");
    }

    close(fd);
    printf("File unmapped and closed.\n");

    return EXIT_SUCCESS;
}

When to Use MMAP in a DBMS

MMAP is particularly beneficial when:

  • The workload is read-heavy.
  • System memory is sufficient to cache the working set.
  • Simplicity and low overhead are priorities.
  • You trust the OS’s page replacement and sync mechanisms.

It might be suboptimal for write-heavy, large-scale database systems like PostgreSQL or MySQL, which prefer explicit control over I/O and caching.

Limitations and Caveats

Less Control:

MMAP delegates caching and eviction to the OS, which might not be optimal for DBMS-specific patterns.
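
Some of that control can be recovered by giving the kernel access-pattern hints. The sketch below assumes a POSIX system and uses posix_madvise(); the helper names map_with_hint and demo_hint are invented for this example, and the hint is purely advisory, so the kernel is free to ignore it.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for posix_madvise(), mkstemp() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Maps path read-only and passes an access-pattern hint such as
 * POSIX_MADV_RANDOM or POSIX_MADV_SEQUENTIAL to the kernel.
 * Returns the mapping (and its size in *len) or NULL on error. */
const char *map_with_hint(const char *path, int advice, size_t *len) {
    assert(path != NULL && len != NULL);
    int fd = open(path, O_RDONLY);
    if (fd == -1) return NULL;

    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size == 0) {  /* mmap() rejects length 0 */
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED) return NULL;

    /* Advisory only: a failed hint does not invalidate the mapping. */
    (void)posix_madvise(p, (size_t)st.st_size, advice);
    *len = (size_t)st.st_size;
    return p;
}

/* Hypothetical demo: write a small file, map it with a random-access
 * hint, and check that the contents are visible through the mapping. */
int demo_hint(void) {
    char path[] = "/tmp/mmap_hint_XXXXXX";
    int fd = mkstemp(path);
    if (fd == -1) return -1;

    int rc = -1;
    if (write(fd, "index page", 10) == 10) {
        size_t len = 0;
        const char *p = map_with_hint(path, POSIX_MADV_RANDOM, &len);
        if (p != NULL) {
            if (len == 10 && memcmp(p, "index page", 10) == 0) rc = 0;
            munmap((void *)p, len);
        }
    }
    close(fd);
    unlink(path);
    return rc;
}
```

A random-access hint is a plausible fit for index lookups, while a sequential hint may suit full scans; whether either helps depends on the kernel and the workload.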

I/O Patterns:

Random writes across a file larger than available memory can generate frequent page faults and many dirty pages, leading to thrashing.

Consistency and Sync:

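You must use msync() carefully to ensure consistency and durability (especially for WAL-based systems), because the OS may write a dirty page back to disk at any time, not only when msync() is called. mprotect() can be used to control when pages become writable, but it does not flush anything to disk.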
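
One way to respect write-ahead ordering with a mapped data file is to make the log record durable before the mapped page is touched at all. The following is a simplified sketch, not how any particular DBMS does it: the function names wal_then_update and demo_wal are hypothetical, and a real log record would also carry things like an offset, a sequence number, and a checksum.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for pread(), fsync(), mkstemp() */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Applies an n-byte change at offset off of a MAP_SHARED data mapping,
 * logging it first. Returns 0 on success, -1 on error. */
int wal_then_update(int log_fd, char *data_map, size_t map_len,
                    size_t off, const void *bytes, size_t n) {
    assert(off + n <= map_len);
    /* 1. Append the change to the log and force it to stable storage. */
    if (write(log_fd, bytes, n) != (ssize_t)n) return -1;
    if (fsync(log_fd) == -1) return -1;
    /* 2. Only now modify the mapped page; from here on the kernel may
     *    write it back at any moment, which is safe because the log
     *    record is already on disk. */
    memcpy(data_map + off, bytes, n);
    /* 3. Force the data page out too (a real system might defer this
     *    to a checkpoint). */
    return msync(data_map, map_len, MS_SYNC);
}

/* Hypothetical demo using two temp files, one as log and one as data. */
int demo_wal(void) {
    char log_path[] = "/tmp/mmap_wal_XXXXXX";
    char data_path[] = "/tmp/mmap_dat_XXXXXX";
    const size_t len = (size_t)sysconf(_SC_PAGESIZE);

    int log_fd = mkstemp(log_path);
    if (log_fd == -1) return -1;
    unlink(log_path);
    int data_fd = mkstemp(data_path);
    if (data_fd == -1) { close(log_fd); return -1; }
    unlink(data_path);

    int rc = -1;
    char *m = MAP_FAILED;
    if (ftruncate(data_fd, (off_t)len) == 0)
        m = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, data_fd, 0);
    if (m != MAP_FAILED) {
        char a[7] = {0}, b[7] = {0};
        if (wal_then_update(log_fd, m, len, 64, "balance", 7) == 0 &&
            pread(log_fd, a, 7, 0) == 7 && pread(data_fd, b, 7, 64) == 7 &&
            memcmp(a, "balance", 7) == 0 && memcmp(b, "balance", 7) == 0)
            rc = 0;
        munmap(m, len);
    }
    close(data_fd);
    close(log_fd);
    return rc;
}
```

The ordering is the point of the sketch: because the page is modified only after fsync() on the log returns, an early writeback by the OS can never put a data page on disk ahead of its log record.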

Portability:

Some platforms or file systems (e.g., networked FS) behave unpredictably with MMAP.

Address Space Limits:

On 32-bit systems, you’re constrained by virtual address space.
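
A common workaround is to map only a small window of the file at a time instead of the whole file. Here is a minimal sketch of that idea; read_via_window and demo_window are hypothetical names, and a real system would likely cache and reuse windows rather than map and unmap on every read.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for pwrite(), ftruncate(), mkstemp() */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copies n bytes starting at file offset `offset` into out, mapping only
 * the pages that cover the requested range. Returns 0 on success. */
int read_via_window(int fd, off_t offset, size_t n, void *out) {
    assert(out != NULL);
    struct stat st;
    if (n == 0 || offset < 0 || fstat(fd, &st) == -1) return -1;
    /* Refuse ranges past end of file; touching such pages raises SIGBUS. */
    if (offset > st.st_size || (off_t)n > st.st_size - offset) return -1;

    off_t page = (off_t)sysconf(_SC_PAGESIZE);
    off_t aligned = offset - (offset % page);   /* mmap offsets must be page-aligned */
    size_t delta = (size_t)(offset - aligned);  /* position inside the window */
    size_t win = delta + n;

    char *p = mmap(NULL, win, PROT_READ, MAP_PRIVATE, fd, aligned);
    if (p == MAP_FAILED) return -1;
    memcpy(out, p + delta, n);
    return munmap(p, win);
}

/* Hypothetical demo: place a marker in the third page of a temp file
 * and read it back through a window. */
int demo_window(void) {
    char path[] = "/tmp/mmap_win_XXXXXX";
    const off_t page = (off_t)sysconf(_SC_PAGESIZE);
    const off_t where = 2 * page + 100;
    int fd = mkstemp(path);
    if (fd == -1) return -1;
    unlink(path);

    int rc = -1;
    char buf[6] = {0};
    if (ftruncate(fd, 3 * page + 200) == 0 &&
        pwrite(fd, "window", 6, where) == 6 &&
        read_via_window(fd, where, 6, buf) == 0 &&
        memcmp(buf, "window", 6) == 0)
        rc = 0;
    close(fd);
    return rc;
}
```

Only win bytes of address space are consumed per call, regardless of how large the underlying file is.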

Performance Considerations of Using mmap() in DBMS

While mmap() can simplify file I/O and offer performance benefits in some cases, it's important to understand the performance trade-offs it introduces when used in database systems:

✅ Potential Advantages:

  • Zero-Copy I/O: Since mmap() maps a file directly into the process's address space, the kernel can avoid explicit copies between user and kernel space, improving I/O efficiency.
  • On-Demand Paging: The operating system loads only the necessary pages into memory, which can save memory if the entire file isn't accessed.
  • Automatic Page Caching: OS-level page caching can result in fast access times for frequently read data, without the database developer needing to implement a custom caching layer.

⚠️ Potential Pitfalls

  • Page Fault Overhead: Every access to a not-yet-loaded page triggers a page fault, which traps into the kernel and, if the data must be read from disk, blocks the faulting thread until the I/O completes. This can be expensive if it happens frequently.
  • Less Control Over I/O Scheduling: Unlike pread()/pwrite() or custom read-ahead strategies, mmap() offloads much of the paging and flushing behavior to the kernel. This can lead to unpredictable latencies, especially under memory pressure.
  • Writeback Latency: Dirty pages in memory must be flushed to disk. If writeback is not carefully managed (e.g., with explicit msync() calls at chosen points), write operations can block for long periods, introducing latency spikes.
  • Interaction with Virtual Memory Limits: On 32-bit systems or environments with limited address space, mapping large files can exhaust virtual memory quickly, causing issues even if physical RAM is available.
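
For contrast with the pitfalls above, this is roughly what the explicit alternative looks like: the database reads a page into a buffer it owns, at a moment it chooses, using pread(). It is only a sketch; read_page_explicit and demo_explicit are hypothetical names, and the 4096-byte page size is an assumption of the demo.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for pread(), pwrite(), mkstemp() */
#endif
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Reads page number page_no of size page_size into buf, retrying on
 * short reads. Returns the number of bytes read (less than page_size
 * only at end of file) or -1 on error. */
ssize_t read_page_explicit(int fd, uint64_t page_no, void *buf, size_t page_size) {
    assert(buf != NULL && page_size > 0);
    off_t base = (off_t)(page_no * page_size);
    size_t done = 0;
    while (done < page_size) {
        ssize_t r = pread(fd, (char *)buf + done, page_size - done,
                          base + (off_t)done);
        if (r == -1) {
            if (errno == EINTR) continue;   /* interrupted: try again */
            return -1;
        }
        if (r == 0) break;                  /* end of file */
        done += (size_t)r;
    }
    return (ssize_t)done;
}

/* Hypothetical demo: write two 4096-byte pages ('A' then 'B') to a temp
 * file and read the second one back explicitly. */
int demo_explicit(void) {
    enum { PAGE = 4096 };
    char path[] = "/tmp/mmap_pread_XXXXXX";
    static char page[PAGE], buf[PAGE];
    int fd = mkstemp(path);
    if (fd == -1) return -1;
    unlink(path);

    int rc = -1;
    memset(page, 'A', PAGE);
    if (pwrite(fd, page, PAGE, 0) == PAGE) {
        memset(page, 'B', PAGE);
        if (pwrite(fd, page, PAGE, PAGE) == PAGE &&
            read_page_explicit(fd, 1, buf, PAGE) == PAGE &&
            buf[0] == 'B' && buf[PAGE - 1] == 'B')
            rc = 0;
    }
    close(fd);
    return rc;
}
```

Here the I/O cost is paid at the pread() call rather than at an arbitrary memory access, which is what makes latency easier to reason about, at the price of an extra copy into the user-space buffer.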