libbloom

Bloom save/load routines fail for large filters due to missing write/read loops

Open bstephen2 opened this issue 2 months ago • 0 comments

Title

Bloom save/load routines fail for large filters due to missing write/read loops

Description

When saving or loading large Bloom filters, the library assumes that a single write or read call will process the entire buffer. For larger filters this assumption fails:

  • On save: write returns fewer bytes than requested. The routine checks the return value, sees a mismatch, and reports a generic error (code 1). The file is created and appears full‑size, but is actually incomplete.
  • On load: read also returns fewer bytes than expected. The routine detects the mismatch and reports error 11 (“bytes read mismatch”).

This sequence makes it look like the file was saved correctly, but reload fails because the data is truncated.

Steps to Reproduce

  1. Create a Bloom filter with a large number of entries (e.g. millions).
  2. Save the filter using the current library routines.
  3. Observe error code 1 on save.
  4. Reload the filter.
  5. Observe error code 11 due to incomplete read.

Diagnosis

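The save/load routines call write(fd, buf, len) and read(fd, buf, len) exactly once and expect the whole buffer to be transferred by that single call. POSIX guarantees only that up to len bytes are processed, so a short count is normal behaviour and not an error in itself. The ssize_t return value is checked correctly, but any mismatch is treated as a failure instead of the call being repeated for the remaining bytes.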
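To illustrate the short-count behaviour, here is a minimal sketch that is not taken from libbloom. The helper name short_read_demo is hypothetical, and a pipe is used only because it makes a short read easy to reproduce deterministically; regular files can also return short counts, for example when a call is interrupted by a signal after some data has moved, or on Linux when a single call asks for more than 0x7ffff000 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo helper (not part of libbloom): place `available` bytes
 * in a pipe, ask read() for `requested` bytes, and return what read()
 * reports. Returns -1 if the pipe cannot be set up. */
ssize_t short_read_demo(size_t available, size_t requested)
{
    char src[64] = {0};
    char dst[256];
    int fds[2];
    ssize_t rc;

    /* Keep the demo within its fixed-size buffers. */
    assert(available <= sizeof src && requested <= sizeof dst);

    if (pipe(fds) != 0)
        return -1;

    if (write(fds[1], src, available) != (ssize_t)available) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* A single read() returns only what is currently available,
     * even when more was requested. */
    rc = read(fds[0], dst, requested);

    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

Calling short_read_demo(5, 100) returns 5, not 100: the call succeeded, but a routine that compares the result against the requested length would report a mismatch.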

Suggested Fix

Wrap the I/O in loops that retry until all bytes are written/read. For example:

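#include <errno.h>
#include <sys/types.h>   /* ssize_t */
#include <unistd.h>      /* read, write */

/* Write count bytes, retrying after short writes. Returns the number of
 * bytes written (less than count only if write made no progress), or -1
 * on error with errno set by write. */
ssize_t full_write(int fd, const void *buf, size_t count) {
    size_t total = 0;
    const char *p = buf;
    while (total < count) {
        ssize_t rc = write(fd, p + total, count - total);
        if (rc < 0) {
            if (errno == EINTR) continue;   /* interrupted: retry */
            return -1;
        }
        if (rc == 0) break;                 /* no progress: avoid spinning */
        total += (size_t)rc;
    }
    return (ssize_t)total;
}

/* Read count bytes, retrying after short reads. Returns the number of
 * bytes read (less than count only at end of file), or -1 on error with
 * errno set by read. */
ssize_t full_read(int fd, void *buf, size_t count) {
    size_t total = 0;
    char *p = buf;
    while (total < count) {
        ssize_t rc = read(fd, p + total, count - total);
        if (rc < 0) {
            if (errno == EINTR) continue;   /* interrupted: retry */
            return -1;
        }
        if (rc == 0) break;                 /* end of file */
        total += (size_t)rc;
    }
    return (ssize_t)total;
}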

Then update the Bloom save/load routines to use these helpers and verify that the total equals the intended length.
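As a rough sketch of what that could look like, the following shows a save/load pair built on the helpers. The names save_bytes and load_bytes are hypothetical and stand in for the library's real routines, whose on-disk layout and error codes are not reproduced here. The helpers are repeated so the sketch compiles on its own.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Same loops as proposed in this issue, repeated for self-containment. */
static ssize_t full_write(int fd, const void *buf, size_t count)
{
    const char *p = buf;
    size_t total = 0;
    while (total < count) {
        ssize_t rc = write(fd, p + total, count - total);
        if (rc < 0) { if (errno == EINTR) continue; return -1; }
        if (rc == 0) break;
        total += (size_t)rc;
    }
    return (ssize_t)total;
}

static ssize_t full_read(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t total = 0;
    while (total < count) {
        ssize_t rc = read(fd, p + total, count - total);
        if (rc < 0) { if (errno == EINTR) continue; return -1; }
        if (rc == 0) break;               /* end of file */
        total += (size_t)rc;
    }
    return (ssize_t)total;
}

/* Hypothetical save routine: 0 on success, -1 on any failure. */
int save_bytes(const char *path, const void *buf, size_t len)
{
    int fd;
    ssize_t n;

    assert(path != NULL && buf != NULL);
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    n = full_write(fd, buf, len);
    if (close(fd) != 0)
        return -1;

    /* Success only if every byte was written. */
    return (n >= 0 && (size_t)n == len) ? 0 : -1;
}

/* Hypothetical load routine: 0 on success, -1 on any failure,
 * including a file shorter than len. */
int load_bytes(const char *path, void *buf, size_t len)
{
    int fd;
    ssize_t n;

    assert(path != NULL && buf != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    n = full_read(fd, buf, len);
    close(fd);

    /* Success only if every expected byte was read. */
    return (n >= 0 && (size_t)n == len) ? 0 : -1;
}
```

With this shape, a length mismatch after the loop indicates a real problem (a truncated file, a full disk, or an I/O error), not an ordinary short transfer.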

Impact

This fix ensures Bloom filters of any size are saved and loaded reliably, eliminating both the initial save error (code 1) and the subsequent load error (code 11).


bstephen2, Dec 04 '25 15:12