
std.io: add BufferedReader.peek

Open pierrec opened this issue 1 year ago • 4 comments

Fixes #4501.

pierrec avatar Mar 14 '24 21:03 pierrec

After using this API a bit, I think the following would be better as it avoids copying:

/// Peeks at the next `n` bytes without advancing the reader.
/// The bytes are valid until the next read call.
/// If the number of requested bytes is larger than the underlying buffer,
/// `buffer_size` bytes are returned.
pub fn peek(self: *Self, n: usize) Error![]const u8

/// Discards the next `n` bytes and returns the number of discarded bytes.
pub fn discard(self: *Self, n: usize) Error!usize

pierrec avatar Mar 18 '24 08:03 pierrec

Why do you want it to be possible to provide a buffer that is too small?

Related, I have this sketch for a non-generic BufferedReader:

unbuffered_reader: io.AnyReader,
buffer: []u8,
start: usize,
end: usize,

pub fn init(unbuffered_reader: io.AnyReader, buffer: []u8) BufferedReader {
    return .{
        .unbuffered_reader = unbuffered_reader,
        .buffer = buffer,
        .start = 0,
        .end = 0,
    };
}

/// If the amount requested can be fulfilled from already-buffered data, no
/// underlying read occurs; however, if the amount requested exceeds the amount
/// of buffered data, an underlying read occurs.
///
/// Calling this function will cause at most one underlying read call.
pub fn readv(br: *BufferedReader, vecs: []const io.Vec) anyerror!usize {
    var total_read_len: usize = 0;
    var vec_i: usize = 0;
    while (vec_i < vecs.len) : (vec_i += 1) {
        var vec = vecs[vec_i];
        while (vec.len > 0) {
            if (br.end == br.start) {
                // Caller wants more data but we have none buffered.
                // If the caller has only one vector remaining, then we'll pass
                // it along with the main buffer to underlying read.
                // Otherwise we'll pass the rest of the caller's vectors directly
                // to the underlying reader with one tweak: if the last vector
                // is smaller than the main buffer, then we swap it with the main
                // buffer instead.
                const remaining_vecs = vecs[vec_i..];
                switch (remaining_vecs.len) {
                    0 => unreachable, // vec.len > 0 above
                    1 => {
                        var my_vecs: [2]io.Vec = .{
                            vec,
                            .{
                                .ptr = br.buffer.ptr,
                                .len = br.buffer.len,
                            },
                        };
                        const n = try br.unbuffered_reader.readv(&my_vecs);
                        if (n <= vec.len) {
                            total_read_len += n;
                            return total_read_len;
                        }
                        br.start = 0;
                        br.end = n - vec.len;
                        total_read_len += vec.len;
                        return total_read_len;
                    },
                    else => {
                        const last = remaining_vecs[remaining_vecs.len - 1];
                        if (last.len < br.buffer.len) {
                            const first = remaining_vecs[0];
                            defer {
                                remaining_vecs[0] = first;
                                remaining_vecs[remaining_vecs.len - 1] = last;
                            }
                            remaining_vecs[0] = vec;
                            remaining_vecs[remaining_vecs.len - 1] = .{
                                .ptr = br.buffer.ptr,
                                .len = br.buffer.len,
                            };
                            var n = try br.unbuffered_reader.readv(remaining_vecs);
                            for (remaining_vecs[0 .. remaining_vecs.len - 1]) |v| {
                                if (n >= v.len) {
                                    n -= v.len;
                                    total_read_len += v.len;
                                    continue;
                                }
                                total_read_len += n;
                                return total_read_len;
                            }
                            const copy_len = @min(last.len, n);
                            @memcpy(last.ptr[0..copy_len], br.buffer[0..copy_len]);
                            total_read_len += copy_len;
                            br.start = copy_len;
                            br.end = n;
                            return total_read_len;
                        }
                        total_read_len += try br.unbuffered_reader.readv(remaining_vecs);
                        return total_read_len;
                    },
                }
            }
            const copy_len = @min(vec.len, br.end - br.start);
            @memcpy(vec.ptr[0..copy_len], br.buffer[br.start..][0..copy_len]);
            vec.ptr += copy_len;
            vec.len -= copy_len;
            total_read_len += copy_len;
            br.start += copy_len;
        }
    }
    return total_read_len;
}

pub fn reader(br: *BufferedReader) io.AnyReader {
    return .{
        .context = br,
        .readv = readv,
    };
}

const std = @import("../std.zig");
const io = std.io;
const BufferedReader = @This();

With this code, the buffer is provided by the user rather than being part of the type. Under these circumstances, I can't imagine the programmer passing a buffer smaller than the one they will use for peek(). Is that the case in your project that you've been testing this API with?

andrewrk avatar Mar 19 '24 02:03 andrewrk

> Why do you want it to be possible to provide a buffer that is too small?
>
> ...
>
> With this code, the buffer is provided by the user rather than being part of the type. Under these circumstances, I can't imagine the programmer passing a buffer smaller than the one they will use for peek(). Is that the case in your project that you've been testing this API with?

I totally agree; I was only thinking about programmer errors. Maybe an assert would be the best solution.

I will update the PR with the API proposed above (the one I currently use) and fix the comments.

pierrec avatar Mar 19 '24 08:03 pierrec

> I think this function should additionally assert that dest.len is <= buf.len (and document that assertion).

How about a more convenient API instead?

pub const PeekError = Error || error{EndOfStream};

/// Returns the next `n` bytes without advancing.
///
/// The returned slice is a subslice of the buffer, which may or may not start
/// from the beginning. The length of the returned slice is always `n`.
///
/// Asserts that the number of requested bytes is less than or equal to the
/// buffer size.
pub fn peek(self: *Self, n: usize) PeekError![]u8 {
    // ...
}

I missed this comment, but came to the same conclusion. PR updated accordingly.

pierrec avatar Mar 22 '24 11:03 pierrec