libs-team icon indicating copy to clipboard operation
libs-team copied to clipboard

Add `std::io::Seek` instance for `std::io::Take<T>` when `T: Seek`

Open melrief opened this issue 11 months ago • 4 comments

Proposal

Problem statement

std::io::Read::take currently returns a new instance of Read but not Seek regardless of whether self implements Seek or not. This results in code that may work with self but not with the instance returned by .take because the former implements Seek while the latter doesn't. Moreover there is not reason for this to be the case; Take is just a wrapper around self that can read at most limit bytes. If self implements Seek then Take can also naturally implement Seek.

This proposal is to add a Seek instance for Take.

Motivating examples or use cases

For example, binary files parsers could benefits from this feature. Let's say that we have file format that is a list of records where each record is composed of the length of the record followed by the bytes of the record. It could be useful to read the length of the record and then use .take to delimit the bytes that the record parser should process. This removes the need to deal with trailing bytes from the record parser resulting in simpler code.

Now let's say that the parser needs to skip certain parts of the record because e.g. obsolete or not needed. seek is a natural fit for this task but currently Take doesn't support it even if File supports seek. The parser would have to read the bytes that it wants to skip and write them to a noop sink.

There is also a reusability reason: any generic code that works with bytes sequences should work with the result of .take too. For instance, let's write the interface of the parser above in a generic way:

fn record_parser<I: Read + Seek>(i: I) -> Result<Record> { ... }

this function should work for Take if it wraps an instance of Seek.

Solution sketch

seek can be expensive and therefore a requirement for the solution is that a call to Take.seek results in a single call to seek on the inner object of Take.

Take currently keeps track only of the remaining limit bytes that can be read from the inner object. This info alone is not enough to implement seek because e.g. we don't know when the Take "started" so seek(SeekFrom::Start) cannot be computed. This solution proposes to keep track of the initial limit of the Take in a new field called len. This can then be used to compute the position of the cursor inside the Takeas len - limit. The reason to not track position directly is to avoid adding one more mutating variable that changes together with limit. len is constant (unless set_limit is called).

Once we know the position of the cursor inside Take, we can compute the offset from the current position based on the argument of seek. If the argument is SeekFrom::Start then we seek from -position, if the argument is SeekFrom::End then we seek from position+limit, if the argument is SeekFrom::Current then we seek from position.

API

Add a position method that returns the position within the Take

impl Take {
    fn position(&self) -> u64 {
       self.len - self.limit
    }
}

Add the Seekimplementation for Take

impl<T: Seek> Seek for Take<T> {
  ...
}

Alternatives

The solution proposed in 97227 keeps track of the position instead of the original len. It is equivalent to the solution proposed here. The reason why I prefer the above solution is because it's simpler to maintain given that len never changes while cursor in 97227 must be updated together with limit leading to code like:

        self.cursor += amt as u64;
        self.limit -= amt as u64;

Links and related work

  1. 97227
  2. My implementation of the solution proposed here PR #138023

melrief avatar Mar 05 '25 02:03 melrief

Could you add a code block with the proposed new API to the solution sketch? Easier to read at a glance. Also include the position function you have in the implementation PR if that makes sense with this API.

tgross35 avatar Mar 05 '25 02:03 tgross35

I think it would be useful to prevent seeking to before the initial position, so Take essentially just lets you see a defined range of the wrapped file-like thing.

programmerjake avatar Mar 05 '25 03:03 programmerjake

@programmerjake agreed, seeking before the initial position should return an error as per Seek spec. The reason is that negative offset is not supported an any offset before the initial position is negative relative to the Take.

melrief avatar Mar 05 '25 07:03 melrief

@tgross35 I added a subsection API with the new additions.

melrief avatar Mar 05 '25 07:03 melrief

We talked about this in today's @rust-lang/libs-api meeting, and we agreed to approve this.

joshtriplett avatar Apr 15 '25 18:04 joshtriplett

The PR got merged. Should I close this now?

melrief avatar May 19 '25 23:05 melrief

there's some minor cleanup that someone with permissions (@tgross35?) needs to do (https://github.com/rust-lang/rust/issues/97227#issuecomment-2892560898 and https://github.com/rust-lang/rust/pull/138023#issuecomment-2892555305), but yeah afaik you can close this issue.

programmerjake avatar May 20 '25 00:05 programmerjake

there's some minor cleanup that someone with permissions (@tgross35?) needs to do (rust-lang/rust#97227 (comment) and rust-lang/rust#138023 (comment)), but yeah afaik you can close this issue.

Thanks for pointing this out, updated.

tgross35 avatar May 20 '25 19:05 tgross35