Refactor `OggStreamReader` and provide precise seek
Current OggStreamReader has several issues:
- We cannot seek the stream at sample-level (closes #16)
-
get_last_absgpprovides insufficient information in some case (closes #84) and may not work for chained streams (closes #87) - Header fields are public, and subject to unwanted change (closes #85)
- Behavior for multiplexed streams is unclear (closes #86)
- ~We cannot obtain the length of stream~ (#95): this is still unimplemented
- It won't drop the samples in the first packet even when needed (which is required by spec)
- It always requires
Seek(Rustaudio/rodio#333) (Well, I know you may disagree about this)
So I implemented a sample-level seek.
Changes in API
-
OggStreamReaderdoes no longer requireio::Seek. Therefore it does not provideseek_absgp_pg. - Instead, new struct
SeekableOggStreamReaderis provided. This struct comes withseek_absgp, where users can seek the reader in sample-level.- I had to keep those two structs separate because we need to do the different initialization steps and fields for seekable and unseekable reader (in specific, record the stream bounds).
-
OggStreamReaderandSeekableOggStreamReaderis dedicated to only a single stream. If you want to parse chained stream, you have to create a new one.- Later, I'm going to implement some function like
next_stream(self) -> Option<OggStreamReader>that consumes the reader, checks if the next packet exists, and returns if exists.
- Later, I'm going to implement some function like
-
ident_hdr,comment_hdr, andsetup_hdris no longer public. Now they can be accessed only immutably through the corresponding getter functions.
Issues & Future Task
One major issue is that the page absgp in real-world ogg/vorbis file often contains errors. Specifically, each of the ogg files I have has 3-20 errors, classified to the following two types:
- When a page ends with a "long" block and the next page starts with "short" block, then the
absgpfor the former page is wrong. In specific, all of the ogg files I have use "long" blocks of length 2048 and "short" blocks of length 512, and the absgp for the former page sometimes differs by 448. This type of error was found in almost all the ogg files I have. - The absgp for the last page is less than the number of total samples obtained from the entire stream, subtracted by the number of samples obtained from the last packet. The spec requires to drop the last few frames from the final packet according to the absgp provided in the page, as quoted below, but it seems that real-word ogg files requires to drop more samples.
A granule position on the final page in a stream that indicates less audio data than the final packet would normally return is used to end the stream on other than even frame boundaries
If the absgp provided in the pages has an error, then the seek result may be wrong, especially for the first type of error mentioned above. I'm now wondering that I've mistook some spec of the Vorbis stream, or this is the bug of many Vorbis encoders (all I have of which is based on libvorbis by the way). Yeah, it's only 448 frames in ~44100 fps data, so the diff is 0.01 seconds, so we may not have to care about that though.
Speaking about the incompatibility with the spec, I also realized that most of the ogg files does not follow the spec that " the second finished audio packet must flush the page on which it appears and the third packet begin a fresh page". I implemented the dropping-leading-samples features based on this fact, but it does not actually run for most of the files. Anyway I followed the another requirement in the spec that "Failure to do so should, at worst, cause a decoder implementation to return incorrect positioning information for seeking operations at the very beginning of the stream". At any rate, most of the ogg files starts with absgp 0, which means that we don't have to drop the initial frames at all.
Here are the other tasks and concerns:
- Only informal test is done yet for the new seeking function. Maybe should I include it in this repository as a test?
- Also, it is not yet tested against chained / multiplexed streams.
- I have not checked the compatibility with
asyncfeature yet. - I didn't encountered an audio packet spanning more than one page, so there may be a bug for that case.
- The code may not compile with MSRV.
And here are the issues I'll handle at last:
- The code may not be well-formatted.
- The commits are not rebased yet.
I'm opening a draft PR since it's quite a large change. As soon as I'm done with other tasks I'll switch. But earlier comments and critics are welcome!
Some things I forgot to write:
- I'm wondering if I should implement
Deref<Target=OggStreamReader>forSeekableOggStreamReader>. It's definitely convenient as it requires less changes on old code, but it is considered as anti-pattern. - This PR is dependent on RustAudio/ogg#22.
@TonalidadeHidrica thanks I'll take a look tomorrow.
@est31 Thanks a lot, but there's no rush!
You can make a trait containing the common methods of OggStreamReader and SeekableOggStreamReader, impl it for OggStreamReader, and then forward the calls into impl YourTrait for SeekableOggStreamReader. That will also make it more convenient for people extending on Lewton.
Is this still being worked on?
The implementations are working correctly in my product so I'm currently waiting for a review. Also, I'm busy recently so if I receive a review, I will not be able to respond instantly.
Alright then it's just up the RustAudio to review this then... which seems to be taking a while, this was last commented on by a org member 3 months ago. Do you already have a branch of ogg with those two PRs merged?
@JackRedstonia better-seek branch in my fork is my state-of-the-art branch which includes every improvements I made :)
based and forked
Just to clarify, the SeekableOggStreamReader::stream_end_pos function returns the byte position of the last logical packet, right? I'm not exactly sure how to go from here to getting the stream duration in granules, as I don't know much about the ogg/vorbis format.
@JackRedstonia Oops, sorry. You're right. stream_end_pos seems to return the final byte position, so I DIDN'T implement it. I'll remove the "fixes" tag. (I guess it is quite easy to implement it though)
Sorry I'm a bit busy atm. I think first the ogg PRs need to be reviewed, then this PR.
On getting the stream length, from my understanding, the steps needed are:
- Seek to the final packet's byte position, which we can obtain using
stream_end_pos - Read that packet
- Get the absolute granule position
- The stream length is the absolute granule position plus the packet's length in granules
Is this correct?
@JackRedstonia one also has to take the first packet's absolute granule pos into account, but other than that, yes it's accurate.
wait no, one doesn't have to take the length of the last packet into account, but the length of the first packet. Absolute granule positions always refer to the last sample offset of the last packet finished on the page.
I don't actually remember the code I wrote few months ago, but now I'm wondering that stream_end_pos really seeks to the beginning of the last packet (or page?) of the current stream? Doesn't it seek to the real "end" of the stream? I've gotta check.
What is the correct way to use SeekableOggStreamReader? I am currently testing the code in this branch. If I call seek_absgp with certain values, inner_mut().read_dec_packet() returns BadAudio(AudioBadFormat). I can seek to small values (0-5000 works), but if I seek to a larger value, I get the error.
@hf29h8sh321 I'm very sorry to leave it undocumented. Since I'm quite busy and sleepy I cannot currently give you a precise example now, but perhaps seeing this file might help you. Sorry for the inconvenienve.
I copied that file into my project, and I cannot get any audio data after seeking. Calls to next only return None.
Hmm, at least it works for almost all of my ogg files, so there should be some difference...
@hf29h8sh321 Did you check out this messy gist of mine? (I'm not confident it would work right now...)
I have narrowed down the issue. I ran the code in the gist, which works with my own files. However, I then modified the code and removed this line: let samples = take_n_samples(&mut reader, 1_500_000, false)?;, and all other lines that depend on that line. Then, check_seek fails with Error: Vorbis bitstream audio decode problem.
It seems like seeking into the middle of the stream without reading it without seeking first causes the error.
I think if you remove first 2000 bytes or so of the audio file. it should work properly. Thats where all the "garbage" data is stored.
@hsi3322 Can you elaborate on what you mean by "garbage data"?
Thanks for keeping this PR alive! This is the single biggest issue of the lewton crate at the moment, I think.
You're welcome! The seeking is in fact crucial part of my project, and the feature has actually been thoroughly tested through my actually using. I'll resolve the conflict when I have time, and of course review is always welcome, in that case I'll respond to it as soon as I can.