quick-xml icon indicating copy to clipboard operation
quick-xml copied to clipboard

async support v2

Open 999eagle opened this issue 3 years ago • 14 comments

As discussed in #314, this is a new attempt to add support for async reading.

This PR is still WIP and open for discussion on implementation details.

999eagle avatar Jul 11 '22 08:07 999eagle

Response to https://github.com/tafia/quick-xml/pull/314#issuecomment-1180124604

Actually, I'll plan to merge similar traits XmlRead and XmlSource into one and use approach with a special intermediate struct SliceReader / IoReader instead of directly implementing XmlSource for BufRead and &[u8]. With that approach it should be possible to have an AsyncSliceReader / AsyncIoReader (which would wrap only a inner reader) for which new traits will be implemented.

So it is good time for you to investigate that surface and how it can be integrated with async support

Mingun avatar Jul 11 '22 09:07 Mingun

Okay, I've moved a lot of stuff around to hopefully separate some things better. I've also implemented async support on the existing Reader. The following things have been changed as of now:

  • The Reader is now generic over the parser in use. This separates namespaced access and non-namespaced access through type checking at compile time.
  • To implement this, there is now a Parser-trait, implemented by a DefaultParser and a NamespacedParser.
  • The Reader is now built via a builder-pattern so no changes to the parser settings (like whether whitespace is trimmed or comments/end tags are checked) are possible after being built. This does change the public API and thus is a breaking change but allows the Parser separation and thus eliminates a lot of duplicate code.
  • There is now an AsyncXmlSource trait that's automatically implemented for everything that implements tokio's AsyncBufRead trait.
  • The Reader now has some read_..._async methods. Using the same names for sync and async methods sadly won't work as that would require to tell Rust that no type can ever implement both XmlSource and AsyncXmlSource (or BufRead and AsyncBufRead) at the same time which is (for now) impossible.
  • There is a new test_async test which is rather bare-bones as of now but I'll work on that.
  • Currently, tests with the serialize feature are broken because I haven't changed the deserialization to work with the new Reader yet.

999eagle avatar Jul 11 '22 17:07 999eagle

1852c70d75fce7831062f22c068379d4b89016ed is a pretty big change which is hard to review. Could you split this commit into several? At least I can suggest extract introducing builder pattern into a separate commit. Also, can you rebase your work and squash up all fixup changes as much as possible before the finishing work (for example, you've changed AsyncXmlSource trait in the commit where you are actually implement it. Actually, it would be better to introduce this trait in the same commit where you implement it).

Also, some changes seems to a little complicated to me. Actually I was thinking about something like (signatures approximate, just to illustrate the idea):

struct SliceReader<'i>(&'i [u8]);
struct IoReader<R: BufRead>(R);
struct AsyncIoReader<R: AsyncBufRead>(R);

/// The existing Reader struct
pub struct Reader<R> {
  reader: R,
  // ...
}

impl<R> Reader<R> {
  // common parsing code, setup code
}

impl<'i> Reader<SliceReader<'i>> {
  pub fn read_event(&mut self) { /* ... */ }
  pub fn read_event_async(&mut self) { /* ... */ }
}

impl<R: BufRead> Reader<IoReader<R>> {
  pub fn read_event_into(&mut self, buf: &mut Vec<u8>) { /* ... */ }
}

impl<R: AsyncBufRead> Reader<AsyncIoReader<R>> {
  pub fn read_event_into_async(&mut self, buf: &mut Vec<u8>) { /* ... */ }
}

////////////////////////////////////////////////////////////////////////////

/// The struct that I would introduce in my namespaces PR
pub struct NsReader<R> {
  reader: Reader<R>,
  // ...
}

impl<R> NsReader<R> {
  // common parsing code, setup code
}

impl<'i> NsReader<SliceReader<'i>> {
  pub fn read_event(&mut self) { /* ... */ }
  pub fn read_event_async(&mut self) { /* ... */ }
}

impl<R: BufRead> NsReader<IoReader<R>> {
  pub fn read_event_into(&mut self, buf: &mut Vec<u8>) { /* ... */ }
}

impl<R: AsyncBufRead> NsReader<AsyncIoReader<R>> {
  pub fn read_event_into_async(&mut self, buf: &mut Vec<u8>) { /* ... */ }
}

Mingun avatar Jul 12 '22 05:07 Mingun

Okay, so you'd prefer having the NsReader<R> wrap the "standard" Reader<R> and then implement the basic parsing functions like that. I think that would work too for async support. Your example code also greatly simplifies reading as the generic bound for the buffer in XmlSource doesn't exist anymore and the different wrapper structs allow for strictly separating sync/async access. With that approach, I think it's better to abandon my current code entirely and start anew. I'll try again from the current master using your approach.

999eagle avatar Jul 13 '22 09:07 999eagle

@Mingun I think I've reimplemented this now the way you suggested and with minimal unrelated changes. The builder pattern can still be added again in a later PR. I also didn't separate the namespaced read methods in this PR but it should be relatively easy to add.

999eagle avatar Jul 13 '22 15:07 999eagle

Codecov Report

Merging #417 (6b35382) into master (a6588c2) will increase coverage by 0.90%. The diff coverage is 73.44%.

@@            Coverage Diff             @@
##           master     #417      +/-   ##
==========================================
+ Coverage   49.58%   50.48%   +0.90%     
==========================================
  Files          22       26       +4     
  Lines       13935    14429     +494     
==========================================
+ Hits         6909     7284     +375     
- Misses       7026     7145     +119     
Flag Coverage Δ
unittests 50.48% <73.44%> (+0.90%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
benches/macrobenches.rs 0.00% <0.00%> (ø)
benches/microbenches.rs 0.00% <0.00%> (ø)
examples/custom_entities.rs 0.00% <0.00%> (ø)
examples/read_buffered.rs 0.00% <0.00%> (ø)
examples/read_texts.rs 0.00% <0.00%> (ø)
src/writer.rs 49.01% <0.00%> (-0.28%) :arrow_down:
src/lib.rs 12.74% <16.66%> (+0.41%) :arrow_up:
src/de/mod.rs 77.38% <50.00%> (+0.09%) :arrow_up:
src/reader/async_reader.rs 65.66% <65.66%> (ø)
src/reader/io_reader.rs 77.89% <77.89%> (ø)
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update a6588c2...6b35382. Read the comment docs.

codecov-commenter avatar Jul 13 '22 17:07 codecov-commenter

Okay, I think I have addressed all your concerns now. Feel free to mark your discussions as resolved where it applies and keep them open otherwise. I've added some docs, updated examples and split the commits as best as I could.

999eagle avatar Jul 14 '22 10:07 999eagle

Rebased onto current master to resolve conflicts.

999eagle avatar Jul 15 '22 11:07 999eagle

@999eagle Apologies for all the breakages!

Thank you for splitting apart the commits - I actually suggest to go a step further and break off anything in this PR which is independent of the fundamental changes (e.g. maybe commits #1 and #4) into a separate PR that can be merged easily. Likewise if there are fundamental changes which aren't tied directly to the async-ones, we can possibly review and merge that separately.

The motivation is just to reduce the amount of the unmerged work and prevent merge conflicts. There are other fundamental changes being worked on simultaneously right now (see https://github.com/tafia/quick-xml/issues/158) and reconciling the massive patch sets together will not be a pleasant experience unless it is done in smaller pieces.

dralley avatar Jul 16 '22 21:07 dralley

Hi @dralley, while those two commits are somewhat independent of the other changes, they don't really do anything useful on their own. The first commit is nothing but a requirement for testing the changes of the third commit, which in turn requires an example for explicitly buffered access for files as most examples just read strings or byte slices and thus don't have buffered reading anymore. In my opinion, it'd make sense to split this PR into two separate PRs at most: Splitting the Reader into the explicit SliceReader and IoReader (that is, commits 1 through 4 from this current PR) for the first PR and all the async-specific implementation in a second PR. If that's what you want to do, tell me and I'll open a new PR for the non-async changes.

999eagle avatar Jul 17 '22 10:07 999eagle

@999eagle Yes, that makes perfect sense, and it would be helpful.

dralley avatar Jul 17 '22 15:07 dralley

Almost done reviewing the first 4 commits (which as you said will be split into the new PR), and it is looking fine so far. I really appreciate all the work you've put in to get this cleaned up.

dralley avatar Jul 17 '22 22:07 dralley

@dralley I've opened #425 for the first few changes. I've also rebased onto the current master. As this PR is now based on #425, it must be merged afterwards.

999eagle avatar Jul 18 '22 06:07 999eagle

This is relevant: https://blog.rust-lang.org/inside-rust/2022/07/27/keyword-generics.html

Anything that comes of that initiative would be months or years down the road, but it might be worthwhile to provide feedback and examples early in the process.

dralley avatar Jul 27 '22 16:07 dralley