engine icon indicating copy to clipboard operation
engine copied to clipboard

Collapse bounds calculations into DisplayListBuilder

Open ColdPaleLight opened this issue 3 years ago • 24 comments

fix https://github.com/flutter/flutter/issues/100172

This PR does not change the API and semantics, the tests already exist https://github.com/flutter/engine/blob/31afa38d95f5e67af4720af44052fbb873d240df/display_list/display_list_unittests.cc#L1104

Pre-launch Checklist

  • [x] I read the [Contributor Guide] and followed the process outlined there for submitting PRs.
  • [x] I read the [Tree Hygiene] wiki page, which explains my responsibilities.
  • [x] I read and followed the [Flutter Style Guide] and the [C++, Objective-C, Java style guides].
  • [x] I listed at least one issue that this PR fixes in the description above.
  • [ ] I added new tests to check the change I am making or feature I am adding, or Hixie said the PR is test-exempt. See [testing the engine] for instructions on writing and running engine tests.
  • [ ] I updated/added relevant documentation (doc comments with ///).
  • [x] I signed the [CLA].
  • [x] All existing and new tests are passing.

ColdPaleLight avatar Jun 29 '22 07:06 ColdPaleLight

It looks like this pull request may not have tests. Please make sure to add tests before merging. If you need an exemption to this rule, contact Hixie on the #hackers channel in Chat (don't just cc him here, he won't see it! He's on Discord!).

If you are not sure if you need tests, consider this rule of thumb: the purpose of a test is to make sure someone doesn't accidentally revert the fix. Ask yourself, is there anything in your PR that you feel it is important we not accidentally revert back to how it was before your fix?

Reviewers: Read the Tree Hygiene page and make sure this patch meets those guidelines before LGTMing.

flutter-dashboard[bot] avatar Jun 29 '22 07:06 flutter-dashboard[bot]

I should clarify in the original issue. The concept here was not to just eliminate an extra Dispatch process, but to integrate the 2 mechanisms so that they could share information when needed. I've updated the original bug description to explain more of the inefficiencies that we want to avoid with this effort.

flar avatar Jul 01 '22 17:07 flar

Unfortunately we're going to need the Calculator class for just a little while longer while we solve a pressing need to get more coverage for Impeller. The PlatformViews will need to move over to DisplayList and to do that they need to produce an RTree from a DisplayList.

To get that work done quickly I reused the Calculator class but factored out the BoundsAccumulator into Rect and RTree subclasses. By default, a DisplayList will only produce the regular bounds, but a caller can ask it to produce an RTree. I hope to have the PR submitted by tonight or tomorrow, but am running into problems with code sharing between flow and display_list that I'll likely solve by just cut/pasting code.

Once that is in we'll have to figure out how we can inline and make a good hybrid implementation. We could potentially just RTree every DisplayList, but I'd like to see the overhead of that. It would still plug in just as the current stuff does in this PR since the RTree is a small variant on the existing BoundsCalculator, but the question will be whether we try to do both in the inlining or not.

Another idea would be for the accumulator to save the bounds of each primitive as it goes and then making an RTree would be a simple act of handing the entire list to the RTree generator without having to dispatch. But that will be a slight change to how the existing BoundsAccumulator works anyway. I'll get to work on creating this RTree PR so you can see better what I'm talking about. The goal is to have PlatformViews working on Impeller within the next week or so, so it is fairly high priority work and hopefully doesn't really modify what has happened in this PR too much (the Calculator dispatch methods are largely untouched and both variants of Accumulator still support accumulate(rect)).

flar avatar Jul 21 '22 06:07 flar

I tagged you in https://github.com/flutter/engine/pull/34809 so that we can plan the more forward on how to integrate bounds calculation while also considering the need for an RTree.

flar avatar Jul 21 '22 07:07 flar

I tagged you in #34809 so that we can plan the more forward on how to integrate bounds calculation while also considering the need for an RTree.

Very excited to see RTree for DisplayList! ~About this PR, Maybe we can add a flag to the constructor of DisplayListBuilder to control whether RTree needs to be generated? If this flag is true, we will generate RTree while calculating bounds. If flag is false, we will only calculate bounds and will not generate RTree. WDYT?~

I thought about it again, I think a better solution is :

  1. Record the rects_ needed to generate the RTree through the RTreeBoundsAccumulator in the DisplayListBuilder.
  2. Then pass the ownership of the rects_ to the DisplayList in the DisplayListBuilder::Build method.
  3. Finally generate the RTree based on rects_ when the DisplayList::rtree( ) is called.

ColdPaleLight avatar Jul 21 '22 07:07 ColdPaleLight

I tagged you in #34809 so that we can plan the more forward on how to integrate bounds calculation while also considering the need for an RTree.

Very excited to see RTree for DisplayList! ~About this PR, Maybe we can add a flag to the constructor of DisplayListBuilder to control whether RTree needs to be generated? If this flag is true, we will generate RTree while calculating bounds. If flag is false, we will only calculate bounds and will not generate RTree. WDYT?~

I thought about it again, I think a better solution is :

  1. Record the rects_ needed to generate the RTree through the RTreeBoundsAccumulator in the DisplayListBuilder.
  2. Then pass the ownership of the rects_ to the DisplayList in the DisplayListBuilder::Build method.
  3. Finally generate the RTree based on rects_ when the DisplayList::rtree( ) is called.

That's what I was thinking with my comment about saving the bounds in my earlier comment. The pros are:

  • already done the work to generate the RTree when it is needed.
  • we have the bounds of each primitive saved somewhere where we can consult them for fast(er) culling

Cons:

  • memory usage of 16 bytes per rendering op
    • we could dump the list of rects after we generate the RTree, if we could use the RTree for culling
    • SkPicture was already generating RTrees when we switched to DisplayList, so we've already encountered the time and memory that would be used here

flar avatar Jul 21 '22 17:07 flar

@flar When I tried to integrate ClipBoundsDispatchHelper into DisplayListBuider, I ran into some problems. Can you enlighten me? Thanks you.

  1. If https://github.com/flutter/engine/pull/34835 landed, Then we will find that params is_aa in ClipBoundsDispatchHelper::clipRect will affect clip bounds, but it is ignored in DisplayListBuider::clipRect. Which one is correct? c.f. https://github.com/flutter/engine/blob/main/display_list/display_list_utils.cc#L216-L218
void ClipBoundsDispatchHelper::intersect(const SkRect& rect, bool is_aa) {
  SkRect devClipBounds = matrix().mapRect(rect);
  if (is_aa) {
    devClipBounds.roundOut(&devClipBounds);
  }
  ......
}
  1. has_clip_ of ClipBoundsDispatchHelper looks like always be true since param cull_rect in constructor will not be nullptr. (Because DisplayList::bounds_cull_ will always has value). So can we delete all the code which in branch that has_clip is false? c.f. https://github.com/flutter/engine/blob/main/display_list/display_list_utils.h#L314
  explicit ClipBoundsDispatchHelper(const SkRect* cull_rect)
      : has_clip_(cull_rect),
        bounds_(cull_rect && !cull_rect->isEmpty() ? *cull_rect
                                                   : SkRect::MakeEmpty()) {}
  1. DisplayListBoundsCalculator::saveLayer will affect clip bounds, but DisplayListBuider::saveLayer won't. Should we make DisplayListBuilder::saveLayer also affect clip bounds? c.f. https://github.com/flutter/engine/blob/main/display_list/display_list_utils.cc#L454-L459
  // Even though Skia claims that the bounds are only a hint, they actually
  // use them as the temporary layer bounds during rendering the layer, so
  // we set them as if a clip operation were performed.
  if (bounds) {
    clipRect(*bounds, SkClipOp::kIntersect, false);
  }

ColdPaleLight avatar Jul 22 '22 11:07 ColdPaleLight

Those are all excellent questions and the more I look at what is being done in both cases I realize that we might not have a good answer for either. Let me compose my thoughts a bit more before I try to come up with a good description of the concerns, but mainly the issues are:

  • Some operations imply a specific rasterization anomaly that might affect the bounds
    • For example, clipping is not applied geometrically, but is instead a pixel operation. Do we account for that pixelization in the bounds reported via either mechanism or both?
    • The bounds of a saveLayer will not guarantee clipping, but in general that happens in the most common cases. They are a suggestion for the bounds of the virtual layer, but if we are under a transform, the bounds may transform into a diamond and drawing anywhere in the diamond will produce output. This means that you can't rely on them to clip out rendering outside of that local-space box, but it does have an impact on the bounds of the content. The clipRect call used in one of the cases is an attempt to deal with the fact that while you can't rely on every pixel outside the layer bounds to be eliminated, in general transformed pixels that lie outside its transformed bounds will not be rendered.
  • What transform are we being rendered under?
    • These operations are all being recorded. Once recorded they can be rendered in a context with any arbitrary transform.
    • If we adjust bounds to match pixel expectations per the previous top-level bullet item, how big is "a pixel" really when we don't know the final transform?
    • The calculations were an attempt to at least try to answer "what are the bounds in the destination coordinate space of this DL" but may not have gotten it right in all cases
    • This isn't a theoretical issue. Every ui.Picture recorded by the framework is recorded as if it will be played back in an Identity coordinate system, but every scene starts with a global scale by the DPI of the destination. So, all ui.Picture DisplayLists within the scene will minimally be scaled by the DPI.

flar avatar Jul 23 '22 19:07 flar

I haven't read the latest commits in detail, but it looks like you went the route of saving all of the rectangles during construction and then lazily turning them into an RTree when needed (without a second Dispatch cycle).

Did you measure the time that costs compared to:

  • the old system of constructing the records without bounds
  • the old system including lazily computing the bounds with a second dispatch cycle
  • the itermediate system you had of computing the bounds within the builder cycle

It would be nice if "DisplayList + realized bounds (but not realized RTree object)" was similar in performance to the previous "DisplayList construction + lazy bounds calculation" and even better if it was even faster than that - though I don't imagine it would be as fast as the prior "construction without bounds" - but it would be nice to know what the performance difference was even though we almost always ask for the bounds.

One way to test this would be to capture some complicated real-app scenes using the built-in flattening code that is used to turn the entire tree into a single DL (with nested DLs), and then turn that into a benchmark that rebuilds it under the different scenarios above. Or, we could just come up with a synthetic scene, but it would be nice if it reflected a variety of real-life app scenes.

Note that having such a benchmark would be good for us moving forward so that we could measure the impact of each of our changes to DL. We've gone long enough based on the mysterious art of "I don't think what I just modified will cost us much time in DL construction" and it is time we started measuring these things more explicitly.

flar avatar Jul 25 '22 05:07 flar

I could see a DL construction benchmark as either something driven from Dart or something driven from the native APIs directly. There are pros and cons for either:

Pros for Dart:

  • Framework engineers could add benchmarks for things they may have had perf issues with
  • It measures the native interface as well as the native DL code
  • Would this be in the framework benchmarks or in the engine benchmarks? Would it trigger regression notifications on every engine commit or only when we roll into framework?

Pros for native:

  • More focused on the native code that we are modifying a lot (the Dart interface never changes)
  • Might have less variance with fewer mechanisms to the benchmark
  • Would be more likely to trigger regression notifications closer to when the damage is done since it would be executed for every engine commit before rolling into the framework

flar avatar Jul 25 '22 05:07 flar

Created https://github.com/flutter/flutter/issues/108265 to track what might turn into a suite of benchmarks for DL (for this PR it would be nice to at least have construction/bounds/rtree benchmarks).

flar avatar Jul 25 '22 05:07 flar

One way to test this would be to capture some complicated real-app scenes using the built-in flattening code that is used to turn the entire tree into a single DL (with nested DLs), and then turn that into a benchmark that rebuilds it under the different scenarios above. Or, we could just come up with a synthetic scene, but it would be nice if it reflected a variety of real-life app scenes.

Creating a benchmark is indeed a great idea! But I'm a little confused about how to create a scene for testing. As far as I know, display list does not support serialization,https://github.com/flutter/flutter/issues/86715. So before implementing benchmarks, should we support serialization of display lists?

Another way I can find is that we can convert the layer tree to a skp format file through flutter screenshot --type=skia, and then generate a SkPicture from this file in the benchmark test and use it as the input of DisplayListBuilder.

ColdPaleLight avatar Jul 25 '22 08:07 ColdPaleLight

One way to test this would be to capture some complicated real-app scenes using the built-in flattening code that is used to turn the entire tree into a single DL (with nested DLs), and then turn that into a benchmark that rebuilds it under the different scenarios above. Or, we could just come up with a synthetic scene, but it would be nice if it reflected a variety of real-life app scenes.

Creating a benchmark is indeed a great idea! But I'm a little confused about how to create a scene for testing. As far as I know, display list does not support serialization,flutter/flutter#86715. So before implementing benchmarks, should we support serialization of display lists?

Another way I can find is that we can convert the layer tree to a skp format file through flutter screenshot --type=skia, and then generate a SkPicture from this file in the benchmark test and use it as the input of DisplayListBuilder.

Yes, serialization would make it very convenient. The problem with skp is that there are a lot of attributes we can't retrieve from an Sk stream. Shadows are out, most Filters are opaque, etc.

flar avatar Jul 25 '22 17:07 flar

One way to test this would be to capture some complicated real-app scenes using the built-in flattening code that is used to turn the entire tree into a single DL (with nested DLs), and then turn that into a benchmark that rebuilds it under the different scenarios above. Or, we could just come up with a synthetic scene, but it would be nice if it reflected a variety of real-life app scenes.

Creating a benchmark is indeed a great idea! But I'm a little confused about how to create a scene for testing. As far as I know, display list does not support serialization,flutter/flutter#86715. So before implementing benchmarks, should we support serialization of display lists? Another way I can find is that we can convert the layer tree to a skp format file through flutter screenshot --type=skia, and then generate a SkPicture from this file in the benchmark test and use it as the input of DisplayListBuilder.

Yes, serialization would make it very convenient. The problem with skp is that there are a lot of attributes we can't retrieve from an Sk stream. Shadows are out, most Filters are opaque, etc.

Actually, the biggest bang for the buck would be to utilize the data currently in display_list_unittests.cc which has a big array of arrays of "every flavor of every DL call" in them - mainly used for testing equality, predictability of byte and op counts, etc. But, these arrays would also be great input for a benchmark - lots of DisplayList calls all set up without any overhead for parsing. Just start a timer, create a DL, fill it with every variant of every op, build it, and check the time.

flar avatar Jul 26 '22 10:07 flar

@flar This PR is ready for review, please take a look when you have a chance :)

ColdPaleLight avatar Jul 28 '22 07:07 ColdPaleLight

From PR triage: Marking WIP while evaluating performance impact.

zanderso avatar Aug 04 '22 20:08 zanderso

If there is anything that could be done to reduce the cost of "construct + bounds" to be less than it is before this fix, that would be the golden ticket here as that is by far the most common case. It's actually pretty cool that we are so close even though the code is now keeping all of the rectangles around.

Also, should we include the size of the rectangles in the bytes() metric?

flar avatar Aug 04 '22 23:08 flar

If there is anything that could be done to reduce the cost of "construct + bounds" to be less than it is before this fix, that would be the golden ticket here as that is by far the most common case. It's actually pretty cool that we are so close even though the code is now keeping all of the rectangles around.

This seems a little difficult because we additionally calculate the rects used to generate the rtree. Maybe we should change direction and make DisplayListBoundsCalculator an abstract class and let DisplayListBuilder inherit from it. And add a flag in the constructor of DisplayListBuilder to identify whether to generate rects at the same time when calculating bounds.

ColdPaleLight avatar Aug 05 '22 07:08 ColdPaleLight

If there is anything that could be done to reduce the cost of "construct + bounds" to be less than it is before this fix, that would be the golden ticket here as that is by far the most common case. It's actually pretty cool that we are so close even though the code is now keeping all of the rectangles around.

This seems a little difficult because we additionally calculate the rects used to generate the rtree. Maybe we should change direction and make DisplayListBoundsCalculator an abstract class and let DisplayListBuilder inherit from it. And add a flag in the constructor of DisplayListBuilder to identify whether to generate rects at the same time when calculating bounds.

I haven't looked closely to see if there is anything that could make it go faster, but you are right that maybe we want this to be controlled by a flag. SkPicture would only produce an RTree if you supplied the factory, so that was sort of like a flag to enable the feature.

I'm not sure how having DLB inherit from the Calculator class would be part of this solution, though. Wouldn't this be more like having 2 types of bounds accumulators - one that saves the rects and one that just joins them all?

flar avatar Aug 05 '22 09:08 flar

@flar Add a flag to control whether the produce rtree does solve the performance problem.

I added a flag called need_produce_rtree which default value is false, if it is false then the bounds will be calculated using the RectBoundsAccumulator. And if it is true, the RTreeBoundsAccumulator will be used to calculate bounds and produce rtree. The RTreeBoundsAccumulator::bounds method is implemented as follows:

SkRect RTreeBoundsAccumulator::bounds() const {
  FML_DCHECK(saved_offsets_.empty());
  RectBoundsAccumulator accumulator;
  for (auto& rect : rects_) {
    accumulator.accumulate(rect);
  }
  return accumulator.bounds();
}

Here is the result of running testing/run_tests.py --type=benchmarks --variant=host_release

main

Run on (10 X 2400 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB (x10)
  L1 Instruction 128 KiB (x10)
  L2 Unified 4096 KiB (x5)
Load Average: 2.90, 2.81, 4.73
-----------------------------------------------------------------------------------------------------------
Benchmark                                                                 Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------------
BM_DisplayListBuilderDefault/kDefault                                  2.98 us         2.98 us       234046
BM_DisplayListBuilderDefault/kBounds                                   6.01 us         6.01 us       117202
BM_DisplayListBuilderDefault/kRtree                                    14.2 us         14.2 us        49206
BM_DisplayListBuilderDefault/kBoundsAndRtree                           17.0 us         17.0 us        41048
BM_DisplayListBuilderWithScaleAndTranslate/kDefault                    3.12 us         3.12 us       225619
BM_DisplayListBuilderWithScaleAndTranslate/kBounds                     6.16 us         6.16 us       112874
BM_DisplayListBuilderWithScaleAndTranslate/kRtree                      14.3 us         14.3 us        48725
BM_DisplayListBuilderWithScaleAndTranslate/kBoundsAndRtree             17.3 us         17.3 us        40446
BM_DisplayListBuilderWithPerspective/kDefault                          3.05 us         3.05 us       229715
BM_DisplayListBuilderWithPerspective/kBounds                           71.4 us         71.4 us         9727
BM_DisplayListBuilderWithPerspective/kRtree                            79.8 us         79.8 us         8660
BM_DisplayListBuilderWithPerspective/kBoundsAndRtree                    149 us          149 us         4670
BM_DisplayListBuilderWithClipRect/kDefault                             3.07 us         3.07 us       224331
BM_DisplayListBuilderWithClipRect/kBounds                              6.00 us         6.00 us       116364
BM_DisplayListBuilderWithClipRect/kRtree                               14.1 us         14.1 us        49528
BM_DisplayListBuilderWithClipRect/kBoundsAndRtree                      16.9 us         16.9 us        41146
BM_DisplayListBuilderWithSaveLayer/kDefault                            4.27 us         4.27 us       164400
BM_DisplayListBuilderWithSaveLayer/kBounds                             14.8 us         14.8 us        46948
BM_DisplayListBuilderWithSaveLayer/kRtree                              22.8 us         22.8 us        30887
BM_DisplayListBuilderWithSaveLayer/kBoundsAndRtree                     33.4 us         33.4 us        20971
BM_DisplayListBuilderWithSaveLayerAndImageFilter/kDefault              6.60 us         6.60 us       105184
BM_DisplayListBuilderWithSaveLayerAndImageFilter/kBounds               26.7 us         26.7 us        26133
BM_DisplayListBuilderWithSaveLayerAndImageFilter/kRtree                34.7 us         34.7 us        20112
BM_DisplayListBuilderWithSaveLayerAndImageFilter/kBoundsAndRtree       54.9 us         54.9 us        12636

main + this patch

Run on (10 X 2400 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB (x10)
  L1 Instruction 128 KiB (x10)
  L2 Unified 4096 KiB (x5)
Load Average: 8.53, 7.48, 5.59
-----------------------------------------------------------------------------------------------------------
Benchmark                                                                 Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------------
BM_DisplayListBuilderDefault/kDefault                                  5.04 us         5.03 us       138102
BM_DisplayListBuilderDefault/kBounds                                   5.05 us         5.05 us       137317
BM_DisplayListBuilderDefault/kRtree                                    13.7 us         13.6 us        50368
BM_DisplayListBuilderDefault/kBoundsAndRtree                           13.5 us         13.5 us        51664
BM_DisplayListBuilderWithScaleAndTranslate/kDefault                    5.59 us         5.59 us       125074
BM_DisplayListBuilderWithScaleAndTranslate/kBounds                     5.56 us         5.55 us       124942
BM_DisplayListBuilderWithScaleAndTranslate/kRtree                      13.9 us         13.9 us        50535
BM_DisplayListBuilderWithScaleAndTranslate/kBoundsAndRtree             13.9 us         13.9 us        50155
BM_DisplayListBuilderWithPerspective/kDefault                          71.7 us         71.6 us         9757
BM_DisplayListBuilderWithPerspective/kBounds                           71.7 us         71.7 us         9762
BM_DisplayListBuilderWithPerspective/kRtree                            80.2 us         80.2 us         8726
BM_DisplayListBuilderWithPerspective/kBoundsAndRtree                   80.1 us         80.1 us         8665
BM_DisplayListBuilderWithClipRect/kDefault                             5.08 us         5.07 us       136665
BM_DisplayListBuilderWithClipRect/kBounds                              5.08 us         5.08 us       137174
BM_DisplayListBuilderWithClipRect/kRtree                               13.4 us         13.4 us        52185
BM_DisplayListBuilderWithClipRect/kBoundsAndRtree                      13.5 us         13.5 us        52254
BM_DisplayListBuilderWithSaveLayer/kDefault                            7.95 us         7.94 us        87855
BM_DisplayListBuilderWithSaveLayer/kBounds                             7.96 us         7.95 us        87519
BM_DisplayListBuilderWithSaveLayer/kRtree                              16.2 us         16.2 us        42867
BM_DisplayListBuilderWithSaveLayer/kBoundsAndRtree                     16.2 us         16.2 us        42998
BM_DisplayListBuilderWithSaveLayerAndImageFilter/kDefault              25.4 us         25.4 us        27665
BM_DisplayListBuilderWithSaveLayerAndImageFilter/kBounds               25.5 us         25.5 us        27817
BM_DisplayListBuilderWithSaveLayerAndImageFilter/kRtree                33.4 us         33.4 us        21023
BM_DisplayListBuilderWithSaveLayerAndImageFilter/kBoundsAndRtree       33.3 us         33.2 us        21133

ColdPaleLight avatar Aug 08 '22 09:08 ColdPaleLight

@flar friendly ping

ColdPaleLight avatar Aug 15 '22 23:08 ColdPaleLight

@flar Do you have the cycles to take a look at this?

chinmaygarde avatar Aug 25 '22 20:08 chinmaygarde

We believe this PR is super close and thanks so much for working on it as well as the attached benchmark numbers. However, @flar is currently working on reworking parts of the layer tree and doesn't have the cycles to finish reviewing this for at least a couple of weeks. I think we should wait for his sign-off before proceeding. Thanks for understanding!

chinmaygarde avatar Sep 01 '22 21:09 chinmaygarde

From PR review triage: I believe @flar is still working on the project that @chinmaygarde noted above, but if I'm mistaken then this can be a reminder that this PR is waiting for review.

zanderso avatar Sep 29 '22 20:09 zanderso

Getting back to this as it looks like we might be getting closer to implementing RTree query-based culling in DL dispatch. The new code is faster for any case where we expect to call bounds or rtree or both. Since I believe that all cases where we construct a DL will eventually get the bounds of the DL this should be a win. It is interesting to note that the construction is usually handled on the Dart thread and the bounds may not be called until it is in a layer tree so this change will move some of the work done during the frame painting into the UI thread.

Here is a graph of the data in the recent comment. Note that all of the kDefault cases (every 4th bar starting with the 1st) which do not involve calling bounds or rtree are slower, but none of the bars that involve calling bounds or rtree reach 100% which means that they are faster.

Bounds_RTree_consolidation

flar avatar Dec 07 '22 00:12 flar

@flar I try to rebase the latest code, but I have a problem. This PR is based on the assumption that the method DisplayList::rtree() is only valid if the flag prepare_rtree is set to true in DisplayListBuilder. However, after the PR https://github.com/flutter/engine/pull/37451 landed, this assumption was broken.

Method DisplayListBuilder::drawDisplayList will be changed to the following in this PR. Note that when type is BoundsAccumulatorType::kRTree (that is, when prepare_rtree is true), the rtree() method of the input parameter display_list will be accessed. However, at this time we cannot guarantee that this access is legal since display_list may not generate rtree at it's building time.

void DisplayListBuilder::drawDisplayList(
    const sk_sp<DisplayList> display_list) {

  const SkRect bounds = display_list->bounds();
  switch (accumulator()->type()) {
    case BoundsAccumulatorType::kRect:
      AccumulateOpBounds(bounds, kDrawDisplayListFlags);
    case BoundsAccumulatorType::kRTree:
      std::list<SkRect> rects =
          display_list->rtree()->searchNonOverlappingDrawnRects(bounds);
      for (const SkRect& rect : rects) {
        // TODO (https://github.com/flutter/flutter/issues/114919): Attributes
        // are not necessarily `kDrawDisplayListFlags`.
        AccumulateOpBounds(rect, kDrawDisplayListFlags);
      }
  }

  Push<DrawDisplayListOp>(0, 1, display_list);
  ......
}

Any thoughts? Thanks!

ColdPaleLight avatar Dec 08 '22 06:12 ColdPaleLight

I would just accumulate a bounding box for a DL that didn't include RTree. I'm not sure if that is worth an FML_LOG(INFO) as there is nothing that the developer can do about this if we cross our rtree flags.

Also, your switch statement needs some break statements.

Or, is it worth being able to reverse engineer an RTree? That can be a future task. For now I think accumulating just a rect is fine as we should be mostly consistent about using RTree internally, right?

flar avatar Dec 08 '22 17:12 flar

@flar The cause of the matter is testOneOverlayAndTwoIntersectingOverlays scenario test fails with impeller turned on, please see https://github.com/flutter/flutter/issues/113251#issuecomment-1301161688 for detail.

The reason for the problem is that after impeller turn on, the display_list of method drawDisplayList is considered an Op, so the corresponding rect of the Op is the bounds of display_list. c.f. https://github.com/flutter/engine/blob/5b31f4f0d9b6973944aba742a70133d8078c23fe/flow/layers/display_list_layer.cc#L161-L167

The PR https://github.com/flutter/engine/pull/37451 solved the problem. However, this solution needs to calculate the rtree of the parameter display_list of method drawDisplayList.

In the current PR, the calculation of rtree depends on the prepare_rtree flag of DisplayListBuilder, which does not exist in the above scenario.

If we just accumulate a bounding box for a DL that didn't include RTree, testOneOverlayAndTwoIntersectingOverlays scenario test will still fail.

To sum up, the above issue is caused by the different processing of the DisplayListLayer by the impeller backend and the skia backend. The impeller implementation will call the drawDisplayList method and treat it as an OP, while the skia implementation will dispatch display_list.

I think, if we plan to continue to use drawDisplayList instead of dispatch displayList in the impeller backend, perhaps modifying testOneOverlayAndTwoIntersectingOverlays is an option.

ColdPaleLight avatar Dec 13 '22 07:12 ColdPaleLight

2 things...

When generating a DL from Dart, the old code used to supply the parameter to SkPicture/Recorder that asked it to generate an RTree, so hopefully when you add that flag, you are setting it in the code that sets up the builder in picture.cc/picture_recorder.cc

That code should probably say "if accumulator is RTree AND DL has an RTree, then do the individual rects, otherwise just do the bounds.

If the Builders are created via some other mechanism, then perhaps we need more locations that set the flag in the new scenario.

Or ... should we have the default be true?

flar avatar Dec 13 '22 09:12 flar

Look at the difference between the initialization of the PictureRecorder vs the DLBuilder/Recorder in this PR that removed the old SkP code. Look at the lines in picture_recorder.cc - the DL side of the if statement just created a basic DLRecorder, but the Sk side of it passed an rtree_factory_ in which induced the SkPicture to create an rtree by default.

flar avatar Dec 13 '22 09:12 flar