Collapse bounds calculations into DisplayListBuilder
fix https://github.com/flutter/flutter/issues/100172
This PR does not change the API and semantics, the tests already exist https://github.com/flutter/engine/blob/31afa38d95f5e67af4720af44052fbb873d240df/display_list/display_list_unittests.cc#L1104
Pre-launch Checklist
- [x] I read the [Contributor Guide] and followed the process outlined there for submitting PRs.
- [x] I read the [Tree Hygiene] wiki page, which explains my responsibilities.
- [x] I read and followed the [Flutter Style Guide] and the [C++, Objective-C, Java style guides].
- [x] I listed at least one issue that this PR fixes in the description above.
- [ ] I added new tests to check the change I am making or feature I am adding, or Hixie said the PR is test-exempt. See [testing the engine] for instructions on writing and running engine tests.
- [ ] I updated/added relevant documentation (doc comments with
///). - [x] I signed the [CLA].
- [x] All existing and new tests are passing.
It looks like this pull request may not have tests. Please make sure to add tests before merging. If you need an exemption to this rule, contact Hixie on the #hackers channel in Chat (don't just cc him here, he won't see it! He's on Discord!).
If you are not sure if you need tests, consider this rule of thumb: the purpose of a test is to make sure someone doesn't accidentally revert the fix. Ask yourself, is there anything in your PR that you feel it is important we not accidentally revert back to how it was before your fix?
Reviewers: Read the Tree Hygiene page and make sure this patch meets those guidelines before LGTMing.
I should clarify in the original issue. The concept here was not to just eliminate an extra Dispatch process, but to integrate the 2 mechanisms so that they could share information when needed. I've updated the original bug description to explain more of the inefficiencies that we want to avoid with this effort.
Unfortunately we're going to need the Calculator class for just a little while longer while we solve a pressing need to get more coverage for Impeller. The PlatformViews will need to move over to DisplayList and to do that they need to produce an RTree from a DisplayList.
To get that work done quickly I reused the Calculator class but factored out the BoundsAccumulator into Rect and RTree subclasses. By default, a DisplayList will only produce the regular bounds, but a caller can ask it to produce an RTree. I hope to have the PR submitted by tonight or tomorrow, but am running into problems with code sharing between flow and display_list that I'll likely solve by just cut/pasting code.
Once that is in we'll have to figure out how we can inline and make a good hybrid implementation. We could potentially just RTree every DisplayList, but I'd like to see the overhead of that. It would still plug in just as the current stuff does in this PR since the RTree is a small variant on the existing BoundsCalculator, but the question will be whether we try to do both in the inlining or not.
Another idea would be for the accumulator to save the bounds of each primitive as it goes and then making an RTree would be a simple act of handing the entire list to the RTree generator without having to dispatch. But that will be a slight change to how the existing BoundsAccumulator works anyway. I'll get to work on creating this RTree PR so you can see better what I'm talking about. The goal is to have PlatformViews working on Impeller within the next week or so, so it is fairly high priority work and hopefully doesn't really modify what has happened in this PR too much (the Calculator dispatch methods are largely untouched and both variants of Accumulator still support accumulate(rect)).
I tagged you in https://github.com/flutter/engine/pull/34809 so that we can plan the more forward on how to integrate bounds calculation while also considering the need for an RTree.
I tagged you in #34809 so that we can plan the more forward on how to integrate bounds calculation while also considering the need for an RTree.
Very excited to see RTree for DisplayList! ~About this PR, Maybe we can add a flag to the constructor of DisplayListBuilder to control whether RTree needs to be generated? If this flag is true, we will generate RTree while calculating bounds. If flag is false, we will only calculate bounds and will not generate RTree. WDYT?~
I thought about it again, I think a better solution is :
- Record the
rects_needed to generate theRTreethrough theRTreeBoundsAccumulatorin theDisplayListBuilder. - Then pass the ownership of the
rects_to theDisplayListin theDisplayListBuilder::Buildmethod. - Finally generate the
RTreebased onrects_when theDisplayList::rtree( )is called.
I tagged you in #34809 so that we can plan the more forward on how to integrate bounds calculation while also considering the need for an RTree.
Very excited to see RTree for DisplayList! ~About this PR, Maybe we can add a flag to the constructor of DisplayListBuilder to control whether RTree needs to be generated? If this flag is true, we will generate RTree while calculating bounds. If flag is false, we will only calculate bounds and will not generate RTree. WDYT?~
I thought about it again, I think a better solution is :
- Record the
rects_needed to generate theRTreethrough theRTreeBoundsAccumulatorin theDisplayListBuilder.- Then pass the ownership of the
rects_to theDisplayListin theDisplayListBuilder::Buildmethod.- Finally generate the
RTreebased onrects_when theDisplayList::rtree( )is called.
That's what I was thinking with my comment about saving the bounds in my earlier comment. The pros are:
- already done the work to generate the RTree when it is needed.
- we have the bounds of each primitive saved somewhere where we can consult them for fast(er) culling
Cons:
- memory usage of 16 bytes per rendering op
- we could dump the list of rects after we generate the RTree, if we could use the RTree for culling
- SkPicture was already generating RTrees when we switched to DisplayList, so we've already encountered the time and memory that would be used here
@flar
When I tried to integrate ClipBoundsDispatchHelper into DisplayListBuider, I ran into some problems. Can you enlighten me? Thanks you.
- If https://github.com/flutter/engine/pull/34835 landed, Then we will find that params
is_aainClipBoundsDispatchHelper::clipRectwill affect clip bounds, but it is ignored inDisplayListBuider::clipRect. Which one is correct? c.f. https://github.com/flutter/engine/blob/main/display_list/display_list_utils.cc#L216-L218
void ClipBoundsDispatchHelper::intersect(const SkRect& rect, bool is_aa) {
SkRect devClipBounds = matrix().mapRect(rect);
if (is_aa) {
devClipBounds.roundOut(&devClipBounds);
}
......
}
-
has_clip_ofClipBoundsDispatchHelperlooks like always betruesince paramcull_rectin constructor will not be nullptr. (BecauseDisplayList::bounds_cull_will always has value). So can we delete all the code which in branch thathas_clipis false? c.f. https://github.com/flutter/engine/blob/main/display_list/display_list_utils.h#L314
explicit ClipBoundsDispatchHelper(const SkRect* cull_rect)
: has_clip_(cull_rect),
bounds_(cull_rect && !cull_rect->isEmpty() ? *cull_rect
: SkRect::MakeEmpty()) {}
-
DisplayListBoundsCalculator::saveLayerwill affect clip bounds, butDisplayListBuider::saveLayerwon't. Should we makeDisplayListBuilder::saveLayeralso affect clip bounds? c.f. https://github.com/flutter/engine/blob/main/display_list/display_list_utils.cc#L454-L459
// Even though Skia claims that the bounds are only a hint, they actually
// use them as the temporary layer bounds during rendering the layer, so
// we set them as if a clip operation were performed.
if (bounds) {
clipRect(*bounds, SkClipOp::kIntersect, false);
}
Those are all excellent questions and the more I look at what is being done in both cases I realize that we might not have a good answer for either. Let me compose my thoughts a bit more before I try to come up with a good description of the concerns, but mainly the issues are:
- Some operations imply a specific rasterization anomaly that might affect the bounds
- For example, clipping is not applied geometrically, but is instead a pixel operation. Do we account for that pixelization in the bounds reported via either mechanism or both?
- The bounds of a saveLayer will not guarantee clipping, but in general that happens in the most common cases. They are a suggestion for the bounds of the virtual layer, but if we are under a transform, the bounds may transform into a diamond and drawing anywhere in the diamond will produce output. This means that you can't rely on them to clip out rendering outside of that local-space box, but it does have an impact on the bounds of the content. The clipRect call used in one of the cases is an attempt to deal with the fact that while you can't rely on every pixel outside the layer bounds to be eliminated, in general transformed pixels that lie outside its transformed bounds will not be rendered.
- What transform are we being rendered under?
- These operations are all being recorded. Once recorded they can be rendered in a context with any arbitrary transform.
- If we adjust bounds to match pixel expectations per the previous top-level bullet item, how big is "a pixel" really when we don't know the final transform?
- The calculations were an attempt to at least try to answer "what are the bounds in the destination coordinate space of this DL" but may not have gotten it right in all cases
- This isn't a theoretical issue. Every ui.Picture recorded by the framework is recorded as if it will be played back in an Identity coordinate system, but every scene starts with a global scale by the DPI of the destination. So, all ui.Picture DisplayLists within the scene will minimally be scaled by the DPI.
I haven't read the latest commits in detail, but it looks like you went the route of saving all of the rectangles during construction and then lazily turning them into an RTree when needed (without a second Dispatch cycle).
Did you measure the time that costs compared to:
- the old system of constructing the records without bounds
- the old system including lazily computing the bounds with a second dispatch cycle
- the itermediate system you had of computing the bounds within the builder cycle
It would be nice if "DisplayList + realized bounds (but not realized RTree object)" was similar in performance to the previous "DisplayList construction + lazy bounds calculation" and even better if it was even faster than that - though I don't imagine it would be as fast as the prior "construction without bounds" - but it would be nice to know what the performance difference was even though we almost always ask for the bounds.
One way to test this would be to capture some complicated real-app scenes using the built-in flattening code that is used to turn the entire tree into a single DL (with nested DLs), and then turn that into a benchmark that rebuilds it under the different scenarios above. Or, we could just come up with a synthetic scene, but it would be nice if it reflected a variety of real-life app scenes.
Note that having such a benchmark would be good for us moving forward so that we could measure the impact of each of our changes to DL. We've gone long enough based on the mysterious art of "I don't think what I just modified will cost us much time in DL construction" and it is time we started measuring these things more explicitly.
I could see a DL construction benchmark as either something driven from Dart or something driven from the native APIs directly. There are pros and cons for either:
Pros for Dart:
- Framework engineers could add benchmarks for things they may have had perf issues with
- It measures the native interface as well as the native DL code
- Would this be in the framework benchmarks or in the engine benchmarks? Would it trigger regression notifications on every engine commit or only when we roll into framework?
Pros for native:
- More focused on the native code that we are modifying a lot (the Dart interface never changes)
- Might have less variance with fewer mechanisms to the benchmark
- Would be more likely to trigger regression notifications closer to when the damage is done since it would be executed for every engine commit before rolling into the framework
Created https://github.com/flutter/flutter/issues/108265 to track what might turn into a suite of benchmarks for DL (for this PR it would be nice to at least have construction/bounds/rtree benchmarks).
One way to test this would be to capture some complicated real-app scenes using the built-in flattening code that is used to turn the entire tree into a single DL (with nested DLs), and then turn that into a benchmark that rebuilds it under the different scenarios above. Or, we could just come up with a synthetic scene, but it would be nice if it reflected a variety of real-life app scenes.
Creating a benchmark is indeed a great idea! But I'm a little confused about how to create a scene for testing. As far as I know, display list does not support serialization,https://github.com/flutter/flutter/issues/86715. So before implementing benchmarks, should we support serialization of display lists?
Another way I can find is that we can convert the layer tree to a skp format file through flutter screenshot --type=skia, and then generate a SkPicture from this file in the benchmark test and use it as the input of DisplayListBuilder.
One way to test this would be to capture some complicated real-app scenes using the built-in flattening code that is used to turn the entire tree into a single DL (with nested DLs), and then turn that into a benchmark that rebuilds it under the different scenarios above. Or, we could just come up with a synthetic scene, but it would be nice if it reflected a variety of real-life app scenes.
Creating a benchmark is indeed a great idea! But I'm a little confused about how to create a scene for testing. As far as I know, display list does not support serialization,flutter/flutter#86715. So before implementing benchmarks, should we support serialization of display lists?
Another way I can find is that we can convert the layer tree to a
skpformat file throughflutter screenshot --type=skia, and then generate a SkPicture from this file in the benchmark test and use it as the input of DisplayListBuilder.
Yes, serialization would make it very convenient. The problem with skp is that there are a lot of attributes we can't retrieve from an Sk stream. Shadows are out, most Filters are opaque, etc.
One way to test this would be to capture some complicated real-app scenes using the built-in flattening code that is used to turn the entire tree into a single DL (with nested DLs), and then turn that into a benchmark that rebuilds it under the different scenarios above. Or, we could just come up with a synthetic scene, but it would be nice if it reflected a variety of real-life app scenes.
Creating a benchmark is indeed a great idea! But I'm a little confused about how to create a scene for testing. As far as I know, display list does not support serialization,flutter/flutter#86715. So before implementing benchmarks, should we support serialization of display lists? Another way I can find is that we can convert the layer tree to a
skpformat file throughflutter screenshot --type=skia, and then generate a SkPicture from this file in the benchmark test and use it as the input of DisplayListBuilder.Yes, serialization would make it very convenient. The problem with skp is that there are a lot of attributes we can't retrieve from an Sk stream. Shadows are out, most Filters are opaque, etc.
Actually, the biggest bang for the buck would be to utilize the data currently in display_list_unittests.cc which has a big array of arrays of "every flavor of every DL call" in them - mainly used for testing equality, predictability of byte and op counts, etc. But, these arrays would also be great input for a benchmark - lots of DisplayList calls all set up without any overhead for parsing. Just start a timer, create a DL, fill it with every variant of every op, build it, and check the time.
@flar This PR is ready for review, please take a look when you have a chance :)
From PR triage: Marking WIP while evaluating performance impact.
If there is anything that could be done to reduce the cost of "construct + bounds" to be less than it is before this fix, that would be the golden ticket here as that is by far the most common case. It's actually pretty cool that we are so close even though the code is now keeping all of the rectangles around.
Also, should we include the size of the rectangles in the bytes() metric?
If there is anything that could be done to reduce the cost of "construct + bounds" to be less than it is before this fix, that would be the golden ticket here as that is by far the most common case. It's actually pretty cool that we are so close even though the code is now keeping all of the rectangles around.
This seems a little difficult because we additionally calculate the rects used to generate the rtree. Maybe we should change direction and make DisplayListBoundsCalculator an abstract class and let DisplayListBuilder inherit from it. And add a flag in the constructor of DisplayListBuilder to identify whether to generate rects at the same time when calculating bounds.
If there is anything that could be done to reduce the cost of "construct + bounds" to be less than it is before this fix, that would be the golden ticket here as that is by far the most common case. It's actually pretty cool that we are so close even though the code is now keeping all of the rectangles around.
This seems a little difficult because we additionally calculate the
rectsused to generate thertree. Maybe we should change direction and makeDisplayListBoundsCalculatoran abstract class and letDisplayListBuilderinherit from it. And add a flag in the constructor ofDisplayListBuilderto identify whether to generaterectsat the same time when calculating bounds.
I haven't looked closely to see if there is anything that could make it go faster, but you are right that maybe we want this to be controlled by a flag. SkPicture would only produce an RTree if you supplied the factory, so that was sort of like a flag to enable the feature.
I'm not sure how having DLB inherit from the Calculator class would be part of this solution, though. Wouldn't this be more like having 2 types of bounds accumulators - one that saves the rects and one that just joins them all?
@flar Add a flag to control whether the produce rtree does solve the performance problem.
I added a flag called need_produce_rtree which default value is false, if it is false then the bounds will be calculated using the RectBoundsAccumulator. And if it is true, the RTreeBoundsAccumulator will be used to calculate bounds and produce rtree. The RTreeBoundsAccumulator::bounds method is implemented as follows:
SkRect RTreeBoundsAccumulator::bounds() const {
FML_DCHECK(saved_offsets_.empty());
RectBoundsAccumulator accumulator;
for (auto& rect : rects_) {
accumulator.accumulate(rect);
}
return accumulator.bounds();
}
Here is the result of running testing/run_tests.py --type=benchmarks --variant=host_release
main
Run on (10 X 2400 MHz CPU s) CPU Caches: L1 Data 64 KiB (x10) L1 Instruction 128 KiB (x10) L2 Unified 4096 KiB (x5) Load Average: 2.90, 2.81, 4.73 ----------------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations ----------------------------------------------------------------------------------------------------------- BM_DisplayListBuilderDefault/kDefault 2.98 us 2.98 us 234046 BM_DisplayListBuilderDefault/kBounds 6.01 us 6.01 us 117202 BM_DisplayListBuilderDefault/kRtree 14.2 us 14.2 us 49206 BM_DisplayListBuilderDefault/kBoundsAndRtree 17.0 us 17.0 us 41048 BM_DisplayListBuilderWithScaleAndTranslate/kDefault 3.12 us 3.12 us 225619 BM_DisplayListBuilderWithScaleAndTranslate/kBounds 6.16 us 6.16 us 112874 BM_DisplayListBuilderWithScaleAndTranslate/kRtree 14.3 us 14.3 us 48725 BM_DisplayListBuilderWithScaleAndTranslate/kBoundsAndRtree 17.3 us 17.3 us 40446 BM_DisplayListBuilderWithPerspective/kDefault 3.05 us 3.05 us 229715 BM_DisplayListBuilderWithPerspective/kBounds 71.4 us 71.4 us 9727 BM_DisplayListBuilderWithPerspective/kRtree 79.8 us 79.8 us 8660 BM_DisplayListBuilderWithPerspective/kBoundsAndRtree 149 us 149 us 4670 BM_DisplayListBuilderWithClipRect/kDefault 3.07 us 3.07 us 224331 BM_DisplayListBuilderWithClipRect/kBounds 6.00 us 6.00 us 116364 BM_DisplayListBuilderWithClipRect/kRtree 14.1 us 14.1 us 49528 BM_DisplayListBuilderWithClipRect/kBoundsAndRtree 16.9 us 16.9 us 41146 BM_DisplayListBuilderWithSaveLayer/kDefault 4.27 us 4.27 us 164400 BM_DisplayListBuilderWithSaveLayer/kBounds 14.8 us 14.8 us 46948 BM_DisplayListBuilderWithSaveLayer/kRtree 22.8 us 22.8 us 30887 BM_DisplayListBuilderWithSaveLayer/kBoundsAndRtree 33.4 us 33.4 us 20971 BM_DisplayListBuilderWithSaveLayerAndImageFilter/kDefault 6.60 us 6.60 us 105184 BM_DisplayListBuilderWithSaveLayerAndImageFilter/kBounds 26.7 us 26.7 us 26133 BM_DisplayListBuilderWithSaveLayerAndImageFilter/kRtree 34.7 us 34.7 us 20112 BM_DisplayListBuilderWithSaveLayerAndImageFilter/kBoundsAndRtree 54.9 us 54.9 us 12636
main + this patch
Run on (10 X 2400 MHz CPU s) CPU Caches: L1 Data 64 KiB (x10) L1 Instruction 128 KiB (x10) L2 Unified 4096 KiB (x5) Load Average: 8.53, 7.48, 5.59 ----------------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations ----------------------------------------------------------------------------------------------------------- BM_DisplayListBuilderDefault/kDefault 5.04 us 5.03 us 138102 BM_DisplayListBuilderDefault/kBounds 5.05 us 5.05 us 137317 BM_DisplayListBuilderDefault/kRtree 13.7 us 13.6 us 50368 BM_DisplayListBuilderDefault/kBoundsAndRtree 13.5 us 13.5 us 51664 BM_DisplayListBuilderWithScaleAndTranslate/kDefault 5.59 us 5.59 us 125074 BM_DisplayListBuilderWithScaleAndTranslate/kBounds 5.56 us 5.55 us 124942 BM_DisplayListBuilderWithScaleAndTranslate/kRtree 13.9 us 13.9 us 50535 BM_DisplayListBuilderWithScaleAndTranslate/kBoundsAndRtree 13.9 us 13.9 us 50155 BM_DisplayListBuilderWithPerspective/kDefault 71.7 us 71.6 us 9757 BM_DisplayListBuilderWithPerspective/kBounds 71.7 us 71.7 us 9762 BM_DisplayListBuilderWithPerspective/kRtree 80.2 us 80.2 us 8726 BM_DisplayListBuilderWithPerspective/kBoundsAndRtree 80.1 us 80.1 us 8665 BM_DisplayListBuilderWithClipRect/kDefault 5.08 us 5.07 us 136665 BM_DisplayListBuilderWithClipRect/kBounds 5.08 us 5.08 us 137174 BM_DisplayListBuilderWithClipRect/kRtree 13.4 us 13.4 us 52185 BM_DisplayListBuilderWithClipRect/kBoundsAndRtree 13.5 us 13.5 us 52254 BM_DisplayListBuilderWithSaveLayer/kDefault 7.95 us 7.94 us 87855 BM_DisplayListBuilderWithSaveLayer/kBounds 7.96 us 7.95 us 87519 BM_DisplayListBuilderWithSaveLayer/kRtree 16.2 us 16.2 us 42867 BM_DisplayListBuilderWithSaveLayer/kBoundsAndRtree 16.2 us 16.2 us 42998 BM_DisplayListBuilderWithSaveLayerAndImageFilter/kDefault 25.4 us 25.4 us 27665 BM_DisplayListBuilderWithSaveLayerAndImageFilter/kBounds 25.5 us 25.5 us 27817 BM_DisplayListBuilderWithSaveLayerAndImageFilter/kRtree 33.4 us 33.4 us 21023 BM_DisplayListBuilderWithSaveLayerAndImageFilter/kBoundsAndRtree 33.3 us 33.2 us 21133
@flar friendly ping
@flar Do you have the cycles to take a look at this?
We believe this PR is super close and thanks so much for working on it as well as the attached benchmark numbers. However, @flar is currently working on reworking parts of the layer tree and doesn't have the cycles to finish reviewing this for at least a couple of weeks. I think we should wait for his sign-off before proceeding. Thanks for understanding!
From PR review triage: I believe @flar is still working on the project that @chinmaygarde noted above, but if I'm mistaken then this can be a reminder that this PR is waiting for review.
Getting back to this as it looks like we might be getting closer to implementing RTree query-based culling in DL dispatch. The new code is faster for any case where we expect to call bounds or rtree or both. Since I believe that all cases where we construct a DL will eventually get the bounds of the DL this should be a win. It is interesting to note that the construction is usually handled on the Dart thread and the bounds may not be called until it is in a layer tree so this change will move some of the work done during the frame painting into the UI thread.
Here is a graph of the data in the recent comment. Note that all of the kDefault cases (every 4th bar starting with the 1st) which do not involve calling bounds or rtree are slower, but none of the bars that involve calling bounds or rtree reach 100% which means that they are faster.

@flar I try to rebase the latest code, but I have a problem. This PR is based on the assumption that the method DisplayList::rtree() is only valid if the flag prepare_rtree is set to true in DisplayListBuilder. However, after the PR https://github.com/flutter/engine/pull/37451 landed, this assumption was broken.
Method DisplayListBuilder::drawDisplayList will be changed to the following in this PR. Note that when type is BoundsAccumulatorType::kRTree (that is, when prepare_rtree is true), the rtree() method of the input parameter display_list will be accessed. However, at this time we cannot guarantee that this access is legal since display_list may not generate rtree at it's building time.
void DisplayListBuilder::drawDisplayList(
const sk_sp<DisplayList> display_list) {
const SkRect bounds = display_list->bounds();
switch (accumulator()->type()) {
case BoundsAccumulatorType::kRect:
AccumulateOpBounds(bounds, kDrawDisplayListFlags);
case BoundsAccumulatorType::kRTree:
std::list<SkRect> rects =
display_list->rtree()->searchNonOverlappingDrawnRects(bounds);
for (const SkRect& rect : rects) {
// TODO (https://github.com/flutter/flutter/issues/114919): Attributes
// are not necessarily `kDrawDisplayListFlags`.
AccumulateOpBounds(rect, kDrawDisplayListFlags);
}
}
Push<DrawDisplayListOp>(0, 1, display_list);
......
}
Any thoughts? Thanks!
I would just accumulate a bounding box for a DL that didn't include RTree. I'm not sure if that is worth an FML_LOG(INFO) as there is nothing that the developer can do about this if we cross our rtree flags.
Also, your switch statement needs some break statements.
Or, is it worth being able to reverse engineer an RTree? That can be a future task. For now I think accumulating just a rect is fine as we should be mostly consistent about using RTree internally, right?
@flar
The cause of the matter is testOneOverlayAndTwoIntersectingOverlays scenario test fails with impeller turned on, please see https://github.com/flutter/flutter/issues/113251#issuecomment-1301161688 for detail.
The reason for the problem is that after impeller turn on, the display_list of method drawDisplayList is considered an Op, so the corresponding rect of the Op is the bounds of display_list.
c.f.
https://github.com/flutter/engine/blob/5b31f4f0d9b6973944aba742a70133d8078c23fe/flow/layers/display_list_layer.cc#L161-L167
The PR https://github.com/flutter/engine/pull/37451 solved the problem. However, this solution needs to calculate the rtree of the parameter display_list of method drawDisplayList.
In the current PR, the calculation of rtree depends on the prepare_rtree flag of DisplayListBuilder, which does not exist in the above scenario.
If we just accumulate a bounding box for a DL that didn't include RTree, testOneOverlayAndTwoIntersectingOverlays scenario test will still fail.
To sum up, the above issue is caused by the different processing of the DisplayListLayer by the impeller backend and the skia backend. The impeller implementation will call the drawDisplayList method and treat it as an OP, while the skia implementation will dispatch display_list.
I think, if we plan to continue to use drawDisplayList instead of dispatch displayList in the impeller backend, perhaps modifying testOneOverlayAndTwoIntersectingOverlays is an option.
2 things...
When generating a DL from Dart, the old code used to supply the parameter to SkPicture/Recorder that asked it to generate an RTree, so hopefully when you add that flag, you are setting it in the code that sets up the builder in picture.cc/picture_recorder.cc
That code should probably say "if accumulator is RTree AND DL has an RTree, then do the individual rects, otherwise just do the bounds.
If the Builders are created via some other mechanism, then perhaps we need more locations that set the flag in the new scenario.
Or ... should we have the default be true?
Look at the difference between the initialization of the PictureRecorder vs the DLBuilder/Recorder in this PR that removed the old SkP code. Look at the lines in picture_recorder.cc - the DL side of the if statement just created a basic DLRecorder, but the Sk side of it passed an rtree_factory_ in which induced the SkPicture to create an rtree by default.