This greatly reduces the number of vector reallocations that happen while building primitive lists. On difficult cases like the YouTube front page the reduction is a bit more than 50%, and it is larger on the other pages I tested. More importantly, it dramatically reduces the number of the most expensive reallocations, those that happen once the vector is already large.
Differential Revision: https://phabricator.services.mozilla.com/D81725
Fix ODR violation: type of GLboolean must match gleam, else the unsafe{}
glue code will corrupt the parameters (and go out of bounds in
GetBooleanv). "Composite" was getting a garbage value for "flip".
Differential Revision: https://phabricator.services.mozilla.com/D81618
Previously, tile cache instances were destroyed and recreated
each time a new scene was created, as they were embedded inside
the picture primitives. An elaborate and complicated system was
used to retain important state (such as native surfaces, primitive
dependencies) across new scenes.
This patch moves the tile cache instances to be stored inside the
render backend. It removes the previous code for retaining state
for each tile cache. Instead, tile caches are created / reused /
destroyed during `new_async_scene_ready`.
This removes quite a bit of complexity. More importantly, it is
another step towards being able to cache and retain state such
as primitive tile assignments and visibility state across both
new frames and scenes.
Differential Revision: https://phabricator.services.mozilla.com/D81487
Guard against nullptr access of missing p.impl.
Also change LinkStatus so is_initialized is no longer true and calling
code can early out if bind_program fails.
Differential Revision: https://phabricator.services.mozilla.com/D81421
This is a partial step towards a larger change. The goal of this and the
follow up patches is to move the tile cache instances to be stored in
the render backend, rather than inside the picture / primitive tree.
This will allow better caching of dependency and visibility state
across both frame and scene builds for primitives. This has the potential
to significantly reduce or eliminate the amount of work we do per-frame
to track per-primitive visibility, clip-chain state and tile assignments.
A longer term goal is to allow correlating up-to-date tile caches with
pipeline display lists that haven't changed. This would allow WR to
skip scene building for content display lists that haven't changed, if
only the outer pipeline content has changed.
Differential Revision: https://phabricator.services.mozilla.com/D81284
In various parts of the picture and mask code, we were casting
the `clipped` rect to i32 (after rounding out). However, this
can cause overflow panics when the origin of the rect is too big.
Instead, treat the origin as f32 (which it was generally being
converted to anyway), and only cast the size part to be i32 as
required. This is safe since we know that the size has been
clipped to the visible screen, so will always be safe to cast
to i32.
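
As a rough illustration of the approach described above, here is a minimal sketch. `RectF`, `round_out` and `origin_and_task_size` are hypothetical names made up for this example, not the actual WebRender types. The origin stays in f32 and only the size, which is assumed to be already clipped to the screen, is converted to i32:

```rust
/// Hypothetical stand-in for a device-space rect (not the real WebRender type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct RectF {
    x: f32,
    y: f32,
    w: f32,
    h: f32,
}

/// Round the rect out to integer boundaries while staying in f32.
fn round_out(r: RectF) -> RectF {
    let x0 = r.x.floor();
    let y0 = r.y.floor();
    let x1 = (r.x + r.w).ceil();
    let y1 = (r.y + r.h).ceil();
    RectF { x: x0, y: y0, w: x1 - x0, h: y1 - y0 }
}

/// Return the origin as f32 and only the size as i32.
/// Assumption: the size was clipped to the visible screen beforehand,
/// so it always fits in an i32 even when the origin does not.
fn origin_and_task_size(clipped: RectF) -> ((f32, f32), (i32, i32)) {
    let r = round_out(clipped);
    ((r.x, r.y), (r.w as i32, r.h as i32))
}
```

With an origin of 3.0e9 (beyond the i32 range), the origin is passed through untouched and only the small, screen-sized extent is converted.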
Differential Revision: https://phabricator.services.mozilla.com/D80968
The code already explicitly checks for and handles the rect here
having a zero or invalid size, so there is no need to assert that
the rect size itself is valid.
Differential Revision: https://phabricator.services.mozilla.com/D81249
We detect empty scroll roots by checking the valid scrollable size
of a frame, in order to avoid attaching picture cache slices to
these redundant scroll frames.
However, under some fractional zoom scenarios, rounding CSS pixels
to device pixels can result in small rounding errors.
Apply the same epsilon check that Gecko uses in APZ code in order
to detect if a scroll frame is actually scrollable.
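
A minimal sketch of such an epsilon check follows. The function name and the tolerance value are assumptions for illustration only; this message does not state the value Gecko's APZ code uses:

```rust
/// Hypothetical tolerance in device pixels. The real value used by
/// Gecko's APZ code is not given here; this is a placeholder.
const SCROLL_EPSILON: f32 = 0.01;

/// Returns true if the scrollable size exceeds the tolerance on either axis.
/// Tiny non-zero sizes caused by CSS-to-device pixel rounding are treated
/// as "not scrollable".
fn is_scrollable(scrollable_width: f32, scrollable_height: f32) -> bool {
    scrollable_width > SCROLL_EPSILON || scrollable_height > SCROLL_EPSILON
}
```

Under these assumptions, a frame with a scrollable size of (0.004, 0.0) is treated as an empty scroll root, while (0.0, 12.5) is scrollable.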
Differential Revision: https://phabricator.services.mozilla.com/D80943
It would be wasteful to preallocate all batch builders because the majority of them have only a single batch, while typically only one will have many batches. Thankfully we can accurately guess which pictures will produce many batches by checking whether they have more than one cluster.
Differential Revision: https://phabricator.services.mozilla.com/D80469
Vector reallocations in CompositeState::push_surface take about 2% of total frame building time before this patch. There was an effort at preallocating some of them with constant values, but I suspect these constants haven't been updated along with the picture caching heuristics.
Differential Revision: https://phabricator.services.mozilla.com/D80195
This patch adds a simple utility to help with pre-allocating vectors that we can't recycle, and uses it with the primitive headers.
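
One way such a utility could look is sketched below. `SizeHint` and its methods are hypothetical names for this example and may differ from what the patch actually adds. The idea is to remember how large a vector grew during the previous build and reserve that much up front for the next one:

```rust
/// Hypothetical helper: remembers how large a vector grew last time.
struct SizeHint {
    size: usize,
}

impl SizeHint {
    fn new(initial: usize) -> Self {
        SizeHint { size: initial }
    }

    /// Record the final length of a vector once it has been built.
    fn record<T>(&mut self, v: &Vec<T>) {
        self.size = v.len();
    }

    /// Reserve enough capacity in a fresh vector to hold the recorded length.
    fn preallocate<T>(&self, v: &mut Vec<T>) {
        let len = v.len();
        if self.size > len {
            // `reserve` takes the number of additional elements beyond `len`.
            v.reserve(self.size - len);
        }
    }
}
```

A vector that cannot be recycled across frames can then be created with `Vec::new()`, passed to `preallocate`, and filled without intermediate reallocations as long as it stays within the previously recorded size.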
Differential Revision: https://phabricator.services.mozilla.com/D80194
For performance reasons in SWGL software compositors, to avoid unnecessary
full-screen copies of the framebuffer, we need to allow those compositors to
map their underlying widget surfaces and pass that buffer to SWGL so that they
can be directly rendered to. That also requires supporting custom strides, as
we can't always enforce the particular layout of the buffers handed off to us.
To that end, InitDefaultFramebuffer is generalized to take such information
and then many places where we rely on a specific hard-coded SWGL-calculated
stride have been altered to deal with a caller-supplied stride.
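
To illustrate what a caller-supplied stride means for pixel addressing, here is a small sketch. It is written in Rust for illustration only and does not mirror the SWGL code; `pixel_offset` and `fill_rect` are hypothetical names. Rows are located by `y * stride` instead of assuming that each row is exactly `width * bytes_per_pixel` bytes long:

```rust
/// Byte offset of pixel (x, y) in a buffer whose rows start `stride` bytes apart.
fn pixel_offset(x: usize, y: usize, stride: usize, bytes_per_pixel: usize) -> usize {
    y * stride + x * bytes_per_pixel
}

/// Fill a `width` x `height` region of a 4-byte-per-pixel buffer with `color`,
/// leaving any padding bytes at the end of each row untouched.
fn fill_rect(buf: &mut [u8], width: usize, height: usize, stride: usize, color: [u8; 4]) {
    const BPP: usize = 4;
    assert!(stride >= width * BPP, "stride must cover one full row");
    for y in 0..height {
        let start = pixel_offset(0, y, stride, BPP);
        let row = &mut buf[start..start + width * BPP];
        for px in row.chunks_exact_mut(BPP) {
            px.copy_from_slice(&color);
        }
    }
}
```

For a 2x2 surface with a 12-byte stride, each row holds 8 bytes of pixel data followed by 4 bytes of padding that the fill leaves alone.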
Differential Revision: https://phabricator.services.mozilla.com/D80267
The code to support collapsing a picture with a single primitive
and an opacity filter into a primitive + opacity binding is
no longer an important optimization, due to picture caching.
Removing this old optimization also reduces complexity during
scene building, and slightly simplifies batching and picture
cache dependency tracking.
Differential Revision: https://phabricator.services.mozilla.com/D79975