In future, picture cache tiles will support different sizes, depending
on the size of the content slice being cached, and how frequently
parts of the slice are changing.
Although currently unused, this patch adds support for specifying
multiple different tile sizes for the picture cache texture array.
Differential Revision: https://phabricator.services.mozilla.com/D34989
--HG--
extra : moz-landing-system : lando
In future, picture cache tiles will support different sizes, depending
on the size of the content slice being cached, and how frequently
parts of the slice are changing.
Although currently unused, this patch adds support for specifying
multiple different tile sizes for the picture cache texture array.
Differential Revision: https://phabricator.services.mozilla.com/D34989
--HG--
extra : moz-landing-system : lando
Fixes an edge case where splitting the top level primitive list
for picture caching may result in a mismatched push/pop clip
pair.
This is not a particularly efficient fix, but it's a rare enough
edge case for now that this fix will be good enough until we work
out the long term solution for the push/pop clip chain instances.
Differential Revision: https://phabricator.services.mozilla.com/D35139
--HG--
extra : moz-landing-system : lando
This patch fixes two issues with blob images + new picture caching.
1) The logic that determines a conservative set of visible tiles
for tiled / blob images was no longer correct. It was relying
on the bounds of a single tile to build the conservative rect.
Instead, take the overall primitive world bounds and derive a
conservative set of visible tiles from this.
2) The logic to detect if an image was dirty was incorrect, and
somewhat error prone. It now maintains a set of dirty images
that have been requested. The image key dependencies are then
checked during the tile cache post_update step.
Differential Revision: https://phabricator.services.mozilla.com/D35126
--HG--
extra : moz-landing-system : lando
There appears to be a driver bug on android 8 and older where it does
not render correctly.
Differential Revision: https://phabricator.services.mozilla.com/D34618
--HG--
extra : moz-landing-system : lando
When using an advanced blend equation, fragment shader output must be
marked with a matching layout qualifier. Not doing so was causing
subsequent glDraw* operations to fail.
This patch adds a new shader feature, WR_FEATURE_ADVANCED_BLEND, which
requires the necessary extension and adds the qualifier. Variants of
the brush_image shaders are created with this feature, and are used
whenever a brush_image shader is requested for BlendMode::Advanced.
Differential Revision: https://phabricator.services.mozilla.com/D34617
--HG--
extra : moz-landing-system : lando
This patch implements the majority of the planned picture caching
improvements. It supports most of the functionality required to
(as a follow up) support OS compositor integration. It also improves
on the robustness and functionality of the previous picture caching
implementation.
There are some expected temporary performance regressions in
some cases (such as content that is constantly invalidating) and
during initial page render when many render targets must be drawn
to. These performance regressions will be resolved in follow up
commits by supporting multi-resolution tiles.
The scene is split into a number of slices, determined by the scroll
root of each primitive, which can be found by the primitive's
spatial node indices. If a scene contains too many slices, then
picture caching is disabled on the page, to avoid excessive texture
memory usage, and rendering falls back to rasterizing each frame.
The specific changes in this patch are:
* Support tile caches for multiple scroll roots, allowing the
entire page (including fixed divs and the main UI bar) to be
cached in most cases, in addition to the main content.
* Remove requirement to read tiles back from the framebuffer.
Instead, they are drawn into the picture cache target tiles,
and blitted to the screen. This is slightly slower than the
existing picture caching when content is constantly changing,
however this cost will disappear / become irrelevant when
the OS compositor integration work is complete.
* Switch picture cache render targets to be nearest sampled (they
are always rendered 1:1) and support depth buffer targets.
* Make use of the external scroll offset support to allow removal
of the primitive correlation hacks in the previous picture
caching implementation. Also allows storing of primitive
dependencies in picture space rather than world space, which
reduces floating point inaccuracies.
* Determine if each tile and picture cache can be considered
opaque. This is used to determine whether subpixel AA text
rendering is available on a slice, and for rendering optimizations
related to disabling blending and/or tile clears.
* Use the clip chain instance results from the recent visibility pass
work to determine clip chain dependencies. This results in fewer
clip item dependencies in tiles, which is faster to check validity
and reduces redundant invalidations.
* Remove extra overhead during batching related to batch lists,
and region iteration, as they are no longer required.
* Support PrimitiveVisibilityMask during batching. This allows a
single traversal of a picture (surface) root during batching to
efficiently construct multiple alpha batcher objects (typically
one per invalida tile).
* Picture caching is now handled implicitly by WR, depending on
the content of the scene. There is no requirement for client
code to manually select which stacking context should be cached.
* Simplify how clip chain / transform dependencies are tracked by
picture cache tiles.
* Support pushing / popping enclosing clip chain roots without
the need for a stacking context / picture in some cases. This
simplifies the logic to split the scene into multiple slices.
The main remaining work in this area is (a) extend the code to
optionally provide each slice as an input to the OS compositor
rather than drawing the tiles in WR, and (b) support multi-resolution
tiles so that we reduce the draw call, batching and render target
overhead in cases where much of the page content is changing.
Differential Revision: https://phabricator.services.mozilla.com/D34319
--HG--
extra : moz-landing-system : lando
The tile cache is 352 bytes large and in the majority of cases picture primitives don't have one, so this saves a few KB of ram in typical pages reduces the likely hood of hitting OOM crashes while growing the primitives vector.
Differential Revision: https://phabricator.services.mozilla.com/D34346
--HG--
extra : moz-landing-system : lando
Force perspective interpolation of UV coordinates in clip shaders.
In addition to fixing the interpolation curve, also adds checks for the homogeneous coordinates to be outside of the meaningful hemisphere, forcing the clip shaders to output zeroes in those areas.
Differential Revision: https://phabricator.services.mozilla.com/D34017
--HG--
extra : moz-landing-system : lando
This is the result of `cargo +nightly fix --all-features --all-targets`
using a recent rust nightly.
Depends on D33781
Differential Revision: https://phabricator.services.mozilla.com/D33782
--HG--
extra : moz-landing-system : lando
Having `ItemRange` contain a byte slice means that the `display_list` and
`pipeline_id` don't need to be threaded through code that accesses items from
ItemRange.
Differential Revision: https://phabricator.services.mozilla.com/D32781
--HG--
extra : moz-landing-system : lando
This refactor is in preparation for P3.
When refactoring `next_raw()` to use `peek_from` instead of
`deserialize_in_place`, it became clear that `ItemRange` is holding a slice of
bytes from the incoming serialized display list and `peek_from` could be adapted
work directly on the byte slice.
It was also noticed that the `get()` interface was potentially unsafe; any
`ItemRange` can be passed into `get()` for any display list.
Differential Revision: https://phabricator.services.mozilla.com/D32780
--HG--
extra : moz-landing-system : lando
The AsyncScreenshotGrabber now can operate in two modes:
* `ProfilerScreenshots`, which does asynchronous scaling of the captured frames
for inclusion in profiles by the Gecko Profiler; and
* `CompositionRecorder`, which does not do any scaling and is used for visual
metrics computations.
The latter mode is exposed by on the `Renderer` via the `record_frame`,
`map_recorded_frame`, and `release_composition_recorder_structures` methods.
A different handle type (`RecordedFrameHandle`) is returned and consumed by
these functions, but they translate between `RecordedFrameHandle` and
`AsyncScreenshotHandle` when communicating with the underlying
`AsyncScreenshotGrabber`.
I considered making the `AsyncScreenshotGrabber` generic over its handle type,
but the extra cost of monomorphization just to change the handle type did not
seem worth it.
Differential Revision: https://phabricator.services.mozilla.com/D32232
--HG--
extra : moz-landing-system : lando
The AsyncScreenshotGrabber now can operate in two modes:
* `ProfilerScreenshots`, which does asynchronous scaling of the captured frames
for inclusion in profiles by the Gecko Profiler; and
* `CompositionRecorder`, which does not do any scaling and is used for visual
metrics computations.
The latter mode is exposed by on the `Renderer` via the `record_frame`,
`map_recorded_frame`, and `release_composition_recorder_structures` methods.
A different handle type (`RecordedFrameHandle`) is returned and consumed by
these functions, but they translate between `RecordedFrameHandle` and
`AsyncScreenshotHandle` when communicating with the underlying
`AsyncScreenshotGrabber`.
I considered making the `AsyncScreenshotGrabber` generic over its handle type,
but the extra cost of monomorphization just to change the handle type did not
seem worth it.
Differential Revision: https://phabricator.services.mozilla.com/D32232
--HG--
extra : moz-landing-system : lando
The AsyncScreenshotGrabber now can operate in two modes:
* `ProfilerScreenshots`, which does asynchronous scaling of the captured frames
for inclusion in profiles by the Gecko Profiler; and
* `CompositionRecorder`, which does not do any scaling and is used for visual
metrics computations.
The latter mode is exposed by on the `Renderer` via the `record_frame`,
`map_recorded_frame`, and `release_composition_recorder_structures` methods.
A different handle type (`RecordedFrameHandle`) is returned and consumed by
these functions, but they translate between `RecordedFrameHandle` and
`AsyncScreenshotHandle` when communicating with the underlying
`AsyncScreenshotGrabber`.
I considered making the `AsyncScreenshotGrabber` generic over its handle type,
but the extra cost of monomorphization just to change the handle type did not
seem worth it.
Differential Revision: https://phabricator.services.mozilla.com/D32232
--HG--
extra : moz-landing-system : lando
We've had a constant of 10 hard-coded there since early days.
Turning it into a configurable number allows us to easier tune it and
debug related issues.
Differential Revision: https://phabricator.services.mozilla.com/D32761
--HG--
extra : moz-landing-system : lando
This is the last big step towards consistent flattening of transformations.
It includes removing the old "project_to_2d" method from the utils.
Differential Revision: https://phabricator.services.mozilla.com/D32528
--HG--
extra : moz-landing-system : lando
During the visibility pass, the main clip chain instance for each
primitive is created. In the prim prepare pass, a clip chain instance
is generated for each segment (of primitives that are segmented).
This previously required maintaining the active clip chain stack
during both passes. However, this is not ideal for a number of
reasons: the code is somewhat complicated / error prone and the
segment clip chain building step does more work than required.
This patch changes the segment clip chain building code to set up
the active clip nodes based on the result of the initial clip
chain built for the overall primitive during the visibility pass.
This means that it's no longer necessary to maintain the active
clip chain stack during the prepare pass. This simplifies some
upcoming picture caching changes related to avoiding redundant
cache invalidations, which is the main motivation for the change.
Differential Revision: https://phabricator.services.mozilla.com/D32250
--HG--
extra : moz-landing-system : lando
This is a follow-up to https://phabricator.services.mozilla.com/D30600
Previously, I changed changed the space mapper logic to use the world transformations.
This was seemingly needed because we requrested the relation between primitives and
their clip nodes, which could be in unrelated spatial sub-trees.
However, I believe the change was a mistake, since for clips we should not even try
to get the relative mapping, and clipping is done in world space for these cases.
This change reverts that logic back. ~~Fingers crossed for the try to not show any
asserts firing up inside get_relative_transform.~~ Try is green 🎉
Differential Revision: https://phabricator.services.mozilla.com/D32382
--HG--
extra : moz-landing-system : lando
When pinch zooming webrender would re-rasterize glyphs for each tiny
difference in zoom level. This takes time in itself, but also causes
the texture cache to grow incredibly large, to the point where
resizing it to make room for more glyphs takes far too much time.
This patch avoids this by rounding the size at which glyphs are
rasterized whilst pinch zooming. To do this we add a FrameMsg which
APZ uses to tell webrender whether a spatial node is being pinch
zoomed. Then during frame building if a spatial node is being pinch
zoomed we override the raster space of its corresponding picture.
The chosen raster space is the current zoom level rounded up to the
nearest power of two, but not exceeding 8x. This seems to be a good
balance between quality and performance, though at high zoom levels
the cache still does grow very large due to the size of the glyphs.
Differential Revision: https://phabricator.services.mozilla.com/D30213
--HG--
extra : moz-landing-system : lando
This patch contains two isolated changes related to upcoming picture
caching improvements. Specifically:
* Determine the blit reason for stacking contexts with clips
earlier, during scene building. This simplifies the code and
allows better detection of redundant stacking contexts.
* Centralize the code for pushing batch instances into a small
number of places. This will simplify the switch to adding
a single primitive to multiple batch lists, in the case of
dirty regions with different batchers.
Differential Revision: https://phabricator.services.mozilla.com/D31898
--HG--
extra : moz-landing-system : lando