This patch refactors how clip chains are internally represented and used
during scene and frame building. The intent is to make clip processing
during frame building more efficient and consistent. Additionally, this
work enables follow-ups to cache the result of clip-chain builds between
frame and scene builds.
These changes will significantly reduce the cost of the visibility pass
for the common case when not much content has changed. In this patch,
the public API for clipping remains (mostly) the same, in order to allow
landing and stabilising this work without major changes to Gecko. However,
a longer term goal is to make the public WR clip API more closely match
the internal representation, to reduce work done during scene building.
Clips on a primitive can be categorized into two buckets. The first are
local clips that are specific to the primitive and move with it. These
could essentially be considered part of the definition of the primitive
itself. The second is a hierarchy of clips that apply to one or more
items, and may move independently of the primitive(s) they clip. These
clips are things like scroll regions, stacking context clips, iframe
clip regions etc. On (real world) pages, the clip hierarchy is typically
quite shallow, with a small number of clips that are shared by a large
number of primitives.
Finding clips that are shared between primitives is both required (for
things such as determining which picture cache slice a primitive can
be assigned to, while applying the shared clips during composition), and
also a potential optimization (processing shared clips only once and
caching this clip state for similar primitives).
The public clip-chain API has two complexities that make shared clips
difficult and time-consuming for WR to determine. First, it was possible to
express a clipping hierarchy both via the legacy clip parenting path
(via `ClipId` definitions) and also via clip-chains (the `parent`
field of a `ClipChain`). Second, clip-chains themselves can define
an arbitrary number and ordering of clips. Clips can also implicitly
apply to primitives via parent stacking contexts and iframes, but must
sometimes be removed (when an intermediate surface is created) for
performance reasons.
The new internal representation provided by this patch introduces a
`ClipTree` structure which is built during scene building by accumulating
the set of clips that apply to a primitive from all explicit and implicit
sources, and grafting this onto the existing clip-tree structure.
This provides WR a simple way to determine which clips are shared between
primitives (by checking ancestry) and reduces the size of the internal
representation (by sharing clips where possible rather than duplicating).
Interning is still used to identify parts of the clip-tree that define
the same clipping state.
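As a rough illustration of grafting with interning, the following is a minimal sketch, not WR's actual `ClipTree`/`ClipTreeBuilder` code. It assumes a simplified model in which each node stores a parent index and one interned clip handle; the names `ClipHandle`, `NodeIndex`, `graft` and `is_ancestor` are hypothetical. Interning the `(parent, clip)` pair means two primitives with the same clip ancestry end up pointing at the same node, and the shared-clip question reduces to an ancestry check.

```rust
use std::collections::HashMap;

/// Hypothetical handle to an interned clip (e.g. a rect or rounded-rect clip).
pub type ClipHandle = u32;
/// Index of a node in the tree. Index 0 is the root, which applies no clip.
pub type NodeIndex = usize;

#[derive(Debug)]
pub struct ClipTreeNode {
    pub parent: Option<NodeIndex>,
    pub clip: Option<ClipHandle>,
}

/// Simplified sketch: identical (parent, clip) pairs are interned, so
/// primitives with the same clip ancestry share nodes instead of duplicating them.
pub struct ClipTreeBuilder {
    nodes: Vec<ClipTreeNode>,
    interned: HashMap<(NodeIndex, ClipHandle), NodeIndex>,
}

impl ClipTreeBuilder {
    pub fn new() -> Self {
        ClipTreeBuilder {
            nodes: vec![ClipTreeNode { parent: None, clip: None }],
            interned: HashMap::new(),
        }
    }

    pub fn root(&self) -> NodeIndex {
        0
    }

    /// Graft a list of clips (outermost first) below `parent`, reusing any
    /// existing node that already represents the same clipping state.
    /// Returns the node a per-primitive leaf would point at.
    pub fn graft(&mut self, mut parent: NodeIndex, clips: &[ClipHandle]) -> NodeIndex {
        for &clip in clips {
            let next_index = self.nodes.len();
            let index = *self.interned.entry((parent, clip)).or_insert(next_index);
            if index == next_index {
                self.nodes.push(ClipTreeNode { parent: Some(parent), clip: Some(clip) });
            }
            parent = index;
        }
        parent
    }

    /// True if `ancestor` is `node` itself or lies on the path from `node` to the root.
    pub fn is_ancestor(&self, ancestor: NodeIndex, mut node: NodeIndex) -> bool {
        loop {
            if node == ancestor {
                return true;
            }
            match self.nodes[node].parent {
                Some(p) => node = p,
                None => return false,
            }
        }
    }

    pub fn node_count(&self) -> usize {
        self.nodes.len()
    }
}
```

For example, grafting `[1, 2]` and then `[1, 3]` from the root creates only four nodes (root, 1, 2, 3), because clip 1 is shared by both paths.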
Specific changes in this patch:
* Remove legacy `ClipId` style parenting support (in conjunction with
previous patches)
* Remove the public API ability to specify the clip on a primitive via
`ClipId` (it must now be a clip-chain)
* Remove `combined_local_clip_rect` from `PrimitiveInstance`, reducing
the size of the structure significantly
* Introduce `ClipTree` used during frame building, which is created by
`ClipTreeBuilder` during scene building
* Separate out the per-primitive clip concept (`ClipTreeLeaf`) from the clipping
hierarchy (`ClipTreeNode`). In future, more elements will be moved to
the `ClipTreeLeaf`, and the state of each `ClipTreeNode` will be cached
* Simplify the logic to disable / remove clips during frame building that
are applied by parent surface(s)
* Port hit-testing to be based on `ClipTree` which is simpler, faster and
also resolves some edge case correctness bugs
* Use a simpler and faster method to find shared clips during picture
cache slice assignment of primitives
* Update wrench to use the changed public clip-chain API definitions
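One way to picture a simpler shared-clip search for picture cache slice assignment is as a lowest-common-ancestor query over the clip tree: the deepest node that is an ancestor of every primitive's leaf node holds the clips they all share. This is a hedged sketch under that assumption, not necessarily the method the patch uses; the flat `parents` vector and the method names are hypothetical, and the tree is assumed to have a single root at index 0.

```rust
/// Hypothetical, simplified clip tree: `parents[i]` is the parent of node `i`,
/// and node 0 is the single root (no clip).
pub struct ClipTree {
    pub parents: Vec<Option<usize>>,
}

impl ClipTree {
    /// Number of edges between `node` and the root.
    fn depth(&self, mut node: usize) -> usize {
        let mut d = 0;
        while let Some(p) = self.parents[node] {
            node = p;
            d += 1;
        }
        d
    }

    /// Lowest common ancestor: the deepest clip state both nodes share.
    pub fn common_ancestor(&self, mut a: usize, mut b: usize) -> usize {
        let (mut da, mut db) = (self.depth(a), self.depth(b));
        // Walk the deeper node up until both are at the same depth.
        while da > db {
            a = self.parents[a].unwrap();
            da -= 1;
        }
        while db > da {
            b = self.parents[b].unwrap();
            db -= 1;
        }
        // Walk both up in lockstep until they meet (at worst, at the root).
        while a != b {
            a = self.parents[a].unwrap();
            b = self.parents[b].unwrap();
        }
        a
    }

    /// Clips shared by every primitive in a slice: fold the LCA over their leaves.
    pub fn shared_clip_node(&self, leaves: &[usize]) -> Option<usize> {
        let (&first, rest) = leaves.split_first()?;
        Some(rest.iter().fold(first, |acc, &leaf| self.common_ancestor(acc, leaf)))
    }
}
```

Everything on the path from the returned node up to the root is common to all of the given primitives, so in this model those clips could be applied once for the slice rather than per primitive.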
This patch already introduces some real-world optimizations (for example,
`displaylist_mutate` becomes 6% faster overall), but mostly sets things
up for follow-up patches to be able to cache clip-state between frames,
which should result in much larger wins.
Differential Revision: https://phabricator.services.mozilla.com/D151987
The patch from bug 1780321 relaxes shared surface allocation by allowing
surfaces to be shared even if they exist for >1 pass. However, it has
a logic bug: _non-shared_ surfaces that are created may then be
allocated from as a shared surface if the `free_after` matches. This
restores the `is_shared` logic that used to exist, which fixes this
edge case (and still allows the performance optimization in the cases
that were fixed by bug 1780321).
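The fix can be sketched as an extra condition in the shared-surface lookup. This is a minimal sketch assuming a simplified `Surface` record with `is_shared` and `free_after` fields; the struct and `find_shared_surface` are hypothetical names, not WR's actual allocator types.

```rust
/// Hypothetical, simplified model of a render target surface.
#[derive(Debug)]
pub struct Surface {
    pub is_shared: bool,
    /// Index of the last pass that uses this surface.
    pub free_after: usize,
}

/// Returns the index of an existing surface that a task with lifetime
/// `free_after` may be allocated from, or `None` if a new surface is needed.
pub fn find_shared_surface(surfaces: &[Surface], free_after: usize) -> Option<usize> {
    surfaces
        .iter()
        // Without the `is_shared` check, a dedicated (non-shared) surface whose
        // `free_after` happens to match would be picked up here by mistake.
        .position(|s| s.is_shared && s.free_after == free_after)
}
```

Matching on `free_after` alone is what allowed the bug; requiring both conditions keeps the multi-pass sharing from bug 1780321 while excluding dedicated surfaces.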
Differential Revision: https://phabricator.services.mozilla.com/D152707
BorrowSnapshot can be called by OffScreenCanvas in various places that may send
a SourceSurfaceWebgl to the main thread. If it did not originate from the main
thread, then this can cause multiple threads to use it. In general we want to
avoid this. For now, override BorrowSnapshot and make it always force a Skia
snapshot that can be safely shared between threads instead of SourceSurfaceWebgl.
Differential Revision: https://phabricator.services.mozilla.com/D152417
This prevents copies and removes the need for the workaround we currently
use to avoid them, which is nsDependent{C,}String.
Non-virtual actors can still use `nsString` if they need to on the
receiving end.
Differential Revision: https://phabricator.services.mozilla.com/D152519
Implement the new dominant axis locking mode for the apz.axis_lock.mode
preference. When using this mode, we do not use the traditional axis locks.
Instead, we only consider the input pan displacement for the axis with
the larger value, zeroing out the displacement on the opposite axis.
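The dominant-axis rule can be expressed as a small pure function. This sketch assumes "larger value" means larger absolute magnitude and that a tie keeps the horizontal axis; both are assumptions, and `dominant_axis_displacement` is a hypothetical name rather than the APZ function.

```rust
/// Hypothetical helper applying "dominant axis" locking to one pan displacement.
/// Only the axis with the larger absolute displacement is kept; the other is zeroed.
/// Ties keep the horizontal axis (an assumption of this sketch).
pub fn dominant_axis_displacement(dx: f32, dy: f32) -> (f32, f32) {
    if dx.abs() >= dy.abs() {
        (dx, 0.0)
    } else {
        (0.0, dy)
    }
}
```

Unlike a traditional axis lock, there is no persistent lock state here: each displacement is evaluated on its own.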
Differential Revision: https://phabricator.services.mozilla.com/D152104
With this patch, moz_container_wayland_surface_lock() always locks MozContainer and must be paired with moz_container_wayland_surface_unlock(), even if it fails and returns nullptr.
Split moz_container_wayland_add_initial_draw_callback() into two new functions:
- moz_container_wayland_add_initial_draw_callback_locked() is called on a locked container and only adds the draw callback.
It asserts if MozContainer is already ready to draw, as we don't expect that.
- moz_container_wayland_add_or_fire_initial_draw_callback() is called on an unlocked container, as it takes its own lock.
It behaves like the original moz_container_wayland_add_initial_draw_callback(), i.e. it stores the draw callback when MozContainer is not visible
and fires the draw callback when we're ready to draw.
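The split between the "locked" and "add or fire" variants can be modelled in Rust. This is a hedged sketch, not the widget/gtk C++ code: `Container`, `ContainerState` and the method names are hypothetical analogues. In Rust, a `MutexGuard` releases the lock on drop, so the lock/unlock pairing that the C API requires even on failure comes for free here. Firing callbacks after releasing the lock is a design choice of this sketch.

```rust
use std::sync::{Mutex, MutexGuard};

/// Hypothetical state guarded by the container lock.
#[derive(Default)]
pub struct ContainerState {
    pub ready_to_draw: bool,
    pub callbacks: Vec<Box<dyn FnOnce() + Send>>,
}

pub struct Container {
    state: Mutex<ContainerState>,
}

impl Container {
    pub fn new() -> Self {
        Container { state: Mutex::new(ContainerState::default()) }
    }

    /// The guard unlocks automatically when dropped.
    pub fn lock(&self) -> MutexGuard<'_, ContainerState> {
        self.state.lock().unwrap()
    }

    /// Analogue of the `_locked()` variant: the caller already holds the lock,
    /// and the callback is only stored, never fired.
    pub fn add_initial_draw_callback_locked(
        state: &mut ContainerState,
        cb: Box<dyn FnOnce() + Send>,
    ) {
        debug_assert!(!state.ready_to_draw, "not expected when already ready to draw");
        state.callbacks.push(cb);
    }

    /// Analogue of the `add_or_fire` variant: takes its own lock, then stores the
    /// callback, or fires it immediately (after unlocking) if already ready.
    pub fn add_or_fire_initial_draw_callback(&self, cb: Box<dyn FnOnce() + Send>) {
        let mut state = self.lock();
        if state.ready_to_draw {
            drop(state);
            cb();
        } else {
            state.callbacks.push(cb);
        }
    }

    /// Marks the container ready and fires stored callbacks outside the lock.
    pub fn set_ready_to_draw(&self) {
        let pending = {
            let mut state = self.lock();
            state.ready_to_draw = true;
            std::mem::take(&mut state.callbacks)
        };
        for cb in pending {
            cb();
        }
    }
}
```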
Differential Revision: https://phabricator.services.mozilla.com/D152276
A requirement of calling `get_relative_transform` is that the child
node is an ancestor of the reference node. To ensure this invariant
is met, we exclude non-ancestor scroll roots from consideration when
picking a scroll root for an atomic picture cache slice. However, this
can mean we select a non-optimal scroll root in some cases. The
`get_relative_transform` constraint only applies if the spatial nodes
are in different coordinate systems: if we know that the scroll roots
are in the same coordinate system, we can always calculate the correct
relative transform, regardless of ancestry of the nodes. We can rely on
this to relax the condition here, which means we select a more appropriate
scroll root, resulting in much less invalidation and rasterization work
in these cases.
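Why ancestry stops mattering within one coordinate system can be shown with a translation-only model. This is a hedged sketch: `SpatialNode` and `relative_offset` are hypothetical, and real WR coordinate systems can also carry a scale, which this sketch ignores. If both nodes are expressed as offsets within the same system, the relative transform is just the difference of those offsets, whichever node is the ancestor.

```rust
/// Hypothetical, simplified spatial node: each node belongs to a coordinate
/// system and has a 2D translation relative to that system's origin.
#[derive(Clone, Copy, Debug)]
pub struct SpatialNode {
    pub coord_system: usize,
    pub offset: (f32, f32),
}

/// Translation mapping points in `child`'s space into `reference`'s space.
/// Within one coordinate system no ancestry relationship is needed: a point p
/// in child space sits at p + child.offset in the shared system, which is
/// p + child.offset - reference.offset in reference space.
pub fn relative_offset(child: &SpatialNode, reference: &SpatialNode) -> Option<(f32, f32)> {
    if child.coord_system != reference.coord_system {
        // Different coordinate systems: a real implementation would walk the
        // spatial tree (which is where the ancestry requirement comes from).
        // This sketch simply declines.
        return None;
    }
    Some((
        child.offset.0 - reference.offset.0,
        child.offset.1 - reference.offset.1,
    ))
}
```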
Differential Revision: https://phabricator.services.mozilla.com/D152236
In the presence of complex effects such as backdrop-filter, it's
possible that some picture cache tiles can be drawn in a different
pass to other picture cache tiles. If there are a large number of
child render tasks that are shared between tiles assigned to different
render passes, that may result in a large number of standalone render
target allocations, which can hurt performance and reduce batching
efficiency.
This patch allows shared surfaces to be used when they have a lifetime
that spans more than one pass. We track the `free_after` in the active
shared surface list, and only allocate tasks from a shared surface if
their lifetime matches that of the other tasks in it. Existing logic
ensures that the surface is returned to the shared target pool only after
the `free_after` pass has occurred.
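Lifetime-matched sharing plus end-of-pass release can be sketched with a toy pool. This is a minimal model under stated assumptions: `SharedSurface`, `SurfacePool`, `allocate` and `end_pass` are hypothetical names, and a real allocator would also pack task rectangles within each surface, which is omitted here.

```rust
/// Hypothetical shared render target with a tracked lifetime.
#[derive(Debug)]
pub struct SharedSurface {
    pub id: u32,
    pub free_after: usize,
}

#[derive(Default)]
pub struct SurfacePool {
    pub active: Vec<SharedSurface>,
    pub free: Vec<u32>,
    next_id: u32,
}

impl SurfacePool {
    /// Allocate a task last used in pass `free_after`. A task only joins an
    /// active surface whose lifetime matches exactly; otherwise a surface is
    /// taken from the free pool, or a new one is created.
    pub fn allocate(&mut self, free_after: usize) -> u32 {
        if let Some(s) = self.active.iter().find(|s| s.free_after == free_after) {
            return s.id;
        }
        let id = self.free.pop().unwrap_or_else(|| {
            self.next_id += 1;
            self.next_id
        });
        self.active.push(SharedSurface { id, free_after });
        id
    }

    /// Called once `pass` has executed: surfaces whose lifetime ends at or
    /// before it are returned to the free pool.
    pub fn end_pass(&mut self, pass: usize) {
        let (done, still_active): (Vec<_>, Vec<_>) =
            self.active.drain(..).partition(|s| s.free_after <= pass);
        self.free.extend(done.into_iter().map(|s| s.id));
        self.active = still_active;
    }
}
```

Requiring an exact `free_after` match keeps every task on a surface freed at the same time, so a surface never has to outlive its tasks just because one of them lives longer.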
Differential Revision: https://phabricator.services.mozilla.com/D152235
There are 10 enum cases that we track internally, one of which isn't
emitted as telemetry. These cases are 0-indexed to match the enum values.
0 / NotVideo: Not used for telemetry. No video is showing.
1 / LowPower: We are showing exactly one video and we believe we are
hitting the video low power mode. We don't use "Success" because of a name
collision in the telemetry generation.
2 / FailMultipleVideo: There is more than one video visible.
3 / FailWindowed: The video is being viewed in windowed mode, not
fullscreen mode, so low power mode is not possible.
4 / FailOverlaid: The video has something on top of it (like captions).
5 / FailBacking: The layer directly underneath the video does not cover
the window or does not have a black background.
6 / FailMacOSVersion: The system is running a too-early version of macOS.
7 / FailPref: The user has disabled the
`gfx.core-animation.specialize-video` pref.
8 / FailSurface: The video is encoded in such a way we can't decode it to
a qualifying pixel format.
9 / FailEnqueue: The video didn't enqueue properly, and we fell back to a
non-video display path.
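The list above maps naturally onto an enum with explicit discriminants. In this sketch the type name `VideoLowPowerType` and the method `telemetry_value` are hypothetical Rust stand-ins; only the case names and values come from the list. The one untracked case returns `None`.

```rust
/// Sketch of the tracked cases; discriminants match the 0-indexed list above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u32)]
pub enum VideoLowPowerType {
    NotVideo = 0,
    LowPower = 1,
    FailMultipleVideo = 2,
    FailWindowed = 3,
    FailOverlaid = 4,
    FailBacking = 5,
    FailMacOSVersion = 6,
    FailPref = 7,
    FailSurface = 8,
    FailEnqueue = 9,
}

impl VideoLowPowerType {
    /// Value to record as telemetry, or `None` for the case that is not emitted.
    pub fn telemetry_value(self) -> Option<u32> {
        match self {
            VideoLowPowerType::NotVideo => None,
            other => Some(other as u32),
        }
    }
}
```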
Differential Revision: https://phabricator.services.mozilla.com/D129453
If MaybeRecordFrame is called after EndFrame, this means we are reading from
the back buffer state immediately after a call to SwapBuffers. The state of
the back buffer is undefined in that scenario, and in practice it mostly
contained old frames. We actually want to call MaybeRecordFrame before EndFrame, so we
get the valid contents of the back buffer before it is swapped out.
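The ordering issue can be illustrated with a toy double buffer. This is a hedged model, not the real GL/EGL behaviour: `DoubleBuffer` and `end_frame_with_recording` are hypothetical, and since real back-buffer contents are undefined after a swap, the model picks the "old frame" outcome the text describes by simply exchanging front and back.

```rust
/// Hypothetical double-buffered surface holding frame numbers.
pub struct DoubleBuffer {
    pub front: Option<u32>,
    pub back: Option<u32>,
}

impl DoubleBuffer {
    pub fn draw(&mut self, frame: u32) {
        self.back = Some(frame);
    }

    /// Models a swap: after it, the back buffer holds the previous (stale) frame.
    pub fn swap_buffers(&mut self) {
        std::mem::swap(&mut self.front, &mut self.back);
    }

    /// Reads the back buffer; only meaningful before the swap.
    pub fn read_back_buffer(&self) -> Option<u32> {
        self.back
    }
}

/// Frame end with recording placed before the swap.
pub fn end_frame_with_recording(buf: &mut DoubleBuffer, recorded: &mut Vec<u32>) {
    if let Some(frame) = buf.read_back_buffer() {
        recorded.push(frame); // MaybeRecordFrame analogue: captures the current frame
    }
    buf.swap_buffers(); // EndFrame analogue
}
```

Reversing the two calls in `end_frame_with_recording` would record the previous frame each time, which matches the stale output the original ordering produced.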
Differential Revision: https://phabricator.services.mozilla.com/D151988
No bugs have been reported related to "zero copy hardware decoded video" on Windows. Zero-copy video frames require "reuse decoder device", which is already enabled on Nightly / Early Beta by Bug 1773714.
RadeonBlockNoVideoCopy is renamed to RadeonBlockZeroVideoCopy.
Differential Revision: https://phabricator.services.mozilla.com/D152139