The AsyncScreenshotGrabber now can operate in two modes:
* `ProfilerScreenshots`, which does asynchronous scaling of the captured frames
for inclusion in profiles by the Gecko Profiler; and
* `CompositionRecorder`, which does not do any scaling and is used for visual
metrics computations.
The latter mode is exposed by on the `Renderer` via the `record_frame`,
`map_recorded_frame`, and `release_composition_recorder_structures` methods.
A different handle type (`RecordedFrameHandle`) is returned and consumed by
these functions, but they translate between `RecordedFrameHandle` and
`AsyncScreenshotHandle` when communicating with the underlying
`AsyncScreenshotGrabber`.
I considered making the `AsyncScreenshotGrabber` generic over its handle type,
but the extra cost of monomorphization just to change the handle type did not
seem worth it.
Differential Revision: https://phabricator.services.mozilla.com/D32232
--HG--
extra : moz-landing-system : lando
The AsyncScreenshotGrabber now can operate in two modes:
* `ProfilerScreenshots`, which does asynchronous scaling of the captured frames
for inclusion in profiles by the Gecko Profiler; and
* `CompositionRecorder`, which does not do any scaling and is used for visual
metrics computations.
The latter mode is exposed by on the `Renderer` via the `record_frame`,
`map_recorded_frame`, and `release_composition_recorder_structures` methods.
A different handle type (`RecordedFrameHandle`) is returned and consumed by
these functions, but they translate between `RecordedFrameHandle` and
`AsyncScreenshotHandle` when communicating with the underlying
`AsyncScreenshotGrabber`.
I considered making the `AsyncScreenshotGrabber` generic over its handle type,
but the extra cost of monomorphization just to change the handle type did not
seem worth it.
Differential Revision: https://phabricator.services.mozilla.com/D32232
--HG--
extra : moz-landing-system : lando
The AsyncScreenshotGrabber now can operate in two modes:
* `ProfilerScreenshots`, which does asynchronous scaling of the captured frames
for inclusion in profiles by the Gecko Profiler; and
* `CompositionRecorder`, which does not do any scaling and is used for visual
metrics computations.
The latter mode is exposed by on the `Renderer` via the `record_frame`,
`map_recorded_frame`, and `release_composition_recorder_structures` methods.
A different handle type (`RecordedFrameHandle`) is returned and consumed by
these functions, but they translate between `RecordedFrameHandle` and
`AsyncScreenshotHandle` when communicating with the underlying
`AsyncScreenshotGrabber`.
I considered making the `AsyncScreenshotGrabber` generic over its handle type,
but the extra cost of monomorphization just to change the handle type did not
seem worth it.
Differential Revision: https://phabricator.services.mozilla.com/D32232
--HG--
extra : moz-landing-system : lando
We've had a constant of 10 hard-coded there since early days.
Turning it into a configurable number allows us to easier tune it and
debug related issues.
Differential Revision: https://phabricator.services.mozilla.com/D32761
--HG--
extra : moz-landing-system : lando
If ExternalImageType is just passed from C to rust, it caused crash on non-Windows platform. It was caused by stack corruption. Then &ExternalImageType is used instead of ExternalImageType to bypass the problem.
Differential Revision: https://phabricator.services.mozilla.com/D32436
--HG--
extra : moz-landing-system : lando
This is the last big step towards consistent flattening of transformations.
It includes removing the old "project_to_2d" method from the utils.
Differential Revision: https://phabricator.services.mozilla.com/D32528
--HG--
extra : moz-landing-system : lando
During the visibility pass, the main clip chain instance for each
primitive is created. In the prim prepare pass, a clip chain instance
is generated for each segment (of primitives that are segmented).
This previously required maintaining the active clip chain stack
during both passes. However, this is not ideal for a number of
reasons: the code is somewhat complicated / error prone and the
segment clip chain building step does more work than required.
This patch changes the segment clip chain building code to set up
the active clip nodes based on the result of the initial clip
chain built for the overall primitive during the visibility pass.
This means that it's no longer necessary to maintain the active
clip chain stack during the prepare pass. This simplifies some
upcoming picture caching changes related to avoiding redundant
cache invalidations, which is the main motivation for the change.
Differential Revision: https://phabricator.services.mozilla.com/D32250
--HG--
extra : moz-landing-system : lando
This is a follow-up to https://phabricator.services.mozilla.com/D30600
Previously, I changed changed the space mapper logic to use the world transformations.
This was seemingly needed because we requrested the relation between primitives and
their clip nodes, which could be in unrelated spatial sub-trees.
However, I believe the change was a mistake, since for clips we should not even try
to get the relative mapping, and clipping is done in world space for these cases.
This change reverts that logic back. ~~Fingers crossed for the try to not show any
asserts firing up inside get_relative_transform.~~ Try is green 🎉
Differential Revision: https://phabricator.services.mozilla.com/D32382
--HG--
extra : moz-landing-system : lando
These tests cause panics in debug mode because of the extra GL error
checking. Tests that are disabled are annotated with the failing
GL call.
Differential Revision: https://phabricator.services.mozilla.com/D32012
--HG--
extra : moz-landing-system : lando
This makes it so that when running reftests, wrench actually terminates
after a panic rather than just hanging. Termination is detectable and so
we can clean up properly instead of waiting until some other layer hits
a timeout.
Differential Revision: https://phabricator.services.mozilla.com/D32010
--HG--
extra : moz-landing-system : lando
When pinch zooming webrender would re-rasterize glyphs for each tiny
difference in zoom level. This takes time in itself, but also causes
the texture cache to grow incredibly large, to the point where
resizing it to make room for more glyphs takes far too much time.
This patch avoids this by rounding the size at which glyphs are
rasterized whilst pinch zooming. To do this we add a FrameMsg which
APZ uses to tell webrender whether a spatial node is being pinch
zoomed. Then during frame building if a spatial node is being pinch
zoomed we override the raster space of its corresponding picture.
The chosen raster space is the current zoom level rounded up to the
nearest power of two, but not exceeding 8x. This seems to be a good
balance between quality and performance, though at high zoom levels
the cache still does grow very large due to the size of the glyphs.
Differential Revision: https://phabricator.services.mozilla.com/D30213
--HG--
extra : moz-landing-system : lando
This patch contains two isolated changes related to upcoming picture
caching improvements. Specifically:
* Determine the blit reason for stacking contexts with clips
earlier, during scene building. This simplifies the code and
allows better detection of redundant stacking contexts.
* Centralize the code for pushing batch instances into a small
number of places. This will simplify the switch to adding
a single primitive to multiple batch lists, in the case of
dirty regions with different batchers.
Differential Revision: https://phabricator.services.mozilla.com/D31898
--HG--
extra : moz-landing-system : lando
This is part of the effort to get all the other versions of rand out.
Unfortunately the diff is kinda bug because this is the first crate
requiring rand 0.6 which has been split into multiple crates.
Differential Revision: https://phabricator.services.mozilla.com/D30744
--HG--
extra : moz-landing-system : lando
This is part of the effort to get all the other versions of rand out.
Unfortunately the diff is kinda bug because this is the first crate
requiring rand 0.6 which has been split into multiple crates.
Differential Revision: https://phabricator.services.mozilla.com/D30744
--HG--
extra : moz-landing-system : lando
This makes DrawTarget and ReadTarget no longer require a borrow
on a texture. This was previously fine, but in the near future
WR will be rendering picture caching surfaces directly into
texture handles. To allow this, we need to remove the borrow check
requirement on DrawTarget for rustc.
Differential Revision: https://phabricator.services.mozilla.com/D31390
--HG--
extra : moz-landing-system : lando
ColorMatrix is rarely used but takes most space in the Filter enum.
This removes 44 bytes from the enum and all structs that embed it.
Differential Revision: https://phabricator.services.mozilla.com/D30910
--HG--
extra : source : 07bf88839e1c35e678d11790e149dfda7f00cf30
extra : intermediate-source : bf2c0752c25f339a1179377dbb312249426609de
Rename the old overlapping corners testcase and add comments to make
the tests' purposes clearer:
* The existing one is testing that a corner is clipped correctly when
it overlaps with an adjacent corner.
* The new one is testing that corners and segments are clipped
correctly when opposite edges of the border overlap.
Depends on D30814
Differential Revision: https://phabricator.services.mozilla.com/D30815
--HG--
rename : gfx/wr/wrench/reftests/border/border-overlapping-ref.yaml => gfx/wr/wrench/reftests/border/border-overlapping-corner-ref.yaml
rename : gfx/wr/wrench/reftests/border/border-overlapping.yaml => gfx/wr/wrench/reftests/border/border-overlapping-corner.yaml
extra : moz-landing-system : lando
To fix bug 1496540 it was made so that webrender clips border corner
segments so that they do not overlap with their opposing
edges. However, cases where opposing _edges_ both overlap with
eachother (rather than just a corner overlapping with an edge), the
corners are clipped too far and a gap is left in the middle.
Additionally, no clipping was added to the edge segments. So rather
than there be a gap there is an area that is painted twice, which is
apparent if the colour is semi-transparent.
This fixes these issues by identifying when opposing edges overlap and
calculating the midpoint, then clipping the edges and corners to that
midpoint instead.
Differential Revision: https://phabricator.services.mozilla.com/D30814
--HG--
extra : moz-landing-system : lando
The implementation of `Device::map_pbo_for_readback` on GLES (e.g., Windows
with ANGLE) was using the incorrect enumeration value when attempting to map
the buffer into memory.
Differential Revision: https://phabricator.services.mozilla.com/D31156
--HG--
extra : moz-landing-system : lando
This moves the existing constants into a ReftestEnvironment which
encapsulates it a bit better. Also this fixes the incorrect "debug" cfg
check to "debug_assertions" which is more correct.
Differential Revision: https://phabricator.services.mozilla.com/D31181
--HG--
extra : moz-landing-system : lando
This moves the existing constants into a ReftestEnvironment which
encapsulates it a bit better. Also this fixes the incorrect "debug" cfg
check to "debug_assertions" which is more correct.
Differential Revision: https://phabricator.services.mozilla.com/D31181
--HG--
extra : moz-landing-system : lando
ColorMatrix is rarely used but takes most space in the Filter enum.
This removes 44 bytes from the enum and all structs that embed it.
Differential Revision: https://phabricator.services.mozilla.com/D30910
--HG--
extra : moz-landing-system : lando
sed -i 's/RenderTaskTree/RenderTaskGraph/g' gfx/wr/webrender/**/*.rs
sed -i 's/task tree/task graph/g' gfx/wr/webrender/**/*.rs
Differential Revision: https://phabricator.services.mozilla.com/D30897
--HG--
extra : moz-landing-system : lando
* Store render task address per-instance rather than per-primitive, to allow adding a single primitive to multiple batches / render tasks.
* Store render task id inside alpha batch builder, since multiple batch builders will be passed in future.
* Add primitive visibility mask, storing a bit mask of which dirty regions a visible primitive intersects.
* Store RenderTaskAddress as a u16 in CPU and shader types.
* Add picture caching debug flag to wrench.
Differential Revision: https://phabricator.services.mozilla.com/D30854
--HG--
extra : moz-landing-system : lando
This is part of the effort to get all the other versions of rand out.
Unfortunately the diff is kinda bug because this is the first crate
requiring rand 0.6 which has been split into multiple crates.
Differential Revision: https://phabricator.services.mozilla.com/D30744
--HG--
extra : moz-landing-system : lando
When the clip chain generates the local clip rect for a primitive, it
can be empty. This violated the assumption that the visible rect will
never be empty, and so when we snap, it produces NaNs, which in turn,
violates other assumptions when converting between spaces, and hence the
crash.
Now we cull the primitive from the visible list if the local clip rect
is empty, or if the visible rect is empty.
Differential Revision: https://phabricator.services.mozilla.com/D30659
This change makes get_relative_transform() to no longer rely on any flattening done before in the pipeline.
This makes it correct is some of the cases we failed previously (see ini files removed).
It now does flattening on every flat coordinate system it passes through, and it's used for SpaceMapper.
The old RelativeTransform is now replaced with CoordinateSpaceMapping, which reduces the zoo of our types :)
Differential Revision: https://phabricator.services.mozilla.com/D30600
--HG--
extra : moz-landing-system : lando
In some cases, Gecko supplies a display item with an image clip
mask where the image itself does not exist in the resource cache.
In these cases, WR would skip requesting the image, but would
still try to fetch the image info during batching, causing a panic.
This patch skips adding clip items to the batching pass if the
image mask does not exist.
It also adds support to wrench for a crash test when the image
mask is invalid.
Differential Revision: https://phabricator.services.mozilla.com/D30560
--HG--
extra : moz-landing-system : lando
This patch is scaffolding only - there shouldn't be any functional
changes as a result of these changes.
Differential Revision: https://phabricator.services.mozilla.com/D30330
--HG--
extra : moz-landing-system : lando
This moves creation of the render task graph to be slightly
earlier during frame building. Instead of creeating the render
tasks during prepare_prims, they are created during the
take_context call for pictures, before recursing into primitives.
This means that in future we can have a surface that targets
multiple render tasks (e.g. for multiple dirty regions), which
may create a different batch set for each region.
Differential Revision: https://phabricator.services.mozilla.com/D29994
--HG--
extra : moz-landing-system : lando
cargo fix works by building under a specific config, so we can't hit both
sides of a mutually exclusive pair of cfgs, such as cfg(target_os) or
when cfg(feature) and cfg(not(feature)) both exist in the codebase.
Differential Revision: https://phabricator.services.mozilla.com/D29568
--HG--
extra : moz-landing-system : lando
Notably `extern crate foo as bar` is no longer supported
(must do it in Cargo.toml). Also stopped using euclid through webrender_api,
because it produces worse results in 2018.
Differential Revision: https://phabricator.services.mozilla.com/D29566
--HG--
extra : moz-landing-system : lando
There's two ways we could get around this. We could add a path
around the prepared_for_frames assertion in GpuCache::end_frame,
or we can do this, and leave the TODO explicit with the
assertion. I took the latter approach because we can clear
the GpuCache / TextureCache through other routes than frame
building anyway, so we could hit this failure path before
any of the patches in this bug landed.
Differential Revision: https://phabricator.services.mozilla.com/D29884
--HG--
extra : moz-landing-system : lando
These aren't great names - I couldn't come up with better though.
We lose the symmetry of before/after but this clarifies a little
bit what they are doing.
Differential Revision: https://phabricator.services.mozilla.com/D29877
--HG--
extra : moz-landing-system : lando
If we run a GpuCache clear and do not update all documents, we run into
situations where the stale documents try to access things in the gpu
cache and just show random garbage. The solution is similar to what we
do for TextureCache, so I am just duplicating those signatures and providing
helpers in RenderBackend to access both together.
Differential Revision: https://phabricator.services.mozilla.com/D29860
--HG--
extra : moz-landing-system : lando
Due to the render task ping-pong target allocation scheme, we need to ensure:
- that tasks only read from tasks that are an odd number of passes apart,
- that render task content is kept valid long enough for all of the dependent tasks.
The former is solved in this patch through blit tasks, the latter by marking tasks for saving as needed.
Differential Revision: https://phabricator.services.mozilla.com/D29800
--HG--
extra : moz-landing-system : lando
If a document has an empty rect for whatever reason (in the case observed, this was
due to running a full screen video in the content document, which occludes the
chrome document), then we still want to ensure that we build a frame if the
texture cache has performed a clear, as otherwise we may try to access stale
items from the texture resolver in render_impl.
Differential Revision: https://phabricator.services.mozilla.com/D29848
--HG--
extra : moz-landing-system : lando
This allows us to blacklist certain configurations. Previously we could only
use platform(...) which was more of a whitelist.
Differential Revision: https://phabricator.services.mozilla.com/D29729
--HG--
extra : moz-landing-system : lando
This makes some minor tweaks:
- Use a /sdcard/wrench/ folder for the args and reftests. For me bundling
the reftests as assets didn't work (app would panic trying to load those
files). Plus it seems better to not always bundle the reftests with the app.
- Always dump a full backtrace on Android
- Build both x86 and armv7 architectures of wrench into the same APK for
better compatibility.
- Update documentation.
Differential Revision: https://phabricator.services.mozilla.com/D29728
--HG--
extra : moz-landing-system : lando
Historically we calculated the snapping offsets in the GPU shaders.
Because this information is always needed on the CPU side, we now just
pass the values into the shader instead of recalculating again. This
ensures we will use the same set of values consistently and makes it
easier to adjust how we snap in the future.
This patch should have no functional change on the output of WebRender
itself.
Differential Revision: https://phabricator.services.mozilla.com/D28883
We currently do most snapping on the GPU in the shader. However the
picture's local rect needs to take into account the snapping done there,
so we need to calculate this earlier in the pipeline. Instead of using
the clipped primitive local rects to create the picture's own local
rect, we now snap the child local rects first. If no snapping is
required, there should be no functional change. If snapping is required,
there should be fewer visual distortions caused by an inaccurate picture
local rect.
Differential Revision: https://phabricator.services.mozilla.com/D28882
We currently calculate a picture's local rect when we are doing the
first picture traversal. It was composed of the union of the clipped
local rects of its children. However the true local rect of a picture is
the union of the snapped clipped local rects of its children. The
snapping is done in device space, but we won't know the exact transform
until we establish the raster roots, which is based on the picture's
local rect.
As such, we create an estimated local rect which is how we currently
calculate the local rect. Then once the raster roots have been selected,
we recalculate the local rect of the picture based on its children
during update visibility.
This patch should have not contain any functional changes.
Differential Revision: https://phabricator.services.mozilla.com/D28881
This builds on earlier patches to move all remaining resource requests
to occur during the visibility pass, and change the block on glyph
rasterization to occur after the visibility pass.
This allows batch generation to occur during the prepare_prims pass,
which is useful for generating a batch set per-dirty-region, while
only doing a single pass of the picture / primitive tree.
Differential Revision: https://phabricator.services.mozilla.com/D29584
--HG--
extra : moz-landing-system : lando
Introduce a new texture allocation operation "reset", which acts like a "realloc" but without the contents preserved.
Use it for the picture texture cache.
Differential Revision: https://phabricator.services.mozilla.com/D29539
--HG--
extra : moz-landing-system : lando
In trying to diagnose bug 1538540, I'm hitting my limits as far as
simply staring at the code and trying to work out possible ways to
hit the crash goes. This assertion will split the search space into
clear-related causes and non-clear-related causes to narrow things
down.
Differential Revision: https://phabricator.services.mozilla.com/D29420
--HG--
extra : moz-landing-system : lando
Use natively supported mix-blend modes, where appropriate. Disabled by default.
Differential Revision: https://phabricator.services.mozilla.com/D26350
--HG--
extra : moz-landing-system : lando
This patch changes glyph requests in the resource cache to occur
as soon as a text run is found to be visible, rather than during
the prepare_prims pass.
This has two major benefits:
- (with other patches) will allow some batching code to run
during the prepare_prims pass.
- allows glyph raster worker threads to start earlier in the
frame, which may lead to less time blocking on the workers.
Differential Revision: https://phabricator.services.mozilla.com/D29458
--HG--
extra : moz-landing-system : lando
This is a first step towards allowing (some) batching work to be
done during prepare_prims pass rather than render pass building.
This is prep work related to output different batch lists for a given
picture (e.g. a different batch list per dirty region), rather than
replaying the same batch list.
Differential Revision: https://phabricator.services.mozilla.com/D29445
--HG--
extra : moz-landing-system : lando
In addition, batch together render tasks for the cached render tasks.
Differential Revision: https://phabricator.services.mozilla.com/D23839
--HG--
extra : source : 1b80e184b0bd6324ea62d1e38b2061da2b96a867
In trying to diagnose bug 1538540, I'm hitting my limits as far as
simply staring at the code and trying to work out possible ways to
hit the crash goes. This assertion will split the search space into
clear-related causes and non-clear-related causes to narrow things
down.
Differential Revision: https://phabricator.services.mozilla.com/D29420
--HG--
extra : moz-landing-system : lando
The Rgba8 enum value is redundant with the Standard(ImageFormat::RGBA8) value,
this patch collapses the former into the latter. Which then makes the entire
ReadPixelsFormat redundant, so we can get rid of it completely.
Differential Revision: https://phabricator.services.mozilla.com/D29059
--HG--
extra : moz-landing-system : lando
This changes our backface visibility semantics to a slightly more complex rule,
as described by Matt (and reinterpret by me in the context of WR) in bug 1525641 during our work week.
We are now propagating is_backface_visible to pictures and evaluate it in the context of
the local transform for a picture if it's outside of preserve-3d context.
We also refactor get_relative_transform() a bit.
Note: this fixes all of the existing backface-visibility bugs: 1525641, 1546110, 1546818
It also passes the Wrench tests, but the try push is still pending for surprises.
Differential Revision: https://phabricator.services.mozilla.com/D29009
--HG--
rename : gfx/wr/wrench/reftests/backface/backface-leaf.yaml => gfx/wr/wrench/reftests/backface/backface-3d-leaf.yaml
extra : moz-landing-system : lando
This patch is the output of rebuilding the debugger frontend with a
recent npm installation. The changes to dist/ and the addition of
package-lock.json are a result of the build. The changes to package.json
and main.js were done manually to work around an incompatible change
in beufy 0.6.7 (the lib/ folder changed to dist/).
Differential Revision: https://phabricator.services.mozilla.com/D28357
--HG--
extra : moz-landing-system : lando
This adds a RendererOption flag to control whether the debug server is
enabled. This allows the debug server feature to be built without
enabling the feature by default; it can then be enabled on a
per-renderer basis via the RendererOption.
Differential Revision: https://phabricator.services.mozilla.com/D28352
--HG--
extra : moz-landing-system : lando
The debugger in WebRender uses the image crate to generate PNGs, and so
it only really needs the png codec feature from the image crate.
Differential Revision: https://phabricator.services.mozilla.com/D28351
--HG--
extra : moz-landing-system : lando
The gist of the problem I introduced with the framebuffer coordinate system is that we provided the window rect (effectively) twice:
1. when computing the document rectangle (and Y-inverting it)
2. when rendering
If between these points the window got resized (during scene building), we end up with our document aligned to bottom left corner.
The user expects content to still be aligned to the top left, so that's what is visible as a bug.
The change here switched scene building to only care about device coordinate space, restraining the framebuffer coordinates to be mostly
an implementation detail of the renderer/device (the way it was originally meant to be, when introduced). This way the current window size
is only considered once during rendering.
Differential Revision: https://phabricator.services.mozilla.com/D28731
--HG--
extra : moz-landing-system : lando
This patch is the output of rebuilding the debugger frontend with a
recent npm installation. The changes to dist/ and the addition of
package-lock.json are a result of the build. The changes to package.json
and main.js were done manually to work around an incompatible change
in beufy 0.6.7 (the lib/ folder changed to dist/).
Differential Revision: https://phabricator.services.mozilla.com/D28357
--HG--
extra : moz-landing-system : lando
This adds a RendererOption flag to control whether the debug server is
enabled. This allows the debug server feature to be built without
enabling the feature by default; it can then be enabled on a
per-renderer basis via the RendererOption.
Differential Revision: https://phabricator.services.mozilla.com/D28352
--HG--
extra : moz-landing-system : lando
The debugger in WebRender uses the image crate to generate PNGs, and so
it only really needs the png codec feature from the image crate.
Differential Revision: https://phabricator.services.mozilla.com/D28351
--HG--
extra : moz-landing-system : lando
disclaimer: this isn't an *amazing* cleanup, but more of a major step that
unlocks the ability to do more minor cleanups and refinements. There's some
messy things and inconsistencies here and there, but we can hopefully iron
them out over time.
1. The primary change here is to move from
struct { common_fields, enum(specific_fields) }
to
enum (maybe_common_fields, specific_fields)
most notably this drops the common fields from a ton of things
that don't need them PopXXX, SetXXX, ClipChain, etc.
2. Additionally some types have had some redundant states shaved off,
for instance, rect no longer has *both* bounds and a clip_rect, as
the intersection of the two can be used. This was done a bit conservatively
as some adjustments will need to be done to the backend to fully eliminate
some states, and this can be done more incrementally.
2.5. As a minor side-effect of 2, we now early-reject some primitives whose
bounds and clip_rect are disjoint.
3. A HitTest display item has been added, which is just a Rect without
color. In addition to the minor space wins from this, this makes it much
easier to debug display lists
4. Adds a bunch of comments to the display list, making it easier to understand
things.
The end result of all these changes is a significantly smaller and easier to
understand display list. Especially on pages like gmail which have so many
clip chains. However this ultimately just makes text an even greater percentage
of pages (often 70-80%).
Differential Revision: https://phabricator.services.mozilla.com/D27439
--HG--
extra : moz-landing-system : lando
* make all enums repr(u8) (compiler bug blocking this long fixed)
* add display list stats feature
* remove cache markers (abandoned design)
* don't always push empty SetFilters before PushStackingContext
* remove dead pub methods
Differential Revision: https://phabricator.services.mozilla.com/D25845
--HG--
extra : moz-landing-system : lando
External scroll offsets are not propagated across reference frames.
When a perspective element scrolls relative to an ancestor scroll
frame, remove the effect of any external scroll offset during that
offset calculation.
Differential Revision: https://phabricator.services.mozilla.com/D28443
--HG--
extra : moz-landing-system : lando
disclaimer: this isn't an *amazing* cleanup, but more of a major step that
unlocks the ability to do more minor cleanups and refinements. There's some
messy things and inconsistencies here and there, but we can hopefully iron
them out over time.
1. The primary change here is to move from
struct { common_fields, enum(specific_fields) }
to
enum (maybe_common_fields, specific_fields)
most notably this drops the common fields from a ton of things
that don't need them PopXXX, SetXXX, ClipChain, etc.
2. Additionally some types have had some redundant states shaved off,
for instance, rect no longer has *both* bounds and a clip_rect, as
the intersection of the two can be used. This was done a bit conservatively
as some adjustments will need to be done to the backend to fully eliminate
some states, and this can be done more incrementally.
2.5. As a minor side-effect of 2, we now early-reject some primitives whose
bounds and clip_rect are disjoint.
3. A HitTest display item has been added, which is just a Rect without
color. In addition to the minor space wins from this, this makes it much
easier to debug display lists
4. Adds a bunch of comments to the display list, making it easier to understand
things.
The end result of all these changes is a significantly smaller and easier to
understand display list. Especially on pages like gmail which have so many
clip chains. However this ultimately just makes text an even greater percentage
of pages (often 70-80%).
Differential Revision: https://phabricator.services.mozilla.com/D27439
--HG--
extra : moz-landing-system : lando
* make all enums repr(u8) (compiler bug blocking this long fixed)
* add display list stats feature
* remove cache markers (abandoned design)
* don't always push empty SetFilters before PushStackingContext
* remove dead pub methods
Differential Revision: https://phabricator.services.mozilla.com/D25845
--HG--
extra : moz-landing-system : lando
Use the external scroll offsets provided by Gecko to:
(a) Offset primitives and clips by accumulated scroll offset.
(b) Adjust the scroll transforms and hit test results.
This allows primitives and clips to be stored in a true local space,
that is consistent between display lists, even if scrolling has
occurred. This is a step towards planned picture caching improvements.
Differential Revision: https://phabricator.services.mozilla.com/D27856
--HG--
extra : moz-landing-system : lando
We fail the assertion at the beginning of GpuCache::begin_frame
right now because we run the extract_updates call after building
frames for every document. This just moves the extract_updates
work to be per-document.
Differential Revision: https://phabricator.services.mozilla.com/D25251
--HG--
extra : moz-landing-system : lando
These are trivially different. I couldn't find a cause for the
difference, so I am just regenerating them.
Differential Revision: https://phabricator.services.mozilla.com/D27544
--HG--
extra : moz-landing-system : lando
... and ensure that, if we do cleanup, we generate frames for every document.
Differential Revision: https://phabricator.services.mozilla.com/D25133
--HG--
extra : moz-landing-system : lando
We discussed this a bit in Orlando. Essentially, we want to run cleanup
operations in texture_cache before all documents' frames, and then be
able to ensure that every document generates a frame, because otherwise
we will run into problems with evicted cache items used by non-updating-
but-still-rendering documents. Accordingly, we need an endpoint to
lump all of the transactions that generate frames together. This adds
that and builds out all of the plumbing necessary.
Differential Revision: https://phabricator.services.mozilla.com/D25132
--HG--
extra : moz-landing-system : lando
These are trivially different. I couldn't find a cause for the
difference, so I am just regenerating them.
Depends on D25134
Differential Revision: https://phabricator.services.mozilla.com/D27544
--HG--
extra : moz-landing-system : lando
... and ensure that, if we do cleanup, we generate frames for every document.
Differential Revision: https://phabricator.services.mozilla.com/D25133
--HG--
extra : moz-landing-system : lando
We discussed this a bit in Orlando. Essentially, we want to run cleanup
operations in texture_cache before all documents' frames, and then be
able to ensure that every document generates a frame, because otherwise
we will run into problems with evicted cache items used by non-updating-
but-still-rendering documents. Accordingly, we need an endpoint to
lump all of the transactions that generate frames together. This adds
that and builds out all of the plumbing necessary.
Differential Revision: https://phabricator.services.mozilla.com/D25132
--HG--
extra : moz-landing-system : lando
The local rect for border segments is not solely determined by
the widths and/or radius. Instead of determining the max scale
based on those parameters, use the calculated border segment
rects to determine an appropriate max scale factor.
Differential Revision: https://phabricator.services.mozilla.com/D27189
--HG--
extra : moz-landing-system : lando
The way that world content transform is calculated has some
inconsistencies related to transform flattening, compared to
the get_relative_transform implementation.
Reducing usage of this field will make it simpler to take
advantage of the external scroll offset, which is needed for
some of the planned picture caching improvements.
This patch removes the simple uses of world_content_transform,
but there are still a small number of more complicated uses that
need to be handled separately.
Differential Revision: https://phabricator.services.mozilla.com/D26651
--HG--
extra : moz-landing-system : lando
... and ensure that, if we do cleanup, we generate frames for every document.
Differential Revision: https://phabricator.services.mozilla.com/D25133
--HG--
extra : moz-landing-system : lando
We discussed this a bit in Orlando. Essentially, we want to run cleanup
operations in texture_cache before all documents' frames, and then be
able to ensure that every document generates a frame, because otherwise
we will run into problems with evicted cache items used by non-updating-
but-still-rendering documents. Accordingly, we need an endpoint to
lump all of the transactions that generate frames together. This adds
that and builds out all of the plumbing necessary.
Differential Revision: https://phabricator.services.mozilla.com/D25132
--HG--
extra : moz-landing-system : lando
Today we already support EXT_debug_marker for debug markers in
WebRender. This is useful to categorize GL API calls in tools such as
APITrace and RenderDoc. However not all drivers indicate support for
said extension, but instead support KHR_debug. This patch makes us
support both methods, preferring KHR_debug.
Differential Revision: https://phabricator.services.mozilla.com/D25787
You tell me if this is right, I have no Windows build available to test :)
Depends on D25602
Differential Revision: https://phabricator.services.mozilla.com/D25603
--HG--
extra : moz-landing-system : lando
The change contains a number of incremental improvements with the main goal of:
- allocating exactly as many tile as required by the app
- respecting the picture caching option
Differential Revision: https://phabricator.services.mozilla.com/D24740
--HG--
extra : moz-landing-system : lando
On some low end GPUs, clip mask rendering can be a significant
GPU cost. One way to reduce this is to support segment rendering
on more primitive types, to reduce the size of clip masks.
This patch adds support for pictures that have clip masks
to take part in segment rendering. This can significantly
reduce the number of fragments that must be drawn into a clip
mask for off-screen picture surfaces.
In future, WR can take advantage of segment rendering to use
clip mask shaders that handle only a single corner at a time.
This will be a further significant performance win to clip mask
render time.
Differential Revision: https://phabricator.services.mozilla.com/D24852
--HG--
extra : moz-landing-system : lando
The change contains a number of incremental improvements with the main goal of:
- allocating exactly as many tile as required by the app
- respecting the picture caching option
Differential Revision: https://phabricator.services.mozilla.com/D24740
--HG--
extra : moz-landing-system : lando
As discussed in IRC. AFAICT the ORTHO_NEAR|FAR_PLANE should match
up with the ranges of valid ZBufferIds, but please double-check
me.
Differential Revision: https://phabricator.services.mozilla.com/D23599
--HG--
extra : moz-landing-system : lando
This corrects A) An issue encountered with our strategy for skipping
the end_pass call for all but an offscreen render target. See the
comment above the end_pass call for details, and B) An issue with
depth clearing where we do not clear the whole rect if there are
multiple non-intersecting documents.
Differential Revision: https://phabricator.services.mozilla.com/D23056
--HG--
extra : moz-landing-system : lando
Just a little rename for clarity - I ended up scratching my head at
something for a little more time than I should have because I assumed
this was synchronous without looking at the implementation.
Differential Revision: https://phabricator.services.mozilla.com/D21584
--HG--
extra : moz-landing-system : lando
This is a large patch that contains all of the core changes for
renderroot splitting.
Differential Revision: https://phabricator.services.mozilla.com/D20701
--HG--
extra : moz-landing-system : lando
As discussed in IRC. AFAICT the ORTHO_NEAR|FAR_PLANE should match
up with the ranges of valid ZBufferIds, but please double-check
me.
Differential Revision: https://phabricator.services.mozilla.com/D23599
--HG--
extra : moz-landing-system : lando
This corrects A) An issue encountered with our strategy for skipping
the end_pass call for all but an offscreen render target. See the
comment above the end_pass call for details, and B) An issue with
depth clearing where we do not clear the whole rect if there are
multiple non-intersecting documents.
Differential Revision: https://phabricator.services.mozilla.com/D23056
--HG--
extra : moz-landing-system : lando
Just a little rename for clarity - I ended up scratching my head at
something for a little more time than I should have because I assumed
this was synchronous without looking at the implementation.
Differential Revision: https://phabricator.services.mozilla.com/D21584
--HG--
extra : moz-landing-system : lando
This is a large patch that contains all of the core changes for
renderroot splitting.
Differential Revision: https://phabricator.services.mozilla.com/D20701
--HG--
extra : moz-landing-system : lando
This patch shouldn't have any functional effect. It's motivated by
some changes needed to implement clip masks in pixel local storage,
but these are also general improvements that stand alone. Specifically:
- Remove clip_task_index from the global PrimitiveHeader struct.
In most cases, the clip task is supplied in the BrushInstance
structure, so it makes no sense to have this as a common field,
where it is generally unused. Instead, there is now an extra
'user data' field available in the PrimitiveHeader. Non-brush
shaders (text_run and split_composite) use that extra field to
store the clip task address, while the brush shaders gain an extra
(currently unused) user data field.
- In turn, this means there is no need to unconditionally try
and retrieve the first clip task address for a primitive
during batching. This was previously used to initialize the
PrimitiveHeader structure.
Differential Revision: https://phabricator.services.mozilla.com/D24312
--HG--
extra : moz-landing-system : lando
Add an experimental code path that makes use of the pixel local
storage extension available on many mobile GPUs.
This code path is currently disabled by default, as the support
is not complete for all primitives and blend modes. The initial
aim is to get feature parity with the existing renderer.
Once that's complete, we can take advantage of the (minimum)
12 bytes per pixel of high speed on-tile memory to store custom
data.
Clip masks are a good use case for this, since they map 1:1 with
the position of the fragment they are clipping. Using this for
clip masks allows us to handle clipping on mobile GPUs in a much
more efficient way - we can skip (a) separate render targets,
(b) target resolve (c) sample the mask texture during rendering.
Depends on D24123
Differential Revision: https://phabricator.services.mozilla.com/D24124
--HG--
extra : moz-landing-system : lando
Most rounded rect clips in real content are (a) axis aligned and
(b) have uniform radii.
When these conditions are met, we can run a fast path for clip
mask generation that uses significantly fewer ALU shader ops.
This is not typically a bottleneck on desktop GPUs, but can have
a large performance impact on mobile GPUs (and perhaps low end
integrated GPUs).
The Mali shader analyzer reports the slow path for the rounded
rect clip shader to be 94 cycles per fragment, while the fast
path is 10 cycles.
Differential Revision: https://phabricator.services.mozilla.com/D23817
--HG--
extra : moz-landing-system : lando
Most rounded rect clips in real content are (a) axis aligned and
(b) have uniform radii.
When these conditions are met, we can run a fast path for clip
mask generation that uses significantly fewer ALU shader ops.
This is not typically a bottleneck on desktop GPUs, but can have
a large performance impact on mobile GPUs (and perhaps low end
integrated GPUs).
The Mali shader analyzer reports the slow path for the rounded
rect clip shader to be 94 cycles per fragment, while the fast
path is 10 cycles.
Differential Revision: https://phabricator.services.mozilla.com/D23817
--HG--
extra : moz-landing-system : lando
In GLES 3 GLSL, the default precision for ints is highp (32 bit) in
vertex shaders, but only mediump (16 bit) in fragment shaders.
To render linear and radial gradients the fragment shader must fetch
the gradient stops from the gpu cache, using an address variable. This
variable is a 16 bit int, so if the stops data is located at
too high an address (row 32 or greater) then this value will have
overflown and we fetch from the wrong location. This was resulting in
garbage being drawn instead of the correct gradients.
To fix this, any address used in a fragment shader must be marked as
highp. This includes the varying input which supplies the address, and
the arguments to any functions used for the fetch.
Differential Revision: https://phabricator.services.mozilla.com/D23669
--HG--
extra : moz-landing-system : lando
The only time that the ancestor spatial node index is read is
during push_stacking_context. This means that even if it was
used as an ancestor for a 3d context, we can safely collapse
it in to the parent stacking context during flattening, if it
is otherwise redundant.
This is a partial fix for picture caching heuristics failing
with the display list produced on mobile devices.
Differential Revision: https://phabricator.services.mozilla.com/D23633
--HG--
extra : moz-landing-system : lando
The cleans up our WR use statements further, easying the merge conflicts.
Note: this PR is subject to instant rot, it is preferred to land quickly.
Differential Revision: https://phabricator.services.mozilla.com/D23373
--HG--
extra : moz-landing-system : lando
The cleans up our WR use statements further, easying the merge conflicts.
Note: this PR is subject to instant rot, it is preferred to land quickly.
Differential Revision: https://phabricator.services.mozilla.com/D23373
--HG--
extra : moz-landing-system : lando
As a general principle, we're trying to remove most usage of the
render task data that gets stored in a texture (bind_frame_data
is slow on many platforms, and it's somewhat inflexible with the
amount of data that can be provided).
This also lays some foundations for possibly drawing clip masks
with alternative techniques on mobile / tiled GPU architectures,
where we may not need / want a render task for clip masks.
Differential Revision: https://phabricator.services.mozilla.com/D23271
--HG--
extra : moz-landing-system : lando
By Bug 1526213, WebRenderBridgeParent::RecvEmptyTransaction() does not handle a case that resource update is handled by WebRenderTextureHostWrapper. In this case, txn.IsResourceUpdatesEmpty() became true and the function thought there was no resource update and the function returned DidComposite soon to client side. Then it caused a heavy over production of SharedSurface_ANGLEShareHandle if GPU is not powerful.
Differential Revision: https://phabricator.services.mozilla.com/D22454
Just a refactor without any semantical changes.
Makes it for cleaner client code. Also prepares for caching of the world transforms.
Differential Revision: https://phabricator.services.mozilla.com/D23015
--HG--
extra : moz-landing-system : lando
Establishes a clear link of TransformData with the shader.
Associates a vertex data type permanently, which makes the code cleaner and allow finding the relevant bits quickly with search.
Differential Revision: https://phabricator.services.mozilla.com/D23006
--HG--
extra : moz-landing-system : lando
Env logger is useful for:
1. seeing error! generated by the code (by default). Sometimes it can give us hints on what is wrong with the capture, or a test right away.
2. seeing processed shader compile/link errors, which is very useful when changing them
Given that wrench is the main debugging tool at the moment, enabling the logging output by default should increase our productivity, at the cost of slightly longer build times.
Differential Revision: https://phabricator.services.mozilla.com/D23000
--HG--
extra : moz-landing-system : lando
By Bug 1526213, WebRenderBridgeParent::RecvEmptyTransaction() does not handle a case that resource update is handled by WebRenderTextureHostWrapper. In this case, txn.IsResourceUpdatesEmpty() became true and the function thought there was no resource update and the function returned DidComposite soon to client side. Then it caused a heavy over production of SharedSurface_ANGLEShareHandle if GPU is not powerful.
Differential Revision: https://phabricator.services.mozilla.com/D22894
--HG--
extra : moz-landing-system : lando
Recently, semantics for clips on stacking contexts were changed
such that primitives inherit the clip chain from all enclosing
stacking contexts. However, the hit testing code was not updated
to handle this change.
As each hit testing primitive is added, the current stack of
active stacking contexts is now scanned. Any valid clip chain
roots from the primitive and/or the stacking context stack are
added to the list of clip chain roots for this hit testing
primitive.
This patch also applies some optimizations and other cleanups
of the hit-testing code, specifically:
- Instead of cloning the hit testing runs Vec every time a
frame is built, store these in the new HitTestingScene. The
HitTestingScene is built once per scene, and then shared by
any hit tester instances via an Arc. This reduces a lot of
memory allocations and copying during scrolling.
- When creating a new HitTestingScene, pre-allocate the size
of the arrays, based on the size of the previous hit testing
structure. This works similarly to how arrays are sized
in the PrimitiveStore.
- Pre-calculate and cache a number of inverse transform matrices
that were previously being calculated for each hit testing run.
- Store hit testing primitives in a flat array, instead of runs,
since there is no longer a single clip chain id per primitive.
- Fix an apparent (?) bug in the existing hit testing code, where
clipping out a single hit test primitive would break out of the
loop for the current run of hit test items.
Differential Revision: https://phabricator.services.mozilla.com/D22635
--HG--
extra : moz-landing-system : lando
This is the SDK version we use to build other Android stuff in
mozilla-central. It's simpler if we build Wrench using the same
target since we can reuse the toolchains.
Depends on D22376
Differential Revision: https://phabricator.services.mozilla.com/D22377
--HG--
extra : moz-landing-system : lando
The existing linear gradient shader is quite slow, especially
on very large gradients on integrated GPUs.
The vast majority of gradients in real content are very simple
(typically < 4 stops, no angle, no repeat). For these, we can
run a fast path to persist a small gradient in the texture cache
and draw the gradient via the image shader.
This is _much_ faster than the catch-all gradient shader, and also
results in better batching while drawing the main scene.
In future, we can expand the fast path to handle more cases, such
as angled gradients. For now, it takes a conservative approach,
but still seems to hit the fast path on most real content.
Differential Revision: https://phabricator.services.mozilla.com/D22445
--HG--
extra : moz-landing-system : lando
There is a bug on Adreno GPUs where glBlitFramebuffers always writes
to the 0th layer of a texture array, regardless of which layer is
actually attached to the draw framebuffer.
With picture caching enabled on webrender, the cached pictures were
all being drawn to the 0th layer of the picture cache texture array,
leaving the other layers blank. This was resulting in the wrong
content being drawn on one tile of the screen, and the rest being
black.
This works around this by blitting to an intermediate renderbuffer,
then using glCopyTexSubImage3D to copy from the renderbuffer to the
correct texture layer.
Differential Revision: https://phabricator.services.mozilla.com/D22305
--HG--
extra : moz-landing-system : lando
Same for blit_render_target_invert_y(). Make them wrappers around the
private function blit_render_target_impl(), which uses the currently
bound read and draw targets as before.
Differential Revision: https://phabricator.services.mozilla.com/D22304
--HG--
extra : moz-landing-system : lando
We can get back the fancy flag syntax as soon as we get C++17 inline variables,
which I sent an email to dev-platform@ about, with no reply.
Differential Revision: https://phabricator.services.mozilla.com/D22382
--HG--
extra : moz-landing-system : lando
Remove the intern_types module in favor of the associated Internable types that we already have.
The only bit of magic I had to do is around serialization bounds, and it's nicely isolated.
Differential Revision: https://phabricator.services.mozilla.com/D21797
--HG--
extra : moz-landing-system : lando
A picture can have its own unique clip which was not considered by its
children when calculating their visible rects. As a result, if a picture
clips its primitive rect for snapping purposes, it may produce a
different snapping offset than expected. We should instead just snap to
the primitive rect itself for the picture, since it is the union of the
visible rects that we snapped to for the children.
Differential Revision: https://phabricator.services.mozilla.com/D21778
Remove the intern_types module in favor of the associated Internable types that we already have.
The only bit of magic I had to do is around serialization bounds, and it's nicely isolated.
Differential Revision: https://phabricator.services.mozilla.com/D21797
--HG--
extra : moz-landing-system : lando
The goal of this change was to simplify the semantics of our document placement and split the logical elements inside (display list) from the actual screen rectangle occupied by a document.
To achieve that, we introduce the framebuffer space for things Y-flipped on screen.
We fix the frame outputs, so that they get produced on the first frame without loopback from the frame building to scene building.
Differential Revision: https://phabricator.services.mozilla.com/D21641
--HG--
extra : moz-landing-system : lando
example error
ERROR 2019-03-01T15:23:27Z: webrender::device::gl: (error) GL_INVALID_ENUM error generated. Invalid primitive mode.
thread 'main' panicked at 'Caught GL error 500 at 'draw_elements_instanced'', webrender/src/device/gl.rs:1098:17
note: Run with `RUST_BACKTRACE=1` for a backtrace.
Differential Revision: https://phabricator.services.mozilla.com/D21701
--HG--
extra : moz-landing-system : lando
The goal of this change was to simplify the semantics of our document placement and split the logical elements inside (display list) from the actual screen rectangle occupied by a document.
To achieve that, we introduce the framebuffer space for things Y-flipped on screen.
We fix the frame outputs, so that they get produced on the first frame without loopback from the frame building to scene building.
Differential Revision: https://phabricator.services.mozilla.com/D21641
--HG--
extra : moz-landing-system : lando
This patch fixes some wasted GPU time on mobile devices due to
redundant resolve / copy steps.
In the first case, we would previously do:
- Global clear of color / depth on main framebuffer.
- Bind and draw off-screen targets.
- Bind main framebuffer and draw scene.
Between step 1 and 2, a resolve step is triggered on tiled GPU
drivers, wasting a lot of GPU time. To fix this, the clear is
now deferred until the framebuffer of the first document is
drawn. This does slightly change the semantics of how WR does
clear operations, but I think it works fine and makes more sense.
In the second case, we would previously do:
- ...
- Draw main framebuffer
- End frame and invalidate the contents of input textures.
- Bind main framebuffer and draw debug overlay.
This also introduces an extra resolve / copy step, even if the
debug overlay is not enabled. To fix this, the invalidation step
of the input textures to the main framebuffer pass is deferred
until all drawing is complete on the main framebuffer, by doing
the invalidation in the end_frame() call of the texture resolver.
Together, these save a very significant amount of ms per frame
in GPU time on the mobile devices I tested.
Differential Revision: https://phabricator.services.mozilla.com/D21490
--HG--
extra : moz-landing-system : lando
This is a stab at what the correct approach to this should be. It
seems that we should be using the window size here and not the
screen_rect, as the screen_rect is not used to offset what we
normally draw, but instead generally for scissoring(?). The end
result is if we use an offset screen_rect, we end up applying
the offset of the chrome area twice, once because the document's
screen rect is offset, and once because of the tile.world_rect
offset.
Depends on D20696
Differential Revision: https://phabricator.services.mozilla.com/D20698
--HG--
extra : moz-landing-system : lando
Change the external scroll offset to be a vector, rather than a
point. This can also be updated gecko-side in future, but for
now is converted to a vector at the API boundary.
Also plumb through the external scroll offset so that it is stored
inside the ScrollFrameInfo in a spatial node. This will allow
modifying the transforms that the clip-scroll tree creates to
take into account the external scroll offset in future.
Differential Revision: https://phabricator.services.mozilla.com/D21319
--HG--
extra : moz-landing-system : lando
This doesn't introduce any functional changes. However, it refactors
the way that stacking context coords are converted into reference
frame relative coordinates.
Having a single method to retrieve the current offset will make it
easier to take advantage of the newly added API that allows Gecko
to supply initial scroll offsets for scroll nodes. In turn, this
will allow WR to normalize the local coordinates of primitives, which
will allow future improvements and simplifications to the picture
caching implementation.
Differential Revision: https://phabricator.services.mozilla.com/D21090
--HG--
extra : moz-landing-system : lando
A number of small tweaks to enable the picture caching invalidation
tests. With this in place, we can start adding more test coverage
for various invalidation scenarios.
- Make the reference image render after the test images, so that dirty
region tracking is more intuitive.
- Instead of replaying the same frame in wrench to ensure frames are
caching, try to cache tiles every frame when testing mode is enabled.
- Add a basic invalidation test for a rectangle with color that changes
each frame.
- Make the dirty region index implicit and rename dirty_region to dirty
in reftest format.
- Fix an underflow error when moving to next frame in wrench.
Differential Revision: https://phabricator.services.mozilla.com/D20963
--HG--
extra : moz-landing-system : lando
This is a new implementation of mix-blend compositing that is meant to be more idiomatic to WR and efficient.
Previously, mix-blend mode was composed in the following way:
1. parent stacking context was forced to isolate
2. source picture is also isolated
3. when rendering the isolated context, the framebuffer is read upon reaching the source. Both the readback and the source are placed in the RT cache.
4. a mix-blend draw call is issued to read from those cache segments and blend on top of the backdrop
The new implementation works by using the picture cutting (intruduced for preserve-3D contexts earlier) and some bits of magic:
1. backdrop stacking context is isolated with a special composition mode that prevents it from actually rendeing unless the suorce stacking context is invisible.
2. source stacking context is isolated with mix-blend composition mode that has a pointer to the backdrop picture
3. the instance of the backdrop picture is placed as a peer of the source picture (not a child)
4. if the backdrop is invisible, the source is drawn as a simple blit
5. otherwise, it's a draw call that reads from the isolated backdrop and source textures
Note the differences:
- parent stacking context is not isolated, but backdrop is
- no framebuffer readback is involved
- the source and backdrop pictures are rendered in parallel in a pass, improving the batching
- we don't blend onto the backdrop while reading from the backdrop copy at the same time
- the depth of the render pass tree is reduced: previously the parent and the source were isolated, now the source and the backdrop, which are siblings
Differential Revision: https://phabricator.services.mozilla.com/D20608
--HG--
rename : gfx/wr/wrench/reftests/blend/multiply-2-ref.yaml => gfx/wr/wrench/reftests/blend/multiply-3-ref.yaml
rename : gfx/wr/wrench/reftests/blend/multiply-3.yaml => gfx/wr/wrench/reftests/blend/multiply-4.yaml
extra : moz-landing-system : lando
Document splitting is crashing with early initialization of the debug
renderer. Not sure why, and this is just a temporary workaround, but
one that I think we want anyway, as we don't want to be unnecessarily
lazy-initting the debug renderer.
Depends on D20698
Differential Revision: https://phabricator.services.mozilla.com/D20700
--HG--
extra : moz-landing-system : lando
This is a new implementation of mix-blend compositing that is meant to be more idiomatic to WR and efficient.
Previously, mix-blend mode was composed in the following way:
1. parent stacking context was forced to isolate
2. source picture is also isolated
3. when rendering the isolated context, the framebuffer is read upon reaching the source. Both the readback and the source are placed in the RT cache.
4. a mix-blend draw call is issued to read from those cache segments and blend on top of the backdrop
The new implementation works by using the picture cutting (intruduced for preserve-3D contexts earlier) and some bits of magic:
1. backdrop stacking context is isolated with a special composition mode that prevents it from actually rendeing unless the suorce stacking context is invisible.
2. source stacking context is isolated with mix-blend composition mode that has a pointer to the backdrop picture
3. the instance of the backdrop picture is placed as a peer of the source picture (not a child)
4. if the backdrop is invisible, the source is drawn as a simple blit
5. otherwise, it's a draw call that reads from the isolated backdrop and source textures
Note the differences:
- parent stacking context is not isolated, but backdrop is
- no framebuffer readback is involved
- the source and backdrop pictures are rendered in parallel in a pass, improving the batching
- we don't blend onto the backdrop while reading from the backdrop copy at the same time
- the depth of the render pass tree is reduced: previously the parent and the source were isolated, now the source and the backdrop, which are siblings
Differential Revision: https://phabricator.services.mozilla.com/D20608
--HG--
rename : gfx/wr/wrench/reftests/blend/multiply-2-ref.yaml => gfx/wr/wrench/reftests/blend/multiply-3-ref.yaml
rename : gfx/wr/wrench/reftests/blend/multiply-3.yaml => gfx/wr/wrench/reftests/blend/multiply-4.yaml
extra : moz-landing-system : lando
Without this patch any enclosing scale transform between a blurred
picture and the nearest raster root was being ignored entirely for the
purposes of blur.
Also includes a couple of reftests to exercise this code.
Differential Revision: https://phabricator.services.mozilla.com/D20908
--HG--
extra : moz-landing-system : lando
Currently on Android we upload texture data to the webrender texture
cache using a PBO. On Adreno GPUs, however, this upload is still being
done synchronously, and profiles show a lot of time spent waiting in
glTexSubImage3D.
The problem is that the stride of the data in the PBO is not a
multiple of 256 bytes, so the driver is not able to DMA the upload.
This patch ensures that data is laid out optimally in the PBO, using
glMapBufferRange then copying the data line-by-line if required. This
allows the driver to perform the upload asynchronously as intended.
Differential Revision: https://phabricator.services.mozilla.com/D20492
--HG--
extra : moz-landing-system : lando
The tiling origin is computed withing image::tiles instead of being provided to the function.
In addition, the image rect in device space is exposed as function parameter.
In a followup, callers will have to determine the correct image rect using the blob image's visible area.
Differential Revision: https://phabricator.services.mozilla.com/D20175
--HG--
extra : moz-landing-system : lando
angle_shader_validation.rs just checks that the number of "switch" and "default:" are the same number in the source file, even if they occur in comments.
On Windows the vFuncs array is always 0 in the fragment shader. If we move the computation of the vFuncs array outside of the switch (so that it is computed for every type of shader, even when it is not needed) then it works.
For table/discrete we just create a lookup table for all 256 possible input values. We should probably switch to just computing the value in the shader, unless the list of value is really long.
The format for stacking contexts in the built display list goes from
PushStackingContext item
push_iter of Vec<FilterOp>
to
SetFilterOps item
push_iter of Vec<FilterOp>
1st SetFilterData item
push_iter of array of func types
push_iter funcR values
push_iter funcG values
push_iter funcB values
push_iter funcA values
.
.
.
nth SetFilterData item
push_iter of array of func types
push_iter funcR values
push_iter funcG values
push_iter funcB values
push_iter funcA values
PushStackingContext item
We need separate a SetFilterData item for each filter because we can't push_iter a variable sized thing.
When we iterate over the built display list to flatten it we work similarly to how gradients work with a SetGradientStops item before the actual gradient item. So when we see SetFilterOps or SetFilterData we use them to fill out values on the built display list iterator but don't those items return them to the iterator user and instead continue iterating until we hit the PushStackingContext item, at which point to the iterator consumer it appears as those the FilterOps and FilterDatas were on the PushStackingContext item. (This part is trickier too since we need a TempFilterData type that just holds ItemRange's until we get the actual bytes later.)
Do we need to clear cur_filters and cur_filter_data at some point to prevent them from getting ready by items for which they do not apply?