Today we already support EXT_debug_marker for debug markers in
WebRender. This is useful to categorize GL API calls in tools such as
APITrace and RenderDoc. However not all drivers indicate support for
said extension, but instead support KHR_debug. This patch makes us
support both methods, preferring KHR_debug.
Differential Revision: https://phabricator.services.mozilla.com/D25787
You tell me if this is right, I have no Windows build available to test :)
Depends on D25602
Differential Revision: https://phabricator.services.mozilla.com/D25603
--HG--
extra : moz-landing-system : lando
The change contains a number of incremental improvements with the main goal of:
- allocating exactly as many tile as required by the app
- respecting the picture caching option
Differential Revision: https://phabricator.services.mozilla.com/D24740
--HG--
extra : moz-landing-system : lando
On some low end GPUs, clip mask rendering can be a significant
GPU cost. One way to reduce this is to support segment rendering
on more primitive types, to reduce the size of clip masks.
This patch adds support for pictures that have clip masks
to take part in segment rendering. This can significantly
reduce the number of fragments that must be drawn into a clip
mask for off-screen picture surfaces.
In future, WR can take advantage of segment rendering to use
clip mask shaders that handle only a single corner at a time.
This will be a further significant performance win to clip mask
render time.
Differential Revision: https://phabricator.services.mozilla.com/D24852
--HG--
extra : moz-landing-system : lando
The change contains a number of incremental improvements with the main goal of:
- allocating exactly as many tile as required by the app
- respecting the picture caching option
Differential Revision: https://phabricator.services.mozilla.com/D24740
--HG--
extra : moz-landing-system : lando
As discussed in IRC. AFAICT the ORTHO_NEAR|FAR_PLANE should match
up with the ranges of valid ZBufferIds, but please double-check
me.
Differential Revision: https://phabricator.services.mozilla.com/D23599
--HG--
extra : moz-landing-system : lando
This corrects A) An issue encountered with our strategy for skipping
the end_pass call for all but an offscreen render target. See the
comment above the end_pass call for details, and B) An issue with
depth clearing where we do not clear the whole rect if there are
multiple non-intersecting documents.
Differential Revision: https://phabricator.services.mozilla.com/D23056
--HG--
extra : moz-landing-system : lando
Just a little rename for clarity - I ended up scratching my head at
something for a little more time than I should have because I assumed
this was synchronous without looking at the implementation.
Differential Revision: https://phabricator.services.mozilla.com/D21584
--HG--
extra : moz-landing-system : lando
This is a large patch that contains all of the core changes for
renderroot splitting.
Differential Revision: https://phabricator.services.mozilla.com/D20701
--HG--
extra : moz-landing-system : lando
As discussed in IRC. AFAICT the ORTHO_NEAR|FAR_PLANE should match
up with the ranges of valid ZBufferIds, but please double-check
me.
Differential Revision: https://phabricator.services.mozilla.com/D23599
--HG--
extra : moz-landing-system : lando
This corrects A) An issue encountered with our strategy for skipping
the end_pass call for all but an offscreen render target. See the
comment above the end_pass call for details, and B) An issue with
depth clearing where we do not clear the whole rect if there are
multiple non-intersecting documents.
Differential Revision: https://phabricator.services.mozilla.com/D23056
--HG--
extra : moz-landing-system : lando
Just a little rename for clarity - I ended up scratching my head at
something for a little more time than I should have because I assumed
this was synchronous without looking at the implementation.
Differential Revision: https://phabricator.services.mozilla.com/D21584
--HG--
extra : moz-landing-system : lando
This is a large patch that contains all of the core changes for
renderroot splitting.
Differential Revision: https://phabricator.services.mozilla.com/D20701
--HG--
extra : moz-landing-system : lando
This patch shouldn't have any functional effect. It's motivated by
some changes needed to implement clip masks in pixel local storage,
but these are also general improvements that stand alone. Specifically:
- Remove clip_task_index from the global PrimitiveHeader struct.
In most cases, the clip task is supplied in the BrushInstance
structure, so it makes no sense to have this as a common field,
where it is generally unused. Instead, there is now an extra
'user data' field available in the PrimitiveHeader. Non-brush
shaders (text_run and split_composite) use that extra field to
store the clip task address, while the brush shaders gain an extra
(currently unused) user data field.
- In turn, this means there is no need to unconditionally try
and retrieve the first clip task address for a primitive
during batching. This was previously used to initialize the
PrimitiveHeader structure.
Differential Revision: https://phabricator.services.mozilla.com/D24312
--HG--
extra : moz-landing-system : lando
Add an experimental code path that makes use of the pixel local
storage extension available on many mobile GPUs.
This code path is currently disabled by default, as the support
is not complete for all primitives and blend modes. The initial
aim is to get feature parity with the existing renderer.
Once that's complete, we can take advantage of the (minimum)
12 bytes per pixel of high speed on-tile memory to store custom
data.
Clip masks are a good use case for this, since they map 1:1 with
the position of the fragment they are clipping. Using this for
clip masks allows us to handle clipping on mobile GPUs in a much
more efficient way - we can skip (a) separate render targets,
(b) target resolve (c) sample the mask texture during rendering.
Depends on D24123
Differential Revision: https://phabricator.services.mozilla.com/D24124
--HG--
extra : moz-landing-system : lando
Most rounded rect clips in real content are (a) axis aligned and
(b) have uniform radii.
When these conditions are met, we can run a fast path for clip
mask generation that uses significantly fewer ALU shader ops.
This is not typically a bottleneck on desktop GPUs, but can have
a large performance impact on mobile GPUs (and perhaps low end
integrated GPUs).
The Mali shader analyzer reports the slow path for the rounded
rect clip shader to be 94 cycles per fragment, while the fast
path is 10 cycles.
Differential Revision: https://phabricator.services.mozilla.com/D23817
--HG--
extra : moz-landing-system : lando
Most rounded rect clips in real content are (a) axis aligned and
(b) have uniform radii.
When these conditions are met, we can run a fast path for clip
mask generation that uses significantly fewer ALU shader ops.
This is not typically a bottleneck on desktop GPUs, but can have
a large performance impact on mobile GPUs (and perhaps low end
integrated GPUs).
The Mali shader analyzer reports the slow path for the rounded
rect clip shader to be 94 cycles per fragment, while the fast
path is 10 cycles.
Differential Revision: https://phabricator.services.mozilla.com/D23817
--HG--
extra : moz-landing-system : lando
In GLES 3 GLSL, the default precision for ints is highp (32 bit) in
vertex shaders, but only mediump (16 bit) in fragment shaders.
To render linear and radial gradients the fragment shader must fetch
the gradient stops from the gpu cache, using an address variable. This
variable is a 16 bit int, so if the stops data is located at
too high an address (row 32 or greater) then this value will have
overflown and we fetch from the wrong location. This was resulting in
garbage being drawn instead of the correct gradients.
To fix this, any address used in a fragment shader must be marked as
highp. This includes the varying input which supplies the address, and
the arguments to any functions used for the fetch.
Differential Revision: https://phabricator.services.mozilla.com/D23669
--HG--
extra : moz-landing-system : lando
The only time that the ancestor spatial node index is read is
during push_stacking_context. This means that even if it was
used as an ancestor for a 3d context, we can safely collapse
it in to the parent stacking context during flattening, if it
is otherwise redundant.
This is a partial fix for picture caching heuristics failing
with the display list produced on mobile devices.
Differential Revision: https://phabricator.services.mozilla.com/D23633
--HG--
extra : moz-landing-system : lando
The cleans up our WR use statements further, easying the merge conflicts.
Note: this PR is subject to instant rot, it is preferred to land quickly.
Differential Revision: https://phabricator.services.mozilla.com/D23373
--HG--
extra : moz-landing-system : lando
The cleans up our WR use statements further, easying the merge conflicts.
Note: this PR is subject to instant rot, it is preferred to land quickly.
Differential Revision: https://phabricator.services.mozilla.com/D23373
--HG--
extra : moz-landing-system : lando
As a general principle, we're trying to remove most usage of the
render task data that gets stored in a texture (bind_frame_data
is slow on many platforms, and it's somewhat inflexible with the
amount of data that can be provided).
This also lays some foundations for possibly drawing clip masks
with alternative techniques on mobile / tiled GPU architectures,
where we may not need / want a render task for clip masks.
Differential Revision: https://phabricator.services.mozilla.com/D23271
--HG--
extra : moz-landing-system : lando
By Bug 1526213, WebRenderBridgeParent::RecvEmptyTransaction() does not handle a case that resource update is handled by WebRenderTextureHostWrapper. In this case, txn.IsResourceUpdatesEmpty() became true and the function thought there was no resource update and the function returned DidComposite soon to client side. Then it caused a heavy over production of SharedSurface_ANGLEShareHandle if GPU is not powerful.
Differential Revision: https://phabricator.services.mozilla.com/D22454