Add annotations to vertex shaders so that SWGL can detect when a vertex attribute
is generated by per-instance data rather than per-vertex data.
Differential Revision: https://phabricator.services.mozilla.com/D65614
--HG--
extra : moz-landing-system : lando
The code that picks the texture cache slab size does not consider snapped extents under 64 pixels for rectangular pages, so, for example, a 16x500 request goes into a 512x512 page.
This commit allows very thin requests to get a 64x512 page.
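As a rough illustration of the selection described above, here is a minimal sketch. The function names, the 16-pixel minimum extent, and the exact set of rectangular pages are assumptions made for the example, not the actual texture cache code.

```rust
/// Snap an extent up to a power of two, with an assumed minimum of 16.
pub fn snap_extent(size: u32) -> u32 {
    size.max(16).next_power_of_two()
}

/// Pick a slab page size for a request (hypothetical helper).
/// Thin requests get a 64-wide or 64-high rectangular page instead of
/// falling through to a 512x512 square.
pub fn slab_size(width: u32, height: u32) -> (u32, u32) {
    match (snap_extent(width), snap_extent(height)) {
        // Very thin requests share the smallest rectangular pages.
        (512, 0..=64) => (512, 64),
        (0..=64, 512) => (64, 512),
        // Other rectangular pages (assumed set).
        (512, h @ 128..=256) => (512, h),
        (w @ 128..=256, 512) => (w, 512),
        // Otherwise fall back to a square page.
        (w, h) => {
            let square = w.max(h);
            (square, square)
        }
    }
}
```

With this sketch, `slab_size(16, 500)` yields a 64x512 page rather than 512x512.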
Differential Revision: https://phabricator.services.mozilla.com/D66187
--HG--
extra : moz-landing-system : lando
build_frame is called by update_document which calls rebuild_hit_tester if the hit test tree is invalidated. The advantage of doing it there is that it is after the frame has been submitted to the renderer so we are out of the critical path and the work can overlap with draw call submission.
So we don't need to do the work in build_frame, and since we don't currently set the validity flag there we are often re-building the hit test tree a second time after frame building.
Differential Revision: https://phabricator.services.mozilla.com/D64328
--HG--
extra : moz-landing-system : lando
This adds support for tracking and invalidating tiles based on a
movable virtual offset.
Differential Revision: https://phabricator.services.mozilla.com/D65687
--HG--
extra : moz-landing-system : lando
Turn the difference checkbox into a radio that adds "heatmap"; it uses
WebGL to show both images, their absolute difference, and a color-coded
max difference. The quadrants split following the mouse.
This helps to separate large variations (red) from small variations
(green) and helps to compare the images without losing track of where
they are.
Differential Revision: https://phabricator.services.mozilla.com/D65841
--HG--
extra : moz-landing-system : lando
Instead of trying to extract an inner rectangle from a transformed inner rectangle,
which by itself doesn't have an obviously "best" solution, we are going to test if the
visibility rect is within the polygon of the projected inner rect.
The test is more precise and may be slightly more expensive, but most importantly, it is correct.
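A minimal sketch of such a containment test, assuming the projected inner rect forms a convex polygon. The type alias and function names are hypothetical and this is not the code from the patch. Because the polygon is convex, the rect is inside it exactly when all four corners are.

```rust
pub type Point = (f32, f32);

/// Z component of the cross product (b - a) x (p - a).
fn cross(a: Point, b: Point, p: Point) -> f32 {
    (b.0 - a.0) * (p.1 - a.1) - (b.1 - a.1) * (p.0 - a.0)
}

/// True if `p` lies inside or on the boundary of the convex polygon `poly`.
/// Works for either winding order: the point must be on the same side
/// of every edge.
pub fn convex_polygon_contains_point(poly: &[Point], p: Point) -> bool {
    if poly.len() < 3 {
        return false;
    }
    let mut sign = 0.0f32;
    for i in 0..poly.len() {
        let c = cross(poly[i], poly[(i + 1) % poly.len()], p);
        if c == 0.0 {
            continue; // on the edge's line: no side information
        }
        if sign != 0.0 && c.signum() != sign {
            return false;
        }
        sign = c.signum();
    }
    true
}

/// True if the axis-aligned rect [min, max] is within the convex polygon.
pub fn convex_polygon_contains_rect(poly: &[Point], min: Point, max: Point) -> bool {
    [(min.0, min.1), (max.0, min.1), (max.0, max.1), (min.0, max.1)]
        .iter()
        .all(|&corner| convex_polygon_contains_point(poly, corner))
}
```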
Differential Revision: https://phabricator.services.mozilla.com/D65836
--HG--
extra : moz-landing-system : lando
This makes the picture cache debug view better represent the amount
of pixels that are being rasterized and composited. It's also a
bit clearer where picture cache boundaries are on some pages.
Differential Revision: https://phabricator.services.mozilla.com/D65932
--HG--
extra : moz-landing-system : lando
This patch adds an asynchronous hit tester that can perform hit testing queries without blocking on a synchronous message to the render backend thread, which is often busy building frames. This is done by having a shared immutable hit tester readable by any thread, atomically swapped each time the render backend processes a new scene or frame.
In order to asynchronously hit test without causing race conditions with APZ internal state, the hit tester has to be built while the APZ lock is held.
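The shared-snapshot idea can be sketched as follows. This is an illustration only: `HitTester`, `SharedHitTester`, and the rect/tag layout are hypothetical, and a briefly held `Mutex` around an `Arc` stands in for whatever swap mechanism the patch actually uses.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical immutable snapshot of the hit-test data:
/// each entry is (x, y, width, height, tag).
pub struct HitTester {
    pub rects: Vec<(f32, f32, f32, f32, u64)>,
}

impl HitTester {
    /// Return the tag of the topmost (last) rect containing the point.
    pub fn hit_test(&self, x: f32, y: f32) -> Option<u64> {
        self.rects
            .iter()
            .rev()
            .find(|r| x >= r.0 && x < r.0 + r.2 && y >= r.1 && y < r.1 + r.3)
            .map(|r| r.4)
    }
}

/// Holder that readers on any thread can query without messaging
/// the render backend.
pub struct SharedHitTester {
    current: Mutex<Arc<HitTester>>,
}

impl SharedHitTester {
    pub fn new(initial: HitTester) -> Self {
        SharedHitTester { current: Mutex::new(Arc::new(initial)) }
    }

    /// Called by the producer after a new scene or frame: swap in
    /// a fresh snapshot.
    pub fn update(&self, new_tester: HitTester) {
        *self.current.lock().unwrap() = Arc::new(new_tester);
    }

    /// Called by readers: grab the current snapshot. The lock is held
    /// only long enough to clone the `Arc`.
    pub fn get(&self) -> Arc<HitTester> {
        let guard = self.current.lock().unwrap();
        guard.clone()
    }
}
```

A reader that already holds a snapshot keeps seeing consistent data even if `update` runs in the meantime, since the old snapshot is never mutated.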
Differential Revision: https://phabricator.services.mozilla.com/D45345
--HG--
extra : moz-landing-system : lando
Add a repro-case reftest that asks for a 19996x5000 RenderTask (at
-p1), then fix it by analogy with the clamping to reasonable values that
happens for `NormalBorder`.
Differential Revision: https://phabricator.services.mozilla.com/D65660
--HG--
extra : moz-landing-system : lando
This patch adds support for external compositor surfaces when
using native compositor mode.
Differential Revision: https://phabricator.services.mozilla.com/D65436
--HG--
extra : moz-landing-system : lando
This patch refactors how external surfaces are stored in the
CompositeState structure. This is primarily to simplify integration
with native compositor mode, but also simplifies the Draw compositor
path.
Previously, the ResolvedExternalSurface struct contained information
that was used to rasterize the external surface (YUV planes etc) and
also the information to composite it (device rect, clip rect, z_id).
Now, ResolvedExternalSurface contains just the information required
to rasterize the external surface, while the compositing information
is handled by adding the external surface as a regular tile. This
makes it possible to unify how external surfaces are drawn, via the
common draw_tile_list method.
Differential Revision: https://phabricator.services.mozilla.com/D65269
--HG--
extra : moz-landing-system : lando
Add support for a `fuzzy-range` keyword in reftest.list.
It is similar to `fuzzy` but it allows multiple pairs of
`max_difference, num_differences` numbers that introduce
multiple buckets of allowed differences.
For example, `fuzzy-range(5,100,20,10)` allows at most
100 pixels with a difference of at most 5, _plus_ an extra
10 pixels at most that have a difference more than 5 but
less than or equal to 20.
The total number of differing pixels allowed is thus 110,
but only if 100 of those differ by <= 5 and the remaining
10 by <= 20.
110 pixels with a difference <= 5 will still fail.
This is intentional to encourage tighter bounds in tests
where many pixels are slightly off and a few outliers are
off by a lot.
The number of parameters is arbitrary; longer lists can
get confusing so this change also introduces optional
support for writing `<=` in front of the max difference
and `*` in front of the max pixel count, eg.
`fuzzy-range(<=5,*100,<=20,*10)` (no spaces).
Any pixels that exceed the highest maximum will fail the
test, similar to `fuzzy`.
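The bucket semantics described above can be sketched as follows. This is an illustration, not the reftest harness code: `FuzzyBucket` and `fuzzy_range_passes` are hypothetical names, and the input is assumed to be one maximum channel difference per pixel.

```rust
/// One `max_difference, num_differences` pair from a `fuzzy-range(...)` entry.
#[derive(Clone, Copy, Debug)]
pub struct FuzzyBucket {
    pub max_difference: u8,
    pub num_differences: usize,
}

/// Returns true if the per-pixel differences fit within the buckets.
/// Each differing pixel is counted in the tightest bucket that admits it;
/// counts do not spill over into looser buckets.
pub fn fuzzy_range_passes(diffs: &[u8], buckets: &[FuzzyBucket]) -> bool {
    // Sort so that bucket order in the list does not matter.
    let mut sorted = buckets.to_vec();
    sorted.sort_by_key(|b| b.max_difference);

    let mut counts = vec![0usize; sorted.len()];
    for &d in diffs.iter().filter(|&&d| d > 0) {
        match sorted.iter().position(|b| d <= b.max_difference) {
            Some(i) => counts[i] += 1,
            // Exceeds the highest maximum: fail outright.
            None => return false,
        }
    }
    counts
        .iter()
        .zip(&sorted)
        .all(|(&count, bucket)| count <= bucket.num_differences)
}
```

With buckets for `fuzzy-range(5,100,20,10)`, 100 pixels differing by 5 plus 10 pixels differing by 20 pass, while 110 pixels differing by 5 fail.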
Steps tested:
1. the same tests fail in exactly the same way before and after;
2. reordered the `fuzzy` statements for raster_root_A/B/C to no longer
be sorted by max difference, and verified that the tests pass/fail
the same way;
(then sort them again which is easier to understand);
3. tests using the new feature still fail when the ref no longer matches
(deliberately broke the _ref version and verified test failed);
Differential Revision: https://phabricator.services.mozilla.com/D65247
--HG--
extra : moz-landing-system : lando
Ensure that the image keys and image generations for external
compositor surfaces are included in the composite descriptor,
which is used to determine if a composite is required or can
be skipped.
Differential Revision: https://phabricator.services.mozilla.com/D65216
--HG--
extra : moz-landing-system : lando
Rather than treating webrender::intern::UpdateList as a sequence of operations,
each of which might be an insertion or a removal, and using a side table to
supply extra data for insertions, it's simpler to segregate insertions and
removals into two separate vectors. This avoids the need for an enum whose
discriminant needs to be checked within the loop, and allows a few loops that
are only looking for one kind of operation to skip over the others entirely.
Ultimately, there should be no change in the order in which operations occur. In
practice, the old UpdateList always held a contiguous run of insertions,
followed by a run of removals (removals are consumed by apply_updates directly
after being generated by end_frame_and_get_pending_updates).
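A minimal sketch of the segregated layout, for illustration. The field names, `ItemUid` stand-in, and the `apply_updates` signature here are assumptions and do not reproduce the actual webrender::intern types.

```rust
/// Hypothetical stand-in for an interned item's unique id.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ItemUid(pub u64);

pub struct Insertion<S> {
    pub index: usize,
    pub uid: ItemUid,
    pub value: S,
}

pub struct Removal {
    pub index: usize,
    pub uid: ItemUid,
}

/// Insertions and removals live in separate vectors, so no per-entry
/// enum discriminant needs to be checked while iterating.
pub struct UpdateList<S> {
    pub insertions: Vec<Insertion<S>>,
    pub removals: Vec<Removal>,
}

/// Apply all insertions, then all removals, matching the order the
/// old combined list held in practice.
pub fn apply_updates<S>(items: &mut Vec<Option<S>>, updates: UpdateList<S>) {
    for insertion in updates.insertions {
        if insertion.index >= items.len() {
            items.resize_with(insertion.index + 1, || None);
        }
        items[insertion.index] = Some(insertion.value);
    }
    for removal in updates.removals {
        if let Some(slot) = items.get_mut(removal.index) {
            *slot = None;
        }
    }
}
```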
Differential Revision: https://phabricator.services.mozilla.com/D64444
--HG--
extra : moz-landing-system : lando
This patch improves the performance of compositor surfaces in
two ways:
(1) Ignore primitives behind the first compositor surface when
determining whether a tile needs to be moved to the overlay
(alpha) pass. This means WR only moves a tile to the alpha
pass when it has primitives that overlap with the compositor
surface bounding rect, and are ordered after that compositor
surface. In practice, this means most tiles are able to
remain in the fast (opaque) path. Typically, a small number
of tiles that contain overlay video controls are moved to the
alpha pass.
(2) Register the opaque compositor surfaces as potential occluders.
This allows tiles that are completely covered by a compositor
surface to be removed from the compositor visual tree, which
helps both the simple and native compositor modes.
Between them, these optimizations typically mean that when watching
video in full-screen, nothing is composited except the video surface
itself, and some small region(s) where video overlay controls are
currently active.
Differential Revision: https://phabricator.services.mozilla.com/D64909
--HG--
extra : moz-landing-system : lando
The appveyor.yml change bumps it for windows. The macOS worker has the
rust version bumped out-of-band and the change to .taskcluster.yml just
updates the documentation.
Differential Revision: https://phabricator.services.mozilla.com/D64945
--HG--
extra : moz-landing-system : lando
A previous patch in this series introduced overlay tiles. However,
now that native surfaces exist for the opaque and alpha tiles
within a slice, we can remove the overlay tiles array and add
these special tiles to the alpha surface.
Differential Revision: https://phabricator.services.mozilla.com/D64899
--HG--
extra : moz-landing-system : lando
This patch fixes an oversight in part 5 of this patch series that
could result in an incorrect UV rect being used for an external
texture that uses a custom UV rect.
When the texture is an external texture, the UV rect is not known when
the external surface descriptor is created, because external textures
are not resolved until the lock() callback is invoked at the start of
the frame render. To handle this, query the texture resolver for the
UV rect if it's an external texture, otherwise use the default UV rect.
Differential Revision: https://phabricator.services.mozilla.com/D64687
--HG--
extra : moz-landing-system : lando
Previously, a native compositor surface was considered to be
completely opaque, or completely translucent. This is due to
a limitation in how alpha is handled in the DirectComposition
API level.
With this patch, each picture cache slice maintains both an
opaque and translucent native surface handle. Tiles are assigned
to one of those surfaces based on their current opacity.
This is a performance optimization in some cases, since:
- Even if part of a cache is translucent, opaque tiles can
still participate in occlusion at the compositor level.
- If a tile is changing from opaque to translucent, it now
invalidates only that tile, rather than the entire surface.
The primary benefit of this patch is that it allows compositor
surfaces to be drawn sliced in between the opaque surface and
any overlay / alpha tiles.
Differential Revision: https://phabricator.services.mozilla.com/D64495
--HG--
extra : moz-landing-system : lando
This was causing one of the large drop-shadow wrench reftests to time out.
This is only a partial fix, as we should be checking the scale factors earlier on when sanitizing the
filter input. This will ensure we match what the non-WR backend is doing and will prevent overinflation.
Differential Revision: https://phabricator.services.mozilla.com/D64197
--HG--
extra : moz-landing-system : lando
This patch adds support for YUV images to be promoted to compositor
surfaces when using the simple (draw) compositor mode. A follow up
to this will extend support to native compositor implementations.
We rely on only promoting compositor surfaces that are opaque
primitives. With this assumption, the tile(s) that intersect the
compositor surface get a 'cutout' in the location where the
compositor surface exists, allowing that tile to be drawn as
an overlay after the compositor surface.
Tiles are only drawn in overlay mode if there is content that
exists on top of the compositor surface. Otherwise, we can draw
the tiles in the normal fast path before the compositor surface
is drawn.
The patch also introduces a new subpixel AA mode, which allows
subpixel rendering to be enabled conditionally as long as the
text run does not intersect some number of excluded rectangle
regions. In this way, subpixel text remains on most of the page,
but is disabled for elements that are drawn into transparent
regions of the tile where the compositor surface exists.
This allows video playback to be composited directly into the
framebuffer, without invalidation of content tiles, which can
save a significant amount of battery power and improve performance.
Differential Revision: https://phabricator.services.mozilla.com/D63816
--HG--
extra : moz-landing-system : lando
Previously we only saved shaders to disk on the tenth frame, meaning
shaders compiled afterwards would not be cached and would need to be
recompiled in future runs of the application. This change makes it so
that we cache shaders to disk regardless of which frame they are
compiled in.
We continue to treat only the shaders used within the first ten frames
as the "startup shaders", meaning only those ones will be loaded on
next startup. Caching as many other shaders as possible is still
beneficial, however, as they are loaded on-demand.
Differential Revision: https://phabricator.services.mozilla.com/D62748
--HG--
extra : moz-landing-system : lando
This patch folds the raster scale and device pixel scale effects into
the font transform, instead of the font size itself. This helps minimize
the quantization effect given the font size is stored as an Au
(resolution of 1/60) instead of an f32, and the shader does not
replicate that effect.
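To illustrate the quantization effect, here is a small sketch. It assumes round-to-nearest conversion to 1/60 px units, and the function names are hypothetical. It only compares where the rounding happens and does not mirror the real glyph code.

```rust
/// App units per pixel (an Au has a resolution of 1/60 px).
const AU_PER_PX: f32 = 60.0;

/// Round a size in pixels to the nearest Au (assumed rounding mode).
pub fn quantize_to_au(size_px: f32) -> f32 {
    (size_px * AU_PER_PX).round() / AU_PER_PX
}

/// Scale folded into the font size: the *scaled* size gets quantized.
pub fn scaled_size_via_font_size(size_px: f32, scale: f32) -> f32 {
    quantize_to_au(size_px * scale)
}

/// Scale folded into the transform: only the base size is quantized,
/// and the scale is applied afterwards at full f32 precision.
pub fn scaled_size_via_transform(size_px: f32, scale: f32) -> f32 {
    quantize_to_au(size_px) * scale
}
```

For a 12 px font at a scale of 1.003, the first form snaps the exact 12.036 px to 722 Au (about 12.0333 px), while the second keeps 12.036 px.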
Differential Revision: https://phabricator.services.mozilla.com/D63641
--HG--
extra : moz-landing-system : lando
Move the UV writing function into the shared yuv.glsl include so
that the YUV compositing shader will be able to access it.
Differential Revision: https://phabricator.services.mozilla.com/D63815
--HG--
extra : moz-landing-system : lando
If only a single segment is produced, there is no benefit to writing
a segment instance array. Instead, just use the main primitive rect
written into the GPU cache.
Differential Revision: https://phabricator.services.mozilla.com/D63812
--HG--
extra : moz-landing-system : lando
It's unclear what this was accomplishing, but it
prevents us from correctly processing the pixels
on the edge of the mask, causing masked content to
peek through. No tests seem to rely on this
discarding behaviour.
Also added a reftest that's fairly fuzzy but should
suffice as a canary for a regression here.
Differential Revision: https://phabricator.services.mozilla.com/D63344
--HG--
extra : moz-landing-system : lando
Currently we quantize the snapped reference frame relative offset used
in the shader in order to stuff it in our extra integer parameter fields
in the prim header. Since we don't use the prim rect size for anything
in the text shader, this patch just puts it there instead, and avoids
quantizing that.
Differential Revision: https://phabricator.services.mozilla.com/D63639
--HG--
extra : moz-landing-system : lando
If a memory pressure event arrives _after_ a new scene has
been published that writes persistent targets (i.e. cached
render tasks to the texture cache, or picture cache tiles)
but _before_ the next update/render loop, those targets
will not be updated due to the active_documents list being
cleared at the end of this message. To work around that,
if any of the existing documents have not rendered yet, and
have picture/texture cache targets, force a render so that
those targets are updated.
Differential Revision: https://phabricator.services.mozilla.com/D63593
--HG--
extra : moz-landing-system : lando
The render backend's frame builder config is kept only in order to send updates to the scene builder's frame builder config, which will update the scenes' configs in the next transaction. If need be, the scene configs can be updated right away by looping over the documents. This avoids confusing bugs where only updating the backend's config affects the visibility pass but not the rest.
Differential Revision: https://phabricator.services.mozilla.com/D63337
--HG--
extra : moz-landing-system : lando
Continuing on the trend of having all of the gpu data encoding in gpu_types.rs so that it is easy to find and to avoid repeating it in batch.rs.
Depends on D62928
Differential Revision: https://phabricator.services.mozilla.com/D63178
--HG--
extra : moz-landing-system : lando
A quality-of-life improvement that will make it easier to change the encoding of the user data without having to repeat the correct casting, bit shifting and masking in many places. It also makes it harder to encode the data incorrectly by mistake or forget information.
Differential Revision: https://phabricator.services.mozilla.com/D62928
--HG--
extra : moz-landing-system : lando
This should remove the allocation and copy in
TextureD3D::ensureRenderTarget() in some situations.
Differential Revision: https://phabricator.services.mozilla.com/D62952
--HG--
extra : moz-landing-system : lando
On GLES, the default shader behaviour is to use highp (32-bit) integers
in the vertex shader and mediump (16-bit) integers in the fragment shader. This
causes issues in the border shader due to bit shifting with 16 bits. The
fix here is to shift by only 8 bits, as the data can be represented in a
16-bit integer, and to force mediump in the vertex shader as well.
Differential Revision: https://phabricator.services.mozilla.com/D62784
--HG--
extra : moz-landing-system : lando
Factor some parts of the YUV brush shader out into a shared
yuv.glsl shader include.
In future, this shader code will also be referenced by the
composite.glsl shader when using the simple (Draw) compositing
mode, to composite video surfaces directly into the framebuffer
where possible.
Differential Revision: https://phabricator.services.mozilla.com/D63123
--HG--
extra : moz-landing-system : lando
This patch makes the CPU side incorporate the raster scale when
calculating the subpixel position of a glyph. It also makes the shader
side not include the glyph scale factor when recalculating the glyph
position (since it was not known/used when determining the subpixel
position in the first place). This makes the subpixel position stable
when we transition between Screen and Local(raster_scale) spaces.
Differential Revision: https://phabricator.services.mozilla.com/D62812
--HG--
extra : moz-landing-system : lando
Simplify some of the logic related to handling multiple
compositor surfaces in future, specifically:
* Only rebuild the tiles map when the tile rect changes.
* Remove need for tiles_to_draw array.
Differential Revision: https://phabricator.services.mozilla.com/D62694
--HG--
extra : moz-landing-system : lando
Add support to the yaml reader and writer to be able to specify
that a primitive should set the PREFER_COMPOSITOR_SURFACE flag.
This flag is not currently used, but in future will signal the
picture caching code to promote a primitive to draw on a native
compositor surface where possible.
Differential Revision: https://phabricator.services.mozilla.com/D62693
--HG--
extra : moz-landing-system : lando
composite_simple() calculates a combined dirty rect from all tiles' dirty rects. But the combined dirty rect becomes invalid when there is an old tile that was dropped, because the dropped tile's dirty rect is not counted in composite_simple().
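One way to picture the problem is the sketch below. The `Rect` type and `combined_dirty_rect` helper are hypothetical, and treating the area of a dropped tile as dirty is an assumption about the remedy, not a description of the actual change.

```rust
/// Axis-aligned rect with inclusive min and exclusive max corners.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Rect {
    pub x0: i32,
    pub y0: i32,
    pub x1: i32,
    pub y1: i32,
}

impl Rect {
    /// Smallest rect covering both inputs.
    pub fn union(self, other: Rect) -> Rect {
        Rect {
            x0: self.x0.min(other.x0),
            y0: self.y0.min(other.y0),
            x1: self.x1.max(other.x1),
            y1: self.y1.max(other.y1),
        }
    }
}

/// Combine the dirty rects of live tiles with the areas previously
/// covered by dropped tiles, so that regions a dropped tile used to
/// draw are also recomposited. Returns None if nothing is dirty.
pub fn combined_dirty_rect(live: &[Rect], dropped: &[Rect]) -> Option<Rect> {
    live.iter().chain(dropped.iter()).copied().reduce(Rect::union)
}
```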
Differential Revision: https://phabricator.services.mozilla.com/D62531
--HG--
extra : moz-landing-system : lando
With this patch, a minimal valid rect is calculated for each
picture cache tile. This is used to reduce the scissor rect
during tile rasterization, and the draw rect during tile
compositing, whenever there is a partial tile.
Differential Revision: https://phabricator.services.mozilla.com/D62177
--HG--
extra : moz-landing-system : lando
For Draw (non-native) and CA modes, we include the per-tile
valid rect in the clip rect from the surface.
For DC (non-virtual) mode, a per-tile clip rect is set on the
visual for each tile, separate from the overall clip rect that
is set on the surface visual.
For DC (virtual) mode, the Trim API is used to remove pixels
in the virtual tile area that are outside the valid / clipped
region.
Note: Although the valid rect is now applied in the native
compositors, it's currently only based on the overall picture
cache bounding rect. Thus, with this patch there isn't any
noticeable performance improvement. Once this lands and is
working correctly, the follow up patch to calculate a smaller
valid region per-tile is a small amount of work.
Differential Revision: https://phabricator.services.mozilla.com/D61424
--HG--
extra : moz-landing-system : lando
Fourth iteration: improve the detail in reported tile invalidations.
The invalidation enum stores the old and new values for lightweight
types. For a change in PrimCount, the old and new list of ItemUids is
stored (if logging is enabled); the tool can then diff the two to see
what was added and removed. To convert that back into friendly strings,
the interning activity is used to build up a map of ItemUid to a string.
A similar special-casing of Content Descriptor will print the item
that's invalidating the tile, plus the origin and/or rectangle.
Also adding zoom and pan command line options both to fix high-DPI
issues and also to allow zooming out far enough to see out-of-viewport
cache lifetime and prefetching due to scrolling.
Also fix a bug where interning updates get lost if more than one update
happens without building a frame: switch to a vector of serialized
updatelists (one per type) instead of allowing just one string (per
type).
Differential Revision: https://phabricator.services.mozilla.com/D61656
--HG--
extra : moz-landing-system : lando
The test is meant to check that tuples work and that a tuple of a single element needs a comma to differentiate it from just being the element.
Differential Revision: https://phabricator.services.mozilla.com/D61681
--HG--
extra : moz-landing-system : lando
We incorporate the reference frame origin offset for non-animated
transforms, but forgot for animated transforms. Since the offset itself
is not animated, we should still incorporate it into the snapping
transform.
Differential Revision: https://phabricator.services.mozilla.com/D61689
--HG--
extra : moz-landing-system : lando
We need a way to switch it on and off to compare the performance and power usage of various test cases.
The new pref is "webrender.enable-multithreading" and does not require a restart.
Differential Revision: https://phabricator.services.mozilla.com/D61589
--HG--
extra : moz-landing-system : lando
In bug 1574493, we moved most snapping to scene building and a minority
to frame building. No snapping is done in the shader. However there was
some left over code that still tried to replicate the past behaviour and
this caused wobbling during the rendering. This patch removes the extra
snapping on the CPU side and trusts scene/frame building to do the job.
Differential Revision: https://phabricator.services.mozilla.com/D61590
--HG--
extra : moz-landing-system : lando
There are two potential cases handled by this patch:
(1) A scrollbar container followed by another scrollbar container.
In this case, we need to ensure these are placed into separate
clusters, even though the cluster flags otherwise match, to
ensure that slice creation will see the two clusters.
(2) If a fixed position scroll root trails a scrollbar container.
In this case, ensure that a scrollbar container marks the
cluster flags to create a slice straight after the scrollbar,
to avoid other primitives with the same scroll root sneaking
into the scrollbar container.
Differential Revision: https://phabricator.services.mozilla.com/D61519
--HG--
extra : moz-landing-system : lando
This patch introduces a per-tile valid rect. In the initial implementation,
this only uses the bounds of the overall picture cache bounding rect. The
next part of this patch will make use of true per-tile valid regions, to
improve performance where there are holes in a single cache slice.
Differential Revision: https://phabricator.services.mozilla.com/D61378
--HG--
extra : moz-landing-system : lando
We want to use the same line decoration (dashed, dotted, wavy) shader code for
both horizontal and vertical lines, so it makes sense for them to use a
coordinate system that has been rotated (transposed, actually) so that .x always
runs parallel to the line being decorated, and .y is always perpendicular.
Before this patch, we passed the orientation enum as a vertex attribute, used a
switch to swap coordinates in the vertex shader, and then swapped them again in
the fragment shader.
This patch trades the orientation for a f32 'axis select' vertex attribute, and
uses `mix` to swap them in the vertex shader. Then no consideration is necessary
in the fragment shader: the vLocalPos varying is already in the appropriate form.
Since get_line_decoration_sizes is already thinking in terms of line-parallel
coordinates, it might seem like a good idea for decoration jobs to simply use
line-parallel coordinates throughout. However, this actually results in more
swapping and opportunities for confusion: much of the CPU work is concerned with
the rectangle the decoration's mask occupies in the texture cache, which is
axis-aligned.
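The `mix`-based swap can be sketched on the CPU side as follows. This is an illustration in Rust rather than the shader source; `to_line_space` is a hypothetical name, and the convention that an axis select of 0.0 means horizontal and 1.0 means vertical is an assumption.

```rust
/// Linear interpolation with the same definition as GLSL `mix`.
pub fn mix(a: f32, b: f32, t: f32) -> f32 {
    a * (1.0 - t) + b * t
}

/// Map a position into line-parallel coordinates: with `axis_select`
/// of 0.0 the components are kept, with 1.0 they are transposed, so
/// that the first component always runs along the decorated line.
pub fn to_line_space(pos: [f32; 2], axis_select: f32) -> [f32; 2] {
    [
        mix(pos[0], pos[1], axis_select),
        mix(pos[1], pos[0], axis_select),
    ]
}
```

Since the swap is done with arithmetic on a float attribute, no branch or switch on an orientation enum is needed.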
Differential Revision: https://phabricator.services.mozilla.com/D60926
--HG--
extra : rebase_source : 8dcd8455c664067dd25f583c944d611a35c25e79
extra : source : dfb21632ea198c1acdc6a34ee08113d516f666d5
Without this change, get_line_decoration_sizes returns an (inline_size,
block_size) pair, where inline_size is parallel to the line being decorated, and
block_size perpendicular. However, these values are generally used as the
dimensions of an axis-aligned bounding box for the line, not as specific
parameters to the rendering process, so it makes sense to arrange them into a
LayoutSize value in this function, since it is already taking the orientation
into account anyway.
The caller, SceneBuilder::add_line, then doesn't need to swap the components,
and the adjustment of the clipping rectangle to avoid partial dots looks a bit
more natural: widths with widths, heights with heights.
Differential Revision: https://phabricator.services.mozilla.com/D60925
--HG--
extra : rebase_source : 093d572a7a35bddc6e070fc08d511f7f164a4d89
extra : source : 3549dd471446c291864822736f4587c81741cd56
There is nothing clipping related in there anymore.
Differential Revision: https://phabricator.services.mozilla.com/D61178
--HG--
rename : gfx/wr/webrender/src/clip_scroll_tree.rs => gfx/wr/webrender/src/spatial_tree.rs
extra : moz-landing-system : lando
Third iteration:
Fix broken scrolling (and incorrect positioning of quad tree lines) by
serializing the SpaceMapper(-transform) from take_context, and using it
to transform the primitive rects (instead of the previous translation
based on unclipped.origin);
Note: this is done at visualization time and not at export time to
distinguish actually moving elements from merely-scrolling ones.
Serialize the entire UpdateList, so we get the data (Keys) that's being
added; add it to the overview;
Move the static CSS code into tilecache_base.css; add this and the .js
file to the binary, write them as part of output (instead of manual
copy); clean up CSS a bit;
Differential Revision: https://phabricator.services.mozilla.com/D61049
--HG--
extra : source : 535ae1d4818a3f0af64d61846035135751352bd1
extra : histedit_source : bf9a8f830ec7db4c2d1fcb6deaaf72949d6b69ed
Second part: trace the updates that are sent to the DataStore, and save
at least the Insert/Remove and ItemUID as part of the wr-capture.
(We could expand this with more info, eg. the actual Keys, later).
TileView then reads them back and generates a color coded report to
overlay with the page view. This helps to see the types and amounts of
interned primitives that lead to cache invalidations.
Differential Revision: https://phabricator.services.mozilla.com/D60619
--HG--
extra : moz-landing-system : lando
On pages with many render tasks (typically a lot of text shadows), we spend a lot of time moving RenderTask, which is a fairly large struct, into the render graph's buffer. This patch avoids it by using the VecHelper trick of allocating space before initializing the value. Some RenderTask::new_* methods which take the render task graph as a parameter were modified to add the task and return the task ID, to work around borrow-checking restrictions.
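As a rough illustration of the "allocate space, then initialize" idea, here is a minimal sketch. The names Slot, alloc and init are hypothetical and are not claimed to match the VecHelper API in the tree. Whether the move is actually elided is up to the optimizer.

```rust
/// A reserved, not yet initialized slot at the end of a vector.
/// (Hypothetical helper type, for illustration only.)
struct Slot<'a, T> {
    vec: &'a mut Vec<T>,
}

/// Reserves room for one more element and returns a handle to the free slot.
fn alloc<T>(vec: &mut Vec<T>) -> Slot<'_, T> {
    vec.reserve(1);
    Slot { vec }
}

impl<'a, T> Slot<'a, T> {
    /// Writes `value` into the reserved slot and returns its index,
    /// much like a constructor that adds a task and returns its ID.
    fn init(self, value: T) -> usize {
        let index = self.vec.len();
        // SAFETY: `alloc` reserved capacity for at least one more element and
        // the exclusive borrow held by `Slot` prevents any change to the
        // vector in between, so `index` is within capacity and uninitialized.
        unsafe {
            std::ptr::write(self.vec.as_mut_ptr().add(index), value);
            self.vec.set_len(index + 1);
        }
        index
    }
}
```

Usage would look like `let id = alloc(&mut tasks).init(task);`. Dropping the slot without calling init leaves the vector unchanged.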
Differential Revision: https://phabricator.services.mozilla.com/D60854
--HG--
extra : moz-landing-system : lando
The computation of the repetition depends on the aspect ratio of the segment's uv rectangle, which was previously represented by the dx/dy variables in the shader. These were mistakenly computing the ratio of the normalized uvs within the primitive's total uv rect, which was incorrect since the normalization introduces a non-uniform scale.
This patch fixes it by taking the uv size in device pixels instead of the normalized texel rect. dx and dy are also renamed to segment_uv_size, which is a more informative name.
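A small worked example of why the normalized ratio is wrong, written as a sketch in Rust rather than shader code. The function names and the numbers are made up for illustration.

```rust
/// Aspect ratio computed from uvs normalized by the primitive's total uv
/// rect. The normalization scales x and y by different factors, so the
/// result is skewed whenever the total rect is not square.
fn normalized_ratio(segment_px: (f32, f32), total_px: (f32, f32)) -> f32 {
    let nx = segment_px.0 / total_px.0;
    let ny = segment_px.1 / total_px.1;
    nx / ny
}

/// Aspect ratio computed from the segment's uv size in device pixels.
fn device_ratio(segment_px: (f32, f32)) -> f32 {
    segment_px.0 / segment_px.1
}
```

For a square 50x50 segment inside a 200x100 image, device_ratio gives 1.0 while normalized_ratio gives 0.25 / 0.5 = 0.5.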
Differential Revision: https://phabricator.services.mozilla.com/D60768
--HG--
extra : moz-landing-system : lando
Unlike the border areas that only need their own dimensions, the middle area of a border-image determines its repetition parameter based on the size of the borders. A new flag is provided for the brush_image shader to know whether to use the segment's own rect or look at the borders when computing the pattern's size.
Differential Revision: https://phabricator.services.mozilla.com/D59675
--HG--
extra : moz-landing-system : lando
border-image-repeat: Round is equivalent to Repeat with the pattern size adjusted to fill the area with a whole number of repetitions. This is done by adjusting the segment's stretch_size in the shader so that it fits a whole number of times in the segment's size.
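The adjustment can be sketched as follows, in Rust rather than GLSL. The function name is hypothetical, and the sketch assumes a positive stretch_size and rounding to the nearest whole number of repetitions (at least one).

```rust
/// Adjusts `stretch_size` so that it fits a whole number of times in
/// `segment_size`. Assumes `stretch_size > 0.0`.
fn round_stretch_size(segment_size: f32, stretch_size: f32) -> f32 {
    // Nearest whole number of repetitions, but never fewer than one.
    let repetitions = (segment_size / stretch_size).round().max(1.0);
    segment_size / repetitions
}
```

For example, a 100 pixel segment with a 45 pixel pattern gets two repetitions of 50 pixels each.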
Differential Revision: https://phabricator.services.mozilla.com/D59370
--HG--
extra : moz-landing-system : lando
This allows calling code to specify whether a primitive would prefer
to be promoted to a compositor surface and/or picture cache slice.
This is a performance hint that can be used for large external
primitives, such as videos and canvas elements.
Differential Revision: https://phabricator.services.mozilla.com/D60637
--HG--
extra : moz-landing-system : lando
There are a number of issues with the current gradient dithering
implementation, which cause many test failures and also fuzzy
rendering when enabling DirectComposition virtual surfaces. In
particular, the dither result is dependent on the offset of the
update rect within a render target.
For now, this patch disables gradient dithering by default. This
gives us:
- A heap of new test PASS results (or reduced fuzziness).
- Fixes a number of non-deterministic fuzziness bugs with DC.
- Improves performance of gradient rendering by a reasonable amount.
We can fix gradient dithering as a follow up, and re-enable if/when
we find content that would benefit from it significantly (we may
be able to improve gradients in other ways than dithering too).
Differential Revision: https://phabricator.services.mozilla.com/D60460
--HG--
extra : moz-landing-system : lando
The semantics of the inaccuracy reftest mode are that the overall
test succeeds as long as one of the multiple test images mismatches
the reference image.
However, the previous implementation would print an unexpected result
if any of the comparisons were equal (even though the overall test
result would be reported correctly).
This patch changes wrench so that it only reports an unexpected
failure for the overall test, not each individual reference.
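The intended semantics can be sketched like this. The Outcome type, the function name and the byte-slice image representation are assumptions for illustration, not wrench's actual types.

```rust
/// Overall result of an inaccuracy reftest (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    Pass,
    UnexpectedFail,
}

/// The test passes as long as at least one test image differs from the
/// reference. A single outcome is produced for the whole test instead of
/// one per comparison.
fn inaccuracy_outcome(reference: &[u8], images: &[Vec<u8>]) -> Outcome {
    if images.iter().any(|image| image.as_slice() != reference) {
        Outcome::Pass
    } else {
        Outcome::UnexpectedFail
    }
}
```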
Differential Revision: https://phabricator.services.mozilla.com/D60564
--HG--
extra : moz-landing-system : lando
Unlike other types of render tasks, pictures can have hundreds of dependencies. The dependency vector is re-built every frame, leading to a lot of vector re-allocations in some pages.
Depends on D60151
Differential Revision: https://phabricator.services.mozilla.com/D60182
--HG--
extra : moz-landing-system : lando
The majority of render tasks have 0, 1 or 2 dependencies, except for pictures that typically have dozens to hundreds of dependencies. SmallVec with 2 inline elements avoids many tiny heap allocations in pages with a lot of text shadows and other types of render tasks.
Differential Revision: https://phabricator.services.mozilla.com/D60151
--HG--
extra : moz-landing-system : lando
This patch introduces a new reftest type (syntax ** or !* in reftest files).
This type of test renders a single input file multiple times, at a range
of different picture cache tile sizes. It then verifies that each of the
images matches (or doesn't).
This can be used to verify rasterizer accuracy when drawing primitives
with different tile sizes and/or dirty rect update strategies.
One of the included tests in this patch fails the accuracy test - the
intent is to fix this inaccuracy in a follow up patch and then be able to
mark it pixel exact.
Differential Revision: https://phabricator.services.mozilla.com/D60185
--HG--
extra : moz-landing-system : lando
Optionally serialize N frames into a circular memory buffer, and save
them as part of wr-capture into tilecache/.
The new tile_view utility reads the folder and converts the frames to
svg for easier visualization, with a few extra features:
- color code each primitive based on which slice it is on;
- highlight new or moving primitives in red (brute force diff);
- print all invalidated tiles at the top and the invalidation reason;
- overlay the tile quadtree to visualize splitting & merging;
- HTML and JS wrappers for animation playback, timeline scrubbing;
Positioning of the tiles on the screen is a bit broken still; especially
during scrolling and "special" pages (like about:config).
Interning info is not used yet.
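A minimal sketch of keeping only the last N frames in memory, assuming a VecDeque-backed ring. FrameRing and its methods are hypothetical names and do not describe the actual capture code.

```rust
use std::collections::VecDeque;

/// Keeps the most recent `capacity` frames, dropping the oldest first.
struct FrameRing<T> {
    capacity: usize,
    frames: VecDeque<T>,
}

impl<T> FrameRing<T> {
    fn new(capacity: usize) -> Self {
        FrameRing { capacity, frames: VecDeque::with_capacity(capacity) }
    }

    /// Stores a frame, evicting the oldest one when the ring is full.
    fn push(&mut self, frame: T) {
        if self.capacity == 0 {
            return; // Capture disabled: keep nothing.
        }
        if self.frames.len() == self.capacity {
            self.frames.pop_front();
        }
        self.frames.push_back(frame);
    }

    /// Iterates from the oldest to the newest retained frame.
    fn iter(&self) -> impl Iterator<Item = &T> {
        self.frames.iter()
    }
}
```

With a capacity of 3, pushing frames 1 through 5 leaves frames 3, 4 and 5 available to be written out at capture time.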
Differential Revision: https://phabricator.services.mozilla.com/D59975
--HG--
extra : moz-landing-system : lando
I don't think it makes much of a difference but clippy is quite vocal about it.
Differential Revision: https://phabricator.services.mozilla.com/D59114
--HG--
extra : moz-landing-system : lando
I removed the old opacity filter path in the brush_blend shader and cleaned up the filter mode
constants in the shader so there are fewer magic numbers. This should help if/when we move more
filters to their own shaders.
Depends on D59610
Differential Revision: https://phabricator.services.mozilla.com/D59611
--HG--
extra : moz-landing-system : lando
Opacity is a commonly used effect, and the opacity filter path is also used when a stacking
context has an opacity of < 1. The brush_blend shader is slow since it has support for a large
portion of CSS filters; however, opacity is used much more often than the rest of the filters.
This patch adds a simple shader for opacity effects which bypasses the extra overhead in the
brush_blend shader.
Differential Revision: https://phabricator.services.mozilla.com/D59610
--HG--
extra : moz-landing-system : lando
Easy to miss that the slow formatting code is run unconditionally.
The remaining instances are in recording and startup code.
Differential Revision: https://phabricator.services.mozilla.com/D58920
--HG--
extra : moz-landing-system : lando
This adds support for holes within virtual surfaces. On platforms
that don't use virtual surfaces, this just works by destroying
the tile that is empty so it never gets composited.
Differential Revision: https://phabricator.services.mozilla.com/D59059
--HG--
extra : moz-landing-system : lando
Adds an #ifdef to the DCLayerTree implementation that allows
selecting whether to use the virtual surface API (enabled by
default) or the regular DC surface API.
For now, this is a compile-time switch. As a follow up to this,
we will support both options at runtime (for example, using the
regular surface API for surfaces that have holes or translucency).
Differential Revision: https://phabricator.services.mozilla.com/D58870
--HG--
extra : moz-landing-system : lando
This will allow use of the DirectComposition virtual surface API. If
it turns out that some pages recreate surfaces a lot due to opacity
changing, we can add some extra logic to avoid recreating surfaces
as often, and making use of per-tile opacity in some cases.
Differential Revision: https://phabricator.services.mozilla.com/D57592
--HG--
extra : moz-landing-system : lando
* Fix crash due to shift left causing overflow (debug only)
* Remove rounding of scrolling offsets and snap to view space instead of
world space
Differential Revision: https://phabricator.services.mozilla.com/D57017
--HG--
extra : moz-landing-system : lando
Uploading texture data is showing up frequently in profiles on Adreno devices,
especially when zooming on a text-heavy page. Specifically, the time is spent in
glMapBufferRange and glBufferSubData, most likely when internally allocating the
buffer before transferring data into it.
Currently, we are creating a new PBO, by calling glBufferData(), for each
individual upload region. This change makes it so that we calculate the required
size for all upload regions to a texture, then create a single PBO of the
required size. The entire buffer is then mapped only once, and each individual
upload chunk is written to it. This can require the driver to allocate a large
buffer, sometimes multiple megabytes in size. However, it handles this case much
better than allocating tens or even hundreds of smaller buffers.
An upload chunk may require more space in a PBO than the original CPU-side
buffer, so that the data is aligned correctly for performance or correctness
reasons. The caller of Device.upload_texture() is therefore responsible for
calling a new function, Device.required_upload_size(), to calculate the required
size beforehand.
On AMD Macs, there is a bug where PBO uploads from a non-zero offset can
fail. See bug 1603783. This patch therefore preserves the current behaviour on
AMD Macs, allocating a new PBO for each upload so that the offset
is always zero.
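The size calculation and single-buffer layout can be sketched on the CPU side as follows. The helper names and the alignment parameter are assumptions for illustration, and the signature of Device.required_upload_size() is not reproduced here. A plain Vec<u8> stands in for the mapped PBO.

```rust
/// Rounds `size` up to a multiple of `alignment` (assumed non-zero).
fn align_up(size: usize, alignment: usize) -> usize {
    (size + alignment - 1) / alignment * alignment
}

/// Computes the offset of each chunk and the total buffer size needed when
/// every chunk is padded to `alignment`.
fn plan_uploads(chunk_sizes: &[usize], alignment: usize) -> (Vec<usize>, usize) {
    let mut offsets = Vec::with_capacity(chunk_sizes.len());
    let mut total = 0;
    for &size in chunk_sizes {
        offsets.push(total);
        total += align_up(size, alignment);
    }
    (offsets, total)
}

/// Allocates one buffer of the required size and writes each chunk at its
/// planned offset, instead of allocating one buffer per chunk.
fn pack_chunks(chunks: &[&[u8]], alignment: usize) -> Vec<u8> {
    let sizes: Vec<usize> = chunks.iter().map(|chunk| chunk.len()).collect();
    let (offsets, total) = plan_uploads(&sizes, alignment);
    let mut buffer = vec![0u8; total];
    for (chunk, &offset) in chunks.iter().zip(&offsets) {
        buffer[offset..offset + chunk.len()].copy_from_slice(chunk);
    }
    buffer
}
```

With a 256 byte alignment, chunks of 100, 256 and 1 bytes land at offsets 0, 256 and 512 in a 768 byte buffer.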
Differential Revision: https://phabricator.services.mozilla.com/D56382
--HG--
extra : moz-landing-system : lando
Adds a notion of empty cache items in the texture cache, which are not uploaded into textures but have a cache entry and expire like other types of entries. The motivation for this is to avoid continuously requesting invalid glyphs to be re-rasterized. Currently, if a page contains invalid glyphs we gracefully fail to rasterize them, but since we don't keep a trace of them in the cache they appear new each frame, which causes us to schedule work on the rayon thread pool every frame at great cost.
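A simplified sketch of the idea, assuming a map keyed by a glyph ID. GlyphCache, EntryKind and the frame-based expiry are hypothetical and much simpler than the real texture cache.

```rust
use std::collections::HashMap;

/// What a cache entry refers to (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EntryKind {
    /// Pixels were uploaded into a texture.
    Uploaded { texture_slot: u32 },
    /// Rasterization produced nothing; remembered so it is not retried.
    Empty,
}

struct CacheEntry {
    kind: EntryKind,
    last_used_frame: u64,
}

struct GlyphCache {
    entries: HashMap<u32, CacheEntry>,
}

impl GlyphCache {
    fn new() -> Self {
        GlyphCache { entries: HashMap::new() }
    }

    /// Returns true if the glyph must be (re-)rasterized. Any existing
    /// entry, empty or not, is marked as used and suppresses the request.
    fn request(&mut self, glyph: u32, frame: u64) -> bool {
        match self.entries.get_mut(&glyph) {
            Some(entry) => {
                entry.last_used_frame = frame;
                false
            }
            None => true,
        }
    }

    fn insert(&mut self, glyph: u32, kind: EntryKind, frame: u64) {
        self.entries.insert(glyph, CacheEntry { kind, last_used_frame: frame });
    }

    fn kind(&self, glyph: u32) -> Option<EntryKind> {
        self.entries.get(&glyph).map(|entry| entry.kind)
    }

    /// Empty entries expire exactly like uploaded ones.
    fn expire(&mut self, current_frame: u64, max_age: u64) {
        self.entries
            .retain(|_, e| current_frame.saturating_sub(e.last_used_frame) <= max_age);
    }
}
```

After a failed rasterization the caller would insert an EntryKind::Empty entry, so the next frame's request for the same glyph returns false and no work is scheduled.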
Differential Revision: https://phabricator.services.mozilla.com/D56958
--HG--
extra : moz-landing-system : lando