The default size on most platforms is 256*256, which corresponds to the default blob tile size. I didn't check on Android, so there I set the threshold above which we never batch the upload to 512*512, which leaves the behavior unchanged. I suspect that a smaller threshold such as 256*256 would also work better there.
On Windows with heavy blob image workloads, not batching reduces the time spent in update_texture_cache by 20%-30%.
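As a rough illustration of the threshold described above, the decision could look like the following minimal sketch. The names `DEFAULT_BATCH_UPLOAD_THRESHOLD`, `UploadPath` and `choose_upload_path` are hypothetical, and treating an upload whose area equals the threshold as non-batched is an assumption made here so that a 256*256 blob tile takes the direct path.

```rust
/// Hypothetical default: uploads covering at least this many pixels are not batched.
const DEFAULT_BATCH_UPLOAD_THRESHOLD: u32 = 256 * 256;

#[derive(Debug, PartialEq, Eq)]
enum UploadPath {
    /// Small upload, grouped with others before reaching the texture cache.
    Batched,
    /// Large upload, sent on its own.
    Direct,
}

/// Picks an upload path from the pixel area of the update.
/// Assumption: an area equal to the threshold counts as "large".
fn choose_upload_path(width: u32, height: u32, threshold: u32) -> UploadPath {
    // Widen to u64 so that large dimensions cannot overflow the product.
    let area = u64::from(width) * u64::from(height);
    if area >= u64::from(threshold) {
        UploadPath::Direct
    } else {
        UploadPath::Batched
    }
}
```

With a 512*512 threshold, as on Android here, the same 256*256 tile would still be batched, which matches the "behavior is unchanged" intent.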
Differential Revision: https://phabricator.services.mozilla.com/D129516
Previously, each PrimitiveList within a Picture primitive contained
an array of PrimitiveInstance structs. This made it hard to access
the primitives except during a recursive picture tree traversal.
In future, we want multiple subsystems to be able to store an index
buffer of primitives (for example, the animation property binding
system will store primitives that need to be invalidated when the
attached animation value changes). The tile-cache primitive
dependencies are currently stored in an index buffer, but after
this change can be unified so that the dependency information can
be stored with the primitive instance.
An additional benefit of this change is that we can now flatten the
visibility and prepare passes for better memory access patterns and
code simplicity.
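The layout described above can be sketched roughly as follows. This is an illustration only: `PrimitiveStore`, `PrimitiveInstanceIndex`, the `label` and `visible` fields and all method names are assumptions, not the actual WebRender types.

```rust
use std::ops::Range;

/// Simplified stand-in for a primitive instance (fields are hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct PrimitiveInstance {
    label: &'static str,
    visible: bool,
}

/// Stable handle that any subsystem can store to refer to a primitive.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PrimitiveInstanceIndex(usize);

/// One flat array owns every instance; primitive lists only keep a range into it.
#[derive(Default)]
struct PrimitiveStore {
    instances: Vec<PrimitiveInstance>,
}

impl PrimitiveStore {
    /// Appends the primitives of one list and returns the range they occupy.
    fn push_list(&mut self, prims: Vec<PrimitiveInstance>) -> Range<usize> {
        let start = self.instances.len();
        self.instances.extend(prims);
        start..self.instances.len()
    }

    /// Random access by index, with no picture tree traversal needed.
    fn get(&self, index: PrimitiveInstanceIndex) -> Option<&PrimitiveInstance> {
        self.instances.get(index.0)
    }

    /// A flattened pass: one linear walk over every instance.
    fn update_visibility(&mut self, is_visible: impl Fn(&PrimitiveInstance) -> bool) {
        for prim in &mut self.instances {
            let visible = is_visible(&*prim);
            prim.visible = visible;
        }
    }
}
```

A subsystem such as animation property binding could then keep a `Vec<PrimitiveInstanceIndex>` of the primitives it needs to invalidate.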
Differential Revision: https://phabricator.services.mozilla.com/D129596
Add MixBlend and ComponentTransfer to the picture composite modes that
unconditionally establish a raster root.
All the known bugs with the raster root code have been fixed, so let's
start incrementally enabling raster roots for more picture modes, and
fix any regressions that come from these before making raster roots
the default for all surfaces.
Differential Revision: https://phabricator.services.mozilla.com/D117954
Using the composite shader for that was very clunky. The specialized shader might even be faster thanks to how much simpler it is and its use of texel fetch instead of linear sampling.
Differential Revision: https://phabricator.services.mozilla.com/D128256
The previous heuristics would set a threshold in number of allocated bytes per texture type, continuously evict a fixed number of items above the threshold and stop evicting below the threshold.
The new logic lowers the amount of allocated bytes below which we stop evicting, and makes eviction above the threshold more progressive, only evicting very cold items if the cache pressure is low and ramping up how aggressively items are evicted along with the cache pressure.
In addition, we maintain a minimum of cache pressure until there is only a single texture atlas allocated for a given shared texture type.
The above combined with the texture cache compaction code ensures that even after a difficult workload, the texture cache eventually settles back to a single texture atlas per type with reasonable fragmentation.
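To make the idea of progressive eviction concrete, here is a minimal sketch under stated assumptions. The constants, the linear ramp and the function names are all hypothetical and are not taken from the patch; they only illustrate how the age cutoff for eviction could tighten as pressure grows.

```rust
/// Hypothetical tuning values, in frames since an item was last used.
const COLD_AGE_FRAMES: f64 = 600.0;
const HOT_AGE_FRAMES: f64 = 10.0;
/// Hypothetical pressure floor kept while more than one atlas is allocated.
const MIN_PRESSURE_WITH_EXTRA_ATLASES: f64 = 0.1;

/// Maps allocated bytes to a pressure in [0, 1].
/// Below `stop_below` there is no pressure, unless extra atlases still exist.
fn cache_pressure(allocated: u64, stop_below: u64, full_pressure_at: u64, atlas_count: usize) -> f64 {
    let span = full_pressure_at.saturating_sub(stop_below).max(1) as f64;
    let over = allocated.saturating_sub(stop_below) as f64;
    let pressure = (over / span).min(1.0);
    if atlas_count > 1 {
        pressure.max(MIN_PRESSURE_WITH_EXTRA_ATLASES)
    } else {
        pressure
    }
}

/// Returns the idle age at or above which items get evicted, or `None` when
/// nothing should be evicted. Low pressure only reaches very cold items;
/// full pressure reaches almost everything.
fn eviction_age_threshold(pressure: f64) -> Option<u32> {
    if pressure <= 0.0 {
        return None;
    }
    let age = COLD_AGE_FRAMES + (HOT_AGE_FRAMES - COLD_AGE_FRAMES) * pressure;
    Some(age.round() as u32)
}
```

The pressure floor is what keeps eviction going until the cache can be compacted back to a single atlas per type.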
Differential Revision: https://phabricator.services.mozilla.com/D128255
The initial implementation uses the composite shader just like the upload code, but it's a bit messy. I'll add a simpler, more specialized shader for that in a followup.
Differential Revision: https://phabricator.services.mozilla.com/D128253
The newer version contains a few new APIs that will be needed for texture cache compaction:
- An iterator of allocated items.
- A way to associate AllocIds with a stable index.
Differential Revision: https://phabricator.services.mozilla.com/D128251
With this change, the spatial tree is no longer rebuilt
every time a new display list arrives and a scene is built.
Instead, scene building maintains a hash map of spatial
node keys <-> indices, allowing any spatial node that
has been recently seen in a display list to be retained.
Scene building then checks if the node is equivalent or
has been modified since the last display list, and sends
these delta changes as part of the scene swap to the
frame building code. The frame building code applies
the deltas to each updated spatial node.
The primary benefits of this are:
- Spatial node indices are now stable across display lists,
allowing future interning of primitives and clips to
include the spatial node. This can be used for various
optimizations, including interning during DL building,
caching transform state, reducing size of PrimitiveInstance.
- Frame building now knows exactly which spatial nodes are
new, removed, updated or unchanged. We can make use of
this to cache a lot of the (mostly) redundant calculations
that are done during both scene and frame building.
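The key to index mapping and the resulting deltas can be sketched as follows. This is a simplified illustration, not the actual implementation: `SpatialNodeKey`, `NodeDescriptor`, `Delta`, `SceneTree` and `apply_display_list` are hypothetical names, and the sketch drops a node as soon as one display list omits it, whereas the real code retains recently seen nodes.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct SpatialNodeKey(u64);

/// Hypothetical, heavily simplified description of a spatial node.
#[derive(Debug, Clone, PartialEq)]
struct NodeDescriptor {
    parent: Option<SpatialNodeKey>,
    offset: (f32, f32),
}

/// What scene building would send to frame building on scene swap.
#[derive(Debug, PartialEq)]
enum Delta {
    Added { index: usize, desc: NodeDescriptor },
    Updated { index: usize, desc: NodeDescriptor },
    Removed { index: usize },
}

#[derive(Default)]
struct SceneTree {
    indices: HashMap<SpatialNodeKey, usize>,
    nodes: Vec<Option<NodeDescriptor>>,
    free_slots: Vec<usize>,
}

impl SceneTree {
    /// Diffs an incoming display list against the retained nodes.
    /// A key keeps its index for as long as the node stays alive.
    fn apply_display_list(&mut self, incoming: &[(SpatialNodeKey, NodeDescriptor)]) -> Vec<Delta> {
        let mut deltas = Vec::new();
        let mut seen = HashSet::new();

        for (key, desc) in incoming {
            seen.insert(*key);
            match self.indices.get(key).copied() {
                Some(index) => {
                    // Known node: emit a delta only if it actually changed.
                    if self.nodes[index].as_ref() != Some(desc) {
                        self.nodes[index] = Some(desc.clone());
                        deltas.push(Delta::Updated { index, desc: desc.clone() });
                    }
                }
                None => {
                    // New node: reuse a free slot if one exists.
                    let index = match self.free_slots.pop() {
                        Some(slot) => slot,
                        None => {
                            self.nodes.push(None);
                            self.nodes.len() - 1
                        }
                    };
                    self.nodes[index] = Some(desc.clone());
                    self.indices.insert(*key, index);
                    deltas.push(Delta::Added { index, desc: desc.clone() });
                }
            }
        }

        // Nodes missing from this display list are removed (a simplification).
        let mut stale: Vec<(SpatialNodeKey, usize)> = self
            .indices
            .iter()
            .filter(|(key, _)| !seen.contains(*key))
            .map(|(key, index)| (*key, *index))
            .collect();
        stale.sort_by_key(|&(_, index)| index);
        for (key, index) in stale {
            self.indices.remove(&key);
            self.nodes[index] = None;
            self.free_slots.push(index);
            deltas.push(Delta::Removed { index });
        }
        deltas
    }
}
```

An unchanged node produces no delta at all, which is what lets frame building skip redundant work for it.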
Differential Revision: https://phabricator.services.mozilla.com/D127902
This allows picture slice backdrops to be supported when they contain
rounded-rect clip(s) that are in the same coordinate system as the
primitive. This is the common case, and allows subpixel AA to be
used in bookmark menu and other popups that are part of the current
Gecko UI.
Differential Revision: https://phabricator.services.mozilla.com/D128156
Due to driver bugs, component transfer filters using brush_blend do
not currently work on some Adreno 3xx devices.
The first issue is that the values for which function to use for each
component (the "v_funcs" varying) appear to be incorrect in the
fragment shader. Previously we passed these values as an int[4], but
because this required too many varying slots we changed it to an
ivec4. This broke the entire shader for all blend operations on this
device (bug 1731758), so we recently changed to bit packing the 4
values into a single int. This fixed the rest of the blend ops, but
component transfer still didn't work. This patch switches to using a
vec4, casting the values to and from floats, which works correctly.
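The two encodings mentioned above can be compared with a small CPU-side sketch. It is written in Rust rather than GLSL purely for illustration; the 4-bit field width and all function names are assumptions, and the real shader code may pack the values differently.

```rust
/// Assumed width of each packed function id (hypothetical).
const BITS_PER_FUNC: u32 = 4;
const FUNC_MASK: i32 = 0xF;

/// Bit packing: four small function ids stored in a single int.
fn pack_funcs(funcs: [u8; 4]) -> i32 {
    let mut packed = 0;
    for (slot, &func) in funcs.iter().enumerate() {
        packed |= (i32::from(func) & FUNC_MASK) << (BITS_PER_FUNC * slot as u32);
    }
    packed
}

/// Inverse of `pack_funcs`, as a fragment shader would do with shifts and masks.
fn unpack_funcs(packed: i32) -> [u8; 4] {
    let mut funcs = [0u8; 4];
    for (slot, func) in funcs.iter_mut().enumerate() {
        *func = ((packed >> (BITS_PER_FUNC * slot as u32)) & FUNC_MASK) as u8;
    }
    funcs
}

/// Float workaround: one float component per function id, like a vec4 varying.
fn funcs_to_floats(funcs: [u8; 4]) -> [f32; 4] {
    funcs.map(f32::from)
}

/// Cast back to integers on the receiving side.
fn floats_to_funcs(values: [f32; 4]) -> [u8; 4] {
    values.map(|v| v as u8)
}
```

Small integers are exactly representable as floats, so the round trip through the float form loses nothing.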
The second issue appears to be due to converting between integer
precisions. The component transfer functions require a texelFetch from
the GPU cache. The fetch_from_gpu_cache* functions accept a highp int
argument, as GPU cache addresses can exceed those representable with
a mediump int. brush_blend is currently using a mediump int (the
default fragment shader precision) for the table_address, which is
therefore incorrect. However, the shader is buggy even when the actual
value is representable with mediump, indicating a driver bug to do
with precision conversion, rather than the value overflowing. To avoid
this we must make the table_address varying and variables highp (as
they should be anyway), but additionally must make the "k" and
"offset" variables (which get added to the table address) highp too.
Depends on D128049
Differential Revision: https://phabricator.services.mozilla.com/D128050
Due to a driver bug, cs_border_segment renders incorrectly on some
Adreno 3xx devices. The problem lies with the "ivec4 vConfig" varying,
whose values appear to be incorrect in the fragment shader. I have
attempted splitting these all into separate varyings, removing the
bit packing for style and edge_axis, and using unsigned uvecs, each of
which solved some issues but not all.
The only workaround which appeared to fix all of the problems was
using floating point vecs rather than ivecs, and casting the values to
and from floats. This means that we need two components each for style
and edge_axis (as we cannot use bitwise operations with floats), so
they are packed in a vec4. The segment and clip_mode must therefore be
packed in a separate vec2. This patch implements this workaround.
It certainly seems as if the Adreno driver has serious issues using
integer (or ivec) varyings; however, it is not as simple as them never
working correctly. For that reason I have, for now, resisted
converting all integer varyings in all shaders to floats. If we
encounter more similar issues in the future it may be worthwhile doing
so (and adding a test to ensure we do not reintroduce them).
Differential Revision: https://phabricator.services.mozilla.com/D128049
With this change, we no longer have mutable state stored inside
the `ReferenceFrameInfo` struct, which will be important as we
introduce the follow-up patches to retain the spatial tree
between display lists.
Differential Revision: https://phabricator.services.mozilla.com/D127726
ScrollSensitivity is not used by Gecko. Also remove some remnants
of the old code to combine scroll frames when display lists swap.
Differential Revision: https://phabricator.services.mozilla.com/D127609
The ability to restrict hit-tests by pipeline_id isn't used by
Gecko or wrench. Remove it to simplify landing some of the upcoming
spatial tree work.
Differential Revision: https://phabricator.services.mozilla.com/D127608
This patch adds plumbing to keep track of why we request frames to be rendered.
This information is then displayed in Gecko profile markers on the renderer thread as well as in profiler HUD counters (see "Render reasons" in profiler.rs).
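One way to carry such reasons through the pipeline is a small bit set that callers combine as requests accumulate. The sketch below is illustrative only: the flag names, their display strings and the hand-written bit set are assumptions, not the definitions used by the patch.

```rust
use std::ops::BitOr;

/// Hypothetical bit set describing why a frame was requested.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
struct RenderReasons(u32);

impl RenderReasons {
    const NONE: Self = Self(0);
    const SCENE: Self = Self(1 << 0);
    const ANIMATED_PROPERTY: Self = Self(1 << 1);
    const RESOURCE_UPDATE: Self = Self(1 << 2);

    /// Display names, e.g. for a profiler marker or HUD counter label.
    const NAMES: [(Self, &'static str); 3] = [
        (Self::SCENE, "scene"),
        (Self::ANIMATED_PROPERTY, "animated property"),
        (Self::RESOURCE_UPDATE, "resource update"),
    ];

    fn contains(self, other: Self) -> bool {
        (self.0 & other.0) == other.0
    }

    fn is_empty(self) -> bool {
        self.0 == 0
    }

    /// Lists the names of all reasons that are set, in declaration order.
    fn names(self) -> Vec<&'static str> {
        Self::NAMES
            .iter()
            .filter(|(flag, _)| self.contains(*flag))
            .map(|(_, name)| *name)
            .collect()
    }
}

impl BitOr for RenderReasons {
    type Output = Self;

    /// Merges the reasons of several requests that end up in the same frame.
    fn bitor(self, rhs: Self) -> Self {
        Self(self.0 | rhs.0)
    }
}
```

Because several requests can be merged into one rendered frame, a set of flags fits better than a single enum value.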
Differential Revision: https://phabricator.services.mozilla.com/D127274
This patch makes the SceneSpatialTree be retained by the scene
Document structure, and the SpatialTree be retained by the
RenderBackend structure.
This is exactly the same structure as the interning system uses:
- The SceneSpatialTree is mutated as a new scene is built
- A set of "deltas" is calculated and stored in SpatialTreeUpdates
- The SpatialTreeUpdates are stored in a BuiltTransaction
- The SpatialTreeUpdates are applied to the SpatialTree on scene swap
For now, the "deltas" are simply a complete list of spatial nodes,
which retains existing behavior. In future, this will contain actual
deltas, based on the unique spatial node keys that now exist.
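The flow in the list above, with the current "complete list" form of the deltas, might be sketched like this. The type names follow the commit message, but every field and method shown (`nodes`, `add_node`, `build_updates`, `apply_updates`, `spatial_tree_updates`) is a hypothetical simplification.

```rust
/// Simplified stand-in for a spatial node (fields are hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct SpatialNode {
    parent: Option<usize>,
    offset: (f32, f32),
}

/// Scene-side tree, mutated while a new scene is built.
#[derive(Default)]
struct SceneSpatialTree {
    nodes: Vec<SpatialNode>,
}

impl SceneSpatialTree {
    fn add_node(&mut self, node: SpatialNode) -> usize {
        self.nodes.push(node);
        self.nodes.len() - 1
    }

    /// For now the "deltas" are a complete copy of the node list.
    fn build_updates(&self) -> SpatialTreeUpdates {
        SpatialTreeUpdates { nodes: self.nodes.clone() }
    }
}

/// Payload handed from scene building to frame building.
struct SpatialTreeUpdates {
    nodes: Vec<SpatialNode>,
}

/// Result of building a scene; carries the updates to the scene swap.
struct BuiltTransaction {
    spatial_tree_updates: SpatialTreeUpdates,
}

/// Frame-side tree, only changed when updates are applied on scene swap.
#[derive(Default)]
struct SpatialTree {
    nodes: Vec<SpatialNode>,
}

impl SpatialTree {
    fn apply_updates(&mut self, updates: SpatialTreeUpdates) {
        // With full-list updates, applying them replaces the whole tree.
        self.nodes = updates.nodes;
    }
}
```

Replacing the full list with real per-node deltas later would only change `build_updates` and `apply_updates`, not the hand-off between the two sides.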
Also update the capture/replay functionality to (de)serialize both
the retained scene and frame versions of the spatial tree.
Differential Revision: https://phabricator.services.mozilla.com/D127021