Bug 1487105 - Convert graphics documentation to restructured text. r=jrmuizel DONTBUILD

This will allow us to see our design documents on firefox-source-docs. I
ran each markdown file through pandoc, then manually fixed up any issues
I found.

Differential Revision: https://phabricator.services.mozilla.com/D4546

--HG--
rename : gfx/doc/AsyncPanZoom-HighLevel.png => gfx/docs/AsyncPanZoomArchitecture.png
rename : gfx/doc/silkArchitecture.png => gfx/docs/SilkArchitecture.png
extra : source : 05a0118455e4771d95f0667a713e32c5d0b4ee44
This commit is contained in:
Ryan Hunt 2018-08-29 00:33:56 -05:00
parent dbb6b089bd
commit 83dd220a65
14 changed files with 1525 additions and 1021 deletions

@ -1,308 +0,0 @@
Advanced Layers
==============
Advanced Layers is a new method of compositing layers in Gecko. This document serves as a technical
overview and provides a short walk-through of its source code.
Overview
-------------
Advanced Layers attempts to group as many GPU operations as it can into a single draw call. This is
a common technique in GPU-based rendering called "batching". It is not always trivial, as a
batching algorithm can easily waste precious CPU resources trying to build optimal draw calls.
Advanced Layers reuses the existing Gecko layers system as much as possible. Huge layer trees do
not currently scale well (see the future work section), so opportunities for batching are currently
limited without expending unnecessary resources elsewhere. However, Advanced Layers has a few
benefits:
* It submits smaller GPU workloads and buffer uploads than the existing compositor.
* It needs only a single pass over the layer tree.
* It uses occlusion information more intelligently.
* It is easier to add new specialized rendering paths and new layer types.
* It separates compositing logic from device logic, unlike the existing compositor.
* It is much faster at rendering 3d scenes or complex layer trees.
* It has experimental code to use the z-buffer for occlusion culling.
Because of these benefits we hope that it provides a significant improvement over the existing
compositor.
Advanced Layers uses the acronyms "MLG" and "MLGPU" in many places. These stand for "Mid-Level
Graphics", the idea being that it is optimized for Direct3D 11-style rendering systems as opposed
to Direct3D 12 or Vulkan.
LayerManagerMLGPU
------------------------------
Advanced Layers does not change client-side rendering at all. Content still uses Direct2D (when
possible), and creates the same layer trees as it would with a normal Direct3D 11 compositor. In
fact, Advanced Layers re-uses all of the existing texture handling and video infrastructure as
well, replacing only the composite-side layer types.
Advanced Layers does not create a `LayerManagerComposite` - instead, it creates a
`LayerManagerMLGPU`. This layer manager does not have a `Compositor` - instead, it has an
`MLGDevice`, which roughly abstracts the Direct3D 11 API. (The hope is that this API is easily
interchangeable for something else when cross-platform or software support is needed.)
`LayerManagerMLGPU` also dispenses with the old "composite" layers for new layer types. For
example, `ColorLayerComposite` becomes `ColorLayerMLGPU`. Since these layer types implement
`HostLayer`, they integrate with `LayerTransactionParent` as normal composite layers would.
Rendering Overview
----------------------------
The steps for rendering are described in more detail below, but roughly the process is:
1. Sort layers front-to-back.
2. Create a dependency tree of render targets (called "views").
3. Accumulate draw calls for all layers in each view.
4. Upload draw call buffers to the GPU.
5. Execute draw commands for each view.
Advanced Layers divides the layer tree into "views" (`RenderViewMLGPU`), which correspond to a
render target. The root layer is represented by a view corresponding to the screen. Layers that
require intermediate surfaces have temporary views. Layers are analyzed front-to-back, and rendered
back-to-front within a view. Views themselves are rendered front-to-back, to minimize render target
switching.
Each view contains one or more rendering passes (`RenderPassMLGPU`). A pass represents a single
draw command with one or more rendering items attached to it. For example, a `SolidColorPass` item
contains a rectangle and an RGBA value, and many of these can be drawn with a single GPU call.
When considering a layer, views will first try to find an existing rendering batch that can support
it. If so, that pass will accumulate another draw item for the layer. Otherwise, a new pass will be
added.
When trying to find a matching pass for a layer, there is a tradeoff between the CPU time spent
searching and the GPU time saved by not issuing another draw command. We generally care more about
CPU time, so we do not try too hard to match items to an existing batch.
After all layers have been processed, there is a "prepare" step. This copies all accumulated draw
data and uploads it into vertex and constant buffers in the GPU.
Finally, we execute rendering commands. At the end of the frame, all batches and (most) constant
buffers are thrown away.
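The pass-matching step described in this section can be sketched roughly as follows. This is a minimal illustration and not the Gecko implementation: the `Pass` and `View` classes, the `kind` key, and the choice to check only the most recent pass are all assumptions made for the example. Checking only the last pass keeps CPU cost low and trivially preserves draw order.

```python
from dataclasses import dataclass, field

@dataclass
class Pass:
    """One draw command; accumulates items that share a pipeline."""
    kind: str                                   # hypothetical key, e.g. "color"
    items: list = field(default_factory=list)

@dataclass
class View:
    """A render target holding an ordered list of passes."""
    passes: list = field(default_factory=list)

    def add_layer(self, kind, item):
        # Only the most recent pass is considered. Searching further back
        # would cost CPU time and need overlap checks to keep z-order.
        if self.passes and self.passes[-1].kind == kind:
            self.passes[-1].items.append(item)
        else:
            self.passes.append(Pass(kind, [item]))

view = View()
for kind, item in [("color", "A"), ("color", "B"), ("textured", "C"), ("color", "D")]:
    view.add_layer(kind, item)

# Four layers end up in three passes; "A" and "B" share one draw command.
print([(p.kind, p.items) for p in view.passes])
```

A more aggressive matcher could merge "D" into the first color pass, but only after proving that it does not overlap "C", which is the kind of CPU work the text says is usually not worth it.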
Shaders Overview
-------------------------------------
Advanced Layers currently has five layer-related shader pipelines:
- Textured (PaintedLayer, ImageLayer, CanvasLayer)
- ComponentAlpha (PaintedLayer with component-alpha)
- YCbCr (ImageLayer with YCbCr video)
- Color (ColorLayers)
- Blend (ContainerLayers with mix-blend modes)
There are also three special shader pipelines:
- MaskCombiner, which is used to combine mask layers into a single texture.
- Clear, which is used for fast region-based clears when not directly supported by the GPU.
- Diagnostic, which is used to display the diagnostic overlay texture.
The layer shaders follow a unified structure. Each pipeline has a vertex and pixel shader.
The vertex shader takes a layers ID, a z-buffer depth, a unit position in either a unit
square or unit triangle, and either rectangular or triangular geometry. Shaders can also
take any ancillary data they need, such as texture coordinates or colors.
Most of the time, layers have simple rectangular clips with simple rectilinear transforms, and
pixel shaders do not need to perform masking or clipping. For these layers we use a fast-path
pipeline, using unit-quad shaders that are able to clip geometry so the pixel shader does not
have to. This type of pipeline does not support complex masks.
If a layer has a complex mask, a rotation or 3d transform, or a complex operation like blending,
then we use shaders capable of handling arbitrary geometry. Their input is a unit triangle,
and these shaders are generally more expensive.
All of the shader-specific data is modelled in ShaderDefinitionsMLGPU.h.
CPU Occlusion Culling
-------------------------------------
By default, Advanced Layers performs occlusion culling on the CPU. Since layers are visited
front-to-back, this is simply a matter of accumulating the visible region of opaque layers, and
subtracting it from the visible region of subsequent layers. There is a major difference
between this occlusion culling and PostProcessLayers of the old compositor: AL performs culling
after invalidation, not before. Completely valid layers will have an empty visible region.
Most layer types (with the exception of images) will intelligently split their draw calls into a
batch of individual rectangles, based on their visible region.
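The front-to-back accumulation described above can be sketched as follows. This is a simplified model under stated assumptions: regions are represented as sets of unit cells rather than Gecko's region types, and the function and variable names are hypothetical.

```python
def rect_cells(x, y, w, h):
    """Model a region as the set of unit cells a rectangle covers."""
    return {(i, j) for i in range(x, x + w) for j in range(y, y + h)}

def cull_front_to_back(layers):
    """layers: list of (name, region, is_opaque), ordered front-to-back.
    Returns {name: visible region after culling}."""
    occluded = set()              # union of opaque regions visited so far
    visible = {}
    for name, region, is_opaque in layers:
        visible[name] = region - occluded
        if is_opaque:
            occluded |= region
    return visible

layers = [
    ("front", rect_cells(0, 0, 4, 4), True),
    ("glass", rect_cells(2, 2, 4, 4), False),   # transparent: hides nothing
    ("back",  rect_cells(0, 0, 6, 6), True),
]
visible = cull_front_to_back(layers)
print({name: len(cells) for name, cells in visible.items()})
```

Only opaque layers contribute to the occluded region, so the transparent "glass" layer loses the cells hidden by "front" but does not hide any of "back".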
Z-Buffering and Occlusion
-------------------------------------
Advanced Layers also supports occlusion culling on the GPU, using a z-buffer. This is disabled by
default currently since it is significantly costly on integrated GPUs. When using the z-buffer, we
separate opaque layers into a separate list of passes. The render process then uses the following
steps:
1. The depth buffer is set to read-write.
2. Opaque batches are executed.
3. The depth buffer is set to read-only.
4. Transparent batches are executed.
The problem we have observed is that the depth buffer increases writes to the GPU, and on
integrated GPUs this is expensive - we have seen draw call times increase by 20-30%, which is the
wrong direction for battery life. In particular, on a full-screen video, the call to
ClearDepthStencilView plus the actual depth buffer write of the video can double GPU time.
For now the depth-buffer is disabled until we can find a compelling case for it on non-integrated
hardware.
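The four z-buffer steps listed in this section can be illustrated with a tiny software model. This is only a sketch under stated assumptions: it uses a one-dimensional framebuffer, treats a smaller depth value as nearer, and the `render` function and its tuple layout are hypothetical.

```python
def render(width, opaque, transparent):
    """opaque/transparent: lists of (name, x0, x1, depth); smaller is nearer.
    Returns, per pixel, the names of the items that were actually shaded."""
    depth = [float("inf")] * width
    shaded = [[] for _ in range(width)]

    # Steps 1-2: the depth buffer is read-write while opaque batches run.
    for name, x0, x1, z in opaque:
        for x in range(x0, x1):
            if z < depth[x]:
                depth[x] = z
                shaded[x].append(name)

    # Steps 3-4: the depth buffer is read-only while transparent batches run.
    for name, x0, x1, z in transparent:
        for x in range(x0, x1):
            if z < depth[x]:
                shaded[x].append(name)
    return shaded

shaded = render(
    width=4,
    opaque=[("front", 0, 2, 1), ("back", 0, 4, 3)],   # submitted front-to-back
    transparent=[("glass", 1, 4, 2)],
)
print(shaded)
```

Pixels covered by "front" never shade "back" or "glass", which is the saving the depth buffer provides; the cost discussed above comes from the extra depth writes in the opaque phase.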
Clipping
-------------------------------------
Clipping is a bit tricky in Advanced Layers. We cannot use the hardware "scissor" feature, since the
clip can change from instance to instance within a batch. And if using the depth buffer, we
cannot write transparent pixels for the clipped area. As a result we always clip opaque draw rects
in the vertex shader (and sometimes even on the CPU, as is needed for sane texture coordinates).
Only transparent items are clipped in the pixel shader. As a result, masked layers and layers with
non-rectangular transforms are always considered transparent, and use a more flexible clipping
pipeline.
Plane Splitting
---------------------
Plane splitting is when a 3D transform causes a layer to be split - for example, one transparent
layer may intersect another on a separate plane. When this happens, Gecko sorts layers using a BSP
tree and produces a list of triangles instead of draw rects.
These layers cannot use the "unit quad" shaders that support the fast clipping pipeline. Instead
they always use the full triangle-list shaders that support extended vertices and clipping.
This is the slowest path we can take when building a draw call, since we must interact with the
polygon clipping and texturing code.
Masks
---------
For each layer with a mask attached, Advanced Layers builds a `MaskOperation`. These operations
must resolve to a single mask texture, as well as a rectangular area to which the mask applies. All
batched pixel shaders will automatically clip pixels to the mask if a mask texture is bound. (Note
that we must use separate batches if the mask texture changes.)
Some layers have multiple mask textures. In this case, the MaskOperation will store the list of
masks, and right before rendering, it will invoke a shader to combine these masks into a single texture.
MaskOperations are shared across layers when possible, but are not cached across frames.
BigImage Support
--------------------------
ImageLayers and CanvasLayers can be tiled with many individual textures. This happens in rare cases
where the underlying buffer is too big for the GPU. Early on this caused problems for Advanced
Layers, since AL required one texture per layer. We implemented BigImage support by creating
temporary ImageLayers for each visible tile, and throwing those layers away at the end of the
frame.
Advanced Layers no longer has a 1:1 layer:texture restriction, but we retain the temporary layer
solution anyway. It is not much code and it means we do not have to split `TexturedLayerMLGPU`
methods into iterated and non-iterated versions.
Texture Locking
----------------------
Advanced Layers has a different texture locking scheme than the existing compositor. If a texture
needs to be locked, then it is locked by the MLGDevice automatically when bound to the current
pipeline. The MLGDevice keeps a set of the locked textures to avoid double-locking. At the end of
the frame, any textures in the locked set are unlocked.
We cannot easily replicate the locking scheme in the old compositor, since the duration of using
the texture is not scoped to when we visit the layer.
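The lock-on-bind scheme described above might look roughly like this. It is a sketch only: the `Texture` and `Device` classes and their method names are hypothetical stand-ins, not the actual `MLGDevice` interface.

```python
class Texture:
    def __init__(self, name, needs_lock=True):
        self.name = name
        self.needs_lock = needs_lock
        self.lock_count = 0

    def lock(self):
        self.lock_count += 1

    def unlock(self):
        self.lock_count -= 1

class Device:
    """Locks textures when they are bound and releases them at end of frame."""
    def __init__(self):
        self.locked = set()

    def bind_texture(self, texture):
        # The set membership test is what prevents double-locking.
        if texture.needs_lock and texture not in self.locked:
            texture.lock()
            self.locked.add(texture)

    def end_frame(self):
        for texture in self.locked:
            texture.unlock()
        self.locked.clear()

device = Device()
video = Texture("video")
device.bind_texture(video)
device.bind_texture(video)          # second bind does not lock again
count_during_frame = video.lock_count
device.end_frame()
print(count_during_frame, video.lock_count)
```

Because unlocking happens in one place at the end of the frame, the lock lifetime no longer depends on when a layer happens to be visited.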
Buffer Measurements
-------------------------------
Advanced Layers uses constant buffers to send layer information and extended instance data to the
GPU. We do this by pre-allocating large constant buffers and mapping them with `MAP_DISCARD` at the
beginning of the frame. Batches may allocate into this up to the maximum bindable constant buffer
size of the device (currently, 64KB).
There are some downsides to this approach. Constant buffers are difficult to work with - they have
specific alignment requirements, and care must be taken not to run over the maximum number of
constants in a buffer. Another approach would be to store constants in a 2D texture and use vertex
shader texture fetches. Advanced Layers implemented this and benchmarked it to decide which
approach to use. Textures tended to do better on GPU performance but worse on CPU, although this
varied depending on the GPU. Overall constant buffers performed best and most consistently, so we
have kept them.
Additionally, we tested different ways of performing buffer uploads. Buffer creation itself is
costly, especially on integrated GPUs, and especially so for immutable, immediate-upload buffers.
As a result we aggressively cache buffer objects and always allocate them as MAP_DISCARD unless
they are write-once and long-lived.
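As a rough illustration of allocating batches into a large pre-allocated buffer, the sketch below uses a simple bump allocator. The 64KB limit comes from the text above; the `SharedBuffer` class, the `reset` method standing in for a discard-style map, and the 256-byte `ALIGNMENT` are assumptions for the example (the real alignment rule depends on the graphics API and should be checked against its documentation).

```python
MAX_BINDABLE_BYTES = 64 * 1024   # maximum bindable constant buffer size (from the text)
ALIGNMENT = 256                  # assumed offset alignment, for illustration only

class SharedBuffer:
    """Bump allocator over one large pre-allocated buffer (hypothetical)."""
    def __init__(self, size):
        self.size = size
        self.offset = 0

    def allocate(self, num_bytes):
        """Return the aligned start offset, or None if the request cannot fit."""
        if num_bytes > MAX_BINDABLE_BYTES:
            return None              # a single binding cannot exceed the limit
        start = (self.offset + ALIGNMENT - 1) // ALIGNMENT * ALIGNMENT
        if start + num_bytes > self.size:
            return None
        self.offset = start + num_bytes
        return start

    def reset(self):
        """Start of frame: discard old contents and reuse the storage."""
        self.offset = 0

buf = SharedBuffer(size=1024 * 1024)
first = buf.allocate(100)
second = buf.allocate(300)
too_big = buf.allocate(MAX_BINDABLE_BYTES + 1)
print(first, second, too_big)
```

The padding between `first` and `second` is the kind of alignment overhead the text refers to when it calls constant buffers difficult to work with.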
Buffer Types
------------
Advanced Layers has a few different classes to help build and upload buffers to the GPU. They are:
- `MLGBuffer`. This is the low-level shader resource that `MLGDevice` exposes. It is the building
block for buffer helper classes, but it can also be used to make one-off, immutable,
immediate-upload buffers. MLGBuffers, being a GPU resource, are reference counted.
- `SharedBufferMLGPU`. These are large, pre-allocated buffers that are read-only on the GPU and
write-only on the CPU. They usually exceed the maximum bindable buffer size. There are three
shared buffers created by default and they are automatically unmapped as needed: one for vertices,
one for vertex shader constants, and one for pixel shader constants. When callers allocate into a
shared buffer they get back a mapped pointer, a GPU resource, and an offset. When the underlying
device supports offsetable buffers (like `ID3D11DeviceContext1` does), this results in better GPU
utilization, as there are fewer resources and fewer upload commands.
- `ConstantBufferSection` and `VertexBufferSection`. These are "views" into a `SharedBufferMLGPU`.
They contain the underlying `MLGBuffer`, and when offsetting is supported, the offset
information necessary for resource binding. Sections are not reference counted.
- `StagingBuffer`. A dynamically sized CPU buffer where items can be appended in a free-form
manner. The stride of a single "item" is computed by the first item written, and successive
items must have the same stride. The buffer must be uploaded to the GPU manually. Staging buffers
are appropriate for creating general constant or vertex buffer data. They can also write items in
reverse, which is how we render back-to-front when layers are visited front-to-back. They can be
uploaded to a `SharedBufferMLGPU` or an immutable `MLGBuffer` very easily. Staging buffers are not
reference counted.
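The stride and reverse-write behaviour described for `StagingBuffer` can be sketched as follows. This is an illustrative model, not the Gecko class: the method names are hypothetical, and a real implementation would write into a contiguous byte buffer instead of a Python list.

```python
class StagingBuffer:
    """CPU-side buffer of fixed-stride items (illustrative only)."""
    def __init__(self):
        self.stride = None
        self.items = []

    def _check(self, item):
        if self.stride is None:
            self.stride = len(item)       # the first item fixes the stride
        elif len(item) != self.stride:
            raise ValueError("item stride does not match the first item")

    def append(self, item: bytes):
        self._check(item)
        self.items.append(item)

    def prepend(self, item: bytes):
        """Write in reverse so the most recently added item comes first."""
        self._check(item)
        self.items.insert(0, item)

    def data(self) -> bytes:
        return b"".join(self.items)

staging = StagingBuffer()
for layer in [b"FRNT", b"MIDL", b"BACK"]:   # layers visited front-to-back
    staging.prepend(layer)
print(staging.data())                        # draw order is back-to-front
```

Writing in reverse lets the traversal stay front-to-back (which suits occlusion culling) while the uploaded data is already in back-to-front draw order.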
Unsupported Features
--------------------------------
Currently, these features of the old compositor are not yet implemented.
- OpenGL and software support (currently AL only works on D3D11).
- APZ displayport overlay.
- Diagnostic/developer overlays other than the FPS/timing overlay.
- DEAA. It was never ported to the D3D11 compositor, but we would like it.
- Component alpha when used inside an opaque intermediate surface.
- Effects prefs. Possibly not needed post-B2G removal.
- Widget overlays and underlays used by macOS and Android.
- DefaultClearColor. This is Android specific, but is easy to add when needed.
- Frame uniformity info in the profiler. Possibly not needed post-B2G removal.
- LayerScope. There are no plans to make this work.
Future Work
--------------------------------
- Refactor for D3D12/Vulkan support (namely, split MLGDevice into something less stateful and something else more low-level).
- Remove "MLG" moniker and namespace everything.
- Other backends (D3D12/Vulkan, OpenGL, Software)
- Delete CompositorD3D11
- Add DEAA support
- Re-enable the depth buffer by default for fast GPUs
- Re-enable right-sizing of inaccurately sized containers
- Drop constant buffers for ancillary vertex data
- Fast shader paths for simple video/painted layer cases
History
----------
Advanced Layers has gone through four major design iterations. The initial version used tiling -
each render view divided the screen into 128x128 tiles, and layers were assigned to tiles based on
their screen-space draw area. This approach proved not to scale well to 3d transforms, and so
tiling was eliminated.
We replaced it with a simple system of accumulating draw regions to each batch, thus ensuring that
items could be assigned to batches while maintaining correct z-ordering. This second iteration also
coincided with plane-splitting support.
On large layer trees, accumulating the affected regions of batches proved to be quite expensive.
This led to a third iteration, using depth buffers and separate opaque and transparent batch lists
to achieve z-ordering and occlusion culling.
Finally, depth buffers proved to be too expensive, and we introduced a simple CPU-based occlusion
culling pass. This iteration coincided with using more precise draw rects and splitting pipelines
into unit-quad, cpu-clipped and triangle-list, gpu-clipped variants.

@ -1,299 +0,0 @@
Asynchronous Panning and Zooming {#apz}
================================
**This document is a work in progress. Some information may be missing or incomplete.**
## Goals
We need to be able to provide a visual response to user input with minimal latency.
In particular, on devices with touch input, content must track the finger exactly while panning, or the user experience is very poor.
According to the UX team, 120ms is an acceptable latency between user input and response.
## Context and surrounding architecture
The fundamental problem we are trying to solve with the Asynchronous Panning and Zooming (APZ) code is that of responsiveness.
By default, web browsers operate in a "game loop" that looks like this:

    while true:
        process input
        do computations
        repaint content
        display repainted content

In browsers the "do computation" step can be arbitrarily expensive because it can involve running event handlers in web content.
Therefore, there can be an arbitrary delay between the input being received and the on-screen display getting updated.
Responsiveness is always good, and with touch-based interaction it is even more important than with mouse or keyboard input.
In order to ensure responsiveness, we split the "game loop" model of the browser into a multithreaded variant which looks something like this:

    Thread 1 (compositor thread)
    while true:
        receive input
        send a copy of input to thread 2
        adjust painted content based on input
        display adjusted painted content

    Thread 2 (main thread)
    while true:
        receive input from thread 1
        do computations
        repaint content
        update the copy of painted content in thread 1

This multithreaded model is called off-main-thread compositing (OMTC), because the compositing (where the content is displayed on-screen) happens on a separate thread from the main thread.
Note that this is a very simplified model, but in this model the "adjust painted content based on input" step is the primary function of the APZ code.
The "painted content" is stored on a set of "layers", that are conceptually double-buffered.
That is, when the main thread does its repaint, it paints into one set of layers (the "client" layers).
The update that is sent to the compositor thread copies all the changes from the client layers into another set of layers that the compositor holds.
These layers are called the "shadow" layers or the "compositor" layers.
The compositor in theory can continuously composite these shadow layers to the screen while the main thread is busy doing other things and painting a new set of client layers.
The APZ code takes the input events that are coming in from the hardware and uses them to figure out what the user is trying to do (e.g. pan the page, zoom in).
It then expresses this user intention in the form of translation and/or scale transformation matrices.
These transformation matrices are applied to the shadow layers at composite time, so that what the user sees on-screen reflects what they are trying to do as closely as possible.
## Technical overview
As per the heavily simplified model described above, the fundamental purpose of the APZ code is to take input events and produce transformation matrices.
This section attempts to break that down and identify the different problems that make this task non-trivial.
### Checkerboarding
The content area that is painted and stored in a shadow layer is called the "displayport".
The APZ code is responsible for determining how large the displayport should be.
On the one hand, we want the displayport to be as large as possible.
At the very least it needs to be larger than what is visible on-screen, because otherwise, as soon as the user pans, there will be some unpainted area of the page exposed.
However, we cannot always set the displayport to be the entire page, because the page can be arbitrarily long and this would require an unbounded amount of memory to store.
Therefore, a good displayport size is one that is larger than the visible area but not so large that it is a huge drain on memory.
Because the displayport is usually smaller than the whole page, it is always possible for the user to scroll so fast that they end up in an area of the page outside the displayport.
When this happens, they see unpainted content; this is referred to as "checkerboarding", and we try to avoid it where possible.
There are many possible ways to determine what the displayport should be in order to balance the tradeoffs involved (i.e. having one that is too big is bad for memory usage, and having one that is too small results in excessive checkerboarding).
Ideally, the displayport should cover exactly the area that we know the user will make visible.
Although we cannot know this for sure, we can use heuristics based on current panning velocity and direction to ensure a reasonably-chosen displayport area.
This calculation is done in the APZ code, and a new desired displayport is frequently sent to the main thread as the user is panning around.
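One possible shape for such a velocity-based heuristic is sketched below. It is purely illustrative: the function name, the one-dimensional model, and the `base_margin` and `lookahead_ms` parameters are assumptions for the example and are not taken from the APZ code.

```python
def compute_displayport(viewport_top, viewport_height, page_height,
                        velocity, base_margin=200, lookahead_ms=300):
    """Return (top, bottom) of a vertical displayport in pixels.
    velocity is in pixels per millisecond; positive means scrolling down."""
    top = viewport_top - base_margin
    bottom = viewport_top + viewport_height + base_margin
    travel = velocity * lookahead_ms
    if travel > 0:
        bottom += travel      # skew the extra area toward the scroll direction
    else:
        top += travel
    # Clamp to the page so no memory is spent outside the scrollable area.
    return max(0, top), min(page_height, bottom)

idle = compute_displayport(1000, 600, 5000, velocity=0)
scrolling_down = compute_displayport(1000, 600, 5000, velocity=2.0)
print(idle, scrolling_down)
```

The margins bound memory use, while the velocity term spends the extra area where the user is most likely to scroll next.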
### Multiple layers
Consider, for example, a scrollable page that contains an iframe which itself is scrollable.
The iframe can be scrolled independently of the top-level page, and we would like both the page and the iframe to scroll responsively.
This means that we want independent asynchronous panning for both the top-level page and the iframe.
In addition to iframes, elements that have the overflow:scroll CSS property set are also scrollable, and also end up on separate scrollable layers.
In the general case, the layers are arranged in a tree structure, and so within the APZ code we have a matching tree of AsyncPanZoomController (APZC) objects, one for each scrollable layer.
To manage this tree of APZC instances, we have a single APZCTreeManager object.
Each APZC is relatively independent and handles the scrolling for its associated layer, but there are some cases in which they need to interact; these cases are described in the sections below.
### Hit detection
Consider again the case where we have a scrollable page that contains an iframe which itself is scrollable.
As described above, we will have two APZC instances - one for the page and one for the iframe.
When the user puts their finger down on the screen and moves it, we need to do some sort of hit detection in order to determine whether their finger is on the iframe or on the top-level page.
Based on where their finger lands, the appropriate APZC instance needs to handle the input.
This hit detection is also done in the APZCTreeManager, as it has the necessary information about the sizes and positions of the layers.
Currently this hit detection is not perfect, as it uses rects and does not account for things like rounded corners and opacity.
Also note that for some types of input (e.g. when the user puts two fingers down to do a pinch) we do not want the input to be "split" across two different APZC instances.
In the case of a pinch, for example, we find a "common ancestor" APZC instance - one that is zoomable and contains all of the touch input points, and direct the input to that APZC instance.
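A rect-based hit test over a tree of scrollable layers could be sketched like this. The `Apzc` class here is a hypothetical stand-in with only a name, a screen-space rect, and children; it ignores transforms, clips, and the multi-touch common-ancestor case described above.

```python
class Apzc:
    """Node in a tree of scrollable layers (hypothetical structure)."""
    def __init__(self, name, rect, children=()):
        self.name = name
        self.rect = rect                  # (x, y, width, height) in screen space
        self.children = list(children)    # ordered back-to-front

    def contains(self, x, y):
        rx, ry, rw, rh = self.rect
        return rx <= x < rx + rw and ry <= y < ry + rh

def hit_test(node, x, y):
    """Return the deepest, front-most node whose rect contains the point."""
    if not node.contains(x, y):
        return None
    for child in reversed(node.children):     # try the front-most child first
        hit = hit_test(child, x, y)
        if hit is not None:
            return hit
    return node

iframe = Apzc("iframe", (50, 100, 200, 200))
page = Apzc("page", (0, 0, 400, 600), [iframe])
print(hit_test(page, 60, 150).name, hit_test(page, 10, 10).name)
```

Because only rects are tested, a touch on a rounded-off corner of the iframe would still hit the iframe, which is the imperfection noted in the text.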
### Scroll Handoff
Consider yet again the case where we have a scrollable page that contains an iframe which itself is scrollable.
Say the user scrolls the iframe so that it reaches the bottom.
If the user continues panning on the iframe, the expectation is that the top-level page will start scrolling.
However, as discussed in the section on hit detection, the APZC instance for the iframe is separate from the APZC instance for the top-level page.
Thus, we need the two APZC instances to communicate in some way such that input events on the iframe result in scrolling on the top-level page.
This behaviour is referred to as "scroll handoff" (or "fling handoff" in the case where analogous behaviour results from the scrolling momentum of the page after the user has lifted their finger).
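The handoff idea can be sketched as a chain in which each scroller consumes what it can and passes the remainder to its parent. This is a minimal one-dimensional model with a hypothetical `Scroller` class; it does not reflect how the APZC instances actually communicate.

```python
class Scroller:
    def __init__(self, name, offset, max_offset, parent=None):
        self.name = name
        self.offset = offset
        self.max_offset = max_offset
        self.parent = parent

    def scroll_by(self, delta):
        """Consume as much of delta as possible, then hand off the rest."""
        new_offset = min(max(self.offset + delta, 0), self.max_offset)
        remaining = delta - (new_offset - self.offset)
        self.offset = new_offset
        if remaining and self.parent is not None:
            self.parent.scroll_by(remaining)

page = Scroller("page", offset=0, max_offset=1000)
iframe = Scroller("iframe", offset=180, max_offset=200, parent=page)
iframe.scroll_by(50)      # iframe takes 20 pixels, page receives the other 30
print(iframe.offset, page.offset)
```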
### Input event untransformation
The APZC architecture by definition results in two copies of a "scroll position" for each scrollable layer.
There is the original copy on the main thread that is accessible to web content and the layout and painting code.
And there is a second copy on the compositor side, which is updated asynchronously based on user input, and corresponds to what the user visually sees on the screen.
Although these two copies may diverge temporarily, they are reconciled periodically.
In particular, they diverge while the APZ code is performing an async pan or zoom action on behalf of the user, and are reconciled when the APZ code requests a repaint from the main thread.
Because of the way input events are stored, this has some unfortunate consequences.
Input events are stored relative to the device screen - so if the user touches at the same physical spot on the device, the same input events will be delivered regardless of the content scroll position.
When the main thread receives a touch event, it combines that with the content scroll position in order to figure out what DOM element the user touched.
However, because we now have two different scroll positions, this process may not work perfectly.
A concrete example follows:
Consider a device with screen size 600 pixels tall.
On this device, a user is viewing a document that is 1000 pixels tall, and that is scrolled down by 200 pixels.
That is, the vertical section of the document from 200px to 800px is visible.
Now, if the user touches a point 100px from the top of the physical display, the hardware will generate a touch event with y=100.
This will get sent to the main thread, which will add the scroll position (200) and get a document-relative touch event with y=300.
This new y-value will be used in hit detection to figure out what the user touched.
If the document had an absolutely positioned div at y=300, then that would receive the touch event.
Now let us add some async scrolling to this example.
Say that the user additionally scrolls the document by another 10 pixels asynchronously (i.e. only on the compositor thread), and then does the same touch event.
The same input event is generated by the hardware, and as before, the document will deliver the touch event to the div at y=300.
However, visually, the document is scrolled by an additional 10 pixels so this outcome is wrong.
What needs to happen is that the APZ code needs to intercept the touch event and account for the 10 pixels of asynchronous scroll.
Therefore, the input event with y=100 gets converted to y=110 in the APZ code before being passed on to the main thread.
The main thread then adds the scroll position it knows about and determines that the user touched at a document-relative position of y=310.
Analogous input event transformations need to be done for horizontal scrolling and zooming.
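The arithmetic in the example above can be written out directly. The two function names are hypothetical and the sketch handles only vertical scrolling, with no zoom.

```python
def untransform(screen_y, async_scroll_y):
    """APZ side: fold the compositor-only scroll into the event coordinate."""
    return screen_y + async_scroll_y

def to_document(event_y, main_thread_scroll_y):
    """Main-thread side: add the scroll position that content knows about."""
    return event_y + main_thread_scroll_y

# Touch at y=100, main-thread scroll of 200, extra async scroll of 10.
without_apz = to_document(100, 200)                      # hits the div at y=300
with_apz = to_document(untransform(100, 10), 200)        # matches what the user sees
print(without_apz, with_apz)
```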
### Content independently adjusting scrolling
As described above, there are two copies of the scroll position in the APZ architecture - one on the main thread and one on the compositor thread.
Usually for architectures like this, there is a single "source of truth" value and the other value is simply a copy.
However, in this case that is not easily achievable.
The reason is that both of these values can be legitimately modified.
On the compositor side, the input events the user is triggering modify the scroll position, which is then propagated to the main thread.
However, on the main thread, web content might be running JavaScript code that programmatically sets the scroll position (via window.scrollTo, for example).
Scroll changes driven from the main thread are just as legitimate and need to be propagated to the compositor thread, so that the visual display updates in response.
Because the cross-thread messaging is asynchronous, reconciling the two types of scroll changes is a tricky problem.
Our design solves this using various flags and generation counters.
The general heuristic we have is that content-driven scroll position changes (e.g. scrollTo from JS) are never lost.
For instance, if the user is doing an async scroll with their finger and content does a scrollTo in the middle, then some of the async scroll would occur before the "jump" and the rest after the "jump".
### Content preventing default behaviour of input events
Another problem that we need to deal with is that web content is allowed to intercept touch events and prevent the "default behaviour" of scrolling.
This ability is defined in web standards and is non-negotiable.
Touch event listeners in web content are allowed to call preventDefault() on the touchstart or first touchmove event for a touch point; doing this is supposed to "consume" the event and prevent touch-based panning.
As we saw in a previous section, the input event needs to be untransformed by the APZ code before it can be delivered to content.
But, because of the preventDefault problem, we cannot fully process the touch event in the APZ code until content has had a chance to handle it.
Web browsers in general solve this problem by inserting a delay of up to 300ms before processing the input - that is, web content is allowed up to 300ms to process the event and call preventDefault on it.
If web content takes longer than 300ms, or if it completes handling of the event without calling preventDefault, then the browser immediately starts processing the events.
The way the APZ implementation deals with this is that upon receiving a touch event, it immediately returns an untransformed version that can be dispatched to content.
It also schedules a 400ms timeout (600ms on Android) during which content is allowed to prevent scrolling.
There is an API that allows the main-thread event dispatching code to notify the APZ as to whether or not the default action should be prevented.
If the APZ content response timeout expires, or if the main-thread event dispatching code notifies the APZ of the preventDefault status, then the APZ continues with the processing of the events (which may involve discarding the events).
The touch-action CSS property from the pointer-events spec is intended to allow eliminating this 400ms delay in many cases (although for backwards compatibility it will still be needed for a while).
Note that even with touch-action implemented, there may be cases where the APZ code does not know the touch-action behaviour of the point the user touched.
In such cases, the APZ code will still wait up to 400ms for the main thread to provide it with the touch-action behaviour information.
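The wait-for-content behaviour described in this section can be modelled as a small state machine. This is a sketch under stated assumptions: the `InputBlock` class and its methods are hypothetical, time is passed in explicitly instead of using a real timer, and the rule that a response arriving after the timeout is ignored is an assumption of this example.

```python
CONTENT_RESPONSE_TIMEOUT_MS = 400   # value from the text (600ms on Android)

class InputBlock:
    """Queued events waiting on content's preventDefault decision."""
    def __init__(self, start_ms):
        self.start_ms = start_ms
        self.prevent_default = None       # None means content has not answered

    def set_content_response(self, prevent_default):
        if self.prevent_default is None:
            self.prevent_default = prevent_default

    def decide(self, now_ms):
        """Return 'wait', 'process', or 'discard'."""
        if self.prevent_default is True:
            return "discard"
        if self.prevent_default is False:
            return "process"
        if now_ms - self.start_ms >= CONTENT_RESPONSE_TIMEOUT_MS:
            self.prevent_default = False  # timed out: assume not prevented
            return "process"
        return "wait"

slow = InputBlock(start_ms=0)
early = slow.decide(now_ms=100)
late = slow.decide(now_ms=450)
slow.set_content_response(True)           # arrives after the timeout
after = slow.decide(now_ms=460)

cancelled = InputBlock(start_ms=0)
cancelled.set_content_response(True)
print(early, late, after, cancelled.decide(now_ms=50))
```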
## Technical details
This section describes various pieces of the APZ code, and goes into more specific detail on APIs and code than the previous sections.
The primary purpose of this section is to help people who plan on making changes to the code, while also not going into so much detail that it needs to be updated with every patch.
### Overall flow of input events
This section describes how input events flow through the APZ code.
<ol>
<li value="1">
Input events arrive from the hardware/widget code into the APZ via APZCTreeManager::ReceiveInputEvent.
The thread that invokes this is called the input thread, and may or may not be the same as the Gecko main thread.
</li>
<li value="2">
Conceptually the first thing that the APZCTreeManager does is to associate these events with "input blocks".
An input block is a set of events that share certain properties, and generally are intended to represent a single gesture.
For example with touch events, all events following a touchstart up to but not including the next touchstart are in the same block.
All of the events in a given block will go to the same APZC instance and will either all be processed or all be dropped.
</li>
<li value="3">
Using the first event in the input block, the APZCTreeManager does a hit-test to see which APZC it hits.
This hit-test uses the event regions populated on the layers, which may be larger than the true hit area of the layer.
If no APZC is hit, the events are discarded and we jump to step 6.
Otherwise, the input block is tagged with the hit APZC as a tentative target and put into a global APZ input queue.
</li>
<li value="4">
<ol>
<li value="i">
If the input events landed outside the dispatch-to-content event region for the layer, any available events in the input block are processed.
These may trigger behaviours like scrolling or tap gestures.
</li>
<li value="ii">
If the input events landed inside the dispatch-to-content event region for the layer, the events are left in the queue and a 400ms timeout is initiated.
If the timeout expires before step 9 is completed, the APZ assumes the input block was not cancelled and the tentative target is correct, and processes them as part of step 10.
</li>
</ol>
</li>
<li value="5">
The call stack unwinds back to APZCTreeManager::ReceiveInputEvent, which does an in-place modification of the input event so that any async transforms are removed.
</li>
<li value="6">
The call stack unwinds back to the widget code that called ReceiveInputEvent.
This code now has the event in the coordinate space Gecko is expecting, and so can dispatch it to the Gecko main thread.
</li>
<li value="7">
Gecko performs its own usual hit-testing and event dispatching for the event.
As part of this, it records whether any touch listeners cancelled the input block by calling preventDefault().
It also activates inactive scrollframes that were hit by the input events.
</li>
<li value="8">
The call stack unwinds back to the widget code, which sends two notifications to the APZ code on the input thread.
The first notification is via APZCTreeManager::ContentReceivedInputBlock, and informs the APZ whether the input block was cancelled.
The second notification is via APZCTreeManager::SetTargetAPZC, and informs the APZ of the results of the Gecko hit-test during event dispatch.
Note that Gecko may report that the input event did not hit any scrollable frame at all.
The SetTargetAPZC notification happens only once per input block, while the ContentReceivedInputBlock notification may happen once per block, or multiple times per block, depending on the input type.
</li>
<li value="9">
<ol>
<li value="i">
If the events were processed as part of step 4(i), the notifications from step 8 are ignored and step 10 is skipped.
</li>
<li value="ii">
If events were queued as part of step 4(ii), and steps 5-8 take less than 400ms, the arrival of both notifications from step 8 will mark the input block ready for processing.
</li>
<li value="iii">
If events were queued as part of step 4(ii), but steps 5-8 take longer than 400ms, the notifications from step 8 will be ignored and step 10 will already have happened.
</li>
</ol>
</li>
<li value="10">
If events were queued as part of step 4(ii) they are now either processed (if the input block was not cancelled and Gecko detected a scrollframe under the input event, or if the timeout expired) or dropped (all other cases).
Note that the APZC that processes the events may be different at this step than the tentative target from step 3, depending on the SetTargetAPZC notification.
Processing the events may trigger behaviours like scrolling or tap gestures.
</li>
</ol>
If the CSS touch-action property is enabled, the above steps are modified as follows:
<ul>
<li>
In step 4, the APZC also requires the allowed touch-action behaviours for the input event.
This might have been determined as part of the hit-test in APZCTreeManager; if not, the events are queued.
</li>
<li>
In step 6, the widget code determines the content element at the point under the input event, and notifies the APZ code of the allowed touch-action behaviours.
This notification is sent via a call to APZCTreeManager::SetAllowedTouchBehavior on the input thread.
</li>
<li>
In step 9(ii), the input block will only be marked ready for processing once all three notifications arrive.
</li>
</ul>
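The readiness rules in steps 4(ii), 9 and 10 can be sketched as a small state holder. This is only an illustration under simplifying assumptions: `InputBlockSketch`, `kNoTarget` and the method names are hypothetical and are not the real APZ classes, APZCs are reduced to integer ids, and the touch-action notification is left out.

```cpp
#include <cassert>

// Hypothetical sentinel meaning "no APZC"; not a real Gecko constant.
constexpr int kNoTarget = 0;

// Simplified model of one input block waiting in the queue (step 4(ii)).
struct InputBlockSketch {
  int tentativeTarget = kNoTarget;  // APZC id from the APZ hit-test (step 3)
  bool hasContentResponse = false;  // ContentReceivedInputBlock arrived
  bool prevented = false;           // content called preventDefault()
  bool hasConfirmedTarget = false;  // SetTargetAPZC arrived
  int confirmedTarget = kNoTarget;  // kNoTarget: Gecko hit no scrollable frame
  bool timedOut = false;            // content response timeout fired first

  void OnContentResponse(bool aPrevented) {
    hasContentResponse = true;
    prevented = aPrevented;
  }
  void OnConfirmedTarget(int aTarget) {
    hasConfirmedTarget = true;
    confirmedTarget = aTarget;
  }
  bool BothNotificationsArrived() const {
    return hasContentResponse && hasConfirmedTarget;
  }
  // Step 9(iii): the timeout only matters if notifications are still missing.
  void OnTimeout() {
    if (!BothNotificationsArrived()) timedOut = true;
  }
  bool IsReady() const { return timedOut || BothNotificationsArrived(); }

  // Step 10: returns the APZC id that processes the block, or kNoTarget
  // if the block is dropped.
  int ResolveTarget() const {
    assert(IsReady());
    if (timedOut) return tentativeTarget;  // late notifications are ignored
    if (prevented) return kNoTarget;       // content cancelled the block
    return confirmedTarget;                // may differ from tentativeTarget
  }
};
```

With touch-action enabled, the same idea would extend to a third flag that must also be set before `BothNotificationsArrived()` (renamed accordingly) returns true.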
#### Threading considerations
The bulk of the input processing in the APZ code happens on what we call "the input thread".
In practice the input thread could be the Gecko main thread, the compositor thread, or some other thread.
There are obvious downsides to using the Gecko main thread - that is, "asynchronous" panning and zooming is not really asynchronous as input events can only be processed while Gecko is idle.
In an e10s environment, using the Gecko main thread of the chrome process is acceptable, because the code running in that process is more controllable and well-behaved than arbitrary web content.
Using the compositor thread as the input thread could work on some platforms, but may be inefficient on others.
For example, on Android (Fennec) we receive input events from the system on a dedicated UI thread.
We would have to redispatch the input events to the compositor thread if we wanted the input thread to be the same as the compositor thread.
This introduces a potential for higher latency, particularly if the compositor does any blocking operations - blocking SwapBuffers operations, for example.
As a result, the APZ code itself does not assume that the input thread will be the same as the Gecko main thread or the compositor thread.
#### Active vs. inactive scrollframes
The number of scrollframes on a page is potentially unbounded.
However, we do not want to create a separate layer for each scrollframe right away, as this would require large amounts of memory.
Therefore, scrollframes are designated as either "active" or "inactive".
Active scrollframes are the ones that do have their contents put on a separate layer (or set of layers), and inactive ones do not.
Consider a page with a scrollframe that is initially inactive.
When layout generates the layers for this page, the content of the scrollframe will be flattened into some other PaintedLayer (call it P).
The layout code also adds the area (or bounding region in case of weird shapes) of the scrollframe to the dispatch-to-content region of P.
When the user starts interacting with that content, the hit-test in the APZ code finds the dispatch-to-content region of P.
The input block therefore has a tentative target of P when it goes into step 4(ii) in the flow above.
When Gecko processes the input event, it must detect the inactive scrollframe and activate it, as part of step 7.
Finally, the widget code sends the SetTargetAPZC notification in step 8 to notify the APZ that the input block should really apply to this new layer.
The issue here is that the layer transaction containing the new layer must reach the compositor and APZ before the SetTargetAPZC notification.
If this does not occur within the 400ms timeout, the APZ code will be unable to update the tentative target, and will continue to use P for that input block.
Input blocks that start after the layer transaction will get correctly routed to the new layer as there will now be a layer and APZC instance for the active scrollframe.
This model implies that when the user initially attempts to scroll an inactive scrollframe, it may end up scrolling an ancestor scrollframe.
(This is because in the absence of the SetTargetAPZC notification, the input events will get applied to the closest ancestor scrollframe's APZC.)
Only after the round-trip to the Gecko main thread is complete is there a layer for async scrolling to actually occur on the scrollframe itself.
At that point the scrollframe will start receiving new input blocks and will scroll normally.
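The ancestor fallback described above can be illustrated with a toy lookup. This is a sketch under stated assumptions, not the APZ hit-testing code: `ScrollFrameSketch` and `NearestActiveAncestor` are hypothetical names, and scrollframes are modelled as a flat array with parent indices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified scrollframe record.
struct ScrollFrameSketch {
  int parent;   // index of the parent scrollframe, -1 for the root
  bool active;  // true if the frame has its own layer and APZC
};

// Walks up from the scrollframe under the input event to the nearest one
// that is active, i.e. the one whose APZC would actually scroll.
// Returns -1 if no scrollframe in the chain is active.
int NearestActiveAncestor(const std::vector<ScrollFrameSketch>& aFrames,
                          int aHit) {
  assert(aHit >= 0 && static_cast<std::size_t>(aHit) < aFrames.size());
  for (int i = aHit; i != -1; i = aFrames[i].parent) {
    if (aFrames[i].active) return i;
  }
  return -1;
}
```

Before activation, a hit on an inactive child resolves to its active ancestor; once the child has been activated and its layer has reached the compositor, the same hit resolves to the child itself.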

View File

@ -1,83 +0,0 @@
Mozilla Graphics Overview {#graphicsoverview}
=================
## Work in progress. Possibly incorrect or incomplete.
Overview
--------
The graphics system is responsible for rendering (painting, drawing) the frame tree (rendering tree) elements as created by the layout system. Each leaf in the tree has content, bounded by a rectangle (or perhaps another shape, in the case of SVG).
The simple approach for producing the result would thus involve traversing the frame tree, in a correct order, drawing each frame into the resulting buffer and displaying (printing notwithstanding) that buffer when the traversal is done. It is worth spending some time on the "correct order" note above. If there are no overlapping frames, this is fairly simple - any order will do, as long as there is no background. If there is background, we just have to worry about drawing that first. Since we do not control the content, chances are the page is more complicated. There are overlapping frames, likely with transparency, so we need to make sure the elements are drawn "back to front", in layers, so to speak. Layers are an important concept, and we will revisit them shortly, as they are central to fixing a major issue with the above simple approach.
While the above simple approach will work, the performance will suffer. Each time anything changes in any of the frames, the complete process needs to be repeated, everything needs to be redrawn. Further, there is very little space to take advantage of the modern graphics (GPU) hardware, or multi-core computers. If you recall from the previous sections, the frame tree is only accessible from the UI thread, so while we're doing all this work, the UI is basically blocked.
### (Retained) Layers
The layers framework was introduced to address the above performance issues, by having a part of the design address each item. At the high level:
1. We create a layer tree. The leaf elements of the tree contain all frames (possibly multiple frames per leaf).
2. We render each layer tree element and cache (retain) the result.
3. We composite (combine) all the leaf elements into the final result.
Let's examine each of these steps, in reverse order.
### Compositing
We use the term composite as it implies that the order is important. If the elements being composited overlap, whether there is transparency involved or not, the order in which they are combined will affect the result.
Compositing is where we can use some of the power of the modern graphics hardware. It is optimal for doing this job. In the scenarios where only the position of individual frames changes, without the content inside them changing, we see why caching each layer would be advantageous - we only need to repeat the final compositing step, completely skipping the layer tree creation and the rendering of each leaf, thus speeding up the process considerably.
Another benefit is equally apparent in the context of the stated deficiencies of the simple approach. We can use the available graphics hardware accelerated APIs to do the compositing step. Direct3D, OpenGL can be used on different platforms and are well suited to accelerate this step.
Finally, we can now envision performing the compositing step on a separate thread, unblocking the UI thread for other work, and doing more work in parallel. More on this below.
It is important to note that the number of operations in this step is proportional to the number of layer tree (leaf) elements, so there is additional work and complexity involved, when the layer tree is large.
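To make the retain-then-composite idea concrete, here is a toy model in which a "layer" is a run of characters on a one-dimensional canvas. Everything in it is an assumption made for illustration (`LayerSketch`, `Composite`, the character canvas); it does not reflect the real layers API. It shows two things: layers are combined back to front, and moving a layer only repeats the compositing step, not the painting.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical layer: opaque content at a horizontal offset, with a
// retained (cached) rendering.
struct LayerSketch {
  int z;                // lower z is further back
  int x;                // position on the canvas
  std::string content;  // one character per "pixel"
  std::string cache;    // retained result of painting
  bool dirty;           // true if content changed since the last paint
  int paintCount;       // how many times the expensive paint ran

  LayerSketch(int aZ, int aX, std::string aContent)
      : z(aZ), x(aX), content(std::move(aContent)), dirty(true),
        paintCount(0) {}

  void PaintIfNeeded() {
    if (!dirty) return;
    cache = content;  // stand-in for expensive rasterization
    dirty = false;
    ++paintCount;
  }
};

// Combines all layers back to front; later (front) layers overwrite
// earlier (back) ones where they overlap.
std::string Composite(std::vector<LayerSketch>& aLayers, int aWidth) {
  std::stable_sort(aLayers.begin(), aLayers.end(),
                   [](const LayerSketch& a, const LayerSketch& b) {
                     return a.z < b.z;
                   });
  std::string out(static_cast<std::size_t>(aWidth), '.');
  for (LayerSketch& layer : aLayers) {
    layer.PaintIfNeeded();
    for (std::size_t i = 0; i < layer.cache.size(); ++i) {
      int px = layer.x + static_cast<int>(i);
      if (px >= 0 && px < aWidth) out[static_cast<std::size_t>(px)] = layer.cache[i];
    }
  }
  return out;
}
```

Changing only a layer's `x` and calling `Composite` again leaves every `paintCount` unchanged, which is the saving that retained layers are meant to provide.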
#### Render and retain layer elements
As we saw, the compositing step benefits from caching the intermediate result. This does result in extra memory usage, which needs to be considered during the layer tree creation. Beyond the caching, we can accelerate the rendering of each element by (indirectly) using the available platform APIs (e.g., Direct2D, CoreGraphics, even some of the 3D APIs like OpenGL or Direct3D) as available. This is actually done through a platform independent API (see Moz2D below), but it is important to realize that it does get accelerated appropriately.
#### Creating the layer tree
We need to create a layer tree (from the frames tree), which will give us the correct result while striking the right balance between a layer per frame element and a single layer for the complete frames tree. As was mentioned above, there is an overhead in traversing the whole tree and caching each of the elements, balanced by the performance improvements. Some of the performance improvements are only noticed when something changes (e.g., one element is moving, we only need to redo the compositing step).
### Refresh Driver
### Layers
#### Rendering each layer
### Tiling vs. Buffer Rotation vs. Full paint
#### Compositing for the final result
### Graphics API
#### Moz2D
* The Moz2D graphics API, part of the Azure project, is a cross-platform interface onto the various graphics backends that Gecko uses for rendering such as Direct2D (1.0 and 1.1), Skia, Cairo, Quartz, and NV Path. Adding a new graphics platform to Gecko is accomplished by adding a backend to Moz2D.
\see [Moz2D documentation on wiki](https://wiki.mozilla.org/Platform/GFX/Moz2D)
#### Compositing
#### Image Decoding
#### Image Animation
### Funny words
There are a lot of code words that we use to refer to projects, libraries, areas of the code. Here's an attempt to cover some of those:
* Azure - See Moz2D in the Graphics API section above.
* Backend - See Moz2D in the Graphics API section above.
* Cairo - http://www.cairographics.org/. Cairo is a 2D graphics library with support for multiple output devices. Currently supported output targets include the X Window System (via both Xlib and XCB), Quartz, Win32, image buffers, PostScript, PDF, and SVG file output.
* Moz2D - See Moz2D in the Graphics API section above.
* Thebes - Graphics API that preceded Moz2D.
* Reflow
* Display list
### [Historical Documents](http://www.youtube.com/watch?v=lLZQz26-kms)
A number of posts and blogs that will give you more details or more background, or reasoning that led to different solutions and approaches.
* 2010-01 [Layers: Cross Platform Acceleration](http://www.basschouten.com/blog1.php/layers-cross-platform-acceleration)
* 2010-04 [Layers](http://robert.ocallahan.org/2010/04/layers_01.html)
* 2010-07 [Retained Layers](http://robert.ocallahan.org/2010/07/retained-layers_16.html)
* 2011-04 [Introduction](https://blog.mozilla.org/joe/2011/04/26/introducing-the-azure-project/ Moz2D)
* 2011-07 [Layers](http://chrislord.net/index.php/2011/07/25/shadow-layers-and-learning-by-failing/ Shadow)
* 2011-09 [Graphics API Design](http://robert.ocallahan.org/2011/09/graphics-api-design.html)
* 2012-04 [Moz2D Canvas on OSX](http://muizelaar.blogspot.ca/2012/04/azure-canvas-on-os-x.html)
* 2012-05 [Mask Layers](http://featherweightmusings.blogspot.co.uk/2012/05/mask-layers_26.html)
* 2013-07 [Graphics related](http://www.basschouten.com/blog1.php)

View File

@ -1,60 +0,0 @@
This is an overview of the major events in the history of our Layers infrastructure.
- iPhone released in June 2007 (Built on a toolkit called LayerKit)
- Core Animation (October 2007) LayerKit was publicly released as Core Animation in OS X 10.5
- Webkit CSS 3d transforms (July 2009)
- Original layers API (March 2010) Introduced the idea of a layer manager that
would composite. One of the first use cases for this was hardware accelerated
YUV conversion for video.
- Retained layers (July 7 2010 - Bug 564991)
This was an important concept that introduced the idea of persisting the layer
content across paints in gecko controlled buffers instead of just by the OS. This introduced
the concept of buffer rotation to deal with scrolling instead of using the
native scrolling APIs like ScrollWindowEx
- Layers IPC (July 2010 - Bug 570294)
This introduced shadow layers and edit lists and was originally done for e10s v1
- 3d transforms (September 2011 - Bug 505115)
- OMTC (December 2011 - Bug 711168)
This was prototyped on OS X but shipped first for Fennec
- Tiling v1 (April 2012 - Bug 739679)
Originally done for Fennec.
This was done to avoid situations where we had to do a bunch of work for
scrolling a small amount. i.e. buffer rotation. It allowed us to have a
variety of interesting features like progressive painting and lower resolution
painting.
- C++ Async pan zoom controller (July 2012 - Bug 750974)
The existing APZ code was in Java for Fennec so this was reimplemented.
- Streaming WebGL Buffers (February 2013 - Bug 716859)
Infrastructure to allow OMTC WebGL and avoid the need to glFinish() every
frame.
- Compositor API (April 2013 - Bug 825928)
The planning for this started around November 2012.
Layers refactoring created a compositor API that abstracted away the differences between
D3D and OpenGL. The main piece of API is DrawQuad.
- Tiling v2 (Mar 7 2014 - Bug 963073)
Tiling for B2G. This work is mainly porting tiled layers to new textures,
implementing double-buffered tiles and implementing a texture client pool, to
be used by tiled content clients.
A large motivation for the pool was the very slow performance of allocating tiles because
of the sync messages to the compositor.
The slow performance of allocating was directly addressed by bug 959089 which allowed us
to allocate gralloc buffers without sync messages to the compositor thread.
- B2G WebGL performance (May 2014 - Bug 1006957, 1001417, 1024144)
This work improved the synchronization mechanism between the compositor
and the producer.

View File

@ -1,21 +0,0 @@
Mozilla Graphics {#mainpage}
======================
## Work in progress. Possibly incorrect or incomplete.
Introduction
-------
This collection of linked pages contains a combination of Doxygen
extracted source code documentation and design documents for the
Mozilla graphics architecture. The design documents live in the gfx/docs directory.
This [wiki page](https://wiki.mozilla.org/Platform/GFX) contains
information about graphics and the graphics team at MoCo.
Continue here for a [very high level introductory overview](@ref graphicsoverview)
if you don't know where to start.
Useful pointers for creating documentation
------
[The mechanics of creating these files](https://wiki.mozilla.org/Platform/GFX/DesignDocumentationGuidelines)

View File

@ -1,246 +0,0 @@
Silk Architecture Overview
=================
#Architecture
Our current architecture is to align three components to hardware vsync timers:
1. Compositor
2. RefreshDriver / Painting
3. Input Events
The flow of our rendering engine is as follows:
1. Hardware Vsync event occurs on an OS specific *Hardware Vsync Thread* on a per monitor basis.
2. The *Hardware Vsync Thread* attached to the monitor notifies the **CompositorVsyncDispatchers** and **RefreshTimerVsyncDispatcher**.
3. For every Firefox window on the specific monitor, notify a **CompositorVsyncDispatcher**. The **CompositorVsyncDispatcher** is specific to one window.
4. The **CompositorVsyncDispatcher** notifies a **CompositorWidgetVsyncObserver** when remote compositing, or a **CompositorVsyncScheduler::Observer** when compositing in-process.
5. If remote compositing, a vsync notification is sent from the **CompositorWidgetVsyncObserver** to the **VsyncBridgeChild** on the UI process, which sends an IPDL message to the **VsyncBridgeParent** on the compositor thread of the GPU process, which then dispatches to **CompositorVsyncScheduler::Observer**.
6. The **RefreshTimerVsyncDispatcher** notifies the Chrome **RefreshTimer** that a vsync has occurred.
7. The **RefreshTimerVsyncDispatcher** sends IPC messages to all content processes to tick their respective active **RefreshTimer**.
8. The **Compositor** dispatches input events on the *Compositor Thread*, then composites. Input events are only dispatched on the *Compositor Thread* on b2g.
9. The **RefreshDriver** paints on the *Main Thread*.
The implementation is broken into the following sections and will reference this figure. Note that **Objects** are bold fonts while *Threads* are italicized.
<img src="silkArchitecture.png" width="900px" height="630px" />
#Hardware Vsync
Hardware vsync events from (1), occur on a specific **Display** Object.
The **Display** object is responsible for enabling / disabling vsync on a per connected display basis.
For example, if two monitors are connected, two **Display** objects will be created, each listening to vsync events for their respective displays.
We require one **Display** object per monitor as each monitor may have different vsync rates.
As a fallback solution, we have one global **Display** object that can synchronize across all connected displays.
The global **Display** is useful if a window is positioned halfway between the two monitors.
Each platform will have to implement a specific **Display** object to hook and listen to vsync events.
As of this writing, both Firefox OS and OS X create their own hardware specific *Hardware Vsync Thread* that executes after a vsync has occurred.
OS X creates one *Hardware Vsync Thread* per **CVDisplayLinkRef**.
We do not currently support multiple displays, so we use one global **CVDisplayLinkRef** that works across all active displays.
On Windows, we have to create a new platform *thread* that waits for DwmFlush(), which works across all active displays.
Once the thread wakes up from DwmFlush(), the actual vsync timestamp is retrieved from DwmGetCompositionTimingInfo(), which is the timestamp that is actually passed into the compositor and refresh driver.
When a vsync occurs on a **Display**, the *Hardware Vsync Thread* callback fetches all **CompositorVsyncDispatchers** associated with the **Display**.
Each **CompositorVsyncDispatcher** is notified that a vsync has occurred with the vsync's timestamp.
It is the responsibility of the **CompositorVsyncDispatcher** to notify the **Compositor** that is awaiting vsync notifications.
The **Display** will then notify the associated **RefreshTimerVsyncDispatcher**, which should notify all active **RefreshDrivers** to tick.
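The fan-out performed on each vsync can be sketched as follows. This is a minimal illustration, not the real **Display** interface: `DisplaySketch`, `VsyncCallback` and the method names are hypothetical, dispatchers are reduced to callbacks, and threading is ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical callback type: receives the vsync timestamp.
using VsyncCallback = std::function<void(std::int64_t)>;

// Simplified stand-in for a per-monitor Display object.
class DisplaySketch {
 public:
  // One dispatcher is registered per window on this display.
  void AddCompositorDispatcher(VsyncCallback aCallback) {
    assert(static_cast<bool>(aCallback));
    mCompositorDispatchers.push_back(std::move(aCallback));
  }
  void SetRefreshTimerDispatcher(VsyncCallback aCallback) {
    mRefreshTimerDispatcher = std::move(aCallback);
  }
  // Would run on the hardware vsync thread: first every compositor
  // dispatcher, then the refresh timer dispatcher.
  void NotifyVsync(std::int64_t aTimestampUs) {
    for (const VsyncCallback& dispatcher : mCompositorDispatchers) {
      dispatcher(aTimestampUs);
    }
    if (mRefreshTimerDispatcher) mRefreshTimerDispatcher(aTimestampUs);
  }

 private:
  std::vector<VsyncCallback> mCompositorDispatchers;
  VsyncCallback mRefreshTimerDispatcher;
};
```

In the real system each callback would only post work to the appropriate thread rather than do the work itself.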
All **Display** objects are encapsulated in a **VsyncSource** object.
The **VsyncSource** object lives in **gfxPlatform** and is instantiated only on the parent process when **gfxPlatform** is created.
The **VsyncSource** is destroyed when **gfxPlatform** is destroyed.
There is only one **VsyncSource** object throughout the entire lifetime of Firefox.
Each platform is expected to implement their own **VsyncSource** to manage vsync events.
On Firefox OS, this is through the **HwcComposer2D**.
On OS X, this is through **CVDisplayLinkRef**.
On Windows, it should be through **DwmGetCompositionTimingInfo**.
#Compositor
When the **CompositorVsyncDispatcher** is notified of the vsync event, the **CompositorVsyncScheduler::Observer** associated with the **CompositorVsyncDispatcher** begins execution.
Since the **CompositorVsyncDispatcher** executes on the *Hardware Vsync Thread* and the **Compositor** composites on the *CompositorThread*, the **CompositorVsyncScheduler::Observer** posts a task to the *CompositorThread*.
The **CompositorBridgeParent** then composites.
The model where the **CompositorVsyncDispatcher** notifies components on the *Hardware Vsync Thread*, and the component schedules the task on the appropriate thread is used everywhere.
The **CompositorVsyncScheduler::Observer** listens to vsync events as needed and stops listening to vsync when composites are no longer scheduled or required.
Every **CompositorBridgeParent** is associated and tied to one **CompositorVsyncScheduler::Observer**, which is associated with the **CompositorVsyncDispatcher**.
Each **CompositorBridgeParent** is associated with one widget and is created when a new platform window or **nsBaseWidget** is created.
The **CompositorBridgeParent**, **CompositorVsyncDispatcher**, **CompositorVsyncScheduler::Observer**, and **nsBaseWidget** all have the same lifetime: they are created and destroyed together.
##Out-of-process Compositors
When compositing out-of-process, this model changes slightly.
In this case there are effectively two observers: a UI process observer (**CompositorWidgetVsyncObserver**), and the **CompositorVsyncScheduler::Observer** in the GPU process.
There are also two dispatchers: the widget dispatcher in the UI process (**CompositorVsyncDispatcher**), and the IPDL-based dispatcher in the GPU process (**CompositorBridgeParent::NotifyVsync**).
The UI process observer and the GPU process dispatcher are linked via an IPDL protocol called PVsyncBridge.
**PVsyncBridge** is a top-level protocol for sending vsync notifications to the compositor thread in the GPU process.
The compositor controls vsync observation through a separate actor, **PCompositorWidget**, which (as a subactor for **CompositorBridgeChild**) links the compositor thread in the GPU process to the main thread in the UI process.
Out-of-process compositors do not go through **CompositorVsyncDispatcher** directly.
Instead, the **CompositorWidgetDelegate** in the UI process creates one, and gives it a **CompositorWidgetVsyncObserver**.
This observer forwards notifications to a Vsync I/O thread, where **VsyncBridgeChild** then forwards the notification again to the compositor thread in the GPU process.
The notification is received by a **VsyncBridgeParent**.
The GPU process uses the layers ID in the notification to find the correct compositor to dispatch the notification to.
###CompositorVsyncDispatcher
The **CompositorVsyncDispatcher** executes on the *Hardware Vsync Thread*.
It contains references to the **nsBaseWidget** it is associated with and has a lifetime equal to the **nsBaseWidget**.
The **CompositorVsyncDispatcher** is responsible for notifying the **CompositorBridgeParent** that a vsync event has occurred.
There can be multiple **CompositorVsyncDispatchers** per **Display**, one **CompositorVsyncDispatcher** per window.
The only responsibility of the **CompositorVsyncDispatcher** is to notify components when a vsync event has occurred, and to stop listening to vsync when no components require vsync events.
We require one **CompositorVsyncDispatcher** per window so that we can handle multiple **Displays**.
When compositing in-process, the **CompositorVsyncDispatcher** is attached to the CompositorWidget for the
window. When out-of-process, it is attached to the CompositorWidgetDelegate, which forwards
observer notifications over IPDL. In the latter case, its lifetime is tied to a CompositorSession
rather than the nsIWidget.
###Multiple Displays
The **VsyncSource** has an API to switch a **CompositorVsyncDispatcher** from one **Display** to another **Display**.
This is needed, for example, when a window either goes into full screen mode or moves from one connected monitor to another.
When one window moves to another monitor, we expect a platform specific notification to occur.
The detection of when a window enters full screen mode or moves is not covered by Silk itself, but the framework is built to support this use case.
The expected flow is that the OS notification occurs on **nsIWidget**, which retrieves the associated **CompositorVsyncDispatcher**.
The **CompositorVsyncDispatcher** then notifies the **VsyncSource** to switch to the correct **Display** the **CompositorVsyncDispatcher** is connected to.
Because the notification works through the **nsIWidget**, the actual switching of the **CompositorVsyncDispatcher** to the correct **Display** should occur on the *Main Thread*.
The current implementation of Silk does not handle this case and needs to be built out.
###CompositorVsyncScheduler::Observer
The **CompositorVsyncScheduler::Observer** handles the vsync notifications and interactions with the **CompositorVsyncDispatcher**.
When the **Compositor** requires a scheduled composite, it notifies the **CompositorVsyncScheduler::Observer** that it needs to listen to vsync.
The **CompositorVsyncScheduler::Observer** then observes / unobserves vsync as needed from the **CompositorVsyncDispatcher** to enable composites.
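The observe-on-demand behaviour can be sketched in a single-threaded simulation. The names below (`TaskQueueSketch`, `SchedulerSketch`) are hypothetical, the task queue stands in for the compositor thread's message loop, and the rule for when to stop observing (the first vsync with nothing scheduled) is an assumption chosen to keep the example short.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>

// Stand-in for the compositor thread's message loop.
struct TaskQueueSketch {
  std::queue<std::function<void()>> tasks;
  void Post(std::function<void()> aTask) { tasks.push(std::move(aTask)); }
  void RunAll() {
    while (!tasks.empty()) {
      std::function<void()> task = std::move(tasks.front());
      tasks.pop();
      task();
    }
  }
};

// Hypothetical scheduler that listens to vsync only while it has work.
class SchedulerSketch {
 public:
  explicit SchedulerSketch(TaskQueueSketch& aQueue) : mQueue(aQueue) {}

  // The compositor needs a composite: start observing vsync.
  void ScheduleComposite() {
    mNeedsComposite = true;
    mObserving = true;
  }
  // Would be called on the hardware vsync thread; it only posts a task.
  void NotifyVsync() {
    if (!mObserving) return;
    mQueue.Post([this] { Composite(); });
  }
  bool IsObserving() const { return mObserving; }
  int CompositeCount() const { return mCompositeCount; }

 private:
  // Runs on the "compositor thread" (here: when the queue is drained).
  void Composite() {
    if (mNeedsComposite) {
      mNeedsComposite = false;
      ++mCompositeCount;
    } else {
      mObserving = false;  // nothing scheduled: stop listening to vsync
    }
  }

  TaskQueueSketch& mQueue;
  bool mNeedsComposite = false;
  bool mObserving = false;
  int mCompositeCount = 0;
};
```

A vsync that arrives while the scheduler is not observing posts nothing, which is what keeps an idle window from waking the compositor on every frame.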
###GeckoTouchDispatcher
The **GeckoTouchDispatcher** is a singleton that resamples touch events to smooth out jank while tracking a user's finger.
Because input and composite are linked together, the **CompositorVsyncScheduler::Observer** has a reference to the **GeckoTouchDispatcher** and vice versa.
###Input Events
One large goal of Silk is to align touch events with vsync events.
On Firefox OS, touchscreens often have different touch scan rates than the display refreshes.
A Flame device has a touch refresh rate of 75 Hz and a Nexus 4 has a touch refresh rate of 100 Hz, while the display refresh rate of each device is 60 Hz.
When a vsync event occurs, we resample touch events, and then dispatch the resampled touch event to APZ.
Touch events on Firefox OS occur on a *Touch Input Thread* whereas they are processed by APZ on the *APZ Controller Thread*.
We use [Google Android's touch resampling](http://www.masonchang.com/blog/2014/8/25/androids-touch-resampling-algorithm) algorithm to resample touch events.
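The core of resampling is estimating where the finger was at a chosen sample time from touch samples taken at other times. The sketch below shows only that step, using plain linear interpolation (or extrapolation when the sample time lies past the newer sample). It is a simplification and not the full Android algorithm, which is described in the linked post; `TouchSample` and `Resample` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical touch sample: a position at a point in time.
struct TouchSample {
  double timeMs;
  double x;
  double y;
};

// Estimates the touch position at aSampleTimeMs from two real samples.
// A sample time between the two interpolates; a sample time after
// aNewer extrapolates along the same line.
TouchSample Resample(const TouchSample& aOlder, const TouchSample& aNewer,
                     double aSampleTimeMs) {
  assert(aNewer.timeMs > aOlder.timeMs);
  const double alpha =
      (aSampleTimeMs - aOlder.timeMs) / (aNewer.timeMs - aOlder.timeMs);
  return TouchSample{aSampleTimeMs,
                     aOlder.x + alpha * (aNewer.x - aOlder.x),
                     aOlder.y + alpha * (aNewer.y - aOlder.y)};
}
```

Because touch samples (for example at 75 Hz) rarely line up with vsync (60 Hz), feeding the compositor one estimated position per vsync gives evenly spaced scroll positions instead of zero, one or two raw samples per frame.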
Currently, we have a strict ordering between Composites and touch events.
When a touch event occurs on the *Touch Input Thread*, we store the touch event in a queue.
When a vsync event occurs, the **CompositorVsyncDispatcher** notifies the **Compositor** of a vsync event, which notifies the **GeckoTouchDispatcher**.
The **GeckoTouchDispatcher** processes the touch event first on the *APZ Controller Thread*, which is the same as the *Compositor Thread* on b2g, then the **Compositor** finishes compositing.
We require this strict ordering because if a vsync notification were dispatched to both the **Compositor** and the **GeckoTouchDispatcher** at the same time, processing the touch event (and therefore updating the scroll position) would race with compositing.
In practice, this creates very janky scrolling.
As of this writing, we have not analyzed input events on desktop platforms.
One slight quirk is that input events can start a composite, for example during a scroll and after the **Compositor** is no longer listening to vsync events.
In these cases, we notify the **Compositor** to observe vsync so that it dispatches touch events.
Otherwise, since the **Compositor** is not listening to vsync events, the queued touch events would never be dispatched.
The **GeckoTouchDispatcher** handles this case by always forcing the **Compositor** to listen to vsync events while touch events are occurring.
###Widget, Compositor, CompositorVsyncDispatcher, GeckoTouchDispatcher Shutdown Procedure
When the [nsBaseWidget shuts down](https://hg.mozilla.org/mozilla-central/file/0df249a0e4d3/widget/nsBaseWidget.cpp#l182), it calls nsBaseWidget::DestroyCompositor on the *Gecko Main Thread*.
During nsBaseWidget::DestroyCompositor, it first destroys the CompositorBridgeChild.
CompositorBridgeChild sends a sync IPC call to CompositorBridgeParent::RecvStop, which calls [CompositorBridgeParent::Destroy](https://hg.mozilla.org/mozilla-central/file/ab0490972e1e/gfx/layers/ipc/CompositorBridgeParent.cpp#l509).
During this time, the *main thread* is blocked on the parent process.
CompositorBridgeParent::RecvStop runs on the *Compositor thread* and cleans up some resources, including setting the **CompositorVsyncScheduler::Observer** to nullptr.
CompositorBridgeParent::RecvStop also explicitly keeps the CompositorBridgeParent alive and posts another task to run CompositorBridgeParent::DeferredDestroy on the Compositor loop so that all ipdl code can finish executing.
The **CompositorVsyncScheduler::Observer** also unobserves from vsync and cancels any pending composite tasks.
Once CompositorBridgeParent::RecvStop finishes, the *main thread* in the parent process continues shutting down the nsBaseWidget.
At the same time, the *Compositor thread* is executing tasks until CompositorBridgeParent::DeferredDestroy runs, which flushes the compositor message loop.
Now two releases are pending: the nsBaseWidget releases a reference to the Compositor on the *main thread* during destruction, and CompositorBridgeParent::DeferredDestroy releases a reference to the CompositorBridgeParent on the *Compositor Thread*.
Finally, the CompositorBridgeParent itself is destroyed on the *main thread* once both references are gone due to explicit [main thread destruction](https://hg.mozilla.org/mozilla-central/file/50b95032152c/gfx/layers/ipc/CompositorBridgeParent.h#l148).
With the **CompositorVsyncScheduler::Observer**, any accesses to the widget after nsBaseWidget::DestroyCompositor executes are invalid.
Any accesses to the compositor between the time nsBaseWidget::DestroyCompositor runs and the time the CompositorVsyncScheduler::Observer's destructor runs are not safe, yet a hardware vsync event could occur between these times.
Since any tasks posted on the Compositor loop after CompositorBridgeParent::DeferredDestroy is posted are invalid, we make sure that no vsync tasks can be posted once CompositorBridgeParent::RecvStop executes and DeferredDestroy is posted on the Compositor thread.
When the sync call to CompositorBridgeParent::RecvStop executes, we explicitly set the CompositorVsyncScheduler::Observer to null to prevent vsync notifications from occurring.
If vsync notifications were allowed to occur, since the **CompositorVsyncScheduler::Observer**'s vsync notification executes on the *hardware vsync thread*, it would post a task to the Compositor loop and may execute after CompositorBridgeParent::DeferredDestroy.
Thus, we explicitly shut down vsync events in the **CompositorVsyncDispatcher** and **CompositorVsyncScheduler::Observer** during nsBaseWidget::Shutdown to prevent any vsync tasks from executing after CompositorBridgeParent::DeferredDestroy.
The **CompositorVsyncDispatcher** may be destroyed on either the *main thread* or *Compositor Thread*, since both the nsBaseWidget and **CompositorVsyncScheduler::Observer** race to destroy on different threads.
nsBaseWidget is destroyed on the *main thread* and releases a reference to the **CompositorVsyncDispatcher** during destruction.
The **CompositorVsyncScheduler::Observer** has a race to be destroyed either during CompositorBridgeParent shutdown or from the **GeckoTouchDispatcher** which is destroyed on the main thread with [ClearOnShutdown](https://hg.mozilla.org/mozilla-central/file/21567e9a6e40/xpcom/base/ClearOnShutdown.h#l15).
Whichever object is destroyed last, the CompositorBridgeParent or the **GeckoTouchDispatcher**, holds the last reference to the **CompositorVsyncDispatcher**, and releasing that reference destroys the **CompositorVsyncDispatcher**.
# Refresh Driver
The Refresh Driver is ticked from a [single active timer](https://hg.mozilla.org/mozilla-central/file/ab0490972e1e/layout/base/nsRefreshDriver.cpp#l11).
The assumption is that there are multiple **RefreshDrivers** connected to a single **RefreshTimer**.
There are two **RefreshTimers**: an active and an inactive **RefreshTimer**.
Each Tab has its own **RefreshDriver**, which connects to one of the global **RefreshTimers**.
The **RefreshTimers** execute on the *Main Thread* and tick their connected **RefreshDrivers**.
We do not want to break this model of multiple **RefreshDrivers** per a set of two global **RefreshTimers**.
Each **RefreshDriver** switches between the active and inactive **RefreshTimer**.
Instead, we create a new **RefreshTimer**, the **VsyncRefreshTimer** which ticks based on vsync messages.
We replace the current active timer with a **VsyncRefreshTimer**.
All tabs will then tick based on this new active timer.
Since the **RefreshTimer** has a lifetime of the process, we only need to create a single **RefreshTimerVsyncDispatcher** per **Display** when Firefox starts.
Even if we do not have any content processes, the Chrome process will still need a **VsyncRefreshTimer**, thus we can associate the **RefreshTimerVsyncDispatcher** with each **Display**.
When Firefox starts, we initially create a new **VsyncRefreshTimer** in the Chrome process.
The **VsyncRefreshTimer** will listen to vsync notifications from **RefreshTimerVsyncDispatcher** on the global **Display**.
When nsRefreshDriver::Shutdown executes, it will delete the **VsyncRefreshTimer**.
This creates a problem as all the **RefreshTimers** are currently manually memory managed whereas **VsyncObservers** are ref counted.
To work around this problem, we create a new **RefreshDriverVsyncObserver** as an inner class to **VsyncRefreshTimer**, which actually receives vsync notifications. It then ticks the **RefreshDrivers** inside **VsyncRefreshTimer**.
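The ownership split described above can be sketched as follows. This is a minimal, single-threaded illustration that uses `std::shared_ptr` as a stand-in for Gecko's reference counting; the member names (`Observer`, `AddDriver`, `Shutdown`, `TickDrivers`) and the `long` timestamp are assumptions made for the example, not the real Gecko declarations.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Stand-in for a ref-counted vsync observer interface.
struct VsyncObserver {
  virtual ~VsyncObserver() = default;
  virtual void NotifyVsync(long timestamp) = 0;
};

struct RefreshDriver {
  long lastTick = 0;
  void Tick(long timestamp) { lastTick = timestamp; }
};

// The timer is manually managed; only its inner observer is shared.
class VsyncRefreshTimer {
 public:
  class RefreshDriverVsyncObserver final : public VsyncObserver {
   public:
    explicit RefreshDriverVsyncObserver(VsyncRefreshTimer* timer)
        : mTimer(timer) {}

    void NotifyVsync(long timestamp) override {
      // Drop notifications that arrive after the timer was deleted.
      if (mTimer) {
        mTimer->TickDrivers(timestamp);
      }
    }

    // Called by the owning timer's destructor.
    void Shutdown() { mTimer = nullptr; }

   private:
    VsyncRefreshTimer* mTimer;
  };

  VsyncRefreshTimer()
      : mObserver(std::make_shared<RefreshDriverVsyncObserver>(this)) {}
  ~VsyncRefreshTimer() { mObserver->Shutdown(); }

  // Handed to the vsync dispatcher, which may outlive the timer.
  std::shared_ptr<VsyncObserver> Observer() const { return mObserver; }

  void AddDriver(RefreshDriver* driver) {
    assert(driver != nullptr);
    mDrivers.push_back(driver);
  }

 private:
  void TickDrivers(long timestamp) {
    for (RefreshDriver* driver : mDrivers) {
      driver->Tick(timestamp);
    }
  }

  std::shared_ptr<RefreshDriverVsyncObserver> mObserver;
  std::vector<RefreshDriver*> mDrivers;
};
```

The point of the design is that the dispatcher can keep holding the observer after the timer is deleted, and any late notification is simply ignored. A real implementation would also need to handle notifications arriving on another thread, which this sketch does not attempt.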
With Content processes, the start up process is more complicated.
We send vsync IPC messages via the use of the PBackground thread on the parent process, which allows us to send messages from the Parent process without waiting on the *main thread*.
This sends messages from the Parent::*PBackground Thread* to the Child::*Main Thread*.
The *main thread* receiving IPC messages on the content process is acceptable because **RefreshDrivers** must execute on the *main thread*.
However, there is some amount of time required to set up the IPC connection upon process creation and during this time, the **RefreshDrivers** must tick to set up the process.
To get around this, we initially use software **RefreshTimers** that already exist during content process startup and swap in the **VsyncRefreshTimer** once the IPC connection is created.
During nsRefreshDriver::ChooseTimer, we create an async PBackground IPC open request to create a **VsyncParent** and **VsyncChild**.
At the same time, we create a software **RefreshTimer** and tick the **RefreshDrivers** as normal.
Once the PBackground callback is executed and an IPC connection exists, we swap all **RefreshDrivers** currently associated with the active **RefreshTimer** over to the **VsyncRefreshTimer**.
Since all interactions on the content process occur on the main thread, there is no need for locks.
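As a rough illustration of the swap, the sketch below moves every driver from a software timer to a vsync-driven one. The struct layouts and the `SwapToVsyncTimer` function are hypothetical names for this example; the real logic lives in nsRefreshDriver and is more involved.

```cpp
#include <cassert>
#include <vector>

struct RefreshTimer;

struct RefreshDriver {
  RefreshTimer* activeTimer = nullptr;
};

struct RefreshTimer {
  std::vector<RefreshDriver*> drivers;
};

// Moves every driver from the software timer to the vsync timer.
// Assumed to run on the main thread only, so no locking is used.
void SwapToVsyncTimer(RefreshTimer& software, RefreshTimer& vsync) {
  for (RefreshDriver* driver : software.drivers) {
    assert(driver->activeTimer == &software);
    driver->activeTimer = &vsync;
    vsync.drivers.push_back(driver);
  }
  software.drivers.clear();
}
```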
The **VsyncParent** listens to vsync events through the **RefreshTimerVsyncDispatcher** on the parent side and sends vsync IPC messages to the **VsyncChild**.
The **VsyncChild** notifies the **VsyncRefreshTimer** on the content process.
During the shutdown process of the content process, ActorDestroy is called on the **VsyncChild** and **VsyncParent** due to the normal PBackground shutdown process.
Once ActorDestroy is called, no IPC messages should be sent across the channel.
After ActorDestroy is called, the IPDL machinery will delete the **VsyncParent/Child** pair.
The **VsyncParent**, due to being a **VsyncObserver**, is ref counted.
After **VsyncParent::ActorDestroy** is called, it unregisters itself from the **RefreshTimerVsyncDispatcher**, which holds the last reference to the **VsyncParent**, and the object will be deleted.
Thus the overall flow during normal execution is:
1. VsyncSource::Display::RefreshTimerVsyncDispatcher receives a Vsync notification from the OS in the parent process.
2. RefreshTimerVsyncDispatcher notifies VsyncRefreshTimer::RefreshDriverVsyncObserver that a vsync occurred on the parent process on the hardware vsync thread.
3. RefreshTimerVsyncDispatcher notifies the VsyncParent on the hardware vsync thread that a vsync occurred.
4. The VsyncRefreshTimer::RefreshDriverVsyncObserver in the parent process posts a task to the main thread that ticks the refresh drivers.
5. VsyncParent posts a task to the PBackground thread to send a vsync IPC message to VsyncChild.
6. VsyncChild receives a vsync notification on the content process on the main thread and ticks its respective RefreshDrivers.
### Compressing Vsync Messages
Vsync messages occur quite often and the *main thread* can be busy for long periods of time due to JavaScript.
Consistently sending vsync messages to the refresh driver timer can flood the *main thread* with refresh driver ticks, causing even more delays.
To avoid this problem, we compress vsync messages on both the parent and child processes.
On the parent process, newer vsync messages update a vsync timestamp but do not actually queue any tasks on the *main thread*.
Once the parent process' *main thread* executes the refresh driver tick, it uses the most updated vsync timestamp to tick the refresh driver.
After the refresh driver has ticked, one single vsync message is queued for another refresh driver tick task.
On the content process, the IPDL **compress** keyword automatically compresses IPC messages.
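The parent-side compression can be sketched with two atomics: the newest timestamp, and a flag recording whether a tick task is already queued. The class and method names below are hypothetical, and the sketch only models the bookkeeping, not the actual task posting.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

class VsyncCompressor {
 public:
  // Called on the hardware vsync thread. Returns true when the caller
  // should post a tick task to the main thread; later vsyncs only
  // refresh the timestamp until that task has run.
  bool OnVsync(std::int64_t timestamp) {
    mLatest.store(timestamp);
    bool alreadyPending = mPending.exchange(true);
    return !alreadyPending;
  }

  // Called on the main thread when the queued tick task runs.
  // Returns the most recent vsync timestamp to tick with.
  std::int64_t BeginTick() {
    assert(mPending.load());
    mPending.store(false);
    return mLatest.load();
  }

 private:
  std::atomic<std::int64_t> mLatest{0};
  std::atomic<bool> mPending{false};
};
```

With this shape, a busy main thread sees at most one queued tick no matter how many vsyncs arrive in the meantime, and that tick uses the newest timestamp.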
### Multiple Monitors
In order to have multiple monitor support for the **RefreshDrivers**, we have multiple active **RefreshTimers**.
Each **RefreshTimer** is associated with a specific **Display** via an id and ticks when its respective **Display** vsync occurs.
We have **N RefreshTimers**, where N is the number of connected displays.
Each **RefreshTimer** still has multiple **RefreshDrivers**.
When a tab or window changes monitors, the **nsIWidget** receives a display changed notification.
Based on which display the window is on, the window switches to the correct **RefreshTimerVsyncDispatcher** and **CompositorVsyncDispatcher** on the parent process based on the display id.
Each **TabParent** should also send a notification to their child.
Each **TabChild**, given the display ID, switches to the correct **RefreshTimer** associated with the display ID.
When each display vsync occurs, it sends one IPC message to notify vsync.
The vsync message contains a display ID, to tick the appropriate **RefreshTimer** on the content process.
There is still only one **VsyncParent/VsyncChild** pair; each vsync notification simply includes a display ID, which maps to the correct **RefreshTimer**.
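One way to picture the content-process side is a table from display ID to timer, consulted for every incoming vsync message. `DisplayTimerTable`, `VsyncMessage`, and the field names are assumptions for this sketch only.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

struct RefreshTimer {
  std::int64_t lastVsync = 0;
  int tickCount = 0;
  void Tick(std::int64_t timestamp) {
    lastVsync = timestamp;
    ++tickCount;
  }
};

// Hypothetical payload of the single vsync IPC message.
struct VsyncMessage {
  std::uint32_t displayId;
  std::int64_t timestamp;
};

class DisplayTimerTable {
 public:
  void AddDisplay(std::uint32_t id) { mTimers.emplace(id, RefreshTimer{}); }

  // Ticks the timer for the message's display.
  // Returns false if the display ID is unknown.
  bool Dispatch(const VsyncMessage& msg) {
    auto it = mTimers.find(msg.displayId);
    if (it == mTimers.end()) {
      return false;
    }
    it->second.Tick(msg.timestamp);
    return true;
  }

  const RefreshTimer& TimerFor(std::uint32_t id) const {
    auto it = mTimers.find(id);
    assert(it != mTimers.end());
    return it->second;
  }

 private:
  std::unordered_map<std::uint32_t, RefreshTimer> mTimers;
};
```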
# Object Lifetime
1. CompositorVsyncDispatcher - Lives as long as the nsBaseWidget associated with the VsyncDispatcher
2. CompositorVsyncScheduler::Observer - Lives and dies the same time as the CompositorBridgeParent.
3. RefreshTimerVsyncDispatcher - Lives as long as the associated display object, which is the lifetime of Firefox.
4. VsyncSource - Lives as long as the gfxPlatform on the chrome process, which is the lifetime of Firefox.
5. VsyncParent/VsyncChild - Lives as long as the content process
6. RefreshTimer - Lives as long as the process
# Threads
All **VsyncObservers** are notified on the *Hardware Vsync Thread*. It is the responsibility of each **VsyncObserver** to post tasks to the correct thread. For example, the **CompositorVsyncScheduler::Observer** will be notified on the *Hardware Vsync Thread*, and post a task to the *Compositor Thread* to do the actual composition.
1. Compositor Thread - Nothing changes
2. Main Thread - PVsyncChild receives IPC messages on the main thread. We also enable/disable vsync on the main thread.
3. PBackground Thread - Creates a connection from the PBackground thread on the parent process to the main thread in the content process.
4. Hardware Vsync Thread - Every platform is different, but we always have the concept of a hardware vsync thread. Sometimes this is actually created by the host OS. On Windows, we have to create a separate platform thread that blocks on DwmFlush().

gfx/docs/AdvancedLayers.rst Normal file

@ -0,0 +1,370 @@
Advanced Layers
===============
Advanced Layers is a new method of compositing layers in Gecko. This
document serves as a technical overview and provides a short
walk-through of its source code.
Overview
--------
Advanced Layers attempts to group as many GPU operations as it can into
a single draw call. This is a common technique in GPU-based rendering
called “batching”. It is not always trivial, as a batching algorithm can
easily waste precious CPU resources trying to build optimal draw calls.
Advanced Layers reuses the existing Gecko layers system as much as
possible. Huge layer trees do not currently scale well (see the future
work section), so opportunities for batching are currently limited
without expending unnecessary resources elsewhere. However, Advanced
Layers has a few benefits:
- It submits smaller GPU workloads and buffer uploads than the existing
compositor.
- It needs only a single pass over the layer tree.
- It uses occlusion information more intelligently.
- It is easier to add new specialized rendering paths and new layer
types.
- It separates compositing logic from device logic, unlike the existing
compositor.
- It is much faster at rendering 3d scenes or complex layer trees.
- It has experimental code to use the z-buffer for occlusion culling.
Because of these benefits we hope that it provides a significant
improvement over the existing compositor.
Advanced Layers uses the acronyms “MLG” and “MLGPU” in many places. This
stands for “Mid-Level Graphics”, the idea being that it is optimized for
Direct3D 11-style rendering systems as opposed to Direct3D 12 or Vulkan.
LayerManagerMLGPU
-----------------
Advanced Layers does not change client-side rendering at all. Content
still uses Direct2D (when possible), and creates identical layer trees
as it would with a normal Direct3D 11 compositor. In fact, Advanced
Layers re-uses all of the existing texture handling and video
infrastructure as well, replacing only the composite-side layer types.
Advanced Layers does not create a ``LayerManagerComposite`` - instead,
it creates a ``LayerManagerMLGPU``. This layer manager does not have a
``Compositor`` - instead, it has an ``MLGDevice``, which roughly
abstracts the Direct3D 11 API. (The hope is that this API is easily
interchangeable for something else when cross-platform or software
support is needed.)
``LayerManagerMLGPU`` also dispenses with the old “composite” layers for
new layer types. For example, ``ColorLayerComposite`` becomes
``ColorLayerMLGPU``. Since these layer types implement ``HostLayer``,
they integrate with ``LayerTransactionParent`` as normal composite
layers would.
Rendering Overview
------------------
The steps for rendering are described in more detail below, but roughly
the process is:
1. Sort layers front-to-back.
2. Create a dependency tree of render targets (called “views”).
3. Accumulate draw calls for all layers in each view.
4. Upload draw call buffers to the GPU.
5. Execute draw commands for each view.
Advanced Layers divides the layer tree into “views”
(``RenderViewMLGPU``), which correspond to a render target. The root
layer is represented by a view corresponding to the screen. Layers that
require intermediate surfaces have temporary views. Layers are analyzed
front-to-back, and rendered back-to-front within a view. Views
themselves are rendered front-to-back, to minimize render target
switching.
Each view contains one or more rendering passes (``RenderPassMLGPU``). A
pass represents a single draw command with one or more rendering items
attached to it. For example, a ``SolidColorPass`` item contains a
rectangle and an RGBA value, and many of these can be drawn with a
single GPU call.
When considering a layer, views will first try to find an existing
rendering batch that can support it. If so, that pass will accumulate
another draw item for the layer. Otherwise, a new pass will be added.
When trying to find a matching pass for a layer, there is a tradeoff in
CPU time versus the GPU time saved by not issuing another draw command.
We generally care more about CPU time, so we do not try too hard in
matching items to an existing batch.
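A minimal sketch of this accumulation, under the assumption that "not trying too hard" means checking only the most recently added pass (which also keeps draw order intact). The types `DrawItem`, `RenderPass`, and `RenderView` here are simplified stand-ins, not the ``RenderPassMLGPU`` and ``RenderViewMLGPU`` classes themselves.

```cpp
#include <cassert>
#include <vector>

enum class PassType { SolidColor, Textured };

struct DrawItem {
  PassType type;
  int textureId;  // 0 for items that do not sample a texture
};

struct RenderPass {
  PassType type;
  int textureId;
  std::vector<DrawItem> items;

  // Items can share a draw call only if they need the same GPU state.
  bool Accepts(const DrawItem& item) const {
    return type == item.type && textureId == item.textureId;
  }
};

class RenderView {
 public:
  void AddItem(const DrawItem& item) {
    assert(item.textureId >= 0);
    // Cheap match: look only at the last pass instead of searching.
    if (!mPasses.empty() && mPasses.back().Accepts(item)) {
      mPasses.back().items.push_back(item);
      return;
    }
    mPasses.push_back(RenderPass{item.type, item.textureId, {item}});
  }

  const std::vector<RenderPass>& Passes() const { return mPasses; }

 private:
  std::vector<RenderPass> mPasses;
};
```

Two consecutive solid-color items end up in one pass here, while a solid-color item that follows a textured one starts a new pass even though an earlier compatible pass exists; that is the CPU-versus-GPU tradeoff in miniature.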
After all layers have been processed, there is a “prepare” step. This
copies all accumulated draw data and uploads it into vertex and constant
buffers in the GPU.
Finally, we execute rendering commands. At the end of the frame, all
batches and (most) constant buffers are thrown away.
Shaders Overview
----------------
Advanced Layers currently has five layer-related shader pipelines:
- Textured (PaintedLayer, ImageLayer, CanvasLayer)
- ComponentAlpha (PaintedLayer with component-alpha)
- YCbCr (ImageLayer with YCbCr video)
- Color (ColorLayers)
- Blend (ContainerLayers with mix-blend modes)
There are also three special shader pipelines:
- MaskCombiner, which is used to combine mask layers into a single
texture.
- Clear, which is used for fast region-based clears when not directly
supported by the GPU.
- Diagnostic, which is used to display the diagnostic overlay texture.
The layer shaders follow a unified structure. Each pipeline has a vertex
and pixel shader. The vertex shader takes a layers ID, a z-buffer depth,
a unit position in either a unit square or unit triangle, and either
rectangular or triangular geometry. Shaders can also have ancillary data
needed like texture coordinates or colors.
Most of the time, layers have simple rectangular clips with simple
rectilinear transforms, and pixel shaders do not need to perform masking
or clipping. For these layers we use a fast-path pipeline, using
unit-quad shaders that are able to clip geometry so the pixel shader
does not have to. This type of pipeline does not support complex masks.
If a layer has a complex mask, a rotation or 3d transform, or a complex
operation like blending, then we use shaders capable of handling
arbitrary geometry. Their input is a unit triangle, and these shaders
are generally more expensive.
All of the shader-specific data is modelled in ShaderDefinitionsMLGPU.h.
CPU Occlusion Culling
---------------------
By default, Advanced Layers performs occlusion culling on the CPU. Since
layers are visited front-to-back, this is simply a matter of
accumulating the visible region of opaque layers, and subtracting it
from the visible region of subsequent layers. There is a major
difference between this occlusion culling and PostProcessLayers of the
old compositor: AL performs culling after invalidation, not before.
Completely valid layers will have an empty visible region.
Most layer types (with the exception of images) will intelligently split
their draw calls into a batch of individual rectangles, based on their
visible region.
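The accumulate-and-subtract idea can be illustrated with a coarse grid in place of a real region type. This is only a sketch: the `Layer` struct, the cell grid, and `CullOccluded` are assumptions for the example, and a production implementation would operate on rectangle-based regions rather than per-cell flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Rect {
  int x, y, w, h;
};

struct Layer {
  Rect bounds;
  bool opaque;
  int visibleCells = 0;  // filled in by CullOccluded
};

// `layers` must be ordered front-to-back. Each layer keeps only the
// cells not already covered by an opaque layer in front of it.
void CullOccluded(std::vector<Layer>& layers, int width, int height) {
  std::vector<bool> covered(static_cast<std::size_t>(width) * height, false);
  for (Layer& layer : layers) {
    const Rect& r = layer.bounds;
    assert(r.x >= 0 && r.y >= 0);
    assert(r.x + r.w <= width && r.y + r.h <= height);
    layer.visibleCells = 0;
    for (int y = r.y; y < r.y + r.h; ++y) {
      for (int x = r.x; x < r.x + r.w; ++x) {
        std::size_t i = static_cast<std::size_t>(y) * width + x;
        if (!covered[i]) {
          ++layer.visibleCells;
          if (layer.opaque) {
            covered[i] = true;
          }
        }
      }
    }
  }
}
```

A layer whose `visibleCells` ends up at zero is fully occluded and needs no draw call at all.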
Z-Buffering and Occlusion
-------------------------
Advanced Layers also supports occlusion culling on the GPU, using a
z-buffer. This is disabled by default currently since it is
significantly costly on integrated GPUs. When using the z-buffer, we
separate opaque layers into a separate list of passes. The render
process then uses the following steps:
1. The depth buffer is set to read-write.
2. Opaque batches are executed.
3. The depth buffer is set to read-only.
4. Transparent batches are executed.
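The four steps above amount to a fixed command ordering, sketched here with strings standing in for device calls. `Batch` and `BuildCommandList` are hypothetical names, and the command strings are placeholders rather than real API calls.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Batch {
  std::string name;
  bool opaque;
};

// Emits opaque batches while the depth buffer is writable, then
// transparent batches with the depth buffer read-only.
std::vector<std::string> BuildCommandList(const std::vector<Batch>& batches) {
  std::vector<std::string> commands;
  commands.push_back("depth: read-write");
  for (const Batch& batch : batches) {
    assert(!batch.name.empty());
    if (batch.opaque) {
      commands.push_back("draw " + batch.name);
    }
  }
  commands.push_back("depth: read-only");
  for (const Batch& batch : batches) {
    if (!batch.opaque) {
      commands.push_back("draw " + batch.name);
    }
  }
  return commands;
}
```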
The problem we have observed is that the depth buffer increases writes
to the GPU, and on integrated GPUs this is expensive - we have seen draw
call times increase by 20-30%, which is the wrong direction for battery
life. In particular on a full screen video, the call to
ClearDepthStencilView plus the actual depth buffer write of the video
can double GPU time.
For now the depth-buffer is disabled until we can find a compelling case
for it on non-integrated hardware.
Clipping
--------
Clipping is a bit tricky in Advanced Layers. We cannot use the hardware
“scissor” feature, since the clip can change from instance to instance
within a batch. And if using the depth buffer, we cannot write
transparent pixels for the clipped area. As a result we always clip
opaque draw rects in the vertex shader (and sometimes even on the CPU,
as is needed for sane texture coordinates). Only transparent items are
clipped in the pixel shader. As a result, masked layers and layers with
non-rectangular transforms are always considered transparent, and use a
more flexible clipping pipeline.
Plane Splitting
---------------
Plane splitting is when a 3D transform causes a layer to be split - for
example, one transparent layer may intersect another on a separate
plane. When this happens, Gecko sorts layers using a BSP tree and
produces a list of triangles instead of draw rects.
These layers cannot use the “unit quad” shaders that support the fast
clipping pipeline. Instead they always use the full triangle-list
shaders that support extended vertices and clipping.
This is the slowest path we can take when building a draw call, since we
must interact with the polygon clipping and texturing code.
Masks
-----
For each layer with a mask attached, Advanced Layers builds a
``MaskOperation``. These operations must resolve to a single mask
texture, as well as a rectangular area to which the mask applies. All
batched pixel shaders will automatically clip pixels to the mask if a
mask texture is bound. (Note that we must use separate batches if the
mask texture changes.)
Some layers have multiple mask textures. In this case, the MaskOperation
will store the list of masks, and right before rendering, it will invoke
a shader to combine these masks into a single texture.
MaskOperations are shared across layers when possible, but are not
cached across frames.
BigImage Support
----------------
ImageLayers and CanvasLayers can be tiled with many individual textures.
This happens in rare cases where the underlying buffer is too big for
the GPU. Early on this caused problems for Advanced Layers, since AL
required one texture per layer. We implemented BigImage support by
creating temporary ImageLayers for each visible tile, and throwing those
layers away at the end of the frame.
Advanced Layers no longer has a 1:1 layer:texture restriction, but we
retain the temporary layer solution anyway. It is not much code and it
means we do not have to split ``TexturedLayerMLGPU`` methods into
iterated and non-iterated versions.
Texture Locking
---------------
Advanced Layers has a different texture locking scheme than the existing
compositor. If a texture needs to be locked, then it is locked by the
MLGDevice automatically when bound to the current pipeline. The
MLGDevice keeps a set of the locked textures to avoid double-locking. At
the end of the frame, any textures in the locked set are unlocked.
We cannot easily replicate the locking scheme in the old compositor,
since the duration of using the texture is not scoped to when we visit
the layer.
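The lock-on-bind scheme can be sketched with a set of locked textures owned by the device. `Texture`, `Device`, `BindTexture`, and `EndFrame` are simplified, hypothetical stand-ins for the corresponding ``MLGDevice`` machinery.

```cpp
#include <cassert>
#include <unordered_set>

struct Texture {
  int lockCount = 0;
  void Lock() { ++lockCount; }
  void Unlock() {
    assert(lockCount > 0);
    --lockCount;
  }
};

class Device {
 public:
  // Locks the texture the first time it is bound in a frame;
  // later binds of the same texture do not lock it again.
  void BindTexture(Texture* texture) {
    assert(texture != nullptr);
    if (mLocked.insert(texture).second) {
      texture->Lock();
    }
  }

  // Unlocks everything that was locked during the frame.
  void EndFrame() {
    for (Texture* texture : mLocked) {
      texture->Unlock();
    }
    mLocked.clear();
  }

 private:
  std::unordered_set<Texture*> mLocked;
};
```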
Buffer Measurements
-------------------
Advanced Layers uses constant buffers to send layer information and
extended instance data to the GPU. We do this by pre-allocating large
constant buffers and mapping them with ``MAP_DISCARD`` at the beginning
of the frame. Batches may allocate into this up to the maximum bindable
constant buffer size of the device (currently, 64KB).
There are some downsides to this approach. Constant buffers are
difficult to work with - they have specific alignment requirements, and
care must be taken not to run over the maximum number of constants in a
buffer. Another approach would be to store constants in a 2D texture and
use vertex shader texture fetches. Advanced Layers implemented this and
benchmarked it to decide which approach to use. Textures tended to
perform better on the GPU but worse on the CPU, though this varied
depending on the GPU. Overall constant buffers performed best and most
consistently, so we have kept them.
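To make the alignment and size constraints concrete, here is a small bump-allocator sketch over a large pre-allocated buffer. The 256-byte granularity (16 constants of 16 bytes each) is an assumption based on the offset rules of Direct3D 11.1 constant buffer binding, and `SharedConstantBuffer` is a hypothetical class, not the ``SharedBufferMLGPU`` implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Assumed D3D11.1-style limits; adjust for the actual device.
constexpr std::size_t kBindGranularity = 256;          // 16 constants
constexpr std::size_t kMaxBindableBytes = 64 * 1024;   // 4096 constants

class SharedConstantBuffer {
 public:
  explicit SharedConstantBuffer(std::size_t capacity) : mCapacity(capacity) {}

  // Called after the buffer is re-mapped at the start of a frame.
  void Reset() { mCursor = 0; }

  // Returns the byte offset of the allocation, or nullopt if the
  // request exceeds the bindable size or the remaining capacity.
  std::optional<std::size_t> Allocate(std::size_t bytes) {
    if (bytes == 0 || bytes > kMaxBindableBytes) {
      return std::nullopt;
    }
    std::size_t size = RoundUp(bytes, kBindGranularity);
    std::size_t offset = RoundUp(mCursor, kBindGranularity);
    if (offset + size > mCapacity) {
      return std::nullopt;
    }
    mCursor = offset + size;
    return offset;
  }

 private:
  static std::size_t RoundUp(std::size_t value, std::size_t align) {
    assert(align != 0);
    return (value + align - 1) / align * align;
  }

  std::size_t mCapacity;
  std::size_t mCursor = 0;
};
```

A batch that needs only 20 bytes still consumes a full 256-byte slot in this sketch, which is one reason constant buffers are awkward to work with.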
Additionally, we tested different ways of performing buffer uploads.
Buffer creation itself is costly, especially on integrated GPUs, and
especially so for immutable, immediate-upload buffers. As a result we
aggressively cache buffer objects and always allocate them as
MAP_DISCARD unless they are write-once and long-lived.
Buffer Types
------------
Advanced Layers has a few different classes to help build and upload
buffers to the GPU. They are:
- ``MLGBuffer``. This is the low-level shader resource that
``MLGDevice`` exposes. It is the building block for buffer helper
classes, but it can also be used to make one-off, immutable,
immediate-upload buffers. MLGBuffers, being a GPU resource, are
reference counted.
- ``SharedBufferMLGPU``. These are large, pre-allocated buffers that
are read-only on the GPU and write-only on the CPU. They usually
exceed the maximum bindable buffer size. There are three shared
buffers created by default and they are automatically unmapped as
needed: one for vertices, one for vertex shader constants, and one
for pixel shader constants. When callers allocate into a shared
buffer they get back a mapped pointer, a GPU resource, and an offset.
When the underlying device supports offsetable buffers (like
``ID3D11DeviceContext1`` does), this results in better GPU
utilization, as there are fewer resources and fewer upload commands.
- ``ConstantBufferSection`` and ``VertexBufferSection``. These are
“views” into a ``SharedBufferMLGPU``. They contain the underlying
``MLGBuffer``, and when offsetting is supported, the offset
information necessary for resource binding. Sections are not
reference counted.
- ``StagingBuffer``. A dynamically sized CPU buffer where items can be
appended in a free-form manner. The stride of a single “item” is
computed by the first item written, and successive items must have
the same stride. The buffer must be uploaded to the GPU manually.
Staging buffers are appropriate for creating general constant or
vertex buffer data. They can also write items in reverse, which is
how we render back-to-front when layers are visited front-to-back.
They can be uploaded to a ``SharedBufferMLGPU`` or an immutable
``MLGBuffer`` very easily. Staging buffers are not reference counted.
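The stride rule and the reverse ordering of a staging buffer can be sketched as below. This is a simplified illustration: it reverses items when copying out rather than when writing, and the method names (`Append`, `CopyReversed`) are assumptions, not the actual ``StagingBuffer`` interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

class StagingBuffer {
 public:
  // The first item fixes the stride; later items must match it.
  template <typename T>
  void Append(const T& item) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "items are copied byte-for-byte");
    if (mStride == 0) {
      mStride = sizeof(T);
    }
    assert(sizeof(T) == mStride);
    const auto* bytes = reinterpret_cast<const unsigned char*>(&item);
    mData.insert(mData.end(), bytes, bytes + sizeof(T));
  }

  std::size_t Stride() const { return mStride; }
  std::size_t Count() const { return mStride ? mData.size() / mStride : 0; }

  // Copies items to `dest` last-to-first, so items appended while
  // visiting layers front-to-back come out back-to-front.
  // `dest` must have room for Count() * Stride() bytes.
  void CopyReversed(unsigned char* dest) const {
    const std::size_t count = Count();
    for (std::size_t i = 0; i < count; ++i) {
      std::memcpy(dest + i * mStride,
                  mData.data() + (count - 1 - i) * mStride, mStride);
    }
  }

 private:
  std::size_t mStride = 0;
  std::vector<unsigned char> mData;
};
```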
Unsupported Features
--------------------
Currently, these features of the old compositor are not yet implemented.
- OpenGL and software support (currently AL only works on D3D11).
- APZ displayport overlay.
- Diagnostic/developer overlays other than the FPS/timing overlay.
- DEAA. It was never ported to the D3D11 compositor, but we would like
it.
- Component alpha when used inside an opaque intermediate surface.
- Effects prefs. Possibly not needed post-B2G removal.
- Widget overlays and underlays used by macOS and Android.
- DefaultClearColor. This is Android specific, but is easy to add
when needed.
- Frame uniformity info in the profiler. Possibly not needed post-B2G
removal.
- LayerScope. There are no plans to make this work.
Future Work
-----------
- Refactor for D3D12/Vulkan support (namely, split MLGDevice into
something less stateful and something else more low-level).
- Remove “MLG” moniker and namespace everything.
- Other backends (D3D12/Vulkan, OpenGL, Software)
- Delete CompositorD3D11
- Add DEAA support
- Re-enable the depth buffer by default for fast GPUs
- Re-enable right-sizing of inaccurately sized containers
- Drop constant buffers for ancillary vertex data
- Fast shader paths for simple video/painted layer cases
History
-------
Advanced Layers has gone through four major design iterations. The
initial version used tiling - each render view divided the screen into
128x128 tiles, and layers were assigned to tiles based on their
screen-space draw area. This approach proved not to scale well to 3d
transforms, and so tiling was eliminated.
We replaced it with a simple system of accumulating draw regions to each
batch, thus ensuring that items could be assigned to batches while
maintaining correct z-ordering. This second iteration also coincided
with plane-splitting support.
On large layer trees, accumulating the affected regions of batches
proved to be quite expensive. This led to a third iteration, using depth
buffers and separate opaque and transparent batch lists to achieve
z-ordering and occlusion culling.
Finally, depth buffers proved to be too expensive, and we introduced a
simple CPU-based occlusion culling pass. This iteration coincided with
using more precise draw rects and splitting pipelines into unit-quad,
cpu-clipped and triangle-list, gpu-clipped variants.

gfx/docs/AsyncPanZoom.rst Normal file

@ -0,0 +1,452 @@
.. _apz:
Asynchronous Panning and Zooming
================================
**This document is a work in progress. Some information may be missing
or incomplete.**
.. image:: AsyncPanZoomArchitecture.png
Goals
-----
We need to be able to provide a visual response to user input with
minimal latency. In particular, on devices with touch input, content
must track the finger exactly while panning, or the user experience is
very poor. According to the UX team, 120ms is an acceptable latency
between user input and response.
Context and surrounding architecture
------------------------------------
The fundamental problem we are trying to solve with the Asynchronous
Panning and Zooming (APZ) code is that of responsiveness. By default,
web browsers operate in a “game loop” that looks like this:
::

   while true:
       process input
       do computations
       repaint content
       display repainted content

In browsers the “do computations” step can be arbitrarily expensive
because it can involve running event handlers in web content. Therefore,
there can be an arbitrary delay between the input being received and the
on-screen display getting updated.
Responsiveness is always good, and with touch-based interaction it is
even more important than with mouse or keyboard input. In order to
ensure responsiveness, we split the “game loop” model of the browser
into a multithreaded variant which looks something like this:
::

   Thread 1 (compositor thread)
   while true:
       receive input
       send a copy of input to thread 2
       adjust painted content based on input
       display adjusted painted content

   Thread 2 (main thread)
   while true:
       receive input from thread 1
       do computations
       repaint content
       update the copy of painted content in thread 1

This multithreaded model is called off-main-thread compositing (OMTC),
because the compositing (where the content is displayed on-screen)
happens on a separate thread from the main thread. Note that this is a
heavily simplified model, but in this model the “adjust painted
content based on input” step is the primary function of the APZ code.
The “painted content” is stored on a set of “layers”, that are
conceptually double-buffered. That is, when the main thread does its
repaint, it paints into one set of layers (the “client” layers). The
update that is sent to the compositor thread copies all the changes from
the client layers into another set of layers that the compositor holds.
These layers are called the “shadow” layers or the “compositor” layers.
The compositor in theory can continuously composite these shadow layers
to the screen while the main thread is busy doing other things and
painting a new set of client layers.
The APZ code takes the input events that are coming in from the hardware
and uses them to figure out what the user is trying to do (e.g. pan the
page, zoom in). It then expresses this user intention in the form of
translation and/or scale transformation matrices. These transformation
matrices are applied to the shadow layers at composite time, so that
what the user sees on-screen reflects what they are trying to do as
closely as possible.
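As a simplified illustration of how a pan or zoom becomes a transform, suppose a layer was painted at scroll offset ``P`` and zoom ``Zp``, while the user has since moved to scroll offset ``A`` and zoom ``Za``. A content point ``c`` sits at ``(c - P) * Zp`` in the layer and should appear at ``(c - A) * Za`` on screen, which gives a scale of ``Za / Zp`` and a translation of ``(P - A) * Za``. The code below is a sketch of that model only; the names are hypothetical and the real APZ computation handles many more cases.

```cpp
#include <cassert>

struct Point {
  float x, y;
};

// Applies scale first, then translation.
struct AsyncTransform {
  float scale = 1.0f;
  Point translation{0.0f, 0.0f};

  Point Apply(Point p) const {
    return {p.x * scale + translation.x, p.y * scale + translation.y};
  }
};

// paintedScroll/paintedZoom: state the layer contents were painted with.
// asyncScroll/asyncZoom: state implied by the latest user input.
AsyncTransform ComputeAsyncTransform(Point paintedScroll, Point asyncScroll,
                                     float paintedZoom, float asyncZoom) {
  assert(paintedZoom > 0.0f && asyncZoom > 0.0f);
  AsyncTransform transform;
  transform.scale = asyncZoom / paintedZoom;
  transform.translation = {(paintedScroll.x - asyncScroll.x) * asyncZoom,
                           (paintedScroll.y - asyncScroll.y) * asyncZoom};
  return transform;
}
```

For example, if the page was painted at a vertical scroll of 100 and the user has panned to 150 while zooming from 1x to 2x, the layer point at y = 50 (content y = 150) maps to the top of the screen.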
Technical overview
------------------
As per the heavily simplified model described above, the fundamental
purpose of the APZ code is to take input events and produce
transformation matrices. This section attempts to break that down and
identify the different problems that make this task non-trivial.
Checkerboarding
~~~~~~~~~~~~~~~
The content area that is painted and stored in a shadow layer is called
the “displayport”. The APZ code is responsible for determining how large
the displayport should be. On the one hand, we want the displayport to
be as large as possible. At the very least it needs to be larger than
what is visible on-screen, because otherwise, as soon as the user pans,
there will be some unpainted area of the page exposed. However, we
cannot always set the displayport to be the entire page, because the
page can be arbitrarily long and this would require an unbounded amount
of memory to store. Therefore, a good displayport size is one that is
larger than the visible area but not so large that it is a huge drain on
memory. Because the displayport is usually smaller than the whole page,
it is always possible for the user to scroll so fast that they end up in
an area of the page outside the displayport. When this happens, they see
unpainted content; this is referred to as “checkerboarding”, and we try
to avoid it where possible.
There are many possible ways to determine what the displayport should be
in order to balance the tradeoffs involved (i.e. having one that is too
big is bad for memory usage, and having one that is too small results in
excessive checkerboarding). Ideally, the displayport should cover
exactly the area that we know the user will make visible. Although we
cannot know this for sure, we can use heuristics based on current
panning velocity and direction to ensure a reasonably-chosen displayport
area. This calculation is done in the APZ code, and a new desired
displayport is frequently sent to the main thread as the user is panning
around.
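To make the tradeoff concrete, here is a minimal sketch of a
velocity-based displayport heuristic. The function name, the margin
multiplier and the skew factor are hypothetical values chosen for this
example, and only the vertical axis is handled; the real calculation in
the APZ code is more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    width: float
    height: float


def compute_displayport(visible: Rect, page: Rect, velocity_y: float,
                        base_margin: float = 0.5, skew: float = 1.0) -> Rect:
    """Return a displayport that is taller than the visible rect.

    Hypothetical heuristic: add base_margin * height above and below,
    then add a further skew * height in the direction of panning.
    The result is clamped so it never extends beyond the page.
    """
    margin_top = margin_bottom = base_margin * visible.height
    if velocity_y > 0:        # panning towards the bottom of the page
        margin_bottom += skew * visible.height
    elif velocity_y < 0:      # panning towards the top of the page
        margin_top += skew * visible.height

    top = max(page.y, visible.y - margin_top)
    bottom = min(page.y + page.height,
                 visible.y + visible.height + margin_bottom)
    return Rect(visible.x, top, visible.width, bottom - top)


page = Rect(0, 0, 800, 10000)
visible = Rect(0, 2000, 800, 600)
print(compute_displayport(visible, page, velocity_y=3.0))
```

With these made-up numbers the displayport extends 300 pixels above the
visible area and 900 pixels below it, since the user is panning
downwards.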
Multiple layers
~~~~~~~~~~~~~~~
Consider, for example, a scrollable page that contains an iframe which
itself is scrollable. The iframe can be scrolled independently of the
top-level page, and we would like both the page and the iframe to scroll
responsively. This means that we want independent asynchronous panning
for both the top-level page and the iframe. In addition to iframes,
elements that have the overflow:scroll CSS property set are also
scrollable, and also end up on separate scrollable layers. In the
general case, the layers are arranged in a tree structure, and so within
the APZ code we have a matching tree of AsyncPanZoomController (APZC)
objects, one for each scrollable layer. To manage this tree of APZC
instances, we have a single APZCTreeManager object. Each APZC is
relatively independent and handles the scrolling for its associated
layer, but there are some cases in which they need to interact; these
cases are described in the sections below.
Hit detection
~~~~~~~~~~~~~
Consider again the case where we have a scrollable page that contains an
iframe which itself is scrollable. As described above, we will have two
APZC instances - one for the page and one for the iframe. When the user
puts their finger down on the screen and moves it, we need to do some
sort of hit detection in order to determine whether their finger is on
the iframe or on the top-level page. Based on where their finger lands,
the appropriate APZC instance needs to handle the input. This hit
detection is also done in the APZCTreeManager, as it has the necessary
information about the sizes and positions of the layers. Currently this
hit detection is not perfect, as it uses rects and does not account for
things like rounded corners and opacity.
Also note that for some types of input (e.g. when the user puts two
fingers down to do a pinch) we do not want the input to be “split”
across two different APZC instances. In the case of a pinch, for
example, we find a “common ancestor” APZC instance - one that is
zoomable and contains all of the touch input points, and direct the
input to that APZC instance.
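The rect-based hit detection described in this section can be sketched
as a walk over a tree of nodes. The ``ApzcNode`` class and its fields
are hypothetical stand-ins for this example, not the APZCTreeManager
data structures, and like the real code at the time of writing the
sketch ignores rounded corners and opacity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ApzcNode:
    """Hypothetical stand-in for one APZC and the screen rect of its layer."""
    name: str
    x: float
    y: float
    width: float
    height: float
    children: List["ApzcNode"] = field(default_factory=list)

    def contains(self, px: float, py: float) -> bool:
        return (self.x <= px < self.x + self.width and
                self.y <= py < self.y + self.height)


def hit_test(node: ApzcNode, px: float, py: float) -> Optional[ApzcNode]:
    """Return the deepest node whose rect contains the point, or None."""
    if not node.contains(px, py):
        return None
    # Later children are assumed to be drawn on top, so check them first.
    for child in reversed(node.children):
        hit = hit_test(child, px, py)
        if hit is not None:
            return hit
    return node


iframe = ApzcNode("iframe", 100, 200, 400, 300)
root = ApzcNode("page", 0, 0, 800, 600, children=[iframe])

print(hit_test(root, 150, 250).name)  # iframe
print(hit_test(root, 50, 50).name)    # page
```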
Scroll Handoff
~~~~~~~~~~~~~~
Consider yet again the case where we have a scrollable page that
contains an iframe which itself is scrollable. Say the user scrolls the
iframe so that it reaches the bottom. If the user continues panning on
the iframe, the expectation is that the top-level page will start
scrolling. However, as discussed in the section on hit detection, the
APZC instance for the iframe is separate from the APZC instance for the
top-level page. Thus, we need the two APZC instances to communicate in
some way such that input events on the iframe result in scrolling on the
top-level page. This behaviour is referred to as “scroll handoff” (or
“fling handoff” in the case where analogous behaviour results from the
scrolling momentum of the page after the user has lifted their finger).
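One way to picture scroll handoff is as a chain of scrollers in which
each one passes on whatever part of a scroll delta it cannot consume.
The ``Scroller`` class below is a hypothetical model for illustration
only; it does not reflect how the APZC instances actually communicate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scroller:
    """Hypothetical scrollable layer with a parent to hand off to."""
    name: str
    offset: float
    max_offset: float
    parent: Optional["Scroller"] = None

    def scroll_by(self, delta: float) -> float:
        """Scroll by delta, passing any unconsumed part to the parent.

        Returns the amount that no scroller in the chain could consume.
        """
        target = min(max(self.offset + delta, 0.0), self.max_offset)
        consumed = target - self.offset
        self.offset = target
        leftover = delta - consumed
        if leftover != 0 and self.parent is not None:
            return self.parent.scroll_by(leftover)
        return leftover


page = Scroller("page", offset=0.0, max_offset=1000.0)
iframe = Scroller("iframe", offset=280.0, max_offset=300.0, parent=page)

# The iframe can only absorb 20 of the 50 pixels; the page takes the rest.
iframe.scroll_by(50.0)
print(iframe.offset, page.offset)  # 300.0 30.0
```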
Input event untransformation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The APZC architecture by definition results in two copies of a “scroll
position” for each scrollable layer. There is the original copy on the
main thread that is accessible to web content and the layout and
painting code. And there is a second copy on the compositor side, which
is updated asynchronously based on user input, and corresponds to what
the user visually sees on the screen. Although these two copies may
diverge temporarily, they are reconciled periodically. In particular,
they diverge while the APZ code is performing an async pan or zoom
action on behalf of the user, and are reconciled when the APZ code
requests a repaint from the main thread.
Because of the way input events are stored, this has some unfortunate
consequences. Input events are stored relative to the device screen - so
if the user touches at the same physical spot on the device, the same
input events will be delivered regardless of the content scroll
position. When the main thread receives a touch event, it combines that
with the content scroll position in order to figure out what DOM element
the user touched. However, because we now have two different scroll
positions, this process may not work perfectly. A concrete example
follows:
Consider a device with screen size 600 pixels tall. On this device, a
user is viewing a document that is 1000 pixels tall, and that is
scrolled down by 200 pixels. That is, the vertical section of the
document from 200px to 800px is visible. Now, if the user touches a
point 100px from the top of the physical display, the hardware will
generate a touch event with y=100. This will get sent to the main
thread, which will add the scroll position (200) and get a
document-relative touch event with y=300. This new y-value will be used
in hit detection to figure out what the user touched. If the document
had an absolute-positioned div at y=300, then that would receive the
touch event.
Now let us add some async scrolling to this example. Say that the user
additionally scrolls the document by another 10 pixels asynchronously
(i.e. only on the compositor thread), and then does the same touch
event. The same input event is generated by the hardware, and as before,
the document will deliver the touch event to the div at y=300. However,
visually, the document is scrolled by an additional 10 pixels so this
outcome is wrong. To get the right result, the APZ code needs to
intercept the touch event and account for the 10 pixels of asynchronous
scroll. Therefore, the input event with y=100 gets converted to y=110 in
the APZ code before being passed on to the main thread. The main thread
then adds the scroll position it knows about and determines that the
user touched at a document-relative position of y=310.
Analogous input event transformations need to be done for horizontal
scrolling and zooming.
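The arithmetic of the example above can be written out directly. The
two function names are made up for this sketch, and only a vertical
translation is modelled; the real code works with full transforms.

```python
def untransform_touch_y(screen_y: float, async_scroll_y: float) -> float:
    """The APZ step: fold the compositor-only scroll into the event."""
    return screen_y + async_scroll_y


def to_document_y(event_y: float, main_thread_scroll_y: float) -> float:
    """The main-thread step: add the scroll position it knows about."""
    return event_y + main_thread_scroll_y


screen_y = 100            # touch point, relative to the top of the display
main_thread_scroll = 200  # scroll position known to the main thread
async_scroll = 10         # extra scroll applied only on the compositor

adjusted = untransform_touch_y(screen_y, async_scroll)
print(adjusted)                                     # 110
print(to_document_y(adjusted, main_thread_scroll))  # 310
```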
Content independently adjusting scrolling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As described above, there are two copies of the scroll position in the
APZ architecture - one on the main thread and one on the compositor
thread. Usually for architectures like this, there is a single “source
of truth” value and the other value is simply a copy. However, in this
case that is not easily possible to do. The reason is that both of these
values can be legitimately modified. On the compositor side, the input
events the user is triggering modify the scroll position, which is then
propagated to the main thread. However, on the main thread, web content
might be running JavaScript code that programmatically sets the scroll
position (via window.scrollTo, for example). Scroll changes driven from
the main thread are just as legitimate and need to be propagated to the
compositor thread, so that the visual display updates in response.
Because the cross-thread messaging is asynchronous, reconciling the two
types of scroll changes is a tricky problem. Our design solves this
using various flags and generation counters. The general heuristic we
have is that content-driven scroll position changes (e.g. scrollTo from
JS) are never lost. For instance, if the user is doing an async scroll
with their finger and content does a scrollTo in the middle, then some
of the async scroll would occur before the “jump” and the rest after the
“jump”.
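The text does not spell out how the flags and generation counters work,
so the following is only a guess at the general shape of such a scheme.
The class, its fields and the "higher generation wins" rule are all
assumptions made for illustration. The sketch shows the stated outcome:
a content-driven ``scrollTo`` is never lost, and async scrolling
continues on either side of the jump.

```python
class CompositorScrollState:
    """Hypothetical compositor-side copy of one scroll position."""

    def __init__(self) -> None:
        self.offset = 0.0
        self.seen_generation = 0

    def apply_user_delta(self, delta: float) -> None:
        # Async scrolling driven by input events.
        self.offset += delta

    def receive_main_thread_update(self, offset: float,
                                   generation: int) -> None:
        # Assumption: content bumps the generation whenever it sets the
        # scroll position itself (for example via scrollTo), and such an
        # update must always win over the async offset.
        if generation > self.seen_generation:
            self.offset = offset
            self.seen_generation = generation
        # Otherwise the update carries no new content-driven change and
        # the newer async offset is kept.


state = CompositorScrollState()
state.apply_user_delta(40.0)                           # before the jump
state.receive_main_thread_update(500.0, generation=1)  # content scrollTo
state.apply_user_delta(25.0)                           # after the jump
print(state.offset)  # 525.0
```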
Content preventing default behaviour of input events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Another problem that we need to deal with is that web content is allowed
to intercept touch events and prevent the “default behaviour” of
scrolling. This ability is defined in web standards and is
non-negotiable. Touch event listeners in web content are allowed to call
preventDefault() on the touchstart or first touchmove event for a touch
point; doing this is supposed to “consume” the event and prevent
touch-based panning. As we saw in a previous section, the input event
needs to be untransformed by the APZ code before it can be delivered to
content. But, because of the preventDefault problem, we cannot fully
process the touch event in the APZ code until content has had a chance
to handle it. Web browsers in general solve this problem by inserting a
delay of up to 300ms before processing the input - that is, web content
is allowed up to 300ms to process the event and call preventDefault on
it. If web content takes longer than 300ms, or if it completes handling
of the event without calling preventDefault, then the browser
immediately starts processing the events.
The way the APZ implementation deals with this is that upon receiving a
touch event, it immediately returns an untransformed version that can be
dispatched to content. It also schedules a 400ms timeout (600ms on
Android) during which content is allowed to prevent scrolling. There is
an API that allows the main-thread event dispatching code to notify the
APZ as to whether or not the default action should be prevented. If the
APZ content response timeout expires, or if the main-thread event
dispatching code notifies the APZ of the preventDefault status, then the
APZ continues with the processing of the events (which may involve
discarding the events).
The touch-action CSS property from the pointer-events spec is intended
to allow eliminating this 400ms delay in many cases (although for
backwards compatibility it will still be needed for a while). Note that
even with touch-action implemented, there may be cases where the APZ
code does not know the touch-action behaviour of the point the user
touched. In such cases, the APZ code will still wait up to 400ms for the
main thread to provide it with the touch-action behaviour information.
Technical details
-----------------
This section describes various pieces of the APZ code, and goes into
more specific detail on APIs and code than the previous sections. The
primary purpose of this section is to help people who plan on making
changes to the code, while also not going into so much detail that it
needs to be updated with every patch.
Overall flow of input events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This section describes how input events flow through the APZ code.
1. Input events arrive from the hardware/widget code into the APZ via
APZCTreeManager::ReceiveInputEvent. The thread that invokes this is
called the input thread, and may or may not be the same as the Gecko
main thread.
2. Conceptually the first thing that the APZCTreeManager does is to
associate these events with “input blocks”. An input block is a set
of events that share certain properties, and generally are intended
to represent a single gesture. For example with touch events, all
events following a touchstart up to but not including the next
touchstart are in the same block. All of the events in a given block
will go to the same APZC instance and will either all be processed
or all be dropped.
3. Using the first event in the input block, the APZCTreeManager does a
hit-test to see which APZC it hits. This hit-test uses the event
regions populated on the layers, which may be larger than the true
hit area of the layer. If no APZC is hit, the events are discarded
and we jump to step 6. Otherwise, the input block is tagged with the
hit APZC as a tentative target and put into a global APZ input
queue.
4.
i. If the input events landed outside the dispatch-to-content event
region for the layer, any available events in the input block
are processed. These may trigger behaviours like scrolling or
tap gestures.
ii. If the input events landed inside the dispatch-to-content event
region for the layer, the events are left in the queue and a
400ms timeout is initiated. If the timeout expires before step 9
is completed, the APZ assumes the input block was not cancelled
and the tentative target is correct, and processes them as part
of step 10.
5. The call stack unwinds back to APZCTreeManager::ReceiveInputEvent,
which does an in-place modification of the input event so that any
async transforms are removed.
6. The call stack unwinds back to the widget code that called
ReceiveInputEvent. This code now has the event in the coordinate
space Gecko is expecting, and so can dispatch it to the Gecko main
thread.
7. Gecko performs its own usual hit-testing and event dispatching for
the event. As part of this, it records whether any touch listeners
cancelled the input block by calling preventDefault(). It also
activates inactive scrollframes that were hit by the input events.
8. The call stack unwinds back to the widget code, which sends two
notifications to the APZ code on the input thread. The first
notification is via APZCTreeManager::ContentReceivedInputBlock, and
informs the APZ whether the input block was cancelled. The second
notification is via APZCTreeManager::SetTargetAPZC, and informs the
APZ of the results of the Gecko hit-test during event dispatch. Note
that Gecko may report that the input event did not hit any
scrollable frame at all. The SetTargetAPZC notification happens only
once per input block, while the ContentReceivedInputBlock
notification may happen once per block, or multiple times per block,
depending on the input type.
9.
i. If the events were processed as part of step 4(i), the
notifications from step 8 are ignored and step 10 is skipped.
ii. If events were queued as part of step 4(ii), and steps 5-8 take
less than 400ms, the arrival of both notifications from step 8
will mark the input block ready for processing.
iii. If events were queued as part of step 4(ii), but steps 5-8 take
longer than 400ms, the notifications from step 8 will be
ignored and step 10 will already have happened.
10. If events were queued as part of step 4(ii) they are now either
processed (if the input block was not cancelled and Gecko detected a
scrollframe under the input event, or if the timeout expired) or
dropped (all other cases). Note that the APZC that processes the
events may be different at this step than the tentative target from
step 3, depending on the SetTargetAPZC notification. Processing the
events may trigger behaviours like scrolling or tap gestures.
If the CSS touch-action property is enabled, the above steps are
modified as follows:
- In step 4, the APZC also requires the allowed touch-action behaviours
for the input event. This might have been determined as part of the
hit-test in APZCTreeManager; if not, the events are queued.
- In step 6, the widget code determines the content element at the point
under the input element, and notifies the APZ code of the allowed
touch-action behaviours. This notification is sent via a call to
APZCTreeManager::SetAllowedTouchBehavior on the input thread.
- In step 9(ii), the input block will only be marked ready for
processing once all three notifications arrive.
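The queuing rules in steps 4(ii) and 8 to 10 can be modelled roughly as
a small state machine. The ``InputBlock`` class and its method names
are hypothetical; only the 400ms timeout and the two notifications it
waits for come from the description above. The touch-action
notification is left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Optional

TIMEOUT_MS = 400  # content response timeout quoted in the text


@dataclass
class InputBlock:
    """Hypothetical model of a queued input block awaiting content."""
    tentative_target: str
    queued_at_ms: int
    prevent_default: Optional[bool] = None   # ContentReceivedInputBlock
    target_confirmed: bool = False           # SetTargetAPZC
    confirmed_target: Optional[str] = None
    timed_out: bool = False

    def content_received(self, prevent_default: bool) -> None:
        if not self.timed_out:               # late notifications are ignored
            self.prevent_default = prevent_default

    def set_target(self, target: Optional[str]) -> None:
        if not self.timed_out:
            self.target_confirmed = True
            self.confirmed_target = target

    def tick(self, now_ms: int) -> None:
        # The timeout only matters while a notification is still missing.
        waiting = self.prevent_default is None or not self.target_confirmed
        if waiting and now_ms - self.queued_at_ms >= TIMEOUT_MS:
            self.timed_out = True

    def is_ready(self) -> bool:
        return self.timed_out or (self.prevent_default is not None
                                  and self.target_confirmed)

    def resolve(self) -> Optional[str]:
        """Target that processes the block, or None if it is dropped.

        Only meaningful once is_ready() returns True.
        """
        if self.timed_out:
            return self.tentative_target
        if self.prevent_default or self.confirmed_target is None:
            return None
        return self.confirmed_target


block = InputBlock("P", queued_at_ms=0)
block.content_received(prevent_default=False)
block.set_target("scrollframe")
block.tick(100)
print(block.is_ready(), block.resolve())  # True scrollframe
```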
Threading considerations
^^^^^^^^^^^^^^^^^^^^^^^^
The bulk of the input processing in the APZ code happens on what we call
“the input thread”. In practice the input thread could be the Gecko main
thread, the compositor thread, or some other thread. There are obvious
downsides to using the Gecko main thread - that is, “asynchronous”
panning and zooming is not really asynchronous as input events can only
be processed while Gecko is idle. In an e10s environment, using the
Gecko main thread of the chrome process is acceptable, because the code
running in that process is more controllable and well-behaved than
arbitrary web content. Using the compositor thread as the input thread
could work on some platforms, but may be inefficient on others. For
example, on Android (Fennec) we receive input events from the system on
a dedicated UI thread. We would have to redispatch the input events to
the compositor thread if we wanted the input thread to be the same as
the compositor thread. This introduces a potential for higher latency,
particularly if the compositor does any blocking operations - blocking
SwapBuffers operations, for example. As a result, the APZ code itself
does not assume that the input thread will be the same as the Gecko main
thread or the compositor thread.
Active vs. inactive scrollframes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The number of scrollframes on a page is potentially unbounded. However,
we do not want to create a separate layer for each scrollframe right
away, as this would require large amounts of memory. Therefore,
scrollframes are designated as either “active” or “inactive”. Active
scrollframes are the ones that do have their contents put on a separate
layer (or set of layers), and inactive ones do not.
Consider a page with a scrollframe that is initially inactive. When
layout generates the layers for this page, the content of the
scrollframe will be flattened into some other PaintedLayer (call it P).
The layout code also adds the area (or bounding region in case of weird
shapes) of the scrollframe to the dispatch-to-content region of P.
When the user starts interacting with that content, the hit-test in the
APZ code finds the dispatch-to-content region of P. The input block
therefore has a tentative target of P when it goes into step 4(ii) in
the flow above. When Gecko processes the input event, it must detect the
inactive scrollframe and activate it, as part of step 7. Finally, the
widget code sends the SetTargetAPZC notification in step 8 to notify the
APZ that the input block should really apply to this new layer. The
issue here is that the layer transaction containing the new layer must
reach the compositor and APZ before the SetTargetAPZC notification. If
this does not occur within the 400ms timeout, the APZ code will be
unable to update the tentative target, and will continue to use P for
that input block. Input blocks that start after the layer transaction
will get correctly routed to the new layer as there will now be a layer
and APZC instance for the active scrollframe.
This model implies that when the user initially attempts to scroll an
inactive scrollframe, it may end up scrolling an ancestor scrollframe.
(This is because in the absence of the SetTargetAPZC notification, the
input events will get applied to the closest ancestor scrollframe's
APZC.) Only after the round-trip to the Gecko main thread is complete is
there a layer for async scrolling to actually occur on the scrollframe
itself. At that point the scrollframe will start receiving new input
blocks and will scroll normally.


@ -0,0 +1,159 @@
Graphics Overview
=========================
Work in progress. Possibly incorrect or incomplete.
---------------------------------------------------
Jargon
------
There's a lot of jargon in the graphics stack. We try to maintain a list
of common words and acronyms `here <https://wiki.mozilla.org/Platform/GFX/Jargon>`__.
Overview
--------
The graphics system is responsible for rendering (painting, drawing)
the frame tree (rendering tree) elements as created by the layout
system. Each leaf in the tree has content, bounded either by a rectangle
or, in the case of SVG, perhaps by another shape.
The simple approach for producing the result would thus involve
traversing the frame tree, in a correct order, drawing each frame into
the resulting buffer and displaying (printing notwithstanding) that
buffer when the traversal is done. It is worth spending some time on the
“correct order” note above. If there are no overlapping frames, this is
fairly simple - any order will do, as long as there is no background. If
there is background, we just have to worry about drawing that first.
Since we do not control the content, chances are the page is more
complicated. There are overlapping frames, likely with transparency, so
we need to make sure the elements are drawn “back to front”, in layers,
so to speak. Layers are an important concept, and we will revisit them
shortly, as they are central to fixing a major issue with the above
simple approach.
While the above simple approach will work, the performance will suffer.
Each time anything changes in any of the frames, the complete process
needs to be repeated and everything needs to be redrawn. Further, there
is very little room to take advantage of modern graphics (GPU)
hardware or multi-core computers. If you recall from the previous
sections, the frame tree is only accessible from the UI thread, so while
we're doing all this work, the UI is basically blocked.
(Retained) Layers
~~~~~~~~~~~~~~~~~
The layers framework was introduced to address the above performance
issues, with a part of the design addressing each item. At a high level:
1. We create a layer tree. The leaf elements of the tree contain all
frames (possibly multiple frames per leaf).
2. We render each layer tree element and cache (retain) the result.
3. We composite (combine) all the leaf elements into the final result.
Let's examine each of these steps, in reverse order.
Compositing
~~~~~~~~~~~
We use the term composite as it implies that the order is important. If
the elements being composited overlap, whether there is transparency
involved or not, the order in which they are combined will affect the
result. Compositing is where we can use some of the power of the modern
graphics hardware. It is optimal for doing this job. In the scenarios
where only the position of individual frames changes, without the
content inside them changing, we see why caching each layer would be
advantageous - we only need to repeat the final compositing step,
completely skipping the layer tree creation and the rendering of each
leaf, thus speeding up the process considerably.
Another benefit is equally apparent in the context of the stated
deficiencies of the simple approach. We can use the available graphics
hardware accelerated APIs to do the compositing step. Direct3D, OpenGL
can be used on different platforms and are well suited to accelerate
this step.
Finally, we can now envision performing the compositing step on a
separate thread, unblocking the UI thread for other work, and doing more
work in parallel. More on this below.
It is important to note that the number of operations in this step is
proportional to the number of layer tree (leaf) elements, so there is
additional work and complexity involved, when the layer tree is large.
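The benefit of retaining layer content can be shown with a toy model.
In the sketch below, which uses made-up names and represents a buffer
as a dictionary of opaque pixels, moving a layer triggers a new
composite but no repaint. Transparency and blending are deliberately
ignored.

```python
from typing import Callable, Dict, List, Optional, Tuple

Buffer = Dict[Tuple[int, int], str]


class RetainedLayer:
    """Hypothetical layer that caches (retains) its painted content."""

    def __init__(self, paint: Callable[[], Buffer], x: int = 0, y: int = 0):
        self._paint = paint
        self._cache: Optional[Buffer] = None
        self.x, self.y = x, y
        self.paint_count = 0

    def invalidate(self) -> None:
        """Mark the content as changed so the next composite repaints."""
        self._cache = None

    def content(self) -> Buffer:
        if self._cache is None:         # repaint only when content changed
            self._cache = self._paint()
            self.paint_count += 1
        return self._cache


def composite(layers: List[RetainedLayer]) -> Buffer:
    """Combine layers back to front; later layers overwrite earlier ones."""
    frame: Buffer = {}
    for layer in layers:
        for (px, py), colour in layer.content().items():
            frame[(px + layer.x, py + layer.y)] = colour
    return frame


background = RetainedLayer(lambda: {(0, 0): "white", (1, 0): "white"})
box = RetainedLayer(lambda: {(0, 0): "red"})

first = composite([background, box])
box.x = 1                       # move the box without changing its content
second = composite([background, box])
print(first, second, box.paint_count)
```

After the two composites the box has been painted only once, which is
the saving the retained-layer design is after.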
Render and retain layer elements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As we saw, the compositing step benefits from caching the intermediate
result. This does result in extra memory usage, so it needs to be
considered during layer tree creation. Beyond the caching, we can
accelerate the rendering of each element by (indirectly) using the
available platform APIs (e.g., Direct2D, CoreGraphics, even some of the
3D APIs like OpenGL or Direct3D) as available. This is actually done
through a platform-independent API (see Moz2D below), but it is
important to realize that it does get accelerated appropriately.
Creating the layer tree
~~~~~~~~~~~~~~~~~~~~~~~
We need to create a layer tree (from the frames tree), which will give
us the correct result while striking the right balance between a layer
per frame element and a single layer for the complete frames tree. As
was mentioned above, there is an overhead in traversing the whole tree
and caching each of the elements, balanced by the performance
improvements. Some of the performance improvements are only noticed when
something changes (e.g., one element is moving, we only need to redo the
compositing step).
Refresh Driver
~~~~~~~~~~~~~~
Layers
~~~~~~
Rendering each layer
~~~~~~~~~~~~~~~~~~~~
Tiling vs. Buffer Rotation vs. Full paint
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Compositing for the final result
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Graphics API
~~~~~~~~~~~~
Moz2D
~~~~~
- The Moz2D graphics API, part of the Azure project, is a
cross-platform interface onto the various graphics backends that
Gecko uses for rendering such as Direct2D (1.0 and 1.1), Skia, Cairo,
Quartz, and NV Path. Adding a new graphics platform to Gecko is
accomplished by adding a backend to Moz2D.
See `Moz2D documentation on wiki <https://wiki.mozilla.org/Platform/GFX/Moz2D>`__.
Compositing
~~~~~~~~~~~
Image Decoding
~~~~~~~~~~~~~~
Image Animation
~~~~~~~~~~~~~~~
`Historical Documents <http://www.youtube.com/watch?v=lLZQz26-kms>`__
---------------------------------------------------------------------
A number of posts and blogs that will give you more details or more
background, or reasoning that led to different solutions and approaches.
- 2010-01 `Layers: Cross Platform Acceleration <http://www.basschouten.com/blog1.php/layers-cross-platform-acceleration>`__
- 2010-04 `Layers <http://robert.ocallahan.org/2010/04/layers_01.html>`__
- 2010-07 `Retained Layers <http://robert.ocallahan.org/2010/07/retained-layers_16.html>`__
- 2011-04 `Moz2D Introduction <https://blog.mozilla.org/joe/2011/04/26/introducing-the-azure-project/>`__
- 2011-07 `Shadow Layers <http://chrislord.net/index.php/2011/07/25/shadow-layers-and-learning-by-failing/>`__
- 2011-09 `Graphics API Design <http://robert.ocallahan.org/2011/09/graphics-api-design.html>`__
- 2012-04 `Moz2D Canvas on OSX <http://muizelaar.blogspot.ca/2012/04/azure-canvas-on-os-x.html>`__
- 2012-05 `Mask Layers <http://featherweightmusings.blogspot.co.uk/2012/05/mask-layers_26.html>`__
- 2013-07 `Graphics related <http://www.basschouten.com/blog1.php>`__


@ -0,0 +1,63 @@
Layers History
==============
This is an overview of the major events in the history of our Layers
infrastructure.
- iPhone released in June 2007 (Built on a toolkit called LayerKit)
- Core Animation (October 2007) LayerKit was publicly renamed to Core
Animation and shipped in OS X 10.5
- WebKit CSS 3D transforms (July 2009)
- Original layers API (March 2010) Introduced the idea of a layer
manager that would composite. One of the first use cases for this was
hardware accelerated YUV conversion for video.
- Retained layers (July 7 2010 - Bug 564991) This was an important
concept that introduced the idea of persisting the layer content
across paints in gecko controlled buffers instead of just by the OS.
This introduced the concept of buffer rotation to deal with scrolling
instead of using the native scrolling APIs like ScrollWindowEx
- Layers IPC (July 2010 - Bug 570294) This introduced shadow layers and
edit lists and was originally done for e10s v1
- 3D transforms (September 2011 - Bug 505115)
- OMTC (December 2011 - Bug 711168) This was prototyped on OS X but
shipped first for Fennec
- Tiling v1 (April 2012 - Bug 739679) Originally done for Fennec. This
was done to avoid situations where we had to do a bunch of work for
scrolling a small amount, as with buffer rotation. It allowed us to have
a variety of interesting features like progressive painting and lower
resolution painting.
- C++ Async pan zoom controller (July 2012 - Bug 750974) The existing
APZ code was in Java for Fennec so this was reimplemented.
- Streaming WebGL Buffers (February 2013 - Bug 716859) Infrastructure
to allow OMTC WebGL and avoid the need to glFinish() every frame.
- Compositor API (April 2013 - Bug 825928) The planning for this
started around November 2012. Layers refactoring created a compositor
API that abstracted away the differences between D3D and OpenGL.
The main piece of API is DrawQuad.
- Tiling v2 (Mar 7 2014 - Bug 963073) Tiling for B2G. This work is
mainly porting tiled layers to new textures, implementing
double-buffered tiles and implementing a texture client pool, to be
used by tiled content clients.
A large motivation for the pool was the very slow performance of
allocating tiles because of the sync messages to the compositor.
The slow performance of allocating was directly addressed by bug 959089
which allowed us to allocate gralloc buffers without sync messages to
the compositor thread.
- B2G WebGL performance (May 2014 - Bug 1006957, 1001417, 1024144) This
work improved the synchronization mechanism between the compositor
and the producer.

gfx/docs/Silk.rst Normal file

@ -0,0 +1,469 @@
Silk Overview
==========================
.. image:: SilkArchitecture.png
Architecture
------------
Our current architecture is to align three components to hardware vsync
timers:
1. Compositor
2. RefreshDriver / Painting
3. Input Events
The flow of our rendering engine is as follows:
1. Hardware Vsync event occurs on an OS specific *Hardware Vsync Thread*
on a per monitor basis.
2. The *Hardware Vsync Thread* attached to the monitor notifies the
``CompositorVsyncDispatchers`` and ``RefreshTimerVsyncDispatcher``.
3. For every Firefox window on the specific monitor, notify a
``CompositorVsyncDispatcher``. The ``CompositorVsyncDispatcher`` is
specific to one window.
4. The ``CompositorVsyncDispatcher`` notifies a
``CompositorWidgetVsyncObserver`` when remote compositing, or a
``CompositorVsyncScheduler::Observer`` when compositing in-process.
5. If remote compositing, a vsync notification is sent from the
``CompositorWidgetVsyncObserver`` to the ``VsyncBridgeChild`` on the
UI process, which sends an IPDL message to the ``VsyncBridgeParent``
on the compositor thread of the GPU process, which then dispatches to
``CompositorVsyncScheduler::Observer``.
6. The ``RefreshTimerVsyncDispatcher`` notifies the Chrome
``RefreshTimer`` that a vsync has occurred.
7. The ``RefreshTimerVsyncDispatcher`` sends IPC messages to all content
processes to tick their respective active ``RefreshTimer``.
8. The ``Compositor`` dispatches input events on the *Compositor
Thread*, then composites. Input events are only dispatched on the
*Compositor Thread* on b2g.
9. The ``RefreshDriver`` paints on the *Main Thread*.
Hardware Vsync
--------------
Hardware vsync events from step (1) occur on a specific ``Display`` object.
The ``Display`` object is responsible for enabling / disabling vsync on
a per connected display basis. For example, if two monitors are
connected, two ``Display`` objects will be created, each listening to
vsync events for their respective displays. We require one ``Display``
object per monitor as each monitor may have different vsync rates. As a
fallback solution, we have one global ``Display`` object that can
synchronize across all connected displays. The global ``Display`` is
useful if a window is positioned halfway between the two monitors. Each
platform will have to implement a specific ``Display`` object to hook
and listen to vsync events. As of this writing, both Firefox OS and OS X
create their own hardware specific *Hardware Vsync Thread* that executes
after a vsync has occurred. OS X creates one *Hardware Vsync Thread* per
``CVDisplayLinkRef``. We do not currently support multiple displays, so
we use one global ``CVDisplayLinkRef`` that works across all active
displays. On Windows, we have to create a new platform ``thread`` that
waits for DwmFlush(), which works across all active displays. Once the
thread wakes up from DwmFlush(), the actual vsync timestamp is retrieved
from DwmGetCompositionTimingInfo(), which is the timestamp that is
actually passed into the compositor and refresh driver.
When a vsync occurs on a ``Display``, the *Hardware Vsync Thread*
callback fetches all ``CompositorVsyncDispatchers`` associated with the
``Display``. Each ``CompositorVsyncDispatcher`` is notified that a vsync
has occurred, along with the vsync's timestamp. It is the responsibility of the
``CompositorVsyncDispatcher`` to notify the ``Compositor`` that is
awaiting vsync notifications. The ``Display`` will then notify the
associated ``RefreshTimerVsyncDispatcher``, which should notify all
active ``RefreshDrivers`` to tick.
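The fan-out described above can be sketched roughly as follows. This is a minimal illustration, not the Gecko implementation: ``DisplaySketch``, ``VsyncCallback`` and ``VsyncTimestamp`` are hypothetical names, and the real dispatchers are reference-counted objects rather than plain callbacks.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real Gecko classes have richer interfaces.
using VsyncTimestamp = std::int64_t;  // assumed unit: microseconds
using VsyncCallback = std::function<void(VsyncTimestamp)>;

class DisplaySketch {
 public:
  // One compositor dispatcher is registered per window on this display.
  void AddCompositorDispatcher(VsyncCallback aDispatcher) {
    assert(aDispatcher != nullptr);
    mCompositorDispatchers.push_back(std::move(aDispatcher));
  }

  // A single refresh timer dispatcher is associated with the display.
  void SetRefreshTimerDispatcher(VsyncCallback aDispatcher) {
    mRefreshTimerDispatcher = std::move(aDispatcher);
  }

  // Called on the hardware vsync thread when the platform reports a vsync.
  void NotifyVsync(VsyncTimestamp aTimestamp) {
    // Compositor dispatchers are notified first...
    for (auto& dispatcher : mCompositorDispatchers) {
      dispatcher(aTimestamp);
    }
    // ...then the refresh timer dispatcher, which ticks refresh drivers.
    if (mRefreshTimerDispatcher) {
      mRefreshTimerDispatcher(aTimestamp);
    }
  }

 private:
  std::vector<VsyncCallback> mCompositorDispatchers;
  VsyncCallback mRefreshTimerDispatcher;
};
```

Each callback receives the same timestamp, so the compositor and the refresh drivers work from one consistent notion of when the vsync happened.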
All ``Display`` objects are encapsulated in a ``VsyncSource`` object.
The ``VsyncSource`` object lives in ``gfxPlatform`` and is instantiated
only on the parent process when ``gfxPlatform`` is created. The
``VsyncSource`` is destroyed when ``gfxPlatform`` is destroyed. There is
only one ``VsyncSource`` object throughout the entire lifetime of
Firefox. Each platform is expected to implement its own
``VsyncSource`` to manage vsync events. On Firefox OS, this is through
the ``HwcComposer2D``. On OS X, this is through ``CVDisplayLinkRef``. On
Windows, it should be through ``DwmGetCompositionTimingInfo``.
Compositor
----------
When the ``CompositorVsyncDispatcher`` is notified of the vsync event,
the ``CompositorVsyncScheduler::Observer`` associated with the
``CompositorVsyncDispatcher`` begins execution. Since the
``CompositorVsyncDispatcher`` executes on the *Hardware Vsync Thread*
and the ``Compositor`` composites on the ``CompositorThread``, the
``CompositorVsyncScheduler::Observer`` posts a task to the
``CompositorThread``. The ``CompositorBridgeParent`` then composites.
This model, in which the ``CompositorVsyncDispatcher`` notifies components on
the *Hardware Vsync Thread* and each component schedules its own task on the
appropriate thread, is used everywhere.
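A minimal sketch of this "notify on the vsync thread, do the work elsewhere" pattern is shown here. The names ``TaskLoopSketch`` and ``CompositorObserverSketch`` are hypothetical, and the task loop is a simplified stand-in for a real message loop.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical stand-in for a thread's message loop.
class TaskLoopSketch {
 public:
  using Task = std::function<void()>;

  // May be called from any thread.
  void Post(Task aTask) {
    assert(aTask != nullptr);
    std::lock_guard<std::mutex> lock(mMutex);
    mTasks.push(std::move(aTask));
  }

  // Called by the thread that owns the loop; returns how many tasks ran.
  int RunPending() {
    int ran = 0;
    for (;;) {
      Task task;
      {
        std::lock_guard<std::mutex> lock(mMutex);
        if (mTasks.empty()) {
          return ran;
        }
        task = std::move(mTasks.front());
        mTasks.pop();
      }
      task();  // run outside the lock so tasks may post more tasks
      ++ran;
    }
  }

 private:
  std::mutex mMutex;
  std::queue<Task> mTasks;
};

class CompositorObserverSketch {
 public:
  explicit CompositorObserverSketch(TaskLoopSketch& aCompositorLoop)
      : mCompositorLoop(aCompositorLoop) {}

  // Runs on the hardware vsync thread: schedule work, do not composite here.
  void NotifyVsync(std::int64_t aTimestamp) {
    mCompositorLoop.Post([this, aTimestamp] { Composite(aTimestamp); });
  }

  int CompositeCount() const { return mCompositeCount; }
  std::int64_t LastTimestamp() const { return mLastTimestamp; }

 private:
  // Runs on the compositor thread, via the posted task.
  void Composite(std::int64_t aTimestamp) {
    ++mCompositeCount;
    mLastTimestamp = aTimestamp;
  }

  TaskLoopSketch& mCompositorLoop;
  int mCompositeCount = 0;
  std::int64_t mLastTimestamp = 0;
};
```

The vsync callback stays cheap because it only enqueues a task; the composite itself happens when the owning thread drains its loop.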
The ``CompositorVsyncScheduler::Observer`` listens to vsync events as
needed and stops listening to vsync when composites are no longer
scheduled or required. Every ``CompositorBridgeParent`` is tied to one
``CompositorVsyncScheduler::Observer``, which is
associated with the ``CompositorVsyncDispatcher``. Each
``CompositorBridgeParent`` is associated with one widget and is created
when a new platform window or ``nsBaseWidget`` is created. The
``CompositorBridgeParent``, ``CompositorVsyncDispatcher``,
``CompositorVsyncScheduler::Observer``, and ``nsBaseWidget`` all have
the same lifetime: they are created and destroyed together.
Out-of-process Compositors
--------------------------
When compositing out-of-process, this model changes slightly. In this
case there are effectively two observers: a UI process observer
(``CompositorWidgetVsyncObserver``), and the
``CompositorVsyncScheduler::Observer`` in the GPU process. There are
also two dispatchers: the widget dispatcher in the UI process
(``CompositorVsyncDispatcher``), and the IPDL-based dispatcher in the
GPU process (``CompositorBridgeParent::NotifyVsync``). The UI process
observer and the GPU process dispatcher are linked via an IPDL protocol
called ``PVsyncBridge``. ``PVsyncBridge`` is a top-level protocol for
sending vsync notifications to the compositor thread in the GPU process.
The compositor controls vsync observation through a separate actor,
``PCompositorWidget``, which (as a subactor for
``CompositorBridgeChild``) links the compositor thread in the GPU
process to the main thread in the UI process.
Out-of-process compositors do not go through
``CompositorVsyncDispatcher`` directly. Instead, the
``CompositorWidgetDelegate`` in the UI process creates one, and gives it
a ``CompositorWidgetVsyncObserver``. This observer forwards
notifications to a Vsync I/O thread, where ``VsyncBridgeChild`` then
forwards the notification again to the compositor thread in the GPU
process. The notification is received by a ``VsyncBridgeParent``. The
GPU process uses the layers ID in the notification to find the correct
compositor to dispatch the notification to.
CompositorVsyncDispatcher
-------------------------
The ``CompositorVsyncDispatcher`` executes on the *Hardware Vsync
Thread*. It contains references to the ``nsBaseWidget`` it is associated
with and has a lifetime equal to the ``nsBaseWidget``. The
``CompositorVsyncDispatcher`` is responsible for notifying the
``CompositorBridgeParent`` that a vsync event has occurred. There can be
multiple ``CompositorVsyncDispatchers`` per ``Display``, one
``CompositorVsyncDispatcher`` per window. The only responsibility of the
``CompositorVsyncDispatcher`` is to notify components when a vsync event
has occurred, and to stop listening to vsync when no components require
vsync events. We require one ``CompositorVsyncDispatcher`` per window so
that we can handle multiple ``Displays``. When compositing in-process,
the ``CompositorVsyncDispatcher`` is attached to the CompositorWidget
for the window. When out-of-process, it is attached to the
CompositorWidgetDelegate, which forwards observer notifications over
IPDL. In the latter case, its lifetime is tied to a CompositorSession
rather than the nsIWidget.
Multiple Displays
-----------------
The ``VsyncSource`` has an API to switch a ``CompositorVsyncDispatcher``
from one ``Display`` to another ``Display``. This is needed, for example,
when a window either goes into full screen mode or moves from one connected
monitor to another. When a window moves to another monitor, we expect
a platform-specific notification to occur. The detection of when a
window enters full screen mode or moves is not covered by Silk itself,
but the framework is built to support this use case. The expected flow
is that the OS notification occurs on ``nsIWidget``, which retrieves the
associated ``CompositorVsyncDispatcher``. The
``CompositorVsyncDispatcher`` then notifies the ``VsyncSource`` to
switch to the correct ``Display`` the ``CompositorVsyncDispatcher`` is
connected to. Because the notification works through the ``nsIWidget``,
the actual switching of the ``CompositorVsyncDispatcher`` to the correct
``Display`` should occur on the *Main Thread*. The current
implementation of Silk does not handle this case and needs to be built
out.
CompositorVsyncScheduler::Observer
----------------------------------
The ``CompositorVsyncScheduler::Observer`` handles the vsync
notifications and interactions with the ``CompositorVsyncDispatcher``.
When the ``Compositor`` requires a scheduled composite, it notifies the
``CompositorVsyncScheduler::Observer`` that it needs to listen to vsync.
The ``CompositorVsyncScheduler::Observer`` then observes / unobserves
vsync as needed from the ``CompositorVsyncDispatcher`` to enable
composites.
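The observe / unobserve behavior can be illustrated with a small sketch. All names here are hypothetical, and the policy of unobserving after a single idle vsync is an assumption made for brevity, not a statement about what Gecko does.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical dispatcher holding at most one observer callback.
class VsyncDispatcherSketch {
 public:
  using Observer = std::function<void()>;

  void SetObserver(Observer aObserver) { mObserver = std::move(aObserver); }
  bool HasObserver() const { return mObserver != nullptr; }

  void NotifyVsync() {
    Observer observer = mObserver;  // copy: the callback may clear mObserver
    if (observer) {
      observer();
    }
  }

 private:
  Observer mObserver;
};

class SchedulerObserverSketch {
 public:
  explicit SchedulerObserverSketch(VsyncDispatcherSketch& aDispatcher)
      : mDispatcher(aDispatcher) {}

  // The compositor asks for a composite on the next vsync.
  void ScheduleComposite() {
    mNeedsComposite = true;
    if (!mDispatcher.HasObserver()) {
      mDispatcher.SetObserver([this] { OnVsync(); });
    }
    assert(mDispatcher.HasObserver());
  }

  int CompositeCount() const { return mCompositeCount; }

 private:
  void OnVsync() {
    if (mNeedsComposite) {
      mNeedsComposite = false;
      ++mCompositeCount;
      return;
    }
    // Nothing was scheduled since the previous vsync: stop listening.
    mDispatcher.SetObserver(nullptr);
  }

  VsyncDispatcherSketch& mDispatcher;
  bool mNeedsComposite = false;
  int mCompositeCount = 0;
};
```

With this shape, an idle compositor costs nothing per vsync, and the first ``ScheduleComposite`` call after an idle period re-registers the observer.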
GeckoTouchDispatcher
--------------------
The ``GeckoTouchDispatcher`` is a singleton that resamples touch events
to smooth out jank while tracking a user's finger. Because input and
composite are linked together, the
``CompositorVsyncScheduler::Observer`` has a reference to the
``GeckoTouchDispatcher`` and vice versa.
Input Events
------------
One large goal of Silk is to align touch events with vsync events. On
Firefox OS, touchscreens often have touch scan rates that differ from the
display refresh rate. A Flame device has a touch refresh rate of 75 Hz
and a Nexus 4 has a touch refresh rate of 100 Hz, while each device's
display refresh rate is 60 Hz. When a vsync event occurs, we resample
touch events, and then dispatch the resampled touch event to APZ. Touch
events on Firefox OS occur on a *Touch Input Thread* whereas they are
processed by APZ on the *APZ Controller Thread*. We use `Google
Android's touch
resampling <http://www.masonchang.com/blog/2014/8/25/androids-touch-resampling-algorithm>`__
algorithm to resample touch events.
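To give a feel for what resampling means, here is a deliberately simplified sketch that linearly interpolates a touch position between two real samples. It is not the linked Android algorithm, which handles more cases; ``TouchSample`` and ``ResampleTouch`` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical touch sample: a timestamp in milliseconds and a position.
struct TouchSample {
  double timeMs;
  double x;
  double y;
};

// Linearly interpolate between two real touch samples to estimate where the
// finger was at aSampleTimeMs. The caller is assumed to pass a time that lies
// between the two samples.
TouchSample ResampleTouch(const TouchSample& aEarlier,
                          const TouchSample& aLater,
                          double aSampleTimeMs) {
  assert(aLater.timeMs > aEarlier.timeMs);
  const double alpha =
      (aSampleTimeMs - aEarlier.timeMs) / (aLater.timeMs - aEarlier.timeMs);
  return TouchSample{aSampleTimeMs,
                     aEarlier.x + alpha * (aLater.x - aEarlier.x),
                     aEarlier.y + alpha * (aLater.y - aEarlier.y)};
}
```

For example, with samples 10 ms apart (a 100 Hz touch panel), asking for the position halfway between them yields the midpoint of the two positions.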
Currently, we have a strict ordering between Composites and touch
events. When a touch event occurs on the *Touch Input Thread*, we store
the touch event in a queue. When a vsync event occurs, the
``CompositorVsyncDispatcher`` notifies the ``Compositor`` of a vsync
event, which notifies the ``GeckoTouchDispatcher``. The
``GeckoTouchDispatcher`` processes the touch event first on the *APZ
Controller Thread*, which is the same as the *Compositor Thread* on b2g,
then the ``Compositor`` finishes compositing. We require this strict
ordering because if a vsync notification is dispatched to both the
``Compositor`` and ``GeckoTouchDispatcher`` at the same time, a race
condition occurs between processing the touch event, and therefore
updating the position, and compositing. In practice, this creates very janky
scrolling. As of this writing, we have not analyzed input events on
desktop platforms.
One slight quirk is that input events can start a composite, for example
during a scroll and after the ``Compositor`` is no longer listening to
vsync events. In these cases, we notify the ``Compositor`` to observe
vsync so that it dispatches touch events. Otherwise, because the
``Compositor`` is not listening to vsync
events, the touch events would never be dispatched. The
``GeckoTouchDispatcher`` handles this case by always forcing the
``Compositor`` to listen to vsync events while touch events are
occurring.
Widget, Compositor, CompositorVsyncDispatcher, GeckoTouchDispatcher Shutdown Procedure
--------------------------------------------------------------------------------------
When the `nsBaseWidget shuts
down <https://hg.mozilla.org/mozilla-central/file/0df249a0e4d3/widget/nsBaseWidget.cpp#l182>`__,
it calls nsBaseWidget::DestroyCompositor on the *Gecko Main Thread*.
During nsBaseWidget::DestroyCompositor, it first destroys the
CompositorBridgeChild. CompositorBridgeChild sends a sync IPC call to
CompositorBridgeParent::RecvStop, which calls
`CompositorBridgeParent::Destroy <https://hg.mozilla.org/mozilla-central/file/ab0490972e1e/gfx/layers/ipc/CompositorBridgeParent.cpp#l509>`__.
During this time, the *main thread* is blocked on the parent process.
CompositorBridgeParent::RecvStop runs on the *Compositor thread* and
cleans up some resources, including setting the
``CompositorVsyncScheduler::Observer`` to nullptr.
CompositorBridgeParent::RecvStop also explicitly keeps the
CompositorBridgeParent alive and posts another task to run
CompositorBridgeParent::DeferredDestroy on the Compositor loop so that
all ipdl code can finish executing. The
``CompositorVsyncScheduler::Observer`` also unobserves from vsync and
cancels any pending composite tasks. Once
CompositorBridgeParent::RecvStop finishes, the *main thread* in the
parent process continues shutting down the nsBaseWidget.
At the same time, the *Compositor thread* is executing tasks until
CompositorBridgeParent::DeferredDestroy runs, which flushes the
compositor message loop. Now we have two tasks as both the nsBaseWidget
releases a reference to the Compositor on the *main thread* during
destruction and the CompositorBridgeParent::DeferredDestroy releases a
reference to the CompositorBridgeParent on the *Compositor Thread*.
Finally, the CompositorBridgeParent itself is destroyed on the *main
thread* once both references are gone due to explicit `main thread
destruction <https://hg.mozilla.org/mozilla-central/file/50b95032152c/gfx/layers/ipc/CompositorBridgeParent.h#l148>`__.
With the ``CompositorVsyncScheduler::Observer``, any accesses to the
widget after nsBaseWidget::DestroyCompositor executes are invalid. Any
accesses to the compositor between the time
nsBaseWidget::DestroyCompositor runs and the time the
CompositorVsyncScheduler::Observer's destructor runs aren't safe, yet a
hardware vsync event could occur between these times. Since any tasks
posted on the Compositor loop after
CompositorBridgeParent::DeferredDestroy is posted are invalid, we make
sure that no vsync tasks can be posted once
CompositorBridgeParent::RecvStop executes and DeferredDestroy is posted
on the Compositor thread. When the sync call to
CompositorBridgeParent::RecvStop executes, we explicitly set the
CompositorVsyncScheduler::Observer to null to prevent vsync
notifications from occurring. If vsync notifications were allowed to
occur, since the ``CompositorVsyncScheduler::Observer``'s vsync
notification executes on the *hardware vsync thread*, it would post a
task to the Compositor loop and may execute after
CompositorBridgeParent::DeferredDestroy. Thus, we explicitly shut down
vsync events in the ``CompositorVsyncDispatcher`` and
``CompositorVsyncScheduler::Observer`` during nsBaseWidget::Shutdown to
prevent any vsync tasks from executing after
CompositorBridgeParent::DeferredDestroy.
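The core idea, that no vsync task may be posted once shutdown has begun, can be sketched as a lock-protected gate. This is an illustrative sketch under simplifying assumptions: ``VsyncGateSketch`` is a hypothetical name, and the plain vector stands in for the compositor message loop.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <vector>

class VsyncGateSketch {
 public:
  using Task = std::function<void()>;

  // aCompositorLoop stands in for the compositor thread's message loop.
  explicit VsyncGateSketch(std::vector<Task>* aCompositorLoop)
      : mCompositorLoop(aCompositorLoop) {
    assert(aCompositorLoop != nullptr);
  }

  // Hardware vsync thread. Returns true if a composite task was posted.
  bool NotifyVsync() {
    std::lock_guard<std::mutex> lock(mMutex);
    if (mCompositorLoop == nullptr) {
      return false;  // shut down: never post after this point
    }
    mCompositorLoop->push_back([] { /* composite would happen here */ });
    return true;
  }

  // Compositor thread, before the deferred-destroy task is posted.
  void Shutdown() {
    std::lock_guard<std::mutex> lock(mMutex);
    mCompositorLoop = nullptr;
  }

 private:
  std::mutex mMutex;
  std::vector<Task>* mCompositorLoop;
};
```

Because both methods take the same lock, a vsync that races with shutdown either posts its task before the gate closes or posts nothing at all.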
The ``CompositorVsyncDispatcher`` may be destroyed on either the *main
thread* or *Compositor Thread*, since both the nsBaseWidget and
``CompositorVsyncScheduler::Observer`` race to destroy on different
threads. nsBaseWidget is destroyed on the *main thread* and releases a
reference to the ``CompositorVsyncDispatcher`` during destruction. The
``CompositorVsyncScheduler::Observer`` has a race to be destroyed either
during CompositorBridgeParent shutdown or from the
``GeckoTouchDispatcher`` which is destroyed on the main thread with
`ClearOnShutdown <https://hg.mozilla.org/mozilla-central/file/21567e9a6e40/xpcom/base/ClearOnShutdown.h#l15>`__.
Whichever object is destroyed last, the CompositorBridgeParent or the
``GeckoTouchDispatcher``, holds the last reference
to the ``CompositorVsyncDispatcher``, and releasing that reference destroys the dispatcher.
Refresh Driver
--------------
The Refresh Driver is ticked from a `single active
timer <https://hg.mozilla.org/mozilla-central/file/ab0490972e1e/layout/base/nsRefreshDriver.cpp#l11>`__.
The assumption is that there are multiple ``RefreshDrivers`` connected
to a single ``RefreshTimer``. There are two ``RefreshTimers``: an active
and an inactive ``RefreshTimer``. Each Tab has its own
``RefreshDriver``, which connects to one of the global
``RefreshTimers``. The ``RefreshTimers`` execute on the *Main Thread*
and tick their connected ``RefreshDrivers``. We do not want to break
this model of multiple ``RefreshDrivers`` per a set of two global
``RefreshTimers``. Each ``RefreshDriver`` switches between the active
and inactive ``RefreshTimer``.
Instead, we create a new ``RefreshTimer``, the ``VsyncRefreshTimer``,
which ticks based on vsync messages. We replace the current active timer
with a ``VsyncRefreshTimer``. All tabs will then tick based on this new
active timer. Since the ``RefreshTimer`` has a lifetime of the process,
we only need to create a single ``RefreshTimerVsyncDispatcher`` per
``Display`` when Firefox starts. Even if we do not have any content
processes, the Chrome process will still need a ``VsyncRefreshTimer``,
thus we can associate the ``RefreshTimerVsyncDispatcher`` with each
``Display``.
When Firefox starts, we initially create a new ``VsyncRefreshTimer`` in
the Chrome process. The ``VsyncRefreshTimer`` will listen to vsync
notifications from ``RefreshTimerVsyncDispatcher`` on the global
``Display``. When nsRefreshDriver::Shutdown executes, it will delete the
``VsyncRefreshTimer``. This creates a problem as all the
``RefreshTimers`` are currently manually memory managed whereas
``VsyncObservers`` are ref counted. To work around this problem, we
create a new ``RefreshDriverVsyncObserver`` as an inner class to
``VsyncRefreshTimer``, which actually receives vsync notifications. It
then ticks the ``RefreshDrivers`` inside ``VsyncRefreshTimer``.
With Content processes, the start up process is more complicated. We
send vsync IPC messages via the use of the PBackground thread on the
parent process, which allows us to send messages from the Parent
process without waiting on the *main thread*. This sends messages from
the Parent::\ *PBackground Thread* to the Child::\ *Main Thread*. The
*main thread* receiving IPC messages on the content process is
acceptable because ``RefreshDrivers`` must execute on the *main thread*.
However, there is some amount of time required to set up the IPC
connection upon process creation and during this time, the
``RefreshDrivers`` must tick to set up the process. To get around this,
we initially use software ``RefreshTimers`` that already exist during
content process startup and swap in the ``VsyncRefreshTimer`` once the
IPC connection is created.
During nsRefreshDriver::ChooseTimer, we create an async PBackground IPC
open request to create a ``VsyncParent`` and ``VsyncChild``. At the same
time, we create a software ``RefreshTimer`` and tick the
``RefreshDrivers`` as normal. Once the PBackground callback is executed
and an IPC connection exists, we take all ``RefreshDrivers`` currently
associated with the active ``RefreshTimer`` and swap them over
to the ``VsyncRefreshTimer``. Since all
interactions on the content process occur on the main thread, there is
no need for locks. The ``VsyncParent`` listens to vsync events through
the ``RefreshTimerVsyncDispatcher`` on the parent side and sends vsync
IPC messages to the ``VsyncChild``. The ``VsyncChild`` notifies the
``VsyncRefreshTimer`` on the content process.
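The timer swap can be pictured with the following sketch, in which every driver moves from a software timer to a vsync-driven timer once the IPC connection is ready. The class names are hypothetical and the sketch assumes, as the text does, that everything runs on one thread so no locking is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical refresh driver that only counts its ticks.
struct RefreshDriverSketch {
  int tickCount = 0;
  void Tick() { ++tickCount; }
};

// Hypothetical timer that ticks every driver attached to it.
class RefreshTimerSketch {
 public:
  void AddDriver(RefreshDriverSketch* aDriver) {
    assert(aDriver != nullptr);
    mDrivers.push_back(aDriver);
  }

  void Tick() {
    for (RefreshDriverSketch* driver : mDrivers) {
      driver->Tick();
    }
  }

  // Move every driver to aOther, leaving this timer empty.
  void SwapDriversTo(RefreshTimerSketch& aOther) {
    for (RefreshDriverSketch* driver : mDrivers) {
      aOther.AddDriver(driver);
    }
    mDrivers.clear();
  }

  std::size_t DriverCount() const { return mDrivers.size(); }

 private:
  std::vector<RefreshDriverSketch*> mDrivers;  // non-owning
};
```

Drivers keep ticking from the software timer during startup and lose no state when they are handed to the vsync timer.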
During the shutdown process of the content process, ActorDestroy is
called on the ``VsyncChild`` and ``VsyncParent`` due to the normal
PBackground shutdown process. Once ActorDestroy is called, no IPC
messages should be sent across the channel. After ActorDestroy is
called, the IPDL machinery will delete the **VsyncParent/Child** pair.
The ``VsyncParent``, due to being a ``VsyncObserver``, is ref counted.
After ``VsyncParent::ActorDestroy`` is called, it unregisters itself
from the ``RefreshTimerVsyncDispatcher``, which holds the last reference
to the ``VsyncParent``, and the object will be deleted.
Thus the overall flow during normal execution is:
1. VsyncSource::Display::RefreshTimerVsyncDispatcher receives a Vsync
notification from the OS in the parent process.
2. RefreshTimerVsyncDispatcher notifies
VsyncRefreshTimer::RefreshDriverVsyncObserver that a vsync occurred on
the parent process on the hardware vsync thread.
3. RefreshTimerVsyncDispatcher notifies the VsyncParent on the hardware
vsync thread that a vsync occurred.
4. The VsyncRefreshTimer::RefreshDriverVsyncObserver in the parent
process posts a task to the main thread that ticks the refresh
drivers.
5. VsyncParent posts a task to the PBackground thread to send a vsync
IPC message to VsyncChild.
6. VsyncChild receives a vsync notification on the content process on the
main thread and ticks its respective RefreshDrivers.
Compressing Vsync Messages
--------------------------
Vsync messages occur quite often and the *main thread* can be busy for
long periods of time due to JavaScript. Consistently sending vsync
messages to the refresh driver timer can flood the *main thread* with
refresh driver ticks, causing even more delays. To avoid this problem,
we compress vsync messages on both the parent and child processes.
On the parent process, newer vsync messages update a vsync timestamp but
do not actually queue any tasks on the *main thread*. Once the parent
process *main thread* executes the refresh driver tick, it uses the
most recent vsync timestamp to tick the refresh driver. After the
refresh driver has ticked, one single vsync message is queued for
another refresh driver tick task. On the content process, the IPDL
``compress`` keyword automatically compresses IPC messages.
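The parent-side compression can be sketched as a "latest timestamp plus pending flag" pair. This is a simplified reading of the paragraph above, with the hypothetical name ``VsyncCompressorSketch``; it also assumes vsync timestamps never go backwards.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

class VsyncCompressorSketch {
 public:
  // Hardware vsync thread. Always records the newest timestamp, but returns
  // true only when the caller should post a tick task to the main thread.
  bool NotifyVsync(std::int64_t aTimestamp) {
    std::lock_guard<std::mutex> lock(mMutex);
    assert(aTimestamp >= mLatestTimestamp);  // assumed monotonic
    mLatestTimestamp = aTimestamp;
    if (mTickPending) {
      return false;  // a tick is already queued; it will see this timestamp
    }
    mTickPending = true;
    return true;
  }

  // Main thread, inside the tick task: take the freshest timestamp and allow
  // the next vsync to queue another tick.
  std::int64_t TakeTimestampForTick() {
    std::lock_guard<std::mutex> lock(mMutex);
    mTickPending = false;
    return mLatestTimestamp;
  }

 private:
  std::mutex mMutex;
  std::int64_t mLatestTimestamp = 0;
  bool mTickPending = false;
};
```

However long the main thread is busy, at most one tick task is queued, and that task always ticks with the most recent timestamp.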
Multiple Monitors
-----------------
In order to have multiple monitor support for the ``RefreshDrivers``, we
have multiple active ``RefreshTimers``. Each ``RefreshTimer`` is
associated with a specific ``Display`` via an ID and ticks when its
respective ``Display`` vsync occurs. We have **N RefreshTimers**, where
N is the number of connected displays. Each ``RefreshTimer`` still has
multiple ``RefreshDrivers``.
When a tab or window changes monitors, the ``nsIWidget`` receives a
display changed notification. Based on which display the window is on,
the window switches to the correct ``RefreshTimerVsyncDispatcher`` and
``CompositorVsyncDispatcher`` on the parent process based on the display
ID. Each ``TabParent`` should also send a notification to its child.
Each ``TabChild``, given the display ID, switches to the correct
``RefreshTimer`` associated with the display ID. When each display vsync
occurs, it sends one IPC message to notify vsync. The vsync message
contains a display ID, to tick the appropriate ``RefreshTimer`` on the
content process. There is still only one **VsyncParent/VsyncChild**
pair, but each vsync notification includes a display ID, which maps
to the correct ``RefreshTimer``.
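The display ID lookup on the content side might look roughly like the sketch below. The names ``TimerRegistrySketch``, ``DisplayTimerSketch`` and ``DisplayId`` are hypothetical, and ignoring messages for unknown displays is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using DisplayId = std::uint32_t;  // hypothetical ID type

// Hypothetical per-display timer that only counts its ticks.
struct DisplayTimerSketch {
  int tickCount = 0;
};

class TimerRegistrySketch {
 public:
  // Register one timer per connected display.
  void AddDisplay(DisplayId aId) {
    const bool inserted = mTimers.emplace(aId, DisplayTimerSketch{}).second;
    assert(inserted);
    (void)inserted;
  }

  // Handle a vsync message carrying a display ID. Returns false when no timer
  // is registered for that display, in which case nothing is ticked.
  bool OnVsyncMessage(DisplayId aId) {
    auto it = mTimers.find(aId);
    if (it == mTimers.end()) {
      return false;
    }
    ++it->second.tickCount;
    return true;
  }

  int TickCount(DisplayId aId) const {
    auto it = mTimers.find(aId);
    return it == mTimers.end() ? 0 : it->second.tickCount;
  }

 private:
  std::map<DisplayId, DisplayTimerSketch> mTimers;
};
```

A single message channel can then serve every display, since the ID in each message selects which timer to tick.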
Object Lifetime
---------------
1. CompositorVsyncDispatcher - Lives as long as the nsBaseWidget
associated with the VsyncDispatcher
2. CompositorVsyncScheduler::Observer - Lives and dies at the same time as
the CompositorBridgeParent.
3. RefreshTimerVsyncDispatcher - As long as the associated display
object, which is the lifetime of Firefox.
4. VsyncSource - Lives as long as the gfxPlatform on the chrome process,
which is the lifetime of Firefox.
5. VsyncParent/VsyncChild - Lives as long as the content process
6. RefreshTimer - Lives as long as the process
Threads
-------
All ``VsyncObservers`` are notified on the *Hardware Vsync Thread*. It
is the responsibility of the ``VsyncObservers`` to post tasks to their
respective correct thread. For example, the
``CompositorVsyncScheduler::Observer`` will be notified on the *Hardware
Vsync Thread*, and post a task to the *Compositor Thread* to do the
actual composition.
1. Compositor Thread - Nothing changes
2. Main Thread - PVsyncChild receives IPC messages on the main thread.
We also enable/disable vsync on the main thread.
3. PBackground Thread - Creates a connection from the PBackground thread
on the parent process to the main thread in the content process.
4. Hardware Vsync Thread - Every platform is different, but we always
have the concept of a hardware vsync thread. Sometimes this is
actually created by the host OS. On Windows, we have to create a
separate platform thread that blocks on DwmFlush().


@ -1,9 +1,17 @@
========
Graphics
========
The graphics team's documentation is currently using doxygen. We're tracking the work to integrate it better at https://bugzilla.mozilla.org/show_bug.cgi?id=1150232.
This collection of linked pages contains design documents for the
Mozilla graphics architecture. The design documents live in the gfx/docs directory.
For now you can read the graphics source code documentation here:
This `wiki page <https://wiki.mozilla.org/Platform/GFX>`__ contains
information about graphics and the graphics team at Mozilla.
http://people.mozilla.org/~bgirard/doxygen/gfx/
.. toctree::
   :maxdepth: 1

   GraphicsOverview
   LayersHistory
   AsyncPanZoom
   AdvancedLayers
   Silk