mirror of
https://github.com/mozilla/gecko-dev.git
synced 2024-10-08 02:14:43 +00:00
Bug 1605508 - Write new on-boarding document with rendering overview r=jbonisteel
Differential Revision: https://phabricator.services.mozilla.com/D58059 --HG-- extra : moz-landing-system : lando
This commit is contained in:
parent
ce358595b7
commit
445ef3fc37
285
gfx/docs/RenderingOverview.rst
Normal file
285
gfx/docs/RenderingOverview.rst
Normal file
@ -0,0 +1,285 @@
|
||||
Rendering Overview
|
||||
==================
|
||||
|
||||
This document is an overview of the steps to render a webpage, and how HTML
|
||||
gets transformed and broken down, step by step, into commands that can execute
|
||||
on the GPU.
|
||||
|
||||
If you're coming into the graphics team with not a lot of background
|
||||
in browsers, start here :)
|
||||
|
||||
.. contents::
|
||||
|
||||
High level overview
|
||||
-------------------
|
||||
|
||||
.. image:: RenderingOverviewSimple.png
|
||||
:width: 100%
|
||||
|
||||
Layout
|
||||
~~~~~~
|
||||
Starting at the left in the above image, we have a document
|
||||
represented by a DOM - a Document Object Model. A Javascript engine
|
||||
will execute JS code, either to make changes to the DOM, or to respond to
|
||||
events generated by the DOM (or do both).
|
||||
|
||||
The DOM is a high level description and we don't know what to draw or
|
||||
where until it is combined with a Cascading Style Sheet (CSS).
|
||||
Combining these two and figuring out what, where and how to draw
|
||||
things is the responsibility of the Layout team. The
|
||||
DOM is converted into a hierarchical Frame Tree, which nests visual
|
||||
elements (boxes). Each element points to some node in a Style Tree
|
||||
that describes what it should look like -- color, transparency, etc.
|
||||
The result is that now we know exactly what to render where, what goes
|
||||
on top of what (layering and blending) and at what pixel coordinate.
|
||||
This is the Display List.
|
||||
|
||||
The Display List is a light-weight data structure because it's shallow
|
||||
-- it mostly points back to the Frame Tree. There are two problems
|
||||
with this. First, we want to cross process boundaries at this point.
|
||||
Everything up until now happens in a Content Process (of which there are
|
||||
several). Actual GPU rendering happens in a GPU Process (on some
|
||||
platforms). Second, everything up until now was written in C++; but
|
||||
WebRender is written in Rust. Thus the shallow Display List needs to
|
||||
be serialized in a completely self-contained binary blob that will
|
||||
survive Interprocess Communication (IPC) and a language switch (C++ to
|
||||
Rust). The result is the WebRender Display List.
|
||||
|
||||
WebRender
|
||||
~~~~~~~~~
|
||||
|
||||
The GPU process receives the WebRender Display List blob and
|
||||
de-serializes it into a Scene. This Scene contains more than the
|
||||
strictly visible elements; for example, to anticipate scrolling, we
|
||||
might have several paragraphs of text extending past the visible page.
|
||||
|
||||
For a given viewport, the Scene gets culled and stripped down to a
|
||||
Frame. This is also where we start preparing data structures for GPU
|
||||
rendering, for example getting some font glyphs into an atlas for
|
||||
rasterizing text.
|
||||
|
||||
The final step takes the Frame and submits commands to the GPU to
|
||||
actually render it. The GPU will execute the commands and composite
|
||||
the final page.
|
||||
|
||||
Software
|
||||
~~~~~~~~
|
||||
|
||||
The above is the new WebRender-enabled way to do things. But in the
|
||||
schematic you'll note a second branch towards the bottom: this is the
|
||||
legacy code path which does not use WebRender (nor Rust). In this
|
||||
case, the Display List is converted into a Layer Tree. The purpose of
|
||||
this Tree is to try and avoid having to re-render absolutely
|
||||
everything when the page needs to be refreshed. For example, when
|
||||
scrolling we should be able to redraw the page by mostly shifting
|
||||
things around. However that requires those 'things' to still be around
|
||||
from last time we drew the page. In other words, visual elements that
|
||||
are likely to be static and reusable need to be drawn into their own
|
||||
private "page" (a cache). Then we can recombine (composite) all of
|
||||
these when redrawing the actual page.
|
||||
|
||||
Figuring out which elements would be good candidates for this, and
|
||||
striking a balance between good performance versus excessive memory
|
||||
use, is the purpose of the Layer Tree. Each 'layer' is a cached image
|
||||
of some element(s). This logic also takes occlusion into account, eg.
|
||||
don't allocate and render a layer for elements that are known to be
|
||||
completely obscured by something in front of them.
|
||||
|
||||
Redrawing the page by combining the Layer Tree with any newly
|
||||
rasterized elements is the job of the Compositor.
|
||||
|
||||
|
||||
Even when a layer cannot be reused in its entirety, it is likely
|
||||
that only a small part of it was invalidated. Thus there is an
|
||||
elaborate system for tracking dirty rectangles, starting an update by
|
||||
copying the area that can be salvaged, and then redrawing only what
|
||||
cannot.
|
||||
|
||||
In fact, this idea can be extended to delta-tracking of display lists
|
||||
themselves. Traversing the layout tree and building a display list is
|
||||
also not cheap, so the code tries to partially invalidate and rebuild
|
||||
the display list incrementally when possible.
|
||||
This optimization is used both for non-WebRender and WebRender in
|
||||
fact.
|
||||
|
||||
|
||||
Asynchronous Panning And Zooming
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Earlier we mentioned that a Scene might contain more elements than are
|
||||
strictly necessary for rendering what's visible (the Frame). The
|
||||
reason for that is Asynchronous Panning and Zooming, or APZ for short.
|
||||
The browser will feel much more responsive if scrolling & zooming can
|
||||
short-circuit all of these data transformations and IPC boundaries,
|
||||
and instead directly update an offset of some layer and recomposite.
|
||||
(Think of late-latching in a VR context)
|
||||
|
||||
This simple idea introduces a lot of complexity: how much extra do you
|
||||
rasterize, and in which direction? How much memory can we afford?
|
||||
What about Javascript that responds to scroll events and perhaps does
|
||||
something 'interesting' with the page in return? What about nested
|
||||
frames or nested scrollbars? What if we scroll so much that we go
|
||||
past the boundaries of the Scene that we know about?
|
||||
|
||||
See AsyncPanZoom.rst for all that and more.
|
||||
|
||||
A Few More Details
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Here's another schematic which basically repeats the previous one, but
|
||||
showing a little bit more detail. Note that the direction is reversed
|
||||
-- the data flow starts at the right. Sorry about that :)
|
||||
|
||||
.. image:: RenderingOverviewDetail.png
|
||||
:width: 100%
|
||||
|
||||
Some things to note:
|
||||
|
||||
- there are multiple content processes, currently 4 of them. This is
|
||||
for security reasons (sandboxing), stability (isolate crashes) and
|
||||
performance (multi-core machines);
|
||||
- ideally each "webpage" would run in its own process for security;
|
||||
this is being developed under the term 'fission';
|
||||
- there is only a single GPU process, if there is one at all;
|
||||
some platforms have it as part of the Parent;
|
||||
- not shown here is the Extension process that isolates WebExtensions;
|
||||
- for non-WebRender, rasterization happens in the Content Process, and
|
||||
we send entire Layers to the GPU/Compositor process (via shared
|
||||
memory, only using actual IPC for its metadata like width & height);
|
||||
- if the GPU process crashes (a bug or a driver issue) we can simply
|
||||
restart it, resend the display list, and the browser itself doesn't crash;
|
||||
- the browser UI is just another set of DOM+JS, albeit one that runs
|
||||
with elevated privileges. That is, its JS can do things that
|
||||
normal JS cannot. It lives in the Parent Process, which then uses
|
||||
IPC to get it rendered, same as regular Content. (the IPC arrow also
|
||||
goes to WebRender Display List but is omitted to reduce clutter);
|
||||
- UI events get routed to APZ first, to minimize latency. By running
|
||||
inside the GPU process, we may have access to data such
|
||||
as rasterized clipping masks that enables finer grained hit testing;
|
||||
- the GPU process talks back to the content process; in particular,
|
||||
when APZ scrolls out of bounds, it asks Content to enlarge/shift the
|
||||
Scene with a new "display port";
|
||||
- we still use the GPU when we can for compositing even in the
|
||||
non-WebRender case;
|
||||
|
||||
|
||||
WebRender In Detail
|
||||
-------------------
|
||||
|
||||
Picture-, Spatial- and Clip Tree
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Converting a display list into GPU commands is broken down into a
|
||||
number of steps and intermediate data structures.
|
||||
|
||||
The incoming display list uses "stacking contexts". For example, to
|
||||
render some text with a drop shadow, a display list will contain three
|
||||
items:
|
||||
|
||||
- "enable shadow" with some parameters such as shadow color, blur size, and offset;
|
||||
- the text item;
|
||||
- "pop all shadows" to deactivate shadows;
|
||||
|
||||
WebRender will break this down into two distinct elements, or
|
||||
"pictures". The first represents the shadow, so it contains a copy of the
|
||||
text item, but modified to use the shadow's color, and to shift the
|
||||
text by the shadow's offset. The second picture contains the original text
|
||||
to draw on top of the shadow.
|
||||
|
||||
The fact that the first picture, the shadow, needs to be blurred, is a
|
||||
"compositing" property of the picture which we'll deal with later.
|
||||
|
||||
Thus, the stack-based display list gets converted into a list of pictures
|
||||
-- or more generally, a hierarchy of pictures, since items are nested
|
||||
as per the original HTML.
|
||||
|
||||
Meanwhile, we also build a Spatial Tree -- a hierarchy of Spatial
|
||||
Nodes. This tree is a representation of how frames and divs are
|
||||
nested in the original DOM, or more precisely:
|
||||
|
||||
- a Reference Frame corresponds to a <div>
|
||||
- a Scrolling Frame corresponds to a scrollable part
|
||||
- a Sticky Frame corresponds to some fixed position CSS style
|
||||
|
||||
Each picture then points to a spatial node inside this tree, so by
|
||||
walking up and down the tree we can find the absolute position of
|
||||
where each picture should render (traversing down) and how large each
|
||||
element needs to be (traversing up).
|
||||
|
||||
And finally, we also have a Clip Tree, which contains Clip Shapes. For
|
||||
example, a rounded corner div will produce a clip shape, and since
|
||||
divs can be nested, you end up with another tree.
|
||||
|
||||
In summary, at the end of scene building the display list turned into
|
||||
a picture tree, plus a spatial tree that tells us what goes where
|
||||
relative to what, plus a clip tree.
|
||||
|
||||
RenderTask Tree
|
||||
~~~~~~~~~~~~~~~
|
||||
Now in a perfect world we could simply traverse the picture tree and
|
||||
start drawing things. However, recall that the first picture in our
|
||||
example is a "text shadow" that needs to be blurred. We can't just
|
||||
rasterize blurry text directly, so we need a number of steps or
|
||||
"render passes" to get the intended effect:
|
||||
|
||||
.. image:: RenderingOverviewBlurTask.png
|
||||
:align: right
|
||||
:height: 400px
|
||||
|
||||
- rasterize the text into an offscreen rendertarget;
|
||||
- apply one or more downscaling passes until the blur radius is reasonable;
|
||||
- apply a horizontal Gaussian blur;
|
||||
- apply a vertical Gaussian blur;
|
||||
- use the result as an input for whatever comes next, or blit it to
|
||||
its final position on the page (or more generally, on the containing
|
||||
parent surface/picture).
|
||||
|
||||
In the general case, which passes we need and how many of them depends
|
||||
on how the picture is supposed to be composited (CSS filters, SVG
|
||||
filters, effects) and its parameters (very large vs. small blur
|
||||
radius, say).
|
||||
|
||||
Thus, we walk the picture tree and build a render task tree: each high
|
||||
level abstraction like "blur me" gets broken down into the necessary
|
||||
render passes to get the effect. The result is again a tree because a
|
||||
render pass can have multiple input dependencies (eg. blending).
|
||||
|
||||
(Cfr. games, this has echoes of the Frostbite Framegraph in that it
|
||||
dynamically builds up a renderpass DAG and dynamically allocates storage
|
||||
for the outputs).
|
||||
|
||||
If there are complicated clip shapes that need to be rasterized first,
|
||||
so their output can be sampled as a texture for clip/discard
|
||||
operations, that would also end up in this tree as a dependency... (I think?).
|
||||
|
||||
Once we have the entire tree of dependencies, we analyze it to see
|
||||
which tasks can be combined into a single pass for efficiency. We
|
||||
ping-pong rendertargets when we can, but sometimes the dependencies
|
||||
cut across more than one level of the rendertask tree, and some
|
||||
copying is necessary.
|
||||
|
||||
Once we've figured out the passes and allocated storage for anything
|
||||
we wish to persist in the texture cache, we finally start rendering.
|
||||
|
||||
Caching
|
||||
```````
|
||||
|
||||
Just as with layers in the software rasterizer, it is not always
|
||||
necessary to redraw absolutely everything when parts of a document
|
||||
change. The webrender equivalent of layers is Slices -- a grouping of
|
||||
pictures that are expected to render and update together. If a slice
|
||||
isn't expected to change much, we give it a TileCacheInstance. It is
|
||||
itself made up of Tiles, where each tile will track what's
|
||||
in it, what's changing, and if it needs to be invalidated and redrawn
|
||||
or not as a result. Thus the "damage" from changes can be localized
|
||||
to single tiles, while we salvage the rest of the cache. If tiles
|
||||
keep seeing a lot of invalidations, they will recursively divide
|
||||
themselves in a quad-tree like structure to try and localize the
|
||||
invalidations. (And conversely, they'll recombine children if nothing is
|
||||
invalidating them "for a while").
|
||||
|
||||
Callbacks
|
||||
`````````
|
||||
GPU text rendering assumes that the individual font-glyphs are already
|
||||
available in a texture atlas. Likewise SVG is not being rendered on
|
||||
the GPU. Both inputs are prepared during scene building; glyph
|
||||
rasterization via a thread pool from within Rust itself, and SVG via
|
||||
opaque callbacks (back to C++) that produce blobs.
|
BIN
gfx/docs/RenderingOverviewBlurTask.png
Normal file
BIN
gfx/docs/RenderingOverviewBlurTask.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 16 KiB |
BIN
gfx/docs/RenderingOverviewDetail.png
Normal file
BIN
gfx/docs/RenderingOverviewDetail.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 145 KiB |
BIN
gfx/docs/RenderingOverviewSimple.png
Normal file
BIN
gfx/docs/RenderingOverviewSimple.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 54 KiB |
Loading…
Reference in New Issue
Block a user