Bug 1605508 - Write new on-boarding document with rendering overview r=jbonisteel

Differential Revision: https://phabricator.services.mozilla.com/D58059

--HG--
extra : moz-landing-system : lando
This commit is contained in:
Bert Peers 2019-12-22 12:44:09 +00:00
parent ce358595b7
commit 445ef3fc37
4 changed files with 285 additions and 0 deletions

@ -0,0 +1,285 @@
Rendering Overview
==================

This document is an overview of the steps to render a webpage, and how HTML
gets transformed and broken down, step by step, into commands that can execute
on the GPU.

If you're coming into the graphics team with not a lot of background
in browsers, start here :)

.. contents::

High level overview
-------------------

.. image:: RenderingOverviewSimple.png
   :width: 100%

Layout
~~~~~~

Starting at the left in the above image, we have a document
represented by a DOM - a Document Object Model. A JavaScript engine
will execute JS code, either to make changes to the DOM, or to respond to
events generated by the DOM (or do both).

The DOM is a high level description and we don't know what to draw or
where until it is combined with Cascading Style Sheets (CSS).
Combining these two and figuring out what, where and how to draw
things is the responsibility of the Layout team. The
DOM is converted into a hierarchical Frame Tree, which nests visual
elements (boxes). Each element points to some node in a Style Tree
that describes what it should look like -- color, transparency, etc.
The result is that now we know exactly what to render where, what goes
on top of what (layering and blending) and at what pixel coordinate.
This is the Display List.

The Display List is a light-weight data structure because it's shallow
-- it mostly points back to the Frame Tree. There are two problems
with this. First, we want to cross process boundaries at this point.
Everything up until now happens in a Content Process (of which there are
several). Actual GPU rendering happens in a GPU Process (on some
platforms). Second, everything up until now was written in C++; but
WebRender is written in Rust. Thus the shallow Display List needs to
be serialized in a completely self-contained binary blob that will
survive Interprocess Communication (IPC) and a language switch (C++ to
Rust). The result is the WebRender Display List.

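
To make the "shallow versus self-contained" distinction concrete, here is a
minimal sketch. Everything in it is hypothetical: the ``Frame`` and
``ShallowItem`` classes are made up for illustration, and JSON merely stands in
for the real binary format, which is not described here.

```python
import json
from dataclasses import dataclass


@dataclass
class Frame:
    """One box in a (hypothetical) frame tree."""
    frame_id: int
    rect: tuple  # (x, y, width, height) in pixels
    color: str


@dataclass
class ShallowItem:
    """A display item that only points back at its frame."""
    frame: Frame


def serialize(display_list):
    """Copy everything a renderer needs out of the frames into bytes.

    JSON is a stand-in here; the point is only that the result holds
    no pointers back into the frame tree.
    """
    records = [
        {"rect": list(item.frame.rect), "color": item.frame.color}
        for item in display_list
    ]
    return json.dumps(records).encode("utf-8")


def deserialize(blob):
    """What the receiving process would do with the blob."""
    return json.loads(blob.decode("utf-8"))


frames = [
    Frame(1, (0, 0, 800, 600), "white"),
    Frame(2, (10, 10, 100, 20), "black"),
]
blob = serialize([ShallowItem(f) for f in frames])
items = deserialize(blob)
```

The receiving side only ever sees ``blob``; it never needs access to the
``Frame`` objects, which is what lets the data cross a process boundary.
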
WebRender
~~~~~~~~~

The GPU process receives the WebRender Display List blob and
de-serializes it into a Scene. This Scene contains more than the
strictly visible elements; for example, to anticipate scrolling, we
might have several paragraphs of text extending past the visible page.

For a given viewport, the Scene gets culled and stripped down to a
Frame. This is also where we start preparing data structures for GPU
rendering, for example getting some font glyphs into an atlas for
rasterizing text.

The final step takes the Frame and submits commands to the GPU to
actually render it. The GPU will execute the commands and composite
the final page.

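
The Scene-to-Frame culling step can be pictured roughly as follows. This is
only a sketch under simplifying assumptions: ``Rect`` and ``build_frame`` are
invented names, and a real Scene holds far more than named rectangles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    w: float
    h: float

    def intersects(self, other):
        """True if the two rectangles overlap by a non-zero area."""
        return (self.x < other.x + other.w and other.x < self.x + self.w
                and self.y < other.y + other.h and other.y < self.y + self.h)


def build_frame(scene, viewport):
    """Keep only the scene items that touch the viewport."""
    return [name for name, rect in scene if rect.intersects(viewport)]


# A scene that extends well past the visible page.
scene = [
    ("header", Rect(0, 0, 800, 100)),
    ("paragraph 1", Rect(0, 100, 800, 400)),
    ("paragraph 2", Rect(0, 500, 800, 400)),
    ("paragraph 3", Rect(0, 900, 800, 400)),
]
viewport = Rect(0, 0, 800, 600)
visible = build_frame(scene, viewport)
```

Scrolling only changes ``viewport``; the same Scene can produce a new Frame
without going back to the content process.
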
Software
~~~~~~~~

The above is the new WebRender-enabled way to do things. But in the
schematic you'll note a second branch towards the bottom: this is the
legacy code path which does not use WebRender (nor Rust). In this
case, the Display List is converted into a Layer Tree. The purpose of
this Tree is to try and avoid having to re-render absolutely
everything when the page needs to be refreshed. For example, when
scrolling we should be able to redraw the page by mostly shifting
things around. However that requires those 'things' to still be around
from last time we drew the page. In other words, visual elements that
are likely to be static and reusable need to be drawn into their own
private "page" (a cache). Then we can recombine (composite) all of
these when redrawing the actual page.

Figuring out which elements would be good candidates for this, and
striking a balance between good performance versus excessive memory
use, is the purpose of the Layer Tree. Each 'layer' is a cached image
of some element(s). This logic also takes occlusion into account, e.g.
don't allocate and render a layer for elements that are known to be
completely obscured by something in front of them.

Redrawing the page by combining the Layer Tree with any newly
rasterized elements is the job of the Compositor.

Even when a layer cannot be reused in its entirety, it is likely
that only a small part of it was invalidated. Thus there is an
elaborate system for tracking dirty rectangles, starting an update by
copying the area that can be salvaged, and then redrawing only what
cannot.

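
As a rough illustration of "copy what can be salvaged, redraw the rest", here
is a sketch for the simplest case, a vertical scroll of a single cached layer.
The function name and the rectangle convention are assumptions made for this
example; the real dirty-rectangle tracking handles arbitrary regions.

```python
def scroll_update(width, height, dy):
    """Split a layer update into a copy and a redraw for a vertical scroll.

    dy > 0 means the page was scrolled down by dy pixels.
    Returns (copy_src, redraw), each an (x, y, w, h) tuple in layer space:
    copy_src is the old area worth keeping, redraw is the strip that has
    no old pixels and must be rasterized again.
    """
    dy = max(-height, min(height, dy))   # scrolling further reuses nothing
    kept = height - abs(dy)
    if dy >= 0:
        # Content moves up: old rows [dy, height) end up at [0, kept).
        copy_src = (0, dy, width, kept)
        redraw = (0, kept, width, dy)
    else:
        # Content moves down: old rows [0, kept) end up at [-dy, height).
        copy_src = (0, 0, width, kept)
        redraw = (0, 0, width, -dy)
    return copy_src, redraw


down = scroll_update(800, 600, 100)
up = scroll_update(800, 600, -50)
```

For a 100 pixel scroll of an 800x600 layer, only a 100 pixel strip needs to be
rasterized again; the other 500 rows are a plain copy.
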
In fact, this idea can be extended to delta-tracking of display lists
themselves. Traversing the layout tree and building a display list is
also not cheap, so the code tries to partially invalidate and rebuild
the display list incrementally when possible.
This optimization is used in both the non-WebRender and the WebRender
code paths.

Asynchronous Panning And Zooming
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Earlier we mentioned that a Scene might contain more elements than are
strictly necessary for rendering what's visible (the Frame). The
reason for that is Asynchronous Panning and Zooming, or APZ for short.
The browser will feel much more responsive if scrolling & zooming can
short-circuit all of these data transformations and IPC boundaries,
and instead directly update an offset of some layer and recomposite.
(Think of late-latching in a VR context.)

This simple idea introduces a lot of complexity: how much extra do you
rasterize, and in which direction? How much memory can we afford?
What about JavaScript that responds to scroll events and perhaps does
something 'interesting' with the page in return? What about nested
frames or nested scrollbars? What if we scroll so much that we go
past the boundaries of the Scene that we know about?

See AsyncPanZoom.rst for all that and more.

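
The core idea, "just move an offset, and only go back to Content when we run
out of rasterized material", might be sketched like this. ``ScrollLayer`` and
its fields are hypothetical names for illustration and are not APZ's actual
data structures.

```python
from dataclasses import dataclass


@dataclass
class ScrollLayer:
    displayport_top: float     # page-space y of the first rasterized row
    displayport_height: float  # how many rows were rasterized in total
    viewport_height: float
    scroll_y: float = 0.0

    def async_scroll(self, dy):
        """Compositor-side scroll: update the offset without touching Content.

        Returns True when the viewport has left the rasterized area, i.e.
        when Content would have to be asked for a new display port.
        """
        self.scroll_y += dy
        top_ok = self.scroll_y >= self.displayport_top
        bottom_ok = (self.scroll_y + self.viewport_height
                     <= self.displayport_top + self.displayport_height)
        return not (top_ok and bottom_ok)


layer = ScrollLayer(displayport_top=0, displayport_height=1800,
                    viewport_height=600)
first = layer.async_scroll(500)    # still inside the rasterized area
second = layer.async_scroll(900)   # viewport now ends at y = 2000
```

The first scroll is handled entirely by recompositing; the second one runs past
the rasterized area and is where the harder questions above start.
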
A Few More Details
~~~~~~~~~~~~~~~~~~

Here's another schematic which basically repeats the previous one, but
showing a little bit more detail. Note that the direction is reversed
-- the data flow starts at the right. Sorry about that :)

.. image:: RenderingOverviewDetail.png
   :width: 100%

Some things to note:

- there are multiple content processes, currently 4 of them. This is
  for security reasons (sandboxing), stability (isolate crashes) and
  performance (multi-core machines);
- ideally each "webpage" would run in its own process for security;
  this is being developed under the term 'fission';
- there is only a single GPU process, if there is one at all;
  some platforms have it as part of the Parent;
- not shown here is the Extension process that isolates WebExtensions;
- for non-WebRender, rasterization happens in the Content Process, and
  we send entire Layers to the GPU/Compositor process (via shared
  memory, only using actual IPC for its metadata like width & height);
- if the GPU process crashes (a bug or a driver issue) we can simply
  restart it, resend the display list, and the browser itself doesn't crash;
- the browser UI is just another set of DOM+JS, albeit one that runs
  with elevated privileges. That is, its JS can do things that
  normal JS cannot. It lives in the Parent Process, which then uses
  IPC to get it rendered, same as regular Content. (the IPC arrow also
  goes to WebRender Display List but is omitted to reduce clutter);
- UI events get routed to APZ first, to minimize latency. By running
  inside the GPU process, we may have access to data such
  as rasterized clipping masks that enables finer grained hit testing;
- the GPU process talks back to the content process; in particular,
  when APZ scrolls out of bounds, it asks Content to enlarge/shift the
  Scene with a new "display port";
- we still use the GPU when we can for compositing even in the
  non-WebRender case.

WebRender In Detail
-------------------

Picture-, Spatial- and Clip Tree
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Converting a display list into GPU commands is broken down into a
number of steps and intermediate data structures.

The incoming display list uses "stacking contexts". For example, to
render some text with a drop shadow, a display list will contain three
items:

- "enable shadow" with some parameters such as shadow color, blur size, and offset;
- the text item;
- "pop all shadows" to deactivate shadows;

WebRender will break this down into two distinct elements, or
"pictures". The first represents the shadow, so it contains a copy of the
text item, but modified to use the shadow's color, and to shift the
text by the shadow's offset. The second picture contains the original text
to draw on top of the shadow.

The fact that the first picture, the shadow, needs to be blurred, is a
"compositing" property of the picture which we'll deal with later.

Thus, the stack-based display list gets converted into a list of pictures
-- or more generally, a hierarchy of pictures, since items are nested
as per the original HTML.

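
The shadow example can be sketched in a few lines. The item kinds and
dictionary keys below are invented for this illustration and do not match
WebRender's real display item types; the sketch also produces a flat list
rather than a hierarchy.

```python
def build_pictures(display_list):
    """Turn a stack-based display list into a flat list of pictures."""
    pictures = []
    shadows = []  # shadows that are currently "enabled"
    for item in display_list:
        kind = item["kind"]
        if kind == "push_shadow":
            shadows.append(item)
        elif kind == "pop_all_shadows":
            shadows.clear()
        elif kind == "text":
            # One extra picture per active shadow: same text, but with the
            # shadow's color and offset. The blur is only recorded as a
            # compositing property; nothing gets blurred at this stage.
            for shadow in shadows:
                pictures.append({
                    "text": item["text"],
                    "color": shadow["color"],
                    "origin": (item["origin"][0] + shadow["offset"][0],
                               item["origin"][1] + shadow["offset"][1]),
                    "blur_radius": shadow["blur_radius"],
                })
            # The original text is drawn last, on top of its shadows.
            pictures.append({
                "text": item["text"],
                "color": item["color"],
                "origin": item["origin"],
                "blur_radius": 0,
            })
    return pictures


display_list = [
    {"kind": "push_shadow", "color": "gray", "offset": (2, 2), "blur_radius": 3},
    {"kind": "text", "text": "Hello", "color": "black", "origin": (10, 20)},
    {"kind": "pop_all_shadows"},
]
pictures = build_pictures(display_list)
```

Three display items go in and two pictures come out: the gray, offset copy
first, then the original text.
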
Meanwhile, we also build a Spatial Tree -- a hierarchy of Spatial
Nodes. This tree is a representation of how frames and divs are
nested in the original DOM, or more precisely:

- a Reference Frame corresponds to a ``<div>``
- a Scrolling Frame corresponds to a scrollable part
- a Sticky Frame corresponds to sticky positioning in CSS (``position: sticky``)

Each picture then points to a spatial node inside this tree, so by
walking up and down the tree we can find the absolute position of
where each picture should render (traversing down) and how large each
element needs to be (traversing up).

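
Finding an absolute position from the spatial tree might look like the sketch
below. It assumes every node contributes only a translation, which is a big
simplification, and it accumulates offsets from the node towards the root,
which gives the same result as traversing down from the root. ``SpatialNode``
here is a made-up class, not WebRender's.

```python
class SpatialNode:
    """A node that positions its children relative to its parent."""

    def __init__(self, name, offset, parent=None):
        self.name = name
        self.offset = offset  # (dx, dy) relative to the parent node
        self.parent = parent


def absolute_position(node, local=(0, 0)):
    """Map a point in the node's local space to page coordinates."""
    x, y = local
    while node is not None:
        x += node.offset[0]
        y += node.offset[1]
        node = node.parent
    return (x, y)


root = SpatialNode("root reference frame", (0, 0))
# A scroll frame that has been scrolled down by 200 pixels.
scroller = SpatialNode("scroll frame", (0, -200), parent=root)
box = SpatialNode("reference frame", (50, 300), parent=scroller)

position = absolute_position(box, local=(10, 10))
```

Note that scrolling only changes the offset stored on ``scroller``; every
picture underneath it moves without being touched.
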
And finally, we also have a Clip Tree, which contains Clip Shapes. For
example, a rounded corner div will produce a clip shape, and since
divs can be nested, you end up with another tree.

In summary, at the end of scene building the display list has been
turned into a picture tree, plus a spatial tree that tells us what goes
where relative to what, plus a clip tree.

RenderTask Tree
~~~~~~~~~~~~~~~

Now in a perfect world we could simply traverse the picture tree and
start drawing things. However, recall that the first picture in our
example is a "text shadow" that needs to be blurred. We can't just
rasterize blurry text directly, so we need a number of steps or
"render passes" to get the intended effect:

.. image:: RenderingOverviewBlurTask.png
   :align: right
   :height: 400px

- rasterize the text into an offscreen rendertarget;
- apply one or more downscaling passes until the blur radius is reasonable;
- apply a horizontal Gaussian blur;
- apply a vertical Gaussian blur;
- use the result as an input for whatever comes next, or blit it to
  its final position on the page (or more generally, on the containing
  parent surface/picture).

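
The list of steps above can be generated from the blur radius, roughly like
this. The threshold ``max_radius=16`` and the halving strategy are assumptions
chosen for the example; they are not taken from WebRender.

```python
def blur_tasks(blur_radius, max_radius=16):
    """Break "blur this picture" down into an ordered list of passes.

    max_radius is an assumed limit for what one blur pass handles well.
    """
    tasks = ["rasterize text into offscreen target"]
    scale = 1.0
    # Each downscale halves the image, and with it the radius still needed.
    while blur_radius > max_radius:
        blur_radius /= 2
        scale /= 2
        tasks.append(f"downscale to {scale:g}x")
    tasks.append(f"horizontal blur, radius {blur_radius:g}")
    tasks.append(f"vertical blur, radius {blur_radius:g}")
    tasks.append("composite onto parent surface")
    return tasks


large = blur_tasks(40)
small = blur_tasks(4)
```

A small radius needs no downscaling at all, while a radius of 40 picks up two
extra passes before the two blur passes run.
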
In the general case, which passes we need and how many of them depends
on how the picture is supposed to be composited (CSS filters, SVG
filters, effects) and its parameters (very large vs. small blur
radius, say).

Thus, we walk the picture tree and build a render task tree: each high
level abstraction like "blur me" gets broken down into the necessary
render passes to get the effect. The result is again a tree because a
render pass can have multiple input dependencies (e.g. blending).

(For comparison with games: this has echoes of the Frostbite FrameGraph,
in that it dynamically builds up a render pass DAG and dynamically
allocates storage for the outputs.)

If there are complicated clip shapes that need to be rasterized first,
so their output can be sampled as a texture for clip/discard
operations, that would also end up in this tree as a dependency... (I think?).

Once we have the entire tree of dependencies, we analyze it to see
which tasks can be combined into a single pass for efficiency. We
ping-pong rendertargets when we can, but sometimes the dependencies
cut across more than one level of the rendertask tree, and some
copying is necessary.

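
One simple way to group tasks into passes is by their depth in the dependency
graph, as sketched below. This is a generic technique used for illustration,
not necessarily how WebRender assigns passes, and the task names are made up.

```python
def assign_passes(deps):
    """Group render tasks into passes.

    deps maps each task to the list of tasks whose output it reads.
    A task lands in the pass right after its deepest dependency, so
    tasks in the same pass never depend on each other.
    """
    depth = {}

    def visit(task):
        if task not in depth:
            depth[task] = 1 + max((visit(d) for d in deps[task]), default=-1)
        return depth[task]

    for task in deps:
        visit(task)

    passes = [[] for _ in range(max(depth.values(), default=-1) + 1)]
    for task in deps:
        passes[depth[task]].append(task)
    return passes


deps = {
    "text": [],
    "downscale": ["text"],
    "blur_h": ["downscale"],
    "blur_v": ["blur_h"],
    "clip_mask": [],
    "composite": ["blur_v", "clip_mask", "text"],
}
passes = assign_passes(deps)
```

Here ``composite`` reads ``text`` from four passes earlier. That is the kind of
dependency that cuts across levels, so its output has to be kept alive or
copied rather than ping-ponged.
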
Once we've figured out the passes and allocated storage for anything
we wish to persist in the texture cache, we finally start rendering.

Caching
```````

Just as with layers in the software rasterizer, it is not always
necessary to redraw absolutely everything when parts of a document
change. The WebRender equivalent of layers is slices -- a slice is a
grouping of pictures that are expected to render and update together.
If a slice isn't expected to change much, we give it a
TileCacheInstance. It is
itself made up of Tiles, where each tile will track what's
in it, what's changing, and if it needs to be invalidated and redrawn
or not as a result. Thus the "damage" from changes can be localized
to single tiles, while we salvage the rest of the cache. If tiles
keep seeing a lot of invalidations, they will recursively divide
themselves in a quad-tree like structure to try and localize the
invalidations. (And conversely, they'll recombine children if nothing is
invalidating them "for a while").

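
The splitting behaviour can be sketched with a tiny quad-tree. The ``Tile``
class, the threshold of three invalidations, and the minimum tile size are all
assumptions for this example; WebRender's actual heuristics are not described
here, and the recombining step is left out.

```python
class Tile:
    SPLIT_AFTER = 3   # assumed threshold, not WebRender's real heuristic
    MIN_SIZE = 64     # assumed smallest tile we are willing to create

    def __init__(self, x, y, size):
        self.x, self.y, self.size = x, y, size
        self.invalidations = 0
        self.children = []

    def invalidate(self, px, py):
        """Record damage at (px, py); return the tile that must be redrawn."""
        half = self.size / 2
        if self.children:
            # Children are stored row by row: top-left, top-right,
            # bottom-left, bottom-right.
            index = ((1 if px >= self.x + half else 0)
                     + (2 if py >= self.y + half else 0))
            return self.children[index].invalidate(px, py)
        self.invalidations += 1
        if self.invalidations >= self.SPLIT_AFTER and self.size > self.MIN_SIZE:
            # Too much churn: split so future damage stays local.
            self.children = [Tile(self.x + dx, self.y + dy, half)
                             for dy in (0, half) for dx in (0, half)]
        return self


root = Tile(0, 0, 512)
redrawn = [root.invalidate(10, 10) for _ in range(3)]  # third call splits
after_split = root.invalidate(10, 10)
```

After the split, damage near the top-left corner only dirties a 256 pixel
child tile; the other three quarters of the cache stay valid.
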
Callbacks
`````````

GPU text rendering assumes that the individual font glyphs are already
available in a texture atlas. Likewise, SVG is not rendered on
the GPU. Both inputs are prepared during scene building: glyphs are
rasterized via a thread pool from within Rust itself, and SVG via
opaque callbacks (back to C++) that produce blobs.

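
The glyph half of this can be pictured as "rasterize each distinct glyph once,
in parallel, and store the results by key". The sketch below uses Python's
standard thread pool purely for illustration; ``rasterize_glyph`` is a fake
stand-in that returns placeholder bytes instead of a real coverage bitmap.

```python
from concurrent.futures import ThreadPoolExecutor


def rasterize_glyph(char, size=8):
    """Stand-in for a real rasterizer: returns size*size placeholder bytes."""
    return bytes([ord(char) % 256]) * (size * size)


def build_atlas(text):
    """Rasterize every distinct glyph in text once, using a thread pool."""
    glyphs = sorted(set(text) - {" "})
    with ThreadPoolExecutor(max_workers=4) as pool:
        bitmaps = list(pool.map(rasterize_glyph, glyphs))
    return dict(zip(glyphs, bitmaps))


atlas = build_atlas("hello world")
```

The letter "l" appears three times in the text but is rasterized only once;
drawing the text later is just a series of lookups into ``atlas``.
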

Binary file not shown.

Width:  |  Height:  |  Size: 16 KiB

Binary file not shown.

Width:  |  Height:  |  Size: 145 KiB

Binary file not shown.

Width:  |  Height:  |  Size: 54 KiB