gecko-dev/gfx/wr/swgl
Lee Salzman 6fcd1830ea Bug 1917964 - Use __builtin_convertvector and __builtin_shufflevector on GCC when available. r=aosmond
GCC upstream recommends we use __builtin_convertvector and __builtin_shufflevector instead of __builtin_shuffle
for better code generation.

See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116666#c7

Differential Revision: https://phabricator.services.mozilla.com/D222067
2024-09-13 14:55:30 +00:00

swgl

Software OpenGL implementation for WebRender

Overview

This is a relatively simple single-threaded software rasterizer designed for use by WebRender. It shades one quad at a time using a 4xf32 vector with one vertex per lane. It rasterizes quads using spans and shades each span 4 pixels at a time.

Building

clang-cl is required to build on Windows. This can be done by installing the llvm binaries from https://releases.llvm.org/ and adding the installation to the path with something like set PATH=%PATH%;C:\Program Files\LLVM\bin. Then set CC=clang-cl and set CXX=clang-cl. That should be sufficient for cc-rs to use clang-cl instead of cl.

Extensions

SWGL contains a number of OpenGL and GLSL extensions designed to both ease integration with WebRender and to help accelerate span rasterization.

GLSL extension intrinsics are generally prefixed with swgl_ to distinguish them from other items in the GLSL namespace.

Inside GLSL, the SWGL preprocessor token is defined so that usage of SWGL extensions may be conditionally compiled.

void swgl_drawSpanRGBA8();
void swgl_drawSpanR8();

int swgl_SpanLength;
int swgl_StepSize;

mixed swgl_interpStep(mixed varying_input);
void swgl_stepInterp();

SWGL's default fragment processing calls a fragment shader's main function on groups of fragments in units of swgl_StepSize. On return, the value of gl_FragColor is read, packed to an appropriate pixel format, and sent to the blend stage for output to the destination framebuffer. This can be inefficient for some types of fragment shaders, such as those that look up a texel from a texture and immediately output it: the texels are unpacked only to be repacked afterwards, at a cost. Also, various per-fragment conditions in the shader might need to be checked repeatedly, even though they are actually constant over the entire primitive.
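The unpack-then-repack cost described above can be illustrated with a toy C++ model of a single color channel. This is only a sketch: pack_channel, unpack_channel and copy_via_main are hypothetical names, and the round-to-nearest conversion is an assumption rather than SWGL's actual packing code.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: convert a normalized channel to 8 bits.
// Clamping and round-to-nearest are assumptions made for this sketch.
uint8_t pack_channel(float c) {
    if (c < 0.0f) c = 0.0f;
    if (c > 1.0f) c = 1.0f;
    return static_cast<uint8_t>(std::lround(c * 255.0f));
}

// Hypothetical helper: convert an 8-bit channel back to a normalized float.
float unpack_channel(uint8_t c) { return c / 255.0f; }

// Models a "copy" shader on the default path: the texel is unpacked so that
// main can work in floats, then gl_FragColor is packed again after main returns.
uint8_t copy_via_main(uint8_t texel) {
    float frag_color = unpack_channel(texel);  // texture lookup inside main
    return pack_channel(frag_color);           // repack on return from main
}
```

The round trip returns the same byte it started with, so both conversions are pure overhead for this kind of shader. That overhead is what the span-based path is meant to avoid.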

To work around this inefficiency, SWGL allows fragments to optionally be processed over entire spans. This can both side-step the packing inefficiency as well as more efficiently deal with conditions that remain constant over an entire span. SWGL also introduces various specialized intrinsics for more efficiently dealing with certain types of primitive spans with optimal fixed-function processing.

Inside a fragment shader, a swgl_drawSpan function may be defined to override the normal fragment processing for that fragment shader. The rasterizer calls this function to process an entire span of fragments that have passed the depth test (if applicable) and clipping, but have not yet been output to the blend stage. Because the normal fragment processing that would have happened in main does not take place, the function must call some form of swgl_commit intrinsic to actually output to the destination framebuffer via the blend stage.

The number of fragments within the span to be processed is designated by swgl_SpanLength and is always aligned to units of swgl_StepSize. swgl_StepSize designates the size of the group of fragments that the swgl_commit intrinsics process and output at a time. The swgl_commit intrinsics deduct from swgl_SpanLength in units of swgl_StepSize to reflect the fragments actually processed, which may be anything up to the entire span.

Fragments should be output until swgl_SpanLength becomes zero to process the entire span. If swgl_drawSpan returns while leaving any fragments unprocessed, the remaining fragments will be processed as normal by the fragment shader's main function. This can be used to conditionally handle certain fast-paths in a fragment shader by otherwise defaulting to the main function if swgl_drawSpan can't appropriately process some or all of the fragments.
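The bookkeeping described above can be sketched as a toy C++ model. Everything here is hypothetical and for illustration only: Span, commit_steps, draw_span and process_span are not SWGL functions, and a step size of 4 is assumed.

```cpp
#include <algorithm>
#include <vector>

const int kStepSize = 4;  // stands in for swgl_StepSize (assumed value)

// Toy span state; span_length stands in for swgl_SpanLength.
struct Span {
    int span_length = 0;
    std::vector<char> log;  // 'S' = group committed by drawSpan, 'M' = group shaded by main
};

// Hypothetical commit: outputs up to `steps` groups and deducts them from the span.
void commit_steps(Span& span, int steps) {
    int available = span.span_length / kStepSize;
    int done = std::min(steps, available);
    span.log.insert(span.log.end(), done, 'S');
    span.span_length -= done * kStepSize;
}

// Hypothetical fast path that can only handle the first `fast_steps` groups.
void draw_span(Span& span, int fast_steps) { commit_steps(span, fast_steps); }

// Rasterizer side: try the span path, then let main shade whatever is left.
void process_span(Span& span, int fast_steps) {
    draw_span(span, fast_steps);
    while (span.span_length > 0) {
        span.log.push_back('M');
        span.span_length -= kStepSize;
    }
}
```

With a 12-fragment span and a fast path limited to two groups, the first two groups go through the commit path and the last one falls back to main.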

The values of any varying inputs to the fragment shader will be set to their values for the start of the span, but do not automatically update over the course of a span within a given call to swgl_drawSpan. The swgl_interpStep intrinsic may be used to get the derivative per swgl_StepSize group of fragments of a varying input so that the caller may update such variables manually if desired or otherwise use that information for processing. The swgl_stepInterp intrinsic forces all such varying inputs to advance by a single step.
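As a rough illustration of the two intrinsics, the following C++ sketch models one scalar varying. The Varying struct and the per-pixel delta are assumptions made for the example; interp_step and step_interp only mimic the described behavior of swgl_interpStep and swgl_stepInterp and are not SWGL code.

```cpp
const int kStepSize = 4;  // stands in for swgl_StepSize (assumed value)

// Toy varying: its value at the start of the current group plus an assumed
// constant change per pixel along the span.
struct Varying {
    float value;
    float per_pixel;
};

// Mimics swgl_interpStep: the change in the varying across one group of fragments.
float interp_step(const Varying& v) { return v.per_pixel * kStepSize; }

// Mimics swgl_stepInterp for a single varying: advance it by one group.
void step_interp(Varying& v) { v.value += interp_step(v); }
```

A span function that commits one group at a time would call the equivalent of step_interp after each commit so that the next group sees updated inputs.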

The RGBA8 version of swgl_drawSpan will be used when the destination framebuffer is RGBA8 format, and the R8 version will be used when the destination framebuffer is R8. Some of the other intrinsics described below can only be used with a certain destination framebuffer format; such restrictions are noted where they apply.

void swgl_clipMask(sampler2D mask, vec2 offset, vec2 bb_origin, vec2 bb_size);

When called from the vertex shader, this specifies a clip mask texture to be used to mask the currently drawn primitive while blending is enabled. This mask will only apply to the current primitive.

The mask must be an R8 texture that will be interpreted as alpha weighting applied to the source pixel prior to the blend stage. It is sampled 1:1 with nearest filtering without any applied transform. The given offset specifies the positioning of the clip mask relative to the framebuffer's viewport.

The supplied bounding box constrains sampling of the clip mask to only fall within the given rectangle, specified relative to the clip mask offset. Anything falling outside this rectangle will be clipped entirely. If the rectangle is empty, then the clip mask will be ignored.
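The clip mask rules above can be sketched as a small C++ model. ClipMask and clip_weight are hypothetical names, integer pixel coordinates are assumed, and treating texels outside the texture as fully clipped is an assumption added only to keep the sketch memory-safe.

```cpp
#include <cstdint>
#include <vector>

// Toy clip mask state, loosely following the swgl_clipMask parameters.
struct ClipMask {
    int width, height;
    std::vector<uint8_t> texels;  // R8 texels, row-major
    int offset_x, offset_y;       // mask position relative to the viewport
    int bb_x, bb_y, bb_w, bb_h;   // bounding box, relative to the mask offset
};

// Returns the alpha weight in [0, 1] applied to the source pixel at (x, y).
float clip_weight(const ClipMask& m, int x, int y) {
    if (m.bb_w <= 0 || m.bb_h <= 0) return 1.0f;   // empty rectangle: mask is ignored
    int mx = x - m.offset_x;                       // 1:1 mapping, nearest, no transform
    int my = y - m.offset_y;
    if (mx < m.bb_x || my < m.bb_y || mx >= m.bb_x + m.bb_w || my >= m.bb_y + m.bb_h) {
        return 0.0f;                               // outside the bounding box: clipped
    }
    if (mx < 0 || my < 0 || mx >= m.width || my >= m.height) {
        return 0.0f;                               // assumption: outside the texture is clipped
    }
    return m.texels[my * m.width + mx] / 255.0f;
}
```

The returned weight would be multiplied into the source pixel before the blend stage.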

void swgl_antiAlias(int edgeMask);

When called from the vertex shader, this enables anti-aliasing for the currently drawn primitive while blending is enabled. This setting will only apply to the current primitive. Anti-aliasing will be applied only to the edges corresponding to bits supplied in the mask. For simple use-cases, the edge mask can be set to all 1 bits to enable AA for the entire quad.

The order of the bits in the edge mask must match the winding order in which the vertices are output in the vertex shader if processed as a quad, so that the edge ends on that vertex. The easiest way to understand this ordering is that for a rectangle (x0,y0,x1,y1), the Nth edge bit corresponds to the edge where the Nth coordinate in the rectangle is constant.
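The bit ordering for the rectangle case can be written down as a small C++ sketch. The enum and make_edge_mask are hypothetical helpers for illustration; only the four low bits are modeled here.

```cpp
// Bit N corresponds to the edge on which coordinate N of (x0, y0, x1, y1) is constant.
enum EdgeBit {
    kEdgeX0 = 1 << 0,  // edge where x == x0
    kEdgeY0 = 1 << 1,  // edge where y == y0
    kEdgeX1 = 1 << 2,  // edge where x == x1
    kEdgeY1 = 1 << 3,  // edge where y == y1
    kEdgeAll = kEdgeX0 | kEdgeY0 | kEdgeX1 | kEdgeY1
};

// Hypothetical helper: build the mask that would be passed to swgl_antiAlias.
int make_edge_mask(bool aa_x0, bool aa_y0, bool aa_x1, bool aa_y1) {
    return (aa_x0 ? kEdgeX0 : 0) | (aa_y0 ? kEdgeY0 : 0) |
           (aa_x1 ? kEdgeX1 : 0) | (aa_y1 ? kEdgeY1 : 0);
}
```

For example, anti-aliasing only the two edges of constant x gives a mask of 5, and enabling all four edges gives 15.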

SWGL tries to use an anti-aliasing method that is reasonably close to WR's signed-distance field approximation. WR would normally try to discern the 2D local-space coordinates of a given destination pixel relative to the 2D local-space bounding rectangle of a primitive. It then uses the screen-space derivative to try to determine how many local-space units equate to a distance of around one screen-space pixel. A distance approximation of coverage is then used based on the distance in local-space from the current pixel's center, roughly at half-intensity at pixel center and ranging to zero or full intensity within a radius of half a pixel away from the center. To account for AAing going outside the normal geometry boundaries of the primitive, WR has to extrude the primitive by a local-space estimate to allow some AA to happen within the extruded region.

SWGL can ultimately do this approximation more simply and get around the extrusion limitations by just ensuring spans encompass any pixel that is partially covered when computing span boundaries. Further, since SWGL already knows the slope of an edge and the coordinate of the span relative to the span boundaries, finding the partial coverage of a given span becomes easy to do without requiring any extra interpolants to track against local-space bounds. Essentially, SWGL just performs anti-aliasing on the actual geometry bounds, but when the pixels on a span's edge are determined to be partially covered during span rasterization, it uses the same distance field method as WR on those span boundary pixels to estimate the coverage based on edge slope.
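The distance-based coverage estimate described above can be sketched in C++ as follows. This is a minimal model, assuming a linear ramp and a signed distance that is positive inside the primitive; aa_coverage and its parameters are hypothetical names, not the code used by WR or SWGL.

```cpp
#include <algorithm>

// local_distance: signed distance from the pixel center to the edge in local
// space (positive inside the primitive, an assumption of this sketch).
// local_units_per_pixel: how many local-space units span about one screen pixel.
float aa_coverage(float local_distance, float local_units_per_pixel) {
    float pixels = local_distance / local_units_per_pixel;
    // Half intensity when the edge passes through the pixel center, reaching
    // zero or full intensity half a pixel away on either side.
    return std::min(1.0f, std::max(0.0f, 0.5f + pixels));
}
```

A pixel whose center lies exactly on the edge gets coverage 0.5, and a pixel half a pixel or more inside the edge gets full coverage.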

void swgl_commitTextureLinearRGBA8(sampler, vec2 uv, vec4 uv_bounds);
void swgl_commitTextureLinearR8(sampler, vec2 uv, vec4 uv_bounds);
void swgl_commitTextureLinearR8ToRGBA8(sampler, vec2 uv, vec4 uv_bounds);

void swgl_commitTextureLinearColorRGBA8(sampler, vec2 uv, vec4 uv_bounds, vec4|float color);
void swgl_commitTextureLinearColorR8(sampler, vec2 uv, vec4 uv_bounds, vec4|float color);
void swgl_commitTextureLinearColorR8ToRGBA8(sampler, vec2 uv, vec4 uv_bounds, vec4|float color);

void swgl_commitTextureLinearRepeatRGBA8(sampler, vec2 uv, vec2 tile_repeat, vec4 uv_repeat, vec4 uv_bounds);
void swgl_commitTextureLinearRepeatColorRGBA8(sampler, vec2 uv, vec2 tile_repeat, vec4 uv_repeat, vec4 uv_bounds, vec4|float color);

void swgl_commitTextureNearestRGBA8(sampler, vec2 uv, vec4 uv_bounds);
void swgl_commitTextureNearestColorRGBA8(sampler, vec2 uv, vec4 uv_bounds, vec4|float color);

void swgl_commitTextureNearestRepeatRGBA8(sampler, vec2 uv, vec2 tile_repeat, vec4 uv_repeat, vec4 uv_bounds);
void swgl_commitTextureNearestRepeatColorRGBA8(sampler, vec2 uv, vec2 tile_repeat, vec4 uv_repeat, vec4 uv_bounds, vec4|float color);

void swgl_commitTextureRGBA8(sampler, vec2 uv, vec4 uv_bounds);
void swgl_commitTextureColorRGBA8(sampler, vec2 uv, vec4 uv_bounds, vec4|float color);

void swgl_commitTextureRepeatRGBA8(sampler, vec2 uv, vec2 tile_repeat, vec4 uv_repeat, vec4 uv_bounds);
void swgl_commitTextureRepeatColorRGBA8(sampler, vec2 uv, vec2 tile_repeat, vec4 uv_repeat, vec4 uv_bounds, vec4|float color);

void swgl_commitPartialTextureLinearR8(int len, sampler, vec2 uv, vec4 uv_bounds);
void swgl_commitPartialTextureLinearInvertR8(int len, sampler, vec2 uv, vec4 uv_bounds);

Samples and commits an entire span of texture starting at the given uv and within the supplied uv bounds. The color variations also accept a supplied color that modulates the result.

The RGBA8 versions may only be used to commit within swgl_drawSpanRGBA8, and the R8 versions may only be used to commit within swgl_drawSpanR8. The R8ToRGBA8 versions may be used to sample from an R8 source while committing to an RGBA8 framebuffer.

The Linear variations use a linear filter that bilinearly interpolates between the four samples near the pixel. The Nearest variations use a nearest filter that chooses the closest aliased sample to the center of the pixel. If neither Linear nor Nearest is specified in the swgl_commitTexture variation name, then it will automatically select either the Linear or Nearest variation depending on the sampler's specified filter.
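The difference between the two filters, and the automatic selection, can be sketched on a single-channel float texture. This C++ model is illustrative only: Texture, fetch, sample_nearest, sample_linear and sample are hypothetical names, UVs are assumed to be in texel units with texel centers at half-integer coordinates, and edges are assumed to clamp.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Texture {
    int width, height;
    std::vector<float> texels;  // one channel, row-major
};

// Fetch with clamp-to-edge addressing (an assumption of this sketch).
float fetch(const Texture& t, int x, int y) {
    x = std::min(std::max(x, 0), t.width - 1);
    y = std::min(std::max(y, 0), t.height - 1);
    return t.texels[y * t.width + x];
}

// Nearest: pick the texel whose area contains the sample point.
float sample_nearest(const Texture& t, float u, float v) {
    return fetch(t, static_cast<int>(std::floor(u)), static_cast<int>(std::floor(v)));
}

// Linear: bilinearly interpolate between the four texels around the sample point.
float sample_linear(const Texture& t, float u, float v) {
    float fu = u - 0.5f, fv = v - 0.5f;
    int x0 = static_cast<int>(std::floor(fu));
    int y0 = static_cast<int>(std::floor(fv));
    float tx = fu - x0, ty = fv - y0;
    float top = fetch(t, x0, y0) + (fetch(t, x0 + 1, y0) - fetch(t, x0, y0)) * tx;
    float bottom = fetch(t, x0, y0 + 1) + (fetch(t, x0 + 1, y0 + 1) - fetch(t, x0, y0 + 1)) * tx;
    return top + (bottom - top) * ty;
}

enum Filter { kNearest, kLinear };

// Mimics the variations without Linear or Nearest in the name: follow the sampler's filter.
float sample(const Texture& t, Filter filter, float u, float v) {
    return filter == kLinear ? sample_linear(t, u, v) : sample_nearest(t, u, v);
}
```

On a two-texel texture holding 0 and 1, sampling halfway between the two texel centers gives 0.5 with the linear filter, while the nearest filter returns one of the two texel values unchanged.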

The Repeat variations take a repeat rect that specifies how to scale and offset the UVs, assuming the UVs are normalized to repeat in the range 0 to 1. For NearestRepeat variations, it is assumed the repeat rect is always within the bounds. The tile repeat limit, if non-zero, specifies the maximum number of repetitions allowed.
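One plausible reading of the scale-and-offset step is sketched below in C++. It assumes the repeat rect is laid out as (x0, y0, x1, y1) and that the normalized UV wraps by taking its fractional part; Vec2, Rect and repeat_uv are hypothetical names, and the tile repeat limit is not modeled.

```cpp
#include <cmath>

struct Vec2 { float x, y; };
struct Rect { float x0, y0, x1, y1; };  // assumed layout of uv_repeat

// Wrap a normalized UV into [0, 1), then scale and offset it into the repeat rect.
Vec2 repeat_uv(Vec2 uv, Rect r) {
    float fu = uv.x - std::floor(uv.x);
    float fv = uv.y - std::floor(uv.y);
    return {r.x0 + fu * (r.x1 - r.x0), r.y0 + fv * (r.y1 - r.y0)};
}
```

With a repeat rect of (0, 0, 64, 32), the normalized UV (2.5, 0.25) lands at (32, 8), the same place as (0.5, 0.25) one or two tiles earlier.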

The Partial variations allow committing only a sub-span rather than the entire remaining span. These are currently only implemented in linear R8 variants for optimizing clip shaders in WebRender. The Invert variant of these is useful for implementing clip-out modes by inverting the source texture value.
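The idea of a partial, optionally inverted commit can be sketched on plain byte arrays. This C++ model is only an illustration: commit_partial_r8 is a hypothetical function, the source is assumed to be already sampled into one byte per pixel, and any alignment of len to swgl_StepSize is ignored.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Commits up to `len` R8 values from `src` into `dst`, starting at `cursor`.
// When `invert` is set, each value v is written as 255 - v (clip-out).
// Returns the cursor position after the committed sub-span.
int commit_partial_r8(const std::vector<uint8_t>& src, std::vector<uint8_t>& dst,
                      int cursor, int len, bool invert) {
    int limit = static_cast<int>(std::min(src.size(), dst.size()));
    int end = std::min(cursor + len, limit);
    for (int i = cursor; i < end; ++i) {
        dst[i] = invert ? static_cast<uint8_t>(255 - src[i]) : src[i];
    }
    return std::max(cursor, end);
}
```

A clip shader could commit the first part of a span inverted and the rest as-is by calling such a function twice with the returned cursor.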

// Premultiplied alpha over blend, but with source color set to source alpha modulated with a constant color.
void swgl_blendDropShadow(vec4 color);
// Premultiplied alpha over blend, but treats the source as a subpixel mask modulated with a constant color.
void swgl_blendSubpixelText(vec4 color);

SWGL allows overriding the blend mode per-primitive by calling swgl_blend intrinsics in the vertex shader. The existing blend mode set by the GL is replaced with the one specified by the intrinsic for the current primitive. The blend mode will be reset to the blend mode set by the GL for the next primitive after the current one, even within the same draw call.
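The two blend modes listed above can be written out in floating point as a sketch. This C++ model follows the one-line descriptions given for swgl_blendDropShadow and swgl_blendSubpixelText; the Color struct and function names are hypothetical, colors are assumed to be premultiplied, and SWGL's real implementation operates on packed 8-bit pixels rather than floats.

```cpp
struct Color { float r, g, b, a; };

// Premultiplied alpha "over": out = src + dst * (1 - src.a).
Color blend_over(Color src, Color dst) {
    float k = 1.0f - src.a;
    return {src.r + dst.r * k, src.g + dst.g * k, src.b + dst.b * k, src.a + dst.a * k};
}

// Drop shadow: the source color is replaced by the source alpha modulated
// with a constant color, then blended with premultiplied over.
Color blend_drop_shadow(Color src, Color dst, Color color) {
    Color shadow = {color.r * src.a, color.g * src.a, color.b * src.a, color.a * src.a};
    return blend_over(shadow, dst);
}

// Subpixel text: the source is a per-channel coverage mask modulated with a
// constant color, so each channel uses its own effective alpha.
Color blend_subpixel_text(Color mask, Color dst, Color color) {
    return {color.r * mask.r + dst.r * (1.0f - color.a * mask.r),
            color.g * mask.g + dst.g * (1.0f - color.a * mask.g),
            color.b * mask.b + dst.b * (1.0f - color.a * mask.b),
            color.a * mask.a + dst.a * (1.0f - color.a * mask.a)};
}
```

In the drop shadow case only the alpha of the source matters, while in the subpixel case each channel of the mask attenuates the destination independently.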