Buffer mapping patterns
-----------------------

There are two main strategies the driver has for CPU access to GL buffer
objects. One is that the GL calls allocate temporary storage and blit to the GPU
at
``glBufferSubData()``/``glBufferData()``/``glFlushMappedBufferRange()``/``glUnmapBuffer()``
time, which makes it easy to match the behavior GL requires. The other is to map
the GL BO directly. The blit approach may be more costly than direct mapping on
some platforms, and is essentially not available to tiling GPUs (since tiling
involves running through the command stream multiple times). Thus, GL has
additional interfaces that let apps directly access memory while avoiding
implicit blocking on the GPU rendering from those BOs.

Rendering engines have a variety of knobs to set on those GL interfaces for data
upload, and as a whole they seem to take just about every path available. Let's
look at some examples to see how they might constrain GL driver buffer upload
behavior.

Portal 2
========

.. code-block:: console

  1030842 glXSwapBuffers(dpy = 0x82a8000, drawable = 20971540)
  1030876 glBufferDataARB(target = GL_ELEMENT_ARRAY_BUFFER, size = 65536, data = NULL, usage = GL_DYNAMIC_DRAW)
  1030877 glBufferSubData(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, size = 576, data = blob(576))
  1030896 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 526, count = 252, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)
  1030915 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 19657, count = 36, type = GL_UNSIGNED_SHORT, indices = 0x1f8, basevertex = 0)
  1030917 glBufferDataARB(target = GL_ARRAY_BUFFER, size = 1572864, data = NULL, usage = GL_DYNAMIC_DRAW)
  1030918 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 0, size = 128, data = blob(128))
  1030919 glBufferSubData(target = GL_ELEMENT_ARRAY_BUFFER, offset = 576, size = 12, data = blob(12))
  1030936 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 3, count = 6, type = GL_UNSIGNED_SHORT, indices = 0x240, basevertex = 0)
  1030937 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 128, size = 128, data = blob(128))
  1030938 glBufferSubData(target = GL_ELEMENT_ARRAY_BUFFER, offset = 588, size = 12, data = blob(12))
  1030940 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 4, end = 7, count = 6, type = GL_UNSIGNED_SHORT, indices = 0x24c, basevertex = 0)
  [... repeated draws at increasing offsets]
  1033097 glXSwapBuffers(dpy = 0x82a8000, drawable = 20971540)

From this sequence, we can see that the driver must either implement
``glBufferSubData()`` as a blit from a streaming uploader in sequence with the
``glDraw*()`` calls (a common behavior for non-tiled GPUs, particularly those
with dedicated memory), or do both of the following:

1) Track the valid range of the buffer so that you don't have to flush the draws
   and synchronize on each following ``glBufferSubData()``.

2) Reallocate the buffer storage on ``glBufferData()`` so that your first
   ``glBufferSubData()`` of the frame doesn't stall on the last frame's
   rendering completing.

You can't just empty your valid range on ``glBufferData()`` unless you know that
the GPU access from the previous frame has completed. This pattern of
incrementing ``glBufferSubData()`` offsets interleaved with draws from that data
is common among newer Valve games.

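
The two list items above can be sketched in C as follows. This is a minimal
illustration under stated assumptions, not Mesa code: ``struct buf_state`` and
the ``buf_*()`` helpers are hypothetical names, GPU activity is reduced to a
single ``gpu_busy`` flag, and the valid range is tracked as "bytes valid
starting from 0".

.. code-block:: c

  #include <assert.h>
  #include <stdbool.h>
  #include <stddef.h>

  /* Hypothetical driver-side bookkeeping for one GL buffer object. */
  struct buf_state {
     size_t size;          /* size of the current storage */
     size_t valid_end;     /* bytes [0, valid_end) have been written */
     bool gpu_busy;        /* the GPU may still read the current storage */
     unsigned storage_id;  /* bumped whenever the storage is reallocated */
  };

  /* glBufferData(NULL): get fresh storage if the old one may be in use. */
  static void
  buf_orphan(struct buf_state *buf, size_t size)
  {
     if (buf->gpu_busy) {
        buf->storage_id++;       /* stands in for allocating a new BO */
        buf->gpu_busy = false;
     }
     buf->size = size;
     buf->valid_end = 0;         /* safe: the storage is new or idle */
  }

  /* glBufferSubData(): returns true if the write has to wait for the GPU. */
  static bool
  buf_write(struct buf_state *buf, size_t offset, size_t size)
  {
     assert(offset + size <= buf->size);

     /* Only data inside the valid range can be in use by queued draws. */
     bool must_sync = buf->gpu_busy && offset < buf->valid_end;
     if (must_sync)
        buf->gpu_busy = false;   /* we stalled, so the buffer is idle now */

     if (offset + size > buf->valid_end)
        buf->valid_end = offset + size;
     return must_sync;
  }

  /* A draw that sources from the buffer makes it busy. */
  static void
  buf_draw(struct buf_state *buf)
  {
     buf->gpu_busy = true;
  }

With this bookkeeping, the appending ``glBufferSubData()`` calls in the trace
never report a stall, while overwriting bytes that a queued draw may still read
does.
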
.. code-block:: console

  [ during setup ]

  679259 glGenBuffersARB(n = 1, buffers = &1314)
  679260 glBindBufferARB(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1314)
  679261 glBufferDataARB(target = GL_ELEMENT_ARRAY_BUFFER, size = 3072, data = NULL, usage = GL_STATIC_DRAW)
  679264 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 3072, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384000
  679269 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 3072)
  679270 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE

  [... setup of other buffers on this binding point]

  679343 glBindBufferARB(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1314)
  679344 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384000
  679346 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768)
  679347 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE
  679348 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 768, length = 768, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384300
  679350 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768)
  679351 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE
  679352 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 1536, length = 768, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384600
  679354 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768)
  679355 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE
  679356 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 2304, length = 768, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384900
  679358 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768)
  679359 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE

  [... setup completes and we start drawing later]

  761845 glBindBufferARB(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1314)
  761846 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 323, count = 384, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)

This suggests that, for non-blitting drivers, resetting your "might be used on
the GPU" range after a stall could save you a bunch of additional GPU stalls
during setup.

Terraria
========

.. code-block:: console

  167581 glXSwapBuffers(dpy = 0x3004630, drawable = 25165844)

  167585 glBufferData(target = GL_ARRAY_BUFFER, size = 196608, data = NULL, usage = GL_STREAM_DRAW)
  167586 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 0, size = 1728, data = blob(1728))
  167588 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 71, count = 108, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)
  167589 glBufferData(target = GL_ARRAY_BUFFER, size = 196608, data = NULL, usage = GL_STREAM_DRAW)
  167590 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 0, size = 27456, data = blob(27456))
  167592 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 7, count = 12, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)
  167594 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 3, count = 6, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 8)
  167596 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 3, count = 6, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 12)
  [...]

In this game, we can see ``glBufferData()`` being used on the same array buffer
throughout, to get new storage so that the ``glBufferSubData()`` doesn't cause
synchronization.

Don't Starve
============

.. code-block:: console

  7251917 glGenBuffers(n = 1, buffers = &115052)
  7251918 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 115052)
  7251919 glBufferData(target = GL_ARRAY_BUFFER, size = 144, data = blob(144), usage = GL_STREAM_DRAW)
  7251921 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 115052)
  7251928 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
  7251930 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 114872)
  7251936 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 18)
  7251938 glGenBuffers(n = 1, buffers = &115053)
  7251939 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 115053)
  7251940 glBufferData(target = GL_ARRAY_BUFFER, size = 144, data = blob(144), usage = GL_STREAM_DRAW)
  7251942 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 115053)
  7251949 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
  7251973 glXSwapBuffers(dpy = 0x86dd860, drawable = 20971540)
  [... drawing next frame]
  7252388 glDeleteBuffers(n = 1, buffers = &115052)
  7252389 glDeleteBuffers(n = 1, buffers = &115053)
  7252390 glXSwapBuffers(dpy = 0x86dd860, drawable = 20971540)

In this game we have a lot of tiny ``glBufferData()`` calls, suggesting that we
could see working set wins and possibly CPU overhead reduction by packing small
GL buffers in the same BO. Interestingly, the deletes of the temporary buffers
always happen at the end of the next frame.

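
One way to picture the packing idea is a bump suballocator that hands out
aligned slices of a single larger BO. The sketch below is only an illustration:
``struct slab``, ``SLAB_SIZE``, ``SLAB_ALIGN`` and both helpers are made-up
names and values, and a real driver would also have to wait for the GPU to
finish with the slab before reusing it.

.. code-block:: c

  #include <assert.h>
  #include <stddef.h>

  #define SLAB_SIZE  4096u  /* hypothetical size of the shared BO */
  #define SLAB_ALIGN 64u    /* hypothetical sub-buffer alignment */

  /* Hypothetical bookkeeping for one BO shared by many small GL buffers. */
  struct slab {
     size_t used;    /* first free byte in the BO */
     unsigned live;  /* number of sub-buffers still alive */
  };

  /* Returns the offset of the new sub-buffer, or (size_t)-1 if it won't fit. */
  static size_t
  slab_alloc(struct slab *s, size_t size)
  {
     size_t start = (s->used + SLAB_ALIGN - 1) & ~(size_t)(SLAB_ALIGN - 1);

     if (size > SLAB_SIZE || start > SLAB_SIZE - size)
        return (size_t)-1;

     s->used = start + size;
     s->live++;
     return start;
  }

  /* Called when a sub-buffer is deleted; the slab resets once it is empty.
   * Assumption: the caller knows the GPU no longer uses the slab. */
  static void
  slab_free(struct slab *s)
  {
     assert(s->live > 0);
     if (--s->live == 0)
        s->used = 0;
  }

Because the game deletes its temporary buffers together at the end of the next
frame, a whole slab would tend to become free at once, which is the case a bump
allocator handles well.
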
Euro Truck Simulator
====================

.. code-block:: console

  [usage of VBO 14,15]
  [...]
  885199 glXSwapBuffers(dpy = 0x379a3e0, drawable = 20971527)
  885203 glInvalidateBufferData(buffer = 14)
  885204 glInvalidateBufferData(buffer = 15)
  [...]
  889330 glXSwapBuffers(dpy = 0x379a3e0, drawable = 20971527)
  889334 glInvalidateBufferData(buffer = 12)
  889335 glInvalidateBufferData(buffer = 16)
  [...]
  893461 glXSwapBuffers(dpy = 0x379a3e0, drawable = 20971527)
  893462 glClientWaitSync(sync = 0x77eee10, flags = 0x0, timeout = 0) = GL_ALREADY_SIGNALED
  893463 glDeleteSync(sync = 0x780a630)
  893464 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0x78ec730
  893465 glInvalidateBufferData(buffer = 13)
  893466 glInvalidateBufferData(buffer = 17)
  893505 glBindBuffer(target = GL_COPY_READ_BUFFER, buffer = 14)
  893506 glMapBufferRange(target = GL_COPY_READ_BUFFER, offset = 0, length = 788, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b034efd1000
  893508 glUnmapBuffer(target = GL_COPY_READ_BUFFER) = GL_TRUE
  893509 glBindBuffer(target = GL_COPY_READ_BUFFER, buffer = 15)
  893510 glMapBufferRange(target = GL_COPY_READ_BUFFER, offset = 0, length = 32, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b034e5df000
  893512 glUnmapBuffer(target = GL_COPY_READ_BUFFER) = GL_TRUE
  893532 glBindVertexBuffers(first = 0, count = 2, buffers = {10, 15}, offsets = {0, 0}, strides = {52, 16})
  893552 glDrawElementsInstancedBaseVertex(mode = GL_TRIANGLES, count = 18, type = GL_UNSIGNED_SHORT, indices = 0x13f280, instancecount = 1, basevertex = 25131)
  893609 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
  893732 glBindVertexBuffers(first = 0, count = 1, buffers = &14, offsets = &0, strides = &48)
  893733 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 14)
  893744 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 6, type = GL_UNSIGNED_SHORT, indices = 0xf0, basevertex = 0)
  893759 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 24, type = GL_UNSIGNED_SHORT, indices = 0x2e0, basevertex = 6)
  893786 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 600, type = GL_UNSIGNED_SHORT, indices = 0xe87b0, basevertex = 21515)
  893822 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
  893845 glBindBuffer(target = GL_COPY_READ_BUFFER, buffer = 14)
  893846 glMapBufferRange(target = GL_COPY_READ_BUFFER, offset = 788, length = 788, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b034efd1314
  893848 glUnmapBuffer(target = GL_COPY_READ_BUFFER) = GL_TRUE
  893886 glDrawElementsInstancedBaseVertex(mode = GL_TRIANGLES, count = 18, type = GL_UNSIGNED_SHORT, indices = 0x13f280, instancecount = 1, basevertex = 25131)
  893943 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)

At the start of this frame, buffers 14 and 15 haven't been used in the previous
2 frames, and the ``GL_ARB_sync`` fence has ensured that the GPU has at least
started frame n-1 as the CPU starts the current frame. The first map is
``offset = 0, INVALIDATE_BUFFER | UNSYNCHRONIZED``, which suggests that the
driver should reallocate storage for the mapping even in the ``UNSYNCHRONIZED``
case. Here, however, the buffer is definitely going to be idle, making
reallocation unnecessary (you may still need to empty your valid range to
prevent unnecessary batch flushes).

Also note the use of a totally unrelated binding point for the mapping of the
vertex array: you can't effectively use the binding point as a hint for any
buffer placement in memory. The game does also use ``glCopyBufferSubData()``,
but only on a different buffer.

Plague Inc
==========

.. code-block:: console

  1640732 glXSwapBuffers(dpy = 0xb218f20, drawable = 23068674)
  1640733 glClientWaitSync(sync = 0xb4141430, flags = 0x0, timeout = 0) = GL_ALREADY_SIGNALED
  1640734 glDeleteSync(sync = 0xb4141430)
  1640735 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0xb4141430

  1640780 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 78)
  1640787 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 79)
  1640788 glDrawElements(mode = GL_TRIANGLES, count = 9636, type = GL_UNSIGNED_SHORT, indices = NULL)
  1640795 glDrawElements(mode = GL_TRIANGLES, count = 9636, type = GL_UNSIGNED_SHORT, indices = NULL)
  1640813 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1096)
  1640814 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 67584, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0xbfef4000
  1640815 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1091)
  1640816 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 12, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0xc3998000
  1640817 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1096)
  1640819 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 352)
  1640820 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1640821 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1091)
  1640823 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 12)
  1640824 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1640825 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 1096)
  1640831 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1091)
  1640832 glDrawElements(mode = GL_TRIANGLES, count = 6, type = GL_UNSIGNED_SHORT, indices = NULL)

  1640847 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1096)
  1640848 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 352, length = 67584, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0xbfef4160
  1640849 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1091)
  1640850 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 88, length = 12, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0xc3998058
  1640851 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1096)
  1640853 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 352)
  1640854 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1640855 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1091)
  1640857 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 12)
  1640858 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1640863 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 6, type = GL_UNSIGNED_SHORT, indices = 0x58, basevertex = 4)

At the start of this frame, the VBOs haven't been used in about 6 frames, and
the ``GL_ARB_sync`` fence has ensured that the GPU has started frame n-1.

Note the use of ``glFlushMappedBufferRange()`` on a small fraction of the size
of the VBO. It is important that a blitting driver make use of the flush
ranges when in explicit mode.

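
To make the flush-range point concrete, here is a hedged sketch of how a
blitting driver might decide what to upload at unmap time. ``struct map_state``
and both functions are hypothetical names, not Mesa API. It relies on the GL
rule that the ``offset`` passed to ``glFlushMappedBufferRange()`` is relative to
the start of the mapped range, and for simplicity it tracks only the union of
all flushed ranges.

.. code-block:: c

  #include <assert.h>
  #include <stdbool.h>
  #include <stddef.h>

  /* Hypothetical per-mapping state kept by a blitting driver. */
  struct map_state {
     size_t map_offset;    /* offset passed to the map call */
     size_t map_length;    /* length passed to the map call */
     bool flush_explicit;  /* mapping was created with FLUSH_EXPLICIT */
     bool have_flush;      /* at least one range has been flushed */
     size_t flush_start;   /* union of flushed ranges, as buffer offsets */
     size_t flush_end;
  };

  /* Record a flush; rel_offset is relative to the start of the mapping. */
  static void
  map_flush(struct map_state *m, size_t rel_offset, size_t length)
  {
     assert(m->flush_explicit);
     assert(rel_offset + length <= m->map_length);

     size_t start = m->map_offset + rel_offset;
     size_t end = start + length;

     if (!m->have_flush) {
        m->flush_start = start;
        m->flush_end = end;
        m->have_flush = true;
        return;
     }
     if (start < m->flush_start)
        m->flush_start = start;
     if (end > m->flush_end)
        m->flush_end = end;
  }

  /* At unmap: returns how many bytes to blit and where they start. */
  static size_t
  map_upload_range(const struct map_state *m, size_t *start)
  {
     if (!m->flush_explicit) {
        /* Without explicit flushes, the whole mapped range is uploaded. */
        *start = m->map_offset;
        return m->map_length;
     }
     if (!m->have_flush) {
        *start = m->map_offset;
        return 0;
     }
     *start = m->flush_start;
     return m->flush_end - m->flush_start;
  }

For the second map of buffer 1096 in the trace (offset 352, length 67584,
followed by a flush of 352 bytes at relative offset 0), this uploads 352 bytes
starting at buffer offset 352 instead of the full 67584-byte mapping.
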
Darkest Dungeon
===============

.. code-block:: console

  938384 glXSwapBuffers(dpy = 0x377fcd0, drawable = 23068692)

  938385 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
  938386 glBufferData(target = GL_ARRAY_BUFFER, size = 1048576, data = NULL, usage = GL_STREAM_DRAW)
  938511 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
  938512 glMapBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 1048576, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7a73fcaa7000
  938514 glFlushMappedBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 512)
  938515 glUnmapBuffer(target = GL_ARRAY_BUFFER) = GL_TRUE
  938523 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1)
  938524 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
  938525 glDrawElements(mode = GL_TRIANGLES, count = 24, type = GL_UNSIGNED_SHORT, indices = NULL)
  938527 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
  938528 glMapBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 1048576, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7a73fcaa7000
  938530 glFlushMappedBufferRange(target = GL_ARRAY_BUFFER, offset = 512, length = 512)
  938531 glUnmapBuffer(target = GL_ARRAY_BUFFER) = GL_TRUE
  938539 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1)
  938540 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
  938541 glDrawElements(mode = GL_TRIANGLES, count = 24, type = GL_UNSIGNED_SHORT, indices = 0x30)
  [... more maps and draws at increasing offsets]

An interesting note for this game: after the initial ``glBufferData()`` in the
frame to reallocate the storage, it maps the whole buffer unsynchronized each
time and only changes which region it flushes. The same GL buffer name is used
in every frame.

Tabletop Simulator
==================

.. code-block:: console

  1287594 glXSwapBuffers(dpy = 0x3e10810, drawable = 23068692)
  1287595 glClientWaitSync(sync = 0x7abf554e37b0, flags = 0x0, timeout = 0) = GL_ALREADY_SIGNALED
  1287596 glDeleteSync(sync = 0x7abf554e37b0)
  1287597 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0x7abf56647490

  1287614 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 480)
  1287615 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 384, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7abf2e79a000
  1287642 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 614)
  1287650 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 5)
  1287651 glBufferSubData(target = GL_COPY_WRITE_BUFFER, offset = 0, size = 1088, data = blob(1088))
  1287652 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 615)
  1287653 glDrawElements(mode = GL_TRIANGLES, count = 1788, type = GL_UNSIGNED_SHORT, indices = NULL)
  [... more draw calls]
  1289055 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 480)
  1289057 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 384)
  1289058 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1289059 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 480)
  1289066 glDrawArrays(mode = GL_TRIANGLE_STRIP, first = 12, count = 4)
  1289068 glDrawArrays(mode = GL_TRIANGLE_STRIP, first = 8, count = 4)
  1289553 glXSwapBuffers(dpy = 0x3e10810, drawable = 23068692)

In this app, buffer 480 gets used like this every other frame. The ``GL_ARB_sync``
fence ensures that frame n-1 has started on the GPU before CPU work starts on
the current frame, so the unsynchronized access to the buffers is safe.

Hollow Knight
=============

.. code-block:: console

  1873034 glXSwapBuffers(dpy = 0x28609d0, drawable = 23068692)
  1873035 glClientWaitSync(sync = 0x7b1a5ca6e130, flags = 0x0, timeout = 0) = GL_ALREADY_SIGNALED
  1873036 glDeleteSync(sync = 0x7b1a5ca6e130)
  1873037 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0x7b1a5ca6e130
  1873038 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 29)
  1873039 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 8640, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b1a04c7e000
  1873040 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 30)
  1873041 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 720, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b1a07430000
  1873065 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 29)
  1873067 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 8640)
  1873068 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1873069 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 30)
  1873071 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 720)
  1873072 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1873073 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 29)
  1873074 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 8640, length = 576, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b1a04c801c0
  1873075 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 30)
  1873076 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 720, length = 72, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b1a074302d0
  1873077 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 29)
  1873079 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 576)
  1873080 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1873081 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 30)
  1873083 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 72)
  1873084 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1873085 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 29)
  1873096 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 30)
  1873097 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 36, type = GL_UNSIGNED_SHORT, indices = 0x2d0, basevertex = 240)

In this app, buffers 29 and 30 get used like this, starting from offset 0, every
other frame. The ``GL_ARB_sync`` fence is used to make sure that the GPU has
reached the start of the previous frame before the app writes, unsynchronized,
over the buffers last used in frame n-2.

Borderlands 2
=============

.. code-block:: console

  3561998 glFlush()
  3562004 glXSwapBuffers(dpy = 0xbaf0f90, drawable = 23068705)
  3562006 glClientWaitSync(sync = 0x231c2ab0, flags = GL_SYNC_FLUSH_COMMANDS_BIT, timeout = 10000000000) = GL_ALREADY_SIGNALED
  3562007 glDeleteSync(sync = 0x231c2ab0)
  3562008 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0x231aadc0

  3562050 glBindBufferARB(target = GL_ARRAY_BUFFER, buffer = 1193)
  3562051 glMapBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 1792, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT) = 0xde056000
  3562053 glUnmapBufferARB(target = GL_ARRAY_BUFFER) = GL_TRUE
  3562054 glBindBufferARB(target = GL_ARRAY_BUFFER, buffer = 1194)
  3562055 glMapBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 1280, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT) = 0xd9426000
  3562057 glUnmapBufferARB(target = GL_ARRAY_BUFFER) = GL_TRUE
  [... unrelated draws]
  3563051 glBindBufferARB(target = GL_ARRAY_BUFFER, buffer = 1193)
  3563064 glBindBufferARB(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 875)
  3563065 glDrawElementsInstancedARB(mode = GL_TRIANGLES, count = 72, type = GL_UNSIGNED_SHORT, indices = NULL, instancecount = 28)

The ``GL_ARB_sync`` fence ensures that the GPU has started frame n-1 before the CPU
starts on the current frame.

This sequence of buffer uploads appears in each frame with the same buffer
names, so you do need to handle ``GL_MAP_INVALIDATE_BUFFER_BIT`` as a
reallocation if the buffer is GPU-busy (it wasn't in this trace capture) to
avoid stalls on the n-1 frame completing.

Note that this is just one small buffer. Most of the vertex data goes through a
``glBufferSubData()``/``glDraw*()`` path with the VBO used across multiple
frames, and a ``glBufferData()`` call when the offset needs to wrap.

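
The map-time decision described for this game and for Euro Truck Simulator can
be summarized in a small sketch. Everything here is illustrative: the
``MAP_*`` flags and ``choose_map_action()`` are hypothetical stand-ins for the
GL access bits and for whatever a real driver does, and checking the invalidate
flag before the unsynchronized flag is an assumption taken from the reading of
the Euro Truck Simulator trace above.

.. code-block:: c

  #include <assert.h>
  #include <stdbool.h>

  /* Hypothetical stand-ins for the GL map access bits. */
  #define MAP_INVALIDATE_BUFFER (1u << 0)
  #define MAP_UNSYNCHRONIZED    (1u << 1)

  enum map_action {
     MAP_DIRECT,      /* hand out a pointer to the current storage */
     MAP_REALLOCATE,  /* swap in fresh storage, then map that */
     MAP_STALL,       /* wait for the GPU, then map the current storage */
  };

  /* Decide how to satisfy a write mapping of a buffer. */
  static enum map_action
  choose_map_action(unsigned flags, bool gpu_busy)
  {
     /* An idle buffer never needs new storage or a wait. */
     if (!gpu_busy)
        return MAP_DIRECT;

     /* The app discards the old contents, so new storage avoids a stall. */
     if (flags & MAP_INVALIDATE_BUFFER)
        return MAP_REALLOCATE;

     /* The app promises to do its own synchronization. */
     if (flags & MAP_UNSYNCHRONIZED)
        return MAP_DIRECT;

     return MAP_STALL;
  }

In the trace above the buffers were idle, so this sketch would map them
directly; the reallocation path only matters when frame n-1 is still reading
the buffer.
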
Buffer mapping conclusions
--------------------------

* Non-blitting drivers must track the valid range of a freshly allocated buffer
  as it gets uploaded in ``pipe_transfer_map()`` and avoid stalling on the GPU
  when mapping an undefined portion of the buffer when ``glBufferSubData()`` is
  interleaved with drawing.

* Non-blitting drivers must reallocate storage on ``glBufferData(NULL)`` so that
  the following ``glBufferSubData()`` won't stall. That ``glBufferData(NULL)``
  call will appear in the driver as an ``invalidate_resource()`` call if
  ``PIPE_CAP_INVALIDATE_BUFFER`` is available. (If that flag is not set, then
  mesa/st will create a new ``pipe_resource`` for you.) Storage reallocation may
  be skipped if you know for some reason that the buffer is idle, in which case
  you can just empty the valid region.

* Blitting drivers must use the ``transfer_flush_region()`` region
  instead of the mapped range when ``PIPE_MAP_FLUSH_EXPLICIT`` is set, to avoid
  blitting too much data. (When that bit is unset, you just blit the whole
  mapped range at unmap time.)

* Buffer valid range tracking in non-blitting drivers must use the
  ``transfer_flush_region()`` region instead of the mapped range when
  ``PIPE_MAP_FLUSH_EXPLICIT`` is set, to avoid excess stalls.

* Buffer valid range tracking doesn't need to be fancy; "number of bytes
  valid starting from 0" is sufficient for all examples found.

* Use the ``pipe_debug_callback`` to report stalls on buffer mapping to ease
  debugging.

* Buffer binding points are not useful for tuning buffer placement (see all the
  ``GL_COPY_WRITE_BUFFER`` instances in the traces above); you have to track the
  actual usage history of a GL BO name. mesa/st does this for optimizing its
  state updates on reallocation in the ``!PIPE_CAP_INVALIDATE_BUFFER`` case, and
  if you set ``PIPE_CAP_INVALIDATE_BUFFER`` then you have to flag your own
  internal state updates (VBO addresses, XFB addresses, texture buffer
  addresses, etc.) on reallocation based on usage history.