Assumes cmd_buffer != NULL for this path to eliminate the checks in the generic code.
Benchmarks:
I made a microbenchmark based on Bas' vulkan microbench suite.
https://gitlab.freedesktop.org/bnieuwenhuizen/vulkan_microbench/-/merge_requests/1
In this benchmark, this improves vkDescriptorTemplateUpdate performance consistently by 36%.
-------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------
DescriptorTemplateUpdate 81.2 ns 81.2 ns 8573169
->
-------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------
DescriptorTemplateUpdate 52.9 ns 52.9 ns 13306065
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13342>
We have to set 8c01 to say "leave these channels alone" when
clearing/storing just Z or S of z24s8. Fixes the bypass path for
KHR-GLES3.packed_depth_stencil.verify_read_pixels.depth24_stencil8.
Cc: mesa-stable
Fixes: #5592
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13659>
According to Jonathan Marek:
Only one immediate can be decoded in a cat2 instruction (if both srcs
are immediates, they will use the value of the either the first or
second one, I don't remember which) - using 2 immediates in a cat2
instruction is only "correct" if they are both equal.
The (i,j) in the second src of flat.b is not unused, but behaves as 0
for any (small) integer because it is a float src. The hack I
suggested is to set the second src equal to (immediate) first src,
which seems to work.
This allows us to remove a couple of mov instructions or a bit of extra
constfile usage.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13558>
fbfetch descriptors are uncacheable due to having mixed descriptor types
in the same set, so this needs to always use lazy updating to avoid
exploding the cache and crashing
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13654>
if the driver is requesting that psiz always be set, precompile the first
variant with psiz since that's most likely to be what's eventually used
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13588>
the context flag is immutable, and it's set by drivers that always
require the shader to export a psiz value for rasterization
"always" is all cases except GL_VERTEX_PROGRAM_POINT_SIZE being
enabled, in which case it should be assumed that the shader already
writes it, and no changes should be made
thus the shader key value can be changed to reflect that, when set,
it requires the shader to export a psiz value if it doesn't already
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13588>
We rely on mesa/main to tell us whether to update vertex elements.
This decreases overhead for obvious reasons.
The select/feedback code path doesn't use this, which is why you see
unconditional "ALL" in a few codepaths.
This sequence of GL calls doesn't update vertex elements if only
the pointer and stride vary:
glVertexPointer()
glDrawElements()
glVertexPointer()
glDrawElements()
glVertexPointer()
glDrawElements()
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13512>
This splits the per-VAO NewArrays flag into NewVertexBuffers and
NewVertexElements, and adds a global NewVertexElements flag to be used
by drivers.
This allows gallium to skip updating vertex elements when only vertex
buffers need to be updated.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13512>
There are three approaches for problem:
- use dummy shader (current solution)
- use software rendering
- stub
First option always gonna give bad results, second one
gonna be slideshow (r300/r400 at best are running with
old 2 cores cpu), third one can sometimes give graphical
gliches.
IMHO third one is least annoying option.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13642>
sparse binds have to be processed synchronously with cmdbuf recording to
avoid resource object desync in the vk driver, which means they have to be
done in the driver thread instead of the flush thread. this necessitates
adding locking for the queue since there is now a case when submissions occur
in a different thread
fixes illegal multithread usage in KHR-GL46.CommonBugs.CommonBug_SparseBuffersWithCopyOps
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13597>
vulkan requires that vertex attribute access be aligned to the size of
a component for the attribute, but GL has no such requirements
the existing alignment caps are unnecessarily restrictive for applying
this limitation, so this cap now pre-calculates the masks for elements
and vertex buffers in vbuf to enable rewriting misaligned buffers
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13556>
The single-plane descriptor emit helper doesn't strictly need the UBWC
reloc, since imageBuffer can't be UBWC, but it means the function is ready
to be used for non-buffer image descriptors later.
no-hw drawoverhead 1-imageBuffer change throughput 1.95457% +/- 1.44325%
(n=127).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13635>
The gmem store stores both depth and stencil for z24s8. So, if we're
doing a write (clear or draw) to one or the other of the channels, we need
the other one restored as well.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13649>
Instead of checking for MESA_SHADER_COMPUTE (and KERNEL). Where
appropriate, also use gl_shader_stage_is_compute().
This allows most of the workgroup-related lowering to be applied to
Task and Mesh shaders. These will be added later and "inherit" from
cs_prog_data structure.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13629>
Enum values for MESA_SHADER_TASK and MESA_SHADER_MESH are larger than
MESA_SHADER_FRAGMENT, so can't rely on the them for ordering anymore.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13637>
We could decouple the locations in the array from the gl_shader_stage
enum values, but for now this is convenient.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13637>
The code is correct, but compiler can't see it. Initialize the value
to NULL and assert on it if the function succeeds. It both helps the
compiler and make the code slightly more robust.
```
../src/intel/vulkan/anv_queue.c: In function ‘anv_QueueSubmit2KHR’:
../src/intel/vulkan/anv_queue.c:932:16: warning: ‘impl’ may be used uninitialized in this function [-Wmaybe-uninitialized]
932 | result = anv_queue_submit_add_timeline_wait(queue, submit,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
933 | &impl->timeline,
| ~~~~~~~~~~~~~~~~
934 | value);
| ~~~~~~
../src/intel/vulkan/anv_queue.c:899:31: note: ‘impl’ was declared here
899 | struct anv_semaphore_impl *impl;
| ^~~~
```
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13638>