If an output has a dual source index > 0 then we need to emit both
outputs, even if the number of outputs is larger than the number
of active outputs.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9874>
Indirect handling was completely missing. And even though we have to
emulate multi-draw, this pushes it out of the fast/hot path in the
driver's draw_vbo()
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9742>
Without TC, we'd get draw->count==0.. which is obviously not correct.
With TC we get random garbage for draw->count. Which turns into
exciting things like trying to allocate multi-gigabyte buffers for
tess param/factor buffers.
But we can just tell the CP to split up large tess draws, and put an
upper bound on the tess param/factor buffer sizes.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9864>
when building with Meson. It requires libexpat that is not available
on Android and we want to avoid it.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9848>
nir_to_tgsi lowers nir_op_isub to UADD and the negate source mod
on the second operator. Since r600 doesn't support source mods
for integers but support SUB_INT we switch the opcode and clear the
negate flag.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9579>
Since the last round of enhanced layout packing, it seems like this code
is no longer needed. This allows us to use the HANDLE_EMIT_BUILTIN()
macro, which simplifies things further.
It would have been incorrect anyway, because it's the only code that
uses the location directly as an index. All other locations multiplies
it by four first.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9860>
Fixes the following building error:
In file included from external/mesa/src/amd/addrlib/src/gfx9/gfx9addrlib.cpp:36:
external/mesa/src/amd/addrlib/src/chip/gfx9/gfx9_gb_reg.h:43:2: error: "BIGENDIAN_CPU or LITTLEENDIAN_CPU must be defined"
^
NOTE: Android ABI specifies Little Endian for arm and x86 targets
Fixes: 3616e02ef3 ("amd/addrlib: define endianess differently")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9869>
If we've added the array, then we should update the info. This is the
value that gallium drivers setting !PIPE_CAP_CLIP_PLANES have to use in
place of rasterizer->clip_planes_enabled.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9815>
Since image stores can now compress and we can't track image stores
this also stops using predication for DCC decompression.
In GFX10 this was benchmarked to be faster. For GFX10.3 the microbenchmarks
are not as possible though I haven't tested any games, so this is not enabled
there yet.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6796>
For 16x16 we get 4 16x4 waves, which is bad for DCC image stores.
The workgroup size doesn't really matter for speed, the important
part is the number of waves, which should stay constant here.
(Though some optimization would be nice, but out of scope for this
patch)
The compute DCC compress shader still uses 16x16 due to functional
requirements (and we're sure it won't write with DCC compression ...)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6796>
We are providing a BO with the default attribute values for the
GL_SHADER_STATE_RECORD, that contains 16 vec4. Such default value for
each vec4 is (0, 0, 0, 1). As the attribute format could be int or
float, the "1" value needs to take into account the attribute format.
But in the practice, the most common case is all floats. So we create
one default attribute values BO assuming that all attributes will be
floats, and we store it at v3dv_device and only create a new one if a
int format type is defined. That allows to reduce the amount of BOs
needed.
Note that we could still try to reduce the amount of BOs used by the
pipelines if we create a bigger BO, and we just play with the
offsets. But as mentioned, that's not the usual, and would add an
extra complexity,so it is not a priority right now.
This makes the following test passing when disabling the pipeline
cache support:
dEQP-VK.api.object_management.max_concurrent.graphics_pipeline
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9845>
The upper 32 bits are truncated before converting, which still produces
correct results since they never meaningfully contribute to the result.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9597>
The previous code produced incorrect results for inputs outside the
range [INT16_MIN, INT16_MAX].
A problematic case is e.g. i2f16 32768, which previously would be
converted to -32768.0 instead of returning the exactly representable
floating point result.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9597>