This warning detects non-void functions with a missing return statement,
return statements with a value in void functions, and functions with a
bogus return type that ends up defaulting to int. It's already enabled
by default with -Wall. Generally, these are fairly serious bugs in the
code, which developers would like to notice and fix immediately. This
patch promotes it from a warning to an error, to help developers catch
such mistakes early.
I would not expect this warning to change much based on the compiler
version, so hopefully it won't become a problem for packagers/builders.
See the GCC documentation or 'man gcc' for more details:
https://gcc.gnu.org/onlinedocs/gcc-7.3.0/gcc/Warning-Options.html#index-Wreturn-type
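As an illustration of the kind of bug this catches, here is a minimal
sketch. The helper clamp_to_byte() is hypothetical and not taken from
the tree. The comment shows a buggy variant that falls off the end of a
non-void function; the code below it is the fixed version that still
builds once the warning is promoted with -Werror=return-type.

```c
#include <assert.h>

/* Hypothetical helper, for illustration only.
 *
 * A buggy version would look like this:
 *
 *     int clamp_to_byte(int v)
 *     {
 *        if (v < 0)
 *           return 0;
 *        if (v > 255)
 *           return 255;
 *     }
 *
 * Control can reach the end of that function without returning a value,
 * which -Wreturn-type reports. The fixed version below returns a value
 * on every path.
 */
int
clamp_to_byte(int v)
{
   if (v < 0)
      return 0;
   if (v > 255)
      return 255;
   return v;
}
```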
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Instead of having weak references to the anv functions and separate
trampoline functions with their own dispatch table, just make the
trampoline functions weak. This gets rid of a dispatch table and
potentially lets the compiler delete the unused weak function. The
end result is a reduction in the .text section of 5.7K and a reduction
in the .data section of 1.4K.
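A minimal sketch of the pattern, assuming GCC or Clang on an ELF
target. All demo_* names are hypothetical and do not exist in anv; the
real entrypoints are generated. The trampoline is a weak definition
that forwards through the per-device dispatch table, so a strong
definition of the same symbol elsewhere would replace it at link time.

```c
#include <assert.h>

/* Hypothetical per-device dispatch table with a single entrypoint. */
struct demo_dispatch {
   int (*get_value)(void *device);
};

struct demo_device {
   struct demo_dispatch dispatch;
};

/* Hypothetical generation-specific implementation. */
static int
demo_impl_get_value(void *device)
{
   (void)device;
   return 42;
}

/* The trampoline itself is weak. If another object file provides a
 * strong demo_GetValue(), the linker picks that one and this body can
 * be dropped. Otherwise calls land here and go through the device's
 * dispatch table.
 */
__attribute__((weak)) int
demo_GetValue(void *device)
{
   struct demo_device *dev = device;
   return dev->dispatch.get_value(device);
}
```

With no strong definition present, calling demo_GetValue() on a device
whose table points at demo_impl_get_value() ends up in that
implementation.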
Before:
   text    data   bss     dec    hex filename
3190329  282232  8960 3481521 351fb1 _install/lib64/libvulkan_intel.so
After:
   text    data   bss     dec    hex filename
3184548  280792  8960 3474300 35037c _install/lib64/libvulkan_intel.so
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 86b4bd52dc ("docs: update calendar, add news item and link
release notes for 18.2.3")
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
It's broken, and the WGL state tracker is always built with GLES support
nowadays.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Trying to access the bus info before it is initialized is not going
to work.
Fixes: baa38c144f "vulkan/wsi: Use VK_EXT_pci_bus_info for DRM fd matching"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108491
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
This lets us avoid passing the DRM fd around all over the place and gets
us closer to layer utopia.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
In that case, we have to wait for the fence to synchronize with the
corresponding drawing we triggered in the X server.
Fixes incorrect display with the i965 driver and some applications, e.g.
solvespace.
Bugzilla: https://bugs.freedesktop.org/108097
Fixes: aefac10fec "loader/dri3: Only wait for back buffer fences in
dri3_get_buffer"
Tested-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
We don't need weak references to instance entrypoints because we never
have more than one of each so we don't need the NULL fall-back. This
also helps us avoid forgetting things because we now get link errors for
missing instance entrypoints.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This got missed during 1.1 enabling because it was defined as an
interaction between device groups and WSI and it wasn't obvious it was
in the delta.
The idea behind it is that it's supposed to provide a hint to the
application in a multi-GPU setup to indicate which regions of the screen
are being scanned out by which GPU so a multi-device split-screen
rendering application can render each part of the screen on the GPU that
will be presenting it and avoid extra bus traffic between GPUs. On a
single-GPU setup or one which doesn't support this present mode, we need
to do something. We choose to return the window size (or a max-size
rect) if the compositor, X server, or crtc is associated with the given
physical device and zero rectangles otherwise.
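The fallback policy can be sketched roughly as follows. This is not the
WSI code: demo_rect and demo_get_present_rectangles() are hypothetical
stand-ins, and the "is this surface driven by this physical device"
check is reduced to a boolean the caller supplies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 2D rectangle. */
struct demo_rect {
   int32_t x, y;
   uint32_t width, height;
};

/* Returns the number of rectangles available. When `rects` is non-NULL,
 * writes at most `capacity` of them. One full-window rectangle is
 * reported if the surface is associated with this physical device, and
 * zero rectangles otherwise.
 */
uint32_t
demo_get_present_rectangles(bool surface_on_this_device,
                            uint32_t window_width, uint32_t window_height,
                            struct demo_rect *rects, uint32_t capacity)
{
   uint32_t count = surface_on_this_device ? 1 : 0;

   if (rects == NULL)
      return count;

   if (count > capacity)
      count = capacity;

   if (count == 1) {
      rects[0].x = 0;
      rects[0].y = 0;
      rects[0].width = window_width;
      rects[0].height = window_height;
   }

   return count;
}
```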
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We already have wsi_device and we know the instance allocator at
wsi_device_init time so there's no need to pass it into the physical
device queries. This also fixes a memory allocation domain bug that can
occur if CreateSwapchain gets called prior to any queries (not likely)
in which case the cached connection gets allocated off the device
instead of the instance.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The number of immediate constants was fixed and the size check was
only done by means of an assertion. As a result, a shader that emits
more immediate constants would cause memory corruption when Mesa is
built in release mode.
Instead of using this fixed limit, allocate the space dynamically, let
it grow as needed, and also remove the unused ImmArray.
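For reference, a growable array along these lines could look like the
sketch below. The demo_imm_array type, the doubling policy and the
initial capacity of 16 are assumptions made for the example, not the
code from this patch. The point is that the bounds check turns into a
real reallocation instead of an assert that disappears in release
builds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical container: one vec4 per immediate constant. */
struct demo_imm_array {
   float (*values)[4];
   unsigned count;
   unsigned capacity;
};

/* Appends one immediate, growing the storage when it is full.
 * Returns false if the allocation fails; the array is left unchanged
 * in that case.
 */
bool
demo_imm_array_push(struct demo_imm_array *arr, const float imm[4])
{
   /* Internal invariant only, not the bounds check. */
   assert(arr->count <= arr->capacity);

   if (arr->count == arr->capacity) {
      unsigned new_capacity = arr->capacity ? arr->capacity * 2 : 16;
      float (*grown)[4] =
         realloc(arr->values, new_capacity * sizeof(arr->values[0]));
      if (!grown)
         return false;
      arr->values = grown;
      arr->capacity = new_capacity;
   }

   for (unsigned i = 0; i < 4; i++)
      arr->values[arr->count][i] = imm[i];
   arr->count++;
   return true;
}

/* Releases the storage and resets the array to its empty state. */
void
demo_imm_array_fini(struct demo_imm_array *arr)
{
   free(arr->values);
   arr->values = NULL;
   arr->count = 0;
   arr->capacity = 0;
}
```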
Fixes: dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.1
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Totals from affected shaders:
SGPRS: 1112 -> 1112 (0.00 %)
VGPRS: 1492 -> 1196 (-19.84 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 112172 -> 101316 (-9.68 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 93 -> 98 (5.38 %)
Wait states: 0 -> 0 (0.00 %)
All affected shaders are from "Batman: Arkham City" over DXVK.
The pass detects that the temporary array created by DXVK for
storing TCS inputs is a copy of the input arrays. This allows
us to avoid copying all of the input data and then indirecting
on it with if-ladders. Instead, we just do indirect indexing.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Offers three clocks: device, clock monotonic, and clock monotonic
raw. Could use some kernel support to reduce the deviation between
clock values.
v2:
Ensure deviation is at least as big as the GPU time interval.
v3:
Set device->lost when returning DEVICE_LOST.
Use MAX2 and DIV_ROUND_UP instead of open coding these.
Delete spurious TIMESTAMP in radv version.
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
v4:
Add anv_gem_reg_read to anv_gem_stubs.c
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
v5:
Adjust maxDeviation computation to max(sampled_clock_period) +
sample_interval.
Suggested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
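The v5 computation can be sketched as below. The function names, the
nanosecond units and the way the device period is derived from a
timestamp frequency are assumptions for illustration; only the formula
max(sampled_clock_period) + sample_interval comes from the notes above.

```c
#include <assert.h>
#include <stdint.h>

/* Period of one device tick in nanoseconds, rounded up so the reported
 * deviation is never smaller than the GPU time interval. The frequency
 * parameter is an assumption of this sketch.
 */
uint64_t
demo_device_period_ns(uint64_t timestamp_frequency_hz)
{
   assert(timestamp_frequency_hz > 0);
   return (UINT64_C(1000000000) + timestamp_frequency_hz - 1) /
          timestamp_frequency_hz;
}

/* maxDeviation = max(sampled_clock_period) + sample_interval, where the
 * sample interval is the time between the first and the last sample.
 */
uint64_t
demo_max_deviation_ns(const uint64_t *clock_periods_ns, unsigned count,
                      uint64_t begin_ns, uint64_t end_ns)
{
   assert(end_ns >= begin_ns);

   uint64_t max_period = 0;
   for (unsigned i = 0; i < count; i++) {
      if (clock_periods_ns[i] > max_period)
         max_period = clock_periods_ns[i];
   }

   return max_period + (end_ns - begin_ns);
}
```

For example, a 12 MHz device clock has a period of 84 ns after
rounding up. Sampled together with a 1 ns host clock over a 500 ns
interval, the reported deviation would be 584 ns.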
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Some of the .dir-locals.el had the wrong name for the truthy value so
it wasn’t setting indent-tabs-mode.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Now that a single cmdstream is used for both binning and draw passes, we
can skip allocation of cmdstream buffer for binning.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Now that state which is different for draw vs binning pass is split out
into different state-groups with appropriate enable_mask (so the
appropriate one is chosen for draw vs binning), switch over to using a
single cmdstream for both passes.
This should significantly lower draw overhead for CPU bound benchmarks.
Signed-off-by: Rob Clark <robdclark@gmail.com>
The blob seems to manage to use the same input registers for BS
(binning pass) vs VS (draw pass) shaders, so it can use the same VBO
state for both. We can't quite do that yet, so split them.
Signed-off-by: Rob Clark <robdclark@gmail.com>
We don't need to keep this IGNORE_VISIBILITY in the binning pass. Prep
work for using a single cmdstream for both draw and binning passes.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Move this to after ir3_cp (which can add lowered immediates to the const
state) for a6xx+, to ensure the uniform state matches between binning
and vertex shaders. This way we can emit just a single VS_CONST
state-group when we re-use a single cmdstream for both binning and draw
passes.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Use the in-memory cache to construct shader program state and re-use it
on subsequent draws, to lower driver overhead.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Cache that maps gallium hwcso (in this case, 'struct ir3_shader') plus
shader variant key to a generation specific state object.
This could eventually replace the linked list of shader variants, but
for now it lets us re-use the work currently done in fdN_program_emit().
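To illustrate the lookup-or-create behaviour, here is a small sketch.
It swaps the hash table for a fixed-size linear scan to stay short, and
every demo_* name, the key layout and the state contents are
hypothetical rather than the driver's actual types.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_CACHE_SIZE 32

/* Hypothetical variant key; a real key would have many more fields. */
struct demo_key {
   uint32_t flags;
};

/* Hypothetical generation-specific state object. */
struct demo_state {
   unsigned id;
};

struct demo_entry {
   const void *shader;      /* stands in for 'struct ir3_shader' */
   struct demo_key key;
   struct demo_state state;
};

struct demo_cache {
   struct demo_entry entries[DEMO_CACHE_SIZE];
   unsigned count;
   unsigned builds;         /* how many times state was constructed */
};

/* Returns the cached state for (shader, key), building it on a miss. */
const struct demo_state *
demo_cache_lookup_or_create(struct demo_cache *cache, const void *shader,
                            const struct demo_key *key)
{
   for (unsigned i = 0; i < cache->count; i++) {
      struct demo_entry *e = &cache->entries[i];
      if (e->shader == shader && e->key.flags == key->flags)
         return &e->state;
   }

   /* Sketch only: no eviction, so the cache must not be full. */
   assert(cache->count < DEMO_CACHE_SIZE);

   struct demo_entry *e = &cache->entries[cache->count++];
   e->shader = shader;
   e->key = *key;
   e->state.id = ++cache->builds;   /* the expensive part in real code */
   return &e->state;
}
```

A second lookup with the same shader and key returns the same state
object without rebuilding it, which is where the saving on subsequent
draws would come from.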
Signed-off-by: Rob Clark <robdclark@gmail.com>
Prep work for a following patch, that introduces a cache to map from
program state (all shader stages) plus variant key to pre-baked hw
state (which could be emit'd via CP_SET_DRAW_STATE, for example).
To do that, we really want the variant key to be immutable, and to
treat the binning pass shader as an extra shader stage, rather than
as a VS variant.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Unfortunately gallium doesn't match what the hw wants perfectly here, in
using a separate CSO for each texture/sampler. So we have to use a hash
table to map the collection of texture/samplers to a hw state object.
We probably could use separate hw state objects for texture and sampler
state, but mesa/st tends to update the tex and samp state together.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Intended to be something more compact than a 64b pointer, which could be
used as a key into hashtables. Prep work for texture state objects.
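One way such a compact id could work is sketched below. This is an
assumption about the shape of the idea, not the code from this patch:
a 16-bit per-context counter hands out ids, and a fixed-size array of
those ids forms a hash table key that is much smaller than an array of
pointers. The demo_* names, the slot count and the FNV-1a hash are all
illustrative choices, and counter wraparound is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_TEX 4

/* Hypothetical context holding the id counter. */
struct demo_context {
   uint16_t seqno;
};

/* Hypothetical CSO carrying its compact id. */
struct demo_sampler_view {
   uint16_t seqno;
};

/* Assigns the next id. 0 is never handed out, so in this sketch it can
 * mark an unused slot in a key.
 */
void
demo_sampler_view_init(struct demo_context *ctx,
                       struct demo_sampler_view *view)
{
   view->seqno = ++ctx->seqno;
}

/* Key built from ids: 8 bytes here, versus 32 bytes for four 64b
 * pointers.
 */
struct demo_tex_key {
   uint16_t view_seqno[DEMO_MAX_TEX];
};

/* 32-bit FNV-1a over the key bytes. The struct has no padding, so
 * equal keys always hash equally.
 */
uint32_t
demo_tex_key_hash(const struct demo_tex_key *key)
{
   const unsigned char *bytes = (const unsigned char *)key;
   uint32_t hash = 2166136261u;

   for (size_t i = 0; i < sizeof(*key); i++) {
      hash ^= bytes[i];
      hash *= 16777619u;
   }
   return hash;
}
```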
Signed-off-by: Rob Clark <robdclark@gmail.com>
Eventually we want to move nearly everything, but no other state depends
on const state, so this is the easiest one to move first.
For webgl aquarium, this reduces GPU load by about 10%, since for each
fish it does a uniform upload plus draw. Fish are frequently visible in
only a single tile, so this skips the uniform uploads for other tiles.
The additional step of avoiding WFI's when using CP_SET_DRAW_STATE seems
to be worth an additional 10% gain for aquarium.
Signed-off-by: Rob Clark <robdclark@gmail.com>