Commit Graph

105226 Commits

Author SHA1 Message Date
Marek Olšák
165817d47f radeonsi: fix a deadlock due to partially-initialized context on CI 2018-10-18 16:08:56 -04:00
Jan Vesely
06bf56725d radeonsi: Bump number of allowed global buffers to 32
Fixes assertion failure/crash when running luxmark/luxball on clover.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108272
CC: mesa-stable@lists.freedesktop.org
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-10-18 16:02:42 -04:00
Andres Rodriguez
e71a87775e radv: fix check for perftest options size
It was using the debug options array size.

CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-10-18 15:42:20 -04:00
Marek Olšák
6cc79e4411 radeonsi: fix incorrect hw screen offset and guardband computation
It resulted in assertion failures or incorrect rendering.

Broken by: 9e182b8313
2018-10-18 14:42:42 -04:00
Jason Ekstrand
baa38c144f vulkan/wsi: Use VK_EXT_pci_bus_info for DRM fd matching
This lets us avoid passing the DRM fd around all over the place and gets
us closer to layer utopia.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2018-10-18 11:29:00 -05:00
Michel Dänzer
c20ba1be18 loader/dri3: Also wait for front buffer fence if we triggered it
In that case, we have to wait for the fence to synchronize with the
corresponding drawing we triggered in the X server.

Fixes incorrect display with the i965 driver and some applications, e.g.
solvespace.

Bugzilla: https://bugs.freedesktop.org/108097
Fixes: aefac10fec "loader/dri3: Only wait for back buffer fences in
                     dri3_get_buffer"
Tested-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
2018-10-18 16:52:06 +02:00
Jason Ekstrand
00bb42105d anv: Stop generating weak references for instance entrypoints
We don't need weak references to instance entrypoints because we never
have more than one of each so we don't need the NULL fall-back.  This
also helps us avoid forgetting things because we now get link errors for
missing instance entrypoints.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-10-18 09:17:39 -05:00
Jason Ekstrand
7c65cf9844 vulkan/wsi: Implement GetPhysicalDevicePresentRectanglesKHR
This got missed during 1.1 enabling because it was defined as an
interaction between device groups and WSI and it wasn't obvious it was
in the delta.

The idea behind it is that it's supposed to provide a hint to the
application in a multi-GPU setup to indicate which regions of the screen
are being scanned out by which GPU so a multi-device split-screen
rendering application can render each part of the screen on the GPU that
will be presenting it and avoid extra bus traffic between GPUs.  On a
single-GPU setup or one which doesn't support this present mode, we need
to do something.  We choose to return the window size (or a max-size
rect) if the compositor, X server, or crtc is associated with the given
physical device and zero rectangles otherwise.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-10-18 09:17:39 -05:00
Jason Ekstrand
7629c00557 vulkan/wsi: Store the instance allocator in wsi_device
We already have wsi_device and we know the instance allocator at
wsi_device_init time so there's no need to pass it into the physical
device queries.  This also fixes a memory allocation domain bug that can
occur if CreateSwapchain gets called prior to any queries (not likely)
in which case the cached connection gets allocated off the device
instead of the instance.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-10-18 09:17:39 -05:00
Michał Janiszewski
0ef50ecc69 st/xlib: Use more appropriate include guard
Signed-off-by: Michał Janiszewski <janisozaur+signed@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com
2018-10-18 11:03:04 +01:00
Michał Janiszewski
bcc613acc1 gallium: Fix mismatched ifdef-guards
Signed-off-by: Michał Janiszewski <janisozaur+signed@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-10-18 11:03:03 +01:00
Gert Wollny
74adc624b6 softpipe: dynamically allocate space for immediate constants
The number of immediate constants was fixed and the size check was
only done by means of an assertion. Given this a shader that emits
more immediate constants would result in a memory corruption when
mesa is build in release mode.

Instead of using this fixed limit allocate the space dynamically, let it
grow as needed, and also remove the unused ImmArray.

Fixes: dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.1

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2018-10-18 10:59:51 +02:00
Timothy Arceri
3a95396f3c radv: use nir_shrink_vec_array_vars()
Totals from affected shaders:
SGPRS: 1096 -> 1096 (0.00 %)
VGPRS: 1192 -> 1056 (-11.41 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 100940 -> 94384 (-6.49 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 100 -> 112 (12.00 %)
Wait states: 0 -> 0 (0.00 %)

All affected shaders are from Batman Arkham City.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-10-18 15:04:09 +11:00
Timothy Arceri
8086fa1bcd radv: use nir_split_array_vars()
We call in the opt loop in case another pass results in an
array with indirect access being turned into direct access.

Totals from affected shaders:
SGPRS: 512 -> 496 (-3.12 %)
VGPRS: 456 -> 452 (-0.88 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 40040 -> 39664 (-0.94 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 41 -> 43 (4.88 %)
Wait states: 0 -> 0 (0.00 %)

All affected shaders are from Batman Arkham City.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-10-18 15:04:09 +11:00
Timothy Arceri
06675711e7 radv: use nir_opt_find_array_copies()
Totals from affected shaders:
SGPRS: 1112 -> 1112 (0.00 %)
VGPRS: 1492 -> 1196 (-19.84 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 112172 -> 101316 (-9.68 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 93 -> 98 (5.38 %)
Wait states: 0 -> 0 (0.00 %)

All affected shaders are from "Batman: Arkham City" over DXVK.

The pass detects that the temporary array created by DXVK for
storing TCS inputs is a copy of the input arrays and allows
us to avoid copying all of the input data and then indirecting
on it with if-ladders, instead we just do indirect indexing.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-10-18 15:04:09 +11:00
Timothy Arceri
9d5b106b2e radv: use nir_opt_copy_prop_vars and nir_opt_dead_write_vars
Totals from affected shaders:
SGPRS: 2856 -> 2856 (0.00 %)
VGPRS: 3236 -> 3248 (0.37 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 236560 -> 233548 (-1.27 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 277 -> 283 (2.17 %)
Wait states: 0 -> 0 (0.00 %)

Even in the cases were we have increased VGPR use it appears
the NIR is improved significantly.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-10-18 15:04:09 +11:00
Keith Packard
67a2c1493c vulkan: Add VK_EXT_calibrated_timestamps extension (radv and anv) [v5]
Offers three clocks, device, clock monotonic and clock monotonic
raw. Could use some kernel support to reduce the deviation between
clock values.

v2:
	Ensure deviation is at least as big as the GPU time interval.

v3:
	Set device->lost when returning DEVICE_LOST.
	Use MAX2 and DIV_ROUND_UP instead of open coding these.
	Delete spurious TIMESTAMP in radv version.

	Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
	Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

v4:
	Add anv_gem_reg_read to anv_gem_stubs.c

	Suggested-by: Jason Ekstrand <jason@jlekstrand.net>

v5:
	Adjust maxDeviation computation to max(sampled_clock_period) +
	sample_interval.

	Suggested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
	Suggested-by: Jason Ekstrand <jason@jlekstrand.net>

Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-10-17 20:10:15 -07:00
Topi Pohjolainen
a11cafbd7a intel/compiler/icl: Use invocation id bits 22:16 instead of 23:17
Identifier bits in the dispatch header have changed. See Bspec:

SINGLE_PATCH Payload:

3D Pipeline Stages - 3D Pipeline Geometry -
Hull Shader (HS) Stage IVB+ - Payloads IVB+

Fixes: KHR-GL46.tessellation_shader.tessellation_shader_tc_barriers.barrier_guarded_read_write_calls

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2018-10-17 21:19:57 +03:00
Neil Roberts
a9475d9337 Fix setting indent-tabs-mode in the Emacs .dir-locals.el files
Some of the .dir-locals.el had the wrong name for the truthy value so
it wasn’t setting indent-tabs-mode.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-10-17 19:03:08 +02:00
Rob Clark
d27b1c83b9 freedreno/a6xx: don't allocate binning rb
Now that a single cmdstream is used for both binning and draw passes, we
can skip allocation of cmdstream buffer for binning.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:49 -04:00
Rob Clark
24d57a6d8f freedreno/a6xx: single cmdstream for draw+binning
Now that state which is different for draw vs binning pass is split out
into different state-groups with appropriate enable_mask (so the
appropriate one is chosen for draw vs binning), switch over to using a
single cmdstream for both passes.

This should significantly lower draw overhead for CPU bound benchmarks.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:49 -04:00
Rob Clark
72f6164fef freedreno/a6xx: split binning vs draw program stateobj's
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:49 -04:00
Rob Clark
3313d693af freedreno/a6xx: split VBO state into binning/draw variants
Blob seems to manage to use same input registers for BS (binning pass)
vs VS (draw pass) shaders, so it can use the same VBO state for both.
We can't quite do that yet, so split them.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:49 -04:00
Rob Clark
b23fc4cacb freedreno/a6xx: move VBO state to stateobj
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:49 -04:00
Rob Clark
e194056832 freedreno/a6xx: move ZSA state to stateobj
Step towards single cmdstream, where we need different state-group-id's
for binning vs draw ZSA state.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
a50a9a44e8 freedreno/a6xx: remove vismode param
We don't need to keep this IGNORE_VISIBILITY in binning pass.  Prep work
for using single cmdstream for both draw and binning passes.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
d9dbc9c21f freedreno/ir3: move binning-pass fixup for a6xx+
Move this to after ir3_cp (which can add lowered immediates to the const
state) for a6xx+, to ensure the uniform state matches between binning
and vertex shaders.  This way we can emit just a single VS_CONST state-
group when we re-use single cmdstream for both binning and draw passes.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
1a51c4a87e freedreno/a6xx: a bit more state emit cleanup
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
2ffc79c7d1 freedreno/a6xx: move framebuffer state emit to emit_mrt()
No point in checking this per-draw, since framebuffer change means new
batch.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
5894f37b85 freedreno/a6xx: small emit_mrt() cleanup
On a6xx, this is only used for pfb->cbufs so we can just directly pass
the pfb state.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
b4e94af37d freedreno/a6xx: use program cache
Use the in-memory cache to construct shader program state and re-use it
on subsequent draws, to lower driver overhead.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
1d7fbe2cd1 freedreno/ir3: shader variant cache
Cache that maps gallium hwcso (in this case, 'struct ir3_shader') plus
shader variant key to a generation specific state object.

This could eventually replace the linked list of shader variants, but
for now it lets us re-use the work currently done in fdN_program_emit()

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
2e9c08c0bc freedreno/ir3: move binning_pass out of shader variant key
Prep work for a following patch, that introduces a cache to map from
program state (all shader stages) plus variant key to pre-baked hw
state (which could be emit'd via CP_SET_DRAW_STATE, for example).
To do that, we really want the variant key to be immutable, and to
treat the binning pass shader as an extra shader stage, rather than
as a VS variant.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
8b1a3b5dde freedreno/ir3: track # of samplers used by shader
This is useful for a6xx to avoid program state from depending on bound
tex/samp state.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
1b9d69410c freedreno/a6xx: texture state obj
Unfortunately gallium doesn't match what the hw wants perfectly here, in
using a separate CSO for each texture/sampler.  So we have to use a hash
table to map the collection of texture/samplers to hw state object.

We probably could use separate hw state objects for texture and sampler
state, but mesa/st tends to update the tex and samp state together.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
e8606b11dd freedreno: add resource seqno
Intended to be something more compact than a 64b pointer, which could be
used as a key into hashtables.  Prep work for texture state objects.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
abcdf5627a freedreno/a6xx: move const emit to state group
Eventually we want to move nearly everything, but no other state depends
on const state, so this is the easiest one to move first.

For webgl aquarium, this reduces GPU load by about 10%, since for each
fish it does a uniform upload plus draw.. fish frequently are visible in
only a single tile, so this skips the uniform uploads for other tiles.

The additional step of avoiding WFI's when using CP_SET_DRAW_STATE seems
to be work an additional 10% gain for aquarium.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
a398d26fd2 freedreno/a6xx: add infrastructure for CP_DRAW_STATE
Add helper to add state-groups to emit, and code to emit CP_DRAW_STATE
packet if we have any state-groups.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
ec717fc629 freedreno: reduce resource dependency tracking overhead
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Neil Roberts
ee61790daf freedreno: Remove the Emacs mode lines
These are not necessary because the corresponding settings are set via
the .dir-locals.el file anyway. Most of them were missing a ‘:’ after
“tab-width” which was making Emacs display an annoying warning
whenever you open the file.

This patch was made with:

sed -ri '/-\*- mode:/,/^$/d' \
    $(find src/gallium/{drivers,winsys} -name \*.\[ch\] \
               -exec grep -l -- '-\*- mode:' {} \+)

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Neil Roberts
afe640b360 freedreno: Fix the Emacs indentation configuration file
The .dir-locals.el had the wrong name for the truthy value so it
wasn’t setting indent-tabs-mode.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Hyunjun Ko
8e798e28f7 freedreno: allocate batches from the cache in launch_grid
Needs to allocate batches from the cache so that it could
get a valid index and make resource dependancy tracking right.

In addition this fixes assertion on debug build since the commit
1a40faa8 landed.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Hyunjun Ko
2385d7b066 freedreno: adds nondraw param to fd_bc_alloc_batch
Needs to specify nondraw when creating a batch through
fd_bc_alloc_batch since it'd better create a batch through
it rather than fd_batch_create.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
9e6019bd46 freedreno/a6xx: remove fd6_emit_render_cntl()
It was dead code carried over from a5xx

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
835cb06965 freedreno/ir3: fix broken texcoord inputs
TODO not sure if this is best solution, but current logic is broken for
texcoord inputs.  It is definitely the simplest solution.

Fixes: 1a24f51966 freedreno/ir3: ignore unused inputs
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
cbf9fe50b5 freedreno: fix off-by-one error in BEGIN_RING()
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Marek Olšák
669dd22983 util: document a limitation of util_fast_udiv32
trivial
2018-10-17 12:27:58 -04:00
Matt Turner
58a51d0a67 i965/fs: Add 64-bit int immediate support to dump_instructions()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2018-10-16 17:48:17 -07:00
Marek Olšák
fcc70e4855 radeonsi: track context rolls better for the Vega scissor bug workaround
We should get fewer context rolls with the SET_CONTEXT_REG optimization,
but it would have been for nothing if the scissor state rolled the context
anyway. Don't emit the scissor state if there is no context roll.
2018-10-16 17:23:25 -04:00
Marek Olšák
25ddb15cfe radeonsi: emit sample locations for 1xAA only when the hw bug is present 2018-10-16 17:23:25 -04:00