------------------------------------------------------------------------
r278268 | nhaehnle | 2016-08-10 11:51:14 -0700 (Wed, 10 Aug 2016) | 28 lines
LiveIntervalAnalysis: fix a crash in repairOldRegInRange
Summary:
See the new test case for one that was (non-deterministically) crashing
on trunk and deterministically hit the assertion that I added in D23302.
Basically, the machine function contains a sequence
DS_WRITE_B32 %vreg4, %vreg14:sub0, ...
DS_WRITE_B32 %vreg4, %vreg14:sub0, ...
%vreg14:sub1<def> = COPY %vreg14:sub0
and SILoadStoreOptimizer::mergeWrite2Pair merges the two DS_WRITE_B32
instructions into one before calling repairIntervalsInRange.
Now repairIntervalsInRange wants to repair %vreg14, in particular, and
ends up trying to repair %vreg14:sub1 as well, but that only becomes
active _after_ the range that is to be repaired, hence the crash due
to LR.find(...) == LR.begin() at the start of repairOldRegInRange.
I believe that just skipping those subranges is fine, but again, I am not
too familiar with that code.
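The crash condition described above can be sketched in isolation. This is a minimal model, not the LiveIntervals code: `Segment`, `findSegment` and `shouldSkipRepair` are hypothetical names, and plain unsigned integers stand in for slot indexes. `findSegment` is assumed to behave like the `LR.find(...)` call in the message, returning the first segment that ends after the given position.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a live segment, half-open [Start, End).
struct Segment {
  unsigned Start;
  unsigned End;
};

// Index of the first segment whose End lies after Pos, or Segs.size() if
// there is none. Segs is assumed sorted and non-overlapping.
std::size_t findSegment(const std::vector<Segment> &Segs, unsigned Pos) {
  std::size_t I = 0;
  while (I < Segs.size() && Segs[I].End <= Pos)
    ++I;
  return I;
}

// True when the search lands on the very first segment, so there is no
// earlier segment to step back to. A subrange that only becomes live after
// the repaired range ends falls into this case and is skipped.
bool shouldSkipRepair(const std::vector<Segment> &Segs, unsigned RepairEnd) {
  return findSegment(Segs, RepairEnd) == 0;
}
```

With a repair range ending at 10, a subrange whose only segment is [20, 30) is skipped, while one that also has a segment [0, 5) is not.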
Reviewers: MatzeB, kparzysz, tstellarAMD
Subscribers: llvm-commits, MatzeB
Differential Revision: https://reviews.llvm.org/D23303
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@288454 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r287339 | nhaehnle | 2016-11-18 03:55:52 -0800 (Fri, 18 Nov 2016) | 20 lines
AMDGPU: Fix legalization of MUBUF instructions in shaders
Summary:
The addr64-based legalization is incorrect for MUBUF instructions with idxen
set as well as for BUFFER_LOAD/STORE_FORMAT_* instructions. This affects
e.g. shaders that access buffer textures.
Since we never actually need the addr64-legalization in shaders, this patch
takes the easy route and keys off the calling convention. If this ever
affects (non-OpenGL) compute, the type of legalization needs to be chosen
based on some TSFlag.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98664
Reviewers: arsenm, tstellarAMD
Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D26747
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@288106 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r280589 | nhaehnle | 2016-09-03 05:26:32 -0700 (Sat, 03 Sep 2016) | 19 lines
AMDGPU: Fix an interaction between WQM and polygon stippling
Summary:
This fixes a rare bug in polygon stippling with non-monolithic pixel shaders.
The underlying problem is as follows: the prolog part contains the polygon
stippling sequence, i.e. a kill. The main part then enables WQM based on the
_reduced_ exec mask, effectively undoing most of the polygon stippling.
Since we cannot know whether polygon stippling will be used, the main part
of a non-monolithic shader must always return to exact mode to fix this
problem.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D23131
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@288105 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r277504 | nhaehnle | 2016-08-02 12:31:14 -0700 (Tue, 02 Aug 2016) | 21 lines
AMDGPU: Stay in WQM for non-intrinsic stores
Summary:
Two types of stores are possible in pixel shaders: stores to memory that are
explicitly requested at the API level, and stores that are an implementation
detail of register spilling or lowering of arrays.
For the first kind of store, we must ensure that helper pixels have no effect
and hence WQM must be disabled. The second kind of store must always be
executed, because the written value may be loaded again in a way that is
relevant for helper pixels as well -- and there are no externally visible
effects anyway.
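The distinction between the two kinds of store can be written as a small predicate. This is only a sketch of the rule stated above: `StoreKind` and `requiresExactMode` are hypothetical names and do not come from SIWholeQuadMode.

```cpp
// Hypothetical classification of a store in a pixel shader.
enum class StoreKind {
  ApiRequested,         // store explicitly requested at the API level
  SpillOrArrayLowering  // implementation detail: register spill or array lowering
};

// API-level stores must not be affected by helper pixels, so WQM has to be
// disabled (exact mode) around them. Implementation-detail stores have no
// externally visible effects and may stay in WQM.
constexpr bool requiresExactMode(StoreKind Kind) {
  return Kind == StoreKind::ApiRequested;
}
```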
This is a candidate for the 3.9 release branch.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: https://reviews.llvm.org/D22675
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@288104 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r277500 | nhaehnle | 2016-08-02 12:17:37 -0700 (Tue, 02 Aug 2016) | 18 lines
AMDGPU: Track physical registers in SIWholeQuadMode
Summary:
There are cases where uniform branch conditions are computed in VGPRs, and
we didn't correctly mark those as WQM.
The stray change in basic-branch.ll is because invoking the LiveIntervals
analysis leads to the detection of a dead register that would otherwise
not be seen at -O0.
This is a candidate for the 3.9 branch, as it fixes a possible hang.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D22673
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@288103 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r276051 | arsenm | 2016-07-19 16:16:53 -0700 (Tue, 19 Jul 2016) | 8 lines
AMDGPU: Change fdiv lowering based on !fpmath metadata
If 2.5 ulp is acceptable, denormals are not required, and the
division isn't a reciprocal (which will already be handled),
replace it with a faster fdiv.
Simplify the lowering tests by using per function
subtarget features.
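The three conditions above combine into a single check. The following is a sketch under stated assumptions, not the actual lowering code: `FDivInfo` and `canUseFastFDiv` are hypothetical names, and the allowed error is assumed to have been read from the `!fpmath` metadata already.

```cpp
// Hypothetical summary of one fdiv; field names are illustrative only.
struct FDivInfo {
  float AllowedUlp;     // accuracy taken from !fpmath, 0 if none is attached
  bool NeedsDenormals;  // true if denormal results must be supported
  bool IsReciprocal;    // true for the reciprocal form, handled elsewhere
};

// The faster fdiv is only chosen when all three conditions from the
// message hold.
constexpr bool canUseFastFDiv(const FDivInfo &D) {
  return D.AllowedUlp >= 2.5f && !D.NeedsDenormals && !D.IsReciprocal;
}
```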
------------------------------------------------------------------------
------------------------------------------------------------------------
r276823 | arsenm | 2016-07-26 16:25:44 -0700 (Tue, 26 Jul 2016) | 4 lines
AMDGPU: Use rcp for fdiv 1, x with fpmath metadata
Using rcp should be OK for safe math usually, so this
should not be replacing the original fdiv.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@278243 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r277504 | nha | 2016-08-02 12:31:14 -0700 (Tue, 02 Aug 2016) | 20 lines
AMDGPU: Stay in WQM for non-intrinsic stores
Summary:
Two types of stores are possible in pixel shaders: stores to memory that are
explicitly requested at the API level, and stores that are an implementation
detail of register spilling or lowering of arrays.
For the first kind of store, we must ensure that helper pixels have no effect
and hence WQM must be disabled. The second kind of store must always be
executed, because the written value may be loaded again in a way that is
relevant for helper pixels as well -- and there are no externally visible
effects anyway.
This is a candidate for the 3.9 release branch.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: https://reviews.llvm.org/D22675
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@277620 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r277500 | nha | 2016-08-02 12:17:37 -0700 (Tue, 02 Aug 2016) | 17 lines
AMDGPU: Track physical registers in SIWholeQuadMode
Summary:
There are cases where uniform branch conditions are computed in VGPRs, and
we didn't correctly mark those as WQM.
The stray change in basic-branch.ll is because invoking the LiveIntervals
analysis leads to the detection of a dead register that would otherwise not
be seen at -O0.
This is a candidate for the 3.9 branch, as it fixes a possible hang.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D22673
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@277619 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r275869 | arsenm | 2016-07-18 11:34:53 -0700 (Mon, 18 Jul 2016) | 7 lines
AMDGPU: Remove dead check in AMDGPUPromoteAlloca
This is currently only called with GEP users. A direct
alloca would only happen with current typed pointers
for arrays which are a perverse case.
Also fix crashes on 0 x and 1 x arrays.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@277077 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r275928 | arsenm | 2016-07-18 16:09:51 -0700 (Mon, 18 Jul 2016) | 1 line
AMDGPU: Fix test name and broken CHECK-LABEL
------------------------------------------------------------------------
------------------------------------------------------------------------
r276438 | arsenm | 2016-07-22 10:01:33 -0700 (Fri, 22 Jul 2016) | 6 lines
AMDGPU: Fix groupstaticsize for large LDS
The size can exceed s_movk_i32's limit, and we don't
want to use it this early since it inhibits optimizations.
This should probably be merged to the release branch.
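To illustrate the limit mentioned above: s_movk_i32 takes a 16-bit signed immediate that is sign-extended, so only values in [-32768, 32767] can be encoded. The helper name `fitsInSImm16` below is hypothetical and is not taken from the backend.

```cpp
#include <cstdint>

// True if V can be encoded as a sign-extended 16-bit immediate.
constexpr bool fitsInSImm16(std::int64_t V) {
  return V >= INT16_MIN && V <= INT16_MAX;
}
```

An LDS size of 40000 bytes, for example, fails this check and needs a full 32-bit move instead.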
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@276664 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The work item intrinsics are not available for the shader
calling conventions. And even if we did hook them up, most
shader stages have some extra restrictions on the amount
of available LDS.
Reviewers: tstellarAMD, arsenm
Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D20728
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275779 91177308-0d34-0410-b5e6-96231b3b80d8
In this situation:
%VGPR2<def> = BUFFER_LOAD_DWORD_OFFSET %SGPR8_SGPR9_SGPR10_SGPR11,
%VGPR7<def,tied3> = V_MAC_F32_e32 %VGPR0<undef>, %VGPR1<kill>, %VGPR7<kill,tied0>, %EXEC<imp-use>
%VGPR3_VGPR4_VGPR5_VGPR6<def> = COPY %VGPR0_VGPR1_VGPR2_VGPR3
%VGPR4<def> = COPY %VGPR2
The copy VGPR1 -> VGPR4 was reported as an error for reading the undefined
VGPR1, but VGPR4 is defined immediately after this copy.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275635 91177308-0d34-0410-b5e6-96231b3b80d8
Mesa still has a use of llvm.AMDGPU.rsq.f64 remaining.
Also fix mismatch with non-IEEE rsq selecting to IEEE rsq.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275617 91177308-0d34-0410-b5e6-96231b3b80d8
Emit metadata to ELF for the runtime.
The runtime requires certain information (metadata) about kernels to be able to execute and query them. This information is emitted to an ELF section as a key-value pair stream.
Differential Revision: https://reviews.llvm.org/D21849
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275566 91177308-0d34-0410-b5e6-96231b3b80d8
Also stop trying to insert skip blocks at end_cf. This
was inserting them at the end of the block which doesn't make
sense. The skip should be inserted at the beginning of the block
right after the end cf. Just remove this for now since no tests
seem to stress this and I think this can be handled more generally
later.
Fixes bug 28550
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275510 91177308-0d34-0410-b5e6-96231b3b80d8
It wasn't actually running the pass, and since it is
missing the llvm prefix, the eh intrinsic was not
really an IntrinsicInst.
Also add missing test for lifetime markers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275370 91177308-0d34-0410-b5e6-96231b3b80d8
Currently the MIR framework prints all its output (errors and the actual
representation) on stderr.
This patch fixes that by printing the regular output to the destination
specified with -o.
Differential Revision: http://reviews.llvm.org/D22251
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275314 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Previously, constant-index insertelements would be turned into SI_INDIRECT_DST,
which is bound to prevent some optimization opportunities. Worse, it misled
the heuristic that decides whether immediates should be lowered to S_MOV_B32
or V_MOV_B32 in a way that resulted in unnecessary v_readfirstlanes.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: http://reviews.llvm.org/D22217
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275160 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Setting MIMG to 0 has a bunch of unexpected side effects, including that
isVMEM returns false which leads to incorrect treatment in the hazard
recognizer. The reason I noticed it is that it also leads to incorrect
treatment in VGPR-to-SGPR copies, which is one cause of the referenced bug.
The only reason why MIMG was set to 0 is to signal the special handling of
dmasks, but that can be checked differently.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96877
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: http://reviews.llvm.org/D22210
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275113 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The main bug fix here is using the 32-bit encoding of V_ADD_I32 in
materializeFrameBaseRegister and resolveFrameIndex, so that arbitrary
immediates work.
The second part is that we may now require the SegmentWaveByteOffset
even when there are initially no stack objects and VGPR spilling isn't
enabled, for stack slots that are allocated later. This means that some
bits become effectively dead and can be cleaned up.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96602
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D21551
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275108 91177308-0d34-0410-b5e6-96231b3b80d8
This only really matters when the index is non-constant since the
constant case already gets taken care of by other combines.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274569 91177308-0d34-0410-b5e6-96231b3b80d8
Because of the special immediate operand, the constant
bus is already used so SGPRs are never useful.
r263212 changed the name of the immediate operand, which
broke the verifier check for the restriction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274564 91177308-0d34-0410-b5e6-96231b3b80d8