1740 Commits

Author SHA1 Message Date
Scott Linder
7b19ab70e3 [CodeGen] Fix assert in SelectionDAG::computeKnownBits
Fix SelectionDAG::computeKnownBits asserting when handling EXTRACT_SUBVECTOR
when zero extending the demanded elements mask if it is already as long as the
source vector.

Differential Revision: https://reviews.llvm.org/D49574


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339600 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-13 18:44:21 +00:00
Matt Arsenault
1f25a887f6 AMDGPU: Cleanup min/max legacy tests
Also add some more tests in preparation for
a future patch.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339526 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-12 19:29:53 +00:00
Matt Arsenault
f0912abc34 DAG: Check no-signed-zeros instead of unsafe-fp-math
Addresses fixme, although this should still be checking individual
operand flags.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339525 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-12 19:09:12 +00:00
Matt Arsenault
10398322af AMDGPU: Check NSZ MI flag when folding omod
I'm not sure of the exact nsz flag combination that
is OK. I think as long as it's on either, this is OK.
For now just check it on the omod multiply.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339513 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-12 08:44:25 +00:00
Matt Arsenault
3b2fa4ee59 AMDGPU: Use splat vectors for undefs when folding canonicalize
If one of the elements is undef, use the canonicalized constant
from the other element instead of 0.

Splat vectors are more useful for other optimizations, such
as matching vector clamps. This was breaking on clamps
of half3 from the undef 4th component.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339512 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-12 08:42:54 +00:00
Matt Arsenault
8750be505d AMDGPU: Fix packing undef parts of build_vector
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339511 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-12 08:42:46 +00:00
Tom Stellard
202efa7409 AMDGPU/GlobalISel: Define instruction mapping for G_INSERT
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D49625

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339491 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-11 00:51:54 +00:00
Matt Arsenault
7f32e5e190 AMDGPU: More canonicalized operations
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339464 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-10 19:20:17 +00:00
Matt Arsenault
a13d395b9e AMDGPU: Combine and of seto/setuo and fp_class
Clear the nan (or non-nan) test bits from the mask.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339462 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-10 18:58:56 +00:00
Matt Arsenault
4d8cda85ad AMDGPU: Match isfinite pattern to class instructions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339460 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-10 18:58:41 +00:00
Matt Arsenault
f1757fd807 AMDGPU: Error more gracefully on libcalls
I think this is the only situation where the callsite
will have a null instruction.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339271 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-08 16:58:39 +00:00
Matt Arsenault
35d3bbfa09 AMDGPU: Fix shifts for i128
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339270 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-08 16:58:33 +00:00
Jan Vesely
dcf686e1e7 AMDGPU: Remove broken i16 ternary patterns
Fixup test to check for GCN prefix
These patterns always zero extend the result even though it might need sign extension.
This has been broken since the addition of i16 support.
It has popped up in mad_sat(char) test since min(max()) combination is turned into v_med3, resulting in the following (incorrect) sequence:
        v_mad_i16 v2, v10, v9, v11
        v_med3_i32 v2, v2, v8, v7

Fixes mad_sat(char) piglit on VI.

Differential Revision: https://reviews.llvm.org/D49836

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339190 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-07 21:54:37 +00:00
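
To illustrate the extension problem described in dcf686e1e7 above, the following Python sketch models a 16-bit multiply-add whose result feeds a signed 32-bit median-of-three clamp. It is a simplified model rather than the actual ISA semantics: the helper names (`mad_i16_bits`, `zext16`, `sext16`, `med3_i32`, `mad_sat_char`) and the clamp bounds -128 and 127 for `char` are assumptions made for the example.

```python
def mad_i16_bits(a: int, b: int, c: int) -> int:
    """Model a 16-bit multiply-add: keep only the low 16 bits of a*b + c."""
    return (a * b + c) & 0xFFFF


def zext16(bits: int) -> int:
    """Zero-extend a 16-bit pattern to a 32-bit signed value."""
    return bits & 0xFFFF


def sext16(bits: int) -> int:
    """Sign-extend a 16-bit pattern to a 32-bit signed value."""
    bits &= 0xFFFF
    return bits - 0x10000 if bits & 0x8000 else bits


def med3_i32(a: int, b: int, c: int) -> int:
    """Median of three signed values, used here as a min(max()) clamp."""
    return sorted((a, b, c))[1]


def mad_sat_char(a: int, b: int, c: int, extend) -> int:
    """Clamp the modeled 16-bit mad result to [-128, 127] after `extend`."""
    return med3_i32(extend(mad_i16_bits(a, b, c)), -128, 127)
```

With a = -1, b = 5, c = 0 the 16-bit result is 0xFFFB. Sign extension yields -5, which the clamp leaves unchanged; zero extension yields 65531, which the clamp saturates to 127.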
Matt Arsenault
f966a40853 AMDGPU: cvt_pk_rtz_f16 canonicalizes
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339078 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-06 23:01:31 +00:00
Matt Arsenault
6ae1bfa7a5 AMDGPU: Handle some vector operations in isCanonicalized
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339077 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-06 22:45:51 +00:00
Matt Arsenault
a8868c6067 AMDGPU: Push fcanonicalize through partially constant build_vector
This usually avoids some re-packing code, and may
help find canonical sources.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339072 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-06 22:30:44 +00:00
Matt Arsenault
f583815bba AMDGPU: Treat more custom operations as canonicalizing
Everything should quiet, and I think everything should
flush.

I assume the min3/med3/max3 follow the same rules
as regular min/max for flushing, which should at
least be conservatively correct.

There are still more operations that need to
be handled.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339065 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-06 21:58:11 +00:00
Matt Arsenault
a0ad797381 AMDGPU: Conversions always produce canonical results
Not sure why this was checking for denormals for f16.
My interpretation of the IEEE standard is conversions
should produce a canonical result, and the ISA manual
says denormals are created when appropriate.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339064 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-06 21:51:52 +00:00
Matt Arsenault
f0fa788ac7 AMDGPU: Fix implementation of isCanonicalized
If denormals are enabled, denormals are canonical.
Also fix a few other issues. minnum/maxnum are supposed
to canonicalize. Temporarily improve workaround for the
instruction behavior change in gfx9.

Handle selects and fcopysign.

The tests were also largely broken, since they were
checking for a flush used on some targets after the
store of the result.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339061 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-06 21:38:27 +00:00
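
The rule stated in f0fa788ac7 above ("if denormals are enabled, denormals are canonical") can be sketched for f32 constants as follows. This is a minimal Python model under stated assumptions, not the code in the backend: the function names are hypothetical, only the denormal and signaling-NaN rules are modeled, and every quiet NaN is treated as canonical, which is a simplification.

```python
def classify_f32(bits: int) -> str:
    """Classify an IEEE-754 binary32 bit pattern."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0xFF and mantissa != 0:
        # The top mantissa bit is the quiet bit.
        return "qnan" if mantissa & 0x400000 else "snan"
    if exponent == 0 and mantissa != 0:
        return "denormal"
    return "other"


def is_canonical_f32(bits: int, denormals_enabled: bool) -> bool:
    """Assumed rule: sNaNs are never canonical; denormals are canonical
    only when denormal support is enabled; everything else is canonical."""
    kind = classify_f32(bits)
    if kind == "snan":
        return False
    if kind == "denormal":
        return denormals_enabled
    return True
```

For example, the smallest denormal (0x00000001) counts as canonical only when `denormals_enabled` is true, while a signaling NaN such as 0x7FA00000 is non-canonical in either mode.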
Matt Arsenault
273374717e AMDGPU: Fold v_lshl_or_b32 with 0 src0
Appears from expansion of some packed cases.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339025 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-06 15:40:20 +00:00
Matt Arsenault
c3263d7dee AMDGPU: Rename check prefixes in test
Will avoid noisy diff in future change.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339022 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-06 15:16:12 +00:00
Matt Arsenault
7166ee595d DAG: Enhance isKnownNeverNaN
Add a parameter for testing specifically for
sNaNs - at least one instruction pattern on AMDGPU
needs to check specifically for this.

Also handle more cases, and add a target hook
for custom nodes, similar to the hooks for known
bits.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338910 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-03 18:27:52 +00:00
Tim Renouf
79905333cd [AMDGPU] Reworked SIFixWWMLiveness
Summary:
I encountered some problems with SIFixWWMLiveness when WWM is in a loop:

1. It sometimes gave invalid MIR where there is some control flow path
   to the new implicit use of a register on EXIT_WWM that does not pass
   through any def.

2. There were lots of false positives of registers that needed to have
   an implicit use added to EXIT_WWM.

3. Adding an implicit use to EXIT_WWM (and adding an implicit def just
   before the WWM code, which I tried in order to fix (1)) caused lots
   of the values to be spilled and reloaded unnecessarily.

This commit is a rework of SIFixWWMLiveness, with the following changes:

1. Instead of considering any register with a def that can reach the WWM
   code and a def that can be reached from the WWM code, it now
   considers three specific cases that need to be handled.

2. A register that needs liveness over WWM to be synthesized now has it
   done by adding itself as an implicit use to defs other than the
   dominant one.

Also added the following fixmes:

FIXME: We should detect whether a register in one of the above
categories is already live at the WWM code before deciding to add the
implicit uses to synthesize its liveness.

FIXME: I believe this whole scheme may be flawed due to the possibility
of the register allocator doing live interval splitting.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46756

Change-Id: Ie7fba0ede0378849181df3f1a9a7a39ed1a94a94

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338783 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-02 23:31:32 +00:00
Tim Renouf
5e96e38d96 [AMDGPU] Avoid using divergent value in mubuf addr64 descriptor
Summary:
This fixes a problem where a load from global+idx generated incorrect
code on <=gfx7 when the index is divergent.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47383

Change-Id: Ib4d177d6254b1dd3f8ec0203fdddec94bd8bc5ed

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338779 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-02 22:53:57 +00:00
Matt Arsenault
2920ef7815 DAG: Fix vector widening fcanonicalize
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338715 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-02 13:43:53 +00:00
Matt Arsenault
c9baad19d3 AMDGPU: Fix scalarizing v4f16 fcanonicalize
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338714 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-02 13:43:42 +00:00
Matt Arsenault
7943061ff9 AMDGPU: Improve hack for packing conversion ops
Mutate the node type during selection when it
doesn't matter. This avoids an intermediate bitcast
node on targets with legal i16/f16.

Also fixes missing output modifiers on v_cvt_pkrtz_f32_f16,
which I assume are OK.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338619 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-01 20:13:58 +00:00
Matt Arsenault
6446fcd61a AMDGPU: Partially fix handling of packed amdgpu_ps arguments
Fixes annoying limitations when writing tests.
Also remove more leftover code for manually scalarizing arguments
and return values.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338618 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-01 19:57:34 +00:00
Jan Vesely
bf6429d608 AMDGPU/R600: Convert kernel param loads to use PARAM_I_ADDRESS
Non ext aligned i32 loads are still optimized to use CONSTANT_BUFFER (AS 8)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338610 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-01 18:36:07 +00:00
Ryan Taylor
a9d3893acc [AMDGPU] Optimize _L image intrinsic to _LZ when lod is zero
Summary:
Add _L to _LZ image intrinsic table mapping to TableGen.
In ISelLowering, check whether the image intrinsic has an lod that is equal
to zero; if so, remove the lod and change the opcode to the equivalent mapped _LZ.

Change-Id: Ie24cd7e788e2195d846c7bd256151178cbb9ec71

Subscribers: arsenm, mehdi_amini, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D49483

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338523 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-01 12:12:01 +00:00
Konstantin Zhuravlyov
b47f061f5b AMDGPU: Add clamp bit to dot intrinsics
Differential Revision: https://reviews.llvm.org/D49874


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338470 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-01 01:31:30 +00:00
Matt Arsenault
fa5ec153c2 AMDGPU: Split amdgcn/r600 fminnum/fmaxnum tests
R600 breaks on too many things to usefully test changes
with ieee_mode on vs. off.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338435 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-31 20:38:42 +00:00
Matt Arsenault
4b6157df8b AMDGPU: Break 64-bit arguments into 32-bit pieces
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338421 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-31 19:29:04 +00:00
Matt Arsenault
b9d99ce19e AMDGPU: Split wide vectors of i16/f16 into 32-bit regs on calls
This improves code for the same reasons as scalarizing 32-bit
element vectors.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338418 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-31 19:17:47 +00:00
Matt Arsenault
0a67b1c905 AMDGPU: Scalarize vector argument types to calls
When lowering calling conventions, prefer to decompose vectors
into the constituent register types. This avoids artificial constraints
to satisfy a wide super-register.

This improves code quality because now optimizations don't need to
deal with the super-register constraint. For example the immediate
folding code doesn't deal with 4 component reg_sequences, so by
breaking the register down earlier the existing immediate folding
code is able to work.

This also avoids the need for the shader input processing code
to manually split vector types.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338416 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-31 19:05:14 +00:00
Matt Arsenault
48e2f47300 DAG: Fix PromoteFloatResult for fcanonicalize
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338382 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-31 14:15:22 +00:00
Matt Arsenault
8f44f41e0f AMDGPU: Fold undef fcanonicalize to qNaN
We could choose a free 0 for this, but this
matches the behavior for fmul undef, 1.0. Also,
the NaN use is more useful for folding use operations
although if it's not eliminated it is more expensive
in terms of code size.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338376 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-31 13:34:31 +00:00
Matt Arsenault
8d00765ed1 AMDGPU: Fix test check line bugs
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338374 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-31 13:25:23 +00:00
Matt Arsenault
78e0f47487 AMDGPU: Reduce code size with fcanonicalize (fneg x)
When fcanonicalize is lowered to a mul, we can
use -1.0 for free and avoid the cost of the bigger
encoding for source modifiers.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338244 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-30 12:16:58 +00:00
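
The fold described in 78e0f47487 above relies on an algebraic equivalence that can be sketched in Python. The sketch assumes fcanonicalize is modeled as a multiply by 1.0, as the commit message describes for the lowered form; the function names are hypothetical, and Python doubles do not model flushing or NaN quieting, so only the sign handling is shown.

```python
import math


def canonicalize_as_mul(x: float) -> float:
    """Model fcanonicalize lowered to a multiply by 1.0."""
    return x * 1.0


def canonicalize_of_fneg(x: float) -> float:
    """Unfolded form: negate first, then canonicalize."""
    return canonicalize_as_mul(-x)


def folded_canonicalize_of_fneg(x: float) -> float:
    """Folded form: a single multiply by -1.0, no separate negation."""
    return x * -1.0


def same_value_and_sign(a: float, b: float) -> bool:
    """Compare two non-NaN floats, distinguishing +0.0 from -0.0."""
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)
```

Both forms agree on ordinary values, infinities, and signed zeros, which is why the negation can be absorbed into the multiplier constant.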
Matt Arsenault
86dcb58e5d AMDGPU: Make fneg combine handle fcanonicalize
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338243 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-30 12:16:47 +00:00
Nicolai Haehnle
bfada8913e AMDGPU: Force skip over s_sendmsg and exp instructions
Summary:
These instructions interact with hardware blocks outside the shader core,
and they can have "scalar" side effects even when EXEC = 0. We don't
want these scalar side effects to occur when all lanes want to skip
these instructions, so always add the execz skip branch instruction
for basic blocks that contain them.

Also ensure that we skip scalar stores / atomics, though we don't
code-gen those yet.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48431

Change-Id: Ieaeb58352e2789ffd64745603c14970c60819d44

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338235 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-30 09:23:59 +00:00
Matt Arsenault
c358be353a AMDGPU: Stop wasting argument registers with v3i32/v3f32
SelectionDAGBuilder widens v3i32/v3f32 arguments
to v4i32/v4f32 which consume an additional register.
In addition to wasting argument space, this produces extra
instructions since now it appears the 4th vector component has
a meaningful value to most combines.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338197 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-28 14:11:34 +00:00
Matt Arsenault
3d794576ed AMDGPU: Stop trying to extend arguments for clover
This was trying to replace i8/i16 arguments with i32, which
was broken and no longer necessary.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338193 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-28 12:34:25 +00:00
Jan Vesely
0a1753ac2d AMDGPU/R600: Add MOV instructions to BFE patterns
R600 can't handle immediates for BFE, these will be eliminated later.
Fixes powr/pow regressions on r600 since r334817

Differential Revision: https://reviews.llvm.org/D49641

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338127 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-27 15:00:13 +00:00
Matt Arsenault
e9c22aa83f AMDGPU: Fix code size for return_to_epilog pseudo
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338113 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-27 09:15:03 +00:00
Tom Stellard
6dce5ed08b AMDGPU/GlobalISel: Fix crash in regbankselect on non-power-of-2 types
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D49624

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338102 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-27 06:04:40 +00:00
Scott Linder
7531be0d75 [AMDGPU] Fix VGPR spills where offset doesn't fit in 12 bits
Scale the offset of VGPR spills by the wave size when it cannot fit in the
12-bit offset immediate field and so is added to the soffset SGPR. This
accounts for hardware swizzling of scratch memory.

Differential Revision: https://reviews.llvm.org/D49448


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338060 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-26 19:47:51 +00:00
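
The offset handling described in 7531be0d75 above can be sketched as a small Python function. This is an illustration under assumptions, not the backend code: the function name, the default wave size of 64, and the choice to test only the offset itself against the 12-bit limit are all hypothetical simplifications.

```python
from typing import Tuple

MAX_IMM_OFFSET = (1 << 12) - 1  # largest value of a 12-bit unsigned immediate


def split_spill_offset(offset: int, wave_size: int = 64) -> Tuple[int, int]:
    """Return (soffset_increment, immediate_offset) for a VGPR spill.

    If the offset fits in the 12-bit immediate field it is used directly.
    Otherwise it is scaled by the wave size and added to the scalar offset,
    modeling the swizzled layout of scratch memory, and the immediate is 0.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    if offset <= MAX_IMM_OFFSET:
        return 0, offset
    return offset * wave_size, 0
```

For instance, an offset of 4095 stays in the immediate field, while 4096 no longer fits and becomes a scalar increment of 4096 * 64 = 262144 with a zero immediate.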
Stanislav Mekhanoshin
a31192b537 [AMDGPU] Use AssumptionCacheTracker in the divrem32 expansion
Differential Revision: https://reviews.llvm.org/D49761

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337938 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-25 17:02:11 +00:00
Tom Stellard
6779fccada AMDGPU/GlobalISel: Legalize G_INSERT
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D49601

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337798 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-24 02:19:20 +00:00
Matt Arsenault
06b493f7f0 Reapply "AMDGPU: Fix handling of alignment padding in DAG argument lowering"
Reverts r337079 with fix for msan error.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337535 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-20 09:05:08 +00:00