Commit Graph

1935 Commits

Author SHA1 Message Date
Matt Arsenault
078c435803 AMDGPU: Return correct type during argument lowering
The type needs to be casted back to the original argument type.
Fixes an assert that for some reason is only run when
using -debug.

Includes an additional combine to avoid test regressions
from having conversions mixed with multiple Assert[SZ]ext
nodes. On subtargets where i16 is legal, this was producing an i32
register with an i16 AssertZExt, truncated to i16 with another i8
AssertZExt.

t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0
t3: i16 = truncate t2
t5: i16 = AssertZext t3, ValueType:ch:i8
t6: i8 = truncate t5
t7: i32 = zero_extend t6

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308082 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-15 05:52:59 +00:00
Davide Italiano
33778b7f13 [AMDGPU] Throw away more dead code. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308055 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-14 21:20:29 +00:00
Davide Italiano
9994117767 [AMDGPU] Garbage collect dead code. NFCI.
Unbreaks the build with GCC7.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308047 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-14 18:47:29 +00:00
Alfred Huang
1356a150af [AMDGPU] Do not insert an instruction into worklist twice in movetovalu
In moveToVALU(), move to vector ALU is performed, all instrs in
the use chain will be visited. We do not want the same node to be
pushed to the visit worklist more than once.

Differential Revision: https://reviews.llvm.org/D34726

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308039 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-14 17:56:55 +00:00
Matt Arsenault
f9915c27c2 AMDGPU: Detect kernarg segment pointer
This is necessary to pass the kernarg segment pointer
to callee functions. Also don't unconditionally enable
for kernels.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307978 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-14 00:11:13 +00:00
Stanislav Mekhanoshin
9fc15af9b2 [AMDGPU] fcaninicalize optimization for GFX9+
Since GFX9 supports denorm modes for v_min_f32/v_max_f32 that
is possible to further optimize fcanonicalize and remove it
if applied to min/max given their operands are known not to be
an sNaN or that sNaNs are not supported.

Additionally we can remove fcanonicalize if denorms are supported
for the VT and we know that its argument is never a NaN.

Differential Revision: https://reviews.llvm.org/D35335

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307976 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-13 23:59:15 +00:00
Matt Arsenault
a20c1d0cec AMDGPU: Annotate call graph with used features
Previously this wouldn't detect used features indirectly
used in callee functions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307967 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-13 21:43:42 +00:00
Hiroshi Inoue
ff281e5fb6 fix typos in comments and error messges; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307885 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-13 06:48:39 +00:00
Matt Arsenault
ffac88a158 AMDGPU: Fix converting unanalyzable global loads to SMRD
Not all memory dependence queries succeed, so this needs to
be conservative if it fails.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307861 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-12 23:06:18 +00:00
Stanislav Mekhanoshin
16be511cb4 [AMDGPU] fcanonicalize elimination optimization
We are using multiplication by 1.0 to flush denormals and quiet sNaNs.
That is possible to omit this multiplication if source of the
fcanonicalize instruction is known to be flushed/quieted, i.e.
if it comes from another instruction known to do the normalization
and we are using IEEE mode to quiet sNaNs.

Differential Revision: https://reviews.llvm.org/D35218

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307848 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-12 21:20:28 +00:00
Rafael Espindola
4aebf83110 Fully fix the movw/movt addend.
The issue is not if the value is pcrel. It is whether we have a
relocation or not.

If we have a relocation, the static linker will select the upper
bits. If we don't have a relocation, we have to do it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307730 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-11 23:18:25 +00:00
Evandro Menezes
fdda7ea9d5 [CodeGen] Rename DEBUG_TYPE to match passnames
Rename missing DEBUG_TYPE "machine-scheduler" from backend files, which were
absent from https://reviews.llvm.org/rL303921.

Differential revision: https://reviews.llvm.org/D35231

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307719 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-11 22:08:28 +00:00
Konstantin Zhuravlyov
2e2081eea2 Revert "AMDGPU: Do not test for SI in getIsaVersion"
This reverts commit r307573.

This breaks downstream test.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307678 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-11 17:57:41 +00:00
Nirav Dave
c7acbe2ea6 Add DAG argument to canMergeStoresTo NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307583 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-10 20:25:54 +00:00
Matt Arsenault
d380c14b7a AMDGPU: Allow SIShrinkInstructions to fold FrameIndexes
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307576 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-10 20:04:35 +00:00
Matt Arsenault
a038a8340c AMDGPU: Allow SIShrinkInstructions to work in non-SSA
Immediates can be folded as long as the immediate is a vreg.

Also undo commuting instructions if it didn't fold an immediate.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307575 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-10 19:53:57 +00:00
Matt Arsenault
7231966089 AMDGPU: Remove unnecessary check for constant operands
An instruction that has an immediate operand can't reach
this point. This is only called for a freshly shrunk instruction,
which prevously couldn't have had a literal constant operand.
This was also not conservative enough since it woudl also have
had to filter other constant-like inputs like frame indexes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307574 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-10 19:33:38 +00:00
Konstantin Zhuravlyov
f392c1f922 AMDGPU: Do not test for SI in getIsaVersion
SI is being tested by isa version in the first two if statements of the function.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307573 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-10 19:24:05 +00:00
Simon Pilgrim
db24b6e4f7 [AMDGPU] Fix -Wimplicit-fallthrough warning. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307485 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-08 19:50:03 +00:00
Simon Pilgrim
2541a59ac3 Fix some more -Wimplicit-fallthrough warnings. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307411 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-07 16:40:06 +00:00
Sam Kolton
f9327929eb [AMDGPU] Assembler: refactor convert methods (VOP3 and MIMG)
Summary: Simplified converter methods for VOP3 and MIMG.

Reviewers: dp, artem.tamazov

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, vpykhtin, t-tye

Differential Revision: https://reviews.llvm.org/D35047

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307407 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-07 15:21:52 +00:00
Dmitry Preobrazhensky
c956bf87e0 [AMDGPU][mc][gfx9] Added support of op_sel/op_sel_hi for V_MAD_MIX*
See https://bugs.llvm.org//show_bug.cgi?id=33595

Reviewers: vpykhtin, artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D35021

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307402 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-07 14:29:06 +00:00
Simon Pilgrim
26aa51226a [AMDGPU] Fix -Wimplicit-fallthrough warnings. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307381 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-07 10:18:57 +00:00
Sean Fertile
471398ffea Extend memcpy expansion in Transform/Utils to handle wider operand types.
Adds loop expansions for known-size and unknown-sized memcpy calls, allowing the
target to provide the operand types through TTI callbacks. The default values
for the TTI callbacks use int8 operand types and matches the existing behaviour
if they aren't overridden by the target.

Differential revision: https://reviews.llvm.org/D32536

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307346 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-07 02:00:06 +00:00
Matt Arsenault
8763b3ac42 AMDGPU: Add macro fusion schedule DAG mutation
Try to increase opportunities to shrink vcc uses.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307313 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-06 20:57:05 +00:00
Matt Arsenault
92223c6fe5 AMDGPU: Minor cleanup of shrinking logic
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307312 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-06 20:56:59 +00:00
Stanislav Mekhanoshin
71b4fe4228 [AMDGPU] Always use rcp + mul with fast math
Regardless of relaxation options such as -cl-fast-relaxed-math
we are producing rather long code for fdiv via amdgcn_fdiv_fast
intrinsic. This intrinsic is used to replace fdiv with 2.5ulp
metadata and does not handle denormals, thus believed to be fast.

An fdiv instruction can also have fast math flag either by itself
or together with fpmath metadata. Clang used with a relaxation flag
always produces both metadata and fast flag:

%div = fdiv fast float %v, %0, !fpmath !12
!12 = !{float 2.500000e+00}

Current implementation ignores fast flag and favors metadata. An
instruction with just fast flag would be lowered to a fastest rcp +
mul, but that never happen on practice because of described mutual
clang and BE behavior.

This change allows an "fdiv fast" to be always lowered as rcp + mul.

Differential Revision: https://reviews.llvm.org/D34844

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307308 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-06 20:34:21 +00:00
Craig Topper
6dbd34d261 [Constants] If we already have a ConstantInt*, prefer to use isZero/isOne/isMinusOne instead of isNullValue/isOneValue/isAllOnesValue inherited from Constant. NFCI
Going through the Constant methods requires redetermining that the Constant is a ConstantInt and then calling isZero/isOne/isMinusOne.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307292 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-06 18:39:47 +00:00
Quentin Colombet
d268a8d71a [AMDGPU] Move GISel accessor initialization from TargetMachine to Subtarget.
NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307186 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-05 18:40:56 +00:00
Alexander Timofeev
f9e9586c80 [AMDGPU] Switch scalarize global loads ON by default
Differential revision: https://reviews.llvm.org/D34407

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307097 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-04 17:32:00 +00:00
Marek Olsak
97186bff40 [AMDGPU] Fix latency of MIMG instructions
Patch by cwabbott (Connor Abbott).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307081 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-04 14:43:38 +00:00
NAKAMURA Takumi
0a256123a4 Revert r307026, "[AMDGPU] Switch scalarize global loads ON by default"
It broke a testcase.

  Failing Tests (1):
      LLVM :: CodeGen/AMDGPU/alignbit-pat.ll

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307054 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-04 02:14:18 +00:00
Alexander Timofeev
0f9ec97238 [AMDGPU] Switch scalarize global loads ON by default
Differential revision: https://reviews.llvm.org/D34407

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307026 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-03 14:54:11 +00:00
Matt Arsenault
ff0022d12c AMDGPU: Add operand target flags serialization
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306995 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-02 23:21:48 +00:00
Hiroshi Inoue
71a28cb414 fix trivial typos; NFC
suport -> support



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306968 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-02 03:24:54 +00:00
Matt Arsenault
c278dccfd0 AMDGPU: Remove SITypeRewriter
This was an old workaround for using v16i8 in some old intrinsics
for resource descriptors.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306603 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-28 21:38:50 +00:00
Geoff Berry
28b3f06e1a [LoopUnroll] Pass SCEV to getUnrollingPreferences hook. NFCI.
Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper

Subscribers: jholewinski, arsenm, mzolotukhin, nemanjai, nhaehnle, javed.absar, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D34531

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306554 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-28 15:53:17 +00:00
Stanislav Mekhanoshin
8b38a13919 [AMDGPU] Add pattern for v_alignbit_b32 with immediate
If immediate in shift is less than 32 we can use alignbit too.

Differential Revision: https://reviews.llvm.org/D34729

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306500 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-28 02:52:39 +00:00
Stanislav Mekhanoshin
040f338ab8 [AMDGPU] Add 2 new alignbit patterns
Differential Revision: https://reviews.llvm.org/D34655

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306449 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-27 19:10:47 +00:00
Stanislav Mekhanoshin
e764e24028 [AMDGPU] Simplify setcc (sext from i1 b), -1|0, cc
Depending on the compare code that can be either an argument of
sext or negate of it. This helps to avoid v_cndmask_b64 instruction
for sext. A reversed value can be further simplified and folded into
its parent comparison if possible.

Differential Revision: https://reviews.llvm.org/D34545

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306446 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-27 18:53:03 +00:00
Stanislav Mekhanoshin
e2d935510c [AMDGPU] Combine and x, (sext cc from i1) => select cc, x, 0
Also factored out function to check if a boolean is an already
deserialized value which does not require v_cndmask_b32 to be
loaded. Added binary logical operators to its check.

Differential Revision: https://reviews.llvm.org/D34500

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306439 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-27 18:25:26 +00:00
Sam Kolton
06ed4a14fd [AMDGPU] SDWA: several fixes for V_CVT and VOPC instructions
Summary:
1. Instruction V_CVT_U32_F32 allow omod operand (see SIInstrInfo.td:1435). In fact this operand shouldn't be allowed here. This fix checks if SDWA pseudo instruction has OMod operand and then copy it.
2. There were several problems with support of VOPC instructions in SDWA peephole pass.

Reviewers: tstellar, arsenm, vpykhtin, airlied, kzhuravl

Subscribers: wdng, nhaehnle, yaxunl, dstuttard, tpr, sarnex, t-tye

Differential Revision: https://reviews.llvm.org/D34626

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306413 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-27 15:02:23 +00:00
Hiroshi Inoue
0df653a65e fix trivial typos, NFC
succesor -> successor



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306393 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-27 10:35:37 +00:00
Nicolai Haehnle
7ca35760c5 AMDGPU: M0 operands to spill/restore opcodes are dead
Summary:
With scalar stores, M0 is clobbered and therefore marked as implicitly
defined. However, it is also dead.

This fixes an assertion when the Greedy Register Allocator decides to
optimize a spill/restore pair away again (via tryHintsRecoloring).

Reviewers: arsenm

Subscribers: qcolombet, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D33319

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306375 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-27 08:04:13 +00:00
Matt Arsenault
8e828b87b2 AMDGPU: Setup SP/FP in callee function prolog/epilog
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306312 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-26 17:53:59 +00:00
Tom Stellard
8d3ca7cfeb AMDGPU/GlobalISel: Mark 32-bit G_SHL as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D34589

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306298 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-26 15:56:52 +00:00
Matt Arsenault
92c7507eee AMDGPU: Whitespace fixes
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306265 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-26 03:01:36 +00:00
Matt Arsenault
ec6175c524 AMDGPU: Partially fix implicit.buffer.ptr intrinsic handling
This should not be treated as a different version of
private_segment_buffer. These are distinct things with
different uses and register classes, and requires the
function argument info to have more context about the
function's type and environment.

Also add missing test coverage for the intrinsic, and
emit an error for HSA. This also encovers that the intrinsic
is broken unless there happen to be stack objects.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306264 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-26 03:01:31 +00:00
Rafael Espindola
3a48f331ba Remove a processFixupValue hack.
The intention of processFixupValue is not to redefine the semantics of
MCExpr. It is odd enough that a expression lowers to a PCRel MCExpr or
not depending on what it looks like. At least it is a local hack now.

I left a fix for anyone trying to figure out what producers should be
producing a different expression.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306200 cdac9f57-aa62-4fd3-8940-286f4534e8a0
2017-06-24 05:12:29 +00:00
Rafael Espindola
bfb1e6dd81 Remove redundant argument.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306189 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-24 00:26:57 +00:00