Commit Graph

1636 Commits

Author SHA1 Message Date
Michael Berg
56057ccc17 Utilize new SDNode flag functionality to expand current support for fadd
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed.

Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar

Reviewed By: spatel

Subscribers: wdng, nhaehnle

Differential Revision: https://reviews.llvm.org/D47909

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334996 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-18 23:44:59 +00:00
Krzysztof Parzyszek
db53e01a53 Shrink interval after moving copy in removePartialRedundancy
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334963 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-18 17:16:39 +00:00
Stanislav Mekhanoshin
ec4b3c5670 [AMDGPU] setcc (select cc, CT, CF), CF, eq | ne -> xor cc, -1 | cc
This is the common case in the BE when we serialize condition and then
rematerialize it. Use either original or inverted condition.

Differential Revision: https://reviews.llvm.org/D48246

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334882 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-16 03:46:59 +00:00
Michael Berg
97de3c8816 Utilize new SDNode flag functionality to expand current support for fdiv
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed.

Reviewers: spatel, hfinkel, wristow, arsenm

Reviewed By: spatel

Subscribers: wdng, nhaehnle

Differential Revision: https://reviews.llvm.org/D47954

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334862 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-15 20:44:55 +00:00
Matt Arsenault
bd0b6b0e98 AMDGPU: Add combine for short vector extract_vector_elts
Try to access pieces 4 bytes at a time. This helps
various hasOneUse extract_vector_elt combines, such
as load width reductions.

Avoids test regressions in a future commit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334836 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-15 15:31:36 +00:00
Matt Arsenault
9e41f5314e AMDGPU: Make v4i16/v4f16 legal
Some image loads return these, and it's awkward working
around them not being legal.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334835 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-15 15:15:46 +00:00
Matt Arsenault
bd512dd0f9 DAG: Fix creating concat_vectors with illegal type
Test passes as is, but fails with future patch to make v4i16/v4f16
legal.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334823 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-15 12:09:15 +00:00
Roman Lebedev
f2c20b5ace [AMDGPU] Recognize x & ~(-1 << y) pattern.
Summary: The same pattern as D48010, but this one is IR-canonical as of D47428.

Reviewers: nhaehnle, bogner, tstellar, arsenm

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #amdgpu

Differential Revision: https://reviews.llvm.org/D48012

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334817 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-15 09:56:45 +00:00
Roman Lebedev
fc84800456 [AMDGPU] Recognize x & ((1 << y) - 1) pattern.
Summary:
As a followup for D48007.

Since we already handle `x << (bitwidth - y) >> (bitwidth - y)` pattern,
which does not have ub for both the edge cases (`y == 0`, `y == bitwidth`),
i think also handling a pattern that is ub for `y == bitwidth` should be fine.

Reviewers: nhaehnle, bogner, tstellar, arsenm

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #amdgpu

Differential Revision: https://reviews.llvm.org/D48010

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334816 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-15 09:56:39 +00:00
Roman Lebedev
1d9a02a498 [AMDGPU] Recognize x & (-1 >> (32 - y)) pattern.
Summary:
D47980 will canonicalize the `x << (32 - y) >> (32 - y)`,
which is the pattern the AMDGPU expects to `x &  (-1 >> (32 - y))`,
which is not recognized by AMDGPU.

Thus, it needs to be recognized, too.

Reviewers: nhaehnle, bogner, tstellar, arsenm

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #amdgpu

Differential Revision: https://reviews.llvm.org/D48007

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334815 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-15 09:56:31 +00:00
Tom Stellard
b4220f48ed AMDGPU/GlobalISel: Implement select() for @llvm.amdgcn.cvt.pkrtz
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45907

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334757 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-14 19:26:37 +00:00
Tom Stellard
2cf1b47d89 AMDGPU/GlobalISel: Implement select() for 32-bit G_FADD and G_FMUL
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46171

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334665 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-13 22:30:47 +00:00
Stanislav Mekhanoshin
822ea1bfe8 [AMDGPU] Corrected computeKnownBits for V_PERM_B32
Differential Revision: https://reviews.llvm.org/D48133

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334640 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-13 18:52:54 +00:00
Yaxun Liu
350359838b [AMDGPU] Change enqueue kernel handle type
Currently the handle type is a global pointer which holds 8 bytes.
We need a larger type which hold 16 bytes, therefore change it
to [i64 x 2].

Differential Revision: https://reviews.llvm.org/D48094


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334625 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-13 17:31:51 +00:00
Krzysztof Parzyszek
be100d0a8d Revert "Improve handling of COPY instructions with identical value numbers"
This reverts r334594, it breaks buildbots and fails with expensive checks.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334598 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-13 13:49:06 +00:00
Krzysztof Parzyszek
eee75ac731 Improve handling of COPY instructions with identical value numbers
Differential Revision: https://reviews.llvm.org/D48102


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334594 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-13 12:47:17 +00:00
Stanislav Mekhanoshin
6c5eb4370b [AMDGPU] DAG combine to produce V_PERM_B32
Differential Revision: https://reviews.llvm.org/D48099

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334559 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-12 23:50:37 +00:00
Konstantin Zhuravlyov
0397fe5863 AMDHSA/NFC: Code object v3 updates (additional):
- Move section selection and alignment to AMDGPUAsmPrinter


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334521 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-12 18:33:51 +00:00
Konstantin Zhuravlyov
299cf5ff6a AMDHSA: Code object v3 updates
- Do not emit following assembler directives:
  - .hsa_code_object_version
  - .hsa_code_object_isa
  - .amd_amdgpu_isa
  - .amd_amdgpu_hsa_metadata
  - .amd_amdgpu_pal_metadata
- Do not emit .note entries
- Cleanup and bring in sync kernel descriptor header file
- Emit kernel descriptor into .rodata with appropriate relocations and
  alignments



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334519 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-12 18:02:46 +00:00
Mark Searles
8f93b43810 [AMDGPU] prevent hitting Assertion `isReg() && "Wrong MachineOperand accessor"'
The use iterator, used within findMaskOperands(), can return anything which is
not a def. isUse() requires a register, so check isReg() before calling isUse().

Differential Revision: https://reviews.llvm.org/D48047

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334459 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-12 00:41:26 +00:00
Stanislav Mekhanoshin
7a3f751cbc [AMDGPU] Do not consider indirect acces through phi for wave limiter
Rational: if there is indirect access that is usually an issue
because load is not ready by the use. However, if use is inside a
loop and load is outside that is potentially an issue for a first
iteration only.

Differential Revision: https://reviews.llvm.org/D47740

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334420 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-11 16:50:49 +00:00
Roman Lebedev
8a8a729242 [NFC][AMDGPU] Add tests for all the various IR patterns equivalent to extracting low bits.
Summary:
The idiom recognition seems rather poor.
Only the `@bzhi32_d0` produces `v_bfe_u32`.
But they all should.

This needs to be fixed before D47980 can be re-landed.

Reviewers: mareko, bogner, rampitec, arsenm, tstellar, nhaehnle

Reviewed By: nhaehnle

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #amdgpu

Differential Revision: https://reviews.llvm.org/D48005

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334398 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-11 10:21:10 +00:00
Daniil Fukalov
8fbd41bc43 [AMDGPU] Inline asm - added i16, half and i128 types support
AMDGPU inline assembler support i16, half and i128 typed variables in constraints, but they were reported as error.
Needed to fix https://github.com/RadeonOpenCompute/ROCm/issues/341,
e.g. to be able to load with global_load_dwordx4 to a 128bit integer variable

Differential Revision: https://reviews.llvm.org/D44920

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334301 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-08 16:29:04 +00:00
Matt Arsenault
0aedafefd1 AMDGPU: Error on LDS global address in functions
These won't work as expected now, so error on them to avoid
wasting time debugging this in the future.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334269 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-08 08:05:54 +00:00
Tony Tye
1310a7556e [AMDGPU] Simplify memory legalizer
- Make code easier to maintain.
- Avoid generating waitcnts for VMEM if the address sppace does not involve VMEM.
- Add support to generate waitcnts for LDS and GDS memory.

Differential Revision: https://reviews.llvm.org/D47504



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334241 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-07 22:28:32 +00:00
Matt Arsenault
4dea9c2811 AMDGPU: Fix not including v2f64 in SReg_128
Fixes assertion with calls returning v2f64.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334189 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-07 12:16:31 +00:00
Matt Arsenault
86e569e7c6 AMDGPU: Use scalar operations for f16 fabs/fneg patterns
Fixes unnecessary differences between subtargets.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334184 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-07 10:15:20 +00:00
Matt Arsenault
75c4f68f53 AMDGPU: Try a lot harder to emit scalar loads
This has two main components. First, widen
widen short constant loads in DAG when they have
the correct alignment. This is already done a bit in
AMDGPUCodeGenPrepare, since that has access to
DivergenceAnalysis. This can't help kernarg loads
created in the DAG. Start to use DAG divergence analysis
to help this case.

The second part is to avoid kernel argument lowering
breaking the alignment of short vector elements because
calling convention lowering wants to split everything
into legal register types.

When loading a split type, load the nearest 4-byte aligned
segment and shift to get the desired bits. This extra
load of the earlier argument piece ends up merging,
and the bit extract hopefully folds out.

There are a number of improvements and regressions with
this, but I think as-is this is a better compromise between
several of the worst parts of SelectionDAG.

Particularly when i16 is legal, this produces worse code
for i8 and i16 element vector kernel arguments. This is
partially due to the very weak load merging the DAG does.
It only looks for fairly specific combines between pairs
of loads which no longer appear. In particular this
causes v4i16 loads to be split into 2 components when
previously the two halves were merged.

Worse, because of the newly introduced shifts, there
is a lot more unnecessary vector packing and unpacking code
emitted. At least some of this is due to reporting
false for isTypeDesirableForOp for i16 as a workaround for
the lack of divergence information in the DAG. The cases
where this happens it doesn't actually matter, but the
relevant code in SimplifyDemandedBits doens't have the context
to know to ignore this.

The use of the  scalar cache is probably more important
than the mess of mostly scalar instructions doing this packing
and unpacking. Future work can fix this, possibly by making better
use of the new DAG divergence information for controlling promotion
decisions, or adding another version of shift + trunc + shift
combines that doesn't only know about the used types.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334180 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-07 09:54:49 +00:00
Stanislav Mekhanoshin
ded60aa7f8 [AMDGPU] Improve reciprocal handling
When denormals are supported we are producing a full division for
1.0f / x. That still can be replaced by the faster version:

    bool c = fabs(x) > 0x1.0p+96f;
    float s = c ? 0x1.0p-32f : 1.0f;
    x *= s;
    return s * v_rcp_f32(x)

in case if requested accuracy is 2.5ulp or less. The same version
is used if denormals are not supported for non 1.0 numerators, where
just v_rcp_f32 is then used for 1.0 numerator.

The optimization of 1/x is extended to the case -1/x, which is the
same except for the resulting sign bit.

OpenCL conformance passed with both enabled and disabled denorms.

Differential Revision: https://reviews.llvm.org/D47805

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334142 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-06 22:22:32 +00:00
Matt Arsenault
2d78ae2ab1 AMDGPU: Custom lower v2f16 fneg/fabs with illegal f16
Fixes terrible code on targets without f16 support. The
legalization creates a mess that is difficult to recover
from. Also should avoid randomly breaking these tests
multiple times in sequence in future commits.

Some regressions in cases where it happens to be better
to pull the source modifier after the conversion.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334132 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-06 21:28:11 +00:00
Matt Arsenault
19dfb4b388 AMDGPU: Preserve metadata when widening loads
Preserves the low bound of the !range. I don't think
it's legal to do anything with the top half since it's
theoretically reading garbage.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334045 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-05 19:52:56 +00:00
Matt Arsenault
0c5ab47e3b AMDGPU: Use more custom insert/extract_vector_elt lowering
Apply to i8 vectors.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334044 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-05 19:52:46 +00:00
Matt Arsenault
e5fc674090 DAG: Stop dropping invariant/dereferencable
When legalizing illegal FP load results, this was
for some reason dropping the invariant and dereferencable
memory flags. There doesn't seem to be any reason for this,
and the equivalent isn't done for integer loads.

Fixes an issue in a future AMDGPU commit where some identical
loads fail to merge because one of the loads ends up
dropping the flags.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334020 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-05 14:52:24 +00:00
Scott Linder
b9cccf44aa [CodeGen] Always update divergence in SelectionDAG::UpdateNodeOperands
Some overloads failed to update divergence.

Differential Revision: https://reviews.llvm.org/D47148



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333947 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-04 20:19:45 +00:00
Mark Searles
8177aafa74 [AMDGPU][Waitcnt] Fix handling of flat instrs
On GFX9 and earlier, flat memory ops may decrement VMCNT out-of-order as well as LGKMCNT out-of-order.

Differential Revision: https://reviews.llvm.org/D46616

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333926 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-04 16:51:59 +00:00
Matt Arsenault
0176ae243c AMDGPU: Switch some half using-tests to use amdhsa
The default clover ABI weirdly promotes half to float,
which should probably be fixed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333730 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-01 07:06:03 +00:00
Stanislav Mekhanoshin
e5240df77a [AMDGPU] Construct memory clauses before RA
Memory clauses are formed into bundles in presence of xnack.
Their source operands are marked as early-clobber.

This allows to allocate distinct source and destination registers
within a clause and prevent breaking the clause with s_nop in the
hazard recognizer.

Clauses are undone before post-RA scheduler to allow some rescheduling,
which will not break the clause since artificial edges are created in
the dag to keep memory operations together. Yet this allows a better
ILP in some cases.

Differential Revision: https://reviews.llvm.org/D47511

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333691 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-31 20:13:51 +00:00
Stanislav Mekhanoshin
93b037daa0 [AMDGPU] Fixed incorrect -mcpu=gfx800 in xnor.ll test. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333687 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-31 19:39:54 +00:00
Jan Vesely
655741c807 AMDGPU/R600: Make sure functions are cacheline aligned
v2: use "ensureAlignment"
    make functions cache line aligned
Fixes GPU hangs since r333219:
"AMDGPU: Split R600 AsmPrinter code into its own class"

Differential Revision: https://reviews.llvm.org/D47516

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333622 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-31 04:08:08 +00:00
Matt Arsenault
664e38cb13 AMDGPU: Use better alignment for kernarg lowering
This was just emitting loads with the ABI alignment
for the raw type. The true alignment is often better,
especially when an illegal vector type was scalarized.
The better alignment allows using a scalar load
more often.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333558 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-30 16:17:51 +00:00
Mark Searles
b81ef9cebe [AMDGPU][Waitcnt] Fix handling of loops with many bottom blocks
In terms of waitcnt insertion/if necessary, the waitcnt pass forces convergence
for a loop. Previously, that kicked if greater than 2 passes over a loop, which
doesn't account for loop with many bottom blocks. So, increase the threshold to
(n+1), where n is the number of bottom blocks. This gives the pass an
opportunity to consider the contribution of each bottom block, to the overall
loop, before the forced convergence potentially kicks in.

Differential Revision: https://reviews.llvm.org/D47488

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333556 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-30 15:47:45 +00:00
Matt Arsenault
7f19a98b4c AMDGPU: Fix broken check lines
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333458 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-29 19:35:53 +00:00
Matt Arsenault
1fa5b55214 AMDGPU: Round up kernel argument allocation size
AFAIK the driver's allocation will actually have to round this
up anyway. It is useful to track the rounded up size, so that
the end of the kernel segment is known to be dereferencable so
a wider s_load_dword can be used for a short argument at the end
of the segment.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333456 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-29 19:35:00 +00:00
Konstantin Zhuravlyov
840f423383 AMDGPU: Always set COMPUTE_PGM_RSRC2.ENABLE_TRAP_HANDLER to zero for AMDHSA as
it is set by CP

Differential Revision: https://reviews.llvm.org/D47392



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333451 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-29 19:09:13 +00:00
Tim Renouf
822fd80cbe [AMDGPU] Fixed WWM bug in block otherwise entirely in WQM
Summary:
For a block with WQM on entry and exit and containing no exact mode
code, but containing some WWM code, the WQM pass forgot to process the
block at all and so did not insert code to enter and leave WWM.

This commit fixes that.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47027

Change-Id: I044792eead1293bed4203fb26ce75f47878afeb6

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333362 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-27 17:26:11 +00:00
Mark Searles
d900d78107 [AMDGPU][Waitcnt] Remove obsolete waitcnt option
With the removal of the old waitcnt pass, the '-enable-si-insert-waitcnts' option is obsolete. Remove it.

Differential Revision: https://reviews.llvm.org/D47378

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333303 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-25 20:24:08 +00:00
Stanislav Mekhanoshin
277225f527 [AMDGPU] Add perf hints to functions
This is adoption of HSAIL perfhint pass. Two types of hints are produced:

1. Function is memory bound.
2. Kernel can use wave limiter.

Currently these hints are used in the scheduler. If a function is suspected
to be memory bound we allow occupancy to decrease to 4 waves in the course
of scheduling.

Differential Revision: https://reviews.llvm.org/D46992

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333289 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-25 17:25:12 +00:00
Tim Renouf
dde1312300 [AMDGPU] Fixed incorrect break from loop
Summary:
Lower control flow did not correctly handle the case that a loop break
in if/else was on a condition that was not guaranteed to be masked by
exec. The first test kernel shows an example of this going wrong; after
exiting the loop, exec is all ones, even if it was not before the loop.

The fix is for lowering of if-break and else-break to insert an
S_AND_B64 to mask the break condition with exec. This commit also
includes the optimization of not inserting that S_AND_B64 if it is
obviously not needed because the break condition is the result of a
V_CMP in the same basic block.

V2: Addressed some review comments.
V3: Test fixes.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D44046

Change-Id: I0fc56a01209a9e99d1d5c9b0ffd16f111caf200c

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333258 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-25 07:55:04 +00:00
Changpeng Fang
310e78bb8f StructurizeCFG: Adjust the loop depth for a subregion to order the nodes correctly
Summary:
  StructurizeCFG::orderNodes basically uses a reverse post-order (RPO) traversal of the region list to get the order.
The only problem with it is that sometimes backedges for outer loops will be visited before backedges for inner loops.
To solve this problem, a loop depth based approach has been used to make sure all blocks in this loop has been visited
before moving on to outer loop.

However, we found a problem for a SubRegion which is a loop itself:

--> BB1 --> BB2 --> BB3 -->

In this case, BB2 is a SubRegion (loop), and thus its loopdepth is different than that of BB1 and BB3. This fact will lead
BB2 to be placed in the wrong order.

In this work, we treat the SubRegion as a special case and use its exit block to determine the loop and its depth
to guard the sorting.

Reviewers:
  arsenm, jlebar

Differential Revision:
  https://reviews.llvm.org/D46912

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333111 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-23 18:34:48 +00:00
Matt Arsenault
46d48331d6 AMDGPU: Fix missing test coverage for some 16-bit and packed ops
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333024 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-22 20:42:00 +00:00