Commit Graph

38645 Commits

Author SHA1 Message Date
Petar Avramovic
29902ee1b4 AMDGPU/GlobalISel: Fix negative offset folding for buffer_load
Buffer_load does unsigned offset calculations. Don't fold
operands of 32-bit add that are likely to cause unsigned add
overflow (common case is when one of the operands is negative).

Differential Revision: https://reviews.llvm.org/D91336
2021-04-27 14:45:22 +02:00
Petar Avramovic
b0f45068ce AMDGPU/GlobalISel: Add test for buffer_load with negative offset
Pre-commit test for D91336.
2021-04-27 14:45:21 +02:00
Zarko Todorovski
04b8e84be8 [AIX] Allow safe for 32bit P9 VSX extract and insert pattern matches
In https://reviews.llvm.org/D92789 PPC64 checks were added that disallowed most
VSX pattern matching.  We enable some safe ones for 32bit in this patch.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D97503
2021-04-27 07:27:43 -04:00
Petar Avramovic
b83185c286 AMDGPU/GlobalISel: Remove redundant G_FCANONICALIZE
Add basic version of isCanonicalized for global-isel. Copied from sdag.
Add post legalizer combine that deletes G_FCANONICALIZE when its input
is already Canonicalized.

Differential Revision: https://reviews.llvm.org/D96605
2021-04-27 12:26:37 +02:00
Petar Avramovic
495d2a275a AMDGPU/GlobalISel: Add integer med3 combines
Add signed and unsigned integer version of med3 combine.
Source pattern is min(max(Val, K0), K1) or max(min(Val, K1), K0)
where K0 and K1 are constants and K0 <= K1. Destination is med3
that corresponds to signedness of min/max in source.

Differential Revision: https://reviews.llvm.org/D90050
2021-04-27 11:52:23 +02:00
David Sherwood
1d6ba88234 [NFC][SVE] Add tests for inserting subvectors into illegal scalable vectors
A previous commit fixed some issues with inserting subvectors into
illegal scalable vectors:

0035decae7ab9ab1c988fdcede46598540afd1a0

I've created a patch that simply adds some of those same tests for SVE.

Differential Revision: https://reviews.llvm.org/D100641
2021-04-27 09:02:43 +01:00
Chen Zheng
611f6b8043 [XCOFF] make .file directive have directory info
The .file directive is changed to only have basename in D36018 for
ELF.

But on AIX, we require the .file directive to also contain the
directory info. This aligns with other AIX compiler like XLC and is
required by some AIX tool like DBX.

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D99785
2021-04-27 00:15:23 -04:00
Wang, Pengfei
0711512527 Reapply "[X86][AMX] Try to hoist AMX shapes' def"
We request no intersections between AMX instructions and their shapes'
def when we insert ldtilecfg. However, this is not always ture resulting
from not only users don't follow AMX API model, but also optimizations.

This patch adds a mechanism that tries to hoist AMX shapes' def as well.
It only hoists shapes inside a BB, we can improve it for cases across
BBs in future. Currently, it only hoists shapes of which all sources' def
above the first AMX instruction. We can improve for the case that only
source that moves an immediate value to a register below AMX instruction.

Reviewed By: xiangzhangllvm

Differential Revision: https://reviews.llvm.org/D101067
2021-04-27 10:27:59 +08:00
Yonghong Song
1c5a0c4acc BPF: generate BTF info for LD_imm64 loaded function pointer
For an example like below,
    extern int do_work(int);
    long bpf_helper(void *callback_fn);
    long prog() {
        return bpf_helper(&do_work);
    }

The final generated codes look like:
    r1 = do_work ll
    call bpf_helper
    exit
where we have debuginfo for do_work() extern function:
    !17 = !DISubprogram(name: "do_work", ...)

This patch implemented to add additional checking
in processing LD_imm64 operands for possible function pointers
so BTF for bpf function do_work() can be properly generated.
The original llvm function name processReloc() is renamed to
processGlobalValue() to better reflect what the function is doing.

Differential Revision: https://reviews.llvm.org/D100568
2021-04-26 17:23:36 -07:00
William S. Moses
e7084f2810 [NVPTX] Enable lowering of atomics on local memory
LLVM does not have valid assembly backends for atomicrmw on local memory. However, as this memory is thread local, we should be able to lower this to the relevant load/store.

Differential Revision: https://reviews.llvm.org/D98650
2021-04-26 20:12:12 -04:00
William S. Moses
5b35b95712 Revert "[NVPTX] Enable lowering of atomics on local memory"
This reverts commit fede99d386ec9e7bab2762aa16cb10c0513ae464.
2021-04-26 19:33:01 -04:00
William S. Moses
9ac62ee58d [NVPTX] Enable lowering of atomics on local memory
LLVM does not have valid assembly backends for atomicrmw on local memory. However, as this memory is thread local, we should be able to lower this to the relevant load/store.

Differential Revision: https://reviews.llvm.org/D98650
2021-04-26 19:27:27 -04:00
Craig Topper
8ceb7af3d5 [RISCV] Use stack slot to handle SPLAT_VECTOR_PARTS on RV32.
Reduces the amount of vector ALU operations and reduces vector
register pressure.
2021-04-26 15:43:02 -07:00
Baptiste Saleil
c4a674e21e [AMDGPU] Experiments show that the GCNRegBankReassign pass significantly impacts
the compilation time and there is no case for which we see any improvement in
performance. This patch removes this pass and its associated test cases from
the tree.

Differential Revision: https://reviews.llvm.org/D101313

Change-Id: I0599169a7609c19a887f8d847a71e664030cc141
2021-04-26 17:21:49 -04:00
Craig Topper
ca9962891d [RISCV] Match splatted load to scalar load + splat. Form strided load during isel.
This modifies my previous patch to push the strided load formation
to isel. This gives us opportunity to fold the splat into a .vx
operation first. Using a scalar register and a .vx operation reduces
vector register pressure which can be important for larger LMULs.

If we can't fold the splat into a .vx operation, then it can make
sense to use a strided load to free up the vector arithmetic
ALU to do actual arithmetic rather than tying it up with vmv.v.x.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D101138
2021-04-26 13:32:03 -07:00
Sebastian Neubauer
6b1be183c5 [AMDGPU] Fix autogenerated wwm-reserved-spill.ll
Due to a bug in update_llc_test_checks.py, the test is wrongly
coalesced between run lines. Remove common check prefix to fix that.
NFC.
2021-04-26 19:09:09 +02:00
Sebastian Neubauer
dec086fb0d [AMDGPU] Use MapVector for WWMReservedRegs
Use MapVector instead of SmallDenseMap because it has a deterministic
iteration order.

Differential Revision: https://reviews.llvm.org/D101299
2021-04-26 17:43:00 +02:00
Tim Northover
d8308abd8d AArch64: support atomics in GISel 2021-04-26 14:38:06 +01:00
Bradley Smith
385ac4b46b [AArch64][SVE] Add missing patterns for scalar versions of SQSHL/UQSHL
Differential Revision: https://reviews.llvm.org/D101058
2021-04-26 13:07:12 +01:00
David Green
746a7315fd [ARM] Expand VMOVRRD simplification pattern
This expands the VMOVRRD(extract(..(build_vector(a, b, c, d)))) pattern,
to also handle insert_vectors. Providing we can find the correct insert,
this helps further simplify patterns by removing the redundant VMOVRRD.

Differential Revision: https://reviews.llvm.org/D100245
2021-04-26 12:27:38 +01:00
David Green
bd921d06d1 [ARM] Additional soft float BE test. NFC 2021-04-26 11:44:10 +01:00
David Green
df2cb039cf [ARM] Ensure loop invariant active.lane.mask operands
CGP can move instructions like a ptrtoint into a loop, but the
MVETailPredication when converting them will currently assume invariant
trip counts. This tries to ensure the operands are loop invariant, and
bails if not.

Differential Revision: https://reviews.llvm.org/D100550
2021-04-26 10:04:33 +01:00
Ben Shi
d474a426e5 [RISCV] Optimize addition with immediate
Reviewed by: craig.topper

Differential Revision: https://reviews.llvm.org/D101244
2021-04-26 13:26:17 +08:00
Craig Topper
88feba99f6 [RISCV] Teach DAG combine what bits Zbp instructions demanded from their inputs.
This teaches DAG combine that shift amount operands for grev, gorc
shfl, unshfl only read a few bits.

This also teaches DAG combine that grevw, gorcw, shflw, unshflw,
bcompressw, bdecompressw only consume the lower 32 bits of their
inputs.

In the future we can teach SimplifyDemandedBits to also propagate
demanded bits of the output to the inputs in some cases.
2021-04-25 21:54:06 -07:00
Levy Hsu
b3953bd33d [RISCV] [1/2] Add IR intrinsic for Zbe extension
RV32/64:
bcompress
bdecompress

RV64 ONLY:
bcompressw
bdecompressw

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101143
2021-04-25 19:14:34 -07:00
Roman Lebedev
35f6937f2b [NFC][X86][AVX2] Add baseline CodeGen/CostModel tests for interleaved loads/stores of i16 w/ strides 2/3/4
`X86TTIImpl::getInterleavedMemoryOpCostAVX2()` currently contains data
only for a handful of tuples. For now, at least add tests for a few more.

I'm guessing that we care how well the patterns codegen since
we use their presumed cost for vectorization decisions,
so i've added codegen tests too.

There's one really easy caveat for these codegen tests:
for interleaved load tests, we really have to ensure that the
deinterleaved vectors are escaped separately. Similarly for stores.
2021-04-26 01:13:07 +03:00
Simon Pilgrim
622199a58f Revert rG2149aa73f640c96 "[X86] Add support for reusing ZF etc. from locked XADD instructions (PR20841)"
This might be the cause of some msan build failures - I don't have access to a msan build right now, so this is a speculative revert.
2021-04-25 12:45:07 +01:00
Simon Pilgrim
b2f9c3dec2 [X86] Add support for reusing ZF etc. from locked XADD instructions (PR20841)
XADD has the same EFLAGS behaviour as ADD
2021-04-25 12:02:33 +01:00
Simon Pilgrim
b90d4aa510 [X86] Add PR20841 test cases showing failure to reuse ZF from XADD ops 2021-04-25 11:50:18 +01:00
Simon Pilgrim
b5d73cafc6 [X86] Regenerate atomic-flags.ll test file 2021-04-25 11:50:18 +01:00
Xiang1 Zhang
cf25c4dbf6 [X86] Refine AMX fast register allocation 2021-04-25 14:20:53 +08:00
Xiang1 Zhang
6da00a5d84 [X86] Support AMX fast register allocation
Differential Revision: https://reviews.llvm.org/D100026
2021-04-25 09:45:41 +08:00
Dávid Bolvanský
7c4c0f3460 [Analysis] Attribute alignment should not prevent tail call optimization
Fixes tail folding issue mentioned in D100879.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D101230
2021-04-24 19:57:42 +02:00
David Green
1503461e4f [AArch64] Enable UseAA globally in the AArch64 backend
This is similar to D69796 from the ARM backend. We remove the UseAA
feature, enabling it globally in the AArch64 backend. This should in
general be an improvement allowing the backend to reorder more
instructions in scheduling and codegen, and enabling it by default helps
to improve the testing of the feature, not making it cpu-specific. A
debugging option is added instead for testing.

Differential Revision: https://reviews.llvm.org/D98781
2021-04-24 17:51:50 +01:00
Michael Kitzan
a3f088c51a [MachineCSE] Prevent CSE of non-local convergent instrs
At the moment, MachineCSE allows CSE-ing convergent instrs which are
non-local to each other. This can cause illegal codegen as convergent
instrs are control flow dependent. The patch prevents non-local CSE of
convergent instrs by adding a check in isProfitableToCSE and rejecting
CSE-ing if we're considering CSE-ing non-local convergent instrs. We
can still CSE convergent instrs which are in the same control flow
scope, so the patch purposely does not make all convergent instrs
non-CSE candidates in isCSECandidate.

https://reviews.llvm.org/D101187
2021-04-23 16:44:48 -07:00
Mitch Phillips
6c51544efa Revert "[X86][AMX] Try to hoist AMX shapes' def"
This reverts commit 90118563ad0f133c696e070ad72761fa0daa4517.

Reason: Broke the MSan buildbots.
https://lab.llvm.org/buildbot/#/builders/5/builds/6967/steps/9/logs/stdio

More details can be found in the original phabricator review:
https://reviews.llvm.org/D101067
2021-04-23 10:42:26 -07:00
Craig Topper
d426ff687e [RISCV] Remove GetVRegNoV0 from the output register class of masked compare pseudo instructions.
Theses instructions are allowed to write v0 when they are masked.
We'll still never use v0 because of the earlyclobber constraint so
this doesn't really help anything. It just makes the definitions
correct.

While I was there remove an unused multiclass I noticed.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D101118
2021-04-23 09:33:29 -07:00
Sebastian Neubauer
1f0605d432 [AMDGPU] Save WWM registers in functions
The values of registers in inactive lanes needs to be saved during
function calls.

Save all registers used for whole wave mode, similar to how it is done
for VGPRs that are used for SGPR spilling.

Differential Revision: https://reviews.llvm.org/D99429

Reapply with fixed tests on window.
2021-04-23 18:09:24 +02:00
Nemanja Ivanovic
572869bea6 [PowerPC] Add vec_ctsl and vec_ctul to altivec.h
These are added for compatibility with XLC. They are similar to
vec_cts and vec_ctu except that the result is a doubleword vector
regardless of the parameter type.
2021-04-23 11:03:38 -05:00
Joe Ellis
0afb9c183a [AArch64][SVE] Fix bug in lowering of fixed-length integer vector divides
The function AArch64TargetLowering::LowerFixedLengthVectorIntDivideToSVE
previously assumed the operands were full vectors, but this is not
always true. This function would produce bogus if the division operands
are not full vectors, resulting in miscompiles when dividing 8-bit or
16-bit vectors.

The fix is to perform an extend + div + truncate for non-full vectors,
instead of the usual unpacking and unzipping logic. This is an additive
change which reduces the non-full integer vector divisions to a pattern
recognised by the existing lowering logic.

For future reference, an example of code that would miscompile before
this patch is below:

     1  int8_t foo(unsigned N, int8_t *a, int8_t *b, int8_t *c) {
     2      int8_t result = 0;
     3      for (int i = 0; i < N; ++i) {
     4          result += (a[i] / b[i]) / c[i];
     5      }
     6      return result;
     7  }

Differential Revision: https://reviews.llvm.org/D100370
2021-04-23 14:55:10 +00:00
Jay Foad
982c01a843 [AMDGPU] Fix typo in implicit operand lists
Several tests had a typo where they mentioned sgpr17 twice instead of
sgpr17 and sgpr27. This had a significant effect on the
"scavenge_sgpr_pei_no_sgprs" tests because there was actually an sgpr
available, namely sgpr27.

Differential Revision: https://reviews.llvm.org/D100960
2021-04-23 15:44:17 +01:00
Sebastian Neubauer
8b5cb86ad7 Revert "[AMDGPU] Save WWM registers in functions"
This reverts commit 91464c30bfcf731ccb7f9d6ef6d26e8c1657a6e6.

Seems to break tests on windows.
2021-04-23 16:38:50 +02:00
Piotr Sobczak
099aac7b88 [AMDGPU][NFC] Update auto-gen test
Most likely the "glc" was not added to the test when
the volatile loads started generating those bits.
2021-04-23 16:33:16 +02:00
Sebastian Neubauer
ec74f9a23a [AMDGPU] Save WWM registers in functions
The values of registers in inactive lanes needs to be saved during
function calls.

Save all registers used for whole wave mode, similar to how it is done
for VGPRs that are used for SGPR spilling.

Differential Revision: https://reviews.llvm.org/D99429
2021-04-23 16:09:31 +02:00
Simon Pilgrim
a7d1c869bb [X86] Add Win32/64 mulo test coverage
Part of an investigation to solve the windows regressions caused by rG13ec913bdf50
2021-04-23 14:51:42 +01:00
Matt Arsenault
182934f750 AMDGPU: Fix assert on inline asm on gfx90a
This was assuming all mayLoad instructions have one def.
2021-04-23 09:00:25 -04:00
Fraser Cormack
23b998863a [RISCV] Custom lower vector F(MIN|MAX)NUM to vf(min|max)
This patch adds support for both scalable- and fixed-length vector code
lowering of the llvm.minnum and llvm.maxnum intrinsics to the equivalent
RVV instructions.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101035
2021-04-23 12:22:15 +01:00
Daniel Kiss
30b326d46e [AArch64] Fix for BTI landing pad insertion with PAC-RET+bkey.
EMITBKEY is emitted for PAC-RET+bkey, which is a non machine instructions.

PR: 49957

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D100996
2021-04-23 10:07:25 +02:00
Wang, Pengfei
d73d62d45f [X86][AMX] Try to hoist AMX shapes' def
We request no intersections between AMX instructions and their shapes'
def when we insert ldtilecfg. However, this is not always ture resulting
from not only users don't follow AMX API model, but also optimizations.

This patch adds a mechanism that tries to hoist AMX shapes' def as well.
It only hoists shapes inside a BB, we can improve it for cases across
BBs in future. Currently, it only hoists shapes of which all sources' def
above the first AMX instruction. We can improve for the case that only
source that moves an immediate value to a register below AMX instruction.

Differential Revision: https://reviews.llvm.org/D101067
2021-04-23 12:17:00 +08:00
Wang, Pengfei
d7776e3283 [X86] Enable compilation of user interrupt handlers.
Add __uintr_frame structure and use UIRET instruction for functions with
x86 interrupt calling convention when UINTR is present.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D99708
2021-04-23 11:43:57 +08:00