Commit Graph

30284 Commits

Jinsong Ji
e09833c6e9 [UpdateChecks] Add support for armv7-apple-darwin
armv7-apple-darwin was not well supported; the script could not generate
checks for it.

https://reviews.llvm.org/D60601/new/#inline-568671

Differential Revision: https://reviews.llvm.org/D63939

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364668 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-28 18:07:19 +00:00
Simon Pilgrim
08a4b34e50 [X86] CombineShuffleWithExtract - recurse through EXTRACT_SUBVECTOR chain
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364667 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-28 17:57:32 +00:00
Roman Lebedev
ae7898988a [NFC][Codegen] Revisit test coverage for X % C == 0 fold once more (add tests with '1' divisor)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364661 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-28 17:26:28 +00:00
Sam Tebbs
a71ac42554 [ARM] Add support for the MVE long shift instructions
MVE adds the lsll, lsrl and asrl instructions, which perform a shift on a 64-bit value held in two 32-bit registers.

The Expand64BitShift function is modified to accept ISD::SHL, ISD::SRL and ISD::SRA and convert them into the appropriate ARMISD opcode: an SHL becomes an lsll, an SRL becomes an lsrl in the immediate form (or a negation plus lsll in the register form), and an SRA becomes an asrl.

test/CodeGen/ARM/shift_parts.ll is added to test the logic of emitting these instructions.
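
For illustration only (not part of the patch), a minimal C sketch of what
lsll computes, assuming a shift amount below 64 and hypothetical names:

    #include <stdint.h>

    /* Shift a 64-bit value held as two 32-bit halves, as lsll does. */
    void lsll(uint32_t *lo, uint32_t *hi, unsigned amt) {
        uint64_t v = ((uint64_t)*hi << 32) | *lo;  /* reassemble the pair */
        v <<= amt;                                 /* the 64-bit shift */
        *lo = (uint32_t)v;                         /* split it back up */
        *hi = (uint32_t)(v >> 32);
    }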

Differential Revision: https://reviews.llvm.org/D63430

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364654 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-28 15:43:31 +00:00
Dmitry Preobrazhensky
3c357a49a3 [AMDGPU][MC] Enabled constant expressions as operands of sendmsg
See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D62735

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364645 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-28 14:14:02 +00:00
Simon Pilgrim
ef0e051c6d [X86] CombineShuffleWithExtract - only require 1 source to be EXTRACT_SUBVECTOR
We were requiring that both shuffle operands were EXTRACT_SUBVECTORs, but we can relax this to only require one of them to be.

Also, we shouldn't bother attempting this if both operands are from the lowest subvector (or not EXTRACT_SUBVECTOR at all).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364644 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-28 12:24:49 +00:00
David Green
5f0953343c [ARM] Add MVE mul patterns
This simply adds integer and floating-point VMUL patterns for MVE, in the
same way as we already have for add and sub.

Differential Revision: https://reviews.llvm.org/D63866


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364643 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-28 11:44:03 +00:00
Roman Lebedev
bae9441dbd [NFC][Codegen] Revisit test coverage for X % C == 0 fold
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364642 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-28 11:36:34 +00:00
David Green
2dd6a9d6de [ARM] Mark math routines as non-legal for MVE
This adds handling and tests for a number of floating point math routines,
which have no MVE instructions.

Differential Revision: https://reviews.llvm.org/D63725


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364641 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-28 11:17:38 +00:00
David Green
4cbe79049c [ARM] MVE patterns for VABS and VNEG
This simply adds the required patterns for fp neg and abs.

Differential Revision: https://reviews.llvm.org/D63861


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364640 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-28 10:25:35 +00:00
David Green
0272917d5a [ARM] Widening loads and narrowing stores
MVE has instructions to widen as it loads and narrow as it stores. This adds
the required patterns and legalisation to make them work: marking the
operations as legal, adding patterns to select them, and updating the tests.
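
As a hedged illustration (not from the patch), this is the kind of C loop
whose vectorisation benefits: the i16 elements are widened as they are
loaded, and the i8 results are narrowed as they are stored.

    #include <stdint.h>

    void scale_down(int8_t *dst, const int16_t *src, int n) {
        for (int i = 0; i < n; i++)
            dst[i] = (int8_t)(src[i] >> 8);  /* widening load, narrowing store */
    }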

Patch by David Sherwood.

Differential Revision: https://reviews.llvm.org/D63839


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364636 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-28 09:47:55 +00:00
David Green
c2c9f7413e [ARM] MVE loads and stores
This fills in the gaps for basic MVE loads and stores, allowing unaligned
access and adding far too many tests. These will become important as
narrowing/expanding and pre/post-inc are added. Big endian might still not be
handled very well, because we have not yet added bitcasts (and I'm not sure
how we want that to work yet). I've included the alignment code anyway, which
maps onto our current patterns. We plan to return to that later.

Code written by Simon Tatham, with additional tests from me and Mikhail Maltsev.

Differential Revision: https://reviews.llvm.org/D63838


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364633 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-28 08:41:40 +00:00
David Green
67091fc7e1 [ARM] Mark div and rem as expand for MVE
We don't have vector operations for these, so they need to be expanded for both
integer and float.

Differential Revision: https://reviews.llvm.org/D63595


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364631 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-28 08:18:55 +00:00
David Green
03b8387d14 [ARM] Select MVE fp add and sub
As with integer arithmetic, we can add simple floating-point MVE addition and
subtraction patterns.

Initial code by David Sherwood

Differential Revision: https://reviews.llvm.org/D63257


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364629 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-28 07:41:09 +00:00
David Green
93744b32bb [ARM] Select MVE add and sub
This adds the first few patterns for MVE code generation, adding simple integer
add and sub patterns.

Initial code by David Sherwood

Differential Revision: https://reviews.llvm.org/D63255


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364627 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-28 07:21:11 +00:00
David Green
3b9e91d904 [ARM] MVE vector shuffles
This patch adds necessary shuffle vector and buildvector support for ARM MVE.
It essentially adds support for VDUP, VREVs and some VMOVs, which are often
required by other code (like upcoming patches).

This mostly reuses the code from Neon that already generated
NEONvdup/NEONvduplane/NEONvrev nodes. These have been renamed to ARMvdup/etc.
and moved to ARMInstrInfo as they are common to both architectures. Most of
the selection code seems to be applicable to both, but NEON does have some
extra instructions, which keeps some parts NEON-specific.
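
As a rough scalar sketch of the operations involved (illustrative C, not the
selection code; VDUP.32 splats one value across the lanes, VREV64.8 reverses
the bytes within each 64-bit chunk):

    #include <stdint.h>

    /* VDUP.32: splat one 32-bit value across a four-lane vector. */
    void vdup32(int32_t out[4], int32_t x) {
        for (int i = 0; i < 4; i++)
            out[i] = x;
    }

    /* VREV64.8: reverse the bytes inside each 64-bit half. */
    void vrev64_8(uint8_t v[16]) {
        for (int c = 0; c < 16; c += 8)
            for (int i = 0; i < 4; i++) {
                uint8_t t = v[c + i];
                v[c + i] = v[c + 7 - i];
                v[c + 7 - i] = t;
            }
    }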

Most code originally by David Sherwood.

Differential Revision: https://reviews.llvm.org/D63567


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364626 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-28 07:08:42 +00:00
Stanislav Mekhanoshin
41e9ec03c7 [AMDGPU] Packed thread ids in function call ABI
Differential Revision: https://reviews.llvm.org/D63851

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364619 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-28 01:52:13 +00:00
Amara Emerson
b9f4ed6752 [GlobalISel][IRTranslator] Fix some PHI bugs related to jump tables when optimizations are used.
The new switch lowering code that tries to generate jump tables and range
checks was tested at -O0 on arm64, but at -O3 the generic switch lowering code
goes to town on trying to generate optimized lowerings, e.g. multiple jump
tables, range checks etc. This exposed bugs in the way PHI nodes are handled,
because the CFG looks even stranger after all of this is done.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364613 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-27 23:56:34 +00:00
Roman Lebedev
65afd3a0fd [CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 3)
Summary:
I'm submitting a new revision since I don't understand how to reclaim/reopen/take over the existing one, D50222.
There is no such action in the "Add Action" menu...

This implements an optimization described in Hacker's Delight 10-17: when `C` is constant,
the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder.
The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479.
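
A minimal C sketch of the trick for the simplest case, an odd 32-bit divisor
(illustration only, not the DAG combine itself; names are hypothetical):

    #include <stdint.h>

    /* Multiplicative inverse of an odd d modulo 2^32 via Newton-Raphson;
       each iteration doubles the number of correct low bits. */
    static uint32_t mulinv32(uint32_t d) {
        uint32_t x = d;              /* correct to 3 bits for odd d */
        for (int i = 0; i < 5; i++)
            x *= 2 - d * x;
        return x;                    /* d * x == 1 (mod 2^32) */
    }

    /* x % d == 0 iff x * inv(d) <= floor(UINT32_MAX / d); no remainder is
       actually computed. Q below is the comparison constant mentioned in
       the summary. */
    int is_divisible(uint32_t x, uint32_t d) {
        uint32_t Q = UINT32_MAX / d;
        return x * mulinv32(d) <= Q;
    }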

This is a recommit; the original commit rL364563 was reverted in rL364568
because the test-suite detected a miscompile: the new comparison constant 'Q'
was being computed incorrectly (we divided by `D0` instead of `D`).

Original patch D50222 by @hermord (Dmytro Shynkevych)

Notes:
- In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla.
  This seems to require an extra branch on overflow, so I refrained from implementing this for now.
- An explicit check for when the `REM` can be reduced to just its LHS is included:
  the `X % C == 0` optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise.
  I haven't managed to find a better way to avoid generating worse output in this case.
- `test/CodeGen/X86/jump_sign.ll` regresses; this is being fixed by the follow-up patch D63390.

Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00

Reviewed By: RKSimon, xbolva00

Subscribers: dexonsmith, kristina, xbolva00, javed.absar, llvm-commits, hermord

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63391

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364600 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-27 21:52:10 +00:00
Sanjay Patel
f5c10ec47c [x86] remove whitespace; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364588 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-27 20:37:12 +00:00
Sanjay Patel
df3cc01106 [x86] prevent crashing from select narrowing with AVX512
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364585 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-27 20:16:58 +00:00
Roman Lebedev
93f902be53 [NFC][CodeGen] Add negative test for X u% C == 0 fold (D63391)
The fold (D63391) uses multiplicativeInverse(),
but it is not guaranteed to always succeed,
and '100' appears to be one of the problematic values.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364578 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-27 19:09:51 +00:00
Roman Lebedev
1df452bb5a Revert "[CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2)"
*Appears* to break test-suite on
http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/23790

FAIL: burg.execution_time
FAIL: spiff.execution_time
FAIL: employ.execution_time
FAIL: llu.execution_time
FAIL: gramschmidt.execution_time
FAIL: fdtd-apml.execution_time

This reverts commit r364563.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364568 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-27 17:22:31 +00:00
Nicolai Haehnle
c71a1c3902 AMDGPU: Make fixing i1 copies robust against re-ordering
Summary:
The new test case led to incorrect code.

Change-Id: Ief48b227e97aa662dd3535c9bafb27d4a184efca

Reviewers: arsenm, david-salinas

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63871

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364566 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-27 16:56:44 +00:00
Roman Lebedev
84139109d4 [CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2)
Summary:
I'm submitting a new revision since I don't understand how to reclaim/reopen/take over the existing one, D50222.
There is no such action in the "Add Action" menu...
Original patch D50222 by @hermord (Dmytro Shynkevych)

This implements an optimization described in Hacker's Delight 10-17: when `C` is constant,
the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder.
The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479.

Notes:
- In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla.
  This seems to require an extra branch on overflow, so I refrained from implementing this for now.
- An explicit check for when the `REM` can be reduced to just its LHS is included:
  the `X % C == 0` optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise.
  I haven't managed to find a better way to avoid generating worse output in this case.
- `test/CodeGen/X86/jump_sign.ll` regresses; this is being fixed by the follow-up patch D63390.

Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00

Reviewed By: RKSimon, xbolva00

Subscribers: xbolva00, javed.absar, llvm-commits, hermord

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63391

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364563 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-27 16:45:42 +00:00
Paul Robinson
2a7bae43b0 [debug-info] Make a couple of tests more robust.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364556 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-27 15:53:07 +00:00
Simon Pilgrim
39f0e3cf18 [TargetLowering] SimplifyDemandedVectorElts - add shift/rotate support.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364548 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-27 14:25:54 +00:00
Jinsong Ji
e946fcb412 [PowerPC][HTM] Fix disassembling buffer overflow for tabortdc and others
This was reported in https://bugs.llvm.org/show_bug.cgi?id=41751:
llvm-mc aborted when disassembling tabortdc.

This patch tries to clean up the TM-related DAGs.

* Fix the problem by removing the explicit output of cr0 and making it an implicit def.
* Update the int_ppc_tbegin pattern to accommodate the implicit def of cr0.
* Update the TCHECK operand and int_ppc_tcheck accordingly.
* Add some builtin and disassembly tests.
* Remove the unused CRRC0/crrc0.

Differential Revision: https://reviews.llvm.org/D61935

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364544 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-27 14:11:31 +00:00
Simon Pilgrim
c2a0046960 [TargetLowering] SimplifyDemandedBits - use DemandedElts to better identify partial splat shift amounts
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364541 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-27 13:48:43 +00:00
Simon Pilgrim
59fa40e264 [X86][SSE] Regenerate v48 shuffle test on a variety of targets
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364520 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-27 11:22:23 +00:00
Simon Pilgrim
d56a9b8674 [X86][AVX] SimplifyDemandedVectorElts - combine PERMPD(x) -> EXTRACTF128(X)
If we only use the bottom lane, see if we can simplify this to extract_subvector - which is always at least as quick as PERMPD/PERMQ.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364518 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-27 11:16:03 +00:00
Djordje Todorovic
e02e5c6f21 [ISEL][X86] Tracking of registers that forward call arguments
While lowering calls, collect info about registers that forward arguments
into the following function frame. We store such info in the MachineFunction
of the call. It is used very late, when dumping DWARF info about call site
parameters.

([9/13] Introduce the debug entry values.)

Co-authored-by: Ananth Sowda <asowda@cisco.com>
Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com>
Co-authored-by: Ivan Baev <ibaev@cisco.com>

Differential Revision: https://reviews.llvm.org/D60715

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364516 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-27 10:51:15 +00:00
Diana Picus
65c682330e [GlobalISel] Accept multiple vregs for lowerCall's args
Change the interface of CallLowering::lowerCall to accept several
virtual registers for each argument, instead of just one.  This is a
follow-up to D46018.

CallLowering::lowerReturn was similarly refactored in D49660 and
lowerFormalArguments in D63549.

With this change, we no longer pack the virtual registers generated for
aggregates into one big lump before delegating to the target. Therefore,
the target can decide itself whether it wants to handle them as separate
pieces or use one big register.
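
For example (illustrative C, not from the patch), an aggregate argument such
as the one below used to be packed into one merged virtual register before
reaching the target; it can now be forwarded as one vreg per piece.

    /* Lowering this call produces separate vregs for p.a and p.b. */
    struct pair { int a; int b; };

    int sum(struct pair p) {
        return p.a + p.b;
    }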

ARM and AArch64 have been updated to use the passed in virtual registers
directly, which means we no longer need to generate so many
merge/extract instructions.

NFCI for AMDGPU, Mips and X86.

Differential Revision: https://reviews.llvm.org/D63551

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364512 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-27 09:18:03 +00:00
Diana Picus
d3b26382a9 [GlobalISel] Accept multiple vregs for lowerCall's result
Change the interface of CallLowering::lowerCall to accept several
virtual registers for the call result, instead of just one.  This is a
follow-up to D46018.

CallLowering::lowerReturn was similarly refactored in D49660 and
lowerFormalArguments in D63549.

With this change, we no longer pack the virtual registers generated for
aggregates into one big lump before delegating to the target. Therefore,
the target can decide itself whether it wants to handle them as separate
pieces or use one big register.

ARM and AArch64 have been updated to use the passed in virtual registers
directly, which means we no longer need to generate so many
merge/extract instructions.

NFCI for AMDGPU, Mips and X86.

Differential Revision: https://reviews.llvm.org/D63550

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364511 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-27 09:15:53 +00:00
Diana Picus
4776c1ff97 [GlobalISel] Accept multiple vregs in lowerFormalArgs
Change the interface of CallLowering::lowerFormalArguments to accept
several virtual registers for each formal argument, instead of just one.
This is a follow-up to D46018.

CallLowering::lowerReturn was similarly refactored in D49660. lowerCall
will be refactored in the same way in follow-up patches.

With this change, we forward the virtual registers generated for
aggregates to CallLowering. Therefore, the target can decide itself
whether it wants to handle them as separate pieces or use one big
register. We also copy the pack/unpackRegs helpers to CallLowering to
facilitate this.

ARM and AArch64 have been updated to use the passed in virtual registers
directly, which means we no longer need to generate so many
merge/extract instructions.

AArch64 seems to have had a bug when lowering e.g. [1 x i8*], which was
put into a s64 instead of a p0. Added a test-case which illustrates the
problem more clearly (it crashes without this patch) and fixed the
existing test-case to expect p0.

AMDGPU has been updated to unpack into the virtual registers for
kernels. I think the other code paths fall back for aggregates, so this
should be NFC.

Mips doesn't support aggregates yet, so it's also NFC.

x86 seems to have code for dealing with aggregates, but I couldn't find
the tests for it, so I just added a fallback to DAGISel if we get more
than one virtual register for an argument.

Differential Revision: https://reviews.llvm.org/D63549

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364510 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-27 08:54:17 +00:00
Jay Foad
135f7bd084 [AMDGPU] Fix +DumpCode to print an entry label for the first function
Summary:
The +DumpCode attribute is a horrible hack in AMDGPU to embed the
disassembly of the generated code into the ELF file. It is used by LLPC
to implement an extension that allows the application to read back the
disassembly of the code.

It tries to print an entry label at the start of every function, but
that didn't work for the first function in the module because
DumpCodeInstEmitter wasn't initialised until EmitFunctionBodyStart,
which is too late.

Change-Id: I790d73ddf4f51fd02ab32529380c7cb7c607c4ee

Reviewers: arsenm, tpr, kzhuravl

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63712

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364508 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-27 08:19:28 +00:00
Djordje Todorovic
3aa859711c [MachineFunction] Base support for call site info tracking
Add an attribute into the MachineFunction that tracks call site info.

([8/13] Introduce the debug entry values.)

Co-authored-by: Ananth Sowda <asowda@cisco.com>
Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com>
Co-authored-by: Ivan Baev <ibaev@cisco.com>

Differential Revision: https://reviews.llvm.org/D61061

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364506 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-27 07:48:06 +00:00
Craig Topper
1fadf01eab [X86] Teach selectScalarSSELoad to not narrow volatile loads.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364498 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-27 05:51:56 +00:00
Eli Friedman
f5ce44c687 [ARM] Don't reserve R12 on Thumb1 as an emergency spill slot.
The current implementation of ThumbRegisterInfo::saveScavengerRegister
is bad for two reasons: one, it's buggy, and two, it blocks using R12
for other optimizations.  So this patch gets rid of it, and adds the
necessary support for using an ordinary emergency spill slot on Thumb1.

(Specifically, I think saveScavengerRegister was broken by r305625, and
nobody noticed for two years because the codepath is almost never used.
The new code will also probably not be used much, but it now has better
tests, and if we fail to emit a necessary emergency spill slot we get a
reasonable error message instead of a miscompile.)

A rough outline of the changes in the patch:

1. Gets rid of ThumbRegisterInfo::saveScavengerRegister.
2. Modifies ARMFrameLowering::determineCalleeSaves to allocate an
emergency spill slot for Thumb1.
3. Implements useFPForScavengingIndex, so the emergency spill slot isn't
placed at a negative offset from FP on Thumb1.
4. Modifies the heuristics for allocating an emergency spill slot to
support Thumb1.  This includes fixing ExtraCSSpill so we don't try to
use "lr" as a substitute for allocating an emergency spill slot.
5. Allocates a base pointer in more cases, so the emergency spill slot
is always accessible.
6. Modifies ARMFrameLowering::ResolveFrameIndexReference to compute the
right offset in the new cases where we're forcing a base pointer.
7. Ensures we never generate a load or store with an offset outside of
its frame object.  This makes the heuristics more straightforward.
8. Changes Thumb1 prologue and epilogue emission so it never uses
register scavenging.

Some of the changes to the emergency spill slot heuristics in
determineCalleeSaves affect ARM/Thumb2; hopefully, they should allow
the compiler to avoid allocating an emergency spill slot in cases
where it isn't necessary. The rest of the changes should only affect
Thumb1.

Differential Revision: https://reviews.llvm.org/D63677



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364490 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-26 23:46:51 +00:00
Matt Arsenault
220e7b197a [AMDGPU] Fix Livereg computation during epilogue insertion
The LivePhysRegs calculated in order to find a scratch register in the
epilogue code wrongly used the 'LiveIns' set; it should use the 'LiveOuts'
sets instead. For the liveness, we also consider the operands of the
terminator (return) instruction, which is the insertion point for the
scratch-exec-copy instruction.

Patch by Christudasan Devadasan

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364470 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-26 20:35:18 +00:00
Craig Topper
355b9a12ae [X86] Rework the logic in LowerBuildVectorv16i8 to make better use of any_extend and break false dependencies. Other improvements
This patch rewrites the loop iteration to only visit every other element, starting with element 0, and we work on the "even" element and the "next" element at the same time. The "First" logic has been moved to the bottom of the loop and doesn't run on every element. I believe it could previously create dangling nodes, since we didn't check whether we were going to use SCALAR_TO_VECTOR for the first insertion. I got rid of the "First" variable and just do a null check on V, which should be equivalent. We also no longer use undef as the starting V for vectors with no zeroes, to avoid false dependencies. This matches v8i16.

I've changed all the extends and OR operations to use MVT::i32 since that's what they'll be promoted to anyway. I've tried to use zero_extend only when necessary and use any_extend otherwise. This resulted in some improvements in tests where we are now able to promote aligned (i32 (extload i8)) to a 32-bit load.

Differential Revision: https://reviews.llvm.org/D63702

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364469 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-26 20:16:19 +00:00
Simon Pilgrim
6a67d10b3b [X86][SSE] getFauxShuffleMask - handle OR(x,y) where x and y have no overlapping bits
Create a per-byte shuffle mask based on computeKnownBits for each operand: if, for every byte, at least one of the operands (or both) has that byte known zero, then the two can be safely blended.

Fixes PR41545
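
A scalar C sketch of the underlying observation (hedged; the real code works
per byte on vectors using known-bits):

    #include <stdint.h>

    /* Precondition from known-bits: (x & y) == 0 in every byte. Then OR
       picks each result byte from exactly one operand, i.e. it is a
       per-byte blend that a shuffle mask can express. */
    uint32_t or_as_blend(uint32_t x, uint32_t y) {
        return x | y;
    }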

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364458 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-26 18:21:26 +00:00
Simon Pilgrim
cf436bba72 [X86][AVX] Add reduced test case for PR41545
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364454 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-26 17:56:53 +00:00
Ulrich Weigand
3f58e45931 Allow matching extend-from-memory with strict FP nodes
This implements a small enhancement to https://reviews.llvm.org/D55506

Specifically, while we were able to match strict FP nodes for
floating-point extend operations with a register as source, this
did not work for operations with memory as source.

That is because for regular operations, this is represented as
a combined "extload" node (which is a variant of a load SD node);
but there is no equivalent using a strict FP operation.

However, it turns out that even in the absence of an extload
node, we can still just match the operations explicitly, e.g.
   (strict_fpextend (f32 (load node:$ptr)))

This patch implements that method to match the LDEB/LXEB/LXDB
SystemZ instructions even when the extend uses a strict-FP node.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364450 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-26 17:19:12 +00:00
Thomas Lively
65adefc141 [WebAssembly] Omit wrap on i64x2.{shl,shr*} ISel when possible
Summary:
Since the WebAssembly SIMD shift instructions take i32 operands, we
truncate the i64 shift operand of <2 x i64> shifts to i32 during ISel. When
the i64 operand is sign-extended from i32, this CL drops the sign extension
instead of adding a wrap instruction.
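
A C sketch of why the wrap is redundant in that case (hedged illustration;
names are hypothetical):

    #include <stdint.h>

    /* Sign-extending an i32 to i64 and then wrapping back to i32 yields the
       original value, so the wrap can be dropped. */
    uint64_t shl_lane(uint64_t lane, int32_t amt) {
        int64_t extended = (int64_t)amt;      /* i32 -> i64 sign extension */
        int32_t wrapped = (int32_t)extended;  /* i64 -> i32 wrap: equals amt */
        return lane << (wrapped & 63);
    }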

Reviewers: dschuff, aheejin

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63615

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364446 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-26 16:19:59 +00:00
Thomas Lively
78cbc90959 [WebAssembly] Implement tail calls and unify tablegen call classes
Summary:
Implements direct and indirect tail calls enabled by the 'tail-call'
feature in both DAG ISel and FastISel. Updates existing call tests and
adds new tests including a binary encoding test.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62877

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364445 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-26 16:17:15 +00:00
Evandro Menezes
6016bbfa46 [CodeGen] Improve formatting of jump tables (NFC)
Split jump tables into individual lines and fix spacing.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364436 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-26 15:11:31 +00:00
Simon Pilgrim
d889d38a51 [X86][SSE] X86TargetLowering::isCommutativeBinOp - add PCMPEQ
Allows narrowInsertExtractVectorBinOp to reduce vector size

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364432 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-26 14:40:49 +00:00
Simon Pilgrim
95db6ffd34 [X86][SSE] X86TargetLowering::isBinOp - add PCMPGT
Allows narrowInsertExtractVectorBinOp to reduce vector size

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364431 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-26 14:34:41 +00:00
Roman Lebedev
1c83fed7e7 [X86] X86DAGToDAGISel::matchBitExtract(): pattern c: truncation awareness
Summary:
The one thing of note here is that the 'bitwidth' constant (32/64) was previously pessimistic.
Given `x & (-1 >> (C - z))`, we were taking `C` to be `bitwidth(x)`, but in reality
we want the `(-1 >> (C - z))` pattern to mean "the low z bits must be all-ones".
For that, `C` should be `bitwidth(-1 >> (C - z))`, i.e. the width of the shift operation itself.

The last pattern, D, does not seem to exhibit any of these truncation issues,
although it has the opposite problem: if we extract low bits (no shift) from i64
and then truncate to i32, we fail to shrink this 64-bit extraction into a 32-bit one.
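
A C sketch of the pattern in question (assuming 1 <= z <= 32; illustration
only):

    #include <stdint.h>

    /* x & (-1 >> (32 - z)) keeps the low z bits of x; the 32 must be the
       width of the shifted all-ones value, not necessarily the width of x. */
    uint32_t extract_low_bits(uint32_t x, unsigned z) {
        return x & (UINT32_MAX >> (32 - z));
    }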

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62806

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364419 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-26 12:19:47 +00:00