21321 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin
64373efcab [AMDGPU] Fix illegal shrink of V_SUBB_U32 and V_ADDC_U32
If there is an immediate operand we shall not shrink V_SUBB_U32
and V_ADDC_U32, it does not fit e32 encoding.

Differential Revison: https://reviews.llvm.org/D34291

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305840 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-20 20:33:44 +00:00
Aditya Nandakumar
c7c608fe5e [GISel]: Add G_FMA opcode for fused multiply adds
https://reviews.llvm.org/D34372

Reviewed by dsanders

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305824 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-20 19:25:23 +00:00
Matt Arsenault
5516bd1387 AMDGPU: Do operand folding in program order
Before it was possible to partially fold use instructions
before the defs. After the xor is folded into a copy, the same
mov can end up in the fold list twice, so on the second attempt
it will fail expecting to see a register to fold.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305821 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-20 18:56:32 +00:00
Matthias Braun
0cc137e051 RegisterScavenging: Followup to r305625
This does some improvements/cleanup to the recently introduced
scavengeRegisterBackwards() functionality:

- Rewrite findSurvivorBackwards algorithm to use the existing
  LiveRegUnit::accumulateBackward() code. This also avoids the Available
  and Candidates bitset and just need 1 LiveRegUnit instance
  (= 1 bitset).
- Pick registers in allocation order instead of register number order.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305817 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-20 18:43:14 +00:00
Matt Arsenault
73854fd751 AMDGPU: Preserve undef when folding register operands
If the source was a copy of an undef register, this would
produce a read of an undefined register which is a verifier
error.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305816 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-20 18:41:31 +00:00
Stanislav Mekhanoshin
423a449bd5 [AMDGPU] Eliminate SGPR to VGPR copy when possible
SGPRs are generally cheaper, so try to use them over VGPRs.

Differential Revision: https://reviews.llvm.org/D34130

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305815 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-20 18:32:42 +00:00
Matt Arsenault
4cabef582f AMDGPU: Fix crash with undef vreg input operand
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305814 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-20 18:28:02 +00:00
Sanjay Patel
aab686b3f7 [x86] enable CGP memcmp() expansion for 2/4/8 byte sizes
There are a couple of potential improvements as seen in the IR and asm:
1. We're unnecessarily extending to a larger type to compare values.
2. The codegen for (select cond, 1, -1) could avoid a cmov.
(or we could change the order of the compares, so we have a select with 0 operand)


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305802 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-20 15:58:30 +00:00
Simon Pilgrim
85838270a5 [X86][SSE] Relax 0/-1 vector element insertion to work for any vector with >=16bit elements
Shuffle lowering/combining now does a good job for 256/512-bit vectors - we don't need to prevent this

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305801 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-20 15:19:02 +00:00
Tim Northover
f37b0dbf16 DAG: correctly legalize UMULO.
We were incorrectly sign extending into the high word (as you would for
SMULO) when legalizing UMULO in terms of a wider full multiplication.

Patch by James Duley.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305800 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-20 15:01:38 +00:00
Daniel Sanders
a4b49d696a [globalisel][tablegen] Add support for COPY_TO_REGCLASS.
Summary:
As part of this
* Emitted instructions now have named MachineInstr variables associated
  with them. This isn't particularly important yet but it's a small step
  towards multiple-insn emission.
* constrainSelectedInstRegOperands() is no longer hardcoded. It's now added
  as the ConstrainOperandsToDefinitionAction() action. COPY_TO_REGCLASS uses
  an alternate constraint mechanism ConstrainOperandToRegClassAction() which
  supports arbitrary constraints such as that defined by COPY_TO_REGCLASS.

Reviewers: ab, qcolombet, t.p.northover, rovka, kristof.beyls, aditya_nandakumar

Reviewed By: ab

Subscribers: javed.absar, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D33590



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305791 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-20 12:36:34 +00:00
Simon Pilgrim
2bba203993 Fixed test name. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305787 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-20 10:24:06 +00:00
Igor Breger
515e735353 [GlobalISel][X86] Get correct RegClass for given RegBank.
Summary:
In some cases RegClass depends on target feature. Hight (16-31) vector registers exist only if AVX512f available.
Split from https://reviews.llvm.org/D33665

Reviewers: qcolombet, t.p.northover, zvi, guyblank

Reviewed By: t.p.northover, guyblank

Subscribers: guyblank, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D33952

Conflicts:
	test/CodeGen/X86/GlobalISel/select-memop-scalar.mir

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305784 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-20 09:15:10 +00:00
Igor Breger
e193a7694c [GlobalISel] combine not symmetric merge/unmerge nodes.
Summary:
In some cases legalization ends up with not symmetric merge/unmerge nodes.
Transform it to merge/unmerge nodes.

Reviewers: t.p.northover, qcolombet, zvi

Reviewed By: t.p.northover

Subscribers: rovka, kristof.beyls, guyblank, llvm-commits

Differential Revision: https://reviews.llvm.org/D33626

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305783 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-20 08:54:17 +00:00
Igor Breger
88c4c546d4 [GlobalISel][X86] add legalizer mir tests. NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305781 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-20 08:30:48 +00:00
Alexandros Lamprineas
46e03a371d [ARM] Support constant pools in data when generating execute-only code.
Resubmission of r305387, which was reverted at r305390. The Address
Sanitizer caught a stack-use-after-scope of a Twine variable. This
is now fixed by passing the Twine directly as a function parameter.

The ARM backend asserts against constant pool lowering when it generates
execute-only code in order to prevent the generation of constant pools in
the text section. It appears that target independent optimizations might
generate DAG nodes that represent constant pools. By lowering such nodes
as global addresses we don't violate the semantics of execute-only code
and also it is guaranteed that execute-only behaves correct with the
position-independent addressing modes that support execute-only code.

Differential Revision: https://reviews.llvm.org/D33773

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305776 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-20 07:20:52 +00:00
Matt Arsenault
5a7b3305c5 AMDGPU: Fix scratch wave offset relative FI expansion
The offset may not be an inline immediate, so this needs
to be materialized into a register. The post-RA run of
SIShrinkInstructions is able to fold it later if it can.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305761 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-19 23:47:21 +00:00
Stanislav Mekhanoshin
7c44c2a308 [AMDGPU] Add infer address spaces pass before SROA
It adds it for the target after inlining but before SROA where
we can get most out of it.

Differential Revision: https://reviews.llvm.org/D34366

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305759 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-19 23:17:36 +00:00
Sanjoy Das
7ef9a64157 Fix machine instruction in test case
The AMD64rm instruction used in the test case was incorrect.  Since
the first input register to AND64rm is tied to output register, they
must be the same.

Thanks for Jesper Antonsson for pointing this out!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305756 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-19 22:35:48 +00:00
Nico Weber
4f5f095c93 Revert r305382, it caused PR33513.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305735 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-19 19:48:59 +00:00
Sanjay Patel
7a0e66cc56 [CGP, PowerPC] try to constant fold before creating loads for memcmp expansion
This is the last step needed to avoid regressions for x86 before we flip the switch to allow 
expansion of the smallest set of memcpy() via CGP. The DAG version checks for constant strings, 
so we need to do that here too.

FWIW, the 2 constant test is not handled by LibCallSimplifier::optimizeMemCmp() because that 
code is limited to 8-bit constant arrays. LibCallSimplifier will also fail to optimize some 1 
constant tests because its alignment requirements are too strict (shouldn't require alignment 
for a constant operand).

Differential Revision: https://reviews.llvm.org/D34071


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305734 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-19 19:48:35 +00:00
Hans Wennborg
08030e7683 Revert r304824 "Fix PR23384 (part 3 of 3)"
This seems to be interacting badly with ASan somehow, causing false reports of
heap-buffer overflows: PR33514.

> Summary:
> The patch makes instruction count the highest priority for
> LSR solution for X86 (previously registers had highest priority).
>
> Reviewers: qcolombet
>
> Differential Revision: http://reviews.llvm.org/D30562
>
> From: Evgeny Stupachenko <evstupac@gmail.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305720 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-19 17:57:15 +00:00
Nirav Dave
d9d2f4eb1f Add test for store merge with noimplicitfloat
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305697 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-19 15:18:20 +00:00
Tom Stellard
865802ce11 AMDGPU/GlobalISel: Mark G_BITCAST s32 <--> <2 x s16> legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D34129

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305692 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-19 13:15:45 +00:00
Igor Breger
c56841e7f4 [GlobalISel][X86] Fold FI/G_GEP into LDR/STR instruction addressing mode.
Summary: Implement some of the simplest addressing modes.It should help to test ABI.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D33888

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305691 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-19 13:12:57 +00:00
Diana Picus
afb808fd9e [ARM] GlobalISel: Support G_ICMP for s8 and s16
Widen to s32 (like all other binary ops).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305683 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-19 11:47:28 +00:00
Diana Picus
df1f7afc1e [ARM] GlobalISel: Support G_ICMP for i32 and pointers
Add support throughout the pipeline:
- mark as legal for s32 and pointers
- map to GPRs
- lower to a sequence of instructions, which moves 0 or 1 into the
  result register based on the flags set by a CMPrr

We have copied from FastISel a helper function which maps CmpInst
predicates into ARMCC codes. Ideally, we should be able to move it
somewhere that both FastISel and GlobalISel can use.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305672 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-19 09:40:51 +00:00
Guy Blank
f00ca4f77d [X86] Simplify vector-shuffle-v48 test. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305670 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-19 08:58:13 +00:00
Sanjay Patel
ffa33d04b9 [x86] specify triples and auto-generate complete checks; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305656 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-18 21:48:44 +00:00
Sanjay Patel
4e08dc9667 [x86] specify triples and auto-generate complete checks; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305655 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-18 21:42:19 +00:00
Sanjay Patel
6661e76505 [x86] specify triple and auto-generate checks; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305654 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-18 21:30:57 +00:00
Sanjay Patel
5c2d0a5c5e x86] adjust test constants to maintain coverage; NFC
Increment (add 1) could be transformed to sub -1, and we'd lose coverage for these patterns.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305646 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-18 14:45:23 +00:00
Sanjay Patel
a9fb6f91c6 [x86] adjust test constants to maintain coverage; NFC
Increment (add 1) could be transformed to sub -1, and we'd lose coverage for these patterns.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305645 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-18 14:23:47 +00:00
Sanjay Patel
1004a9ebc5 [x86] adjust test constants to maintain coverage; NFC
Increment (add 1) could be transformed to sub -1, and we'd lose coverage for these patterns.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305644 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-18 14:01:32 +00:00
Matthias Braun
cd03942492 RegScavenging: Add scavengeRegisterBackwards()
Re-apply r276044/r279124/r305516. Fixed a problem where we would refuse
to place spills as the very first instruciton of a basic block and thus
artifically increase pressure (test in
test/CodeGen/PowerPC/scavenging.mir:spill_at_begin)

This is a variant of scavengeRegister() that works for
enterBasicBlockEnd()/backward(). The benefit of the backward mode is
that it is not affected by incomplete kill flags.

This patch also changes
PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register
scavenger in backwards mode.

Differential Revision: http://reviews.llvm.org/D21885

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305625 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-17 02:08:18 +00:00
Davide Italiano
1cea15532e [SelectionDAG] Update Loop info after splitting critical edges.
The analysis is expected to be preserved by SelectionDAG.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305621 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-17 00:56:27 +00:00
Sam Clegg
46016f24b4 [WebAssembly] Use __stack_pointer global when writing wasm binary
This ensures that symbolic relocations are generated for stack
pointer manipulations.

These relocations are of type R_WEBASSEMBLY_GLOBAL_INDEX_LEB.
This change also adds support for reading relocations of this
type in WasmObjectFile.cpp.

Since its a globally imported symbol this does mean that
the get_global/set_global instruction won't be valid until
the objects are linked that global used in no longer an
imported global.

Differential Revision: https://reviews.llvm.org/D34172

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305616 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-16 23:59:10 +00:00
Matthias Braun
bd1a266898 Revert "RegScavenging: Add scavengeRegisterBackwards()"
Revert because of reports of some PPC input starting to spill when it
was predicted that it wouldn't and no spillslot was reserved.

This reverts commit r305516.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305566 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-16 17:48:08 +00:00
Yonghong Song
1aa4ba7ed9 bpf: avoid load from read-only sections
If users tried to have a structure decl/init code like below
   struct test_t t = { .memeber1 = 45 };
It is very likely that compiler will generate a readonly section
to hold up the init values for variable t. Later load of t members,
e.g., t.member1 will result in a read from readonly section.

BPF program cannot handle relocation. This will force users to
write:
  struct test_t t = {};
  t.member1 = 45;
This is just inconvenient and unintuitive.

This patch addresses this issue by implementing BPF PreprocessISelDAG.
For any load from a global constant structure or an global array of
constant struct, it attempts to
translate it into a constant directly. The traversal of the
constant struct and other constant data structures are similar
to where the assembler emits read-only sections.

Four different unit test cases are also added to cover
different scenarios.

Signed-off-by: Yonghong Song <yhs@fb.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305560 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-16 15:41:16 +00:00
Daniel Neilson
470c6959b7 [Atomics] Rename and change prototype for atomic memcpy intrinsic
Summary:

Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html

This change is to alter the prototype for the atomic memcpy intrinsic. The prototype itself is being changed to more closely resemble the semantics and parameters of the llvm.memcpy intrinsic -- to ease later combination of the llvm.memcpy and atomic memcpy intrinsics. Furthermore, the name of the atomic memcpy intrinsic is being changed to make it clear that it is not a generic atomic memcpy, but specifically a memcpy is unordered atomic.

Reviewers: reames, sanjoy, efriedma

Reviewed By: reames

Subscribers: mzolotukhin, anna, llvm-commits, skatkov

Differential Revision: https://reviews.llvm.org/D33240

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305558 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-16 14:43:59 +00:00
Simon Dardis
7810ae7481 Revert "[mips][microMIPS] Extending size reduction pass with ADDIUSP and ADDIUR1SP"
This reverts commit r305455. This commit was reported as breaking one of
the sanitizer buildbots. Reverting until lab.llvm.org comes back online.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305557 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-16 14:00:33 +00:00
Krzysztof Parzyszek
c86dcc6411 [Hexagon] Don't kill live registers when creating mux out of tfr
The second part of r305300: when placing the mux at the later location,
make sure that it won't use any register that was killed between the
two original instructions. Remove any such kills and transfer them to
the mux.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305553 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-16 12:24:03 +00:00
Matthias Braun
02688b00ef RegScavenging: Add scavengeRegisterBackwards()
Re-apply r276044/r279124. Trying to reproduce or disprove the ppc64
problems reported in the stage2 build last time, which I cannot
reproduce right now.

This is a variant of scavengeRegister() that works for
enterBasicBlockEnd()/backward(). The benefit of the backward mode is
that it is not affected by incomplete kill flags.

This patch also changes
PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register
scavenger in backwards mode.

Differential Revision: http://reviews.llvm.org/D21885

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305516 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-15 22:14:55 +00:00
Alexander Timofeev
7807f69e9b DivergencyAnalysis patch for review
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305494 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-15 19:33:10 +00:00
Lei Huang
edd54f82b6 [MachineLICM] Hoist TOC-based address instructions
Add condition for MachineLICM to safely hoist instructions that utilize
non constant registers that are reserved.

On PPC, global variable access is done through the table of contents (TOC)
which is always in register X2.  The ABI reserves this register in any
functions that have calls or access global variables.

A call through a function pointer involves saving, changing and restoring
this register around the call and thus MachineLICM does not consider it to
be invariant. We can however guarantee the register is preserved across the
call and thus is invariant.

Differential Revision: https://reviews.llvm.org/D33562

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305490 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-15 18:29:59 +00:00
Arnold Schwaighofer
00cc11273b ISel: Fix FastISel of swifterror values
The code assumed that we process instructions in basic block order.  FastISel
processes instructions in reverse basic block order. We need to pre-assign
virtual registers before selecting otherwise we get def-use relationships wrong.

This only affects code with swifterror registers.

rdar://32659327

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305484 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-15 17:34:42 +00:00
Hiroshi Inoue
1a33599c52 [PowerPC] fix potential verification errors on CFENCE8
This patch fixes a potential verification error (64-bit register operands for cmpw) with -verify-machineinstrs.

Differential Revision: https://reviews.llvm.org/D34208



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305479 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-15 16:51:28 +00:00
Simon Pilgrim
d62e545b2b [X86][AVX2] Fix issue in lowerV8I16GeneralSingleInputVectorShuffle that was assuming v8i16 vectors
We can use this with v16i16/v32i16 as well.

Found during fuzz testing.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305472 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-15 14:52:30 +00:00
Simon Pilgrim
81f6df5c35 Revert r305465: [X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right instructions).
This is causing windows buildbot failures

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305470 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-15 14:39:34 +00:00
Ayman Musa
1abb66152a [X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right instructions).
AVX512 compare instructions return v*i1 types.
In cases where the number of elements in the returned value are less than 8, clang adds zeroes to get a mask of v8i1 type.
Later on it's replaced with CONCAT_VECTORS, which then is lowered to many DAG nodes including insert/extract element and shift right/left nodes.
The fact that AVX512 compare instructions put the result in a k register and zeroes all its upper bits allows us to remove the extra nodes simply by copying the result to the required register class.

When lowering, identify these cases and transform them into an INSERT_SUBVECTOR node (marked legal), then catch this pattern in instructions selection phase and transform it into one avx512 cmp instruction.

Differential Revision: https://reviews.llvm.org/D33188



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305465 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-15 13:02:37 +00:00