21008 Commits

Author SHA1 Message Date
Amara Emerson
f69362835a Re-commit r302678, fixing PR33053.
The issue was that the AArch64 TTI hook allowed unpacked integer cmp reductions
which didn't have a lowering.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303211 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-16 21:29:22 +00:00
Tim Shen
50ecf9b407 [PPC] Add -ppc-asm-full-reg-names to atomic-2.ll. NFC.
Differential Revision: https://reviews.llvm.org/D32763

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303209 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-16 20:58:55 +00:00
Tim Shen
1a2e7acb99 [PPC] Lower load acquire/seq_cst trailing fence to cmp + bne + isync.
Summary:
This fixes pr32392.

The lowering pipeline is:
llvm.ppc.cfence in IR -> PPC::CFENCE8 in isel -> Actual instructions in
expandPostRAPseudo.

expandPostRAPseudo is chosen because earlier passes are likely to
eliminate instructions like cmpw 3, 3 (early CSE) and bne- 7, .+4
(some branch passes).

Differential Revision: https://reviews.llvm.org/D32763

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303205 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-16 20:18:06 +00:00
Reid Kleckner
c4cdac05ab Revert "[X86] Replace slow LEA instructions in X86"
This reverts commit r303183, it broke various buildbots and introduced
sanitizer errors.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303199 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-16 19:55:03 +00:00
Nirav Dave
acc2c1d71d Elide stores which are overwritten without being observed.
Summary:
In SelectionDAG, when a store is immediately chained to another store
to the same address, elide the first store as it has no observable
effects. This causes small improvements when dealing with intrinsics
lowered to stores.

Test notes:

* Many testcases overwrite store addresses multiple times and needed
  minor changes, mainly making stores volatile to prevent the
  optimization from optimizing the test away.

* Many X86 test cases optimized out instructions associated with
  va_start.

* Note that test_splat in CodeGen/AArch64/misched-stp.ll no longer has
  dependencies to check and can probably be removed and potentially
  replaced with another test.

Reviewers: rnk, john.brawn

Subscribers: aemerson, rengolin, qcolombet, jyknight, nemanjai, nhaehnle, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33206

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303198 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-16 19:43:56 +00:00
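The store-elision rule described in acc2c1d71d above can be illustrated with a toy model. This is a minimal sketch, not the SelectionDAG implementation: the `Store` record and `elide_dead_stores` are hypothetical names, a flat list stands in for the store chain, and addresses are compared as plain strings (store sizes are ignored).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Store:
    address: str          # symbolic address, e.g. "p" (assumption of this sketch)
    value: int
    volatile: bool = False


def elide_dead_stores(chain: List[Store]) -> List[Store]:
    """Drop a store when the very next store in the chain writes the same address.

    Toy assumption: consecutive list entries are "immediately chained", so
    nothing can observe the first store's value in between.
    """
    kept: List[Store] = []
    for i, store in enumerate(chain):
        nxt = chain[i + 1] if i + 1 < len(chain) else None
        overwritten = (
            nxt is not None
            and nxt.address == store.address
            and not store.volatile  # volatile stores must be preserved
        )
        if not overwritten:
            kept.append(store)
    return kept
```

In this model, marking the first store volatile keeps it alive, which mirrors why the affected test cases were changed to use volatile stores.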
Renato Golin
15886e1270 Revert "[ARM] Mark LEApcrel instructions as isAsCheapAsAMove"
Revert "[ARM] Mark LEApcrel as not having side effects"

This reverts commit r303054 and r303053, as they broke the ARM
self-hosting buildbots:

http://lab.llvm.org:8011/builders/clang-cmake-thumbv7-a15-full-sh/builds/1550

http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost-neon/builds/1349

http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost/builds/1845

Offline investigation in progress.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303193 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-16 17:59:07 +00:00
Lama Saba
371c802090 [X86] Replace slow LEA instructions in X86
According to Intel's Optimization Reference Manual for SNB+:
  " For LEA instructions with three source operands and some specific situations, instruction latency has increased to 3 cycles, and must
    dispatch via port 1:
  - LEA that has all three source operands: base, index, and offset
  - LEA that uses base and index registers where the base is EBP, RBP, or R13
  - LEA that uses RIP relative addressing mode
  - LEA that uses 16-bit addressing mode "
  This patch currently handles the first 2 cases only.
 
Differential Revision: https://reviews.llvm.org/D32277



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303183 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-16 16:01:36 +00:00
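The first two slow-LEA cases quoted in 371c802090 above can be written as a small predicate. This is a sketch under stated assumptions: `is_slow_lea` and its operand encoding (register names as upper-case strings, `None` for an absent register, `0` for an absent offset) are hypothetical and do not reflect how the X86 backend represents address operands. The RIP-relative and 16-bit cases are deliberately not modeled, matching the scope the patch describes.

```python
from typing import Optional

# Base registers that make a base+index LEA slow, per the quoted manual text.
SLOW_BASE_REGS = {"EBP", "RBP", "R13"}


def is_slow_lea(base: Optional[str], index: Optional[str], offset: int) -> bool:
    """Return True for the two slow-LEA forms the patch says it handles."""
    # Case 1: all three source operands are present (base, index, and offset).
    three_operands = base is not None and index is not None and offset != 0
    # Case 2: base and index are used, and the base is EBP, RBP, or R13.
    slow_base = base in SLOW_BASE_REGS and index is not None
    return three_operands or slow_base
```

For example, `is_slow_lea("RAX", "RBX", 8)` and `is_slow_lea("RBP", "RBX", 0)` are both flagged, while `is_slow_lea("RAX", "RBX", 0)` is not.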
Igor Breger
6ab1190a77 [GlobalISel][X86] Split memop test file. NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303169 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-16 13:37:31 +00:00
Francis Visoiu Mistrih
cc8486f611 [ShrinkWrapping] Handle restores on no-return paths
Shrink-wrapping uses post-dominators to find a restore point that
post-dominates all the uses of CSR / stack.

The way dominator trees are modeled in LLVM today is that unreachable
blocks are not present in a generic dominator tree, so, an unreachable node is
dominated by anything: include/llvm/Support/GenericDomTree.h:467.

Since for post-dominators, a no-return block is considered
"unreachable", calling findNearestCommonDominator on an unreachable node
A and a non-unreachable node B will return B, which can be wrong. If we
find such a node, we bail out since there is no good restore point
available.

rdar://problem/30186931

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303130 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-15 23:13:35 +00:00
Tim Northover
87dff1dfcd AArch64: use linker-private symbols for globals in MachO.
We don't use section-relative relocations on AArch64, so all symbols must be at
least visible to the linker (i.e. properly global or l_whatever, but not
L_whatever).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303118 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-15 21:51:38 +00:00
Hans Wennborg
05f671ecfa Revert r302678 "[AArch64] Enable use of reduction intrinsics."
This caused PR33053.

Original commit message:

> The new experimental reduction intrinsics can now be used, so I'm enabling this
> for AArch64. We will need this for SVE anyway, so it makes sense to do this for
> NEON reductions as well.
>
> The existing code to match shufflevector patterns is replaced with a direct
> lowering of the reductions to AArch64-specific nodes. Tests updated with the
> new, simpler, representation.
>
> Differential Revision: https://reviews.llvm.org/D32247

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303115 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-15 20:59:32 +00:00
Kyle Butt
e6202480d9 CodeGen: BlockPlacement: Increase tail duplication size for O3.
At O3 we are more willing to increase size if we believe it will improve
performance. The current threshold for tail-duplication of 2 instructions is
conservative, and can be relaxed at O3.

Benchmark results:
llvm test-suite:
6% improvement in aha, due to duplication of loop latch
3% improvement in hexxagon

2% slowdown in lpbench. Seems related, but couldn't completely diagnose.

Internal Google benchmark:
Produces 4% improvement on internal Google protocol buffer serialization
benchmarks.

Differential-Revision: https://reviews.llvm.org/D32324

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303084 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-15 17:30:47 +00:00
Simon Pilgrim
2223371da5 [NVPTX] Don't flag StoreParam/LoadParam memory chain operands as ReadMem/WriteMem (PR32146)
Follow up to D33147

NVPTXTargetLowering::LowerCall was trusting the default argument values.

Fixes another 17 of the NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146.

Differential Revision: https://reviews.llvm.org/D33189

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303082 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-15 17:17:44 +00:00
Florian Hahn
eb48e7d58f [AArch64] Enable FeatureFuseAES on Cortex-A72.
This patch enables fusing dependent AESE/AESMC and AESD/AESIMC
instruction pairs on Cortex-A72, as recommended in the Software
Optimization Guide, section 4.10.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303073 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-15 15:15:22 +00:00
Dmitry Preobrazhensky
232c3d52ea [AMDGPU][MC] Corrected several VI opcodes to avoid printing _e64
See bug 32936: https://bugs.llvm.org//show_bug.cgi?id=32936

Reviewers: artem.tamazov, vpykhtin

Differential Revision: https://reviews.llvm.org/D33123

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303070 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-15 14:28:23 +00:00
Dinar Temirbulatov
8632b74d7d Test commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303059 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-15 13:14:04 +00:00
John Brawn
57bb7925b5 [ARM] Mark LEApcrel instructions as isAsCheapAsAMove
Doing this means that if an LEApcrel is used in two places we will rematerialize
instead of generating two MOVs. This is particularly useful for printfs using
the same format string, where we want to generate an address into a register
that's going to get corrupted by the call.

Differential Revision: https://reviews.llvm.org/D32858


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303054 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-15 11:57:54 +00:00
John Brawn
91719efd8f [ARM] Mark LEApcrel as not having side effects
Doing this lets us hoist it out of loops, and I've also marked it as
rematerializable the same as the thumb1 and thumb2 counterparts.

It looks like it being marked as such was just a mistake, as the commit that
made that change only mentions LEApcrelJT and in thumb1 and thumb2 only the
LEApcrelJT instructions were marked as having side-effects, so it looks like
the intent was to only mark LEApcrelJT as having side-effects but LEApcrel was
accidentally marked as such also.

Differential Revision: https://reviews.llvm.org/D32857


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303053 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-15 11:50:21 +00:00
Ayman Musa
eadb58fda7 [X86] Relocate code of replacement of subtarget unsupported masked memory intrinsics to run also on -O0 option.
Currently, when masked load, store, gather or scatter intrinsics are used, the CodeGenPrepare pass checks whether the subtarget supports them; if not, it replaces them with scalar code. This is a functional transformation, not an optimization, so it is not optional.

The CodeGenPrepare pass does not run when the optimization level is set to CodeGenOpt::None (-O0).

A functional transformation should run at all optimization levels, so I created a new pass that runs at every optimization level and does nothing more than this transformation.

Differential Revision: https://reviews.llvm.org/D32487



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303050 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-15 11:30:54 +00:00
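To make the "replace them with scalar code" step in eadb58fda7 above concrete, here is a minimal sketch of what a scalarized masked load computes. It is an illustration only: `scalarized_masked_load` is a hypothetical name, memory is modeled as a Python sequence, and the real pass emits IR with per-lane branches rather than a loop.

```python
from typing import List, Sequence


def scalarized_masked_load(
    memory: Sequence[int],
    base: int,
    mask: Sequence[bool],
    passthru: Sequence[int],
) -> List[int]:
    """Lane-by-lane equivalent of a masked vector load.

    Enabled lanes read memory[base + lane]; disabled lanes keep the
    pass-through value and never touch memory.
    """
    result = list(passthru)
    for lane, enabled in enumerate(mask):
        if enabled:
            result[lane] = memory[base + lane]
    return result
```

Because disabled lanes are never read, the result must be the same at -O0 as at higher levels, which is why the transformation is functional rather than an optimization.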
Igor Breger
4448b5e925 [GlobalISel][X86] G_BR instruction select test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303036 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-15 07:03:38 +00:00
Craig Topper
ac7c8ceef7 [X86] Add avx512vl command lines to the 128/256-bit vector-lzcnt tests so we can see what compare instructions are being used in the lookup table code.
I noticed the 512-bit lzcnts don't use the X86 specific lookup table code and instead use the EXPAND case in LegalizeDAG. I was toying around with fixing this and noticed it would require compare instructions that generate i1 masks and then converting from mask to vector. Then I noticed that we don't test which compares are used with avx512vl and no avx512cd.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303020 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-14 19:38:11 +00:00
Craig Topper
eba544c71c [X86] Cleanup some of the check-prefixes in the vector-lzcnt tests.
Remove an unneeded prefix from the 32-bit command line. Make all the 64-bit triples match. Replace ALL with X64 and remove it from the 32-bit test.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303019 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-14 19:38:09 +00:00
Simon Pilgrim
8390fe6ccf [X86][AVX] Allow 32-bit targets to peek through subvectors to extract constant splats for vXi64 shifts.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303009 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-14 11:46:26 +00:00
Simon Pilgrim
dc5d066e45 [X86][AVX] Add additional 32-bit target vector shift tests
Shows the issue with 32-bit targets not being able to peek through subvectors to extract constant splats

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303008 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-14 11:13:03 +00:00
Simon Pilgrim
72a3a14d8b [SelectionDAG] Added support for EXTRACT_SUBVECTOR/CONCAT_VECTORS demandedelts in ComputeNumSignBits
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302997 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-13 22:10:58 +00:00
Simon Pilgrim
e7198dbba5 [X86][SSE] Test showing missing EXTRACT_SUBVECTOR/CONCAT_VECTORS demandedelts support in ComputeNumSignBits
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302994 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-13 21:50:18 +00:00
Simon Pilgrim
bacfc66c2e [SelectionDAG] Add VECTOR_SHUFFLE support to ComputeNumSignBits
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302993 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-13 19:57:10 +00:00
Simon Pilgrim
ba64adafb1 [X86][SSE] Test showing inability of ComputeNumSignBits to resolve shuffles
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302992 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-13 17:41:07 +00:00
Simon Pilgrim
3c8af5480a [x86, SSE] AVX1 PR28129 (256-bit all-ones rematerialization)
Further perf tests on Jaguar indicate that:

vxorps  %ymm0, %ymm0, %ymm0
vcmpps  $15, %ymm0, %ymm0, %ymm0

is consistently faster (by about 9%) than:

vpcmpeqd  %xmm0, %xmm0, %xmm0
vinsertf128  $1, %xmm0, %ymm0, %ymm0

Testing equivalent code on a SandyBridge (E5-2640) puts it slightly (~3%) faster as well.

Committed on behalf of @dtemirbulatov

Differential Revision: https://reviews.llvm.org/D32416

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302989 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-13 13:42:35 +00:00
Dylan McKay
af4bf77761 [AVR] When lowering Select8/Select16, put newly generated MBBs in the same spot
Contributed by Dr. Gergő Érdi.

Fixes a bug.

Raised from (https://github.com/avr-rust/rust/issues/49).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302973 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-13 00:22:34 +00:00
Sanjay Patel
62352f820d [x86] add vector tests for demanded bits; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302949 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-12 20:53:48 +00:00
Changpeng Fang
ed4c8077b0 AMDGPU/SI: Don't promote to vector if the load/store is volatile.
Summary:
  We should not change volatile loads/stores in promoting alloca to vector.

Reviewers:
  arsenm

Differential Revision:
  http://reviews.llvm.org/D33107

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302943 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-12 20:31:12 +00:00
Simon Pilgrim
4633bbb4c7 [NVPTX] Don't flag StoreRetVal memory chain operands as ReadMem (PR32146)
This fixes 47 of the 75 NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146.

Differential Revision: https://reviews.llvm.org/D33147

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302942 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-12 19:56:43 +00:00
Dehao Chen
0faf9ed31e Add LiveRangeShrink pass to shrink live range within BB.
Summary: The LiveRangeShrink pass moves an instruction right after the definition within the same BB if the instruction and its operands all have more than one use. This pass is inexpensive and guarantees optimal live-range within BB.

Reviewers: davidxl, wmi, hfinkel, MatzeB, andreadb

Reviewed By: MatzeB, andreadb

Subscribers: hiraditya, jyknight, sanjoy, skatkov, gberry, jholewinski, qcolombet, javed.absar, krytarowski, atrick, spatel, RKSimon, andreadb, MatzeB, mehdi_amini, mgorny, efriedma, davide, dberlin, llvm-commits

Differential Revision: https://reviews.llvm.org/D32563

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302938 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-12 19:29:27 +00:00
Tom Stellard
593d52aaad AMDGPU: Add lit.local.cfg to disable global-isel tests when global-isel is disabled
This should fix bots broken by r302919.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302928 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-12 17:59:30 +00:00
Tom Stellard
f366f4cc57 AMDGPU/GlobalISel: Mark 32-bit integer constants as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D33115

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302919 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-12 16:46:46 +00:00
James Y Knight
c5d0c88a98 [SPARC] Support 'f' and 'e' inline asm constraints.
Based on patch by Patrick Boettcher and Chris Dewhurst.

Differential Revision: https://reviews.llvm.org/D29116

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302911 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-12 15:59:10 +00:00
Sanjay Patel
0ee4a484f7 [x86] add tests for potential vector narrowing optimization (PR32790)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302910 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-12 15:56:39 +00:00
Jonas Paulsson
668e541eed Handle a COPY with undef source operand in LowerCopy()
Llvm-stress discovered that a COPY may end up in ExpandPostRA::LowerCopy()
with an undef source operand. It is not possible for the target to handle
this, as this flag is not passed to TII->copyPhysReg().

This patch solves this by treating such a COPY as an identity COPY.

Review: Matthias Braun
https://reviews.llvm.org/D32892

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302877 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-12 06:32:03 +00:00
Mikael Holmen
e43c10d201 [IfConversion] Keep the CFG updated incrementally in IfConvertTriangle
Summary:
Instead of using RemoveExtraEdges (which uses analyzeBranch, which cannot
always be trusted) at the end to fixup the CFG we keep the CFG updated as
we go along and remove or add branches and merge blocks.

This way we won't have any problems if the involved MBBs contain
unanalyzable instructions.

This fixes PR32721.

In that case we had a triangle

   EBB
   | \
   |  |
   | TBB
   |  /
   FBB

where FBB didn't have any successors at all since it ended with an
unconditional return. Then TBB and FBB were merged into EBB, but EBB
would still keep its successors, and the use of analyzeBranch and
CorrectExtraCFGEdges wouldn't help to remove them since the return
instruction is not analyzable (at least not on ARM).

Reviewers: kparzysz, iteratee, MatzeB

Reviewed By: iteratee

Subscribers: aemerson, rengolin, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33037

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302876 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-12 06:28:58 +00:00
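The triangle case from e43c10d201 above can be replayed on a toy CFG to show what "keeping the CFG updated as we go" means. This is a sketch, not the IfConversion code: `merge_triangle` is a hypothetical helper, the CFG is a plain dict of successor sets, and it assumes both TBB and FBB end up merged into EBB as in the PR32721 scenario.

```python
from typing import Dict, Set


def merge_triangle(cfg: Dict[str, Set[str]], ebb: str, tbb: str, fbb: str) -> None:
    """Merge TBB and FBB into EBB, fixing EBB's successor set at each step."""
    # Step 1: TBB is predicated and folded into EBB, so the EBB -> TBB edge
    # and the TBB block itself disappear.
    cfg[ebb].discard(tbb)
    del cfg[tbb]
    # Step 2: FBB is merged into EBB. The EBB -> FBB edge is removed and EBB
    # inherits whatever successors FBB had (none if FBB ends in a return).
    cfg[ebb].discard(fbb)
    cfg[ebb] |= cfg.pop(fbb)
```

With `cfg = {"EBB": {"TBB", "FBB"}, "TBB": {"FBB"}, "FBB": set()}`, calling `merge_triangle(cfg, "EBB", "TBB", "FBB")` leaves EBB with no successors, without relying on any branch analysis of the return instruction.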
Guozhi Wei
d3fe6038ab [PPC] Change the register constraint of the first source operand of instruction mtvsrdd to g8rc_nox0
According to the Power ISA V3.0 document, the first source operand of mtvsrdd is constant 0 if r0 is specified. So the corresponding register constraint should be g8rc_nox0.

This bug caused wrong output generated by 401.bzip2 when -mcpu=power9 and fdo are specified.

Differential Revision: https://reviews.llvm.org/D32880



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302834 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-11 22:17:35 +00:00
Chad Rosier
2d534b8ab2 [AArch64][MachineCombine] Fold FNMUL+FSUB -> FNMADD.
Differential Revision: http://reviews.llvm.org/D33101.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302822 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-11 20:07:24 +00:00
Vadzim Dambrouski
29165da1cd [MSP430] Generate EABI-compliant libcalls
Updates the MSP430 target to generate EABI-compatible libcall names.
As a byproduct, adjusts the hardware multiplier options available in
the MSP430 target, adds support for promotion of the ISD::MUL operation
for 8-bit integers, and correctly marks R11 as used by call instructions.

Patch by Andrew Wygle.

Differential Revision: https://reviews.llvm.org/D32676

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302820 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-11 19:56:14 +00:00
Matt Arsenault
9f4e5a06c6 AMDGPU: Remove tfe bit from flat instruction definitions
We don't use it and it was removed in gfx9, and the encoding
bit repurposed.

Additionally, actually using it requires changing the output register
class, which wasn't done anyway.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302814 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-11 17:38:33 +00:00
Matt Arsenault
2bbb56fd75 AMDGPU: Pull fneg out of extract_vector_elt
This allows folding source modifiers in more f16 cases.
Makes it easier to select per-component packed neg modifiers.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302813 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-11 17:26:25 +00:00
Nemanja Ivanovic
0470a16690 [PowerPC] Eliminate integer compare instructions - vol. 1
This patch is the first in a series of patches to provide code gen for
doing compares in GPRs when the compare result is required in a GPR.

It adds the infrastructure to select GPR sequences for i1->i32 and i1->i64
extensions. This first patch handles equality comparison on i32 operands with
the result sign or zero extended.

Differential Revision: https://reviews.llvm.org/D31847


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302810 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-11 16:54:23 +00:00
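As an illustration of doing an i32 equality compare entirely in GPRs, as 0470a16690 above describes, one well-known branch-free idiom is xor, count leading zeros, shift right by 5. The commit message does not spell out the instruction sequence, so treating this idiom as what the patch emits is an assumption; the helper names below are hypothetical and the code only models the arithmetic.

```python
def cntlzw(x: int) -> int:
    """Count leading zeros of a 32-bit word (32 when the word is zero)."""
    x &= 0xFFFFFFFF
    return 32 - x.bit_length()


def eq_zext(a: int, b: int) -> int:
    """Zero-extended (a == b) on 32-bit operands: 1 if equal, else 0."""
    # a ^ b is zero only when the operands are equal; cntlzw then yields 32,
    # and 32 >> 5 == 1. Any non-zero xor yields a count in 0..31, which
    # shifts down to 0.
    return cntlzw(a ^ b) >> 5


def eq_sext(a: int, b: int) -> int:
    """Sign-extended (a == b): -1 if equal, else 0."""
    return -eq_zext(a, b)
```

The point of the sketch is that the result lands directly in an integer register, with no condition-register compare or branch involved.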
Simon Pilgrim
9a01b517fb [X86][AVX] Added zeroall/zeroupper scheduler tests
Missing on SandyBridge and Btver2 models

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302804 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-11 15:02:49 +00:00
Chandler Carruth
c4b356568b [x86] Fix a failure to select with AVX-512 when the type legalizer
manages to form a VSELECT with a non-i1 element type condition. Those
are technically allowed in SDAG (at least, the generic type legalization
logic will form them and I wouldn't want to try to audit everything to
preclude forming them) so we need to be able to lower them.

This isn't too hard to implement. We mark VSELECT as custom so we get
a chance in C++, add a fast path for i1 conditions to get directly
handled by the patterns, and a fallback when we need to manually force
the condition to be an i1 that uses the vptestm instruction to turn
a non-mask into a mask.

This, unsurprisingly, generates awful code. But it at least doesn't
crash. This was actually impacting open source packages built with LLVM
for AVX-512 in the wild, so quickly landing a patch that at least stops
the immediate bleeding.

I think I've found where to fix the codegen quality issue, but less
confident of that change so separating it out from the thing that
doesn't change the result of any existing test case but causes mine to
not crash.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302785 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-11 10:52:16 +00:00
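The fallback path described in c4b356568b above (turning a non-mask condition into a mask before selecting) can be sketched as follows. This is a toy model under stated assumptions: `vptestm` here only mimics the instruction's per-element "AND is non-zero" test, `vselect_wide_cond` is a hypothetical name, and treating any non-zero condition element as true is an assumption of this sketch rather than something the commit message states.

```python
from typing import List, Sequence


def vptestm(a: Sequence[int], b: Sequence[int]) -> List[bool]:
    """Per-element test: the mask bit is set when (a[i] & b[i]) is non-zero."""
    return [(x & y) != 0 for x, y in zip(a, b)]


def vselect_wide_cond(
    cond: Sequence[int], if_true: Sequence[int], if_false: Sequence[int]
) -> List[int]:
    """Select per element using a condition vector whose elements are not i1."""
    # Testing the condition against itself converts the wide vector to a mask.
    mask = vptestm(cond, cond)
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]
```

When the condition is already an i1 mask, this conversion step is unnecessary, which corresponds to the fast path the message mentions.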
Diana Picus
c978c0ff91 [ARM][GlobalISel] Legalize narrow scalar ops by widening
This is the same as r292827 for AArch64: we widen 8- and 16-bit ADD, SUB
and MUL to 32 bits since we only have TableGen patterns for 32 bits.
See the commit message for r292827 for more details.

At this point we could just remove some of the tests for regbankselect
and instruction-select, since we're not going to see any narrow
operations at those levels anymore. Instead I decided to update them
with G_ANYEXT/G_TRUNC operations, so we can validate the full sequences
generated by the legalizer.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302782 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-11 09:45:57 +00:00
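The widening strategy in c978c0ff91 above relies on the fact that the low bits of a 32-bit ADD, SUB or MUL depend only on the low bits of the operands. A minimal sketch of that argument, with hypothetical helper names and integers standing in for registers:

```python
import operator
from typing import Callable


def trunc(value: int, bits: int) -> int:
    """Keep only the low `bits` bits (models G_TRUNC)."""
    return value & ((1 << bits) - 1)


def anyext(value: int, from_bits: int, to_bits: int, junk: int = 0) -> int:
    """Models G_ANYEXT: the high bits are unspecified.

    `junk` lets a caller fill the high bits with arbitrary data to check
    that they never affect the narrow result.
    """
    high = trunc(junk, to_bits - from_bits) << from_bits
    return trunc(value, from_bits) | high


def widened_op(
    op: Callable[[int, int], int],
    a: int,
    b: int,
    narrow_bits: int,
    junk_a: int = 0,
    junk_b: int = 0,
) -> int:
    """Perform a narrow ADD/SUB/MUL as anyext -> 32-bit op -> trunc."""
    wide_a = anyext(a, narrow_bits, 32, junk_a)
    wide_b = anyext(b, narrow_bits, 32, junk_b)
    wide_result = trunc(op(wide_a, wide_b), 32)  # the 32-bit operation
    return trunc(wide_result, narrow_bits)
```

For instance, `widened_op(operator.add, 200, 100, 8)` gives the same value as an 8-bit wrapping add, whatever junk is placed in the extended bits.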
Diana Picus
62bc4f0ac3 [ARM][GlobalISel] Support for G_ANYEXT
G_ANYEXT can be introduced by the legalizer when widening scalars. Add
support for it in the register bank info (same mapping as everything
else) and in the instruction selector.

When selecting it, we treat it as a COPY, just like G_TRUNC. On this
occasion we get rid of some assertions in selectCopy so we can reuse it.
This shouldn't be a problem at the moment since we're not supporting any
complicated cases (e.g. FPR, different register banks). We might want to
separate the paths when we do.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302778 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-11 08:28:31 +00:00