Commit Graph

17533 Commits

Hal Finkel
35dddafa6b [PowerPC] Refactor soft-float support, and enable PPC64 soft float
This change enables soft-float for PowerPC64, and also makes soft-float disable
all vector instruction sets for both 32-bit and 64-bit modes. This latter part
is necessary because the PPC backend canonicalizes many Altivec vector types to
floating-point types, and so soft-float breaks scalarization support for many
operations. Both for embedded targets and for operating-system kernels desiring
soft-float support, it seems reasonable that disabling hardware floating-point
also disables vector instructions (embedded targets without hardware floating
point support are unlikely to have Altivec, etc., and operating-system kernels
that avoid using floating-point registers to lower syscall cost are unlikely to
want to use vector registers either). If someone needs this to work, we'll
need to change the fact that we promote many Altivec operations to act on
v4f32. To make it possible to disable Altivec when soft-float is enabled,
hardware floating-point support needs to be expressed as a positive feature,
like the others, and not a negative feature, because target features cannot
have dependencies on the disabling of some other feature. So +soft-float has
now become -hard-float.
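
A hedged sketch of the positive-feature approach described above (hypothetical type and member names, not the actual PPCSubtarget code):

  // Hedged sketch: with hard-float expressed as a positive feature,
  // vector support can be conditioned on it directly.
  struct PPCSubtargetSketch {
    bool HasHardFloat; // cleared by -hard-float (formerly +soft-float)
    bool HasAltivec;
    bool useSoftFloat() const { return !HasHardFloat; }
    // Disabling hardware floating point implicitly disables vectors.
    bool hasAltivec() const { return HasAltivec && HasHardFloat; }
  };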

Fixes PR26970.

llvm-svn: 283060
2016-10-02 02:10:20 +00:00
Simon Pilgrim
3c7330116c [X86][SSE] Add support for combining target shuffles to binary BLEND
We already had support for 1-input BLEND with zero - this adds support for 2-input BLEND as well.
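
For illustration (a standalone sketch, not the actual combine code), a shuffle mask describes a two-input blend exactly when every lane keeps its position:

  #include <vector>
  // Lane i of a two-input blend takes element i of either source, so
  // every non-undef (-1) mask value must be i or i + NumElts.
  static bool isBlendMask(const std::vector<int> &Mask) {
    const int NumElts = static_cast<int>(Mask.size());
    for (int i = 0; i < NumElts; ++i)
      if (Mask[i] >= 0 && Mask[i] != i && Mask[i] != i + NumElts)
        return false;
    return true;
  }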

llvm-svn: 283040
2016-10-01 16:04:28 +00:00
Simon Pilgrim
fe10dc6252 [X86][SSE] Always combine target shuffles to MOVSD/MOVSS
Now we can commute to BLENDPD/BLENDPS on SSE41+ targets if necessary, so simplify the combine matching where we can.

This required me to add a couple of scalar math movsd/movss fold patterns that hadn't been needed in the past.

llvm-svn: 283038
2016-10-01 15:33:01 +00:00
Simon Pilgrim
74de116143 [X86][SSE] Enable commutation from MOVSD/MOVSS to BLENDPD/BLENDPS on SSE41+ targets
Instead of selecting between MOVSD/MOVSS and BLENDPD/BLENDPS at shuffle lowering based on subtarget, this will help us select the instruction based on actual commutation requirements.

We could possibly add BLENDPD/BLENDPS -> MOVSD/MOVSS commutation and MOVSD/MOVSS memory folding using a similar approach if it proves useful.

I avoided adding AVX512 handling as I'm not sure when we should be making use of VBLENDPD/VBLENDPS on EVEX targets.
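
Conceptually, commuting a two-input blend only requires inverting the immediate selection mask (a sketch of the rule, not the actual commuteInstruction code):

  #include <cstdint>
  // Swapping the two blend sources flips which input each immediate bit
  // selects, so the commuted immediate is the old one inverted over the
  // element count (2 for BLENDPD, 4 for BLENDPS).
  static uint8_t commuteBlendImm(uint8_t Imm, unsigned NumElts) {
    return static_cast<uint8_t>(~Imm & ((1u << NumElts) - 1));
  }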

llvm-svn: 283037
2016-10-01 14:26:11 +00:00
Simon Pilgrim
a9d66b0321 [X86][SSE] Regenerate vselect tests and improve AVX1/AVX2 coverage
llvm-svn: 283035
2016-10-01 13:10:14 +00:00
Craig Topper
6d5567d561 [X86] Cleanup patterns for using VMOVDDUP for broadcasts.
-Remove OptForSize. Not all of the backend follows the same rules for creating broadcasts and there is no conflicting pattern.
-Don't stop selecting VEX VMOVDDUP when AVX512 is supported. We need VLX for EVEX VMOVDDUP.
-Only use VMOVDDUP for v2i64 broadcasts if AVX2 is not supported.

llvm-svn: 283020
2016-10-01 07:11:24 +00:00
Craig Topper
722b7c4513 [AVX-512] Add VLX command lines to 128 and 256-bit shuffle tests.
llvm-svn: 283014
2016-10-01 06:01:18 +00:00
Mehdi Amini
1851fff9c0 Revert "AMDGPU: Don't use offen if it is 0"
This reverts commit r282999.
Tests are not passing: http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/20038

llvm-svn: 283003
2016-10-01 02:35:24 +00:00
Matt Arsenault
58e2ff3f3c AMDGPU: Don't use offen if it is 0
This removes many re-initializations of a base register to 0.

llvm-svn: 282999
2016-10-01 01:37:15 +00:00
Reid Kleckner
5f6739ae19 [SEH] Emit the parent frame offset label even if there are no funclets
This avoids errors about references to undefined local labels from
unreferenced filter functions.

Fixes (sort of) PR30431

llvm-svn: 282967
2016-09-30 22:10:12 +00:00
Hans Wennborg
a560abffe4 X86: Allow conditional tail calls in Win64 "leaf" functions (PR26302)
We can't use Jcc to leave a Win64 function in general, because that
confuses the unwinder. However, for "leaf" functions, that is, functions
where the return address is always on top of the stack and which don't
have unwind info, it's OK.
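
For illustration (a hypothetical example; g and h are assumed), a function like this needs no stack frame or unwind info, so the conditional tail call is safe:

  extern "C" int g(int);
  extern "C" int h(int);
  // No locals, no unwind info, return address stays on top of the stack,
  // so on Win64 the branch to g may now become a conditional tail call
  // (e.g. "je g" followed by "jmp h") instead of a branch to a separate
  // block that then tail-calls.
  extern "C" int f(int x) { return x == 0 ? g(x) : h(x); }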

Differential Revision: https://reviews.llvm.org/D24836

llvm-svn: 282920
2016-09-30 20:07:35 +00:00
Derek Schuff
cc1cf3d580 [WebAssembly] Make register stackification more conservative
Register stackification currently checks VNInfo for changes. Make that
more accurate by testing each intervening instruction for any other defs
to the same virtual register.
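
A hedged sketch of the stricter test (helper name assumed; the in-tree pass differs in detail):

  #include "llvm/CodeGen/MachineBasicBlock.h"
  #include "llvm/CodeGen/MachineInstr.h"
  #include <iterator>
  using namespace llvm;
  // Refuse to stackify if any instruction between the def and the
  // insertion point redefines the same virtual register.
  static bool noInterveningDefs(MachineInstr *Def, MachineInstr *Insert,
                                unsigned Reg) {
    for (auto I = std::next(MachineBasicBlock::iterator(Def)),
              E = MachineBasicBlock::iterator(Insert);
         I != E; ++I)
      if (I->definesRegister(Reg))
        return false;
    return true;
  }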

Patch by Jacob Gravelle

Differential Revision: https://reviews.llvm.org/D24942

llvm-svn: 282886
2016-09-30 18:02:54 +00:00
Dylan McKay
8e9c5b26db Revert "[RegAllocGreedy] Attempt to split unspillable live intervals"
It was accidentally committed.

llvm-svn: 282855
2016-09-30 14:05:15 +00:00
Dylan McKay
2b3e685349 [RegAllocGreedy] Attempt to split unspillable live intervals
Summary:
Previously, when allocating unspillable live ranges, we would never
attempt to split. We would always bail out and try last ditch graph
recoloring.

This patch changes this by attempting to split all live intervals before
performing recoloring.

This fixes LLVM bug PR14879.

I can't add test cases for any backends other than AVR because none of
them have small enough register classes to trigger the bug.
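
Conceptually (hypothetical names only; the real RAGreedy logic differs), the selection now looks like:

  // Hedged sketch of the new ordering.
  struct LiveRangeSketch { bool CanSpill; };
  unsigned tryAssign(LiveRangeSketch &);
  unsigned trySplit(LiveRangeSketch &);
  unsigned tryLastChanceRecoloring(LiveRangeSketch &);
  unsigned spill(LiveRangeSketch &);

  unsigned selectOrSplitSketch(LiveRangeSketch &LR) {
    if (unsigned PhysReg = tryAssign(LR))
      return PhysReg;
    // New behavior: splitting is attempted even for unspillable ranges.
    if (unsigned PhysReg = trySplit(LR))
      return PhysReg;
    if (!LR.CanSpill)
      return tryLastChanceRecoloring(LR); // previously the sole fallback
    return spill(LR);
  }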

Reviewers: qcolombet

Subscribers: MatzeB

Differential Revision: https://reviews.llvm.org/D25070

llvm-svn: 282852
2016-09-30 13:59:20 +00:00
Craig Topper
0213fa7d16 Revert r282835 "[AVX-512] Always use the full 32 register vector classes for addRegisterClass regardless of whether AVX512/VLX is enabled or not."
Turns out this doesn't pass verify-machineinstrs.

llvm-svn: 282841
2016-09-30 05:35:42 +00:00
Craig Topper
829b549027 [AVX-512] Always use the full 32 register vector classes for addRegisterClass regardless of whether AVX512/VLX is enabled or not.
If AVX512 is disabled, the registers should already be marked reserved. Pattern predicates and register classes on instructions should take care of most of the rest. Loads/stores and physical register copies for XMM16-31 and YMM16-31 without VLX have already been taken care of.

I'm a little unclear why this changed the register allocation of the SSE2 run of the sad.ll test, but the registers selected appear to be valid after this change.

llvm-svn: 282835
2016-09-30 04:31:33 +00:00
Matt Arsenault
3a9a1ac61b AMDGPU: Use unsigned compare for eq/ne
For some reason both of these are available, except for scalar
64-bit compares, which only have u64. I'm not sure why both exist
(I'm guessing it's for the one-bit inputs we don't use), but for
consistency, always use the unsigned one.
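
The reason this is safe, illustrated with scalar C++: equality is a pure bit comparison, so the signed and unsigned forms always agree.

  #include <cstdint>
  // eq/ne compares raw bits; reinterpreting signedness changes nothing.
  bool eqSigned(int32_t a, int32_t b) { return a == b; }
  bool eqUnsigned(int32_t a, int32_t b) {
    return static_cast<uint32_t>(a) == static_cast<uint32_t>(b);
  }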

llvm-svn: 282832
2016-09-30 01:50:20 +00:00
Reid Kleckner
d8564063fd [X86] Don't preserve Win64 SSE CSRs when SSE is disabled
Code that doesn't use floating point and doesn't use SSE (kernel code)
shouldn't save and restore SSE registers.

Fixes PR30503

llvm-svn: 282819
2016-09-30 00:17:49 +00:00
Simon Pilgrim
3fc7eac79f [X86] Add explicit test triple to make windows/msvc builds happier
llvm-svn: 282719
2016-09-29 15:10:09 +00:00
Craig Topper
e3a9100e63 [X86] Really fix the FileCheck line from r282690.
Why do 'Folded Spill' comments print with a different number of # characters on different systems?

llvm-svn: 282693
2016-09-29 06:49:21 +00:00
Craig Topper
44003789a5 [AVX-512] Fix a check line from r282690.
llvm-svn: 282691
2016-09-29 06:37:21 +00:00
Craig Topper
574fec2e71 [AVX-512] Support spills of XMM16-31 and YMM16-31 when VLX isn't available.
This adds new pseudo instructions that can be selected during register allocation to represent loads and stores of XMM/YMM registers when AVX512F is available, but VLX isn't. They will be converted to VEX encoded moves if the register turns out to be XMM0-15/YMM0-15. Otherwise either an EVEX VEXTRACT(store) or VBROADCAST(load) will be used.

Fixes one of the cases from PR29112.
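
Roughly, the expansion decision looks like this (a hedged sketch; the register-index convention here is an assumption, not the in-tree code):

  // A spill pseudo may be rewritten as a plain VEX move only when the
  // allocated register is one of the first 16 in its class; XMM16-31 and
  // YMM16-31 need an EVEX VEXTRACT-style store or VBROADCAST-style load.
  static bool canUseVEXMove(unsigned RegIdxInClass) {
    return RegIdxInClass < 16;
  }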

llvm-svn: 282690
2016-09-29 06:07:09 +00:00
Craig Topper
eb81e5883d [X86] Remove extra FileCheck lines that got left behind in r282688.
llvm-svn: 282689
2016-09-29 06:07:07 +00:00
Craig Topper
b2bb9a2bbc [AVX-512] Replicate pattern from AVX to select VMOVDDUP for (v2f64 (X86VBroadcast f64:)). Add AVX512VL to command line of existing AVX2 test that hits this condition.
llvm-svn: 282688
2016-09-29 05:54:43 +00:00
Craig Topper
4a193b69ad [X86] Add EVEX encoded VBROADCASTSS/SD and VPBROADCASTD/Q to execution domain fixing table.
llvm-svn: 282687
2016-09-29 05:54:39 +00:00
Craig Topper
4152760e70 [X86] Add 512-bit VPBROADCASTB and VPBROADCASTW tests.
llvm-svn: 282685
2016-09-29 05:54:32 +00:00
Craig Topper
9593dd5f5b [X86] Add VBROADCASTF128/VBROADCASTI128 to execution domain fixing tables.
llvm-svn: 282684
2016-09-29 05:54:28 +00:00
Matt Arsenault
c8493c6153 AMDGPU: Partially fix control flow at -O0
Fixes to make spilling all registers at the end of the block
work with exec modifications. Don't emit s_and_saveexec_b64 for
if lowering, and instead emit copies. Mark control flow mask
instructions as terminators to get correct spill code placement
with fast regalloc, and then have a separate optimization pass
form the saveexec.

This should work if SGPRs are spilled to VGPRs, but
will likely fail in the case that an SGPR spills to memory
and no workitem takes a divergent branch.

llvm-svn: 282667
2016-09-29 01:44:16 +00:00
Krzysztof Parzyszek
738486a316 IfConversion: Add implicit uses for redefined regs with live subregisters
Normally, if-conversion would add implicit uses for redefined registers,
e.g. R0<def> = add_if ..., R0<imp-use>. However, if only subregisters of
R0 are known to be live but not R0 itself, such implicit uses will not be
added, causing prior definitions of such subregisters and R0 itself to
become dead.
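
In MachineOperand terms the fix amounts to something like this (a hedged sketch; the in-tree logic also checks subregister liveness):

  #include "llvm/CodeGen/MachineInstr.h"
  #include "llvm/CodeGen/MachineOperand.h"
  using namespace llvm;
  // When predicating an instruction that redefines Reg, record that the
  // prior value may flow through by adding an implicit use, so earlier
  // defs of Reg or its subregisters are not marked dead.
  static void addImplicitUse(MachineInstr &MI, unsigned Reg) {
    MI.addOperand(MachineOperand::CreateReg(Reg, /*isDef=*/false,
                                            /*isImp=*/true));
  }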

llvm-svn: 282626
2016-09-28 20:07:41 +00:00
Konstantin Zhuravlyov
ebf3beb03f [AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit instructions
Differential Revision: https://reviews.llvm.org/D24125

llvm-svn: 282624
2016-09-28 20:05:39 +00:00
Simon Pilgrim
f88cb7e092 [X86][AVX] Add test showing that VBROADCAST loads don't correctly respect dependencies
llvm-svn: 282613
2016-09-28 17:59:30 +00:00
Adrian Prantl
47833cff03 Teach LiveDebugValues about lexical scopes.
This addresses PR26055 ("LiveDebugValues is very slow").

Contrary to the old LiveDebugVariables pass LiveDebugValues currently
doesn't look at the lexical scopes before inserting a DBG_VALUE
intrinsic. This means that we often propagate DBG_VALUEs much further
down than necessary. This is especially noticeable in large C++
functions with many inlined method calls that all use the same
"this"-pointer.

For example, in the following code it makes no sense to propagate the
inlined variable a from the first inlined call to f() into any of the
subsequent basic blocks, because the variable will always be out of
scope:

void sink(int a);
void __attribute__((always_inline)) f(int a) { sink(a); }
void foo(int i) {
   f(i);
   if (i)
     f(i);
   f(i);
}

This patch reuses the LexicalScopes infrastructure we have for
LiveDebugVariables to take this into account.

The effect on compile time and memory consumption is quite noticeable:
I tested a benchmark that is a large C++ source with an enormous
amount of inlined "this"-pointers that would previously eat >24GiB
(most of them for DBG_VALUE intrinsics) and whose compile time was
dominated by LiveDebugValues. With this patch applied the memory
consumption is 1GiB and 1.7% of the time is spent in LiveDebugValues.

https://reviews.llvm.org/D24994
Thanks to Daniel Berlin and Keith Walker for reviewing!

llvm-svn: 282611
2016-09-28 17:51:14 +00:00
Artem Belevich
ed0bd7024b [NVPTX] Added intrinsics for atom.gen.{sys|cta}.* instructions.
These are only available on sm_60+ GPUs.

Differential Revision: https://reviews.llvm.org/D24943

llvm-svn: 282607
2016-09-28 17:25:38 +00:00
Nirav Dave
1f7d22e77d Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r282600 due to test failures with MCJIT.

llvm-svn: 282604
2016-09-28 16:37:50 +00:00
Nirav Dave
080cb64e9c In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Simplify Consecutive Merge Store Candidate Search

  Now that address aliasing is much less conservative, push through a
  simplified store-merging search which only checks for parallel stores
  through the chain subgraph. This is cleaner, as it separates the handling
  of non-interfering loads/stores from the store-merging logic.

  When merging stores, search up the chain through a single load, and
  find all possible stores by looking down through a load and a
  TokenFactor to all stores visited. This improves the quality of the
  output SelectionDAG and generally the output CodeGen (with some
  exceptions).
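
  For illustration (a hypothetical example), the kind of pattern this affects:

    // Two adjacent non-volatile stores: with chains now always searched
    // via FindBetterChain they are recognized as parallel and become
    // candidates for merging into one wider store.
    void zero_pair(int *p) {
      p[0] = 0;
      p[1] = 0;
    }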

  Additional Minor Changes:

    1. Finishes removing unused AliasLoad code
    2. Unifies the chain aggregation in the merged stores across
       code paths
    3. Re-add the Store node to the worklist after calling
       SimplifyDemandedBits.
    4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
       arbitrary, but seemed sufficient to not cause regressions in
       tests.

  This finishes the change Matt Arsenault started in r246307 and
  jyknight's original patch.

  Many tests required some changes as memory operations are now
  reorderable. Some tests relying on the order were changed to use
  volatile memory operations.

  Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.ll -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -
      The backend now generates *worse* code due to store merging
      succeeding, as we don't do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

    CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
      This test appears to work but no longer exhibits the spill
      behavior.

Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 282600
2016-09-28 15:50:43 +00:00
Guy Blank
4de6e44893 [X86][FastISel] Use a COPY from K register to a GPR instead of a K operation
The KORTEST was introduced due to a bug where a TEST instruction used a K register.
However, it turns out that the opposite case, a KORTEST using a GPR, is now happening.

The change removes the KORTEST flow and adds a COPY instruction from the K reg to a GPR.

Differential Revision: https://reviews.llvm.org/D24953

llvm-svn: 282580
2016-09-28 11:22:17 +00:00
Michael Kuperstein
8fe0e6eb01 [DAG] Remove isVectorClearMaskLegal() check from vector_build dagcombine
This check currently doesn't seem to do anything useful on any in-tree target:
On non-x86, it always evaluates to false, so we never hit the code path that
creates the shuffle with zero.
On x86, it just forwards to isShuffleMaskLegal(), which is a reasonable thing to
query in general, but doesn't make sense if only restricted to zero blends.

Differential Revision: https://reviews.llvm.org/D24625

llvm-svn: 282567
2016-09-28 06:13:58 +00:00
Sanjay Patel
f1f848bbad [x86] add folds for FP logic with vector zeros
The 'or' case shows up in copysign. The copysign code also had 
redundant checking for a scalar zero operand with 'and', so I 
removed that. 

I'm not sure how to test vector 'and', 'andn', and 'xor' yet, 
but it seems better to just include all of the logic ops since
we're fixing 'or' anyway.
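
Why 'or' appears in copysign, illustrated with a standalone bit-level version (illustration only, not the lowering code itself):

  #include <cstdint>
  #include <cstring>
  // copysign(x, y) keeps the magnitude bits of x and the sign bit of y,
  // combined with an 'or'; when y is a constant the sign operand can
  // fold to a vector zero, which is the case fixed here.
  float copysignBits(float x, float y) {
    uint32_t xb, yb;
    std::memcpy(&xb, &x, sizeof(xb));
    std::memcpy(&yb, &y, sizeof(yb));
    const uint32_t rb = (xb & 0x7fffffffu) | (yb & 0x80000000u);
    float r;
    std::memcpy(&r, &rb, sizeof(r));
    return r;
  }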

llvm-svn: 282546
2016-09-27 22:28:13 +00:00
Geoff Berry
8e556287ce [TargetRegisterInfo, AArch64] Add target hook for isConstantPhysReg().
Summary:
The current implementation of isConstantPhysReg() checks for defs of
physical registers to determine if they are constant.  Some
architectures (e.g. AArch64 XZR/WZR) have registers that are constant
and may be used as destinations to indicate the generated value is
discarded, preventing isConstantPhysReg() from returning true.  This
change adds a TargetRegisterInfo hook that overrides the no defs check
for cases such as this.
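
A hedged sketch of such an override for AArch64 (the exact hook signature is assumed):

  // XZR/WZR always read as zero, even when they appear as defs used to
  // discard a result, so report them as constant.
  bool AArch64RegisterInfo::isConstantPhysReg(unsigned PhysReg) const {
    return PhysReg == AArch64::XZR || PhysReg == AArch64::WZR;
  }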

Reviewers: MatzeB, qcolombet, t.p.northover, jmolloy

Subscribers: junbuml, aemerson, mcrosier, rengolin

Differential Revision: https://reviews.llvm.org/D24570

llvm-svn: 282543
2016-09-27 22:17:27 +00:00
Keith Walker
92e5b1fa76 Propagate DBG_VALUE entries when there are unvisited predecessors
Variables are sometimes missing their debug location information in
blocks in which the variables should be available. This would occur
when one or more predecessor blocks had not yet been visited by the
routine which propagated the information from predecessor blocks.

This is addressed by only considering predecessor blocks which have
already been visited.
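
Schematically (the data structures here are assumptions, not the actual pass code), the meet over predecessors becomes:

  #include "llvm/ADT/BitVector.h"
  #include "llvm/ADT/DenseMap.h"
  #include "llvm/ADT/SmallPtrSet.h"
  #include "llvm/CodeGen/MachineBasicBlock.h"
  using namespace llvm;
  // Only predecessors already processed contribute to the meet, so an
  // unvisited predecessor can no longer clear locations that are in
  // fact available.
  static void joinVisitedPreds(
      MachineBasicBlock &MBB, BitVector &InLocs,
      DenseMap<MachineBasicBlock *, BitVector> &OutLocs,
      SmallPtrSetImpl<MachineBasicBlock *> &Visited) {
    bool First = true;
    for (MachineBasicBlock *Pred : MBB.predecessors()) {
      if (!Visited.count(Pred))
        continue;
      if (First) {
        InLocs = OutLocs[Pred];
        First = false;
      } else {
        InLocs &= OutLocs[Pred];
      }
    }
  }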

The solution to this problem was suggested by Daniel Berlin on the
LLVM developer mailing list.

Differential Revision: https://reviews.llvm.org/D24927

llvm-svn: 282506
2016-09-27 16:46:07 +00:00
Simon Dardis
1e548fbd1e [mips] Disable tail calls temporarily
Disable tail calls until the remaining bugs are fixed. They remain enabled only for tests.

Reviewers: vkalintiris

Differential Revision: https://reviews.llvm.org/D24912

llvm-svn: 282487
2016-09-27 13:15:54 +00:00
Nemanja Ivanovic
bf003f8d13 [Power9] Builtins for ELF v.2 API conformance - back end portion
This patch corresponds to review:
https://reviews.llvm.org/D24396

This patch adds support for the "vector count trailing zeroes",
"vector compare not equal" and "vector compare not equal or zero instructions"
as well as "scalar count trailing zeroes" instructions. It also changes the
vector negation to use XXLNOR (when VSX is enabled) so as not to increase
register pressure (previously this was done with a splat immediate of all
ones followed by an XXLXOR). This was done because the altivec.h
builtins (patch to follow) use vector negation and the use of an additional
register for the splat immediate is not optimal.
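
The register-pressure point, shown with scalar bit operations (conceptual only; the actual change is in the vector lowering):

  #include <cstdint>
  // NOT via nor(x, x) needs no constant operand, while NOT via xor must
  // first materialize an all-ones value (the splat), costing a register.
  uint32_t vnotNor(uint32_t x) { return ~(x | x); }
  uint32_t vnotXor(uint32_t x) {
    const uint32_t AllOnes = ~0u; // the splat immediate the patch avoids
    return x ^ AllOnes;
  }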

llvm-svn: 282478
2016-09-27 08:42:12 +00:00
Craig Topper
d123dcae08 [X86] Add test case for PR30511 and r282341.
llvm-svn: 282473
2016-09-27 06:44:30 +00:00
Craig Topper
b55a029a4f [X86] Expand all-ones-vector test to cover 256-bit and 512-bit vectors.
llvm-svn: 282472
2016-09-27 06:44:27 +00:00
Davide Italiano
836f9d2e4e [CodeGen] Add support for emitting .init_array instead of .ctors on FreeBSD.
PR: 30494
llvm-svn: 282451
2016-09-26 22:53:15 +00:00
Davide Italiano
5722b8c65d [CodeGen] Switch test as FreeBSD will support .init_array soon.
llvm-svn: 282450
2016-09-26 22:38:17 +00:00
Derek Schuff
831613e75c [WebAssembly] Use the frame pointer instead of the stack pointer
When we have dynamic allocas we have a frame pointer, and
when we're lowering frame indexes we should make sure we use it.

Patch by Jacob Gravelle

Differential Revision: https://reviews.llvm.org/D24889

llvm-svn: 282442
2016-09-26 21:18:03 +00:00
Evandro Menezes
22d7aa5fa1 [AArch64] Fix test triplet
Specify proper target triplet to pass under Windows too.

llvm-svn: 282423
2016-09-26 18:09:21 +00:00
Tom Stellard
6e3f633648 AMDGPU/SI: Don't crash on anonymous GlobalValues
Summary:
We need to call AsmPrinter::getNameWithPrefix() in order to handle
anonymous GlobalValues (e.g. @0, @1).

Reviewers: arsenm, b-sumner

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D24865

llvm-svn: 282420
2016-09-26 17:29:25 +00:00
Geoff Berry
0e859047b9 [AArch64] Improve add/sub/cmp isel of uxtw forms.
Don't match the UXTW extended reg forms of ADD/ADDS/SUB/SUBS if the
32-bit to 64-bit zero-extend can be done for free by taking advantage
of the 32-bit defining instruction zeroing the upper 32-bits of the X
register destination.  This enables better instruction selection in a
few cases, such as:

  sub x0, xzr, x8
  instead of:
  mov x8, xzr
  sub x0, x8, w9, uxtw

  madd x0, x1, x1, x8
  instead of:
  mul x9, x1, x1
  add x0, x9, w8, uxtw

  cmp x2, x8
  instead of:
  sub x8, x2, w8, uxtw
  cmp x8, #0

  add x0, x8, x1, lsl #3
  instead of:
  lsl x9, x1, #3
  add x0, x9, w8, uxtw

Reviewers: t.p.northover, jmolloy

Subscribers: mcrosier, aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D24747

llvm-svn: 282413
2016-09-26 15:34:47 +00:00