19528 Commits

Author SHA1 Message Date
Hans Wennborg
9e194fb206 Merging r296093 and r296260:
------------------------------------------------------------------------
r296093 | ctopper | 2017-02-23 22:38:24 -0800 (Thu, 23 Feb 2017) | 1 line

[ExecutionDepsFix] Use range-based for loop. NFC
------------------------------------------------------------------------

------------------------------------------------------------------------
r296260 | ctopper | 2017-02-25 10:12:25 -0800 (Sat, 25 Feb 2017) | 18 lines

[ExecutionDepsFix] Don't make copies of LiveReg objects when collecting operands for soft instructions

Summary:
While collecting operands we make copies of the LiveReg objects which are stored in the LiveRegs array. If the instruction uses the same register multiple times we end up with multiple copies. Later we iterate through the collected list of LiveReg objects and merge DomainValues. In the process of doing this the merge function can change the contents of the original LiveReg object in the LiveRegs array, but not the copies that have been made. So when we get to the second usage of the register we end up seeing a stale copy of the LiveReg object.

To fix this I've stopped copying and now just store a pointer to the original LiveReg object. Another option might be to avoid adding the same register to the Regs array twice, but this approach seemed simpler.

The included test case exposes this bug due to an AVX-512 masked OR instruction using the same register for the passthru operand and one of the inputs to the OR operation.

Fixes PR30284.

Reviewers: RKSimon, stoklund, MatzeB, spatel, myatsina

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30242
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@296380 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-27 20:53:28 +00:00
Hans Wennborg
fa13347f23 Merging r295990:
------------------------------------------------------------------------
r295990 | jvesely | 2017-02-23 08:12:21 -0800 (Thu, 23 Feb 2017) | 5 lines

AMDGPU/SI: Fix trunc i16 pattern

Hit on ASICs that support 16bit instructions.

Differential Revision: https://reviews.llvm.org/D30281
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@296158 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-24 18:56:41 +00:00
Hans Wennborg
8de5c2160f Merging r295762:
------------------------------------------------------------------------
r295762 | eugenis | 2017-02-21 12:17:34 -0800 (Tue, 21 Feb 2017) | 3 lines

Fix PR31896.

Address of an alias of a global with offset is incorrectly lowered as an address of the global (i.e. ignoring offset).
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@296002 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-23 18:39:15 +00:00
Hans Wennborg
d8ff05ed89 Backport r293433, ARM: support -mlong-calls with AEABI TLS on ELF
Support lowering AEABI TLS access (__aeabi_read_tp) with long calls.
This requires adjusting the call sequence to use an indirect call to get
full addressability.

Resolves PR31769!

By Saleem Abdulrasool!


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@295910 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-23 00:14:14 +00:00
Hans Wennborg
eb6d6dd6f6 Merging r295512:
------------------------------------------------------------------------
r295512 | matze | 2017-02-17 15:15:03 -0800 (Fri, 17 Feb 2017) | 8 lines

AArch64LoadStoreOptimizer: Correctly clear kill flags

When promoting the Load of a Store-Load pair to a COPY all kill flags
between the store and the load need to be cleared.

rdar://30402435

Differential Revision: https://reviews.llvm.org/D30110
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@295744 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-21 18:46:57 +00:00
Hans Wennborg
00db617958 Merging r295213:
------------------------------------------------------------------------
r295213 | mkuper | 2017-02-15 10:37:26 -0800 (Wed, 15 Feb 2017) | 10 lines

[DAG] Don't try to create an INSERT_SUBVECTOR with an illegal source

We currently can't legalize those, but we should really not be creating
them in the first place, since legalization would probably look similar to the
way we legalize CONCAT_VECTORS - basically replace the INSERT with a BUILD.

This fixes PR311956.

Differential Revision: https://reviews.llvm.org/D29961

------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@295374 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-16 21:11:04 +00:00
Hans Wennborg
96275962aa Merging r294129:
------------------------------------------------------------------------
r294129 | gberry | 2017-02-05 10:28:14 -0800 (Sun, 05 Feb 2017) | 16 lines

[SelectionDAG] In InstrEmitter, handle EXTRACT_SUBREG of a physical register.

Summary:
Without this change, the getVR() call would hit an assert since it was
being passed a physical register.

Update the AArch64/ldst-opt.ll test with a case that triggers this
behavior by adding a run with strict-align, which causes an unaligned
STR XZR instruction to be split into byte stores, creating an
EXTRACT_SUBREG of XZR that triggers the original problem.

Reviewers: bogner, qcolombet, MatzeB, atrick

Subscribers: aemerson, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D29495
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@295250 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-15 22:36:35 +00:00
Hans Wennborg
946015a083 Merging r294003:
------------------------------------------------------------------------
r294003 | abataev | 2017-02-03 04:28:40 -0800 (Fri, 03 Feb 2017) | 8 lines

[SelectionDAG] Fix for PR30775: Assertion `NodeToMatch->getOpcode() !=
ISD::DELETED_NODE && "NodeToMatch was removed partway through
selection"' failed.

NodeToMatch can be modified during matching, but code does not handle
this situation.

Differential Revision: https://reviews.llvm.org/D29292
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@295249 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-15 22:34:24 +00:00
Hans Wennborg
f0610d1aac Merging r294982:
------------------------------------------------------------------------
r294982 | arnolds | 2017-02-13 11:58:28 -0800 (Mon, 13 Feb 2017) | 6 lines

swiftcc: Don't emit tail calls from callers with swifterror parameters

Backends don't support this yet. They would have to move to the swifterror
register before the tail call to make sure it is live-in to the call.

rdar://30495920
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@295219 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-15 19:12:45 +00:00
Hans Wennborg
dbca326a90 Merging r294527:
------------------------------------------------------------------------
r294527 | arnolds | 2017-02-08 14:30:47 -0800 (Wed, 08 Feb 2017) | 14 lines

[ARM/AArch ISel] SwiftCC: First parameters that are marked swiftself are not 'this returns'

We mark X0 as preserved by a call that passes the returned parameter.

 x0 = ...
 fun(x0) // no implicit def of x0

This no longer is valid if we pass the parameter in a different register then
the returned value as is the case with a swiftself parameter (passed in x20).

x20 = ...
fun(x20) // there should be an implict def of x8

rdar://30425845
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@295135 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-15 01:06:12 +00:00
Hans Wennborg
407aa2606f Merging r294551:
------------------------------------------------------------------------
r294551 | arnolds | 2017-02-08 17:52:17 -0800 (Wed, 08 Feb 2017) | 10 lines

SwiftCC: swifterror register cannot be as the base register

Functions that have a dynamic alloca require a base register which is defined to
be X19 on AArch64 and r6 on ARM.  We have defined the swifterror register to be
the same register. Use a different callee save register for swifterror instead:

 X21 on AArch64
 R8 on ARM

rdar://30433803
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@295079 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 17:39:39 +00:00
Matthias Braun
16757866b1 RegisterCoalescer: Fix joinReservedPhysReg()
Merging r294268:

joinReservedPhysReg() can only deal with a liverange in a single basic
block when copying from a vreg into a physreg.

See also rdar://30306405

Differential Revision: https://reviews.llvm.org/D29436

Fixes http://llvm.org/PR31889

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@294631 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-09 21:45:33 +00:00
Hans Wennborg
366ce55054 Merging r294348:
------------------------------------------------------------------------
r294348 | hans | 2017-02-07 12:37:45 -0800 (Tue, 07 Feb 2017) | 6 lines

[X86] Disable conditional tail calls (PR31257)

They are currently modelled incorrectly (as calls, which clobber
registers, confusing e.g. Machine Copy Propagation).

Reverting until we figure out the proper solution.
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@294476 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-08 16:50:40 +00:00
Hans Wennborg
ba84ee42e8 Forgot to add this in r294473
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@294474 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-08 16:47:33 +00:00
Hans Wennborg
524e53d43f Merging r294203:
------------------------------------------------------------------------
r294203 | john.brawn | 2017-02-06 10:07:20 -0800 (Mon, 06 Feb 2017) | 9 lines

[AArch64] Fix incorrect MachinePointerInfo in splitStoreSplat

When splitting up one store into several in splitStoreSplat we have to
make sure we get the MachinePointerInfo right, otherwise alias
analysis thinks they all store to the same location. This can then
cause invalid scheduling later on.

Differential Revision: https://reviews.llvm.org/D29446

------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@294242 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-06 21:27:55 +00:00
Hans Wennborg
18b90cd722 Merging r293635:
------------------------------------------------------------------------
r293635 | nha | 2017-01-31 06:35:37 -0800 (Tue, 31 Jan 2017) | 16 lines

[DAGCombine] require UnsafeFPMath for re-association of addition

Summary:
The affected transforms all implicitly use associativity of addition,
for which we usually require unsafe math to be enabled.

The "Aggressive" flag is only meant to convey information about the
performance of the fused ops relative to a fmul+fadd sequence.

Fixes Bug 31626.

Reviewers: spatel, hfinkel, mehdi_amini, arsenm, tstellarAMD

Subscribers: jholewinski, nemanjai, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D28675
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@293940 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-02 21:34:25 +00:00
Hans Wennborg
5d814fbb84 Merging r293309:
------------------------------------------------------------------------
r293309 | mssimpso | 2017-01-27 09:33:16 -0800 (Fri, 27 Jan 2017) | 20 lines

[ARM/AArch64] Relocate and update InterleavedAccessPass tests (NFC)

The interleaved access pass is an IR-to-IR transformation that runs before code
generation. It matches interleaved memory operations to target-specific
intrinsics (that are later lowered to load and store multiple instructions on
ARM/AArch64). We place tests for similar passes (e.g., GlobalMergePass) under
test/Transforms. This patch moves the InterleavedAccessPass tests out of
test/CodeGen and into target-specific directories under
test/Transforms/InterleavedAccess.

Although the pass is an IR pass, many of the existing tests were llc tests
rather opt tests. For example, the tests would check for ldN/stN instructions
generated by llc rather than the intrinsic calls the pass actually inserts.
Thus, this patch updates all tests to be opt tests that check for the inserted
intrinsics. We already have separate CodeGen tests that ensure we lower the
interleaved access intrinsics to their corresponding ldN/stN instructions. In
addition to migrating the tests to opt, this patch also performs some minor
clean-up (to ensure consistent naming, etc.).

Differential Revision: https://reviews.llvm.org/D29184
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@293804 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-01 19:40:39 +00:00
Hans Wennborg
52140ec352 Merging r293522:
------------------------------------------------------------------------
r293522 | bogner | 2017-01-30 10:29:46 -0800 (Mon, 30 Jan 2017) | 8 lines

SDAG: Update ChainNodesMatched during UpdateChains if a node is replaced

Previously, we would hit UB (or the ISD::DELETED_NODE assert) if we
happened to replace a node during UpdateChains, because it would be
left in the list we were iterating over. This nulls out the pointer
when that happens so that we can avoid the issue.

Fixes llvm.org/PR31710
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@293650 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-31 17:02:48 +00:00
Evandro Menezes
a9053251e8 [AArch64] Rename 'no-quad-ldst-pairs' to 'slow-paired-128'
In order to follow the pattern of the existing 'slow-misaligned-128store'
option, rename the option 'no-quad-ldst-pairs' to 'slow-paired-128'.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@293332 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-27 20:57:35 +00:00
Matt Arsenault
974695d103 Merging r293310:
------------------------------------------------------------------------
r293310 | arsenm | 2017-01-27 09:42:26 -0800 (Fri, 27 Jan 2017) | 8 lines

AMDGPU: Enable FeatureFlatForGlobal on Volcanic Islands

Accomplishes what r292982 was supposed to, which ended up
only really making the necessary test changes.

This should be applied to the 4.0 branch.

Patch by Vedran Miletić <vedran@miletic.net>
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@293329 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-27 20:21:31 +00:00
Matt Arsenault
e536396218 Merging r292982:
------------------------------------------------------------------------
r292982 | arsenm | 2017-01-24 14:02:15 -0800 (Tue, 24 Jan 2017) | 8 lines

Enable FeatureFlatForGlobal on Volcanic Islands

This switches to the workaround that HSA defaults to
for the mesa path.

This should be applied to the 4.0 branch.

Patch by Vedran Miletić <vedran@miletic.net>
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@293326 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-27 19:26:48 +00:00
Matt Arsenault
cb3fa250a4 Merging r292473:
------------------------------------------------------------------------
r292473 | arsenm | 2017-01-18 22:35:27 -0800 (Wed, 18 Jan 2017) | 9 lines

AMDGPU: Disable some fneg combines unless nsz

For -(x + y) -> (-x) + (-y), if x == -y, this would
change the result from -0.0 to 0.0. Since the fma/fmad
combine is an extension of this problem it also
applies there.

fmul should be fine, and I don't think any of the unary
operators or conversions should be a problem either.
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@293319 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-27 18:39:19 +00:00
Matt Arsenault
802910f034 Merging r292472:
------------------------------------------------------------------------
r292472 | arsenm | 2017-01-18 22:04:12 -0800 (Wed, 18 Jan 2017) | 5 lines

AMDGPU: Remove modifiers from v_div_scale_*

They seem to produce nonsense results when used.

This should be applied to the release branch.
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@293317 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-27 18:31:33 +00:00
Hans Wennborg
27ec2b89fc Merging r293259:
------------------------------------------------------------------------
r293259 | compnerd | 2017-01-26 19:41:53 -0800 (Thu, 26 Jan 2017) | 11 lines

ARM: fix vectorized division on WoA

The Windows on ARM target uses custom division for normal division as
the backend needs to insert division-by-zero checks.  However, it is
designed to only handle non-vectorized division.  ARM has custom
lowering for vectorized division as that can avoid loading registers
with the values and invoke a division routine for each one, preferring
to lower using NEON instructions.  Fall back to the custom lowering for
the NEON instructions if we encounter a vectorized division.

Resolves PR31778!
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@293306 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-27 17:13:59 +00:00
Hans Wennborg
8a028a206e Merging r292712 and r292713:
------------------------------------------------------------------------
r292712 | ctopper | 2017-01-20 22:59:35 -0800 (Fri, 20 Jan 2017) | 1 line

[X86] Add test cases that show bad commuting being allowed to create a phsub operation.
------------------------------------------------------------------------

------------------------------------------------------------------------
r292713 | ctopper | 2017-01-20 22:59:38 -0800 (Fri, 20 Jan 2017) | 3 lines

[X86] Don't allow commuting to form phsub operations.

Fixes PR31714.
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@293299 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-27 16:37:00 +00:00
Hans Wennborg
6144eee70a Merging r292516:
------------------------------------------------------------------------
r292516 | rserge | 2017-01-19 12:24:23 -0800 (Thu, 19 Jan 2017) | 14 lines

[XRay][Arm] Repair XRay table emission on Arm32 and add tests to identify such problem earlier

Summary:
Emission of XRay table was occasionally disabled for Arm32, but this bug was not then detected because earlier (also by mistake) testing of XRay was occasionally disabled on 32-bit Arm targets. This patch should fix that problem and detect such problems in the future.
This patch is one of a series, see also
- https://reviews.llvm.org/D28623

Reviewers: rengolin, dberris

Reviewed By: dberris

Subscribers: llvm-commits, aemerson, rengolin, dberris, iid_iunknown

Differential Revision: https://reviews.llvm.org/D28624
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@293295 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-27 16:17:56 +00:00
Hans Wennborg
4fba04fd96 Merging r292651:
------------------------------------------------------------------------
r292651 | jvesely | 2017-01-20 13:24:26 -0800 (Fri, 20 Jan 2017) | 8 lines

AMDGPU/R600: Serialize vector trunc stores to private AS

Add DUMMY_CHAIN SDNode to denote stores of interest

Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=28915
Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=30411

Differential Revision: https://reviews.llvm.org/D27964
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@293118 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-26 00:26:36 +00:00
Tim Northover
649560b9e2 Merging rr293088:
------------------------------------------------------------------------
r293088 | tnorthover | 2017-01-25 12:58:26 -0800 (Wed, 25 Jan 2017) | 5 lines

SDag: fix how initial loads are formed when splitting vector ops.

Later code expects the vector loads produced to be directly
concatenable, which means we shouldn't pad anything except the last load
produced with UNDEF.
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@293103 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-25 22:10:07 +00:00
Hans Wennborg
dc54ec4e5b Merging r292444:
------------------------------------------------------------------------
r292444 | mkuper | 2017-01-18 15:05:58 -0800 (Wed, 18 Jan 2017) | 7 lines

Revert r291670 because it introduces a crash.

r291670 doesn't crash on the original testcase from PR31589,
but it crashes on a slightly more complex one.

PR31589 has the new reproducer.

------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@293070 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-25 16:57:43 +00:00
Hans Wennborg
52cd70f859 Merging r291909:
------------------------------------------------------------------------
r291909 | compnerd | 2017-01-13 08:25:33 -0800 (Fri, 13 Jan 2017) | 9 lines

ARM: match GCC's behaviour for builtins

GCC changes the CC between the user-code and the builtins based on the
value of `-target` rather than `-mfloat-abi`.  When a HF target is used,
the VFP variant of the AAPCS CC is used.  Otherwise, the AAPCS variant
is used.  In all cases, the AEABI functions use the AAPCS CC.  Adjust
the calling convention based on the target.

Resolves PR30543!
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@292951 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-24 16:58:58 +00:00
Hans Wennborg
3ff1f39e4d Merging r292758:
------------------------------------------------------------------------
r292758 | spatel | 2017-01-22 09:06:12 -0800 (Sun, 22 Jan 2017) | 4 lines

[x86] avoid crashing with illegal vector type (PR31672)

https://llvm.org/bugs/show_bug.cgi?id=31672

------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@292832 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-23 21:33:34 +00:00
Matthias Braun
b3eb8fe616 Cherry pick r292625
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@292820 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-23 19:26:12 +00:00
Joerg Sonnenberger
58af97afa2 Merging r292244:
------------------------------------------------------------------------
r292244 | joerg | 2017-01-17 20:29:15 +0100 (Di, 17. Jan 2017) | 2 Zeilen

Remove an overeager assert from r288844.

------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@292453 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-19 00:19:28 +00:00
Hans Wennborg
eb0fbb2a84 Merging r292242:
------------------------------------------------------------------------
r292242 | bwilson | 2017-01-17 11:18:57 -0800 (Tue, 17 Jan 2017) | 5 lines

Revert r291640 change to fold X86 comparison with atomic_load_add.

Even with the fix from r291630, this still causes problems. I get
widespread assertion failures in the Swift runtime's WeakRefCount::increment()
function. I sent a reduced testcase in reply to the commit.
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_40@292243 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-17 19:29:13 +00:00
Nikolai Bozhenov
7b4bd48edb [X86] Replace AND+IMM64 with SRL/SHL
Emit SHRQ/SHLQ instead of ANDQ with a 64 bit constant mask if the result
is unused and the mask has only higher/lower bits set. For example, with
this patch LLVM emits

  shrq $41, %rdi
  je

instead of

  movabsq $0xFFFFFE0000000000, %rcx
  testq   %rcx, %rdi
  je

This reduces number of instructions, code size and register pressure.
The transformation is applied only for cases where the mask cannot be
encoded as an immediate value within TESTQ instruction.

Differential Revision: https://reviews.llvm.org/D28198


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291806 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-12 19:54:27 +00:00
Nikolai Bozhenov
7257b8b517 [X86] Modify BypassSlowDivision tests to match their new names (NFC)
- bypass-slow-division-32.ll:
  tests verifying correctness of divl-to-divb bypassing

- bypass-slow-division-64.ll:
  tests verifying correctness of divq-to-divl bypassing

- bypass-slow-division-tune.ll:
  tests verifying that bypassing is enabled only when appropriate

Differential Revision: https://reviews.llvm.org/D28551


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291804 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-12 19:48:01 +00:00
Nikolai Bozhenov
19b86062e1 [X86] Rename tests for bypassing slow division (NFC)
For tests on bypassing slow division there's no need to be
Atom-specific. The patch renames all tests on division bypassing
and makes their names more consistent:

  atom-bypass-slow-division.ll -> bypass-slow-division-32.ll
  (tests verifying correctness of divl-to-divb bypassing)

  atom-bypass-slow-division-64.ll -> bypass-slow-division-64.ll
  (tests verifying correctness of divq-to-divl bypassing)

  slow-div.ll -> bypass-slow-division-tune.ll
  (tests verifying that bypassing is enabled only when appropriate)

Differential Revision: https://reviews.llvm.org/D28197


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291802 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-12 19:41:27 +00:00
Nikolai Bozhenov
724695062b [X86] Tune bypassing of slow division for Intel CPUs
64-bit integer division in Intel CPUs is extremely slow, much slower
than 32-bit division. On the other hand, 8-bit and 16-bit divisions
aren't any faster. The only important exception is Atom where DIV8
is fastest. Because of that, the patch
1) Enables bypassing of 64-bit division for Atom, Silvermont and
   all big cores.
2) Modifies 64-bit bypassing to use 32-bit division instead of
   16-bit one. This doesn't make the shorter division slower but
   increases chances of taking it. Moreover, it's much more likely
   to prove at compile-time that a value fits 32 bits and doesn't
   require a run-time check (e.g. zext i32 to i64).

Differential Revision: https://reviews.llvm.org/D28196


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291800 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-12 19:34:15 +00:00
Nikolai Bozhenov
ac2f8ae366 [X86] Update LLC tests for slow division bypassing (NFC)
Run update_llc_test_checks.py on

    CodeGen/X86/atom-bypass-slow-division.ll
    CodeGen/X86/atom-bypass-slow-division-64.ll
    CodeGen/X86/slow-div.ll

Differential Revision: https://reviews.llvm.org/D28469


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291799 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-12 19:29:18 +00:00
Matt Arsenault
cd002582ba AMDGPU: Skip fneg/select combine if it can fold into other
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291792 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-12 18:58:15 +00:00
Matt Arsenault
9db1ec3d4d AMDGPU: Fold free fneg into sin
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291790 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-12 18:48:09 +00:00
Matt Arsenault
49dd8fcb21 AMDGPU: Fold fneg into fmul_legacy
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291784 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-12 18:26:30 +00:00
Matt Arsenault
bd870734a5 AMDGPU: Fold fneg into rcp
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291779 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-12 17:46:35 +00:00
Matt Arsenault
cca494fd03 AMDGPU: Fold fneg into fp_round
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291778 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-12 17:46:33 +00:00
Matt Arsenault
e652041f69 AMDGPU: Fold fneg into fp_extend
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291777 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-12 17:46:28 +00:00
Craig Topper
4a4c1fcaaa [AVX-512] Improve lowering of zero_extend of v4i1 to v4i32 and v2i1 to v2i64 with VLX, but no DQ or BW support.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291747 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-12 06:49:12 +00:00
Craig Topper
49cfd1ffd8 [AVX-512] Improve lowering of sign_extend of v4i1 to v4i32 and v2i1 to v2i64 when avx512vl is available, but not avx512dq.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291746 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-12 06:49:08 +00:00
Elad Cohen
f160ecc799 [X86][AVX512] Fix PR31515 - Do not flip vselect condition if it's not a vXi1 mask
r289653 added a case where `vselect <cond> <vector1> <all-zeros>`
is transformed to:
`vselect xor(cond, DAG.getConstant(1, DL, CondVT) <all-zeros> <vector1>`
This was not aimed to catch cases where Cond is not a vXi1
mask but it does. Moreover, when Cond type is VxiN (N > 1)
then xor(cond, DAG.getConstant(1, DL, CondVT) != NOT(cond).
This patch changes the above to xor with allones, and avoids
entering the case for non-mask Conds.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291745 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-12 06:49:03 +00:00
Craig Topper
ea5c0e9cfe [AVX-512] Add more varied avx512 feature command lines to the avx512-cvt.ll test to show some poor codegen examples.
We're definitely doing bad things when avx512vl is enabled without avx512dq. It looks like avx512vl/dq without avx512bw may also have some issues.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291744 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-12 06:49:03 +00:00
Matt Arsenault
94bf68d551 AMDGPU: Fold fneg into fma or fmad
Patch mostly by Fiona Glaser

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291733 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-12 00:32:16 +00:00