Commit Graph

10455 Commits

Author SHA1 Message Date
Simon Pilgrim
f9d4e1794d [X86] Add v2i4 store test case (PR20012)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312874 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-09 20:28:50 +00:00
Simon Pilgrim
5037d51d6d [X86] Add v2i2 test case (PR20011)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312873 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-09 20:22:35 +00:00
Simon Pilgrim
8b80450d25 [X86][FMA] Regenerate FMA tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312871 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-09 19:25:59 +00:00
Simon Pilgrim
a22f9f2405 [X86][SSE] i32 vector multiplications test cases from PR6399
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312868 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-09 18:18:17 +00:00
Simon Pilgrim
ce6571eae6 [X86][MOVBE] Fix typo in MOVBE scheduling test names
Copy+paste is not your friend

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312867 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-09 17:52:44 +00:00
Craig Topper
9080ea8806 [X86] Don't disable slow INC/DEC if optimizing for size
Summary:
Just because INC/DEC is a little slow on some processors doesn't mean we shouldn't prefer it when optimizing for size.

This appears to match gcc behavior.

Reviewers: chandlerc, zvi, RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37177

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312866 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-09 17:11:59 +00:00
Craig Topper
403bab200a [X86] Simplify the slow-incdec test and add test cases with optsize.
I think we want to consider using inc/dec with optsize.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312804 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-08 17:33:54 +00:00
Simon Pilgrim
977c908e78 [X86] Added PR31045 test case
Reduced version of 'addr-calc-crash.ll' that was included in D27044, that had been fixed already by D31286/rL298633

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312786 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-08 10:49:11 +00:00
Jatin Bhateja
8277ad7473 [X86] Adding a test point for PR34149 'Suboptimal codegen for "fast" minnum and maxnum'
Differential Revision: https://reviews.llvm.org/D37614

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312778 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-08 09:15:36 +00:00
Chandler Carruth
575526c082 [x86] Flesh out the custom ISel for RMW aritmetic ops with used flags to
cover the bitwise operators.

Nothing really exciting here, this just stamps out the rest of the core
operations that can RMW memory and set flags.

Still not implemented here: ADC, SBB. Those will require more
interesting logic to channel the flags *in*, and I'm not currently
planning to try to tackle that. It might be interesting for someone who
wants to improve our code generation for bignum implementations.

Differential Revision: https://reviews.llvm.org/D37141

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312768 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-08 00:17:12 +00:00
Chandler Carruth
803b2c7e69 [x86] Extend the manual ISel of add and sub with both RMW memory
operands and used flags to support matching immediate operands.

This is a bit trickier than register operands, and we still want to fall
back on a register operands even for things that appear to be
"immediates" when they won't actually select into the operation's
immediate operand. This also requires us to handle things like selecting
`sub` vs. `add` to minimize the number of bits needed to represent the
immediate, and picking the shortest immediate encoding. In order to
that, we in turn need to scan to make sure that CF isn't used as it will
get inverted.

The end result seems very nice though, and we're now generating
optimal instruction sequences for these patterns IMO.

A follow-up patch will further expand this to other operations with RMW
memory operands. But handing `add` and `sub` are useful starting points
to flesh out the machinery and make sure interesting and complex cases
can be handled.

Thanks to Craig Topper who provided a few fixes and improvements to this
patch in addition to the review!

Differential Revision: https://reviews.llvm.org/D37139

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312764 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-07 23:54:24 +00:00
Paul Robinson
06296a65fd [DWARF] Line 0 should not have a discriminator.
It's meaningless and takes up extra space in the line table.

Differential Revision: https://reviews.llvm.org/D37364


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312751 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-07 22:15:44 +00:00
Michael Zuckerman
563f2fdd92 [X86][LLVM]Expanding Supports lowerInterleavedLoad() in X86InterleavedAccess (VF{8|16|32} stride 3).
This patch expands the support of lowerInterleavedload to {8|16|32}x8i stride 3.

LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=3 VF={8|16|32}) and we plan to include the store (deinterleved side).

The patch goal is to optimize the following sequence:
a0 b0 c0 a1 b1 c1 a2 b2
c2 a3 b3 c3 a4 b4 c4 a5
b5 c5 a6 b6 c6 a7 b7 c7

into

a0 a1 a2 a3 a4 a5 a6 a7
b0 b1 b2 b3 b4 b5 b6 b7
c0 c1 c2 c3 c4 c5 c6 c7

Reviewers
1. zvi
2. igor
3. guyblank
4. dorit
5. Ayal

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312722 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-07 14:02:13 +00:00
Florian Hahn
651af02437 [MachineCombiner] Update instruction depths incrementally for large BBs.
Summary:
For large basic blocks with lots of combinable instructions, the
MachineTraceMetrics computations in MachineCombiner can dominate the compile
time, as computing the trace information is quadratic in the number of
instructions in a BB and it's relevant successors/predecessors.

In most cases, knowing the instruction depth should be enough to make
combination decisions. As we already iterate over all instructions in a basic
block, the instruction depth can be computed incrementally. This reduces the
cost of machine-combine drastically in cases where lots of instructions
are combined. The major drawback is that AFAIK, computing the critical path
length cannot be done incrementally. Therefore we only compute
instruction depths incrementally, for basic blocks with more
instructions than inc_threshold. The -machine-combiner-inc-threshold
option can be used to set the threshold and allows for easier
experimenting and checking if using incremental updates for all basic
blocks has any impact on the performance.

Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn

Reviewed By: fhahn

Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits

Differential Revision: https://reviews.llvm.org/D36619

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312719 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-07 12:49:39 +00:00
Alexander Ivchenko
8f5188c5d8 [x86] Update to cmov promotion tests for D36711; NFC
Adding i8 -> [i16, i32, i64] and i32 -> i64 cases.
This way we can see what the current codegen looks like.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312707 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-07 08:59:05 +00:00
Zvi Rackover
8a2fcfe5be X86: Improve AVX512 fptoui lowering
Summary:
Add patterns for
  fptoui <16 x float> to <16 x i8>
  fptoui <16 x float> to <16 x i16>

Reviewers: igorb, delena, craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37505

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312704 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-07 07:40:34 +00:00
Sanjay Patel
64aa32b606 [x86] fix triple and regenerate checks for psubus; NFC
Patch by Yulia Koval!

Differential Revision: https://reviews.llvm.org/D37523


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312662 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-06 19:05:20 +00:00
Wei Mi
78696b31cd [TailCall] Allow llvm.memcpy/memset/memmove to be tail calls when parent
function return the intrinsics's first argument.

llvm.memcpy/memset/memmove return void but they will return the first
argument after they are expanded as libcalls. Now if the parent function
has any return value, llvm.memcpy cannot be turned into tail call after
expansion.

The patch is to handle that case in SelectionDAGBuilder so when caller
function return the same value as the first argument of llvm.memcpy,
tail call is allowed.

Differential Revision: https://reviews.llvm.org/D37406


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312641 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-06 16:05:17 +00:00
Chandler Carruth
1467a089bc [x86] Fix PR34377 by disabling cmov conversion when we relied on it
performing a zext of a register.

On the PR there is discussion of how to more effectively handle this,
but this patch prevents us from miscompiling code.

Differential Revision: https://reviews.llvm.org/D37504

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312620 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-06 06:28:08 +00:00
Zvi Rackover
922eae4d2e X86 Tests: Tidy up AVX512 conversion tests. NFC.
Rename functions to a consistent format to make it easier to track coverage.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312619 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-06 05:33:04 +00:00
Jatin Bhateja
2411ad4316 Updating a test reference for rL312608.
Differential Revision: https://reviews.llvm.org/D37501

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312614 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-06 03:58:14 +00:00
Jatin Bhateja
f3b9c95869 [X86] Allow cross-lane permutations for sub targets supporting AVX2.
Summary:
Most instructions in AVX work “in-lane”, that is, each source element is applied only to other
elements of the same lane, thus a cross lane permutation is costly and needs more than one instrution.
AVX2 includes instructions to perform any-to-any permutation of words over a 256-bit register
and vectorized table lookup.

This should also Fix PR34369

Differential Revision: https://reviews.llvm.org/D37388

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312608 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-06 02:58:47 +00:00
Reid Kleckner
c86178ea37 Add llvm.codeview.annotation to implement MSVC __annotation
Summary:
This intrinsic represents a label with a list of associated metadata
strings. It is modelled as reading and writing inaccessible memory so
that it won't be removed as dead code. I think the intention is that the
annotation strings should appear at most once in the debug info, so I
marked it noduplicate. We are allowed to inline code with annotations as
long as we strip the annotation, but that can be done later.

Reviewers: majnemer

Subscribers: eraman, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D36904

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312569 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-05 20:14:58 +00:00
Craig Topper
8c5b337a87 [X86] Remove unnecessary (v4f32 (X86vzmovl (v4f32 (scalar_to_vector FR32X)))) patterns
We had already disabled the pattern for SSE4.1 and SSE4.2. But it got re-enabled for AVX and AVX512.

With SSE41 we rely on a separate (v4f32 (X86vzmovl VR128)) pattern to select blendps with a xorps to create zeroess. And a separate (v4f32 (scalar_to_vector FR32X)) to select a COPY_TO_REG_CLASS to move FR32 to VR128

The same thing can happen for AVX with vblendps and those separate patterns already exist.

For AVX512, (v4f32 (X86vzmov VR128)) will select a VMOVSS instruction instead of VBLENDPS due to their not being a EVEX VBLENDPS. This is what we were getting out of the larger pattern anyway. So the larger pattern is unneeded for AVX512 too.

For SSE1-SSSE3 we can rely on (v4f32 (X86vzmov VR128)) selecting a MOVSS similar to AVX512. Again this is what the larger pattern did too.

So the only real change here is that AVX1/2 now properly outputs a VBLENDPS during isel instead of a VMOVSS to match SSE41. Most tests didn't notice because the two address instruction pass knows how to turn VMOVSS into VBLENDPS to get an independent destination register.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312564 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-05 19:09:02 +00:00
Zvi Rackover
9c369c6f9c X86 Tests: Adding missing AVX512 fptoui coverage tests. NFC.
Some of the cases show missing pattern i intend to fix shortly.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312560 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-05 18:24:39 +00:00
Craig Topper
035520018a [AVX512] Remove patterns for (v8f32 (X86vzmovl (insert_subvector undef, (v4f32 (scalar_to_vector FR32X:)), (iPTR 0)))) and the same for v4f64.
We don't have this same pattern for AVX2 so I don't believe we should have it for AVX512. We also didn't have it for v16f32.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312543 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-05 17:33:58 +00:00
Simon Pilgrim
76db91a4f0 [X86] Limit store merge size when implicitfloat is enabled (PR34421)
As suggested by @niravd : https://bugs.llvm.org/show_bug.cgi?id=34421#c2

Differential Revision: https://reviews.llvm.org/D37464

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312534 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-05 13:40:29 +00:00
Simon Pilgrim
34cbdf56ca [X86] Regenerate scalar rotation tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312530 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-05 12:28:30 +00:00
Simon Pilgrim
d5802f5e18 [X86][AVX512] Use AVX512 attributes instead of -mcpu in vector shift tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312529 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-05 12:23:45 +00:00
Simon Pilgrim
3eb1ddf19a [X86][AVX512] Use AVX512 attributes instead of -mcpu
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312528 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-05 12:05:14 +00:00
Sanjay Patel
cfc091852b [x86] add tests for vector store merge opportunity; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312504 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-04 22:01:25 +00:00
Sanjay Patel
07477455af [x86] auto-generate complete checks; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312503 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-04 21:46:05 +00:00
Sanjay Patel
9435706923 [x86] add/regenerate complete checks; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312502 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-04 21:43:32 +00:00
Sanjay Patel
8bafe87c16 [x86] add test for unnecessary cmp + masked store; NFC
As noted in PR11210:
https://bugs.llvm.org/show_bug.cgi?id=11210
...fixing this should allow us to eliminate x86-specific masked store intrinsics in IR.
(Although more testing will be needed to confirm that.)


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312496 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-04 17:21:17 +00:00
Sam McCall
c7c869be7e Revert "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding""
This crashes on boringSSL on PPC (will send reduced testcase)

This reverts commit r312328.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312490 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-04 15:47:00 +00:00
Simon Pilgrim
f070a0d73d [X86][AVX512] Add support for VPERMILPS v16f32 shuffle lowering (PR34382)
Avoid use of VPERMPS where we don't need it by instead using the variable mask version of VPERMILPS for unary shuffles.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312486 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-04 13:51:57 +00:00
Simon Pilgrim
e6cf8170cc Added shuffle test case from PR34382
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312485 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-04 13:43:13 +00:00
Simon Pilgrim
2677f9404b Added shuffle test case from PR34369
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312481 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-04 11:08:47 +00:00
Ayman Musa
228d11f2a9 [X86] Replace -mcpu option with -mattr in LIT tests added in https://reviews.llvm.org/rL312442
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312474 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-04 09:31:32 +00:00
Igor Breger
4e66147d1e [GlobalISel][X86] G_PHI support.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312473 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-04 09:06:45 +00:00
Dean Michael Berris
cfbb872e5e [XRay][CodeGen] Use PIC-friendly code in XRay sleds and remove synthetic references in .text
Summary:
This is a re-roll of D36615 which uses PLT relocations in the back-end
to the call to __xray_CustomEvent() when building in -fPIC and
-fxray-instrument mode.

Reviewers: pcc, djasper, bkramer

Subscribers: sdardis, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D37373

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312466 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-04 05:34:58 +00:00
Craig Topper
7f04a10c78 [X86] Add a combine to recognize when we have two insert subvectors that together write the whole vector, but the starting vector isn't undef.
In this case we should replace the starting vector with undef.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312462 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-04 01:13:36 +00:00
Craig Topper
42d6767626 [X86] Add a combine to turn (insert_subvector zero, (insert_subvector zero, X, Idx), Idx) into an insert of X into the larger zero vector.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312460 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-03 22:25:52 +00:00
Craig Topper
03f273f10e [X86] Add more patterns to use moves to zero the upper portions of a vector register that I missed in r312450.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312459 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-03 22:25:50 +00:00
Craig Topper
3b552a56bb [X86] Combine inserting a vector of zeros into a vector of zeros just the larger vector.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312458 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-03 22:25:49 +00:00
Craig Topper
3cef9810b2 [X86] Add patterns to turn an insert into lower subvector of a zero vector into a move instruction which will implicitly zero the upper elements.
Ideally we'd be able to emit the SUBREG_TO_REG without the explicit register->register move, but we'd need to be sure the producing operation would select something that guaranteed the upper bits were already zeroed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312450 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-03 17:52:25 +00:00
Craig Topper
849412352b [X86] Add VBLENDPS/VPBLENDD to the execution domain fixing tables.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312449 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-03 17:52:23 +00:00
Craig Topper
05f56c0b3f [X86] Canonicalize (concat_vectors X, zero) -> (insert_subvector zero, X, 0).
In a future patch, I plan to teach isel to use a small vector move with implicit zeroing of the upper elements when it sees the (insert_subvector zero, X, 0) pattern.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312448 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-03 17:52:19 +00:00
Ayman Musa
1ee1fb6046 [X86] Add -mtriple option to LIT tests added in https://reviews.llvm.org/rL312442
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312443 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-03 15:06:26 +00:00
Ayman Musa
5684b1b1ad [X86][AVX512] Add simple tests for all AVX512 shuffle instructions.
Throughout an effort to strongly check the behavior of CodeGen with the IR shufflevector instruction we generated many tests while predicting the best X86 sequence that may be generated.

This is a subset of the generated tests that we think may add value to our X86 set of tests.

Some of the checks are not optimal and will be changed after fixing:
1. PR34394
2. PR34382
3. PR34380
4. PR34359

Differential Revision: https://reviews.llvm.org/D37329

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312442 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-03 13:53:44 +00:00