Commit Graph

22250 Commits

Author SHA1 Message Date
Simon Pilgrim
f9d4e1794d [X86] Add v2i4 store test case (PR20012)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312874 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-09 20:28:50 +00:00
Simon Pilgrim
5037d51d6d [X86] Add v2i2 test case (PR20011)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312873 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-09 20:22:35 +00:00
Simon Pilgrim
8b80450d25 [X86][FMA] Regenerate FMA tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312871 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-09 19:25:59 +00:00
Simon Pilgrim
a22f9f2405 [X86][SSE] i32 vector multiplications test cases from PR6399
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312868 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-09 18:18:17 +00:00
Simon Pilgrim
ce6571eae6 [X86][MOVBE] Fix typo in MOVBE scheduling test names
Copy+paste is not your friend

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312867 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-09 17:52:44 +00:00
Craig Topper
9080ea8806 [X86] Don't disable slow INC/DEC if optimizing for size
Summary:
Just because INC/DEC is a little slow on some processors doesn't mean we shouldn't prefer it when optimizing for size.

This appears to match gcc behavior.

Reviewers: chandlerc, zvi, RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37177

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312866 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-09 17:11:59 +00:00
Kyle Butt
6ece35c79b PPC: Don't select lxv/stxv for insufficiently aligned stack slots.
The lxv/stxv instructions require an offset that is 0 % 16. Previously we were
selecting lxv/stxv for loads and stores to the stack where the offset from the
slot was a multiple of 16, but the stack slot was not 16 or more byte aligned.
When the frame gets lowered these transform to r(1|31) + slot + offset.
If slot is not aligned, slot + offset may not be 0 % 16.
Now we require 16 byte or more alignment for select lxv/stxv to stack slots.

Includes a testcase that shows both sufficiently and insufficiently aligned
stack slots.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312843 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-09 00:37:56 +00:00
Yonghong Song
982a89e06c bpf: fix test failures due to previous bpf change of assembly code syntax
Signed-off-by: Yonghong Song <yhs@fb.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312840 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-09 00:11:13 +00:00
Matt Arsenault
fadb61df65 AMDGPU: Recompute scc liveness
The various scalar bit operations set SCC,
so one is erased or moved it needs to be recomputed.
Not sure why the existing tests don't fail on this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312819 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-08 18:51:26 +00:00
Craig Topper
403bab200a [X86] Simplify the slow-incdec test and add test cases with optsize.
I think we want to consider using inc/dec with optsize.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312804 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-08 17:33:54 +00:00
Wei Mi
9930880cbf Fix a bug for rL312641.
rL312641 Allowed llvm.memcpy/memset/memmove to be tail calls when parent
function return the intrinsics's first argument. However on arm-none-eabi
platform, llvm.memcpy will be expanded to __aeabi_memcpy which doesn't
have return value. The fix is to check the libcall name after expansion
to match "memcpy/memset/memmove" before allowing those intrinsic to be
tail calls.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312799 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-08 16:44:52 +00:00
Krzysztof Parzyszek
145740999d Preserve existing regs when adding pristines to LivePhysRegs/LiveRegUnits
Differential Revision: https://reviews.llvm.org/D37600


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312797 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-08 16:29:50 +00:00
Simon Pilgrim
977c908e78 [X86] Added PR31045 test case
Reduced version of 'addr-calc-crash.ll' that was included in D27044, that had been fixed already by D31286/rL298633

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312786 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-08 10:49:11 +00:00
Jatin Bhateja
8277ad7473 [X86] Adding a test point for PR34149 'Suboptimal codegen for "fast" minnum and maxnum'
Differential Revision: https://reviews.llvm.org/D37614

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312778 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-08 09:15:36 +00:00
Dean Michael Berris
f7f70f7f6c [XRay][CodeGen][PowerPC] Fix tail exit codegen for XRay in PPC
Summary:
This fixes code-gen for XRay in PPC. The regression wasn't caught by
codegen tests  which we add in this change.

What happened was the following:

- For tail exits, we used to unconditionally prepend the returns/exits
  with a pseudo-instruction that gets lowered to the instrumentation
  sled (and leave the actual return/exit instruction as-is).
- Changes to the XRay instrumentation pass caused the tail exits to
  suddenly also emit the tail exit pseudo-instruction, since the check
  for whether a return instruction was also a call instruction meant it
  was a tail exit instruction.
- None of the tests caught the regression either due to non-existent
  tests, or the tests being disabled/removed for continuous breakage.

This change re-introduces some of the basic tests and verifies that
we're back to a state that allows the back-end to generate appropriate
XRay instrumented binaries for PPC in the presence of tail exits.

Reviewers: echristo, timshen

Subscribers: nemanjai, kbarton, llvm-commits

Differential Revision: https://reviews.llvm.org/D37570

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312772 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-08 01:47:56 +00:00
Chandler Carruth
575526c082 [x86] Flesh out the custom ISel for RMW aritmetic ops with used flags to
cover the bitwise operators.

Nothing really exciting here, this just stamps out the rest of the core
operations that can RMW memory and set flags.

Still not implemented here: ADC, SBB. Those will require more
interesting logic to channel the flags *in*, and I'm not currently
planning to try to tackle that. It might be interesting for someone who
wants to improve our code generation for bignum implementations.

Differential Revision: https://reviews.llvm.org/D37141

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312768 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-08 00:17:12 +00:00
Chandler Carruth
803b2c7e69 [x86] Extend the manual ISel of add and sub with both RMW memory
operands and used flags to support matching immediate operands.

This is a bit trickier than register operands, and we still want to fall
back on a register operands even for things that appear to be
"immediates" when they won't actually select into the operation's
immediate operand. This also requires us to handle things like selecting
`sub` vs. `add` to minimize the number of bits needed to represent the
immediate, and picking the shortest immediate encoding. In order to
that, we in turn need to scan to make sure that CF isn't used as it will
get inverted.

The end result seems very nice though, and we're now generating
optimal instruction sequences for these patterns IMO.

A follow-up patch will further expand this to other operations with RMW
memory operands. But handing `add` and `sub` are useful starting points
to flesh out the machinery and make sure interesting and complex cases
can be handled.

Thanks to Craig Topper who provided a few fixes and improvements to this
patch in addition to the review!

Differential Revision: https://reviews.llvm.org/D37139

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312764 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-07 23:54:24 +00:00
Paul Robinson
06296a65fd [DWARF] Line 0 should not have a discriminator.
It's meaningless and takes up extra space in the line table.

Differential Revision: https://reviews.llvm.org/D37364


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312751 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-07 22:15:44 +00:00
Artem Belevich
4059d374ce [CUDA] Added rudimentary support for CUDA-9 and sm_70.
For now CUDA-9 is not included in the list of CUDA versions clang
searches for, so the path to CUDA-9 must be explicitly passed
via --cuda-path=.

On LLVM side NVPTX added sm_70 GPU type which bumps required
PTX version to 6.0, but otherwise is equivalent to sm_62 at the moment.

Differential Revision: https://reviews.llvm.org/D37576

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312734 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-07 18:14:32 +00:00
Matt Arsenault
0bb6355f63 AMDGPU: Start selecting v_mad_mix_f32
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312732 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-07 18:05:07 +00:00
Konstantin Zhuravlyov
3964b8bfc8 AMDGPU: Handle non-temporal loads and stores
Differential Revision: https://reviews.llvm.org/D36862


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312729 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-07 17:14:54 +00:00
Konstantin Zhuravlyov
b6f64be453 AMDGPU: Handle more than one memory operand in SIMemoryLegalizer
Differential Revision: https://reviews.llvm.org/D37397


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312725 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-07 16:14:21 +00:00
Benjamin Kramer
55915d8771 [ARM] Remove redundant vcvt patterns.
These don't add any value as they're just compositions of existing
patterns. However, they can confuse the cost logic in ISel, leading to
duplicated vcvt instructions like in PR33199.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312724 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-07 14:52:26 +00:00
Michael Zuckerman
563f2fdd92 [X86][LLVM]Expanding Supports lowerInterleavedLoad() in X86InterleavedAccess (VF{8|16|32} stride 3).
This patch expands the support of lowerInterleavedload to {8|16|32}x8i stride 3.

LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=3 VF={8|16|32}) and we plan to include the store (deinterleved side).

The patch goal is to optimize the following sequence:
a0 b0 c0 a1 b1 c1 a2 b2
c2 a3 b3 c3 a4 b4 c4 a5
b5 c5 a6 b6 c6 a7 b7 c7

into

a0 a1 a2 a3 a4 a5 a6 a7
b0 b1 b2 b3 b4 b5 b6 b7
c0 c1 c2 c3 c4 c5 c6 c7

Reviewers
1. zvi
2. igor
3. guyblank
4. dorit
5. Ayal

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312722 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-07 14:02:13 +00:00
Florian Hahn
651af02437 [MachineCombiner] Update instruction depths incrementally for large BBs.
Summary:
For large basic blocks with lots of combinable instructions, the
MachineTraceMetrics computations in MachineCombiner can dominate the compile
time, as computing the trace information is quadratic in the number of
instructions in a BB and it's relevant successors/predecessors.

In most cases, knowing the instruction depth should be enough to make
combination decisions. As we already iterate over all instructions in a basic
block, the instruction depth can be computed incrementally. This reduces the
cost of machine-combine drastically in cases where lots of instructions
are combined. The major drawback is that AFAIK, computing the critical path
length cannot be done incrementally. Therefore we only compute
instruction depths incrementally, for basic blocks with more
instructions than inc_threshold. The -machine-combiner-inc-threshold
option can be used to set the threshold and allows for easier
experimenting and checking if using incremental updates for all basic
blocks has any impact on the performance.

Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn

Reviewed By: fhahn

Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits

Differential Revision: https://reviews.llvm.org/D36619

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312719 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-07 12:49:39 +00:00
Alexander Ivchenko
8f5188c5d8 [x86] Update to cmov promotion tests for D36711; NFC
Adding i8 -> [i16, i32, i64] and i32 -> i64 cases.
This way we can see what the current codegen looks like.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312707 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-07 08:59:05 +00:00
Zvi Rackover
8a2fcfe5be X86: Improve AVX512 fptoui lowering
Summary:
Add patterns for
  fptoui <16 x float> to <16 x i8>
  fptoui <16 x float> to <16 x i16>

Reviewers: igorb, delena, craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37505

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312704 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-07 07:40:34 +00:00
Matt Arsenault
ca22b05483 AMDGPU: Don't legalize i16 extloads to i32 with legal i16
Keeping non-i16 extloads makes it easier to match some new
gfx9 load instructions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312699 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-07 05:37:34 +00:00
Saleem Abdulrasool
1211c5d71e ARM: track globals promoted to coalesced const pool entries
Globals that are promoted to an ARM constant pool may alias with another
existing constant pool entry. We need to keep a reference to all globals
that were promoted to each constant pool value so that we can emit a
distinct label for each promoted global. These labels are necessary so
that debug info can refer to the promoted global without an undefined
reference during linking.

Patch by Stephen Crane!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312692 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-07 04:00:13 +00:00
Stanislav Mekhanoshin
6148c30603 [AMDGPU] Use v_pk_max_f16 for fcanonicalize
Differential Revision: https://reviews.llvm.org/D37325

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312676 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-06 22:27:29 +00:00
Matthias Braun
109c6e02f7 Insert IMPLICIT_DEFS for undef uses in tail merging
Tail merging can convert an undef use into a normal one when creating a
common tail. Doing so can make the register live out from a block which
previously contained the undef use. To keep the liveness up-to-date,
insert IMPLICIT_DEFs in such blocks when necessary.

To enable this patch the computeLiveIns() function which used to
compute live-ins for a block and set them immediately is split into new
functions:
- computeLiveIns() just computes the live-ins in a LivePhysRegs set.
- addLiveIns() applies the live-ins to a block live-in list.
- computeAndAddLiveIns() is a convenience function combining the other
  two functions and behaving like computeLiveIns() before this patch.

Based on a patch by Krzysztof Parzyszek <kparzysz@codeaurora.org>

Differential Revision: https://reviews.llvm.org/D37034

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312668 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-06 20:45:24 +00:00
Sanjay Patel
64aa32b606 [x86] fix triple and regenerate checks for psubus; NFC
Patch by Yulia Koval!

Differential Revision: https://reviews.llvm.org/D37523


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312662 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-06 19:05:20 +00:00
Stanislav Mekhanoshin
953b70393a [AMDGPU] Fixed encoding of v_pk_mul_f16 in fcanonicalize
Differential Revision: https://reviews.llvm.org/D37522

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312660 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-06 18:29:51 +00:00
Krzysztof Parzyszek
7e5553d4f2 [IfConversion] Remove kill flags from common instructions as well
When if-converting a diamond, two separate blocks will be placed back
to back to form a straight line code. To ensure correctness of the
liveness information, any registers that are live in the second block
should not be killed in the first block, even if they were in the
original code.
Additionally, when the two blocks share common instructions at the
beginning, these instructions will not be duplicated, but only placed
once, before both of the blocks. Since the function "isIdenticalTo"
(as used here) ignores kill flags, the common initial code in one
block may have a kill flag for a register that is live in the other
block.
Because the code that removes kill flags only runs for the non-common
parts of the predicated blocks, a kill flag mismatch in the common
code could still lead to a live register being killed prematurely.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312654 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-06 17:57:13 +00:00
Krzysztof Parzyszek
0c3d5af968 [Hexagon] Add option to generate calls to "abort" for "unreachable"
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312644 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-06 16:22:55 +00:00
Wei Mi
78696b31cd [TailCall] Allow llvm.memcpy/memset/memmove to be tail calls when parent
function return the intrinsics's first argument.

llvm.memcpy/memset/memmove return void but they will return the first
argument after they are expanded as libcalls. Now if the parent function
has any return value, llvm.memcpy cannot be turned into tail call after
expansion.

The patch is to handle that case in SelectionDAGBuilder so when caller
function return the same value as the first argument of llvm.memcpy,
tail call is allowed.

Differential Revision: https://reviews.llvm.org/D37406


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312641 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-06 16:05:17 +00:00
Stanislav Mekhanoshin
651c4efd77 [AMDGPU] Fix shouldClusterMemOps to process flat loads
Flat loads do not have vdata operand but have vdst instead.

Differential Revision: https://reviews.llvm.org/D37502

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312640 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-06 15:31:30 +00:00
Nicolai Haehnle
adf1cb63f2 AMDGPU: Make worst-case assumption about the wait states in inline assembly
Summary:
Mesa still uses a hack where empty inline assembly is used as a kind of
optimization barrier. This exposed a problem where not enough wait states
were inserted, because the hazard recognizer implicitly assumed that each
inline assembly "instruction" has at least one wait state.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D37205

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312635 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-06 13:50:13 +00:00
Chandler Carruth
1467a089bc [x86] Fix PR34377 by disabling cmov conversion when we relied on it
performing a zext of a register.

On the PR there is discussion of how to more effectively handle this,
but this patch prevents us from miscompiling code.

Differential Revision: https://reviews.llvm.org/D37504

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312620 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-06 06:28:08 +00:00
Zvi Rackover
922eae4d2e X86 Tests: Tidy up AVX512 conversion tests. NFC.
Rename functions to a consistent format to make it easier to track coverage.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312619 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-06 05:33:04 +00:00
Jatin Bhateja
2411ad4316 Updating a test reference for rL312608.
Differential Revision: https://reviews.llvm.org/D37501

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312614 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-06 03:58:14 +00:00
Hal Finkel
a481ab548d [PowerPC] Don't use xscvdpspn on the P7
xscvdpspn was not introduced until the P8, so don't use it on the P7. Fixes a
regression introduced in r288152.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312612 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-06 03:08:26 +00:00
Jatin Bhateja
f3b9c95869 [X86] Allow cross-lane permutations for sub targets supporting AVX2.
Summary:
Most instructions in AVX work “in-lane”, that is, each source element is applied only to other
elements of the same lane, thus a cross lane permutation is costly and needs more than one instrution.
AVX2 includes instructions to perform any-to-any permutation of words over a 256-bit register
and vectorized table lookup.

This should also Fix PR34369

Differential Revision: https://reviews.llvm.org/D37388

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312608 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-06 02:58:47 +00:00
Yaxun Liu
1e1d0b01c1 [AMDGPU] Transform __read_pipe_* and __write_pipe_*
When packet size equals packet align and is power of 2, transform
__read_pipe* and __write_pipe* to specialized library function.

Differential Revision: https://reviews.llvm.org/D36831


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312598 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-06 00:30:27 +00:00
Eli Friedman
83b0e44429 [ARM] Make ARMExpandPseudo add implicit uses for predicated instructions
Missing these could potentially screw up post-ra scheduling.

Issue found by inspection, so I don't have a real testcase. Included
test just verifies the expected operands after expansion.

Differential Revision: https://reviews.llvm.org/D35156



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312589 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-05 22:54:06 +00:00
Reid Kleckner
c86178ea37 Add llvm.codeview.annotation to implement MSVC __annotation
Summary:
This intrinsic represents a label with a list of associated metadata
strings. It is modelled as reading and writing inaccessible memory so
that it won't be removed as dead code. I think the intention is that the
annotation strings should appear at most once in the debug info, so I
marked it noduplicate. We are allowed to inline code with annotations as
long as we strip the annotation, but that can be done later.

Reviewers: majnemer

Subscribers: eraman, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D36904

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312569 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-05 20:14:58 +00:00
Craig Topper
8c5b337a87 [X86] Remove unnecessary (v4f32 (X86vzmovl (v4f32 (scalar_to_vector FR32X)))) patterns
We had already disabled the pattern for SSE4.1 and SSE4.2. But it got re-enabled for AVX and AVX512.

With SSE41 we rely on a separate (v4f32 (X86vzmovl VR128)) pattern to select blendps with a xorps to create zeroess. And a separate (v4f32 (scalar_to_vector FR32X)) to select a COPY_TO_REG_CLASS to move FR32 to VR128

The same thing can happen for AVX with vblendps and those separate patterns already exist.

For AVX512, (v4f32 (X86vzmov VR128)) will select a VMOVSS instruction instead of VBLENDPS due to their not being a EVEX VBLENDPS. This is what we were getting out of the larger pattern anyway. So the larger pattern is unneeded for AVX512 too.

For SSE1-SSSE3 we can rely on (v4f32 (X86vzmov VR128)) selecting a MOVSS similar to AVX512. Again this is what the larger pattern did too.

So the only real change here is that AVX1/2 now properly outputs a VBLENDPS during isel instead of a VMOVSS to match SSE41. Most tests didn't notice because the two address instruction pass knows how to turn VMOVSS into VBLENDPS to get an independent destination register.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312564 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-05 19:09:02 +00:00
Matt Arsenault
4e0c4fb9c1 AMDGPU: Fix not accounting for tail call resource usage
If the only call in a function is a tail call, the
function isn't considered to have a call since it's a
type of return.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312561 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-05 18:36:36 +00:00
Zvi Rackover
9c369c6f9c X86 Tests: Adding missing AVX512 fptoui coverage tests. NFC.
Some of the cases show missing pattern i intend to fix shortly.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312560 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-05 18:24:39 +00:00
Craig Topper
035520018a [AVX512] Remove patterns for (v8f32 (X86vzmovl (insert_subvector undef, (v4f32 (scalar_to_vector FR32X:)), (iPTR 0)))) and the same for v4f64.
We don't have this same pattern for AVX2 so I don't believe we should have it for AVX512. We also didn't have it for v16f32.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312543 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-05 17:33:58 +00:00