Commit Graph

23902 Commits

Krzysztof Parzyszek
7347a65675 [Hexagon] Implement basic vector operations on vectors vNi1
In addition to that, make sure that there are no boolean vector types that
are associated with multiple register classes. Specifically, remove v32i1
and v64i1 from integer register classes. These types will correspond to
results of vector comparisons, and as such should belong to the vector
predicate class. Having them in scalar registers as well makes legalization
ambiguous.
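
For illustration only (a hand-written sketch, not part of the commit), this is the kind of IR that yields a vNi1 value; the compare result is a boolean vector and should legalize into the vector predicate class rather than a scalar register class:

```llvm
; Hypothetical example: the result of a vector compare is <32 x i1>, which
; on Hexagon should map to a vector predicate register, not a scalar one.
define <32 x i1> @cmp_bytes(<32 x i8> %a, <32 x i8> %b) {
  %p = icmp eq <32 x i8> %a, %b
  ret <32 x i1> %p
}
```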


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323229 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-23 17:53:59 +00:00
Simon Pilgrim
fe48225d2a [X86][SSE] LowerBUILD_VECTORAsVariablePermute - extract subvector from oversized index vectors
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323223 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-23 17:02:15 +00:00
Dan Gohman
ecb7ca8228 [WebAssembly] Add mem.* intrinsics.
The grow_memory and current_memory instructions are expected to be
officially renamed to mem.grow and mem.size. Introduce new intrinsics
with the new names. These new names aren't yet official, so for now,
use them at your own risk.

Also, take this opportunity to add arguments for the currently unused
immediate field in those instructions.
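
A rough sketch of what using the new intrinsics might look like; the exact intrinsic names, type suffixes, and the leading memory-index immediate below are assumptions inferred from this description, not verified against the patch:

```llvm
; Assumed declarations -- names and signatures are a guess based on the
; description above (mem.grow / mem.size plus an immediate memory index).
declare i32 @llvm.wasm.mem.size.i32(i32)
declare i32 @llvm.wasm.mem.grow.i32(i32, i32)

define i32 @grow_one_page() {
  ; memory index 0 (the currently unused immediate field), grow by 1 page
  %old = call i32 @llvm.wasm.mem.grow.i32(i32 0, i32 1)
  ret i32 %old
}
```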


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323222 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-23 17:02:02 +00:00
Dan Gohman
64b37973fb [WebAssembly] Switch to *-wasm as the default target triple.
This makes wasm32-unknown-unknown-wasm the default, which supports
the .o file writer and the new linking ABI. To enable s2wasm-compatible
output, use the wasm32-unknown-unknown-elf triple.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323220 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-23 16:55:44 +00:00
Alexander Ivchenko
90f5ff4016 [x86] Reautogenerate a bunch of tests for D42287. NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323215 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-23 16:08:15 +00:00
Yaxun Liu
73f6582e91 CodeGen: Fix assertion in ScheduleDAGMILive::scheduleMI due to llvm.dbg.value
Fix a bug in ScheduleDAGMILive::scheduleMI that caused BotRPTracker not to track CurrentBottom in some rare cases involving llvm.dbg.value.

This issue caused the amdgcn target to assert when compiling some user code with -g.

Differential Revision: https://reviews.llvm.org/D42394


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323214 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-23 16:04:53 +00:00
Craig Topper
e558b6ee99 [X86] Rewrite vXi1 element insertion by using a vXi1 scalar_to_vector and inserting into a vXi1 vector.
The existing code was already doing something very similar to subvector insertion, so this allows us to remove the nearly duplicate code.

This patch is a little larger than it should be due to differences in the DQI handling between the two today.
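
As a reference point (hand-written, not from the patch), the lowering in question handles single-element insertions into mask vectors such as:

```llvm
; Illustrative vXi1 element insertion: now lowered via a vXi1
; scalar_to_vector followed by an insertion into the vXi1 vector.
define <16 x i1> @insert_bit(<16 x i1> %mask, i1 %b) {
  %r = insertelement <16 x i1> %mask, i1 %b, i32 5
  ret <16 x i1> %r
}
```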

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323212 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-23 15:56:36 +00:00
Simon Pilgrim
1423fd89fd [X86][SSE] LowerBUILD_VECTORAsVariablePermute - ensure that the source vector is not larger than the destination
We might be able to support this in the future with VPERMV3, OR(PSHUFB, PSHUFB) etc.
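
For context, LowerBUILD_VECTORAsVariablePermute targets build_vectors whose lanes are all variable-index extracts from one source; a hand-written IR sketch of that shape (here with matching source and destination sizes) is:

```llvm
; Illustration: every lane is an extract from the same source at a variable
; index, so the whole build_vector can be lowered as one variable shuffle.
define <4 x i32> @var_perm(<4 x i32> %src, <4 x i32> %idx) {
  %i0 = extractelement <4 x i32> %idx, i32 0
  %i1 = extractelement <4 x i32> %idx, i32 1
  %e0 = extractelement <4 x i32> %src, i32 %i0
  %e1 = extractelement <4 x i32> %src, i32 %i1
  %v0 = insertelement <4 x i32> undef, i32 %e0, i32 0
  %v1 = insertelement <4 x i32> %v0, i32 %e1, i32 1
  ; (lanes 2 and 3 would be filled the same way)
  ret <4 x i32> %v1
}
```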

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323210 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-23 15:51:03 +00:00
Alexander Ivchenko
09ce1bd00f [x86] Mostly reautogenerate a bunch of tests that affect D37775. NFC
Tests required minor manual tweaks:
CodeGen/MIR/X86/generic-instr-type.mir
CodeGen/X86/GlobalISel/select-copy.mir
CodeGen/X86/GlobalISel/select-ext.mir
CodeGen/X86/GlobalISel/select-intrinsic-x86-flags-read-u32.mir
CodeGen/X86/GlobalISel/select-phi.mir
CodeGen/X86/GlobalISel/select-trunc.mir
CodeGen/X86/GlobalISel/select-frameIndex.mir

And following tests are split into 32/64 versions:
CodeGen/X86/GlobalISel/legalize-GV.mir
CodeGen/X86/GlobalISel/select-frameIndex.mir



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323209 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-23 15:48:50 +00:00
Simon Pilgrim
088d7796ed [X86][SSE] LowerBUILD_VECTORAsVariablePermute - ensure that the index vector has the correct number of elements
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323206 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-23 15:13:37 +00:00
Tim Northover
e17aaf826b AArch64: get type from correct result when forming BFX
Some nodes produce multiple values, so when obtaining the type of an ISD::OR we
need to make sure we ask for the correct one. Hopefully that's all of them.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323205 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-23 15:11:27 +00:00
Tim Northover
f6fa06bcb8 AArch64: get type from correct result when forming BFI/BFM
Some nodes produce multiple values, so when obtaining the type of an ISD::OR we
need to make sure we ask for the correct one.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323202 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-23 14:37:03 +00:00
Craig Topper
60a0ef023c [X86] Legalize v32i1 without BWI via splitting to v16i1 rather than the default of promoting to v32i8.
Summary:
For the most part it's better to keep v32i1 as a mask type of a narrower width than to promote it to a ymm register.

I had to add some overrides to the methods that get the types for the calling convention so that we still use v32i8 for argument/return purposes.

There are still some regressions in here; I definitely saw some around shuffles. I think we should probably move vXi1 shuffle handling from lowering to a DAG combine, where the extend and truncate we have to emit would combine better.

I think we also need a DAG combine to remove trunc from (extract_vector_elt (trunc)).
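
In IR terms, that pattern looks something like the following (illustrative sketch, not from the patch): a full-vector truncate whose only use is a single extracted lane, versus the cheaper extract-then-truncate form.

```llvm
; Illustration of (extract_vector_elt (trunc)): the full-vector truncate is
; unnecessary when only one lane is consumed.
define i1 @extract_after_trunc(<32 x i8> %v) {
  %t = trunc <32 x i8> %v to <32 x i1>
  %e = extractelement <32 x i1> %t, i32 7
  ret i1 %e
}

; The cheaper equivalent: extract first, then truncate the scalar.
define i1 @trunc_after_extract(<32 x i8> %v) {
  %e = extractelement <32 x i8> %v, i32 7
  %t = trunc i8 %e to i1
  ret i1 %t
}
```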

Overall this removes something like 13000 CHECK lines from lit tests.

Reviewers: zvi, RKSimon, delena, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42031

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323201 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-23 14:25:39 +00:00
Simon Pilgrim
32eb83ac88 [X86][SSE] LowerBUILD_VECTORAsVariablePermute - fix PSHUFB source/index operand ordering
As detailed in rL317463, PSHUFB (like most variable shuffle instructions) uses Op[0] for the source vector and Op[1] for the shuffle index vector; VPERMV works in reverse, which is probably where the confusion comes from.

Differential Revision: https://reviews.llvm.org/D42380

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323190 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-23 11:39:06 +00:00
MinSeong Kim
3c68a97987 [Analysis] Disable exp/exp2/pow finite lib calls on Android with -ffast-math.
Summary:
Since r322087, glibc's finite lib calls are generated when possible.
However, glibc is not supported on Android, so this change teaches LLVM
to distinguish between Linux and Android for these unsupported library
calls. The change also includes some regression tests.

Reviewers: srhines, pirama

Reviewed By: srhines

Subscribers: kongyi, chh, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D42288

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323187 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-23 11:11:36 +00:00
Stefan Maksimovic
76abbaff70 [mips] Properly select abs and sqrt instructions
- Alter abs for micromips to have both AFGR64 and FGR64
  variants, same as sqrt
- Remove sqrt and abs from MicroMips32r6InstrInfo.td,
  use micromips FGR64 variants
- Restrict non-micromips abs/sqrt with NotInMicroMips
  predicate

Differential revision: https://reviews.llvm.org/D41439


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323184 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-23 10:09:39 +00:00
Craig Topper
34edacbdd6 [X86] Don't reorder (srl (and X, C1), C2) if (and X, C1) can be matched as a movzx
Summary:
If we can match as a zero extend there's no need to flip the order to get an encoding benefit, since movzx is 3 bytes with independent source/dest registers. The shortest 'and' we could make is also 3 bytes, unless we get lucky in the register allocator and it lands on AL/AX/EAX, which have a 2-byte encoding.
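
As an illustrative example (not from the patch), the 'and' below is exactly a 16-bit zero-extend and can be selected as a movzx; reordering it to come after the shift would lose that:

```llvm
; Illustration: keep the order (srl (and X, 0xFFFF), 4) so the mask can be
; matched as movzwl instead of becoming a shift followed by a wider 'and'.
define i32 @mask_then_shift(i32 %x) {
  %a = and i32 %x, 65535
  %s = lshr i32 %a, 4
  ret i32 %s
}
```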

This patch was more impressive before r322957 went in. It removed some of the same Ands that got deleted by that patch.

Reviewers: spatel, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42313

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323175 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-23 05:45:52 +00:00
Craig Topper
30a2bf5c68 [X86] Remove 'NOREX' comment from the printing of _NOREX instructions.
Some of the NOREX instructions are used in 32-bit mode, making this printing confusing. It also doesn't provide a lot of value since you can see the h-register being used by the instruction.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323174 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-23 05:37:00 +00:00
Chandler Carruth
fd5a8723ce Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", one of the two halves of Spectre.
Summary:
First, we need to explain the core of the vulnerability. Note that this
is a very incomplete description, please see the Project Zero blog post
for details:
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html

The basis for branch target injection is to direct speculative execution
of the processor to some "gadget" of executable code by poisoning the
prediction of indirect branches with the address of that gadget. The
gadget in turn contains an operation that provides a side channel for
reading data. Most commonly, this will look like a load of secret data
followed by a branch on the loaded value and then a load of some
predictable cache line. The attacker then uses timing of the processor's
cache to determine which direction the branch took *in the speculative
execution*, and in turn what one bit of the loaded value was. Due to the
nature of these timing side channels and the branch predictor on Intel
processors, this allows an attacker to leak data only accessible to
a privileged domain (like the kernel) back into an unprivileged domain.

The goal is simple: avoid generating code which contains an indirect
branch that could have its prediction poisoned by an attacker. In many
cases, the compiler can simply use directed conditional branches and
a small search tree. LLVM already has support for lowering switches in
this way and the first step of this patch is to disable jump-table
lowering of switches and introduce a pass to rewrite explicit indirectbr
sequences into a switch over integers.
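
As a hand-written sketch (not taken from the patch), blockaddress-based dispatch can instead key off a small integer and become a switch, which then lowers to conditional branches rather than an indirect jump:

```llvm
; Before: an indirect branch whose prediction could be poisoned.
define void @dispatch(i8* %target) {
entry:
  indirectbr i8* %target, [label %a, label %b]
a:
  ret void
b:
  ret void
}

; After (conceptually): branch on an integer id instead, lowered as a
; compare/branch tree rather than a jump table or indirect jump.
define void @dispatch_switch(i32 %id) {
entry:
  switch i32 %id, label %a [ i32 1, label %b ]
a:
  ret void
b:
  ret void
}
```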

However, there is no fully general alternative to indirect calls. We
introduce a new construct we call a "retpoline" to implement indirect
calls in a non-speculatable way. It can be thought of loosely as
a trampoline for indirect calls which uses the RET instruction on x86.
Further, we arrange for a specific call->ret sequence which ensures the
processor predicts the return to go to a controlled, known location. The
retpoline then "smashes" the return address pushed onto the stack by the
call with the desired target of the original indirect call. The result
is a predicted return to the next instruction after a call (which can be
used to trap speculative execution within an infinite loop) and an
actual indirect branch to an arbitrary address.

On 64-bit x86 ABIs, this is especially easy to do in the compiler by
using a guaranteed scratch register to pass the target into this device.
For 32-bit ABIs there isn't a guaranteed scratch register and so several
different retpoline variants are introduced to use a scratch register if
one is available in the calling convention and to otherwise use direct
stack push/pop sequences to pass the target address.

This "retpoline" mitigation is fully described in the following blog
post: https://support.google.com/faqs/answer/7625886

We also support a target feature that disables emission of the retpoline
thunk by the compiler to allow for custom thunks if users want them.
These are particularly useful in environments like kernels that
routinely do hot-patching on boot and want to hot-patch their thunk to
different code sequences. They can write this custom thunk and use
`-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this
case, on x86-64 the thunk name must be:
```
  __llvm_external_retpoline_r11
```
or on 32-bit:
```
  __llvm_external_retpoline_eax
  __llvm_external_retpoline_ecx
  __llvm_external_retpoline_edx
  __llvm_external_retpoline_push
```
The target of the retpoline is passed in the named register, or, in the
case of the `push` suffix, on the top of the stack via a `pushl`
instruction.
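
For concreteness (a hand-written sketch, not from the patch), the construct applies to ordinary indirect calls like the one below; with `-mretpoline` the call target is moved into the designated scratch register and the call goes through the retpoline thunk instead of being emitted as a bare indirect call.

```llvm
; Minimal illustration: with -mretpoline this indirect call is routed
; through the retpoline thunk rather than emitted as a plain indirect call.
define i32 @call_through_pointer(i32 (i32)* %fp, i32 %x) {
  %r = call i32 %fp(i32 %x)
  ret i32 %r
}
```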

There is one other important source of indirect branches in x86 ELF
binaries: the PLT. These patches also include support for LLD to
generate PLT entries that perform a retpoline-style indirection.

The only other indirect branches remaining that we are aware of are from
precompiled runtimes (such as crt0.o and similar). The ones we have
found are not really attackable, and so we have not focused on them
here, but eventually these runtimes should also be replicated for
retpoline-ed configurations for completeness.

For kernels or other freestanding or fully static executables, the
compiler switch `-mretpoline` is sufficient to fully mitigate this
particular attack. For dynamic executables, you must compile *all*
libraries with `-mretpoline` and additionally link the dynamic
executable and all shared libraries with LLD and pass `-z retpolineplt`
(or use similar functionality from some other linker). We strongly
recommend also using `-z now` as non-lazy binding allows the
retpoline-mitigated PLT to be substantially smaller.

When manually applying transformations similar to `-mretpoline` to the
Linux kernel, we observed very small performance hits to applications
running typical workloads, and relatively minor hits (approximately 2%)
even for extremely syscall-heavy applications. This is largely due to
the small number of indirect branches that occur in performance
sensitive paths of the kernel.

When using these patches on statically linked applications, especially
C++ applications, you should expect to see a much more dramatic
performance hit. For microbenchmarks that are switch-, indirect-, or
virtual-call heavy we have seen overheads ranging from 10% to 50%.

However, real-world workloads exhibit substantially lower performance
impact. Notably, techniques such as PGO and ThinLTO dramatically reduce
the impact of hot indirect calls (by speculatively promoting them to
direct calls) and allow optimized search trees to be used to lower
switches. If you need to deploy these techniques in C++ applications, we
*strongly* recommend that you ensure all hot call targets are statically
linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well
tuned servers using all of these techniques saw 5% - 10% overhead from
the use of retpoline.

We will add detailed documentation covering these components in
subsequent patches, but wanted to make the core functionality available
as soon as possible. Happy for more code review, but we'd really like to
get these patches landed and backported ASAP for obvious reasons. We're
planning to backport this to both 6.0 and 5.0 release streams and get
a 5.0 release with just this cherry picked ASAP for distros and vendors.

This patch is the work of a number of people over the past month: Eric, Reid,
Rui, and myself. I'm mailing it out as a single commit due to the time
sensitive nature of landing this and the need to backport it. Huge thanks to
everyone who helped out here, and everyone at Intel who helped out in
discussions about how to craft this. Also, credit goes to Paul Turner (at
Google, but not an LLVM contributor) for much of the underlying retpoline
design.

Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer

Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D41723

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323155 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-22 22:05:25 +00:00
Mark Searles
fb6bc12a56 [AMDGPU] SI Load Store Optimizer: When merging with offset, use V_ADD_{I|U}32_e64
- Change the inserted add (V_ADD_{I|U}32_e32) to the _e64 version (V_ADD_{I|U}32_e64) so that the add uses a vreg for the carry; this prevents the inserted v_add from killing VCC. The _e64 version doesn't accept a literal in its encoding, so we also need to introduce a mov instruction to get the immediate into a register.
- Change the pass name to "SI Load Store Optimizer"; this removes the '/', which complicates scripts.

Differential Revision: https://reviews.llvm.org/D42124

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323153 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-22 21:46:43 +00:00
Petar Jovanovic
d9fb399f43 [mips] add warnings for using dsp and msa flags with inappropriate revisions
Dsp and dspr2 require MIPS revision 2, while msa requires revision 5. This
adds warnings for cases when these flags are used with an earlier revision.

Patch by Milos Stojanovic.

Differential Revision: https://reviews.llvm.org/D40490


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323131 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-22 16:43:30 +00:00
Carey Williams
95a5df84b6 [AArch64] optimise v4f16 fcmps to utilise vector instructions
Improves the code generation for v4f16 FCMP instructions when FullFP16 is not supported,
generating FCVTL(s) rather than a longer series of FCVTs.
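
The affected pattern is a lane-wise compare of half-precision vectors, e.g. (illustrative IR, not from the patch):

```llvm
; Illustration: without FullFP16 the v4f16 lanes must be converted to f32
; before comparing; a vector-wide convert handles all lanes at once.
define <4 x i1> @cmp_v4f16(<4 x half> %a, <4 x half> %b) {
  %c = fcmp olt <4 x half> %a, %b
  ret <4 x i1> %c
}
```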

Differential Revision: https://reviews.llvm.org/D41772

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323118 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-22 14:16:11 +00:00
Simon Pilgrim
3bd60530e3 [X86][AVX] Add test case for PR34370
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323106 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-22 12:27:22 +00:00
Simon Pilgrim
7e9a1c6fb4 [X86][SSE] Add ISD::VECTOR_SHUFFLE to faux shuffle decoding (Reapplied)
Primarily, this allows us to use the aggressive extraction mechanisms in combineExtractWithShuffle earlier and make use of UNDEF elements that may be lost during lowering.

Reapplied after rL322279 was reverted at rL322335 due to PR35918, underlying issue was fixed at rL322644.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323104 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-22 12:05:17 +00:00
Marina Yatsina
5e110fb917 Break false dependencies for POPCNT, LZCNT, TZCNT
Add POPCNT, LZCNT, TZCNT to the list of instructions that have false dependencies.
Add a test to make sure BreakFalseDeps breaks the dependencies for these instructions.
Update affected tests.

This fixes bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869

This is the last of several patches that fix this bugzilla.
Most of the patches are aimed at refactoring the existing code.

Reviews of the refactoring done to enable this change:
https://reviews.llvm.org/D40330
https://reviews.llvm.org/D40331
https://reviews.llvm.org/D40332
https://reviews.llvm.org/D40333

Differential Revision: https://reviews.llvm.org/D40334

Change-Id: If95cbf1a3f5c7dccff8f1b22ecb397542147303d

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323096 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-22 10:07:01 +00:00
Marina Yatsina
ef279188ca Separate ExecutionDepsFix into 4 parts:
1. ReachingDefsAnalysis - Identifies, for each instruction, the “closest” reaching def of a given register. Used by BreakFalseDeps (for clearance calculation) and ExecutionDomainFix (for arbitrating conflicting domains).
2. ExecutionDomainFix - Changes the variant of the instructions in order to minimize domain crossings.
3. BreakFalseDeps - Breaks false dependencies.
4. LoopTraversal - Creates a traversal order of the basic blocks that is optimal for loops (introduced in revision L293571). Both ExecutionDomainFix and ReachingDefsAnalysis use this to determine the order in which they traverse the basic blocks.

This also included the following changes to ExecutionDepsFix's original logic:
1. The BreakFalseDeps and ReachingDefsAnalysis logic is no longer restricted to a register class.
2. ReachingDefsAnalysis tracks liveness of reg units instead of reg indices into a given reg class.

Additional changes in affected files:
1. X86 and ARM targets now inherit from ExecutionDomainFix instead of ExecutionDepsFix. BreakFalseDeps also was added to the passes they activate.
2. Comments and references to ExecutionDepsFix replaced with ExecutionDomainFix and BreakFalseDeps, as appropriate.

Additional refactoring changes will follow.

This commit is (almost) NFC.
The only functional change is that BreakFalseDeps will now break dependencies for all register classes.
Since no additional instructions were added to the list of instructions that have false dependencies, there is no actual change yet.
In a future commit several instructions (and tests) will be added.

This is the first of several patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869.
Most of the patches are aimed at refactoring the existing code.

Additional relevant reviews:
https://reviews.llvm.org/D40331
https://reviews.llvm.org/D40332
https://reviews.llvm.org/D40333
https://reviews.llvm.org/D40334

Differential Revision: https://reviews.llvm.org/D40330

Change-Id: Icaeb75e014eff96a8f721377783f9a3e6c679275

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323087 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-22 10:05:23 +00:00
Craig Topper
10fbe29b93 [X86] Add an override of targetShrinkDemandedConstant to limit the damage that shrinkdemandedbits can do to zext_in_reg operations
Summary:
This patch adds an implementation of targetShrinkDemandedConstant that tries to keep shrinkdemandedbits from removing bits that would otherwise have been recognized as a movzx.

We still need a follow-up patch to stop moving ands across srl if the and could be represented as a movzx before the shift but not after. I think this should help with some of the cases that D42088 ended up removing during isel.

Reviewers: spatel, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42265

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323048 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-20 18:50:09 +00:00
Jonas Paulsson
9ed46d5b97 Move new test from Generic to SystemZ.
A few build bots failed with r323042 because they are not configured to
build the SystemZ target.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323044 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-20 16:57:06 +00:00
Jonas Paulsson
e64dbcb7af [SelectionDAG] Fix codegen of vector stores with non byte-sized elements.
This was completely broken, but hopefully fixed by this patch.

In cases where it is needed, a vector with non byte-sized elements is stored
by extracting, zero-extending, shifting, and or'ing the elements into an
integer of the same width as the vector, which is then stored.
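
For example (an illustrative case, not taken from the patch), storing a mask vector whose elements are single bits now goes through that extract/zero-extend/shift/or sequence into an integer of the vector's width:

```llvm
; Illustration: <8 x i1> has non byte-sized elements; its store is now
; legalized by packing the lanes into an i8 and storing that integer.
define void @store_mask(<8 x i1> %m, <8 x i1>* %p) {
  store <8 x i1> %m, <8 x i1>* %p
  ret void
}
```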

Review: Eli Friedman, Ulrich Weigand
https://reviews.llvm.org/D42100#inline-369520
https://bugs.llvm.org/show_bug.cgi?id=35520

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323042 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-20 16:05:10 +00:00
Craig Topper
7987a0b910 [X86] Add some more v32i1 shuffle tests with shuffles between mask creation and mask usage rather than just shuffling input arguments.
The existing tests just tested shuffles of v32i1 inputs, but arguments are promoted to v32i8, so they weren't a good demonstration of v32i1 shuffle handling.

The new test cases use compares and selects to get k-register operations around the shuffle.

This is prep work for demonstrating changes from D42031.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323031 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-20 08:13:35 +00:00
Craig Topper
54aeb345dd [X86] Add test cases for failures to use movzx due to various issues with demanded bits.
D42265 and D42313 should help with some of these.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323030 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-20 07:50:57 +00:00
Saleem Abdulrasool
7cc04d977c test: fix ARM tests harder
Remove the missed check update for the removal of the x86 specific
vector call on ARM.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323023 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-20 01:26:46 +00:00
Saleem Abdulrasool
d6398c64e4 test: move ARM test from x86
The ARM backend is not guaranteed to be present on x86, move the test to
the ARM tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323021 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-20 01:03:11 +00:00
Saleem Abdulrasool
860652c3f8 CodeGen: handle llvm.used properly for COFF
`llvm.used` contains a list of pointers to named values which the
compiler, assembler, and linker are required to treat as if there is a
reference that they cannot see.  Ensure that the symbols are preserved
by adding an explicit `-include` reference to the linker command.
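
For reference, `llvm.used` looks like this in IR (an illustrative snippet, not taken from the commit):

```llvm
; Illustration: @keep_me is referenced only from llvm.used, so the compiler,
; assembler, and linker must still preserve it.
@keep_me = internal global i32 42

@llvm.used = appending global [1 x i8*] [i8* bitcast (i32* @keep_me to i8*)],
             section "llvm.metadata"
```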

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323017 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-20 00:28:02 +00:00
Craig Topper
8bb1297fe0 [X86] Teach X86 codegen to use vector width preference to avoid promoting to 512-bit types when VLX is enabled and the preference is for a smaller size.
This change applies to places where we would turn 128/256-bit code into 512-bit in order to get a wider element type through sext/zext. Any 512-bit types that already existed in the IR/DAG will be left that way.

The width preference has no effect on codegen behavior when the target does not have AVX512 enabled. So AVX/AVX2 codegen cannot be limited via this mechanism yet.

If the preference is lower than 256 we may still use a 256-bit type to do the operation. Constraining to 128 bits makes it much more difficult to support some operations. For many of these cases we need to change element width while keeping the element count constant, which is most easily done by switching between 256 and 128 bits.

The preference is only obeyed when AVX512 and VLX are available. This means the preference is not obeyed for KNL, but is obeyed for SKX, Cannonlake, and Icelake. For KNL, the only way to do masked operation is on 512-bit registers so we would have to completely disable masking to obey the preference. We would also lose support for gather, scatter, ctlz, vXi64 multiplies, etc. This may change in the future, but this simplifies the initial implementation.
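
A rough sketch of how such a preference is expressed at the IR level; the exact attribute name below is an assumption and has not been verified against this patch:

```llvm
; Assumed illustration: a per-function width preference. With AVX-512 and
; VLX available, codegen would stay at 256-bit vectors for this function.
define void @narrow_preferred() "prefer-vector-width"="256" {
  ret void
}
```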

Differential Revision: https://reviews.llvm.org/D41895

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323016 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-20 00:26:12 +00:00
Sanjay Patel
250016404b [x86] add tests for sqrt estimate that should respect denorms; NFC (PR34994)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323003 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-19 22:47:49 +00:00
Craig Topper
6c4e0e3baf [X86] Autogenerate complete checks on a couple tests. NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322997 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-19 22:04:20 +00:00
Jessica Paquette
fa14209a6c Add optional DICompileUnit to DIBuilder + make outliner debug info use it
Previously, the DIBuilder didn't expose functionality to set its compile unit
in any other way than calling createCompileUnit. This meant that the outliner,
which creates new functions, had to create a new compile unit for its debug
info.

This commit adds an optional parameter in the DIBuilder's constructor which
lets you set its CU at construction.

It also changes the MachineOutliner so that it keeps track of the DISubprograms
for each outlined sequence. If debugging information is requested, then it
uses one of the outlined sequence's DISubprograms to grab a CU. It then uses
that CU to construct the DISubprogram for the new outlined function.

The test has also been updated to reflect this change.

See https://reviews.llvm.org/D42254 for more information. Also see the e-mail
discussion on D42254 in llvm-commits for more context.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322992 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-19 21:21:49 +00:00
Ulrich Weigand
8466b57b53 [SystemZ] Prefer LOCHI over generating IPM sequences
On current machines we have load-on-condition instructions that can be
used to directly implement the SETCC semantics.  If we have those, it is
always preferable to use them instead of generating the IPM sequence.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322989 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-19 20:56:04 +00:00
Ulrich Weigand
ff28671f85 [SystemZ] Directly use CC result of compare-and-swap
In order to implement a test whether a compare-and-swap succeeded, the
SystemZ back-end currently emits a rather inefficient sequence of first
converting the CC result into an integer, and then testing that integer
against zero.  This commit changes the back-end to simply directly test
the CC value set by the compare-and-swap instruction.
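
For reference (an illustrative snippet, not from the patch), the success test in question is the usual cmpxchg-plus-extractvalue pattern:

```llvm
; The i1 success flag below previously went through a CC-to-integer
; conversion and a compare against zero; it can now be tested from CC.
define i1 @cas_succeeded(i32* %p, i32 %old, i32 %new) {
  %pair = cmpxchg i32* %p, i32 %old, i32 %new seq_cst seq_cst
  %ok = extractvalue { i32, i1 } %pair, 1
  ret i1 %ok
}
```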



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322988 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-19 20:54:18 +00:00
Ulrich Weigand
e762c03a67 [SystemZ] Rework IPM sequence generation
The SystemZ back-end uses a sequence of IPM followed by arithmetic
operations to implement the SETCC primitive.  This is currently done
early during SelectionDAG.  This patch moves generating those sequences
to much later in SelectionDAG (during PreprocessISelDAG).

This doesn't change much in generated code by itself, but it allows
further enhancements that will be checked-in as follow-on commits.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322987 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-19 20:52:04 +00:00
Ulrich Weigand
a565e7db88 [SystemZ] Run branch-12.ll test only if long tests enabled
This avoids excessive test run times e.g. with expensive checks enabled.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322983 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-19 19:51:38 +00:00
Simon Pilgrim
b1c9d37e54 [X86][SSE] Add SSE2 gather tests
Check codegen without PEXTRD 

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322974 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-19 17:50:25 +00:00
Joel Galenson
1f504230b9 [ARM] Fix perf regression in compare optimization.
Fix a performance regression caused by r322737.

While trying to make it easier to replace compares with existing adds and
subtracts, I accidentally stopped it from doing so in some cases.  This should
fix that.  I'm also fixing another potential bug in that commit.

Differential Revision: https://reviews.llvm.org/D42263

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322972 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-19 17:46:27 +00:00
Derek Schuff
667e26447a [WebAssembly] Fix libcall signature lookup
RuntimeLibcallSignatures previously manually initialized all the libcall
names into an array and searched it linearly for the first match to look up
the corresponding index.
r322802 switched that to initializing a map keyed by the libcall name.
Neither of these approaches works correctly because some libcall numbers use
the same name on different platforms (e.g. the "l" suffixed functions
use f80 or f128 or ppcf128).

This change fixes that by ensuring that each name only goes into the map
once. It also adds tests.

Differential Revision: https://reviews.llvm.org/D42271

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322971 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-19 17:45:54 +00:00
Dan Gohman
cee475a4d7 [WebAssembly] Make sign-extension opcodes a distinct feature.
Sign-extension opcodes have been split into a separate proposal from
the main threads proposal, so switch them to their own target
feature. See:

https://github.com/WebAssembly/sign-extension-ops


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322966 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-19 17:16:24 +00:00
Daniel Neilson
afa2e7e6a6 Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)
Summary:
 This is a resurrection of work first proposed and discussed in Aug 2015:
   http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
and initially landed (but then backed out) in Nov 2015:
   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

 The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument
which is required to be a constant integer. It represents the alignment of the
dest (and source), and so must be the minimum of the actual alignment of the
two.

 This change is the first in a series that allows source and dest to each
have their own alignments by using the alignment attribute on their arguments.

 In this change we:
1) Remove the alignment argument.
2) Add alignment attributes to the source & dest arguments. We, temporarily,
   require that the alignments for source & dest be equal.

 For example, code which used to read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false)
will now read
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false)

 Downstream users may have to update their lit tests that check for
@llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script
may help with updating the majority of your tests, but it does not catch all possible
patterns so some manual checking and updating will be required.

s~declare void @llvm\.mem(set|cpy|move)\.p([^(]*)\((.*), i32, i1\)~declare void @llvm.mem\1.p\2(\3, i1)~g
s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* align \6 \3, i8 \4, i8 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* align \6 \3, i8 \4, i16 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* align \6 \3, i8 \4, i32 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* align \6 \3, i8 \4, i64 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* align \6 \3, i8 \4, i128 \5, i1 \7)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* \4, i8\5* \6, i8 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* \4, i8\5* \6, i16 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* \4, i8\5* \6, i32 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* \4, i8\5* \6, i64 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* \4, i8\5* \6, i128 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g

 The remaining changes in the series will:
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
   source and dest alignments.
Step 3) Update Clang to use the new IRBuilder API.
Step 4) Update Polly to use the new IRBuilder API.
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
        and those that use MemIntrinsicInst::[get|set]Alignment() to use
        getDestAlignment() and getSourceAlignment() instead.
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
        MemIntrinsicInst::[get|set]Alignment() methods.

Reviewers: pete, hfinkel, lhames, reames, bollu

Reviewed By: reames

Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits

Differential Revision: https://reviews.llvm.org/D41675

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322965 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-19 17:13:12 +00:00
Sanjay Patel
9e17395714 [x86] add RUN line and auto-generate checks
There were checks for a 32-bit target here, but no RUN line
corresponding to that prefix. I don't know what the intent
of these tests is, but at least now we can see what happens
for both targets.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322961 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-19 17:09:28 +00:00
Sanjay Patel
13149c89f3 [x86] regenerate complete checks; NFC
D42265 will improve something here, but it's not obvious how without more checks.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322960 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-19 17:05:16 +00:00
Sanjay Patel
4956f49847 [x86] shrink 'and' immediate values by setting the high bits (PR35907)
Try to reverse the constant-shrinking that happens in SimplifyDemandedBits()
for 'and' masks when it results in a smaller sign-extended immediate.

We are also able to detect dead 'and' ops here (the mask is all ones). In
that case, we replace and return without selecting the 'and'.

Other targets might want to share some of this logic by enabling this under a
target hook, but I didn't see diffs for simple cases with PowerPC or AArch64,
so they may already have some specialized logic for this kind of thing or have
different needs.
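
A hypothetical illustration of the pattern (hand-written, not taken from the patch or the PR):

```llvm
; Hypothetical illustration: the store only demands the low 32 bits, so
; SimplifyDemandedBits may shrink the -256 mask to 0xFFFFFF00, which no
; longer fits a sign-extended imm8/imm32; reversing that shrink restores
; the compact immediate encoding for the 'and'.
define void @mask_store(i64 %x, i32* %p) {
  %a = and i64 %x, -256
  %t = trunc i64 %a to i32
  store i32 %t, i32* %p
  ret void
}
```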

This should solve PR35907:
https://bugs.llvm.org/show_bug.cgi?id=35907

Differential Revision: https://reviews.llvm.org/D42088


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322957 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-19 16:37:25 +00:00