1633 Commits

Author SHA1 Message Date
Florian Hahn
3cd4111205 [ARM] Sink zext/sext operands for add and sub to enable vsubl generation.
This uses the infrastructure added in rL353152 to sink zext and sexts to
sub/add users, to enable vsubl/vaddl generation when NEON is available.

See https://bugs.llvm.org/show_bug.cgi?id=40025.

Reviewers: SjoerdMeijer, t.p.northover, samparker, efriedma

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D58063

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355460 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-06 00:10:03 +00:00
Oliver Stannard
3a22eb400f [ARM] Fix select_cc lowering for fp16
When lowering a select_cc node where the true and false values are of type f16,
we can't use a general conditional move because the FP16 instructions do not
support conditional execution. Instead, we must ensure that the condition code
is one of the four supported by the VSEL instruction.

Differential revision: https://reviews.llvm.org/D58813



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355385 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-05 10:42:34 +00:00
Oliver Stannard
c064077099 [ARM] Consider undefined-on-NaN conditions in checkVSELConstraints
This function was not checking for the condition code variants which are
undefined if either input is NaN, so we were missing selection of the VSEL
instruction in some cases when using -fno-honor-nans or -ffast-math.

Differential revision: https://reviews.llvm.org/D58812



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355199 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-01 13:58:25 +00:00
Kadir Cetinkaya
0ad049beea [Target][ARM] Add a usage for SrcSz to unbreak build-bots without assertions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355101 91177308-0d34-0410-b5e6-96231b3b80d8
2019-02-28 15:55:11 +00:00
Bjorn Pettersson
85de1fd399 Add support for computing "zext of value" in KnownBits. NFCI
Summary:
The description of KnownBits::zext() and
KnownBits::zextOrTrunc() has confusingly been telling
that the operation is equivalent to zero extending the
value we're tracking. That has not been true, instead
the user has been forced to explicitly set the extended
bits as known zero afterwards.

This patch adds a second argument to KnownBits::zext()
and KnownBits::zextOrTrunc() to control if the extended
bits should be considered as known zero or as unknown.

Reviewers: craig.topper, RKSimon

Reviewed By: RKSimon

Subscribers: javed.absar, hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58650

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355099 91177308-0d34-0410-b5e6-96231b3b80d8
2019-02-28 15:45:29 +00:00
Sam Parker
fcb98b7928 [ARM] Add OptMinSize to ARMSubtarget
In many places in the backend, we like to know whether we're
optimising for code size and this is performed by checking the
current machine function attributes. A subtarget is created on a
per-function basis, so it's possible to know when we're compiling for
code size on construction so record this in the new object.

Differential Revision: https://reviews.llvm.org/D57812


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@353501 91177308-0d34-0410-b5e6-96231b3b80d8
2019-02-08 07:57:42 +00:00
James Y Knight
3bab951f0f [opaque pointer types] Pass value type to GetElementPtr creation.
This cleans up all GetElementPtr creation in LLVM to explicitly pass a
value type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57173

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@352913 91177308-0d34-0410-b5e6-96231b3b80d8
2019-02-01 20:44:47 +00:00
Reid Kleckner
2f8d69acb0 [ARM] Deduplicate table generated CC analysis code
Create ARMCallingConv.cpp and emit code for calling convention analysis
from there.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@352431 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-28 21:28:43 +00:00
Matt Arsenault
52f6127659 Reapply "IR: Add fp operations to atomicrmw"
This reapplies commits r351778 and r351782 with
RISCV test fixes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351850 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-22 18:18:02 +00:00
Chandler Carruth
73f9a1d617 Revert r351778: IR: Add fp operations to atomicrmw
This broke the RISCV build, and even with that fixed, one of the RISCV
tests behaves surprisingly differently with asserts than without,
leaving there no clear test pattern to use. Generally it seems bad for
hte IR to differ substantially due to asserts (as in, an alloca is used
with asserts that isn't needed without!) and nothing I did simply would
fix it so I'm reverting back to green.

This also required reverting the RISCV build fix in r351782.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351796 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-22 10:29:58 +00:00
Matt Arsenault
020dc8d94b IR: Add fp operations to atomicrmw
Add just fadd/fsub for now.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351778 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-22 03:32:36 +00:00
Eli Friedman
6788e26abe [ARM] Combine ands+lsls to lsls+lsrs for Thumb1.
This patch may seem familiar... but my previous patch handled the
equivalent lsls+and, not this case.  Usually instcombine puts the
"and" after the shift, so this case doesn't come up. However, if the
shift comes out of a GEP, it won't get canonicalized by instcombine,
and DAGCombine doesn't have an equivalent transform.

This also modifies isDesirableToCommuteWithShift to suppress DAGCombine
transforms which would make the overall code worse.

I'm not really happy adding a bunch of code to handle this, but it would
probably be tricky to substantially improve the behavior of DAGCombine
here.

Differential Revision: https://reviews.llvm.org/D56032



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351776 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-22 01:51:37 +00:00
Chandler Carruth
6b547686c5 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351636 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-19 08:50:56 +00:00
Diogo N. Sampaio
08d38f01b3 [ARM] ComputeKnownBits to handle extract vectors
This patch adds the sign/zero extension done by
vgetlane to ARM computeKnownBitsForTargetNode.

Differential revision: https://reviews.llvm.org/D56098


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@350553 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-07 19:01:47 +00:00
Simon Pilgrim
7edfa8c28b [ARM] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@349909 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-21 15:15:38 +00:00
Eli Friedman
8e797de70d [ARM] Complete the Thumb1 shift+and->shift+shift transforms.
This saves materializing the immediate.  The additional forms are less
common (they don't usually show up for bitfield insert/extract), but
they're still relevant.

I had to add a new target hook to prevent DAGCombine from reversing the
transform. That isn't the only possible way to solve the conflict, but
it seems straightforward enough.

Differential Revision: https://reviews.llvm.org/D55630



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@349857 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-20 23:39:54 +00:00
Tim Northover
e2e53c456d ARM: use acquire/release instruction variants when available.
These features (fairly) recently got split out into their own feature, so we
should make CodeGen use them when available. The main change here is that the
check used to be based on the triple, but now it's based on CPU features.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@349355 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-17 15:05:32 +00:00
Tim Northover
0487bd8f42 ARM: use target-specific SUBS node when combining cmp with cmov.
This has two positive effects. First, using a custom node prevents
recombination leading to an infinite loop since the output DAG is notionally a
little more complex than the input one. Using a flag-setting instruction also
allows the subtraction to be folded with the related comparison more easily.

https://reviews.llvm.org/D53190

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348122 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-03 11:16:21 +00:00
Sjoerd Meijer
b6be32f152 [ARM] Don't expand sdiv when optimising for minsize
Don't expand SDIV with an immediate that is a power of 2 if we optimise for
minimum code size. For example:

sdiv %1, i32 4

gets expanded to a sequence of 3 instructions, but this is suboptimal for
minimum code size so instead we just generate a MOV and a SDIV if integer
division is supported.

Differential Revision: https://reviews.llvm.org/D54546


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347965 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-30 08:14:28 +00:00
Eli Friedman
7c67a1fab1 [ARM] Fix CPSR liveness in tMOVCCr_pseudo lowering.
The lowering was missing live-ins in certain cases, like a sequence of
multiple tMOVCCr_pseudo instructions.  This would lead to a verifier
failure, and on pre-v6 Thumb CPSR would be incorrectly clobbered.

For reasons I don't completely understand, it's hard to get a sequence
of multiple tMOVCCr_pseudo instructions; the issue only seems to show up
with 64-bit comparisons where the result is zero-extended. I added some
extra testcases in case that changes in the future. Probably some
optimization opportunities here if anyone is interested. (@test_slt_not
is the case that was getting miscompiled.)

The code to check the liveness of CPSR was stolen from
X86ISelLowering.cpp; maybe it could be refactored into common helper,
but I have no idea where to put it.

Differential Revision: https://reviews.llvm.org/D54192



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346355 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-07 21:08:13 +00:00
Thomas Lively
bbc2ea9b21 [NFC] Rename minnan and maxnan to minimum and maximum
Summary:
Changes all uses of minnan/maxnan to minimum/maximum
globally. These names emphasize that the semantic difference between
these operations is more than just NaN-propagation.

Reviewers: arsenm, aheejin, dschuff, javed.absar

Subscribers: jholewinski, sdardis, wdng, sbc100, jgravelle-google, jrtc27, atanasyan, llvm-commits

Differential Revision: https://reviews.llvm.org/D53112

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345218 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-24 22:49:55 +00:00
Peter Collingbourne
fdd0e51f3e ARM: Use BKPT instead of TRAP to implement llvm.debugtrap.
The BKPT instruction is specified to cause a software breakpoint,
and at least on Linux results in a SIGTRAP. This makes it more
suitable for implementing debugtrap than TRAP (aka UDF #254), which
is specified to cause an undefined instruction exception and results
in a SIGILL on Linux.

Moreover, BKPT is not marked as a terminator, which is not only
consistent with the IR instruction but allows the analyzeBlock
function to correctly analyze a basic block containing the instruction,
which fixes an assertion failure in the machine block placement pass
previously triggered by the included test case.

Because BKPT is only supported starting with ARMv5T, we continue to
use UDF #254 when targeting v4T.

Differential Revision: https://reviews.llvm.org/D53614

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345171 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-24 18:10:38 +00:00
Saleem Abdulrasool
9a2b84c951 ARM: handle checking aliases with out-of-bounds GEPs
A global alias may use indices which are not considered in bounds.  In
such a case, accessing the base object will fail as it only peers
through inbounds accesses.  This pattern is used by the swift compiler
to create references to preceeding members in the type metadata.  This
would cause the code generation to fail when targeting a platform that
used ELF as the object file format.  Be conservative and fail the
read-only check if we run into an alias that we cannot peer through.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345107 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-24 00:00:52 +00:00
Simon Pilgrim
7f770f7d21 [ARM][NEON] Improve vector popcnt lowering with PADDL (PR39281)
As I suggested on PR39281, this patch uses PADDL pairwise addition to widen from the vXi8 CTPOP result to the target vector type.

This is a blocker for moving more x86 code to generic vector CTPOP expansion (P32655 + D53258) - ARM's vXi64 CTPOP currently expands, which would generate a vXi64 MUL but ARM's custom lowering expands the general MUL case and vectors aren't well handled in LegalizeDAG - improving the CTPOP lowering was a lot easier than fixing the MUL lowering for this one case......

Differential Revision: https://reviews.llvm.org/D53257

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344512 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-15 13:20:41 +00:00
Eli Friedman
b10248393f [ARM] Fix correctness checks in promoteToConstantPool.
Correctly check for relocations in the constant to promote. And don't
allow promoting a constant multiple times.

This partially fixes https://bugs.llvm.org//show_bug.cgi?id=32780 ;
it's not a complete fix because we also need to prevent
ARMConstantIslands from cloning the constant.

(-arm-promote-constant is currently off by default, and it stays off
with this patch. I'll look into turning it on again when all the known
issues are fixed.)

Differential Revision: https://reviews.llvm.org/D51472



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343361 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-28 20:27:31 +00:00
Eli Friedman
770516eb32 [ARM] Use preferred alignment for constants in promoteToConstantPool.
This mostly affects IR generated by non-clang frontends because clang
generally sets the alignment of globals explicitly.

Fixes https://bugs.llvm.org//show_bug.cgi?id=32394 .

(-arm-promote-constant is currently off by default, and it stays off
with this patch. I'll look into turning it on again when all the known
issues are fixed.)

Differential Revision: https://reviews.llvm.org/D51469



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343359 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-28 20:21:51 +00:00
Nirav Dave
7ee7224156 [ARM] Share predecessor bookkeeping in CombineBaseUpdate. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342987 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-25 15:30:47 +00:00
Alex Bradbury
490f68fb29 [AtomicExpandPass]: Add a hook for custom cmpxchg expansion in IR
This involves changing the shouldExpandAtomicCmpXchgInIR interface, but I have 
updated the in-tree backends using this hook (ARM, AArch64, Hexagon) so they 
will see no functional change. Previously this hook returned bool, but it now 
returns AtomicExpansionKind.

This hook allows targets to select how a given cmpxchg is to be expanded. 
D48131 uses this to expand part-word cmpxchg to a target-specific intrinsic.

See my associated RFC for more info on the motivation for this change 
<http://lists.llvm.org/pipermail/llvm-dev/2018-June/123993.html>.

Differential Revision: https://reviews.llvm.org/D48130


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342550 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-19 14:51:42 +00:00
Tim Northover
29369e8ff6 ARM: align loops to 4 bytes on Cortex-M3 and Cortex-M4.
The Technical Reference Manuals for these two CPUs state that branching
to an unaligned 32-bit instruction incurs an extra pipeline reload
penalty. That's bad.

This also enables the optimization at -Os since it costs on average one
byte per loop in return for 1 cycle per iteration, which is pretty good
going.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342127 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-13 10:28:05 +00:00
Martin Storsjo
5d04fc0ef3 [MinGW] Move code for indicating "potentially not DSO local" into shouldAssumeDSOLocal. NFC.
On Windows, if shouldAssumeDSOLocal returns false, it's either a
dllimport reference, or a reference that we should treat as non-local
and create a stub for.

Clean up AArch64Subtarget::ClassifyGlobalReference a little while
touching the flag handling relating to dllimport.

Differential Revision: https://reviews.llvm.org/D51590

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341402 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-04 20:56:28 +00:00
Martin Storsjo
d67818ff54 [MinGW] [ARM] Add stubs for potential automatic dllimported variables
The runtime pseudo relocations can't handle the ARM format embedded
addresses in movw/movt pairs. By using stubs, the potentially
dllimported addresses can be touched up by the runtime pseudo relocation
framework.

Differential Revision: https://reviews.llvm.org/D51450

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341176 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-31 08:00:25 +00:00
Eli Friedman
575895c12e [ARM] Lower llvm.ctlz.i32 to a libcall when clz is not available.
The inline sequence is very long (about 70 bytes on Thumb1), so it's
not really a good idea to inline it, especially when optimizing for
size.

Differential Revision: https://reviews.llvm.org/D47917



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@340458 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-22 21:47:14 +00:00
Eli Friedman
3c6df19a68 [ARM] Handle all-ones mask explicitly in targetShrinkDemandedConstant.
This avoids a potential infinite loop setting and unsetting bits in the
mask.

Reduced from a failure on the polly-aosp bot.

Differential Revision: https://reviews.llvm.org/D51066



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@340446 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-22 20:13:45 +00:00
David Green
3d765ce4b7 [AArch64] Add Tiny Code Model for AArch64
This adds the plumbing for the Tiny code model for the AArch64 backend. This,
instead of loading addresses through the normal ADRP;ADD pair used in the Small
model, uses a single ADR. The 21 bit range of an ADR means that the code and
its statically defined symbols need to be within 1MB of each other.

This makes it mostly interesting for embedded applications where we want to fit
as much as we can in as small a space as possible.

Differential Revision: https://reviews.llvm.org/D49673


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@340397 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-22 11:31:39 +00:00
Chandler Carruth
fc187011be [SDAG] Remove the reliance on MI's allocation strategy for
`MachineMemOperand` pointers attached to `MachineSDNodes` and instead
have the `SelectionDAG` fully manage the memory for this array.

Prior to this change, the memory management was deeply confusing here --
The way the MI was built relied on the `SelectionDAG` allocating memory
for these arrays of pointers using the `MachineFunction`'s allocator so
that the raw pointer to the array could be blindly copied into an
eventual `MachineInstr`. This creates a hard coupling between how
`MachineInstr`s allocate their array of `MachineMemOperand` pointers and
how the `MachineSDNode` does.

This change is motivated in large part by a change I am making to how
`MachineFunction` allocates these pointers, but it seems like a layering
improvement as well.

This would run the risk of increasing allocations overall, but I've
implemented an optimization that should avoid that by storing a single
`MachineMemOperand` pointer directly instead of allocating anything.
This is expected to be a net win because the vast majority of uses of
these only need a single pointer.

As a side-effect, this makes the API for updating a `MachineSDNode` and
a `MachineInstr` reasonably different which seems nice to avoid
unexpected coupling of these two layers. We can map between them, but we
shouldn't be *surprised* at where that occurs. =]

Differential Revision: https://reviews.llvm.org/D50680

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339740 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-14 23:30:32 +00:00
Eli Friedman
1a5e05d070 [ARM] Make PerformSHLSimplify add nodes to the DAG worklist correctly.
Intentionally excluding nodes from the DAGCombine worklist is likely to
lead to weird optimizations and infinite loops, so it's generally a bad
idea.

To avoid the infinite loops, fix DAGCombine to use the
isDesirableToCommuteWithShift target hook before performing the
transforms in question, and implement the target hook in the ARM backend
disable the transforms in question.

Fixes https://bugs.llvm.org/show_bug.cgi?id=38530 . (I don't have a
reduced testcase for that bug. But we should have sufficient test
coverage for PerformSHLSimplify given that we're not playing weird
tricks with the worklist. I can try to bugpoint it if necessary,
though.)

Differential Revision: https://reviews.llvm.org/D50667



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339734 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-14 22:10:25 +00:00
Eli Friedman
033cdeca67 Fix unused lambda capture warning from r339472.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339479 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-10 22:03:25 +00:00
Eli Friedman
0365cb9494 [ARM] Adjust AND immediates to make them cheaper to select.
LLVM normally prefers to minimize the number of bits set in an AND
immediate, but that doesn't always match the available ARM instructions.
In Thumb1 mode, prefer uxtb or uxth where possible; otherwise, prefer
a two-instruction sequence movs+ands or movs+bics.

Some potential improvements outlined in
ARMTargetLowering::targetShrinkDemandedConstant, but seems to work
pretty well already.

The ARMISelDAGToDAG fix ensures we don't generate an invalid UBFX
instruction due to a larger-than-expected mask. (It's orthogonal, in
some sense, but as far as I can tell it's either impossible or nearly
impossible to reproduce the bug without this change.)

According to my testing, this seems to consistently improve codesize by
a small amount by forming bic more often for ISD::AND with an immediate.

Differential Revision: https://reviews.llvm.org/D50030



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339472 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-10 21:21:53 +00:00
Sjoerd Meijer
93ef7cf14b [ARM] FP16: support vector INT_TO_FP and FP_TO_INT
This adds codegen support for the different vcvt_f16 variants.

Differential Revision: https://reviews.llvm.org/D50393


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339227 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-08 09:45:34 +00:00
Sjoerd Meijer
ea2a187f0d [ARM] FP16: support the vector vmin and vmax variants
Differential Revision: https://reviews.llvm.org/D50238


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339221 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-08 07:20:15 +00:00
Fangrui Song
af7b1832a0 Remove trailing space
sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h}

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338293 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-30 19:41:25 +00:00
Eli Friedman
999d896788 [ARM] Prefer lsls+lsrs over lsls+ands or lsrs+ands in Thumb1.
Saves materializing the immediate for the "ands".

Corresponding patterns exist for lsrs+lsls, but that seems less common
in practice.

Now implemented as a DAGCombine.

Differential Revision: https://reviews.llvm.org/D49585



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337945 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-25 18:22:22 +00:00
Tim Northover
b7eb4975c4 ARM: stop explicitly marking armv7k libcalls as hard-float. NFC.
Since the triple's default is hard float, the libcalls will already use VFP
registers.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337386 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-18 12:37:43 +00:00
Eli Friedman
43b8da3b5b [ARM] Treat cmn immediates as legal in isLegalICmpImmediate.
The original code attempted to do this, but the std::abs() call didn't
actually do anything due to implicit type conversions.  Fix the type
conversions, and perform the correct check for negative immediates.

This probably has very little practical impact, but it's worth fixing
just to avoid confusion in the future, I think.

Differential Revision: https://reviews.llvm.org/D48907



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336742 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-10 23:44:37 +00:00
Ivan A. Kosarev
e816e74216 [NEON] Fix combining of vldx_dup intrinsics with updating of base addresses
Resolves:
Unsupported ARM Neon intrinsics in Target-specific DAG combine
function for VLDDUP
https://bugs.llvm.org/show_bug.cgi?id=38031

Related diff: D48439

Differential Revision: https://reviews.llvm.org/D48920


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336325 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-05 08:59:49 +00:00
Vadzim Dambrouski
2546414701 [ARM] Fix PR37382: Don't optimize mul.with.overflow on thumbv6m.
Reviewers: efriedma, rogfer01, javed.absar

Reviewed By: efriedma, rogfer01

Subscribers: kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D48846

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336144 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-02 21:05:26 +00:00
Ivan A. Kosarev
212054e1a9 [NEON] Support vldNq intrinsics in AArch32 (LLVM part)
This patch adds support for the q versions of the dup
(load-to-all-lanes) NEON intrinsics, such as vld2q_dup_f16() for
example.

Currently, non-q versions of the dup intrinsics are implemented
in clang by generating IR that first loads the elements of the
structure into the first lane with the lane (to-single-lane)
intrinsics, and then propagating it other lanes. There are at
least two problems with this approach. First, there are no
double-spaced to-single-lane byte-element instructions. For
example, there is no such instruction as 'vld2.8 { d0[0], d2[0]
}, [r0]'. That means we cannot rely on the to-single-lane
intrinsics and instructions to implement the q versions of the
dup intrinsics. Note that to-all-lanes instructions do support
all sizes of data items, including bytes.

The second problem with the current approach is that we need a
separate vdup instruction to propagate the structure to each
lane. So for vld4q_dup_f16() we would need four vdup instructions
in addition to the initial vld instruction.

This patch introduces dup LLVM intrinsics and reworks handling of
the currently supported (non-q) NEON dup intrinsics to expand
them into those LLVM intrinsics, thus eliminating the need for
using to-single-lane intrinsics and instructions.

Additionally, this patch adds support for u64 and s64 dup NEON
intrinsics. These are marked as Arch64-only in the ARM NEON
Reference, but it seems there are no reasons to not support them
in AArch32 mode. Please correct, if that is wrong.

That's what we generate with this patch applied:

vld2q_dup_f16:
  vld2.16 {d0[], d2[]}, [r0]
  vld2.16 {d1[], d3[]}, [r0]

vld3q_dup_f16:
  vld3.16 {d0[], d2[], d4[]}, [r0]
  vld3.16 {d1[], d3[], d5[]}, [r0]

vld4q_dup_f16:
  vld4.16 {d0[], d2[], d4[], d6[]}, [r0]
  vld4.16 {d1[], d3[], d5[], d7[]}, [r0]

Differential Revision: https://reviews.llvm.org/D48439


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335733 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-27 13:57:52 +00:00
Ivan A. Kosarev
c7f180e8c4 [NEON] Support VST1xN intrinsics in AArch32 mode (LLVM part)
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.

Differential Revision: https://reviews.llvm.org/D47447


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334361 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-10 09:27:27 +00:00
Ivan A. Kosarev
a13992d918 [NEON] Support VLD1xN intrinsics in AArch32 mode (LLVM part)
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.

Differential Revision: https://reviews.llvm.org/D47120


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333825 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-02 16:40:03 +00:00
Ivan A. Kosarev
f646a586eb Revert r333819 "[NEON] Support VLD1xN intrinsics in AArch32 mode (Clang part)"
The LLVM part was committed instead of the Clang part.

Differential Revision: https://reviews.llvm.org/D47121


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333824 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-02 16:38:38 +00:00