1591 Commits

Author SHA1 Message Date
Tim Northover
b7eb4975c4 ARM: stop explicitly marking armv7k libcalls as hard-float. NFC.
Since the triple's default is hard float, the libcalls will already use VFP
registers.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337386 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-18 12:37:43 +00:00
Eli Friedman
43b8da3b5b [ARM] Treat cmn immediates as legal in isLegalICmpImmediate.
The original code attempted to do this, but the std::abs() call didn't
actually do anything due to implicit type conversions.  Fix the type
conversions, and perform the correct check for negative immediates.

This probably has very little practical impact, but it's worth fixing
just to avoid confusion in the future, I think.

Differential Revision: https://reviews.llvm.org/D48907



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336742 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-10 23:44:37 +00:00
Ivan A. Kosarev
e816e74216 [NEON] Fix combining of vldx_dup intrinsics with updating of base addresses
Resolves:
Unsupported ARM Neon intrinsics in Target-specific DAG combine
function for VLDDUP
https://bugs.llvm.org/show_bug.cgi?id=38031

Related diff: D48439

Differential Revision: https://reviews.llvm.org/D48920


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336325 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-05 08:59:49 +00:00
Vadzim Dambrouski
2546414701 [ARM] Fix PR37382: Don't optimize mul.with.overflow on thumbv6m.
Reviewers: efriedma, rogfer01, javed.absar

Reviewed By: efriedma, rogfer01

Subscribers: kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D48846

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336144 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-02 21:05:26 +00:00
Ivan A. Kosarev
212054e1a9 [NEON] Support vldNq intrinsics in AArch32 (LLVM part)
This patch adds support for the q versions of the dup
(load-to-all-lanes) NEON intrinsics, such as vld2q_dup_f16() for
example.

Currently, non-q versions of the dup intrinsics are implemented
in clang by generating IR that first loads the elements of the
structure into the first lane with the lane (to-single-lane)
intrinsics, and then propagating it other lanes. There are at
least two problems with this approach. First, there are no
double-spaced to-single-lane byte-element instructions. For
example, there is no such instruction as 'vld2.8 { d0[0], d2[0]
}, [r0]'. That means we cannot rely on the to-single-lane
intrinsics and instructions to implement the q versions of the
dup intrinsics. Note that to-all-lanes instructions do support
all sizes of data items, including bytes.

The second problem with the current approach is that we need a
separate vdup instruction to propagate the structure to each
lane. So for vld4q_dup_f16() we would need four vdup instructions
in addition to the initial vld instruction.

This patch introduces dup LLVM intrinsics and reworks handling of
the currently supported (non-q) NEON dup intrinsics to expand
them into those LLVM intrinsics, thus eliminating the need for
using to-single-lane intrinsics and instructions.

Additionally, this patch adds support for u64 and s64 dup NEON
intrinsics. These are marked as Arch64-only in the ARM NEON
Reference, but it seems there are no reasons to not support them
in AArch32 mode. Please correct, if that is wrong.

That's what we generate with this patch applied:

vld2q_dup_f16:
  vld2.16 {d0[], d2[]}, [r0]
  vld2.16 {d1[], d3[]}, [r0]

vld3q_dup_f16:
  vld3.16 {d0[], d2[], d4[]}, [r0]
  vld3.16 {d1[], d3[], d5[]}, [r0]

vld4q_dup_f16:
  vld4.16 {d0[], d2[], d4[], d6[]}, [r0]
  vld4.16 {d1[], d3[], d5[], d7[]}, [r0]

Differential Revision: https://reviews.llvm.org/D48439


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335733 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-27 13:57:52 +00:00
Ivan A. Kosarev
c7f180e8c4 [NEON] Support VST1xN intrinsics in AArch32 mode (LLVM part)
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.

Differential Revision: https://reviews.llvm.org/D47447


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334361 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-10 09:27:27 +00:00
Ivan A. Kosarev
a13992d918 [NEON] Support VLD1xN intrinsics in AArch32 mode (LLVM part)
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.

Differential Revision: https://reviews.llvm.org/D47120


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333825 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-02 16:40:03 +00:00
Ivan A. Kosarev
f646a586eb Revert r333819 "[NEON] Support VLD1xN intrinsics in AArch32 mode (Clang part)"
The LLVM part was committed instead of the Clang part.

Differential Revision: https://reviews.llvm.org/D47121


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333824 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-02 16:38:38 +00:00
Ivan A. Kosarev
c5b2db16de [NEON] Support VLD1xN intrinsics in AArch32 mode (Clang part)
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.

Differential Revision: https://reviews.llvm.org/D47121


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333819 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-02 16:26:42 +00:00
Amaury Sechet
a0bb0ca79d [ARM] Remove code handling ADDC/ADDE/SUBC/SUBE
Summary: This code is now dead as the ARM backend uses ADDCARRY/SUBCARRY/SETCCCARRY .

Reviewers: rogfer01, efriedma, rengolin, javed.absar

Subscribers: kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D47413

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333544 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-30 13:45:43 +00:00
Eli Friedman
9ef6691720 [ARM] Enable SETCCCARRY lowering for Thumb1.
We've had Thumb1 support for ARMISD::SUBE for a while now, so this just
works.  Reduces codesize a bit for 64-bit integer comparisons.

Differential Revision: https://reviews.llvm.org/D47387



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333445 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-29 18:17:16 +00:00
Tim Northover
b096d31bb6 ARM: be conservative when asked load/store alignment of weird type.
Chances are we'll be asked again after type legalization, but before that point
it's better to claim misaligned accesses aren't allowed than to assert.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332840 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-21 12:43:54 +00:00
Nicola Zaghen
0818e789cb Rename DEBUG macro to LLVM_DEBUG.
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.

In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.

Differential Revision: https://reviews.llvm.org/D43624



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332240 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-14 12:53:11 +00:00
Amaury Sechet
0f4cc99891 [ARM] Add support for SETCCCARRY instead of SETCCE
Summary: As per title. SETCCE is deprecated and will eventually be removed.

Reviewers: rogfer01, efriedma, rengolin, javed.absar

Subscribers: kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D46512

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331929 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-09 22:15:51 +00:00
Amaury Sechet
ece3eb8cb6 [ARM] Select result 1 from ConvertBooleanCarryToCarryFlag's result automatically. NFC
The old behavior return the value 0, which is error prone.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331614 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-07 01:43:42 +00:00
Tim Northover
4c19071b5b ARM: don't try to over-align large vectors as arguments.
By default LLVM thinks very large vectors get aligned to their size when
passed across functions. Unfortunately no-one told the ARM backend so it
doesn't trigger stack realignment and so accesses can cause the usual
misalignment issues (e.g. a data abort).

This changes the ABI alignment to the stack alignment, which in practice
(and as a bonus) also coincides with the alignment "natural" vectors get.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331451 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-03 12:54:25 +00:00
Adrian Prantl
26b584c691 Remove \brief commands from doxygen comments.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

  for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46290

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331272 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-01 15:54:18 +00:00
Sjoerd Meijer
b69d7e5c93 [ARM] FP16 vmaxnm/vminnm scalar instructions
This adds code generation support for the FP16 vmaxnm/vminnm scalar
instructions.

Differential Revision: https://reviews.llvm.org/D44675


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330034 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-13 15:34:26 +00:00
Sjoerd Meijer
30552b9602 [ARM] FP16 VSEL codegen
This is a follow up of rL327695 to instruction select more variants of VSELGT
and VSELGE, for which it is necessary to custom lower SELECT.

More work is required in this area, which will be addressed soon:
- more variants need to be regression tested, but this depends on the next point.
- first LowerConstantFP need to be adjusted for fp16 values.

Differential Revision: https://reviews.llvm.org/D45205


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329788 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-11 09:28:04 +00:00
Craig Topper
f137ed238d [IR][CodeGen] Remove dependency on EVT from IR/Function.cpp. Move EVT to CodeGen layer.
Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it.

The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there only a few primitive types, we can just print them as strings directly.

Differential Revision: https://reviews.llvm.org/D45017

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328806 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-29 17:21:10 +00:00
Christof Douma
fd7cc69a64 [ARM] Support float literals under XO
Follow up patch of r328313 to support the UseVMOVSR constraint. Removed
some unneeded instructions from the test and removed some stray
comments.

Differential Revision: https://reviews.llvm.org/D44941

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328691 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-28 10:02:26 +00:00
David Blaikie
b91d9a7128 Fix layering by moving ValueTypes.h from CodeGen to IR
ValueTypes.h is implemented in IR already.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328397 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-23 23:58:31 +00:00
David Blaikie
9d9a46a465 Fix layering of MachineValueType.h by moving it from CodeGen to Support
This is used by llvm tblgen as well as by LLVM Targets, so the only
common place is Support for now. (maybe we need another target for these
sorts of things - but for now I'm at least making them correct & we can
make them better if/when people have strong feelings)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328395 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-23 23:58:25 +00:00
Christof Douma
a0e24a5d9f [ARM] Support float literals under XO
When targeting execute-only and fp-armv8, float constants in a compare
resulted in instruction selection failures. This is now fixed by using
vmov.f32 where possible, otherwise the floating point constant is
lowered into a integer constant that is moved into a floating point
register.

This patch also restores using fpcmp with immediate 0 under fp-armv8.

Change-Id: Ie87229706f4ed879a0c0cf66631b6047ed6c6443

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328313 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-23 13:02:03 +00:00
Martin Storsjo
dbc74cb872 [ARM, AArch64] Check the no-stack-arg-probe attribute for dynamic stack probes
This extends the use of this attribute on ARM and AArch64 from
SVN r325900 (where it was only checked for fixed stack
allocations on ARM/AArch64, but for all stack allocations on X86).

This also adds a testcase for the existing use of disabling the
fixed stack probe with the attribute on ARM and AArch64.

Differential Revision: https://reviews.llvm.org/D44291

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327897 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-19 20:06:50 +00:00
Sjoerd Meijer
e9c814d4ae [ARM] Support for v4f16 and v8f16 vectors
This is the groundwork for adding the Armv8.2-A FP16 vector intrinsics, which
uses v4f16 and v8f16 vector operands and return values. All the moving parts
are tested with two intrinsics, a 1-operand v8f16 and a 2-operand v4f16
intrinsic. In a follow-up patch the rest of the intrinsics and tests will be
added.

Differential Revision: https://reviews.llvm.org/D44538


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327839 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-19 13:35:25 +00:00
Sjoerd Meijer
547e2f8dcc [ARM] FP16 codegen support for VSEL
This implements lowering of SELECT_CC for f16s, which enables
codegen of VSEL with f16 types.

Differential Revision: https://reviews.llvm.org/D44518


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327695 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-16 08:06:25 +00:00
Sjoerd Meijer
65019820f7 [ARM] Fix for PR36577
Don't PerformSHLSimplify if the given node is used by a node that also uses a
constant because we may get stuck in an infinite combine loop.

bugzilla: https://bugs.llvm.org/show_bug.cgi?id=36577

Patch by Sam Parker.

Differential Revision: https://reviews.llvm.org/D44097


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@326882 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-07 09:10:44 +00:00
Chih-Hung Hsieh
70716e54e0 [TLS] use emulated TLS if the target supports only this mode
Emulated TLS is enabled by llc flag -emulated-tls,
which is passed by clang driver.
When llc is called explicitly or from other drivers like LTO,
missing -emulated-tls flag would generate wrong TLS code for targets
that supports only this mode.
Now use useEmulatedTLS() instead of Options.EmulatedTLS to decide whether
emulated TLS code should be generated.
Unit tests are modified to run with and without the -emulated-tls flag.

Differential Revision: https://reviews.llvm.org/D42999



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@326341 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-28 17:48:55 +00:00
Pablo Barrio
efd80a619f [ARM] Lower lower saturate to 0 and lower saturate to -1 using bit-operations
Summary:
Expressions of the form x < 0 ? 0 :  x; and x < -1 ? -1 : x can be lowered using bit-operations instead of branching or conditional moves

In thumb-mode this results in a two-instruction sequence, a shift followed by a bic or or while in ARM/thumb2 mode that has flexible second operand the shift can be folded into a single bic/or instructions. In most cases this results in smaller code and possibly less branches, and in no case larger than before.

Patch by Martin Svanfeldt

Reviewers: fhahn, pbarrio, rogfer01

Reviewed By: pbarrio, rogfer01

Subscribers: chrib, yroux, eugenis, efriedma, rogfer01, aemerson, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D42574

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@326333 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-28 17:13:07 +00:00
Geoff Berry
13357c96d2 [MachineOperand][Target] MachineOperand::isRenamable semantics changes
Summary:
Add a target option AllowRegisterRenaming that is used to opt in to
post-register-allocation renaming of registers.  This is set to 0 by
default, which causes the hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq
fields of all opcodes to be set to 1, causing
MachineOperand::isRenamable to always return false.

Set the AllowRegisterRenaming flag to 1 for all in-tree targets that
have lit tests that were effected by enabling COPY forwarding in
MachineCopyPropagation (AArch64, AMDGPU, ARM, Hexagon, Mips, PowerPC,
RISCV, Sparc, SystemZ and X86).

Add some more comments describing the semantics of the
MachineOperand::isRenamable function and how it is set and maintained.

Change isRenamable to check the operand's opcode
hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq bit directly instead of
relying on it being consistently reflected in the IsRenamable bit
setting.

Clear the IsRenamable bit when changing an operand's register value.

Remove target code that was clearing the IsRenamable bit when changing
registers/opcodes now that this is done conservatively by default.

Change setting of hasExtraSrcRegAllocReq in AMDGPU target to be done in
one place covering all opcodes that have constant pipe read limit
restrictions.

Reviewers: qcolombet, MatzeB

Subscribers: aemerson, arsenm, jyknight, mcrosier, sdardis, nhaehnle, javed.absar, tpr, arichardson, kristof.beyls, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, escha, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D43042

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@325931 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-23 18:25:08 +00:00
Sjoerd Meijer
c381a107ab [ARM] Lower BR_CC for f16
This case wasn't handled yet.

Differential Revision: https://reviews.llvm.org/D43508



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@325616 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-20 19:28:05 +00:00
Roger Ferrer Ibanez
ca353f0f9c [ARM] Materialise some boolean values to avoid a branch
This patch combines some cases of ARMISD::CMOV for integers that arise in comparisons of the form

  a != b ? x : 0
  a == b ? 0 : x

and that currently (e.g. in Thumb1) are emitted as branches.

Differential Revision: https://reviews.llvm.org/D34515



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@325323 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-16 09:23:59 +00:00
Pablo Barrio
5f393824ae [ARM] Allow 64- and 128-bit types with 't' inline asm constraint
Summary:
In LLVM, 't' selects a floating-point/SIMD register and only supports
32-bit values. This is appropriately documented in the LLVM Language
Reference Manual. However, this behaviour diverges from that of GCC, where
't' selects the s0-s31 registers and its qX and dX variants depending on
additional operand modifiers (q/P).

For example, the following C code:

#include <arm_neon.h>
float32x4_t a, b, x;
asm("vadd.f32 %0, %1, %2" : "=t" (x) : "t" (a), "t" (b))

results in the following assembly if compiled with GCC:

vadd.f32 s0, s0, s1

whereas LLVM will show "error: couldn't allocate output register for
constraint 't'", since a, b, x are 128-bit variables, not 32-bit.

This patch extends the use of 't' to mean that of GCC, thus allowing
selection of the lower Q vector regs and their D/S variants. For example,
the earlier code will now compile as:

vadd.f32 q0, q0, q1

This behaviour still differs from that of GCC but I think it is actually
more correct, since LLVM picks up the right register type based on the
datatype of x, while GCC would need an extra operand modifier to achieve
the same result, as follows:

asm("vadd.f32 %q0, %q1, %q2" : "=t" (x) : "t" (a), "t" (b))

Since this is only an extension of functionality, existing code should not
be affected by this change. Note that operand modifiers q/P are already
supported by LLVM, so this patch should suffice to support inline
assembly with constraint 't' originally built for GCC.

Reviewers: grosbach, rengolin

Reviewed By: rengolin

Subscribers: rogfer01, efriedma, olista01, aemerson, javed.absar, eraman, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D42962

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@325244 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-15 14:44:22 +00:00
Sjoerd Meijer
96102045c0 [ARM] Armv8.2-A FP16 code generation (part 3/3)
This adds most of the FP16 codegen support, but these areas need further work:

- FP16 literals and immediates are not properly supported yet (e.g. literal
  pool needs work),
- Instructions that are generated from intrinsics (e.g. vabs) haven't been
  added.

This will be addressed in follow-up patches.

Differential Revision: https://reviews.llvm.org/D42849


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@324321 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-06 08:43:56 +00:00
Sjoerd Meijer
1e23490f9a [ARM] FullFP16 LowerReturn Fix
Commit r323512 introduced an optimisation in LowerReturn for half-precision
return values. A missing check caused a crash when the return value is "undef"
(i.e. a node that has no operands).

Differential Revision: https://reviews.llvm.org/D42743


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323968 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-01 13:48:40 +00:00
Evgeniy Stepanov
37f3f33c55 Revert "[ARM] Lower lower saturate to 0 and lower saturate to -1 using bit-operations"
Miscompiles code. Testcase pending.

This reverts commit r323869.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323929 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-31 22:55:19 +00:00
Pablo Barrio
d69d8351ae [ARM] Lower lower saturate to 0 and lower saturate to -1 using bit-operations
Summary:
Expressions of the form x < 0 ? 0 :  x; and x < -1 ? -1 : x can be lowered using bit-operations instead of branching or conditional moves

In thumb-mode this results in a two-instruction sequence, a shift followed by a bic or or while in ARM/thumb2 mode that has flexible second operand the shift can be folded into a single bic/or instructions. In most cases this results in smaller code and possibly less branches, and in no case larger than before.

Patch by Marten Svanfeldt.

Reviewers: fhahn, pbarrio

Reviewed By: pbarrio

Subscribers: efriedma, rogfer01, aemerson, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D42574

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323869 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-31 13:20:10 +00:00
Sjoerd Meijer
e18fe9a179 [ARM] Armv8.2-A FP16 code generation (part 2/3)
Half-precision arguments and return values are passed as if it were an int or
float for ARM. This results in truncates and bitcasts to/from i16 and f16
values, which are legalized very early to stack stores/loads. When FullFP16 is
enabled, we want to avoid codegen for these bitcasts as it is unnecessary and
inefficient.

Differential Revision: https://reviews.llvm.org/D42580


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323861 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-31 10:18:29 +00:00
Sjoerd Meijer
3b4e0d39af [ARM] Armv8.2-A FP16 code generation (part 1/3)
This is the groundwork for Armv8.2-A FP16 code generation .

Clang passes and returns _Float16 values as floats, together with the required
bitconverts and truncs etc. to implement correct AAPCS behaviour, see D42318.
We will implement half-precision argument passing/returning lowering in the ARM
backend soon, but for now this means that this:

_Float16 sub(_Float16 a, _Float16 b) {
  return a + b;
}

gets lowered to this:

define float @sub(float %a.coerce, float %b.coerce) {
entry:
  %0 = bitcast float %a.coerce to i32
  %tmp.0.extract.trunc = trunc i32 %0 to i16
  %1 = bitcast i16 %tmp.0.extract.trunc to half
  <SNIP>
  %add = fadd half %1, %3
  <SNIP>
}

When FullFP16 is *not* supported, we don't make f16 a legal type, and we get
legalization for "free", i.e. nothing changes and everything works as before.
And also f16 argument passing/returning is handled.

When FullFP16 is supported, we do make f16 a legal type, and have 2 places that
we need to patch up: f16 argument passing and returning, which involves minor
tweaks to avoid unnecessary code generation for some bitcasts.

As a "demonstrator" that this works for the different FP16, FullFP16, softfp
modes, etc., I've added match rules to the VSUB instruction description showing
that we can codegen this instruction from IR, but more importantly, also to
some conversion instructions. These conversions were causing issue before in
the FP16 and FullFP16 cases.

I've also added match rules to the VLDRH and VSTRH desriptions, so that we can
actually compile the entire half-precision sub code example above. This showed
that these loads and stores had the wrong addressing mode specified: AddrMode5
instead of AddrMode5FP16, which turned out not be implemented at all, so that
has also been added.

This is the minimal patch that shows all the different moving parts. In patch
2/3 I will add some efficient lowering of bitcasts, and in 2/3 I will add the
remaining Armv8.2-A FP16 instruction descriptions.


Thanks to Sam Parker and Oliver Stannard for their help and reviews!


Differential Revision: https://reviews.llvm.org/D38315



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323512 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-26 09:26:40 +00:00
Weiming Zhao
e999114a77 [ARM] Expand long shifts for Thumb1 to __aeabi_ calls
Summary: For long shifts, the inlined version takes about 20 instructions on Thumb1. To avoid the code bloat, expand to __aeabi_ calls if target is Thumb1.

Reviewers: samparker

Reviewed By: samparker

Subscribers: samparker, aemerson, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D42401

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323354 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-24 18:00:57 +00:00
Martin Storsjo
34715e9f75 [ARM] Call __chkstk for dynamic stack allocation in all windows environments
This matches what MSVC does for alloca() function calls on ARM.
Even if MSVC doesn't support VLAs at the language level, it does
support the alloca function.

On the clang level, both the _alloca() (when emulating MSVC, which is
what the alloca() function expands to) and __builtin_alloca() builtin
functions, and VLAs, map to the same LLVM IR "alloca" function - so
within LLVM they're not distinguishable from each other.

Differential Revision: https://reviews.llvm.org/D42292

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323308 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-24 06:40:11 +00:00
Joel Galenson
47c91393e1 [ARM] Optimize {s,u}mul.with.overflow.
This extends my previous patches to also optimize overflow-checked multiplies during SelectionDAG.

Differential revision: https://reviews.llvm.org/D40922

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322738 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-17 19:19:05 +00:00
Andre Vieira
8eb918fc8a [ARM] Add codegen for SMMULR, SMMLAR and SMMLSR
This patch teaches the Arm back-end to generate the SMMULR, SMMLAR and SMMLSR
instructions from equivalent IR patterns.

Differential Revision: https://reviews.llvm.org/D41775


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322361 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-12 09:24:41 +00:00
Joel Galenson
7931efae70 [ARM] Optimize {s,u}{add,sub}.with.overflow.
The AArch64 backend contains code to optimize {s,u}{add,sub}.with.overflow during SelectionDAG.  This commit ports that code to the ARM backend.

Differential revision: https://reviews.llvm.org/D35635

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321224 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-20 22:25:39 +00:00
Florian Hahn
b68371dd25 [ARM] Lower unsigned saturation to USAT
Summary:
Implement lower of unsigned saturation on an interval [0, k] where k + 1 is a power of two using USAT instruction in a similar way to how [~k, k] is lowered using SSAT on ARM models that supports it.

Patch by Marten Svanfeldt

Reviewers: t.p.northover, pbarrio, eastig, SjoerdMeijer, javed.absar, fhahn

Reviewed By: fhahn

Subscribers: fhahn, aemerson, javed.absar, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D41348

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321164 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-20 11:13:57 +00:00
Adrian Prantl
c0ade050e0 Silence a bunch of implicit fallthrough warnings
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321114 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-19 22:05:25 +00:00
Matthias Braun
4f43d6f3e0 X86/AArch64/ARM: Factor out common sincos_stret logic; NFCI
Note:
- X86ISelLowering: setLibcallName(SINCOS) was superfluous as
  InitLibcalls() already does it.
- ARMISelLowering: Setting libcallnames for sincos/sincosf seemed
  superfluous as in the darwin case it wouldn't be used while for all
  other cases InitLibcalls already does it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321036 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-18 23:19:42 +00:00
Matthias Braun
d318139827 MachineFunction: Return reference from getFunction(); NFC
The Function can never be nullptr so we can return a reference.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320884 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-15 22:22:58 +00:00
Matt Arsenault
45d0bf280d TLI: Allow using PSV for intrinsic mem operands
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320756 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-14 22:34:10 +00:00