Commit Graph

3521 Commits

Author SHA1 Message Date
Vadzim Dambrouski
2546414701 [ARM] Fix PR37382: Don't optimize mul.with.overflow on thumbv6m.
Reviewers: efriedma, rogfer01, javed.absar

Reviewed By: efriedma, rogfer01

Subscribers: kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D48846

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336144 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-02 21:05:26 +00:00
Sjoerd Meijer
ee2becd704 [ARM] Parallel DSP Pass
Armv6 introduced instructions to perform 32-bit SIMD operations. The purpose of
this pass is to do some straightforward IR pattern matching to create ACLE DSP
intrinsics, which map on these 32-bit SIMD operations.

Currently, only the SMLAD instruction gets recognised. This instruction
performs two multiplications with 16-bit operands, and stores the result in an
accumulator. We will follow this up with patches to recognise SMLAD in more
cases, and also to generate other DSP instructions (like e.g. SADD16).

Patch by: Sam Parker and Sjoerd Meijer

Differential Revision: https://reviews.llvm.org/D48128


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335850 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-28 12:55:29 +00:00
Ivan A. Kosarev
212054e1a9 [NEON] Support vldNq intrinsics in AArch32 (LLVM part)
This patch adds support for the q versions of the dup
(load-to-all-lanes) NEON intrinsics, such as vld2q_dup_f16() for
example.

Currently, non-q versions of the dup intrinsics are implemented
in clang by generating IR that first loads the elements of the
structure into the first lane with the lane (to-single-lane)
intrinsics, and then propagating it other lanes. There are at
least two problems with this approach. First, there are no
double-spaced to-single-lane byte-element instructions. For
example, there is no such instruction as 'vld2.8 { d0[0], d2[0]
}, [r0]'. That means we cannot rely on the to-single-lane
intrinsics and instructions to implement the q versions of the
dup intrinsics. Note that to-all-lanes instructions do support
all sizes of data items, including bytes.

The second problem with the current approach is that we need a
separate vdup instruction to propagate the structure to each
lane. So for vld4q_dup_f16() we would need four vdup instructions
in addition to the initial vld instruction.

This patch introduces dup LLVM intrinsics and reworks handling of
the currently supported (non-q) NEON dup intrinsics to expand
them into those LLVM intrinsics, thus eliminating the need for
using to-single-lane intrinsics and instructions.

Additionally, this patch adds support for u64 and s64 dup NEON
intrinsics. These are marked as Arch64-only in the ARM NEON
Reference, but it seems there are no reasons to not support them
in AArch32 mode. Please correct, if that is wrong.

That's what we generate with this patch applied:

vld2q_dup_f16:
  vld2.16 {d0[], d2[]}, [r0]
  vld2.16 {d1[], d3[]}, [r0]

vld3q_dup_f16:
  vld3.16 {d0[], d2[], d4[]}, [r0]
  vld3.16 {d1[], d3[], d5[]}, [r0]

vld4q_dup_f16:
  vld4.16 {d0[], d2[], d4[], d6[]}, [r0]
  vld4.16 {d1[], d3[], d5[], d7[]}, [r0]

Differential Revision: https://reviews.llvm.org/D48439


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335733 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-27 13:57:52 +00:00
Than McIntosh
61f67f4637 [X86,ARM] Retain split-stack prolog check for sibling calls
Summary:
If a routine with no stack frame makes a sibling call, we need to
preserve the stack space check even if the local stack frame is empty,
since the call target could be a "no-split" function (in which case
the linker needs to be able to fix up the prolog sequence in order to
switch to a larger stack).

This fixes PR37807.

Reviewers: cherry, javed.absar

Subscribers: srhines, llvm-commits

Differential Revision: https://reviews.llvm.org/D48444

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335604 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-26 14:11:30 +00:00
George Rimar
af36295232 Recommit r335333 "[MC] - Add .stack_size sections into groups and link them with .text"
With compilation fix.

Original commit message:

D39788 added a '.stack-size' section containing metadata on function stack sizes
to output ELF files behind the new -stack-size-section flag.

This change does following two things on top:

1) Imagine the case when there are -ffunction-sections flag given and there are text sections in COMDATs. 
    The patch adds a '.stack-size' section into corresponding COMDAT group, so that linker will be able to
    eliminate them fast during resolving the COMDATs.
2) Patch sets a SHF_LINK_ORDER flag and links '.stack-size' with the corresponding .text.
   With that linker will be able to do -gc-sections on dead stack sizes sections.

Differential revision: https://reviews.llvm.org/D46874

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335336 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-22 10:53:47 +00:00
George Rimar
51ddc3757a Revert r335332 "[MC] - Add .stack_size sections into groups and link them with .text"
It broke bots.

http://lab.llvm.org:8011/builders/clang-ppc64le-linux-lnt/builds/12891
http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/9443
http://lab.llvm.org:8011/builders/lldb-x86_64-ubuntu-14.04-buildserver/builds/25551

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335333 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-22 10:27:33 +00:00
George Rimar
08d6b0d9f0 [MC] - Add .stack_size sections into groups and link them with .text
D39788 added a '.stack-size' section containing metadata on function stack sizes
to output ELF files behind the new -stack-size-section flag.

This change does following two things on top:

1) Imagine the case when there are -ffunction-sections flag given and there are text sections in COMDATs. 
    The patch adds a '.stack-size' section into corresponding COMDAT group, so that linker will be able to
    eliminate them fast during resolving the COMDATs.
2) Patch sets a SHF_LINK_ORDER flag and links '.stack-size' with the corresponding .text.
   With that linker will be able to do -gc-sections on dead stack sizes sections.

Differential revision: https://reviews.llvm.org/D46874

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335332 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-22 10:10:53 +00:00
Sjoerd Meijer
9104c92c0b Recommit of r335326, with the test fixed that I missed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335331 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-22 10:03:03 +00:00
Sjoerd Meijer
5f1676b4d6 Reverting r335326 while I look at the test failure
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335328 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-22 09:17:08 +00:00
Sjoerd Meijer
2ee51272b7 [ARM] ARMv6m and v8m.baseline strict align
This sets target feature FeatureStrictAlign for Armv6-m and Armv8-m.baseline,
because it has no support for unaligned accesses.
It looks like we always pass target feature "+strict-align" from
Clang, so this is not a user facing problem, but querying the subtarget
(in e.g. llc) for unaligned access support is incorrect.

Differential Revision: https://reviews.llvm.org/D48437


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335326 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-22 08:48:13 +00:00
David Green
3cd5aad15c [ARM] Enable useAA() for the in-order Cortex-R52
This option allows codegen (such as DAGCombine or MI scheduling) to use alias
analysis information, which can help with the codegen on in-order cpu's,
especially machine scheduling. Here I have done things the same way as AArch64,
adding a subtarget feature to enable this for specific cores, and enabled it for
the R52 where we have a schedule to make use of it.

Differential Revision: https://reviews.llvm.org/D48074



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335249 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-21 15:48:29 +00:00
Sam Parker
a3ff01d101 [NFC][ARM] ldrd/strd negative tests
Add negative tests for load and stores of alignment 2.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335241 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-21 14:53:06 +00:00
David Green
2e8cb93a84 [DAGCombine] Fix alignment for offset loads/stores
The alignment parameter to getExtLoad is treated as a base alignment,
not the alignment of the load (base + offset). When we infer a better
alignment for a Ptr we need to ensure that it applies to the base to
prevent the alignment on the load from being wrong.

This fixes a bug where the alignment could then be used to incorrectly
prove noalias between a load and a store, leading to a miscompile.

Differential Revision: https://reviews.llvm.org/D48029



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335210 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-21 08:30:07 +00:00
Alina Sbirlea
8d9ac7ff94 Generalize MergeBlockIntoPredecessor. Replace uses of MergeBasicBlockIntoOnlyPred.
Summary:
Two utils methods have essentially the same functionality. This is an attempt to merge them into one.
1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred
2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor

Prior to the patch:
1. MergeBasicBlockIntoOnlyPred
Updates either DomTree or DeferredDominance
Moves all instructions from Pred to BB, deletes Pred
Asserts BB has single predecessor
If address was taken, replace the block address with constant 1 (?)

2. MergeBlockIntoPredecessor
Updates DomTree, LoopInfo and MemoryDependenceResults
Moves all instruction from BB to Pred, deletes BB
Returns if doesn't have a single predecessor
Returns if BB's address was taken

After the patch:
Method 2. MergeBlockIntoPredecessor is attempting to become the new default:
Updates DomTree or DeferredDominance, and LoopInfo and MemoryDependenceResults
Moves all instruction from BB to Pred, deletes BB
Returns if doesn't have a single predecessor
Returns if BB's address was taken

Uses of MergeBasicBlockIntoOnlyPred that need to be replaced:

1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp
Updated in this patch. No challenges.

2. lib/CodeGen/CodeGenPrepare.cpp
Updated in this patch.
  i. eliminateFallThrough is straightforward, but I added using a temporary array to avoid the iterator invalidation.
  ii. eliminateMostlyEmptyBlock(s) methods also now use a temporary array for blocks
Some interesting aspects:
  - Since Pred is not deleted (BB is), the entry block does not need updating.
  - The entry block was being updated with the deleted block in eliminateMostlyEmptyBlock. Added assert to make obvious that BB=SinglePred.
  - isMergingEmptyBlockProfitable assumes BB is the one to be deleted.
  - eliminateMostlyEmptyBlock(BB) does not delete BB on one path, it deletes its unique predecessor instead.
  - adding some test owner as subscribers for the interesting tests modified:
    test/CodeGen/X86/avx-cmp.ll
    test/CodeGen/AMDGPU/nested-loop-conditions.ll
    test/CodeGen/AMDGPU/si-annotate-cf.ll
    test/CodeGen/X86/hoist-spill.ll
    test/CodeGen/X86/2006-11-17-IllegalMove.ll

3. lib/Transforms/Scalar/JumpThreading.cpp
Not covered in this patch. It is the only use case using the DeferredDominance.
I would defer to Brian Rzycki to make this replacement.

Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar

Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr, kbarton, RKSimon, wmi, arsenm, llvm-commits

Differential Revision: https://reviews.llvm.org/D48202

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335183 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-20 22:01:04 +00:00
Tim Northover
4d3b212098 ARM: convert ORR instructions to ADD where possible on Thumb.
Thumb has more 16-bit encoding space dedicated to ADD than ORR, allowing both a
3-address encoding and a wider range of immediates. So, particularly when
optimizing for code size (but it doesn't make things worse elsewhere) it's
beneficial to select an OR operation to an ADD if we know overflow won't occur.

This is made even better by LLVM's penchant for putting operations in canonical
form by converting the other way.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335119 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-20 12:09:44 +00:00
Eli Friedman
959e5f8164 [ARM] Add Thumb1 coverage for cmn testcases.
There's a missed optimization for immediates: we can save two
instructions by using adds instead of movs+mvns+cmp.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335002 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-19 00:09:44 +00:00
Eli Friedman
7d6421e0ed [ARM] Testcase for missed optimization with i16 compare.
The result looks weird because the DAG actually has an explicit
shift; I haven't figured out why, exactly.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335000 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-19 00:07:30 +00:00
Michael Berg
11f1b27ce7 easing the constraint for isNegatibleForFree and GetNegatedExpression
Summary:
Here we relax the old constraint which utilized unsafe with the TargetOption flag HonorSignDependentRoundingFPMathOption, with the assertion that unsafe is no longer needed or never was required for correctness on FDIV/FMUL.  



Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar

Reviewed By: spatel

Subscribers: efriedma, wdng, tpr

Differential Revision: https://reviews.llvm.org/D48057

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334769 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-14 20:54:13 +00:00
Matt Arsenault
696326c395 DAG: Fix extract_subvector combine for a single element
This would fail before because 1x vectors aren't legal,
so instead just use the scalar type.

Avoids regressions in a future AMDGPU commit to add
v4i16/v4f16 as legal types.

Test update is just the one test that this triggers
on in tree now. It wasn't checking anything before.
The result is completely  changed since the selects
are eliminated. Not sure if it's considered better
or not.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334440 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-11 21:27:41 +00:00
Ivan A. Kosarev
c7f180e8c4 [NEON] Support VST1xN intrinsics in AArch32 mode (LLVM part)
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.

Differential Revision: https://reviews.llvm.org/D47447


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334361 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-10 09:27:27 +00:00
Eli Friedman
09c10d86a8 [ARM] Allow CMPZ transforms even if the input has multiple uses.
It looks like this got left in by accident in r289794; I can't think of
any reason this check would be necessary.  (Maybe it was meant to be a
check that the AND has one use? But we check that a few lines earlier.)

Differential Revision: https://reviews.llvm.org/D47921



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334322 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-08 21:16:56 +00:00
Evandro Menezes
df07044b5f [AArch64, ARM] Add support for Samsung Exynos M4
Create a separate feature set for Exynos M4 and add test cases.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334115 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-06 18:56:00 +00:00
David Green
b439319a25 [GlobalMerge] Set the alignment on merged global structs
If no alignment is set, the abi/preferred alignment of structs will be
used which may be higher than required. This can lead to extra padding
and in the end an increase in data size.

Differential Revision: https://reviews.llvm.org/D47633



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334099 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-06 14:48:32 +00:00
Peter Smith
e2b2a91087 [MC] Pass MCSubtargetInfo to fixupNeedsRelaxation and applyFixup
On targets like Arm some relaxations may only be performed when certain
architectural features are available. As functions can be compiled with
differing levels of architectural support we must make a judgement on
whether we can relax based on the MCSubtargetInfo for the function. This
change passes through the MCSubtargetInfo for the function to
fixupNeedsRelaxation so that the decision on whether to relax can be made
per function. In this patch, only the ARM backend makes use of this
information. We must also pass the MCSubtargetInfo to applyFixup because
some fixups skip error checking on the assumption that relaxation has
occurred, to prevent code-generation errors applyFixup must see the same
MCSubtargetInfo as fixupNeedsRelaxation.

Differential Revision: https://reviews.llvm.org/D44928



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334078 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-06 09:40:06 +00:00
Ivan A. Kosarev
a13992d918 [NEON] Support VLD1xN intrinsics in AArch32 mode (LLVM part)
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.

Differential Revision: https://reviews.llvm.org/D47120


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333825 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-02 16:40:03 +00:00
Ivan A. Kosarev
f646a586eb Revert r333819 "[NEON] Support VLD1xN intrinsics in AArch32 mode (Clang part)"
The LLVM part was committed instead of the Clang part.

Differential Revision: https://reviews.llvm.org/D47121


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333824 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-02 16:38:38 +00:00
Ivan A. Kosarev
c5b2db16de [NEON] Support VLD1xN intrinsics in AArch32 mode (Clang part)
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.

Differential Revision: https://reviews.llvm.org/D47121


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333819 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-02 16:26:42 +00:00
Eli Friedman
9ef6691720 [ARM] Enable SETCCCARRY lowering for Thumb1.
We've had Thumb1 support for ARMISD::SUBE for a while now, so this just
works.  Reduces codesize a bit for 64-bit integer comparisons.

Differential Revision: https://reviews.llvm.org/D47387



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333445 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-29 18:17:16 +00:00
Tim Northover
b096d31bb6 ARM: be conservative when asked load/store alignment of weird type.
Chances are we'll be asked again after type legalization, but before that point
it's better to claim misaligned accesses aren't allowed than to assert.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332840 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-21 12:43:54 +00:00
Haicheng Wu
b508121673 [GlobalMerge] Exit early if only one global is to be merged
To save some compilation time and prevent some unnecessary changes.

Differential Revision: https://reviews.llvm.org/D46640

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332813 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-19 18:00:02 +00:00
Sanjay Patel
b231180388 [ARM] preserve test intent by removing undef
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in https://reviews.llvm.org/D43141.

And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332638 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-17 18:09:56 +00:00
Sanjay Patel
28e0f25edb [ARM] preserve test intent by removing undef
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in https://reviews.llvm.org/D43141.

And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.

Follow-up to:
https://reviews.llvm.org/rL332538
...because that change wasn't enough.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332637 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-17 18:08:27 +00:00
Sanjay Patel
9be09777af [ARM] preserve test intent by removing undef
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.

And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332539 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-16 22:20:33 +00:00
Sanjay Patel
9b35a3acde [ARM] preserve test intent by removing undef
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.

And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332538 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-16 22:20:26 +00:00
Sanjay Patel
5cb89e921d [ARM] preserve test intent by removing undef
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.

And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332537 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-16 22:20:11 +00:00
Sanjay Patel
f57d10b7e5 [ARM] preserve test intent by removing undef
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.

And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332533 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-16 21:57:19 +00:00
Sanjay Patel
398f674fff [ARM] preserve test intent by removing undef
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.

And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332532 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-16 21:57:00 +00:00
Sirish Pande
f9deb98480 [AArch64] Gangup loads and stores for pairing.
Keep loads and stores together (target defines how many loads
and stores to gang up), such that it will help in pairing
and vectorization.

Differential Revision https://reviews.llvm.org/D46477

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332482 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-16 15:36:52 +00:00
Amara Emerson
b148259401 [GlobalISel][IRTranslator] Split aggregates during IR translation.
We currently handle all aggregates by creating one large LLT, and letting the
legalizer deal with splitting them up. However using this approach means that
we can't support big endian code correctly.

This patch changes the way that the IRTranslator deals with aggregate values,
by splitting them up into their constituent element values. To do this, parts
of the translator need to be modified to deal with multiple VRegs for a single
Value.

A new Value to VReg mapper is introduced to help keep compile time under
control, currently there is no measurable impact on CTMark despite the extra
code being generated in some cases.

Patch is based on the original work of Tim Northover.

Differential Revision: https://reviews.llvm.org/D46018

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332449 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-16 10:32:02 +00:00
Sanjay Patel
f46201df37 [DAG] propagate FMF for all FPMathOperators
This is a simple hack based on what's proposed in D37686, but we can extend it if needed in follow-ups. 
It gets us most of the FMF functionality that we want without adding any state bits to the flags. It 
also intentionally leaves out non-FMF flags (nsw, etc) to minimize the patch.

It should provide a superset of the functionality from D46563 - the extra tests show propagation and 
codegen diffs for fcmp, vecreduce, and FP libcalls.

The PPC log2() test shows the limits of this most basic approach - we only applied 'afn' to the last 
node created for the call. AFAIK, there aren't any libcall optimizations based on the flags currently, 
so that shouldn't make any difference.

Differential Revision: https://reviews.llvm.org/D46854


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332358 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-15 14:16:24 +00:00
Martin Storsjo
4fc07cda7d [ARM] Back up R4 and LR if calling the stack probe function
Differential Revision: https://reviews.llvm.org/D46777

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332298 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-14 21:32:52 +00:00
Shiva Chen
a8a13bc662 [DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label.
In order to set breakpoints on labels and list source code around
labels, we need collect debug information for labels, i.e., label
name, the function label belong, line number in the file, and the
address label located. In order to keep these information in LLVM
IR and to allow backend to generate debug information correctly.
We create a new kind of metadata for labels, DILabel. The format
of DILabel is

!DILabel(scope: !1, name: "foo", file: !2, line: 3)

We hope to keep debug information as much as possible even the
code is optimized. So, we create a new kind of intrinsic for label
metadata to avoid the metadata is eliminated with basic block.
The intrinsic will keep existing if we keep it from optimized out.
The format of the intrinsic is

llvm.dbg.label(metadata !1)

It has only one argument, that is the DILabel metadata. The
intrinsic will follow the label immediately. Backend could get the
label metadata through the intrinsic's parameter.

We also create DIBuilder API for labels to be used by Frontend.
Frontend could use createLabel() to allocate DILabel objects, and use
insertLabel() to insert llvm.dbg.label intrinsic in LLVM IR.

Differential Revision: https://reviews.llvm.org/D45024

Patch by Hsiangkai Wang.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331841 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-09 02:40:45 +00:00
Daniel Sanders
722cccb8cf [globalisel] Remove redundant -global-isel option from tests that use -run-pass. NFC
As Roman Tereshin pointed out in https://reviews.llvm.org/D45541, the
-global-isel option is redundant when -run-pass is given. -global-isel sets up
the GlobalISel passes in the pass manager but -run-pass skips that entirely and
configures it's own pipeline.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331603 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-05 21:19:59 +00:00
Tim Northover
4c19071b5b ARM: don't try to over-align large vectors as arguments.
By default LLVM thinks very large vectors get aligned to their size when
passed across functions. Unfortunately no-one told the ARM backend so it
doesn't trigger stack realignment and so accesses can cause the usual
misalignment issues (e.g. a data abort).

This changes the ABI alignment to the stack alignment, which in practice
(and as a bonus) also coincides with the alignment "natural" vectors get.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331451 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-03 12:54:25 +00:00
Vedant Kumar
1ae0aae552 [DAGCombiner] Fix SDLoc in a (zext (zextload x)) combine (4/N)
The logic for this combine is almost identical to the logic for a
(sext (sextload x)) combine.

This commit factors out the logic so it can be shared by both combines,
and corrects the SDLoc assigned in the zext version of the combine.

Prior to this patch, for the given test case, we would apply the
location associated with the udiv instruction to instructions which
perform the load.

Part of: llvm.org/PR37262

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331303 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-01 19:51:15 +00:00
Vedant Kumar
cad24a0704 [DAGCombiner] Fix SDLoc in a (sext (sextload x)) combine (3/N)
Prior to this patch, for the given test case, we would apply the
location associated with the sdiv instruction to instructions which
perform the load.

Part of: llvm.org/PR37262.

Differential Revision: https://reviews.llvm.org/D46222

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331302 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-01 19:51:15 +00:00
Vedant Kumar
f3a5c86e78 [DAGCombiner] Set the right SDLoc on a newly-created zextload (1/N)
Setting the right SDLoc on a newly-created zextload fixes a line table
bug which resulted in non-linear stepping behavior.

Several backend tests contained CHECK lines which relied on the IROrder
inherited from the wrong SDLoc. This patch breaks that dependence where
feasbile and regenerates test cases where not.

In some cases, changing a node's IROrder may alter register allocation
and spill behavior. This can affect performance. I have chosen not to
prevent this by applying a "known good" IROrder to SDLocs, as this may
hide a more general bug in the scheduler, or cause regressions on other
test inputs.

rdar://33755881, Part of: llvm.org/PR37262

Differential Revision: https://reviews.llvm.org/D45995

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331300 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-01 19:26:15 +00:00
Daniel Sanders
9378c2c1c0 [globalisel][legalizerinfo] Add support for legalization based on the MachineMemOperand
Summary:
Currently only the memory size is supported but others can be added as
needed.

narrowScalar for G_LOAD and G_STORE now correctly update the
MachineMemOperand and will refuse to legalize atomics since those need more
careful expansions to maintain atomicity.

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, aemerson, javed.absar

Reviewed By: aemerson

Subscribers: aemerson, rovka, kristof.beyls, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D45466

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331071 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-27 19:48:53 +00:00
Oliver Stannard
3a0e81d014 [ARM] Codegen for v8.2A dot product intrinsics
This adds IR intrinsics for the ARM dot-product instructions introduced in
v8.2-A.

Differential revision: https://reviews.llvm.org/D46106



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331032 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-27 12:50:40 +00:00
David Green
3b36f77f20 [ARM] Enable misched for R52.
Back when the R52 schedule was added in rL286949, there was no way
to enable machine schedules in ARM for specific cores. Since then a
target feature has been added. This enables the feature for R52,
removing the need to manually specify compiler flags.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331027 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-27 11:29:49 +00:00