Backends don't support this yet. They would have to move the value into the
swifterror register before the tail call to make sure it is live-in to the call.
rdar://30495920
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294982 91177308-0d34-0410-b5e6-96231b3b80d8
I'd missed a creator of FCMP nodes: duplicateCmp().
Kindly and promptly reported by Gabor Ballabas via his CSiBE test suite.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294968 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The attached test case fails with "fatal error: error in backend:
misaligned pc-relative fixup value" as the jump table is misaligned.
The EmitAlignment call already existed for ARM and Thumb-1 code, but was
missing for Thumb-2.
The test checks that the fatal error disappears when generating an obj
file, as well as checking that the align directive is present when producing
an asm file.
Reviewers: rengolin, grosbach, t.p.northover, jmolloy, SjoerdMeijer, samparker
Reviewed By: samparker
Subscribers: samparker, aemerson, llvm-commits
Differential Revision: https://reviews.llvm.org/D29650
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294950 91177308-0d34-0410-b5e6-96231b3b80d8
When generating a floating point comparison we currently unconditionally
generate VCMPE. This has the side effect of setting the cumulative Invalid
bit in FPSCR if any of the operands is a QNaN.
It is expected that use of a relational predicate on a QNaN value should
raise Invalid. Quoting from the C standard:
The relational and equality operators support the usual mathematical
relationships between numeric values. For any ordered pair of numeric
values exactly one of the relationships (less, greater, equal) is true.
Relational operators may raise the "invalid" floating-point exception when argument
values are NaNs.
The standard doesn't explicitly state the expectation for equality operators,
but the implication and obvious expectation is that equality operators
should not raise Invalid on a QNaN input, as those predicates are wholly
defined on unordered inputs (to return not equal).
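For illustration only, a minimal, self-contained test of these semantics (the
program is ours, not part of the change; whether the exception is observable
depends on the target and on the FP-exception compiler settings):

  #include <cfenv>
  #include <cmath>
  #include <cstdio>

  int main() {
    volatile double qnan = std::nan("");   // quiet NaN

    std::feclearexcept(FE_INVALID);
    (void)(qnan == 1.0);                   // equality: should not raise Invalid
    std::printf("==: Invalid=%d\n", std::fetestexcept(FE_INVALID) != 0);

    std::feclearexcept(FE_INVALID);
    (void)(qnan < 1.0);                    // relational: may raise Invalid
    std::printf("< : Invalid=%d\n", std::fetestexcept(FE_INVALID) != 0);
    return 0;
  }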
Therefore, add a new operand to ARMISD::FPCMP and FPCMPZ indicating if
QNaN should raise Invalid, and pipe that through to TableGen.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294945 91177308-0d34-0410-b5e6-96231b3b80d8
In the encoding of system registers in the M-class MSR instruction, the mask bits
should be 2 for registers that don't take a _<bits> qualifier (the instruction
is unpredictable otherwise), and should also be 2 if the register takes a
_<bits> qualifier but it is not present, since omitting _<bits> is an alias for _nzcvq.
Differential Revision: https://reviews.llvm.org/D29828
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294762 91177308-0d34-0410-b5e6-96231b3b80d8
GCC supports the target armv7ve, which is armv7-a with the virtualization
extensions. This change adds support for it in LLVM for GCC
compatibility.
Also remove the redundant FeatureHWDiv and FeatureHWDivARM from a few models,
as these are implied automatically by FeatureVirtualization.
Patch by Manoj Gupta.
Differential Revision: https://reviews.llvm.org/D29472
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294661 91177308-0d34-0410-b5e6-96231b3b80d8
If some of the trailing or leading bytes of a load combine pattern are zeroes, we can combine the pattern into a load + zext and shift. Currently we don't support this, so the tests check the current codegen without load combine. This change will make the patch that adds support for this kind of combine a bit clearer.
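A hedged example of such a pattern (the function is ours; little-endian layout
assumed): only the low two bytes are populated, so the OR of narrow loads can
become a 16-bit load that is zero-extended (and shifted, in the leading-zero
variant):

  unsigned load_low16(const unsigned char *a) {
    return (unsigned)a[0] | ((unsigned)a[1] << 8);  // upper 16 bits are zero
  }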
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294591 91177308-0d34-0410-b5e6-96231b3b80d8
This applies to both aapcscc and aapcs_vfpcc. We currently filter out soft-float
targets because we don't support libcalls yet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294584 91177308-0d34-0410-b5e6-96231b3b80d8
Functions that have a dynamic alloca require a base register which is defined to
be X19 on AArch64 and r6 on ARM. We have defined the swifterror register to be
the same register. Use a different callee-saved register for swifterror instead:
X21 on AArch64
R8 on ARM
rdar://30433803
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294551 91177308-0d34-0410-b5e6-96231b3b80d8
We mark X0 as preserved by a call that passes the returned parameter.
x0 = ...
fun(x0) // no implicit def of x0
This is no longer valid if we pass the parameter in a different register than
the returned value, as is the case with a swiftself parameter (passed in x20).
x20 = ...
fun(x20) // there should be an implicit def of x0
rdar://30425845
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294527 91177308-0d34-0410-b5e6-96231b3b80d8
I forgot to remove the neonfp target feature from the test, which means we'd
have trouble selecting VADDS on targets that have neonfp enabled by default.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294451 91177308-0d34-0410-b5e6-96231b3b80d8
Add a register bank for floating point values and select simple instructions
using them (add, copies from GPR).
This assumes that the hardware can cope with a single precision add (VADDS)
instruction, so the legalizer will treat G_FADD as legal and the instruction
selector will refuse to select if the hardware doesn't support it. In the future
we'll want to be more careful about this, and legalize to libcalls if we have to
use soft float.
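A minimal source-level sketch (the function is ours, assuming hardware
single-precision FP): the add below becomes a G_FADD, which the new FPR bank
maps onto VADDS during instruction selection:

  float fadd(float a, float b) { return a + b; }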
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294442 91177308-0d34-0410-b5e6-96231b3b80d8
Currently we don't support these nodes, so the tests check the current codegen without load combine. This change makes the review of the change that adds support for these nodes clearer.
Separated from https://reviews.llvm.org/D29591 review.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294305 91177308-0d34-0410-b5e6-96231b3b80d8
When constructing global address literals while targeting the RWPI
relocation model, LLVM currently only uses literal pools. If MOVW/MOVT
instructions are available we can use these instead. Besides being more
efficient, this allows -arm-execute-only to work with
-relocation-model=RWPI as well.
When we generate MOVW/MOVT for global addresses when targeting the RWPI
relocation model, we need to use base relative relocations. This patch
does the needed plumbing in MC to generate these for MOVW/MOVT.
Differential Revision: https://reviews.llvm.org/D29487
Change-Id: I446786e43a6f5aa9b6a5bb2cd216d60d41c7755d
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294298 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The tail call optimisation is performed before register allocation, so
at that point we don't know whether LR will be spilled or not. If LR is spilled
to the stack, then we cannot do a tail call optimisation, as that would
involve popping back into LR, which is not possible in Thumb1 code.
Reviewers: rengolin, jmolloy, rovka, olista01
Reviewed By: olista01
Subscribers: llvm-commits, aemerson
Differential Revision: https://reviews.llvm.org/D29020
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294000 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
llc would hit a fatal error for errors in inline assembly. The
diagnostic message is now printed.
Reviewers: rengolin, MatzeB, javed.absar, anemet
Reviewed By: anemet
Subscribers: jyknight, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D29408
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293999 91177308-0d34-0410-b5e6-96231b3b80d8
This is the second in a series of patches to make adding
machine sched models for ARM processors easier and more compact.
This patch focuses on integer instructions and adds missing
sched definitions.
Reviewers: rovka, rengolin
Differential Revision: https://reviews.llvm.org/D29127
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293935 91177308-0d34-0410-b5e6-96231b3b80d8
Recommitting after fixing the X86 inc/dec chain bug.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner,
as it separates the handling of non-interfering loads/stores from the
store-merging logic.
When merging stores, we search up the chain through a single load, and
find all possible stores by looking down through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly construct
wider loads, but then promote them to float operations which
require more expensive constant generation).
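As a hedged source-level sketch of the kind of pattern the improved merge
search can now combine (names and constants are ours; little-endian assumed):

  void store4(unsigned char *p) {
    p[0] = 0x01;
    p[1] = 0x02;
    p[2] = 0x03;
    p[3] = 0x04;  // candidates for one merged 32-bit store of 0x04030201
  }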
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyFromReg
nodes as these are captured by data dependence
6. Forward load-store values through TokenFactors containing
{CopyToReg,CopyFromReg} values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293893 91177308-0d34-0410-b5e6-96231b3b80d8
It is important to change the ArgInfo's type from pointer to integer, otherwise
the CC assign function won't know what to do. Instead of hacking it up, we use
ComputeValueVTs and introduce some of the helpers that we will need later on for
lowering more complex types.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293889 91177308-0d34-0410-b5e6-96231b3b80d8
Make it legal to load pointer values. Also check that pointers are assigned
to the GPR reg bank by default.
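A minimal illustrative function (ours): the load of a pointer value below is
now legal, and the loaded pointer is assigned to the GPR bank:

  int *load_ptr(int **pp) { return *pp; }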
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293886 91177308-0d34-0410-b5e6-96231b3b80d8
Add both cores to the target parser and TableGen. Test that eabi
attributes are set correctly for both cores. Additionally, test the
absence and presence of MOVT in Cortex-M23 and Cortex-M33, respectively.
Committed on behalf of Sanne Wouda.
Reviewers: rengolin, olista01
Differential Revision: https://reviews.llvm.org/D29073
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293761 91177308-0d34-0410-b5e6-96231b3b80d8
When choosing the best successor for a block, ordinarily we would have preferred
a block that preserves the CFG unless there is a strong probability of branching
in the other direction. For small blocks that can be duplicated we now skip that requirement
as well, subject to some simple frequency calculations.
Differential Revision: https://reviews.llvm.org/D28583
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293716 91177308-0d34-0410-b5e6-96231b3b80d8
The Requires class overrides the target requirements of an instruction,
rather than adding to them, so all ARM instructions need to include the
IsARM predicate when they have overridden requirements.
This caused the swp and swpb instructions to be allowed in thumb mode
assembly, and the ARM encoding of CDP to be selected in codegen (which
is different for conditional instructions).
Differential Revision: https://reviews.llvm.org/D29283
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293634 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes emitting conversions of constants on targets
without legal f16 that need to use these for legalization.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293499 91177308-0d34-0410-b5e6-96231b3b80d8
Support lowering AEABI TLS access (__aeabi_read_tp) with long calls.
This requires adjusting the call sequence to use an indirect call to get
full addressability.
Resolves PR31769!
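A hedged illustration (the variable and function are ours): a TLS access that
lowers to an __aeabi_read_tp call, which with long calls enabled is now
reached through an indirect call:

  __thread int tls_counter;
  int bump() { return ++tls_counter; }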
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293433 91177308-0d34-0410-b5e6-96231b3b80d8
The interleaved access pass is an IR-to-IR transformation that runs before code
generation. It matches interleaved memory operations to target-specific
intrinsics (that are later lowered to load and store multiple instructions on
ARM/AArch64). We place tests for similar passes (e.g., GlobalMergePass) under
test/Transforms. This patch moves the InterleavedAccessPass tests out of
test/CodeGen and into target-specific directories under
test/Transforms/InterleavedAccess.
Although the pass is an IR pass, many of the existing tests were llc tests
rather than opt tests. For example, the tests would check for ldN/stN instructions
generated by llc rather than the intrinsic calls the pass actually inserts.
Thus, this patch updates all tests to be opt tests that check for the inserted
intrinsics. We already have separate CodeGen tests that ensure we lower the
interleaved access intrinsics to their corresponding ldN/stN instructions. In
addition to migrating the tests to opt, this patch also performs some minor
clean-up (to ensure consistent naming, etc.).
Differential Revision: https://reviews.llvm.org/D29184
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293309 91177308-0d34-0410-b5e6-96231b3b80d8
The Windows on ARM target uses custom division for normal division as
the backend needs to insert division-by-zero checks. However, it is
designed to only handle non-vectorized division. ARM has custom
lowering for vectorized division, as that avoids loading registers
with the values and invoking a division routine for each one, preferring
to lower using NEON instructions. Fall back to the custom lowering for
the NEON instructions if we encounter a vectorized division.
Resolves PR31778!
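A hedged example using the GCC/Clang vector extension (types and names are
ours): a vectorized division that should take the NEON-oriented custom
lowering rather than the Windows-on-ARM divide-by-zero-checking runtime path:

  typedef int v4si __attribute__((vector_size(16)));
  v4si vdiv(v4si a, v4si b) { return a / b; }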
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293259 91177308-0d34-0410-b5e6-96231b3b80d8
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner,
as it separates the handling of non-interfering loads/stores from the
store-merging logic.
When merging stores, we search up the chain through a single load, and
find all possible stores by looking down through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly construct
wider loads, but then promote them to float operations which
require more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyFromReg
nodes as these are captured by data dependence
6. Forward load-store values through TokenFactors containing
{CopyToReg,CopyFromReg} values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293184 91177308-0d34-0410-b5e6-96231b3b80d8
Add support for loading i1, i8 and i16 arguments from the stack, with or without
the ABI extension flags.
When the ABI extension flags are present, we load a 4-byte value, otherwise we
preserve the size of the load and let the instruction selector replace it with a
LDRB/LDRH. This generates the same thing as DAGISel.
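A hedged source-level example (the function is ours): under AAPCS the first
four arguments go in registers, so the last one below is loaded from the
stack, and it carries the zeroext flag when the frontend adds it:

  int take_byte(int a, int b, int c, int d, unsigned char e) { return e; }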
Differential Revision: https://reviews.llvm.org/D27803
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293163 91177308-0d34-0410-b5e6-96231b3b80d8
Later code expects the vector loads produced to be directly
concatenable, which means we shouldn't pad anything except the last load
produced with UNDEF.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293088 91177308-0d34-0410-b5e6-96231b3b80d8
The previous patch (https://reviews.llvm.org/rL289538) got reverted because of a bug. Chandler also requested some changes to the algorithm.
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161212/413479.html
This is an updated patch. The key difference is that collectBitProviders (renamed to calculateByteProvider) now collects the origin of one byte, not the whole value. It simplifies the implementation and allows us to stop the traversal earlier if we know that the result won't be used.
From the original commit:
Match a pattern where a wide type scalar value is loaded by several narrow loads and combined by shifts and ors. Fold it into a single load or a load and a bswap if the target supports it.
Assuming little endian target:
i8 *a = ...
i32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
=>
i32 val = *((i32)a)
i8 *a = ...
i32 val = (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3]
=>
i32 val = BSWAP(*((i32)a))
This optimization was discussed on llvm-dev some time ago in the "Load combine pass" thread. We came to the conclusion that we want to do this transformation late in the pipeline because in the presence of atomic loads, load widening is an irreversible transformation and it might hinder other optimizations.
Eventually we'd like to support folding patterns like this where the offset has a variable and a constant part:
i32 val = a[i] | (a[i + 1] << 8) | (a[i + 2] << 16) | (a[i + 3] << 24)
Matching the pattern above is easier at SelectionDAG level since address reassociation has already happened and the fact that the loads are adjacent is clear. Understanding that these loads are adjacent at IR level would have involved looking through geps/zexts/adds while looking at the addresses.
The general scheme is to match OR expressions by recursively calculating the origin of the individual bytes which constitute the resulting OR value. If all the OR bytes come from memory, verify that they are adjacent and match the little- or big-endian encoding of a wider value. If so, and if the load of the wider type (and bswap if needed) is allowed by the target, generate a load and a bswap if needed.
Reviewed By: RKSimon, filcab, chandlerc
Differential Revision: https://reviews.llvm.org/D27861
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293036 91177308-0d34-0410-b5e6-96231b3b80d8
At the moment, this means supporting the signext/zeroext attribute on the return
type of the function. For function arguments, signext/zeroext should be handled
by the caller, so there's nothing for us to do until we start lowering calls.
Note that this does not include support for other extensions (i8 to i16); those
will be added later.
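A hedged example (the function is ours): Clang gives this return type a
signext i8 attribute in the IR, so the return lowering now emits the required
sign extension:

  signed char negate(signed char x) { return (signed char)(-x); }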
Differential Revision: https://reviews.llvm.org/D27705
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293034 91177308-0d34-0410-b5e6-96231b3b80d8
This is a series of patches to make adding machine sched
models for ARM processors easier and more compact. They define new
sched-readwrites for groups of ARM instructions. This has been
missing so far, and as a consequence, machine scheduler models
for individual sub-targets have tended to be larger than they
needed to be.
The current patch focuses on floating-point instructions.
Reviewers: Diana Picus (rovka), Renato Golin (rengolin)
Differential Revision: https://reviews.llvm.org/D28194
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292825 91177308-0d34-0410-b5e6-96231b3b80d8
We also want to optimise tests like this: return a*b == 0. The MULS
instruction is flag setting, so we don't need the CMP instruction but can
instead branch on the result of the MULS. The generated instruction sequence
for this example was: MULS, MOVS, MOVS, CMP. The MOVS instructions load the
boolean values resulting from the select instruction, but these MOVS
instructions are flag setting and were thus preventing this optimisation. Now
we first reorder and move the MULS to before the CMP and generate the sequence
MOVS, MOVS, MULS, CMP so that the optimisation can trigger. Reordering the
MULS and MOVS is safe because the subsequent MOVS instructions just set
the CPSR register and don't use it, i.e. the CPSR is dead.
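The source pattern from the message as a compilable function (the name is
ours): with the reordering, the flag-setting MULS makes the separate CMP
against zero unnecessary:

  bool isProductZero(int a, int b) { return a * b == 0; }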
Differential Revision: https://reviews.llvm.org/D27990
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292608 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Emission of the XRay table was accidentally disabled for Arm32, but this bug was not detected earlier because testing of XRay had (also by mistake) been disabled on 32-bit Arm targets. This patch should fix that problem and detect such problems in the future.
This patch is one of a series, see also
- https://reviews.llvm.org/D28623
Reviewers: rengolin, dberris
Reviewed By: dberris
Subscribers: llvm-commits, aemerson, rengolin, dberris, iid_iunknown
Differential Revision: https://reviews.llvm.org/D28624
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292516 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r292210, as it broke the Thumb buildbot with:
clang-5.0: error: the clang compiler does not support '-fxray-instrument
on thumbv7-unknown-linux-gnueabihf'.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292357 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Emission of the XRay table was accidentally disabled for Arm32, but this bug was not detected earlier because testing of XRay had (also by mistake) been disabled on 32-bit Arm targets. This patch should fix that problem and detect such problems in the future.
This patch is one of a series, see also
- https://reviews.llvm.org/D28623
Reviewers: rengolin, dberris
Reviewed By: dberris
Subscribers: llvm-commits, aemerson, rengolin, dberris, iid_iunknown
Differential Revision: https://reviews.llvm.org/D28624
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292210 91177308-0d34-0410-b5e6-96231b3b80d8
We were relying on constant folding of the legalized instructions to do what constant folding we had previously.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292114 91177308-0d34-0410-b5e6-96231b3b80d8
When choosing the best successor for a block, ordinarily we would have preferred
a block that preserves the CFG unless there is a strong probability of branching
in the other direction. For small blocks that can be duplicated we now skip that requirement
as well.
Differential Revision: https://reviews.llvm.org/D27742
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291609 91177308-0d34-0410-b5e6-96231b3b80d8