Inline spiller can decide to move a spill as early as possible in the basic block.
It will skip phis and label, but we also need to make sure it skips instructions
in the basic block prologue which restore exec mask.
Added isPositionLike callback in TargetInstrInfo to detect instructions which
shall be skipped in addition to common phis, labels etc.
Differential Revision: https://reviews.llvm.org/D27997
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292554 91177308-0d34-0410-b5e6-96231b3b80d8
This function can be used to accumulate the set of all read and modified
register in a sequence of instructions.
Use this code in AArch64A57FPLoadBalancing::scavengeRegister() to prove
the concept.
- The AArch64A57LoadBalancing code is using a backwards analysis now
which is irrespective of kill flags. This is the main motivation for
this change.
Differential Revision: http://reviews.llvm.org/D22082
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292543 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Emission of XRay table was occasionally disabled for Arm32, but this bug was not then detected because earlier (also by mistake) testing of XRay was occasionally disabled on 32-bit Arm targets. This patch should fix that problem and detect such problems in the future.
This patch is one of a series, see also
- https://reviews.llvm.org/D28623
Reviewers: rengolin, dberris
Reviewed By: dberris
Subscribers: llvm-commits, aemerson, rengolin, dberris, iid_iunknown
Differential Revision: https://reviews.llvm.org/D28624
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292516 91177308-0d34-0410-b5e6-96231b3b80d8
As discussed on D28219 - it is profitable to combine trunc(binop (s/zext(x), s/zext(y)) to binop(trunc(s/zext(x)), trunc(s/zext(y))) assuming the trunc(ext()) will simplify further
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292493 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Adds a RegisterBank tablegen class that can be used to declare the register
banks and an associated tablegen pass to generate the necessary code.
Changes since first commit attempt:
* Added missing guards
* Added more missing guards
* Found and fixed a use-after-free bug involving Twine locals
Reviewers: t.p.northover, ab, rovka, qcolombet
Reviewed By: qcolombet
Subscribers: aditya_nandakumar, rengolin, kristof.beyls, vkalintiris, mgorny, dberris, llvm-commits, rovka
Differential Revision: https://reviews.llvm.org/D27338
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292478 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Currently we expand and scalarize these operations, but I think we should be able to implement ADD/SUB with KXOR and MUL with KAND.
We already do this for scalar i1 operations so I just extended it to vectors of i1.
Reviewers: zvi, delena
Reviewed By: delena
Subscribers: guyblank, llvm-commits
Differential Revision: https://reviews.llvm.org/D28888
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292474 91177308-0d34-0410-b5e6-96231b3b80d8
For -(x + y) -> (-x) + (-y), if x == -y, this would
change the result from -0.0 to 0.0. Since the fma/fmad
combine is an extension of this problem it also
applies there.
fmul should be fine, and I don't think any of the unary
operators or conversions should be a problem either.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292473 91177308-0d34-0410-b5e6-96231b3b80d8
r291670 doesn't crash on the original testcase from PR31589,
but it crashes on a slightly more complex one.
PR31589 has the new reproducer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292444 91177308-0d34-0410-b5e6-96231b3b80d8
ARM seems to prefer that long literals be formed from their little end in
order to promote the fusion of the instrs pairs MOV/MOVK and MOVK/MOVK on
Cortex A57 and others (v. "Cortex A57 Software Optimisation Guide", section
4.14).
Differential revision: https://reviews.llvm.org/D28697
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292422 91177308-0d34-0410-b5e6-96231b3b80d8
Limit register coalescer by not allowing it to artificially increase
size of registers beyond dword. Such super-registers are in fact
register sequences and not distinct HW registers.
With more super-regs we would need to allocate adjacent registers
and constraint regalloc more than needed. Moreover, our super
registers are overlapping. For instance we have VGPR0_VGPR1_VGPR2,
VGPR1_VGPR2_VGPR3, VGPR2_VGPR3_VGPR4 etc, which complicates registers
allocation even more, resulting in excessive spilling.
Differential Revision: https://reviews.llvm.org/D28782
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292413 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
In this function, virtual registers can be introduced (for example
through calls to emitThumbRegPlusImmInReg). doScavengeFrameVirtualRegs
will replace those virtual registers with concrete registers later on
in PrologEpilogInserter, which sets NoVRegs again.
This patch fixes the Codegen/Thumb/segmented-stacks.ll test case which
failed with expensive checks.
https://llvm.org/bugs/show_bug.cgi?id=27484
Reviewers: rnk, bkramer, olista01
Reviewed By: olista01
Subscribers: llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D28829
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292372 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Adds a RegisterBank tablegen class that can be used to declare the register
banks and an associated tablegen pass to generate the necessary code.
Changes since last commit:
The new tablegen pass is now correctly guarded by LLVM_BUILD_GLOBAL_ISEL and
this should fix the buildbots however it may not be the whole fix. The previous
buildbot failures suggest there may be a memory bug lurking that I'm unable to
reproduce (including when using asan) or spot in the source. If they re-occur
on this commit then I'll need assistance from the bot owners to track it down.
Reviewers: t.p.northover, ab, rovka, qcolombet
Reviewed By: qcolombet
Subscribers: aditya_nandakumar, rengolin, kristof.beyls, vkalintiris, mgorny, dberris, llvm-commits, rovka
Differential Revision: https://reviews.llvm.org/D27338
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292367 91177308-0d34-0410-b5e6-96231b3b80d8
Enable an ELFObjectFile to read the its arm build attributes to
produce a target triple with a specific ARM architecture.
llvm-objdump now uses this functionality to automatically produce
a more accurate target.
Differential Revision: https://reviews.llvm.org/D28769
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292366 91177308-0d34-0410-b5e6-96231b3b80d8
This patch improves the mul instruction combine function (combineMul)
by adding new layer of logic.
In this patch, we are adding the ability to fold (mul x, -((1 << c) -1))
or (mul x, -((1 << c) +1)) into (neg(X << c) -x) or (neg((x << c) + x) respective.
Differential Revision: https://reviews.llvm.org/D28232
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292358 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r292210, as it broke the Thumb buldbot with:
clang-5.0: error: the clang compiler does not support '-fxray-instrument
on thumbv7-unknown-linux-gnueabihf'.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292357 91177308-0d34-0410-b5e6-96231b3b80d8
During post-RA pseudo expansion, an 'undef' flag of the source operand should
be propagated by emitGRX32Move().
Review: Ulrich Weigand
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292353 91177308-0d34-0410-b5e6-96231b3b80d8
The grow_memory instruction now returns the previous memory size. Add the
return type to the LLVM intrinsic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292322 91177308-0d34-0410-b5e6-96231b3b80d8
Some instructions were printed as "foo\tbar", but most are printed as
"foo \bar". Standardize on the latter form.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292306 91177308-0d34-0410-b5e6-96231b3b80d8
!strconcat is a variadic function; it will concatenate an arbitrary
number of strings. There's no need to nest it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292305 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This change also lets us use max.{s,u}16. There's a vague warning in a
test about this maybe being less efficient, but I could not come up with
a case where the resulting SASS (sm_35 or sm_60) was different with or
without max.{s,u}16. It's true that nvcc seems to emit only
max.{s,u}32, but even ptxas 7.0 seems to have no problem generating
efficient SASS from max.{s,u}16 (the casts up to i32 and back down to
i16 seem to be implicit and nops, happening via register aliasing).
In the absence of evidence, better to have fewer special cases, emit
more straightforward code, etc. In particular, if a new GPU has 16-bit
min/max instructions, we want to be able to use them.
Reviewers: tra
Subscribers: jholewinski, llvm-commits
Differential Revision: https://reviews.llvm.org/D28732
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292304 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Avoid an unnecessary conversion operation when using the result of
ctpop.i32 or ctpop.i16 as an i32, as in both cases the ptx instruction
we run returns an i32.
(Previously if we used the value as an i32, we'd do an unnecessary
zext+trunc.)
Reviewers: tra
Subscribers: jholewinski, llvm-commits
Differential Revision: https://reviews.llvm.org/D28721
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292302 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
* Disable "ctlz speculation", which inserts a branch on every ctlz(x) which
has defined behavior on x == 0 to check whether x is, in fact zero.
* Add DAG patterns that avoid re-truncating or re-expanding the result
of the 16- and 64-bit ctz instructions.
Reviewers: tra
Subscribers: llvm-commits, jholewinski
Differential Revision: https://reviews.llvm.org/D28719
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292299 91177308-0d34-0410-b5e6-96231b3b80d8
Some platforms (notably iOS) use a different calling convention for unnamed vs
named parameters in varargs functions, so we need to keep track of this
information when translating calls.
Since not many platforms are involved, the guts of the special handling is in
the ValueHandler class (with a generic implementation that should work for most
targets).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292283 91177308-0d34-0410-b5e6-96231b3b80d8
Even with the fix from r291630, this still causes problems. I get
widespread assertion failures in the Swift runtime's WeakRefCount::increment()
function. I sent a reduced testcase in reply to the commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292242 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Emission of XRay table was occasionally disabled for Arm32, but this bug was not then detected because earlier (also by mistake) testing of XRay was occasionally disabled on 32-bit Arm targets. This patch should fix that problem and detect such problems in the future.
This patch is one of a series, see also
- https://reviews.llvm.org/D28623
Reviewers: rengolin, dberris
Reviewed By: dberris
Subscribers: llvm-commits, aemerson, rengolin, dberris, iid_iunknown
Differential Revision: https://reviews.llvm.org/D28624
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292210 91177308-0d34-0410-b5e6-96231b3b80d8