- Fix doxygen file comment
- reduce indentation in loop
- Factor out some common subexpressions
- Move independent helper function out of class
- Fix Changed flag (this is not strictly NFC but a bugfix, but the flag
seems ignored anyway)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285488 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Flat instruction can return out of order, so we need always need to wait
for all the outstanding flat operations.
Reviewers: tony-tye, arsenm
Subscribers: kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl
Differential Revision: https://reviews.llvm.org/D25998
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285479 91177308-0d34-0410-b5e6-96231b3b80d8
Also add glc bit to the scalar loads since they exist on VI
and change the caching behavior.
This currently has an assembler bug where the glc bit is incorrectly
accepted on SI/CI which do not have it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285463 91177308-0d34-0410-b5e6-96231b3b80d8
While trying to add the glc bit to SMEM instructions on VI
with the new refactoring I ran into some kind of shadowing
problem for the glc operand when using the pseudoinstruction
as a multiclass parameter.
Everywhere that currently uses it defines the operand to have the same
name as its type, i.e. glc:$glc which works. For some reason now it
conflicts, and its up evaluating to the wrong thing. For the
real encoding classes,
let Inst{16} = !if(ps.has_glc, glc, ?); was not being evaluated
and still visible in the Inst initializer in the expanded td file.
In other cases I got a a different error about an illegal operand
where this was using { 0 } initializer from the bits<1> glc initializer
instead of evaluating it as false in the if.
For consistency all of the operand types should probably
be captialized to avoid conflicting with the variable names
unless somebody has a better idea of how to fix this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285462 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
In isel, transform
Num % Den
into
Num - (Num / Den) * Den
if the result of Num / Den is already available.
Reviewers: tra
Subscribers: hfinkel, llvm-commits, jholewinski
Differential Revision: https://reviews.llvm.org/D26090
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285461 91177308-0d34-0410-b5e6-96231b3b80d8
It's possible to have a use of the private resource descriptor or
scratch wave offset registers even though there are no allocated
stack objects. This would result in continuing to use the maximum
number reserved registers. This could go over the number of SGPRs
available on VI, or violate the SGPR limit requested by
the function attributes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285435 91177308-0d34-0410-b5e6-96231b3b80d8
Do not use LiveIntervals to recalculate kills, because that cannot be
done accurately without implicit uses on predicated instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285409 91177308-0d34-0410-b5e6-96231b3b80d8
The Windows ARM target expects the compiler to emit a division-by-zero check.
The check would use the form of:
cmp r?, #0
cbz .Ltrap
b .Lbody
.Lbody:
...
.Ltrap:
udf #249 @ __brkdiv0
This works great most of the time. However, if the body of the function is
greater than 127 bytes, the branch target limitation of cbz becomes an issue.
This occurs in the unoptimized code generation cases sometimes (like in
compiler-rt).
Since this is a matter of correctness, possibly pay a small penalty instead. We
now form this slightly differently:
cbnz .Lbody
udf #249 @ __brkdiv0
.Lbody:
...
The positive case is through the branch instead of being the next instruction.
However, because of the basic block layout, the negated branch is going to be
a short distance always (2 bytes away, after the inserted __brkdiv0).
The new t__brkdiv0 instruction is required to explicitly mark the instruction as
a terminator as the generic UDF instruction is not a terminator.
Addresses PR30532!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285312 91177308-0d34-0410-b5e6-96231b3b80d8
r282428 added the MipsOptimizePICCall as an opt-in pass that can be
skipped when using the -opt-bisect-limit option. However, this pass is
needed because it generates code that conforms to the o32 ABI
specification by using the $t9 register for PIC calls with JALR
instructions.
This bug was exposed by the fact that skipFunction() also checks for
the "optnone" attribute. This caused functions with that attribute to
break the requirements of the o32 ABI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285305 91177308-0d34-0410-b5e6-96231b3b80d8
UMAAL is a DSP instruction and it is not available on thumbv7m
(Cortex-M3) and thumbv6m (Cortex-M0+1) targets. Also fix wrong
CHECK prefix in longMAC.ll test.
Patch by Vadzim Dambrouski.
Differential Revision: https://reviews.llvm.org/D25890
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285278 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
When finding a match for a merge and collecting the instructions that must
be moved, keep in mind that the instruction we merge might actually use one
of the defs that are being moved.
Fixes piglit spec/arb_enhanced_layouts/execution/component-layout/vs-tcs-load-output[-indirect].
The fact that the ds_read in the test case is not eliminated suggests that
there might be another problem related to alias analysis, but that's a
separate problem: this pass should still work correctly even when earlier
optimization passes missed something or were disabled.
Reviewers: tstellarAMD, arsenm
Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D25829
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285273 91177308-0d34-0410-b5e6-96231b3b80d8
Since Exynos-M2 improved the FP square root unit a bit over the one in
Exynos-M1, it does not benefit from using the Newton series for such
operations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285246 91177308-0d34-0410-b5e6-96231b3b80d8
It would be a very nice invariant to rely on, but unfortunately it doesn't
necessarily hold (and the causes of mis-sorted reglists appear to be quite
varied) so to be robust the frame lowering code can't assume that the first
register in the list is also the first one that actually gets pushed.
Should fix an issue where we were turning something like:
push {r8, r4, r7, lr}
sub sp, #24
into nonsense like:
push {r2, r3, r4, r5, r6, r7, r8, r4, r7, lr}
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285232 91177308-0d34-0410-b5e6-96231b3b80d8
Update the README.txt with newer information, add a link to the Emscripten
page explaining the current easiest way to use the LLVM wasm backend, and
mention that other ways of using the LLVM wasm backend are in development.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285215 91177308-0d34-0410-b5e6-96231b3b80d8
Add missing ISA versions 7.0.2/8.0.4/8.1.0. to backend.
Refactor processor definition to use ISA version features.
Fixed ISA version for stoney.
Based on Laurent Morichetti's patch.
Differential Revision: https://reviews.llvm.org/D25919
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285210 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: Clang's intrinsic header currently tries to negate the third operand of a vfmadd mask3 in order to create vfmsub, but this fails isel. This patch adds scalar vfmsub and vfnmsub mask3 that we can use instead to avoid the negate. This is consistent with the packed instructions.
Reviewers: igorb, delena
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25933
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285173 91177308-0d34-0410-b5e6-96231b3b80d8
On SparcV8, it was previously the case that a variable-sized alloca
might overlap by 4-bytes the last fixed stack variable, effectively
because 92 (the number of bytes reserved for the register spill area) !=
96 (the offset added to SP for where to start a DYNAMIC_STACKALLOC).
It's not as simple as changing 96 to 92, because variables that should
be 8-byte aligned would then be misaligned.
For now, simply increase the allocation size by 8 bytes for each dynamic
allocation -- wastes space, but at least doesn't overlap. As the large
comment says, doing this more efficiently will require larger changes in
llvm.
Also adds some test cases showing that we continue to not support
dynamic stack allocation and over-alignment in the same function.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285131 91177308-0d34-0410-b5e6-96231b3b80d8
It is not safe to use LOAD ON CONDITION to implement access to a memory
location marked "volatile", since the architecture leaves it unspecified
whether or not an access happens if the condition is false.
The current code already appears to care about that:
def LOC : CondUnaryRSY<"loc", 0xEBF2, nonvolatile_load, GR32, 4>;
Unfortunately, that "nonvolatile_load" operator is simply ignored
by the CondUnaryRSY class, and there was no test to catch it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285077 91177308-0d34-0410-b5e6-96231b3b80d8
We already have (V)PMOVZX* combining support, this is the beginning of handling (V)PMOVSX* similarly - other combines in combineVSZext can be generalized in future patches.
This unearthed an interesting bug in that we were generating illegal build vectors on 32-bit targets - it was proving difficult to create a test for it from PMOVZX, but it fired immediately with PMOVSX. I've created a more general form of the existing getConstVector to handle these cases - ideally this should be handled in non-target-specific code but I couldn't find an equivalent.
Differential Revision: https://reviews.llvm.org/D25874
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285072 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: The one tricky thing about this is that the sign/zero_extend_inreg uses v64i8 as an input type which isn't legal without BWI support. Though the vpmovsxbq and vpmovzxbq instructions themselves don't require BWI. To support this we need to add custom lowering for ZERO_EXTEND_VECTOR_INREG with v64i8 input. This can mostly reuse the existing sign extend code with a couple checks for sign extend vs zero extend added.
Reviewers: delena, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25594
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285053 91177308-0d34-0410-b5e6-96231b3b80d8
This is a function to go backwards in a block to find the first
instruction in a bundle, so iterator is a more natural choice for
parameter/return rather than a reference to a MachineInstruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285051 91177308-0d34-0410-b5e6-96231b3b80d8
The p2align operand of a load/store is encoded before the offset
operand; reorder the MachineInstr operands accordingly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285044 91177308-0d34-0410-b5e6-96231b3b80d8