Win64 has some strict requirements for the epilogue. As a result, we disable
shrink-wrapping for Win64 unless the block that gets the epilogue is already an
exit block.
Fixes PR24193.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252088 91177308-0d34-0410-b5e6-96231b3b80d8
This patch improves the memory folding of the inserted float element for the (V)INSERTPS instruction.
The existing implementation occurs in the DAGCombiner and relies on the narrowing of a whole vector load into a scalar load (and then converted into a vector) to (hopefully) allow folding to occur later on. Not only has this proven problematic for debug builds, it also prevents other memory folds (notably stack reloads) from happening.
This patch removes the old implementation and moves the folding code to the X86 foldMemoryOperand handler. A new private 'special case' function - foldMemoryOperandCustom - has been added to deal with memory folding of instructions that can't just use the lookup tables - (V)INSERTPS is the first of several that could be done.
It also tweaks the memory operand folding code with an additional pointer offset that allows existing memory addresses to be modified, in this case to convert the vector address to the explicit address of the scalar element that will be inserted.
Unlike the previous implementation we now set the insertion source index to zero, although this is ignored for the (V)INSERTPSrm version, anything that relied on shuffle decodes (such as unfolding of insertps loads) was incorrectly calculating the source address - I've added a test for this at insertps-unfold-load-bug.ll
Differential Revision: http://reviews.llvm.org/D13988
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252074 91177308-0d34-0410-b5e6-96231b3b80d8
Patch by Slava Klochkov
The key difference between FMA* and FMA*_Int opcodes is that FMA*_Int opcodes are handled more conservatively. It is illegal to commute the 1st operand of FMA*_Int instructions as the upper bits of scalar FMA intrinsic result must be taken from the 1st operand, but such commute transformation would change those upper bits and invalidate the intrinsic's result.
Reviewers: Quentin Colombet, Elena Demikhovsky
Differential Revision: http://reviews.llvm.org/D13710
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252060 91177308-0d34-0410-b5e6-96231b3b80d8
If we have a CMOV, OR and AND combination such as:
if (x & CN)
y |= CM;
And:
* CN is a single bit;
* All bits covered by CM are known zero in y;
Then we can convert this to a sequence of BFI instructions. This will always be a win if CM is a single bit, will always be no worse than the TST & OR sequence if CM is two bits, and for thumb will be no worse if CM is three bits (due to the extra IT instruction).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252057 91177308-0d34-0410-b5e6-96231b3b80d8
The x86 "sitofp i64 to double" dag combine, in 32-bit mode, lowers sitofp
directly to X86ISD::FILD (or FILD_FLAG). This should not be done in soft-float mode.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252042 91177308-0d34-0410-b5e6-96231b3b80d8
There is no point in having invoke safepoints handled differently than the
call safepoints. All relevant decisions could be made by looking at whether
or not gc.result and gc.relocate lay in a same basic block. This change will
allow to lower call safepoints with relocates and results in a different
basic blocks. See test case for example.
Differential Revision: http://reviews.llvm.org/D14158
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252028 91177308-0d34-0410-b5e6-96231b3b80d8
XOP has the VPCMOV instruction that performs the common vector bit select operation OR( AND( SRC1, SRC3 ), AND( SRC2, ~SRC3 ) )
This patch adds tablegen pattern matching for this instruction.
Differential Revision: http://reviews.llvm.org/D8841
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251975 91177308-0d34-0410-b5e6-96231b3b80d8
When push instructions are being used to pass function arguments on
the stack, and either EH or debugging are enabled, we need to generate
.cfi_adjust_cfa_offset directives appropriately. For (synch) EH, it is
enough for the CFA offset to be correct at every call site, while
for debugging we want to be correct after every push.
Darwin does not support this well, so don't use pushes whenever it
would be required.
Differential Revision: http://reviews.llvm.org/D13767
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251904 91177308-0d34-0410-b5e6-96231b3b80d8
This was causing a variety of test failures when v2i64
is added as a legal type.
SIFixSGPRCopies should correctly handle the case of vector inputs
to a scalar reg_sequence, so this isn't necessary anymore. This
was hiding some deficiencies in how reg_sequence is handled later,
but this shouldn't be a problem anymore since the register class
copy of a reg_sequence is now done before the reg_sequence.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251860 91177308-0d34-0410-b5e6-96231b3b80d8
I've found myself pointlessly debugging problems from running
graphics tests with an HSA triple a few times, so stop this from
happening again.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251858 91177308-0d34-0410-b5e6-96231b3b80d8
In the current BB placement algorithm, a loop chain always contains all loop blocks. This has a drawback that cold blocks in the loop may be inserted on a hot function path, hence increasing branch cost and also reducing icache locality.
Consider a simple example shown below:
A
|
B⇆C
|
D
When B->C is quite cold, the best BB-layout should be A,B,D,C. But the current implementation produces A,C,B,D.
This patch filters those cold blocks off from the loop chain by comparing the ratio:
LoopBBFreq / LoopFreq
to 20%: if it is less than 20%, we don't include this BB to the loop chain. Here LoopFreq is the frequency of the loop when we reduce the loop into a single node. In general we have more cold blocks when the loop has few iterations. And vice versa.
Differential revision: http://reviews.llvm.org/D11662
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251833 91177308-0d34-0410-b5e6-96231b3b80d8
1) PR25154. This is basically a repeat of PR18102, which was fixed in
r200201, and broken again by r234430. The latter changed which of the
store nodes was merged into from the first to the last. Thus, we now
also need to prefer merging a later store at a given address into the
target node, instead of an earlier one.
2) While investigating that, I also realized I'd introduced a bug in
r236850. There, I removed a check for alignment -- not realizing that
nothing except the alignment check was ensuring that none of the stores
were overlapping! This is a really bogus way to ensure there's no
aliased stores.
A better solution to both of these issues is likely to always use the
code added in the 'if (UseAA)' branches which rearrange the chain based
on a more principled analysis. I'll look into whether that can be used
always, but in the interest of getting things back to working, I think a
minimal change makes sense.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251816 91177308-0d34-0410-b5e6-96231b3b80d8
This revision has introduced an issue that only affects bootstrapped compiler
when it is printing the ASM. It turns out that the new code path taken due to
legalizing a scalar_to_vector of i64 -> v2i64 exposes a missing check in a
micro optimization to change a load followed by a scalar_to_vector into a
load and splat instruction on PPC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251798 91177308-0d34-0410-b5e6-96231b3b80d8
Optimized <8 x i32> to <8 x i16>
<4 x i64> to < 4 x i32>
<16 x i16> to <16 x i8>
All these oprtrations use now AVX512F set (KNL). Before this change it was implemented with AVX2 set.
Differential Revision: http://reviews.llvm.org/D14108
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251764 91177308-0d34-0410-b5e6-96231b3b80d8
This patch generalizes the zeroing of vector elements with the BLEND instructions. Currently a zero vector will only blend if the shuffled elements are correctly inline, this patch recognises when a vector input is zero (or zeroable) and modifies a local copy of the shuffle mask to support a blend. As a zeroable vector input may not be all zeroes, the zeroable vector is regenerated if necessary.
Differential Revision: http://reviews.llvm.org/D14050
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251659 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: Refer PR23377. This test was XFAIL'ed for Hexagon as well as ARM. But it has now started passing for ARM.
Reviewers: hans, rengolin, aemerson, kparzysz
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D14155
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251652 91177308-0d34-0410-b5e6-96231b3b80d8
This was discovered to be necessary while running memchr-01.ll with
-verify-machinstrs, because it is not allowed to have a phys reg live
accross block boundaries while on SSA form, if the register is
allocatable (expect in entry block and landing pads).
In this test case, stringRRE pseudos are expanded after isel by adding
a loop block which produces a live out CC register. To make the test
pass, it was also necessary to not say that StringRRELoop pseudo uses
R0L, this is only true for the StringRRE opcode.
-verify-machineinstrs added to memchr-01.ll test.
New test case int-cmp-51.ll to test that MachineCSE can eliminate
an identical compare (which it couldn't do before).
Reviewed by Ulrich Weigand
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251634 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This commit resolves wrong opcodes for ll and sc instructions for r6 architecutres, which were generated in method MipsTargetLowering::emitAtomicBinary.
Author: Jelena.Losic
Reviewers: dsanders
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D13593
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251629 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The microMIPS register class GPRMM16 does not contain the $zero register.
However, MipsSEDAGToDAGISel::replaceUsesWithZeroReg() would replace uses
of the $dst register:
[d]addiu, $dst, $zero, 0
with the $zero register, without checking for membership in the register
class of the target machine operand.
Reviewers: dsanders
Subscribers: llvm-commits, dsanders
Differential Revision: http://reviews.llvm.org/D13984
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251622 91177308-0d34-0410-b5e6-96231b3b80d8
Since the verifier will give false reports if it incorrectly thinks MI is
loading or storing using an FI, it is necessary to scan memoperands and
find out how the FI is used in the instruction. This should be relatively
rare.
Needed to make CodeGen/SystemZ/spill-01.ll pass, which now runs with this flag.
Reviewed by Quentin Colombet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251620 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Conversion opcode name format should be f64.convert_u/i64 not f64_convert_u
Author: s3ththompson
Reviewers: jfb
Subscribers: sunfish, jfb, llvm-commits, dschuff
Differential Revision: http://reviews.llvm.org/D14160
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251613 91177308-0d34-0410-b5e6-96231b3b80d8
We cannot form ctr-based loops around function calls, including calls to
__tls_get_addr used for PIC TLS variables. References to such TLS variables,
however, might be buried within constant expressions, and so we need to search
the entire constant expression to be sure that no references to such TLS
variables exist.
Fixes PR25256, reported by Eric Schweitz. This is a slightly-modified version
of the patch suggested by Eric in the bug report, and a test case I created.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251582 91177308-0d34-0410-b5e6-96231b3b80d8
As a follow-up to r251566, do the same for the other optionally-supported
register classes (mostly for vector registers). Don't return an unavailable
register class (which would cause an assert later), but fail cleanly when
provided an unsupported inline asm constraint.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251575 91177308-0d34-0410-b5e6-96231b3b80d8
The most substantial changes are again for watchOS: libcalls are hard-float if
needed and sincos has a different calling convention.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251571 91177308-0d34-0410-b5e6-96231b3b80d8
At the LLVM level this ABI is essentially a minimal modification of AAPCS to
support 16-byte alignment for vector types and the stack.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251570 91177308-0d34-0410-b5e6-96231b3b80d8
When crbits are disabled, cleanly reject the constraint (return the register
class only to cause an assert later).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251566 91177308-0d34-0410-b5e6-96231b3b80d8
Add the crbits processor feature so that the test can be run at -O1, etc.
regardless of the default crbits setting.
Fixes PR23778.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251548 91177308-0d34-0410-b5e6-96231b3b80d8
cntlz is the old POWER mnemonic. cntlzw is the PowerPC mnemonic.
This change fixes an issue when -no-integrated-as: The opcode cntlz is
unrecognized by gas
Alias the POWER mnemonic cntlz[.] to the PowerPC mnemonic cntlzw[.]
This is done for because the POWER cntlz mnemonic has be used by LLVM for
a very long time. We need to make sure that assembly programs
that are using the cntlz[.] do not break with this change.
Change PowerPC tests to reflect the insn change from cntlz to cntlzw.
Add assembly test to verify cntlz[.] is encoded correctly.
Patch by Tom Rix!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251489 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Don't call `computeKnownBitsFromRangeMetadata` for extended loads --
this can cause a mismatch between the width of the !range metadata and
the width of the APInt's accumulating `KnownZero` (and `KnownOne` in the
future). This isn't a problem now, but will be after a future change.
Note: this can be made more aggressive in the future.
Reviewers: nlewycky
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14107
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251486 91177308-0d34-0410-b5e6-96231b3b80d8