The memory form of the xmm->xmm version only writes 64 bits. If we use it in the folding tables and it gets used for a stack spill, only half the slot will be written. A later reload may then read all 128 bits, pulling in garbage. Without the spill, the upper bits of the register would have been zero; by not folding we preserve those zeros.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321950 91177308-0d34-0410-b5e6-96231b3b80d8
This is the last step needed to fix PR33325:
https://bugs.llvm.org/show_bug.cgi?id=33325
We're trading branches and compares for loads and logic ops.
This makes the code smaller and hopefully faster in most cases.
The 24-byte test shows an interesting construct: we load the trailing scalar
elements into vector registers and generate the same pcmpeq+movmsk code that
we expected for a pair of full vector elements (see the 32- and 64-byte tests).
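For illustration, the kind of source pattern that now gets the vector treatment (a hedged sketch; the function name is made up and the generated code depends on the target features):

  #include <cstring>

  // A fixed-size memcmp()==0 like this becomes vector loads plus
  // pcmpeq+movmsk instead of branches and scalar compares. For 24 bytes,
  // the trailing 8 bytes are loaded into vector registers and checked
  // with the same idiom as the leading 16-byte chunk.
  bool equal24(const void *a, const void *b) {
    return std::memcmp(a, b, 24) == 0;
  }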
Differential Revision: https://reviews.llvm.org/D41714
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321934 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This commit updates the BufferByteStreamer, used by DebugLocStream
to buffer bytes/comments to put in the debug_loc section, to
make sure that the Buffer and Comments vectors are synced.
Previously, when an SLEB128 or ULEB128 was emitted together with
a comment, the vectors could get out of sync if the LEB encoding
added several entries to the Buffer vector while we only added
a single entry to the Comments vector.
The goal is to get the comments in the debug_loc
section of the .s file correctly aligned.
Example (using ARM as target):
Instead of
.byte 144  @ sub-register DW_OP_regx
.byte 128  @ 256
.byte 2    @ DW_OP_piece
.byte 147  @ 8
.byte 8    @ sub-register DW_OP_regx
.byte 144  @ 257
.byte 129  @ DW_OP_piece
.byte 2    @ 8
.byte 147  @
.byte 8    @
we now get
.byte 144  @ sub-register DW_OP_regx
.byte 128  @ 256
.byte 2    @
.byte 147  @ DW_OP_piece
.byte 8    @ 8
.byte 144  @ sub-register DW_OP_regx
.byte 129  @ 257
.byte 2    @
.byte 147  @ DW_OP_piece
.byte 8    @ 8
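For reference, a minimal standalone sketch of the syncing idea (it mirrors the fix conceptually and is not the actual BufferByteStreamer code): attach the comment to the first LEB byte and push an empty comment for every continuation byte, so Buffer[i] and Comments[i] always line up.

  #include <cstdint>
  #include <string>
  #include <vector>

  struct ByteStreamer {
    std::vector<uint8_t> Buffer;
    std::vector<std::string> Comments;

    // Emit a ULEB128 value byte by byte, pushing exactly one Comments
    // entry per Buffer entry so the two vectors stay index-aligned.
    void emitULEB128(uint64_t Value, const std::string &Comment) {
      bool First = true;
      do {
        uint8_t Byte = Value & 0x7f;
        Value >>= 7;
        if (Value != 0)
          Byte |= 0x80; // more bytes follow
        Buffer.push_back(Byte);
        Comments.push_back(First ? Comment : std::string());
        First = false;
      } while (Value != 0);
    }
  };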
Reviewers: JDevlieghere, rnk, aprantl
Reviewed By: aprantl
Subscribers: davide, Ka-Ka, uabelho, aemerson, javed.absar, kristof.beyls, llvm-commits, JDevlieghere
Differential Revision: https://reviews.llvm.org/D41763
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321907 91177308-0d34-0410-b5e6-96231b3b80d8
This behavior existed to work with an old version of the GNU assembler on MacOS that only accepted this form. Newer versions of the GNU assembler and the current LLVM-derived assembler on MacOS support movq as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321898 91177308-0d34-0410-b5e6-96231b3b80d8
Select G_PHI to PHI and manually constrain the result register. This is
very similar to how COPY is handled, so extract and reuse some of that
code.
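A hedged sketch of the shared logic (simplified; signatures and include paths are approximate, and the real selector derives the register class from the register bank):

  #include "llvm/CodeGen/GlobalISel/RegisterBankInfo.h"
  #include "llvm/CodeGen/MachineInstr.h"
  #include "llvm/CodeGen/MachineRegisterInfo.h"
  #include "llvm/CodeGen/TargetInstrInfo.h"

  // Rewrite G_PHI to PHI in place, then constrain the def register
  // manually. PHI, like COPY, is target-independent, so the
  // TableGen-generated selector won't constrain its operands for us.
  static bool selectPHI(llvm::MachineInstr &I,
                        const llvm::TargetInstrInfo &TII,
                        llvm::MachineRegisterInfo &MRI,
                        const llvm::RegisterBankInfo &RBI,
                        const llvm::TargetRegisterClass &RC) {
    I.setDesc(TII.get(llvm::TargetOpcode::PHI));
    unsigned DstReg = I.getOperand(0).getReg();
    return RBI.constrainGenericRegister(DstReg, RC, MRI) != nullptr;
  }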
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321797 91177308-0d34-0410-b5e6-96231b3b80d8
We used to handle G_CONSTANT with pointer type by forcing the type of
the result register to s32 and then letting TableGen handle it.
Unfortunately, setting the type only works for generic virtual
registers that haven't yet been constrained to a register class (e.g.
those used only by a COPY later on). If the result register has already
been constrained as a use of a previously selected instruction, then
setting the type will assert.
It would be nice to be able to teach TableGen to select pointer
constants the same as integer constants, but since it's such an edge
case (at the moment the only pointer constant that we're generally
interested in is 0, and that is mostly used for comparisons and selects,
which are also not supported by TableGen) it's probably not worth the
effort right now. Instead, handle pointer constants with some trivial
handwritten code.
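To illustrate the constraint (a hedged sketch, not code from the patch):

  #include "llvm/CodeGen/LowLevelType.h"
  #include "llvm/CodeGen/MachineRegisterInfo.h"

  // setType() is only legal while a vreg is still generic; once an
  // already-selected use has constrained it to a register class,
  // retyping it asserts.
  static void retypeIfStillGeneric(llvm::MachineRegisterInfo &MRI,
                                   unsigned Reg) {
    if (MRI.getRegClassOrNull(Reg) == nullptr)
      MRI.setType(Reg, llvm::LLT::scalar(32)); // still generic: OK
    // otherwise, calling setType(Reg, ...) here would hit an assertion
  }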
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321793 91177308-0d34-0410-b5e6-96231b3b80d8
Handle this in DAGCombiner::visitEXTRACT_VECTOR_ELT the same way we already do in SelectionDAG::getNode, and use APInt instead of getZExtValue.
This should also fix oss-fuzz #4910
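The gist, as a hedged standalone sketch (the helper name is made up; this is not the DAGCombiner code itself): an index wider than 64 bits makes getZExtValue() assert, while APInt comparisons stay safe at any width.

  #include "llvm/ADT/APInt.h"

  // Test a constant extract index against the element count using
  // APInt::uge(), which works for any bit width; getZExtValue() would
  // assert on an index wider than 64 bits.
  static bool indexOutOfRange(const llvm::APInt &Idx, unsigned NumElts) {
    return Idx.uge(NumElts);
  }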
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321767 91177308-0d34-0410-b5e6-96231b3b80d8
The work order was changed in r228186 from SCC order
to RPO with an arbitrary sorting function. The sorting
function attempted to move inner loop nodes earlier. This
was apparently relying on an assumption that every block
in a given loop (or at the same loop depth) would be seen
before visiting another loop. In the broken testcase, a block
outside of the loop was encountered before moving on to
another block in the same loop. The testcase would then
structurize such that one block's unconditional successor
could never be reached.
Revert to plain RPO for the analysis phase. This fixes
cases where edges were detected as backedges when they really aren't.
The processing phase does use another visited set, and
I'm unclear on whether the order there is as important.
An arbitrary order doesn't work and triggers some infinite
loops. The reversed RPO list seems to work and is closer
to the order that was used before, minus the arbitrary
custom sorting.
A few of the changed tests now produce smaller code,
and a few are slightly worse looking.
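For reference, a plain RPO walk using LLVM's traversal helper (an illustrative sketch only; StructurizeCFG actually operates on regions, not whole functions):

  #include "llvm/ADT/PostOrderIterator.h"
  #include "llvm/IR/CFG.h"
  #include "llvm/IR/Function.h"

  // Plain reverse post-order: ignoring back edges, every block is
  // visited after all of its predecessors, with no custom tie-breaking.
  static void visitInRPO(llvm::Function &F) {
    llvm::ReversePostOrderTraversal<llvm::Function *> RPOT(&F);
    for (llvm::BasicBlock *BB : RPOT) {
      (void)BB; // analyze BB here
    }
  }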
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321751 91177308-0d34-0410-b5e6-96231b3b80d8
Currently we use SIGN_EXTEND in lowerMasksToReg as part of calling convention setup, but we don't require a specific value for the upper bits.
This patch changes it to ANY_EXTEND, which will be lowered as SIGN_EXTEND if it ends up sticking around.
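Roughly (a sketch of the node change only, not the surrounding calling convention code; the helper name is made up):

  #include "llvm/CodeGen/SelectionDAG.h"

  // Extend a mask value without pinning the upper bits: ANY_EXTEND
  // leaves them undefined, so later lowering is free to materialize
  // them however is cheapest.
  static llvm::SDValue extendMask(llvm::SelectionDAG &DAG,
                                  const llvm::SDLoc &DL,
                                  llvm::SDValue Mask, llvm::EVT ValVT) {
    return DAG.getNode(llvm::ISD::ANY_EXTEND, DL, ValVT, Mask);
  }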
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321746 91177308-0d34-0410-b5e6-96231b3b80d8
Previously the code for handling G_SMULO didn't properly check for signed
multiply overflow, instead treating it the same as the unsigned G_UMULO.
Fixes PR35800.
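The distinction, shown standalone (illustrative C++, not the legalizer code): a signed multiply overflows when the wide product no longer equals the sign extension of its truncated low half, whereas the unsigned check only looks for nonzero high bits.

  #include <cstdint>

  // Signed (G_SMULO-style) check: overflow iff the 64-bit product
  // differs from the sign extension of its low 32 bits.
  static bool smulOverflow(int32_t A, int32_t B, int32_t &Res) {
    int64_t Wide = (int64_t)A * (int64_t)B;
    Res = (int32_t)Wide;
    return Wide != (int64_t)Res;
  }

  // Unsigned (G_UMULO-style) check: overflow iff any high bits are set.
  static bool umulOverflow(uint32_t A, uint32_t B, uint32_t &Res) {
    uint64_t Wide = (uint64_t)A * (uint64_t)B;
    Res = (uint32_t)Wide;
    return (Wide >> 32) != 0;
  }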
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321690 91177308-0d34-0410-b5e6-96231b3b80d8
A call may have an intrinsic name but not have a valid intrinsic ID,
for example with llvm.invariant.group.barrier. If so, treat it as a
normal call like FastISel does.
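A hedged sketch of the check (simplified; not the IRTranslator code itself):

  #include "llvm/IR/Function.h"
  #include "llvm/IR/Intrinsics.h"

  // A function can carry an "llvm."-prefixed name yet map to no known
  // intrinsic ID; calls to it should be translated as ordinary calls.
  static bool hasIntrinsicNameButNoID(const llvm::Function &F) {
    return F.getName().startswith("llvm.") &&
           F.getIntrinsicID() == llvm::Intrinsic::not_intrinsic;
  }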
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321662 91177308-0d34-0410-b5e6-96231b3b80d8
This is an extension of D31156 with the goal that we'll allow memcmp() == 0 expansion
for x86 to use 2 pairs of loads per block.
The memcmp expansion pass (formerly part of CGP) will generate this kind of pattern
with oversized integer compares, so we want to transform these into x86-specific vector
nodes before legalization splits things into scalar chunks.
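Modeled in standalone C++ (a sketch; the pass emits IR and the helper name is made up), the one-block shape for memcmp(a, b, 16) == 0 looks like this:

  #include <cstdint>
  #include <cstring>

  // Two pairs of 8-byte loads in a single block; the oversized integer
  // equality is what gets mapped to vector nodes before legalization
  // would split it into scalar chunks.
  static bool equal16(const char *a, const char *b) {
    uint64_t A0, A1, B0, B1;
    std::memcpy(&A0, a, 8);
    std::memcpy(&A1, a + 8, 8);
    std::memcpy(&B0, b, 8);
    std::memcpy(&B1, b + 8, 8);
    return ((A0 ^ B0) | (A1 ^ B1)) == 0;
  }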
See PR33325 for more details:
https://bugs.llvm.org/show_bug.cgi?id=33325
Differential Revision: https://reviews.llvm.org/D41618
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321656 91177308-0d34-0410-b5e6-96231b3b80d8
Tests updated to explicitly use fast-isel at -O0 instead of relying on it implicitly.
This change also allows an explicit -fast-isel option to override an
implicitly enabled global-isel. Otherwise -fast-isel would have no effect at -O0.
Differential Revision: https://reviews.llvm.org/D41362
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321655 91177308-0d34-0410-b5e6-96231b3b80d8
Our internal testing has discovered bugs in PPC builds.
I have forwarded reproduction instructions to the original author (Nirav).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321649 91177308-0d34-0410-b5e6-96231b3b80d8