During type promotion, we sometimes convert an add with a negative
constant into a sub with a positive constant. The loop that
performs this transformation has two issues:
- it iterates over a set, causing non-determinism.
- it breaks, instead of continuing, when it finds the first
non-negative operand.
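Both issues are visible in this minimal model (illustrative only; plain
structs stand in for the real IR instructions and worklist):

    #include <cstdint>
    #include <vector>

    enum class Opcode { Add, Sub };
    struct BinOp { Opcode op; int64_t constant; };

    // Rewrite "add x, -C" as "sub x, C". Iterating a vector rather than
    // a set keeps the visit order deterministic, and `continue` rather
    // than `break` means one non-negative constant doesn't end the scan.
    void convertAddsToSubs(std::vector<BinOp> &worklist) {
      for (BinOp &inst : worklist) {
        if (inst.op != Opcode::Add || inst.constant >= 0)
          continue; // previously: break
        inst.op = Opcode::Sub;
        inst.constant = -inst.constant;
      }
    }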
Differential Revision: https://reviews.llvm.org/D58452
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354557 91177308-0d34-0410-b5e6-96231b3b80d8
Previously, if we couldn't derive a prototype for a "no-prototype"
function from C, we would leave it as is:
    void foo(...)
With this change we instead give it an empty signature and remove
the "no-prototype" attribute.
This fixes the current wasm waterfall test failure.
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58488
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354544 91177308-0d34-0410-b5e6-96231b3b80d8
Greendragon also passes, but some other bots fail for reasons I don't understand.
The only difference I can see between these tests is that it's missing -O0.
If this doesn't work I'll revert and continue investigating.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354529 91177308-0d34-0410-b5e6-96231b3b80d8
When we can't determine with certainty the signature of a function
import, we pick the first signature we find rather than erroring out.
The resulting program might not do what is expected, since we might pick
the wrong signature. However, since it is undefined behavior in C to use
the same function with different signatures, this seems better than
refusing to compile such programs.
Fixes PR40472
Differential Revision: https://reviews.llvm.org/D58304
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354523 91177308-0d34-0410-b5e6-96231b3b80d8
This change makes some basic type combinations for G_SHUFFLE_VECTOR legal, and
implements them with a very pessimistic TBL2 instruction in the selector.
For TBL2, support is also needed to generate constant pool entries and load from
them in order to materialize the mask register.
Currently supports <2 x s64> and <4 x s32> result types.
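As a rough scalar model of what the selected TBL2 computes (illustrative
only, not the selector code): each byte of the mask, loaded from the
constant pool, indexes the 32-byte concatenation of the two source
registers, with out-of-range indices producing zero.

    #include <array>
    #include <cstdint>

    // Scalar model of AArch64 TBL2 on two 16-byte registers: mask bytes
    // 0-15 select from the first source, 16-31 from the second, and
    // anything >= 32 yields zero. Using TBL2 for every mask is
    // pessimistic (cheaper shuffles often exist) but always correct.
    std::array<uint8_t, 16> tbl2(const std::array<uint8_t, 16> &lo,
                                 const std::array<uint8_t, 16> &hi,
                                 const std::array<uint8_t, 16> &mask) {
      std::array<uint8_t, 16> out{};
      for (int i = 0; i < 16; ++i) {
        uint8_t idx = mask[i];
        out[i] = idx < 16 ? lo[idx] : (idx < 32 ? hi[idx - 16] : 0);
      }
      return out;
    }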
Differential Revision: https://reviews.llvm.org/D58466
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354521 91177308-0d34-0410-b5e6-96231b3b80d8
We can currently remove the mask if the immediate has all ones in the LSBs, but if the LHS of the AND has known zero bits, then the immediate might have had bits removed.
A similar issue also occurs with shifts and rotates. I'm preparing a common fix for all of them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354520 91177308-0d34-0410-b5e6-96231b3b80d8
If the LHS has known zeros, then the RHS immediate mask might have been simplified to remove those bits.
This patch adds a call to computeKnownBits to get the known zeroes to handle that possibility. I left an early out to skip the call if all of the demanded bits are set in the mask.
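A sketch of the resulting check, with plain integers standing in for
APInt/KnownBits and a hypothetical helper name:

    #include <cstdint>

    // The AND can be dropped for the demanded bits if every demanded
    // bit is either set in the mask or already known zero on the LHS.
    bool andIsRemovable(uint64_t mask, uint64_t demanded,
                        uint64_t lhsKnownZero) {
      if ((demanded & ~mask) == 0)
        return true; // early out: skip the computeKnownBits call
      return (demanded & ~mask & ~lhsKnownZero) == 0;
    }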
Differential Revision: https://reviews.llvm.org/D58464
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354514 91177308-0d34-0410-b5e6-96231b3b80d8
Second part of https://bugs.llvm.org/show_bug.cgi?id=40442.
This adds an extra UnrollVectorOverflowOp() method to SDAG, because
the general UnrollVectorOp() method can't deal with multiple results.
Additionally we need to expand UMULO/SMULO during vector op
legalization, as it may result in unrolling, which may need additional
type legalization.
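Per lane, the operation produces two results (a product and an overflow
flag), which is what the existing single-result unrolling can't express.
A scalar model of one i32 lane (illustrative only):

    #include <cstdint>

    // One lane of a vector UMULO on i32: widen, multiply, and report
    // whether the product overflows 32 bits. Two results per element.
    struct UMulO32 { uint32_t product; bool overflow; };
    UMulO32 umulo32(uint32_t a, uint32_t b) {
      uint64_t wide = uint64_t(a) * uint64_t(b);
      return {static_cast<uint32_t>(wide), wide > UINT32_MAX};
    }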
Differential Revision: https://reviews.llvm.org/D57997
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354513 91177308-0d34-0410-b5e6-96231b3b80d8
This avoids depending on the peephole pass to do load folding.
Also adds some load folding for some insert_subvector patterns that use blend.
All of this was found by temporarily adding TB_NO_FORWARD to the blend immediate entries in the load folding tables.
I've added -disable-peephole to some of the affected tests from that experiment to ensure we're testing isel patterns.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354511 91177308-0d34-0410-b5e6-96231b3b80d8
If the bit position has known zeros in it, then the AND immediate will likely be optimized to remove bits.
This can prevent GetDemandedBits from recognizing that the AND is unnecessary.
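For context, a model of the underlying pattern (my reading: this is the
x86 bit-test/shift case, where the hardware already reduces the position
modulo the operand width, making the AND on the position redundant):

    #include <cstdint>

    // Model of a 64-bit register bit test: hardware takes the position
    // modulo 64, so masking it with 63 first changes nothing. It is
    // that redundant AND which GetDemandedBits should see through.
    bool bitTest64(uint64_t value, uint64_t pos) {
      return (value >> (pos & 63)) & 1;
    }
    // bitTest64(v, p) == bitTest64(v, p & 63) for all p.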
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354501 91177308-0d34-0410-b5e6-96231b3b80d8
If the bit position has known zeros in it, then the AND immediate will likely be optimized to remove bits.
This can prevent GetDemandedBits from recognizing that the AND is unnecessary.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354498 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This test was failing in one of our setups because the generated ModuleID
had the full path of the test file and that path contained the string
BL.
Reviewers: t.p.northover, jpaquette, paquette
Reviewed By: paquette
Subscribers: javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58217
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354497 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Inc and Dec were at one point slow on Intel CPUs due to their tendency to cause partial flag stalls on P6 derived CPU cores. This is because these instructions are defined to preserve the carry flag. This partial flag stall issue persisted until Sandy Bridge when flag merging was changed to be handled as a data dependency instead of as a stall until retirement. Sandy Bridge and later CPUs rename the C flag separately from OSPAZ so there is no flag merge needed on INC/DEC to preserve the C flag.
Given these improvements I don't know why INC/DEC was ever considered slow on Sandy Bridge. If anything they should have been disabled on the earlier CPUs instead.
Note after this patch, INC/DEC are still considered slow on Silvermont, Goldmont, Knights Landing and our generic "x86-64" CPU.
Reviewers: spatel, RKSimon, chandlerc
Reviewed By: chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D58412
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354436 91177308-0d34-0410-b5e6-96231b3b80d8
The test case reloc-btf.ll is generated with an IR containing
spFlags introduced by https://reviews.llvm.org/rL347806.
In the case of BTF backporting, the old compiler may not
have this patch, so this test will fail during
validation.
This patch removes spFlags from the IR in the test case
and uses the old way of specifying the various flags.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354409 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Rename MemoryIndex to InitFlags and implement logic for determining
data segment layout in ObjectYAML and MC. Also adds a "passive" flag
for the .section assembler directive, although this cannot be assembled
yet because the assembler does not support data sections.
Reviewers: sbc100, aardappel, aheejin, dschuff
Subscribers: jgravelle-google, hiraditya, sunfish, rupprecht, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57938
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354397 91177308-0d34-0410-b5e6-96231b3b80d8
After r354178, these instructions expand to a sequence that uses an OR instruction. That OR clobbers EFLAGS, so we need to state that to avoid accidentally using the clobbered flags.
Our tests show the bug, but I didn't notice it because the SETcc instructions didn't move after r354178; it used to be safe to do the fp->int conversion first.
We should probably convert this whole sequence to SelectionDAG instead of a custom inserter to avoid mistakes like this.
Fixes PR40779
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354395 91177308-0d34-0410-b5e6-96231b3b80d8
D42042 introduced the ability for the ExecutionDomainFixPass to more easily change between BLENDPD/BLENDPS/PBLENDW as the domains required.
With this ability, we can avoid most of the bitcasts/scaling that occurred in the DAG with X86ISD::BLENDI lowering/combining, blend the vXi32/vXi64 vectors directly, and use isel patterns to lower to the float vector equivalents.
This helps the shuffle combining and SimplifyDemandedVectorElts be more aggressive as we lose track of fewer UNDEF elements than when we go up/down through bitcasts.
I've introduced a basic blend(bitcast(x),bitcast(y)) -> bitcast(blend(x,y)) fold; there are more generalizations I can do there (e.g. widening/scaling and handling the tricky v16i16 repeated mask case).
The vector-reduce-smin/smax regressions will be fixed in a future improvement to SimplifyDemandedBits to peek through bitcasts and support X86ISD::BLENDV.
Reapplied after reversion at rL353699 - AVX2 isel fix was applied at rL354358, additional test at rL354360/rL354361
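A scalar model of the new fold (illustrative, with memcpy standing in
for bitcast): when the bitcast preserves lane count and width, an
element-wise blend commutes with it.

    #include <array>
    #include <cstdint>
    #include <cstring>

    using I32x4 = std::array<uint32_t, 4>;
    using F32x4 = std::array<float, 4>;

    // Lane-preserving bitcast: same element count, same element width.
    F32x4 bitcastToF32(const I32x4 &v) {
      F32x4 r;
      std::memcpy(r.data(), v.data(), sizeof r);
      return r;
    }

    // Immediate-mask blend: bit i of `imm` selects b[i] over a[i].
    template <typename V>
    V blend(const V &a, const V &b, unsigned imm) {
      V r;
      for (std::size_t i = 0; i < a.size(); ++i)
        r[i] = (imm >> i) & 1 ? b[i] : a[i];
      return r;
    }

    // blend(bitcastToF32(x), bitcastToF32(y), m)
    //   == bitcastToF32(blend(x, y, m))   -- the fold's justification.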
Differential Revision: https://reviews.llvm.org/D57888
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354363 91177308-0d34-0410-b5e6-96231b3b80d8
Directly use the correct shift amount type where possible, and
future-proof the code against vectors. The added test makes sure that
bitwidths that do not fit into the shift amount type do not assert.
Split out from D57997.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354359 91177308-0d34-0410-b5e6-96231b3b80d8
The VBROADCAST combines and SimplifyDemandedVectorElts improvements mean that we now more consistently use shorter (128-bit) X86vzload input operands.
Follow up to D58053
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354346 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds scalar/subvector BROADCAST handling to EltsFromConsecutiveLoads.
It mainly shows codegen changes to 32-bit code which failed to handle i64 loads, although 64-bit code is also using this new path to more efficiently combine to a broadcast load.
Differential Revision: https://reviews.llvm.org/D58053
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354340 91177308-0d34-0410-b5e6-96231b3b80d8
Re-organise calling convention tests to prepare for ilp32f and ilp32d hard
float ABI tests. It's also clear that we need to introduce similar tests for
lp64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354323 91177308-0d34-0410-b5e6-96231b3b80d8
Legalize/select llvm.ctlz.*
Add select-ctlz to show that we actually select them. Update arm64-clrsb.ll and
arm64-vclz.ll to show that we perform valid transformations in optimized builds,
and document where GISel can improve.
Differential Revision: https://reviews.llvm.org/D58155
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354299 91177308-0d34-0410-b5e6-96231b3b80d8
The motivating x86 cases for forming the intrinsic are shown in PR31754 and PR40487:
https://bugs.llvm.org/show_bug.cgi?id=31754
https://bugs.llvm.org/show_bug.cgi?id=40487
..and those are shown in the IR test file and x86 codegen file.
Matching the usubo pattern is harder than uaddo because we have 2 independent values rather than a def-use.
This adds a TLI hook that should preserve the existing behavior for uaddo
formation, but disables usubo formation by default. Only x86 overrides
that setting for now, although other targets will likely benefit from
forming usubo too.
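A scalar model of the difference (illustrative only): for usubo, the
subtract and the overflow compare each read the same two inputs
directly, whereas for uaddo the compare consumes the add's result.

    #include <cstdint>

    // i32 usubo: both results are computed straight from (a, b) -- two
    // independent uses of the inputs, not a def-use chain.
    struct USubO32 { uint32_t diff; bool borrow; };
    USubO32 usubo32(uint32_t a, uint32_t b) {
      return {a - b, a < b};
    }
    // Contrast uaddo: sum = a + b; overflow = sum < a;  // uses the def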
Differential Revision: https://reviews.llvm.org/D57789
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354298 91177308-0d34-0410-b5e6-96231b3b80d8
Similar to D57867 - this is a small patch with lots of test diffs.
With half-vector-width narrowing potential, using an extract + 128-bit vshufps
is a win because it replaces a 256-bit shuffle with a 128-bit shuffle.
This seems like it should be a win even for targets with 'fast-variable-shuffle',
but we are intentionally deferring that to an independent change to make sure
that is true.
Differential Revision: https://reviews.llvm.org/D58181
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354279 91177308-0d34-0410-b5e6-96231b3b80d8