Summary:
This "pass" eagerly creates div and rem instructions even when only one
is needed -- it relies on a later pass (machine DCE?) to clean them up.
This is problematic not just from a cleanliness perspective (this pass
is running during CodeGenPrepare, so should leave the IR in a better
state), but it also creates a problem for instruction selection. If we
always have a div+rem, isel will always select a divrem instruction (if
possible), even when a single div or rem would do.
Specifically, in NVPTX, we want to compute rem from the output of div,
if available. But if a div is not available, we want to leave the rem
alone. This transformation is overeager if div is always available.
Because this code runs as part of CodeGenPrepare, it's nontrivial to
write a test for this change. But this will effectively be tested by
a later patch which adds the aforementioned change to NVPTX isel.
Reviewers: tra
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26088
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285460 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
In BypassSlowDivision's short-dividend path, we would create e.g.
udiv exact i32 %a, %b
"exact" here means that we are asserting that %a is a multiple of %b.
But we have no reason to believe this must be true -- this is just a
bug, as far as I can tell.
Reviewers: tra
Subscribers: jholewinski, llvm-commits
Differential Revision: https://reviews.llvm.org/D26097
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285459 91177308-0d34-0410-b5e6-96231b3b80d8
When LivePhysRegs adds live-in registers, it recognizes ~0 as a special
lane mask indicating the entire register. If the lane mask is not ~0,
it will only add the subregisters that overlap the specified lane mask.
The problem is that if a live-in register does not have subregisters,
and the lane mask is not ~0, it will not be added to the live set.
(The given lane mask may simply be the lane mask of its register class.)
If a register does not have subregisters, add it to the live set if
the lane mask is non-zero.
Differential Revision: https://reviews.llvm.org/D26094
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285440 91177308-0d34-0410-b5e6-96231b3b80d8
It's possible to have a use of the private resource descriptor or
scratch wave offset registers even though there are no allocated
stack objects. This would result in continuing to use the maximum
number reserved registers. This could go over the number of SGPRs
available on VI, or violate the SGPR limit requested by
the function attributes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285435 91177308-0d34-0410-b5e6-96231b3b80d8
This is actually a deficiency in ValueTracking's matchSelectPattern(),
but a codegen test is the simplest way to expose the bug.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285429 91177308-0d34-0410-b5e6-96231b3b80d8
Do not use LiveIntervals to recalculate kills, because that cannot be
done accurately without implicit uses on predicated instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285409 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
We were trying to add APInt values with different bit sizes after
visiting an addrspacecast instruction which changed the bit width
of the pointer.
Reviewers: majnemer, hfinkel
Subscribers: hfinkel, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D24774
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285407 91177308-0d34-0410-b5e6-96231b3b80d8
Fixes PR 30784. Discussed with Justin, who pointed out that
in the new PassManager infrastructure we can have more fine-grained
control on which analyses we want to preserve, but this is the
best we can do with the current infrastructure.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285380 91177308-0d34-0410-b5e6-96231b3b80d8
Programs with very large __llvm_covmap sections may fail to link on
Darwin because because of out-of-range 32-bit RIP relative references.
It isn't possible to work around this by using the large code model
because it isn't supported on Darwin. One solution is to move the
__llvm_covmap section past the end of the __DATA segment.
=== Testing ===
In addition to check-{llvm,clang,profile}, I performed a link test on a
simple object after injecting ~4GB of padding into __llvm_covmap:
@__llvm_coverage_padding = internal constant [4000000000 x i8] zeroinitializer, section "__LLVM_COV,__llvm_covmap", align 8
(This test is too expensive to check-in.)
=== Backwards Compatibility ===
This patch should not pose any backwards-compatibility concerns. LLVM
is expected to scan all of the sections in a binary for __llvm_covmap,
so changing its segment shouldn't affect anything. I double-checked this
by loading coverage produced by an unpatched compiler with a patched
llvm-cov.
Suggested by Nick Kledzik.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285360 91177308-0d34-0410-b5e6-96231b3b80d8
In the past the compiler always emitted .debug_line version 2, though some opcodes from DWARF 3 (e.g. DW_LNS_set_prologue_end, DW_LNS_set_epilogue_begin or DW_LNS_set_isa) and from DWARF 4 could be emitted by the compiler.
This patch changes version information of .debug_line to exactly match the DWARF version. For .debug_line version 4, a new field maximum_operations_per_instruction is emitted.
Differential Revision: https://reviews.llvm.org/D16697
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285355 91177308-0d34-0410-b5e6-96231b3b80d8
obsolete load commands.
Again the philosophy of the error checking in libObject for
Mach-O files, the idea behind the checking is that we never
will return a Mach-O file out of libObject that contains unknown
things the library code can’t operate on. So known obsolete
load commands will cause a hard error.
Also to make things clear I have added comments to the
values and structures in Support/Mach-O.h and
Support/MachO.def as to what is obsolete.
As noted in a TODO in the code, there may need to be a
non-default mode to allow some unknown values for well
structured Mach-O files with things like unknown load
load commands. So things like using an old lldb on a newer
Mach-O file could still provide some limited functionality.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285342 91177308-0d34-0410-b5e6-96231b3b80d8
This testcase was originally part of r284995, but I put it in a wrong directory.
So I removed it. Before adding it back I did some small enhancements. Also I
changed the assertions a little bit, to take into account the impact of some
changes performed since code review is done.
This is similar to changes done for another testcase in the original commit.
See: https://reviews.llvm.org/D23614#577749
Basically for instead of vxor we now generate xxlxor in some cases, which is
better.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285333 91177308-0d34-0410-b5e6-96231b3b80d8
The Windows ARM target expects the compiler to emit a division-by-zero check.
The check would use the form of:
cmp r?, #0
cbz .Ltrap
b .Lbody
.Lbody:
...
.Ltrap:
udf #249 @ __brkdiv0
This works great most of the time. However, if the body of the function is
greater than 127 bytes, the branch target limitation of cbz becomes an issue.
This occurs in the unoptimized code generation cases sometimes (like in
compiler-rt).
Since this is a matter of correctness, possibly pay a small penalty instead. We
now form this slightly differently:
cbnz .Lbody
udf #249 @ __brkdiv0
.Lbody:
...
The positive case is through the branch instead of being the next instruction.
However, because of the basic block layout, the negated branch is going to be
a short distance always (2 bytes away, after the inserted __brkdiv0).
The new t__brkdiv0 instruction is required to explicitly mark the instruction as
a terminator as the generic UDF instruction is not a terminator.
Addresses PR30532!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285312 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: LICM may hoist instructions to preheader speculatively. Before code generation, we need to sink down the hoisted instructions inside to loop if it's beneficial. This pass is a reverse of LICM: looking at instructions in preheader and sinks the instruction to basic blocks inside the loop body if basic block frequency is smaller than the preheader frequency.
Reviewers: hfinkel, davidxl, chandlerc
Subscribers: anna, modocache, mgorny, beanz, reames, dberlin, chandlerc, mcrosier, junbuml, sanjoy, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D22778
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285308 91177308-0d34-0410-b5e6-96231b3b80d8
r282428 added the MipsOptimizePICCall as an opt-in pass that can be
skipped when using the -opt-bisect-limit option. However, this pass is
needed because it generates code that conforms to the o32 ABI
specification by using the $t9 register for PIC calls with JALR
instructions.
This bug was exposed by the fact that skipFunction() also checks for
the "optnone" attribute. This caused functions with that attribute to
break the requirements of the o32 ABI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285305 91177308-0d34-0410-b5e6-96231b3b80d8
Currently computeKnownBits returns the common known zero/one bits for all elements of vector data, when we may only be interested in one/some of the elements.
This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original computeKnownBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1.
The approach was found to be easier than trying to add a per-element known bits solution, for a similar usefulness given the combines where computeKnownBits is typically used.
I've only added support for a few opcodes so far (the ones that have proven straightforward to test), all others will default to demanding all elements but can be updated in due course.
DemandedElts support could similarly be added to computeKnownBitsForTargetNode in a future commit.
Differential Revision: https://reviews.llvm.org/D25691
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285296 91177308-0d34-0410-b5e6-96231b3b80d8
After successfull horizontal reduction vectorization attempt for PHI node
vectorizer tries to update root binary op by combining vectorized tree
and the ReductionPHI node. But during vectorization this ReductionPHI
can be vectorized itself and replaced by the `undef` value, while the
instruction itself is marked for deletion. This 'marked for deletion'
PHI node then can be used in new binary operation, causing "Use still
stuck around after Def is destroyed" crash upon PHI node deletion.
Also the test is fixed to make it perform actual testing.
Differential Revision: https://reviews.llvm.org/D25671
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285286 91177308-0d34-0410-b5e6-96231b3b80d8
Elf.h already has code checking that section table does not go past end of file.
Problem is that this check may not work on values greater than UINT64_MAX / Header->e_shentsize
because of calculation overflow.
Parch fixes the issue.
Differential revision: https://reviews.llvm.org/D25432
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285285 91177308-0d34-0410-b5e6-96231b3b80d8
UMAAL is a DSP instruction and it is not available on thumbv7m
(Cortex-M3) and thumbv6m (Cortex-M0+1) targets. Also fix wrong
CHECK prefix in longMAC.ll test.
Patch by Vadzim Dambrouski.
Differential Revision: https://reviews.llvm.org/D25890
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285278 91177308-0d34-0410-b5e6-96231b3b80d8