Summary:
`getConstantEvolutionLoopExitValue` and `ComputeExitCountExhaustively`
assumed all phi nodes in the loop header have the same order of incoming
values. This is not correct, and this commit changes
`getConstantEvolutionLoopExitValue` and `ComputeExitCountExhaustively`
to lookup the backedge value of a phi node using the loop's latch block.
Unfortunately, there is still some code duplication
`getConstantEvolutionLoopExitValue` and `ComputeExitCountExhaustively`.
At some point in the future we should extract out a helper class /
method that can evolve constant evolution phi nodes across iterations.
Fixes 25060. Thanks to Mattias Eriksson for the spot-on analysis!
Depends on D13457.
Reviewers: atrick, hfinkel
Subscribers: materi, llvm-commits
Differential Revision: http://reviews.llvm.org/D13458
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249712 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes yet another scenario where tryBuildVectorShuffle would
attempt to create a BUILD_VECTOR node with an invalid combination
of types. This can happen if the incoming BUILD_VECTOR has elements
of a type different from the vector element type, which is allowed
in certain cases as long as they are all the same type.
When one of these elements is used in the residual vector, and
UNDEF elements are added to fill up the residual vector, those
UNDEFs then have to use the type of the original element, not
the vector element type, or else the resulting BUILD_VECTOR
will have an invalid type combination.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249706 91177308-0d34-0410-b5e6-96231b3b80d8
This is a partial fix for PR24886:
https://llvm.org/bugs/show_bug.cgi?id=24886
Without this IR transform, the backend (x86 at least) was producing inefficient code.
This patch is making 2 assumptions:
1. The canonical form of a fabs() operation is, in fact, the LLVM fabs() intrinsic.
2. The high bit of an FP value is always the sign bit; as noted in the bug report, this isn't specified by the LangRef.
Differential Revision: http://reviews.llvm.org/D13076
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249702 91177308-0d34-0410-b5e6-96231b3b80d8
This was requested in D13076: if we're going to canonicalize to fabs(), ValueTracking
should know that fabs() clears sign bits.
In this patch (as in D13076), we're not handling vectors yet even though computeKnownBits'
fabs() case itself should be vector-ready via the splat in this patch.
Fixing this will require follow-on patches to correct other logic that uses 'getScalarType'.
Differential Revision: http://reviews.llvm.org/D13222
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249701 91177308-0d34-0410-b5e6-96231b3b80d8
Problem was in SearchPathW function that does not attach an extension if file already has one.
That does not work for executables like ld.lld2 for example which require to have .exe extension but SearchPath thinks that its "lld2".
Solution was to add the extension manually.
Differential Revision: http://reviews.llvm.org/D13536
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249696 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Adds support for automatically detecting and printing strings
represented by Array abbrev operands, analogous to the string dumping
performed for Blob abbrev operands.
Enhanced the ThinLTO combined index test to check for the appropriate
module and function strings.
Reviewers: dexonsmith, joker.eph, davidxl
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13553
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249695 91177308-0d34-0410-b5e6-96231b3b80d8
We emit 1 compact unwind encoding per function, and this can’t represent
the varying stack pointer that will be generated by X86CallFrameOptimization.
Disable the optimization on Darwin.
(It might be possible to split the function into multiple ranges
and emit 1 compact unwind info per range. The compact unwind emission
code isn’t ready for that and this kind of info certainly isn’t
tested/used anywhere. It might be worth exploring this path if we want
to get the space savings at some point though)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249694 91177308-0d34-0410-b5e6-96231b3b80d8
Removed an unused abbrev op in the VST_CODE_COMBINED_FNENTRY abbrev.
I noticed while writing/testing an array string dumper for
llvm-bcanalyze that the combined function's VST entry abbrevs contained
an old field that I am not using. Everything was working fine since the
bitcode writer and reader were in sync on how the record fields were
actually being set up and interpreted.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249691 91177308-0d34-0410-b5e6-96231b3b80d8
Instead of bailing out when we see an icmp, we can instead at least
say that if the upper bits of both operands are known zero, they are
not demanded. This doesn't help with signed comparisons, but it's at
least better than bailing out.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249687 91177308-0d34-0410-b5e6-96231b3b80d8
Like adds and subtracts, muls ripple only to the left so we can use
the same logic.
While we're here, add a print method to DemandedBits so it can be used
with -analyze, which we'll use in the testcase.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249686 91177308-0d34-0410-b5e6-96231b3b80d8
The algorithm itself is still eager, but it doesn't get run until a
query function is called. This greatly reduces the compile-time impact
of requiring DemandedBits when at runtime it is not often used.
NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249685 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes two separate bugs:
1) The mask for the high lane was not set correctly. That fixes PR24532.
2) The transformation should bail out if it believes it involves more than
2 lanes, as it does not currently do anything sensible in this case.
Differential Revision: http://reviews.llvm.org/D13505
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249669 91177308-0d34-0410-b5e6-96231b3b80d8
Compare elimination extended to recognize load-and-test instructions used
for comparison and eliminate them the same way as with compare instructions.
Test case fp-cmp-05.ll updated to expect optimized results now also for z13.
The order of instruction shortening and compare elimination passes have been
changed so that opcodes do not have to be handled in both passes.
Reviewed by Ulrich Weigand.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249666 91177308-0d34-0410-b5e6-96231b3b80d8
The following instruction shortening transformations would introduce a
definition of the CC reg, so therefore liveness of CC reg must be checked:
WFADB -> ADBR
WFSDB -> SDBR
Also add the CC reg implicit def operand to the MI in case of change of opcode.
Reviewed by Ulrich Weigand.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249665 91177308-0d34-0410-b5e6-96231b3b80d8
Since the LTxBRCompare instructions can't be used with vector registers, a
normal load-and-test instruction (with a modelled def operand) is used instead.
Reviewed by Ulrich Weigand.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249664 91177308-0d34-0410-b5e6-96231b3b80d8
Comparing `Pred` with `ICmpInst::ICMP_ULT` is cheaper that memory access
-- do that check before loading / storing `ProvingSplitPredicate`.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249654 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds support for reading sample profiles with inline stacks.
Inline stacks in a profile are generated when the sampled binary has
samples in inlined functions.
For instance, if main() calls foo() and foo() calls bar(), and bar() is
inlined into foo() and foo() inlined into main(), the profile may look
something like:
main total:364084 head:0
[ ... ]
2.3: _Z3fool total:243786
1: 60149
1.2: 38568
1.4: 46511
1.7: _Z3bari total:98558
1.1: 52672
1.2: 45886
At line 2, discriminator 3, main() calls foo(). In turn, foo() calls
bar() at line 1, discriminator 7.
In the textual format, this stacking of inline calls is represented
with indentation.
With this change, LLVM can now read sample profile files generated by
the create_gcov tool from https://github.com/google/autofdo.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249644 91177308-0d34-0410-b5e6-96231b3b80d8
In r224059, we started verifying after addPass, but missed doing so on
insertPass. There isn't a good reason for the discrepancy, and
skipping the verifier in these cases causes bugs.
This also exposes a verifier error that was introduced in r249087, but
the verifier doesn't run until after the register coalescer, when the
issue happens to have been resolved. I've skipped the verifier after
SIFixSGPRLiveRangesID to avoid the failures for now and will follow up
with Matt for a proper fix.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249643 91177308-0d34-0410-b5e6-96231b3b80d8
In particular, passing non-trivially copyable objects by value on win32
uses a dynamic alloca (inalloca). We would clobber ESP in the epilogue
and end up returning to outer space.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249637 91177308-0d34-0410-b5e6-96231b3b80d8
llvm-nm only needs the target to parse module level assembly in bitcode. It doesn't need a disassembler or codegen.
llvm-objdump needs to be able to disassemble a file, but doesn't need asm parsers or codegen.
This reduces the sizes of these tools by a few MB each, depending on how many backends are linked in.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249632 91177308-0d34-0410-b5e6-96231b3b80d8
Previously the CompileOnDemand layer always created single-function partitions.
In theory this new API allows for more interesting partitions, though this has
not been well tested yet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249623 91177308-0d34-0410-b5e6-96231b3b80d8
The relocation for the filter funclet will be against a symbol table
entry for a function instead of the section, making it easier to
understand what is going on.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249621 91177308-0d34-0410-b5e6-96231b3b80d8
The __CxxFrameHandler3 tables for 32-bit are supposed to hold stack
offsets relative to EBP, not ESP. I blindly updated the win-catchpad.ll
test case, and immediately noticed that 32-bit catching stopped working.
While I'm at it, move the frame index to frame offset WinEH table logic
out of PEI. PEI shouldn't have to know about WinEHFuncInfo. I realized
we can calculate frame index offsets just fine from the table printer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249618 91177308-0d34-0410-b5e6-96231b3b80d8
We remove unreachable blocks because it is pointless to consider them
for coloring. However, we still had stale pointers to these blocks in
some data structures after we removed them from the function.
Instead, remove the unreachable blocks before attempting to do anything
with the function.
This fixes PR25099.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249617 91177308-0d34-0410-b5e6-96231b3b80d8
Recycler just needs a singly-linked list, and it takes less (and
simpler) code to hand-roll one of those than to build up the equivalent
`iplist_traits`. In theory, this should speed things up a bit too, but
this is really just a drive-by cleanup so I haven't measured.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249615 91177308-0d34-0410-b5e6-96231b3b80d8
Create `SymbolTableList`, a wrapper around `iplist` for lists that
automatically manage a symbol table. This commit reduces a ton of code
duplication between the six traits classes that were used previously.
As a drive by, reduce the number of template parameters from 2 to 1 by
using a SymbolTableListParentType metafunction (I originally had this as
a separate commit, but it touched most of the same lines so I squashed
them).
I'm in the process of trying to remove the UB in `createSentinel()` (see
the FIXMEs I added for `ilist_embedded_sentinel_traits` and
`ilist_half_embedded_sentinel_traits`). My eventual goal is to separate
the list logic into a base class layer that knows nothing about (and
isn't templated on) the downcasted nodes -- removing the need to invoke
UB -- but for now I'm just trying to get a handle on all the current use
cases (and cleaning things up as I see them).
Besides these six SymbolTable lists, there are two others that use the
addNode/removeNode/transferNodes() hooks: the `MachineInstruction` and
`MachineBasicBlock` lists. Ideally there'll be a way to factor these
hooks out of the low-level API entirely, but I'm not quite there yet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249602 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This adds some more routines to `IRBuilder` around creating calls and
invokes to `gc.statepoint`. These will be used later.
Reviewers: reames, swaroop.sridhar
Subscribers: sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D13371
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249596 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This is necessary to keep the cloner from making bogus copies of debug
metadata attached to the IR it is cloning.
Also, avoid running RemapInstruction over all instructions in the common
case that no cloning was performed.
Reviewers: rnk, andrew.w.kaylor, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13514
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249591 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r249528 and reapply r249431. The fix for the
fallout has been commited in r249575.
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249581 91177308-0d34-0410-b5e6-96231b3b80d8