Summary:
This allows us to use virtual registers when we need extra registers
for inserting spill instructions in SIRegisterInfo:eliminateFrameIndex().
Once all the frame indices have been eliminated, the
PrologEpilogueInserter does an extra pass over the program to replace
all virtual registers with physical ones.
This allows us to make more efficient use of our emergency spill slots,
so we only need to create one.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17591
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262728 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Since IR files are all compiled into separate independent object files
in ThinLTO mode, the prevailing linkonce symbols must be emitted in its
object file even if it is no longer referenced there, e.g. if no
references remain in the module after inlining, since it may be
referenced by another ThinLTO compiled object file. This is done by
changing LDPR_PREVAILING_DEF_IRONLY* symbols to LDPR_PREVAILING_DEF,
which converts the prevailing linkonce to weak. We also don't need the
other prevailing IRONLY handling for internalization, which is not
currently performed for ThinLTO.
Test case included.
Reviewers: davidxl, rafael
Subscribers: rafael, joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D16173
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262727 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Launch ThinLTO backends (LTO and codegen pipelines with importing) in
parallel using a ThreadPool, after creating the combined index.
The number of threads is controlled by the existing -jobs gold plugin
option, or the hardware concurrency if not specified.
The old behavior of exiting after creating the combined index can be
invoked via a new thinlto-index-only plugin option.
This commit involves just the ThinLTO-specific pieces of D15390, the NFC
and other restructuring pieces were committed independently:
r262677: Add hardware_concurrency interface to llvm::thread (NFC)
r262719: Change split code gen to use ThreadPool
r262721: Refactor gold-plugin codegen to prepare for ThinLTO threads (NFC)
Reviewers: pcc, joker.eph, rafael
Subscribers: rafael, davidxl, llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D15390
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262724 91177308-0d34-0410-b5e6-96231b3b80d8
This is the NFC part remaining from D15390, which refactors the
current codegen() into a CodeGen class with various modular methods and
other helper functions that will be used by the follow-on ThinLTO piece.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262721 91177308-0d34-0410-b5e6-96231b3b80d8
We have known UB in some ilists where we static cast half nodes to
(larger) derived types and use the address. See llvm.org/PR26753.
This needs to be fixed, but in the meantime it'd be nice if running
ubsan didn't complain. This adds annotations in the two places where
ubsan complains while running check-all of a sanitized clang build.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262683 91177308-0d34-0410-b5e6-96231b3b80d8
The vast majority of LiveRanges (ie, 4/5) have exactly 1 segment and 1
value number, and a good chunk of the rest have 2 of each, so
allocating space for 4 is wasteful. This is especially noticeable when
dealing with a very large number of vregs, and I have an internal case
where dropping this to 2 shaves over 5% off of peak memory when
compiling a particularly large function.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262681 91177308-0d34-0410-b5e6-96231b3b80d8
Gold has a newly added LDPT_GET_SYMBOLS_V3 callback that can
distinguish between a module that is not included in the link, and
one that is included but has its entire interface preempted by others.
Fixes PR26674.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262676 91177308-0d34-0410-b5e6-96231b3b80d8
The RTDyldMemoryManager::getSymbolAddressInProcess method accepts a
linker-mangled symbol name, but it calls through to dlsym to do the lookup (via
DynamicLibrary::SearchForAddressOfSymbol), and dlsym expects an unmangled
symbol name.
Historically we've attempted to "demangle" by removing leading '_'s on all
platforms, and fallen back to an extra search if that failed. That's broken, as
it can cause symbols to resolve incorrectly on platforms that don't do mangling
if you query '_foo' and the process also happens to contain a 'foo'.
Fix this by demangling conditionally based on the host platform. That's safe
here because this function is specifically for symbols in the host process, so
the usual cross-process JIT looking concerns don't apply.
M unittests/ExecutionEngine/ExecutionEngineTest.cpp
M lib/ExecutionEngine/RuntimeDyld/RTDyldMemoryManager.cpp
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262657 91177308-0d34-0410-b5e6-96231b3b80d8
This experiment was originally about trying to use facts implied dominating conditions to infer more precise known bits. While the compile time was found to be acceptable on several large code bases, we never found sufficiently profitable examples to justify turning on the code by default. Given this, it's time to abandon the experiment.
Several folks have commented that they've found this useful for experimentation, but nothing has come of those experiments. Given how easy the patch is to apply, there's no reason to leave the code in tree.
For anyone interested in further investigation in this area, I recommend finding the summary email I sent on one of the original review threads. In particular, I now believe the use-list based approach is strictly worse than the dom-tree-walking approach.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262646 91177308-0d34-0410-b5e6-96231b3b80d8
Given that we're not actually reducing the instruction count in the included
regression tests, I think we would call this a canonicalization step.
The motivation comes from the example in PR26702:
https://llvm.org/bugs/show_bug.cgi?id=26702
If we hoist the bitwise logic ahead of the bitcast, the previously unoptimizable
example of:
define <4 x i32> @is_negative(<4 x i32> %x) {
%lobit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
%not = xor <4 x i32> %lobit, <i32 -1, i32 -1, i32 -1, i32 -1>
%bc = bitcast <4 x i32> %not to <2 x i64>
%notnot = xor <2 x i64> %bc, <i64 -1, i64 -1>
%bc2 = bitcast <2 x i64> %notnot to <4 x i32>
ret <4 x i32> %bc2
}
Simplifies to the expected:
define <4 x i32> @is_negative(<4 x i32> %x) {
%lobit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
ret <4 x i32> %lobit
}
Differential Revision: http://reviews.llvm.org/D17583
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262645 91177308-0d34-0410-b5e6-96231b3b80d8
Exploit ScalarEvolution::getRange's newly acquired smartness (since
r262438) by using that to infer nsw and nuw when possible.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262639 91177308-0d34-0410-b5e6-96231b3b80d8
After r262438 we can have provably positive NSW SCEV expressions whose
zero extensions cannot be simplified (since r262438 makes SCEV better at
computing constant ranges). This means demoting sexts of positive add
recurrences eagerly can result in an unsimplified zero extension where
we could have had a simplified sign extension. This change fixes the
issue by teaching SCEV to demote sext of a positive SCEV expression to a
zext only if the sext could not be simplified.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262638 91177308-0d34-0410-b5e6-96231b3b80d8
This will be used in a later patch to ScalarEvolution. Right now only
the unit tests exercise the newly added code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262637 91177308-0d34-0410-b5e6-96231b3b80d8
This patch provides the following infrastructure for PGO enhancements in inliner:
Enable the use of block level profile information in inliner
Incremental update of block frequency information during inlining
Update the function entry counts of callees when they get inlined into callers.
Differential Revision: http://reviews.llvm.org/D16381
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262636 91177308-0d34-0410-b5e6-96231b3b80d8
The variable mask form of VPERMILPD/VPERMILPS were only partially implemented, with much of it still performed as an intrinsic.
This patch properly defines the instructions in terms of X86ISD::VPERMILPV, permitting the opcode to be easily combined as a target shuffle.
Differential Revision: http://reviews.llvm.org/D17681
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262635 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: With discriminator, LineLocation can uniquely identify a callsite without the need to specifying callee name. Remove Callee function name from the key, and put it in the value (FunctionSamples).
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17827
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262634 91177308-0d34-0410-b5e6-96231b3b80d8
lowerShift was manually splitting BUILD_VECTOR cases when it could just call Extract128BitVector which does this anyway.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262633 91177308-0d34-0410-b5e6-96231b3b80d8
That's not the case for VPERMV/VPERMV3, which cover all possible
combinations (the C intrinsics use a different order; the AVX vs
AVX512 intrinsics are different still).
Since:
r246981 AVX-512: Lowering for 512-bit vector shuffles.
VPERMV is recognized in getTargetShuffleMask.
This breaks assumptions in most callers, as they expect
the non-mask operands to start at index 0.
VPERMV has the mask as operand #0; VPERMV3 has it in the middle.
Instead of the faulty assumption, have getTargetShuffleMask return
its operands as well.
One alternative we considered was to change the operand order of
VPERMV, but we agreed to stick to the instruction order, as there
are more AVX512 weirdness to cover (vpermt2/vpermi2 in particular).
Differential Revision: http://reviews.llvm.org/D17041
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262627 91177308-0d34-0410-b5e6-96231b3b80d8
The vectorization of first-order recurrences (r261346) caused PR26734. When
detecting these recurrences, we need to ensure that the previous value is
actually defined inside the loop. This patch includes the fix and test case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262624 91177308-0d34-0410-b5e6-96231b3b80d8
This test failed in some ARM bots after a divmod change because it was
running on a native llc, instead of targeted one. This makes sure the test
is target-specific (as intended), and also copies to ARM and AArch64
directories. If it is also supposed to work on other architectures, I'll
leave as an exercise to the respective maintainers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262620 91177308-0d34-0410-b5e6-96231b3b80d8
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.
Differential Revision: http://reviews.llvm.org/D17691
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262599 91177308-0d34-0410-b5e6-96231b3b80d8