insertUniqueBackedgeBlock in lib/Transforms/Utils/LoopSimplify.cpp now
propagates existing llvm.loop metadata to newly the added backedge.
llvm::TryToSimplifyUncondBranchFromEmptyBlock in lib/Transforms/Utils/Local.cpp
now propagates existing llvm.loop metadata to the branch instructions in the
predecessor blocks of the empty block that is removed.
Differential Revision: https://reviews.llvm.org/D26495
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287341 91177308-0d34-0410-b5e6-96231b3b80d8
This is a straightforward extension of the existing support for 32/64-bit element types. Just needed to add the additional instrinsics to the switches.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287316 91177308-0d34-0410-b5e6-96231b3b80d8
This patch updates a bunch of places where add_dependencies was being explicitly called to add dependencies on intrinsics_gen to instead use the DEPENDS named parameter. This cleanup is needed for a patch I'm working on to add a dependency debugging mode to the build system.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287206 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
For flat loop, even if it is hot, it is not a good idea to unroll in runtime, thus we set a lower partial unroll threshold.
For hot loop, we set a higher unroll threshold and allows expensive tripcount computation to allow more aggressive unrolling.
Reviewers: davidxl, mzolotukhin
Subscribers: sanjoy, mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D26527
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287186 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: These intrinsics have been unused for clang for a while. This patch removes them. We auto upgrade them to extractelements, a scalar operation and then an insertelement. This matches the sequence used by clangs intrinsic file.
Reviewers: zvi, delena, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26660
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287083 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
We don't do BypassSlowDivision when the denominator is a constant, but
we do do it when the numerator is a constant.
This patch makes two related changes to BypassSlowDivision when the
numerator is a constant:
* If the numerator is too large to fit into the bypass width, don't
bypass slow division (because we'll never run the smaller-width
code).
* If we bypass slow division where the numerator is a constant, don't
OR together the numerator and denominator when determining whether
both operands fit within the bypass width. We need to check only the
denominator.
Reviewers: tra
Subscribers: llvm-commits, jholewinski
Differential Revision: https://reviews.llvm.org/D26699
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287062 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds support for instrumenting masked loads and stores under
ASan, if they have a constant mask.
isInterestingMemoryAccess now supports returning a mask to be applied to
the loads, and instrumentMop will use it to generate additional checks.
Added tests for v4i32 v8i32, and v4p0i32 (~v4i64) for both loads and
stores (as well as a test to verify we don't add checks to non-constant
masks).
Differential Revision: https://reviews.llvm.org/D26230
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287047 91177308-0d34-0410-b5e6-96231b3b80d8
In RateRegister of existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr,
and this SCEVAddRecExpr's loop is an outerloop, the formula will be marked as Loser
and dropped.
Suppose we have an IR that %for.body is outerloop and %for.body2 is innerloop. LSR only
handle inner loop now so only %for.body2 will be handled.
Using the logic above, formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) will be dropped
no matter what because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr type reg related
with outerloop. Only formula like
reg(%array) + 1*reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept
because the SCEVAddRecExpr related with outerloop is folded into the initial value of the
SCEVAddRecExpr related with current loop.
But in some cases, we do need to share the basic induction variable
reg{0 ,+, 1}<%for.body2> among LSR Uses to reduce the final total number of induction
variables used by LSR, so we don't want to drop the formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally.
From the existing comment, it tries to avoid considering multiple level loops at the same time.
However, existing LSR only handles innermost loop, so for any SCEVAddRecExpr with a loop other
than current loop, it is an invariant and will be simple to handle, and the formula doesn't have
to be dropped.
Differential Revision: https://reviews.llvm.org/D26429
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286999 91177308-0d34-0410-b5e6-96231b3b80d8
When both WidenIV::getWideRecurrence and WidenIV::getExtendedOperandRecurrence
return non-null but different WideAddRec, if getWideRecurrence is called
before getExtendedOperandRecurrence, we won't bother to call
getExtendedOperandRecurrence again. But As we know it is possible that after
SCEV folding, we cannot prove the legality using the SCEVAddRecExpr returned
by getWideRecurrence. Meanwhile if getExtendedOperandRecurrence returns non-null
WideAddRec, we know for sure that it is legal to do widening for current instruction.
So it is better to put getExtendedOperandRecurrence before getWideRecurrence, which
will increase the chance of successful widening.
Differential Revision: https://reviews.llvm.org/D26059
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286987 91177308-0d34-0410-b5e6-96231b3b80d8
Just needed to add the intrinsics to the exist switch. The code is generic enough to support the wider vectors with no changes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286980 91177308-0d34-0410-b5e6-96231b3b80d8
The register usage algorithm incorrectly treats instructions whose value is
not used within the loop (e.g. those that do not produce a value).
The algorithm first calculates the usages within the loop. It iterates over
the instructions in order, and records at which instruction index each use
ends (in fact, they're actually recorded against the next index, as this is
when we want to delete them from the open intervals).
The algorithm then iterates over the instructions again, adding each
instruction in turn to a list of open intervals. Instructions are then
removed from the list of open intervals when they occur in the list of uses
ended at the current index.
The problem is, instructions which are not used in the loop are skipped.
However, although they aren't used, the last use of a value may have been
recorded against that instruction index. In this case, the use is not deleted
from the open intervals, which may then bump up the estimated register usage.
This patch fixes the issue by simply moving the "is used" check after the loop
which erases the uses at the current index.
Differential Revision: https://reviews.llvm.org/D26554
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286969 91177308-0d34-0410-b5e6-96231b3b80d8
This adds support for TSan C++ exception handling, where we need to add extra calls to __tsan_func_exit when a function is exitted via exception mechanisms. Otherwise the shadow stack gets corrupted (leaked). This patch moves and enhances the existing implementation of EscapeEnumerator that finds all possible function exit points, and adds extra EH cleanup blocks where needed.
Differential Revision: https://reviews.llvm.org/D26177
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286893 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
We have always speculatively promoted all renamable local values
(except const non-address taken variables) for both the exporting
and importing module. We would then internalize them back based on
the ThinLink results if they weren't actually exported. This is
inefficient, and results in unnecessary renames. It also meant we
had to check the non-renamability of a value in the summary, which
was already checked during function importing analysis in the ThinLink.
Made renameModuleForThinLTO (which does the promotion/renaming) instead
use the index when exporting, to avoid unnecessary renames/promotions.
For importing modules, we can simply promoted all values as any local
we import by definition is exported and needs promotion.
This required changes to the method used by the FunctionImport pass
(only invoked from 'opt' for testing) and when invoked from llvm-link,
since neither does a ThinLink. We simply conservatively mark all locals
in the index as promoted, which preserves the current aggressive
promotion behavior.
I also needed to change an llvm-lto based test where we had previously
been aggressively promoting values that weren't importable (aliasees),
but now will not promote.
Reviewers: mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26467
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286871 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The change in r285513 to prevent exporting of locals used in
inline asm added all locals in the llvm.used set to the reference
set of functions containing inline asm. Since these locals were marked
NoRename, this automatically prevented importing of the function.
Unfortunately, this caused an explosion in the summary reference lists
in some cases. In my particular example, it happened for a large protocol
buffer generated C++ file, where many of the generated functions
contained an inline asm call. It was exacerbated when doing a ThinLTO
PGO instrumentation build, where the PGO instrumentation included
thousands of private __profd_* values that were added to llvm.used.
We really only need to include a single llvm.used local (NoRename) value
in the reference list of a function containing inline asm to block it
being imported. However, it seems cleaner to add a flag to the summary
that explicitly describes this situation, which is what this patch does.
Reviewers: mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26402
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286840 91177308-0d34-0410-b5e6-96231b3b80d8
We were already testing is the op was not a leaf, so need to then test if it was a leaf (added it to the assert instead).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286817 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Unfolding selects was previously done with the help of a vector
of pointers that was then sorted to be able to remove duplicates.
As this sorting depends on the memory addresses, it was
non-deterministic. A SetVector is used now so that duplicates are
removed without the need of sorting first.
Reviewers: mgrang, efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26450
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286807 91177308-0d34-0410-b5e6-96231b3b80d8
All existing callers were manually extracting information out of an existing
GEP instruction and passing it to getGEPExpr(). Simplify the interface by
changing it to take a GEPOperator instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286751 91177308-0d34-0410-b5e6-96231b3b80d8
This implements a function annotation that disables TSan checking for the
function at run time. The benefit over attribute((no_sanitize("thread")))
is that the accesses within the callees will also be suppressed.
The motivation for this attribute is a guarantee given by the objective C
language that the calls to the reference count decrement and object
deallocation will be synchronized. To model this properly, we would need
to intercept all ref count decrement calls (which are very common in ObjC
due to use of ARC) and also every single message send. Instead, we propose
to just ignore all accesses made from within dealloc at run time. The main
downside is that this still does not introduce any synchronization, which
means we might still report false positives if the code that relies on this
synchronization is not executed from within dealloc. However, we have not seen
this in practice so far and think these cases will be very rare.
Differential Revision: https://reviews.llvm.org/D25858
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286663 91177308-0d34-0410-b5e6-96231b3b80d8
This is PR28376.
Unfortunately given the current structure of optimization diagnostics we
lack the capability to tell whether the user has
passed -Rpass-analysis=loop-vectorize since this is local to the
front-end (BackendConsumer::OptimizationRemarkHandler).
So rather than printing this even if the user has already
passed -Rpass-analysis, this patch just punts and stops recommending
this option. I don't think that getting this right is worth the
complexity.
Differential Revision: https://reviews.llvm.org/D26563
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286662 91177308-0d34-0410-b5e6-96231b3b80d8
This is a follow-up on the recent refactoring of the FunctionMerge pass.
It should fix a fail of the new FunctionComparator unittest whe compiling with MSVC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286648 91177308-0d34-0410-b5e6-96231b3b80d8
When a function pointer is replaced with a jumptable pointer, special
case is needed to preserve the semantics of extern_weak functions.
Since a jumptable entry can not be extern_weak, we emulate that
behaviour by replacing all references to F (the extern_weak function)
with the following expression: F != nullptr ? JumpTablePtr : nullptr.
Extra special care is needed for global initializers, since most (or
probably all) backends can not lower an initializer that includes
this kind of constant expression. Initializers like that are replaced
with a global constructor (i.e. a runtime initializer).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286636 91177308-0d34-0410-b5e6-96231b3b80d8
This is pure refactoring. NFC.
This change moves the FunctionComparator (together with the GlobalNumberState
utility) in to a separate file so that it can be used by other passes.
For example, the SwiftMergeFunctions pass in the Swift compiler:
https://github.com/apple/swift/blob/master/lib/LLVMPasses/LLVMMergeFunctions.cpp
Details of the change:
*) The big part is just moving code out of MergeFunctions.cpp into FunctionComparator.h/cpp
*) Make FunctionComparator member functions protected (instead of private)
so that a derived comparator class can use them.
Following refactoring helps to share code between the base FunctionComparator
class and a derived class:
*) Add a beginCompare() function
*) Move some basic function property comparisons into a separate function compareSignature()
*) Do the GEP comparison inside cmpOperations() which now has a new
needToCmpOperands reference parameter
https://reviews.llvm.org/D25385
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286632 91177308-0d34-0410-b5e6-96231b3b80d8
The current implementation is emitting a global constant that happens
to evaluate to the same bytes + relocation as a jump instruction on
X86. This does not work for PIE executables and shared libraries
though, because we end up with a wrong relocation type. And it has no
chance of working on ARM/AArch64 which use different relocation types
for jump instructions (R_ARM_JUMP24) that is never generated for
data.
This change replaces the constant with module-level inline assembly
followed by a hidden declaration of the jump table. Works fine for
ARM/AArch64, but has some drawbacks.
* Extra symbols are added to the static symbol table, which inflate
the size of the unstripped binary a little. Stripped binaries are not
affected. This happens because jump table declarations must be
external (because their body is in the inline asm).
* Original functions that were anonymous are now named
<original name>.cfi, and it affects symbolization sometimes. This is
necessary because the only user of these functions is the (inline
asm) jump table, so they had to be added to @llvm.used, which does
not allow unnamed functions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286611 91177308-0d34-0410-b5e6-96231b3b80d8
Removing the limitation in visitInsertElementInst() causes several regressions
because we're not prepared to fold sequences of shuffles or inserts and extracts
separated by shuffles. Fixing that appears to be a difficult mission because we
are purposely trying to avoid creating shuffles with arbitrary shuffle masks
because some targets may choke on those.
https://llvm.org/bugs/show_bug.cgi?id=30923
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286423 91177308-0d34-0410-b5e6-96231b3b80d8
No testcase included because I can't figure out how to reduce it.
(It's easy to write a testcase where rotation clones an assume,
but that doesn't actually seem to trigger the crash in opt on
its own; maybe an issue with the laziness?)
Differential Revision: https://reviews.llvm.org/D26434
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286410 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Unrolled Loop Size calculations moved to a function.
Constant representing number of optimized instructions
when "back edge" becomes "fall through" replaced with
variable.
Some comments added.
Reviewers: mzolotukhin
Differential Revision: http://reviews.llvm.org/D21719
From: Evgeny Stupachenko <evstupac@gmail.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286389 91177308-0d34-0410-b5e6-96231b3b80d8