First, the motivation: LLVM currently does not realize that:
((2072 >> (L == 0)) >> 7) & 1 == 0
where L is some arbitrary value. Whether you right-shift 2072 by 7 or by 8, the
lowest-order bit is always zero. There are obviously several ways to go about
fixing this, but the generic solution pursued in this patch is to teach
computeKnownBits something about shifts by a non-constant amount. Previously,
we would give up completely on these. Instead, in cases where we know something
about the low-order bits of the shift-amount operand, we can combine (and
together) the associated restrictions for all shift amounts consistent with
that knowledge. As a further generalization, I refactored all of the logic for
all three kinds of shifts to have this capability. This works well in the above
case, for example, because the dynamic shift amount can only be 0 or 1, and
thus we can say a lot about the known bits of the result.
This brings us to the second part of this change: Even when we know all of the
bits of a value via computeKnownBits, nothing used to constant-fold the result.
This introduces the necessary code into InstCombine and InstSimplify. I've
added it into both because:
1. InstCombine won't automatically pick up the associated logic in
InstSimplify (InstCombine uses InstSimplify, but not via the API that
passes in the original instruction).
2. Putting the logic in InstCombine allows the resulting simplifications to become
part of the iterative worklist
3. Putting the logic in InstSimplify allows the resulting simplifications to be
used by everywhere else that calls SimplifyInstruction (inlining, unrolling,
and many others).
And this requires a small change to our definition of an ephemeral value so
that we don't break the rest case from r246696 (where the icmp feeding the
@llvm.assume, is also feeding a br). Under the old definition, the icmp would
not be considered ephemeral (because it is used by the br), but this causes the
assume to remove itself (in addition to simplifying the branch structure), and
it seems more-useful to prevent that from happening.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251146 91177308-0d34-0410-b5e6-96231b3b80d8
After some look-ahead PRE was added for GEPs, an instruction could end
up in the table of candidates before it was actually inspected. When
this happened the pass might decide it was the best candidate to
replace itself. This didn't go well.
Should fix PR25291
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251145 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This change teaches the LLVM inliner to not inline through callsites
with unknown operand bundles. Currently all operand bundles are
"unknown" operand bundles but in the near future we will add support for
inlining through some select kinds of operand bundles.
Reviewers: reames, chandlerc, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14001
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251141 91177308-0d34-0410-b5e6-96231b3b80d8
This patch converts the remaining references to literal
strings for names of profile runtime entites (such as
profile runtime hook, runtime hook use function, profile
init method, register function etc).
Also added documentation for all the new interfaces.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251093 91177308-0d34-0410-b5e6-96231b3b80d8
The insertLoop() API is only used to add new loops, and has confusing
ownership semantics. Simplify it by replacing it with addLoop().
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251064 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: Currently SimplifyResume can convert an invoke instruction to a call instruction if its landing pad is trivial. In practice we could have several invoke instructions with trivial landing pads and share a common rethrow block, and in the common rethrow block, all the landing pads join to a phi node. The patch extends SimplifyResume to check the phi of landing pad and their incoming blocks. If any of them is trivial, remove it from the phi node and convert the invoke instruction to a call instruction.
Reviewers: hfinkel, reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13718
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251061 91177308-0d34-0410-b5e6-96231b3b80d8
This is a clean up patch that defines instr prof section and variable
name prefixes in a common header with access helper functions.
clang FE change will be done as a follow up once this patch is in.
Differential Revision: http://reviews.llvm.org/D13919
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251058 91177308-0d34-0410-b5e6-96231b3b80d8
As an invariant, BasicBlocks cannot be empty when passed to a transform.
This is not the case for MachineBasicBlocks and the Sink pass was ported
from the MachineSink pass which would explain the check's existence.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251057 91177308-0d34-0410-b5e6-96231b3b80d8
* Don't instrument promotable dynamic allocas:
We already have a test that checks that promotable dynamic allocas are
ignored, as well as static promotable allocas. Make sure this test will
still pass if/when we enable dynamic alloca instrumentation by default.
* Handle lifetime intrinsics before handling dynamic allocas:
lifetime intrinsics may refer to dynamic allocas, so we need to emit
instrumentation before these dynamic allocas would be replaced.
Differential Revision: http://reviews.llvm.org/D12704
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251045 91177308-0d34-0410-b5e6-96231b3b80d8
SimplifyTerminatorOnSelect didn't consider the possibility that the
condition might be related to one of PHI nodes.
This fixes PR25267.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250922 91177308-0d34-0410-b5e6-96231b3b80d8
In some cases (as illustrated in the unittest), lineno can be less than the heade_lineno because the function body are included from some other files. In this case, offset will be negative. This patch makes clang still able to match the profile to IR in this situation.
http://reviews.llvm.org/D13914
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250873 91177308-0d34-0410-b5e6-96231b3b80d8
It is now possible to infer intrinsic modref behaviour purely from intrinsic attributes.
This change will allow to completely remove GET_INTRINSIC_MODREF_BEHAVIOR table.
Differential Revision: http://reviews.llvm.org/D13907
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250860 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: In r231241, TargetLibraryInfoWrapperPass was added to
`getAnalysisUsage` for `AddressSanitizer`, but the corresponding
`INITIALIZE_PASS_DEPENDENCY` was not added.
Reviewers: dvyukov, chandlerc, kcc
Subscribers: kcc, llvm-commits
Differential Revision: http://reviews.llvm.org/D13629
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250813 91177308-0d34-0410-b5e6-96231b3b80d8
Since LiveVariables is uniqued (we just created it from a `DenseSet`),
`FindIndex(LiveVariables, LiveVariables[i])` is always `i`.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250786 91177308-0d34-0410-b5e6-96231b3b80d8
`normalizeForInvokeSafepoint` in RewriteStatepointsForGC.cpp, as it is
written today, deals with `gc.relocate` and `gc.result` uses of a
statepoint equally well. This change documents this fact and adds a
test case.
There is no functional change here -- only documentation of existing
functionality.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250784 91177308-0d34-0410-b5e6-96231b3b80d8
Allow LLVM to optimize the sequence like the following:
%inc = add nsw i32 %i, 1
%cmp = icmp slt %n, %inc
into:
%cmp = icmp sle i32 %n, %i
The case is not handled previously due to the complexity of compuation of %n.
Hence, LLVM cannot swap operands of icmp accordingly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250746 91177308-0d34-0410-b5e6-96231b3b80d8
Besides the usual, I finally added an overload to
`BasicBlock::splitBasicBlock()` that accepts an `Instruction*` instead
of `BasicBlock::iterator`. Someone can go back and remove this overload
later (after updating the callers I'm going to skip going forward), but
the most common call seems to be
`BB->splitBasicBlock(BB->getTerminator(), ...)` and I'm not sure it's
better to add `->getIterator()` to every one than have the overload.
It's pretty hard to get the usage wrong.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250745 91177308-0d34-0410-b5e6-96231b3b80d8
Originally I planned to use the same interface for masked gather/scatter and set isConsecutive to "false" in this case.
Now I'm implementing masked gather/scatter and see that the interface is inconvenient. I want to add interfaces isLegalMaskedGather() / isLegalMaskedScatter() instead of using the "Consecutive" parameter in the existing interfaces.
Differential Revision: http://reviews.llvm.org/D13850
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250686 91177308-0d34-0410-b5e6-96231b3b80d8
This is a follow up patch of r250199 after verifying the start/stop
section symbols work as spected on FreeBSD.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250679 91177308-0d34-0410-b5e6-96231b3b80d8
This patch improves support for combining the SSE4A EXTRQ(I) and INSERTQ(I) intrinsics:
1 - Converts INSERTQ/EXTRQ calls to INSERTQI/EXTRQI if the 'bit index' and 'length' operands are constant
2 - Converts INSERTQI/EXTRQI calls to shufflevector if the bit index/length are both byte aligned (we can already lower shuffles to INSERTQI/EXTRQI if its useful)
3 - Constant folding support
4 - Add zeroinitializer handling
Differential Revision: http://reviews.llvm.org/D13348
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250609 91177308-0d34-0410-b5e6-96231b3b80d8
The `"statepoint-id"` and `"statepoint-num-patch-bytes"` attributes are
used solely to determine properties of the `gc.statepoint` being
created. Once the `gc.statepoint` is in place, these should be removed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250491 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This is a step towards using operand bundles to carry deopt state till
RewriteStatepointsForGC. The change adds a flag to
RewriteStatepointsForGC that teaches it to pick up deopt state from a
`"deopt"` operand bundle attached to the `call` or `invoke` it is
wrapping.
The command line flag added, `-rs4gc-use-deopt-bundles`, will only exist
for a short while. Once we are able to pipe deopt bundle state through
the full optimization pipeline without problems, we will "constant fold"
`-rs4gc-use-deopt-bundles` to `true`.
Reviewers: swaroop.sridhar, reames
Subscribers: llvm-commits, sanjoy
Differential Revision: http://reviews.llvm.org/D13372
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250489 91177308-0d34-0410-b5e6-96231b3b80d8
Rename `IndVarSimplify::getExtend` to `IndVarSimplify::createExtendInst`
to make it obvious that it creates `llvm::Instruction` s.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250484 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
`cloneArithmeticIVUser` currently trips over expression like `add %iv,
-1` when `%iv` is being zero extended -- it tries to construct the
widened use as `add %iv.zext, zext(-1)` and (correctly) fails to prove
equivalence to `zext(add %iv, -1)` (here the SCEV for `%iv` is
`{1,+,1}`).
This change teaches `IndVars` to try sign extending the non-IV operand
if that makes the newly constructed IV use equivalent to the widened
narrow IV use.
Reviewers: atrick, hfinkel, reames
Subscribers: sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D13717
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250483 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This NFC splitting is intended to make a later diff easier to follow.
It just tail duplicates `cloneIVUser` into `cloneArithmeticIVUser` and
`cloneBitwiseIVUser`.
Reviewers: atrick, hfinkel, reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13716
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250481 91177308-0d34-0410-b5e6-96231b3b80d8
Android libc provides a fixed TLS slot for the unsafe stack pointer,
and this change implements direct access to that slot on AArch64 via
__builtin_thread_pointer() + offset.
This change also moves more code into TargetLowering and its
target-specific subclasses to get rid of target-specific codegen
in SafeStackPass.
This change does not touch the ARM backend because ARM lowers
builting_thread_pointer as aeabi_read_tp, which is not available
on Android.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250456 91177308-0d34-0410-b5e6-96231b3b80d8
Turns out this approach is buggy. In discussion about follow on work, Sanjoy pointed out that we could be subject to circular logic problems.
Consider:
if (i u< L) leave()
if ((i + 1) u< L) leave()
print(a[i] + a[i+1])
If we know that L is less than UINT_MAX, we could possible prove (in a control dependent way) that i + 1 does not overflow. This gives us:
if (i u< L) leave()
if ((i +nuw 1) u< L) leave()
print(a[i] + a[i+1])
If we now do the transform this patch proposed, we end up with:
if ((i +nuw 1) u< L) leave_appropriately()
print(a[i] + a[i+1])
That would be a miscompile when i==-1. The problem here is that the control dependent nuw bits got used to prove something about the first condition. That's obviously invalid.
This won't happen today, but since I plan to enhance LVI/CVP with exactly that transform at some point in the not too distant future...
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250430 91177308-0d34-0410-b5e6-96231b3b80d8
This adjusts all integers in the reader/writer to reflect the types
stored on profile files. They should all be unsigned 32-bit or 64-bit
values. Changed all associated internal types to be uint32_t or
uint64_t.
The only place that needed some adjustments is in the sample profile
transformation. Altough the weight read from the profile are 64-bit
values, the internal API for branch weights only accepts 32-bit values.
The pass now saturates weights that overflow uint32_t.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250427 91177308-0d34-0410-b5e6-96231b3b80d8
With r250345 and r250343, we start to observe the following failure
when bootstrap clang with lto and pgo:
PHI node entries do not match predecessors!
%.sroa.029.3.i = phi %"class.llvm::SDNode.13298"* [ null, %30953 ], [ null, %31017 ], [ null, %30998 ], [ null, %_ZN4llvm8dyn_castINS_14ConstantSDNodeENS_7SDValueEEENS_10cast_rettyIT_T0_E8ret_typeERS5_.exit.i.1804 ], [ null, %30975 ], [ null, %30991 ], [ null, %_ZNK4llvm3EVT13getScalarTypeEv.exit.i.1812 ], [ %..sroa.029.0.i, %_ZN4llvm11SmallVectorIiLj8EED1Ev.exit.i.1826 ], !dbg !451895
label %30998
label %_ZNK4llvm3EVTeqES0_.exit19.thread.i
LLVM ERROR: Broken function found, compilation aborted!
I will re-commit this if the bot does not recover.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250366 91177308-0d34-0410-b5e6-96231b3b80d8
Currently in JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB).
This is the third attempt to submit this patch, while the first two led to failures in some FDO tests. After investigation, it is the edge weight normalization that caused those failures. In this patch the edge weight normalization is fixed so that there is no zero weight in the output and the sum of all weights can fit in 32-bit integer. Several unit tests are added.
Differential revision: http://reviews.llvm.org/D10979
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250345 91177308-0d34-0410-b5e6-96231b3b80d8
If we have a series of branches which are all unlikely to fail, we can possibly combine them into a single check on the fastpath combined with a bit of dispatch logic on the slowpath. We don't want to do this unconditionally since it requires speculating instructions past a branch, but if the profiling metadata on the branch indicates profitability, this can reduce the number of checks needed along the fast path.
The canonical example this is trying to handle is removing the second bounds check implied by the Java code: a[i] + a[i+1]. Note that it can currently only do so for really simple conditions and the values of a[i] can't be used anywhere except in the addition. (i.e. the load has to have been sunk already and not prevent speculation.) I plan on extending this transform over the next few days to handle alternate sequences.
Differential Revision: http://reviews.llvm.org/D13070
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250343 91177308-0d34-0410-b5e6-96231b3b80d8