Currently, in MachineBlockPlacement pass the loop is rotated to let the best exit to be the last BB in the loop chain, to maximize the fall-through from the loop to outside. With profile data, we can determine the cost in terms of missed fall through opportunities when rotating a loop chain and select the best rotation. Basically, there are three kinds of cost to consider for each rotation:
1. The possibly missed fall through edge (if it exists) from BB out of the loop to the loop header.
2. The possibly missed fall through edges (if they exist) from the loop exits to BB out of the loop.
3. The missed fall through edge (if it exists) from the last BB to the first BB in the loop chain.
Therefore, the cost for a given rotation is the sum of costs listed above. We select the best rotation with the smallest cost. This is only for PGO mode when we have more precise edge frequencies.
Differential revision: http://reviews.llvm.org/D10717
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250754 91177308-0d34-0410-b5e6-96231b3b80d8
Allow LLVM to optimize the sequence like the following:
%inc = add nsw i32 %i, 1
%cmp = icmp slt %n, %inc
into:
%cmp = icmp sle i32 %n, %i
The case is not handled previously due to the complexity of compuation of %n.
Hence, LLVM cannot swap operands of icmp accordingly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250746 91177308-0d34-0410-b5e6-96231b3b80d8
This was originally checked in at r250527, but reverted at r250570 because of PR25222.
There were at least 2 problems:
1. The cost check was checking for an instruction with an exact cost of TCC_Expensive;
that should have been >=.
2. The cause of the clang stage 1 failures was illegally sinking 'call' instructions;
we can't sink instructions that may have side effects / are not safe to execute speculatively.
Fixed those conditions in sinkSelectOperand() and added test cases.
Original commit message:
This is a follow-up to the discussion in D12882.
Ideally, we would like SimplifyCFG to be able to form select instructions even when the operands
are expensive (as defined by the TTI cost model) because that may expose further optimizations.
However, we would then like a later pass like CodeGenPrepare to undo that transformation if the
target would likely benefit from not speculatively executing an expensive op (this patch).
Once we have this safety mechanism in place, we can adjust SimplifyCFG to restore its
select-formation behavior that changed with r248439.
Differential Revision: http://reviews.llvm.org/D13297
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250743 91177308-0d34-0410-b5e6-96231b3b80d8
Convert two halfword loads into a single 32-bit word load with bitfield extract
instructions. For example :
ldrh w0, [x2]
ldrh w1, [x2, #2]
becomes
ldr w0, [x2]
ubfx w1, w0, #16, #16
and w0, w0, #ffff
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250719 91177308-0d34-0410-b5e6-96231b3b80d8
Emit the CFI instructions after all code transformation have been done.
This will avoid any interference between CFI instructions and packetization.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250714 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch adds support to llvm-lto that mirrors the support added by
r249270 to the gold plugin. This enables better testing of combined
index generation for ThinLTO.
Added a new test, and this support will be used in the test in D13515.
Reviewers: joker.eph
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13847
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250699 91177308-0d34-0410-b5e6-96231b3b80d8
The mapping of these two intrinsics in ARMInstrInfo.td had a small
omission which lead to their operands not being validated/transformed
before being lowered into usat and ssat instructions. This can cause
incorrect instructions to be emitted.
I've also added tests for the remaining two saturating arithmatic
intrinsics @llvm.arm.qadd and @llvm.arm.qsub as they are missing
codegen tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250697 91177308-0d34-0410-b5e6-96231b3b80d8
This is a follow up patch of r250199 after verifying the start/stop
section symbols work as spected on FreeBSD.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250679 91177308-0d34-0410-b5e6-96231b3b80d8
This patch improves support for combining the SSE4A EXTRQ(I) and INSERTQ(I) intrinsics:
1 - Converts INSERTQ/EXTRQ calls to INSERTQI/EXTRQI if the 'bit index' and 'length' operands are constant
2 - Converts INSERTQI/EXTRQI calls to shufflevector if the bit index/length are both byte aligned (we can already lower shuffles to INSERTQI/EXTRQI if its useful)
3 - Constant folding support
4 - Add zeroinitializer handling
Differential Revision: http://reviews.llvm.org/D13348
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250609 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This is a temporary hack until we get around to remapping the vreg
numbers to local numbers. Dead vregs cause bad numbering and make
consumers sad.
We could also just look at debug info an use named locals instead, but
vregs have to work properly anyways so there!
Reviewers: binji, sunfish
Subscribers: jfb, llvm-commits, dschuff
Differential Revision: http://reviews.llvm.org/D13839
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250594 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Some shared code for handling eh.exceptionpointer and eh.exceptioncode
needs to not share the part that truncates to 32 bits, which is intended
just for exception codes.
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13747
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250588 91177308-0d34-0410-b5e6-96231b3b80d8
Our previous value of "16 + 8 + MaxCallFrameSize" for ParentFrameOffset
is incorrect when CSRs are involved. We were supposed to have a test
case to catch this, but it wasn't very rigorous.
The main effect here is that calling _CxxThrowException inside a
catchpad doesn't immediately crash on MOVAPS when you have an odd number
of CSRs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250583 91177308-0d34-0410-b5e6-96231b3b80d8
This lets us make guesses about symbols in third party DLLs without
debug info, like MSVCR120.dll or kernel32.dll. dbghelp does the same
thing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250582 91177308-0d34-0410-b5e6-96231b3b80d8
The motivation for this patch starts with PR20134:
https://llvm.org/bugs/show_bug.cgi?id=20134
void foo(int *a, int i) {
a[i] = a[i+1] + a[i+2];
}
It seems better to produce this (14 bytes):
movslq %esi, %rsi
movl 0x4(%rdi,%rsi,4), %eax
addl 0x8(%rdi,%rsi,4), %eax
movl %eax, (%rdi,%rsi,4)
Rather than this (22 bytes):
leal 0x1(%rsi), %eax
cltq
leal 0x2(%rsi), %ecx
movslq %ecx, %rcx
movl (%rdi,%rcx,4), %ecx
addl (%rdi,%rax,4), %ecx
movslq %esi, %rax
movl %ecx, (%rdi,%rax,4)
The most basic problem (the first test case in the patch combines constants) should also be fixed in InstCombine,
but it gets more complicated after that because we need to consider architecture and micro-architecture. For
example, AArch64 may not see any benefit from the more general transform because the ISA solves the sexting in
hardware. Some x86 chips may not want to replace 2 ADD insts with 1 LEA, and there's an attribute for that:
FeatureSlowLEA. But I suspect that doesn't go far enough or maybe it's not getting used when it should; I'm
also not sure if FeatureSlowLEA should also mean "slow complex addressing mode".
I see no perf differences on test-suite with this change running on AMD Jaguar, but I see small code size
improvements when building clang and the LLVM tools with the patched compiler.
A more general solution to the sext(add nsw(x, C)) problem that works for multiple targets is available
in CodeGenPrepare, but it may take quite a bit more work to get that to fire on all of the test cases that
this patch takes care of.
Differential Revision: http://reviews.llvm.org/D13757
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250560 91177308-0d34-0410-b5e6-96231b3b80d8
Crashing is bad, m'kay? Fixing a 4 year old bug of my own creation.
Adding the testcase now which I should have added then which would have
long since caught this.
The problem is that printMessage() will display the diagnostic but not
set HadError to true, resulting in the assembler continuing on its way
and trying to create relocations for things that may not allow them or
otherwise get itself into trouble. Using the Error() helper function
here rather than calling printMessage() directly resolves this.
rdar://23133240
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250557 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
We now use the block for the catchpad itself, rather than its normal
successor, as the funclet entry.
Putting the normal successor in the map leads downstream funclet
membership computations to erroneous results.
Reviewers: majnemer, rnk
Subscribers: rnk, llvm-commits
Differential Revision: http://reviews.llvm.org/D13798
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250552 91177308-0d34-0410-b5e6-96231b3b80d8
The number of samples collected at the head of a function only make
sense for top-level functions (i.e., those actually called as opposed to
being inlined inside another).
Head samples essentially count the time spent inside the function's
prologue. This clearly doesn't make sense for inlined functions, so we
were always emitting 0 in those.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250539 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
When a cleanup's cleanupendpad or cleanupret targets a catchendpad, stop
trying to propagate the cleanup's parent's color to the catchendpad, since
what's needed is the cleanup's grandparent's color and the catchendpad
will get that color from the catchpad linkage already. We already had
this exclusion for invokes, but were missing it for
cleanupendpad/cleanupret.
Also add a missing line that tags cleanupendpads' states in the
EHPadStateMap, without with lowering invokes that target cleanupendpads
which unwind to other handlers (and so don't have the -1 state) will fail.
This fixes the reduced IR repro in PR25163.
Reviewers: majnemer, andrew.w.kaylor, rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13797
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250534 91177308-0d34-0410-b5e6-96231b3b80d8
Ideally, we would like SimplifyCFG to be able to form select instructions even when the operands
are expensive (as defined by the TTI cost model) because that may expose further optimizations.
However, we would then like a later pass like CodeGenPrepare to undo that transformation if the
target would likely benefit from not speculatively executing an expensive op (this patch).
Once we have this safety mechanism in place, we can adjust SimplifyCFG to restore its
select-formation behavior that changed with r248439.
Differential Revision: http://reviews.llvm.org/D13297
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250527 91177308-0d34-0410-b5e6-96231b3b80d8
The `"statepoint-id"` and `"statepoint-num-patch-bytes"` attributes are
used solely to determine properties of the `gc.statepoint` being
created. Once the `gc.statepoint` is in place, these should be removed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250491 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This is a step towards using operand bundles to carry deopt state till
RewriteStatepointsForGC. The change adds a flag to
RewriteStatepointsForGC that teaches it to pick up deopt state from a
`"deopt"` operand bundle attached to the `call` or `invoke` it is
wrapping.
The command line flag added, `-rs4gc-use-deopt-bundles`, will only exist
for a short while. Once we are able to pipe deopt bundle state through
the full optimization pipeline without problems, we will "constant fold"
`-rs4gc-use-deopt-bundles` to `true`.
Reviewers: swaroop.sridhar, reames
Subscribers: llvm-commits, sanjoy
Differential Revision: http://reviews.llvm.org/D13372
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250489 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
`cloneArithmeticIVUser` currently trips over expression like `add %iv,
-1` when `%iv` is being zero extended -- it tries to construct the
widened use as `add %iv.zext, zext(-1)` and (correctly) fails to prove
equivalence to `zext(add %iv, -1)` (here the SCEV for `%iv` is
`{1,+,1}`).
This change teaches `IndVars` to try sign extending the non-IV operand
if that makes the newly constructed IV use equivalent to the widened
narrow IV use.
Reviewers: atrick, hfinkel, reames
Subscribers: sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D13717
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250483 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Follow the same syntax as for the spec repo. Both have evolved slightly
independently and need to converge again.
This, along with wasmate changes, allows me to do the following:
echo "int add(int a, int b) { return a + b; }" > add.c
./out/bin/clang -O2 -S --target=wasm32-unknown-unknown add.c -o add.wack
./experimental/prototype-wasmate/wasmate.py add.wack > add.wast
./sexpr-wasm-prototype/out/sexpr-wasm add.wast -o add.wasm
./sexpr-wasm-prototype/third_party/v8-native-prototype/v8/v8/out/Release/d8 -e "print(WASM.instantiateModule(readbuffer('add.wasm'), {print:print}).add(42, 1337));"
As you'd expect, the d8 shell prints out the right value.
Reviewers: sunfish
Subscribers: jfb, llvm-commits, dschuff
Differential Revision: http://reviews.llvm.org/D13712
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250480 91177308-0d34-0410-b5e6-96231b3b80d8