The purpose of the YAML diagnostic output file is to collect information on
optimizations performed, or not performed, for later processing by tools that
help users (and compiler developers) understand how code was optimized. As
such, the diagnostics that appear in the file should not be coupled to what a
user might want to see summarized for them as the compiler runs, and in fact,
because the user likely does not know what optimization diagnostics their tools
might want to use, the user cannot provide a useful filter regardless. As such,
we shouldn't filter the diagnostics going to the output file.
Differential Revision: https://reviews.llvm.org/D25224
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283236 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
In the case below, %Result.i19 is defined between coro.save and coro.suspend and used after coro.suspend. We need to correctly place such a value into the coroutine frame.
```
%save = call token @llvm.coro.save(i8* null)
%Result.i19 = getelementptr inbounds %"struct.lean_future<int>::Awaiter", %"struct.lean_future<int>::Awaiter"* %ref.tmp7, i64 0, i32 0
%suspend = call i8 @llvm.coro.suspend(token %save, i1 false)
switch i8 %suspend, label %exit [
i8 0, label %await.ready
i8 1, label %exit
]
await.ready:
%val = load i32, i32* %Result.i19
```
Reviewers: majnemer
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D24418
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282902 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Without the fix, if there was a function inlined into the coroutine with debug information, CloneFunctionInto(NewF, &F, VMap, /*ModuleLevelChanges=*/true, Returns); would duplicate all of the debug information including the DICompileUnit.
We know use VMap to indicate that debug metadata for a File, Unit and FunctionType should not be duplicated when we creating clones that will become f.resume, f.destroy and f.cleanup.
Reviewers: majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24417
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282899 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: Not all coro.subfn.addr intrinsics can be eliminated in CoroElide through devirtualization. Those that remain need to be lowered in CoroCleanup.
Reviewers: majnemer
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D24412
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282897 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: Debug info should *not* affect optimization decisions. This patch updates loop unroller cost model to make it not affected by debug info.
Reviewers: davidxl, mzolotukhin
Subscribers: haicheng, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D25098
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282894 91177308-0d34-0410-b5e6-96231b3b80d8
When building the steps for scalar induction variables, we previously attempted
to determine if all the scalar users of the induction variable were uniform. If
they were, we would only emit the step corresponding to vector lane zero. This
optimization was too aggressive. We generally don't know the entire set of
induction variable users that will be scalar. We have
isScalarAfterVectorization, but this is only a conservative estimate of the
instructions that will be scalarized. Thus, an induction variable may have
scalar users that aren't already known to be scalar. To avoid emitting unused
steps, we can only check that the induction variable is uniform. This should
fix PR30542.
Reference: https://llvm.org/bugs/show_bug.cgi?id=30542
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282863 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
We don't want to decay hot callsites to import chains of hot
callsites. The same mechanism is used in LIPO.
Reviewers: tejohnson, eraman, mehdi_amini
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D24976
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282833 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Not tunned up heuristic, but with this small heuristic there is about
+0.10% improvement on SPEC 2006
Reviewers: tejohnson, mehdi_amini, eraman
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24940
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282733 91177308-0d34-0410-b5e6-96231b3b80d8
Also, remove unnecessary function attributes, parameters, and comments.
It looks like at least some of these tests are not minimal though...
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282620 91177308-0d34-0410-b5e6-96231b3b80d8
Pointers in different addrspaces can have different sizes, so it's not valid to look through addrspace cast calculating base and offset for a value.
This is similar to D13008.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D24729
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282612 91177308-0d34-0410-b5e6-96231b3b80d8
There is really no reason for these to be separate.
The vectorizer started this pretty bad tradition that the text of the
missed remarks is pretty meaningless, i.e. vectorization failed. There,
you have to query analysis to get the full picture.
I think we should just explain the reason for missing the optimization
in the missed remark when possible. Analysis remarks should provide
information that the pass gathers regardless whether the optimization is
passing or not.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282542 91177308-0d34-0410-b5e6-96231b3b80d8
(Re-committed after moving the template specialization under the yaml
namespace. GCC was complaining about this.)
This allows various presentation of this data using an external tool.
This was first recommended here[1].
As an example, consider this module:
1 int foo();
2 int bar();
3
4 int baz() {
5 return foo() + bar();
6 }
The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):
remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)
Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 10 }
Function: baz
Hotness: 30
Args:
- Callee: foo
- String: will not be inlined into
- Caller: baz
...
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 18 }
Function: baz
Hotness: 30
Args:
- Callee: bar
- String: will not be inlined into
- Caller: baz
...
This is a summary of the high-level decisions:
* There is a new streaming interface to emit optimization remarks.
E.g. for the inliner remark above:
ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
DEBUG_TYPE, "NotInlined", &I)
<< NV("Callee", Callee) << " will not be inlined into "
<< NV("Caller", CS.getCaller()) << setIsVerbose());
NV stands for named value and allows the YAML client to process a remark
using its name (NotInlined) and the named arguments (Callee and Caller)
without parsing the text of the message.
Subsequent patches will update ORE users to use the new streaming API.
* I am using YAML I/O for writing the YAML file. YAML I/O requires you
to specify reading and writing at once but reading is highly non-trivial
for some of the more complex LLVM types. Since it's not clear that we
(ever) want to use LLVM to parse this YAML file, the code supports and
asserts that we're writing only.
On the other hand, I did experiment that the class hierarchy starting at
DiagnosticInfoOptimizationBase can be mapped back from YAML generated
here (see D24479).
* The YAML stream is stored in the LLVM context.
* In the example, we can probably further specify the IR value used,
i.e. print "Function" rather than "Value".
* As before hotness is computed in the analysis pass instead of
DiganosticInfo. This avoids the layering problem since BFI is in
Analysis while DiagnosticInfo is in IR.
[1] https://reviews.llvm.org/D19678#419445
Differential Revision: https://reviews.llvm.org/D24587
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282539 91177308-0d34-0410-b5e6-96231b3b80d8
This allows various presentation of this data using an external tool.
This was first recommended here[1].
As an example, consider this module:
1 int foo();
2 int bar();
3
4 int baz() {
5 return foo() + bar();
6 }
The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):
remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)
Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 10 }
Function: baz
Hotness: 30
Args:
- Callee: foo
- String: will not be inlined into
- Caller: baz
...
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 18 }
Function: baz
Hotness: 30
Args:
- Callee: bar
- String: will not be inlined into
- Caller: baz
...
This is a summary of the high-level decisions:
* There is a new streaming interface to emit optimization remarks.
E.g. for the inliner remark above:
ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
DEBUG_TYPE, "NotInlined", &I)
<< NV("Callee", Callee) << " will not be inlined into "
<< NV("Caller", CS.getCaller()) << setIsVerbose());
NV stands for named value and allows the YAML client to process a remark
using its name (NotInlined) and the named arguments (Callee and Caller)
without parsing the text of the message.
Subsequent patches will update ORE users to use the new streaming API.
* I am using YAML I/O for writing the YAML file. YAML I/O requires you
to specify reading and writing at once but reading is highly non-trivial
for some of the more complex LLVM types. Since it's not clear that we
(ever) want to use LLVM to parse this YAML file, the code supports and
asserts that we're writing only.
On the other hand, I did experiment that the class hierarchy starting at
DiagnosticInfoOptimizationBase can be mapped back from YAML generated
here (see D24479).
* The YAML stream is stored in the LLVM context.
* In the example, we can probably further specify the IR value used,
i.e. print "Function" rather than "Value".
* As before hotness is computed in the analysis pass instead of
DiganosticInfo. This avoids the layering problem since BFI is in
Analysis while DiagnosticInfo is in IR.
[1] https://reviews.llvm.org/D19678#419445
Differential Revision: https://reviews.llvm.org/D24587
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282499 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
We don't currently need this facility for CFI. Disabling individual hot methods proved
to be a better strategy in Chrome.
Also, the design of the feature is suboptimal, as pointed out by Peter Collingbourne.
Reviewers: pcc
Subscribers: kcc
Differential Revision: https://reviews.llvm.org/D24948
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282461 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch improves thinlto importer
by importing 3x larger functions that are called from hot block.
I compared performance with the trunk on spec, and there
were about 2% on povray and 3.33% on milc. These results seems
to be consistant and match the results Teresa got with her simple
heuristic. Some benchmarks got slower but I think they are just
noisy (mcf, xalancbmki, omnetpp)- running the benchmarks again with
more iterations to confirm. Geomean of all benchmarks including the noisy ones
were about +0.02%.
I see much better improvement on google branch with Easwaran patch
for pgo callsite inlining (the inliner actually inline those big functions)
Over all I see +0.5% improvement, and I get +8.65% on povray.
So I guess we will see much bigger change when Easwaran patch will land
(it depends on new pass manager), but it is still worth putting this to trunk
before it.
Implementation details changes:
- Removed CallsiteCount.
- ProfileCount got replaced by Hotness
- hot-import-multiplier is set to 3.0 for now,
didn't have time to tune it up, but I see that we get most of the interesting
functions with 3, so there is no much performance difference with higher, and
binary size doesn't grow as much as with 10.0.
Reviewers: eraman, mehdi_amini, tejohnson
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24638
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282437 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If coroutine has no suspend points, remove heap allocation and turn a coroutine into a normal function.
Also, if a pattern is detected that coroutine resumes or destroys itself prior to coro.suspend call, turn the suspend point into a simple jump to resume or cleanup label. This pattern occurs when coroutines are used to propagate errors in functions that return expected<T>.
Reviewers: majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24408
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282414 91177308-0d34-0410-b5e6-96231b3b80d8
The index of the new insertelement instruction was evaluated in the
wrong way, it was considered as the index of the inserted value instead
of index of the position, where the value should be inserted.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282401 91177308-0d34-0410-b5e6-96231b3b80d8
This patch fixes PR30366.
Function foldUDivShl() worked under the assumption that one of the values
in input to the function was always an instance of llvm::Instruction.
However, function visitUDivOperand() (the only user of foldUDivShl) was
clearly violating that precondition; internally, visitUDivOperand() uses pattern
matches to check the operands of a udiv. Pattern matchers for binary operators
know how to handle both Instruction and ConstantExpr values.
This patch fixes the problem in foldUDivShl(). Now we use pattern matchers
instead of explicit casts to Instruction. The reduced test case from PR30366
has been added to test file InstCombine/udiv-simplify.ll.
Differential Revision: https://reviews.llvm.org/D24565
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282398 91177308-0d34-0410-b5e6-96231b3b80d8
If inserting more than one constant into a vector:
define <4 x float> @foo(<4 x float> %x) {
%ins1 = insertelement <4 x float> %x, float 1.0, i32 1
%ins2 = insertelement <4 x float> %ins1, float 2.0, i32 2
ret <4 x float> %ins2
}
InstCombine could reduce that to a shufflevector:
define <4 x float> @goo(<4 x float> %x) {
%shuf = shufflevector <4 x float> %x, <4 x float> <float undef, float 1.0, float 2.0, float undef>, <4 x i32><i32 0, i32 5, i32 6, i32 3>
ret <4 x float> %shuf
}
Also, InstCombine tries to convert shuffle instruction to single insertelement, if one of the vectors is a constant vector and only a single element from this constant should be used in shuffle, i.e.
shufflevector <4 x float> %v, <4 x float> <float undef, float 1.0, float
undef, float undef>, <4 x i32> <i32 0, i32 5, i32 undef, i32 undef> ->
insertelement <4 x float> %v, float 1.0, 1
Differential Revision: https://reviews.llvm.org/D24182
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282237 91177308-0d34-0410-b5e6-96231b3b80d8
and also the dependent r282175 "GVN-hoist: do not dereference null pointers"
It's causing compiler crashes building Harfbuzz (PR30499).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282199 91177308-0d34-0410-b5e6-96231b3b80d8
To hoist stores past loads, we used to search for potential
conflicting loads on the hoisting path by following a MemorySSA
def-def link from the store to be hoisted to the previous
defining memory access, and from there we followed the def-use
chains to all the uses that occur on the hoisting path. The
problem is that the def-def link may point to a store that does
not alias with the store to be hoisted, and so the loads that are
walked may not alias with the store to be hoisted, and even as in
the testcase of PR30216, the loads that may alias with the store
to be hoisted are not visited.
The current patch visits all loads on the path from the store to
be hoisted to the hoisting position and uses the alias analysis
to ask whether the store may alias the load. I was not able to
use the MemorySSA functionality to ask for whether load and
store are clobbered: I'm not sure which function to call, so I
used a call to AA->isNoAlias().
Store past store is still working as before using a MemorySSA
query: I added an extra test to pr30216.ll to make sure store
past store does not regress.
Differential Revision: https://reviews.llvm.org/D24517
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282168 91177308-0d34-0410-b5e6-96231b3b80d8
Without this patch, GVN-hoist would think that a branch instruction is a scalar instruction
and would try to value number it. The patch filters out all such kind of irrelevant instructions.
A bit frustrating is that there is no easy way to discard all those very infrequent instructions,
a bit like isa<TerminatorInst> that stands for a large family of instructions. I'm thinking that
checking for those very infrequent other instructions would cost us more in compilation time
than just letting those instructions getting numbered, so I'm still thinking that a simpler check:
if (isa<TerminatorInst>(I))
return false;
is better than listing all the other less frequent instructions.
Differential Revision: https://reviews.llvm.org/D23929
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282160 91177308-0d34-0410-b5e6-96231b3b80d8