Splitting critical edges when one of the source edges is an indirectbr
is hard in general (because it requires changing the memory the indirectbr
reads). But if a block only has a single indirectbr predecessor (which is
the common case), we can simulate splitting that edge by splitting
the destination block, and retargeting the *direct* branches.
This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame()
ends up using an indirect branch with ~100 successors, and passing a constant to
each of those. Since MachineSink can't break indirect critical edges on demand
(and doing this in MIR doesn't look feasible), this causes us to emit about ~100
defs of registers containing constants, which we in the predecessor block, where
only one of those constants is used in each successor. So, at each computed goto,
we needlessly spill about a 100 constants to stack. The end result is that a
clang-compiled python interpreter can be about ~2.5x slower on a simple python
reduction loop than a gcc-compiled interpreter.
Differential Revision: https://reviews.llvm.org/D29916
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296060 91177308-0d34-0410-b5e6-96231b3b80d8
Justin added support for DISubprogram locs in r295531 and r296052.
Use that instead of no-loc for constants and arguments.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296058 91177308-0d34-0410-b5e6-96231b3b80d8
Add an optimization remark to asm-printer that summarizes the number
of instructions emitted per function.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296053 91177308-0d34-0410-b5e6-96231b3b80d8
DiagnosticInfo switched from DebugLoc to DiagnosticLocation in
r295519, update these subclasses to match.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296052 91177308-0d34-0410-b5e6-96231b3b80d8
Before this, MSan poisoned exactly one element of any array alloca,
even if the number of elements was zero.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296050 91177308-0d34-0410-b5e6-96231b3b80d8
This allows the ability to call IPDBSession::getGlobalScope with a NativeSession and
to then query it for some basic fields from the PDB's InfoStream.
Note that the symbols now have non-const references back to the Session so that
NativeRawSymbol can access the PDBFile through the Session.
Differential Revision: https://reviews.llvm.org/D30314
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296049 91177308-0d34-0410-b5e6-96231b3b80d8
We were stopping the translation of the parent block when the
translation of an instruction failed, but we were still trying to
translate the other blocks of the parent function.
Don't do that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296047 91177308-0d34-0410-b5e6-96231b3b80d8
This is the compromise between having a per-function IRTranslator
and manually managing the per-function state.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296046 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: In case we do not know what the condition is in an unswitched loop, but we know its definitely NOT a known constant. We can perform simplifcations based on this information.
Reviewers: sanjoy, hfinkel, chenli, efriedma
Reviewed By: efriedma
Subscribers: david2050, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D28968
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296041 91177308-0d34-0410-b5e6-96231b3b80d8
These are only used when emitting remarks without ORE directly using the free
functions emitOptimizationRemark*.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296037 91177308-0d34-0410-b5e6-96231b3b80d8
This commit provides `zip_{first,shortest}` with the standard member types and
methods expected of iterators (e.g., `difference_type`), in order for zip to be
used with other adaptors, such as `make_filter_range`.
Support for reverse iteration has also been added.
Differential Revision: https://reviews.llvm.org/D30246
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296036 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The helper will be used in a later change. This change itself is NFC
since the only user of this new function is its unit test.
Reviewers: majnemer, efriedma
Reviewed By: efriedma
Subscribers: efriedma, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D30184
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296035 91177308-0d34-0410-b5e6-96231b3b80d8
FastISel wasn't checking the isFPOnlySP subtarget feature before emitting
double-precision operations, so it got completely invalid CodeGen for doubles
on Cortex-M4F.
The normal ISel testing wasn't spectacular either so I added a second RUN line
to improve that while I was in the area.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296031 91177308-0d34-0410-b5e6-96231b3b80d8
While not CVP's fault, this caused miscompiles (PR31181). Reverting
until those are resolved.
(This also reverts the follow-ups r288154 and r288161 which removed the
flag.)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296030 91177308-0d34-0410-b5e6-96231b3b80d8
Instead of stripping the longest common prefix off of the filenames in a
report, strip out the longest chain of redundant path components. This
fixes the case in PR31982, where there are two files with the same
prefix, and stripping out the LCP makes things less intelligible.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296029 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: SamplePGO uses branch_weight annotation to represent callsite hotness. When ICP promotes an indirect call to direct call, we need to make sure the direct call is annotated with branch_weight in SamplePGO mode, so that downstream function inliner can use hot callsite heuristic.
Reviewers: davidxl, eraman, xur
Reviewed By: davidxl, xur
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D30282
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296028 91177308-0d34-0410-b5e6-96231b3b80d8
In the bit tracker, references to other bit values in which the register
is 0 are prohibited. This means that generating self-referential register
cells like { w:32 [0-15]:s[0-15] [16-31]:s[15] } is impossible. In order
to get a self-referential cell, it had to be stored into a map and then
reloaded from it. To avoid this step, add a function that will set the
register to a given value without going through the map.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296025 91177308-0d34-0410-b5e6-96231b3b80d8
Clang issues warning about hidden overload. That was intended, so
add "using AMDGPUGenRegisterInfo::getRegUnitWeight;" to mute it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296021 91177308-0d34-0410-b5e6-96231b3b80d8
Last use was killed in my previous patch. The preferred way is now to
construct the remark, pipe things to it and pass it to ORE.emit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296019 91177308-0d34-0410-b5e6-96231b3b80d8
The need for this removed when I converted everything to use the opt-remark
classes directly with the streaming interface.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296017 91177308-0d34-0410-b5e6-96231b3b80d8
The need for this removed when I converted everything to use the opt-remark
classes directly with the streaming interface.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296016 91177308-0d34-0410-b5e6-96231b3b80d8
Having more fine-grained information on the specific construct that
caused us to fallback is valuable for large-scale data collection.
We still have the fallback warning, that's also used for FastISel.
We still need to remove the fallback warning, and teach FastISel to also
emit remarks (it currently has a combination of the warning, stats, and
debug prints: the remarks could unify all three).
The abort-on-fallback path could also be better handled using remarks:
one could imagine a "-Rpass-error", analoguous to "-Werror", which would
promote missed/failed remarks to errors. It's not clear whether that
would be useful for other remarks though, so we're not there yet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296013 91177308-0d34-0410-b5e6-96231b3b80d8
If a subreg is used in an instruction it counts as a whole superreg
for the purpose of register pressure calculation. This patch corrects
improper register pressure calculation by examining operand's lane mask.
Differential Revision: https://reviews.llvm.org/D29835
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296009 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r295749 while investigating PR32042.
It looks like this check uncovered a problem in the frontend that
needs to be fixed before the check can be enabled again.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296005 91177308-0d34-0410-b5e6-96231b3b80d8
In OptimizeAdd, we scan the operand list to see if there are any common factors
between operands that can be factored out to reduce the number of multiplies
(e.g., 'A*A+A*B*C+D' -> 'A*(A+B*C)+D'). For each operand of the operand list, we
only consider unique factors (which is tracked by the Duplicate set). Now if we
find a factor that is a negative constant, we add the negated value as a factor
as well, because we can percolate the negate out. However, we mistakenly don't
add this negated constant to the Duplicates set.
Consider the expression A*2*-2 + B. Obviously, nothing to factor.
For the added value A*2*-2 we over count 2 as a factor without this change,
which causes the assert reported in PR30256. The problem is that this code is
assuming that all the multiply operands of the add are already reassociated.
This change avoids the issue by making OptimizeAdd tolerate multiplies which
haven't been completely optimized; this sort of works, but we're doing wasted
work: we'll end up revisiting the add later anyway.
Another possible approach would be to enforce RPO iteration order more strongly.
If we have RedoInsts, we process them immediately in RPO order, rather than
waiting until we've finished processing the whole function. Intuitively, it
seems like the natural approach: reassociation works on expression trees, so
the optimization only works in one direction. That said, I'm not sure how
practical that is given the current Reassociate; the "optimal" form for an
expression depends on its use list (see all the uses of "user_back()"), so
Reassociate is really an iterative optimization of sorts, so any changes here
would probably get messy.
PR30256
Differential Revision: https://reviews.llvm.org/D30228
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296003 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: The discriminator has been encoded, and only the base discriminator should be used during profile matching.
Reviewers: dblaikie, davidxl
Reviewed By: dblaikie, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30218
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295999 91177308-0d34-0410-b5e6-96231b3b80d8
Since LoopInfo is not available in machine passes as universally as in IR
passes, using the same approach for OptimizationRemarkEmitter as we did for IR
will run LoopInfo and DominatorTree unnecessarily. (LoopInfo is not used
lazily by ORE.)
To fix this, I am modifying the approach I took in D29836. LazyMachineBFI now
uses its client passes including MachineBFI itself that are available or
otherwise compute them on the fly.
So for example GreedyRegAlloc, since it's already using MBFI, will reuse that
instance. On the other hand, AsmPrinter in Justin's patch will generate DT,
LI and finally BFI on the fly.
(I am of course wondering now if the simplicity of this approach is even
preferable in IR. I will do some experiments.)
Testing is provided by an updated version of D29837 which requires Justin's
patch to bring ORE to the AsmPrinter.
Differential Revision: https://reviews.llvm.org/D30128
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295996 91177308-0d34-0410-b5e6-96231b3b80d8