Summary:
This patch adds a callback registration API to the PassBuilder,
enabling registering out-of-tree passes with it.
Through the Callback API, callers may register callbacks with the
various stages at which passes are added into pass managers, including
parsing of a pass pipeline as well as at extension points within the
default -O pipelines.
Registering utilities like `require<>` and `invalidate<>` needs to be
handled manually by the caller, but a helper is provided.
Additionally, adding passes at pipeline extension points is exposed
through the opt tool. This patch adds a `-passes-ep-X` commandline
option for every extension point X, which opt parses into pipelines
inserted into that extension point.
Reviewers: chandlerc
Reviewed By: chandlerc
Subscribers: lksbhm, grosser, davide, mehdi_amini, llvm-commits, mgorny
Differential Revision: https://reviews.llvm.org/D33464
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307532 91177308-0d34-0410-b5e6-96231b3b80d8
The SandyBridge architects have provided us with a more accurate information about each instruction latency, number of uOPs and used ports and I used it to replace the existing estimated SNB instructions scheduling and to add missing scheduling information.
Please note that the patch extensively affects the X86 MC instr scheduling for SNB.
Also note that this patch will be followed by additional patches for the remaining target architectures HSW, IVB, BDW, SKL and SKX.
The updated and extended information about each instruction includes the following details:
•static latency of the instruction
•number of uOps from which the instruction consists of
•all ports used by the instruction's' uOPs
For example, the following code dictates that instructions, ADC64mr, ADC8mr, SBB64mr, SBB8mr have a static latency of 9 cycles. Each of these instructions is decoded into 6 micro operations which use ports 4, ports 2 or 3 and port 0 and ports 0 or 1 or 5:
def SBWriteResGroup94 : SchedWriteRes<[SBPort4,SBPort23,SBPort0,SBPort015]> {
let Latency = 9;
let NumMicroOps = 6;
let ResourceCycles = [1,2,2,1];
}
def: InstRW<[SBWriteResGroup94], (instregex "ADC64mr")>;
def: InstRW<[SBWriteResGroup94], (instregex "ADC8mr")>;
def: InstRW<[SBWriteResGroup94], (instregex "SBB64mr")>;
def: InstRW<[SBWriteResGroup94], (instregex "SBB8mr")>;
Note that apart for the header, most of the X86SchedSandyBridge.td file was generated by a script.
Reviewers: zvi, chandlerc, RKSimon, m_zuckerman, craig.topper, igorb
Differential Revision: https://reviews.llvm.org/D35019#inline-304691
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307529 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Mark G_ZEXT/G_SEXT i1 to i8/i16, i8 to i16 as legal.
Support G_ZEXT i1 to i8/i16 instruction selection ( C++ code).
This patch requred to support G_LOAD/G_STORE i1.
Reviewers: zvi, guyblank
Reviewed By: guyblank
Subscribers: rovka, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D35177
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307526 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This solves PR33641.
When removing a dead argument we must also handle possibly existing calls
to llvm.dbg.value that use the removed argument. Now we change the use
of the otherwise dead argument to an undef for some other pass to cleanup
later.
If the calls are left untouched, they will later on cause errors:
"function-local metadata used in wrong function"
since the ArgumentPromotion rewrites the code by creating a new function
with the wanted signature, but the metadata is not recreated so the new
function may then erroneously use metadata from the old function.
Reviewers: mstorsjo, rnk, arsenm
Reviewed By: rnk
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D34874
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307521 91177308-0d34-0410-b5e6-96231b3b80d8
WidenVSELECTAndMask can fold (and it folds in this case) so we
get a BUILD_VECTOR of constants as mask. convertMask() seems to
work fine when the input is a vector of constants, and we still
need to call it to extend/add elements at the end. but the current
code just asserts on anything but a SETCC or AND/OR/XOR of 2xSETCC.
This change was discussed briefly with Simon Pilgrim, who also
suggests we might consider dropping this assertion in the future.
Fixes PR33715.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307508 91177308-0d34-0410-b5e6-96231b3b80d8
This change fixes a bug in SelectionDAGBuilder::visitInsertValue and SelectionDAGBuilder::visitExtractValue where constant expressions (InsertValueConstantExpr and ExtractValueConstantExpr) would be treated as non-constant instructions (InsertValueInst and ExtractValueInst). This bug resulted in an incorrect memory access, which manifested as an assertion failure in SDValue::SDValue.
Fixes PR#33094.
Submitted on behalf of @Praetonus (Benoit Vey)
Differential Revision: https://reviews.llvm.org/D34538
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307502 91177308-0d34-0410-b5e6-96231b3b80d8
invalidation of analyses when merging SCCs.
While I've added a bunch of testing of this, it takes something much
more like the inliner to really trigger this as you need to have
partially-analyzed SCCs with updates at just the right time. So I've
added a direct test for this using the inliner and verifying the
domtree. Without the changes here, this test ends up finding a stale
dominator tree.
However, to handle this properly, we need to invalidate analyses
*before* merging the SCCs. After talking to Philip and Sanjoy about this
they convinced me this was the right approach. To do this, we need
a callback mechanism when merging SCCs so we can observe the cycle that
will be merged before the merge happens. This API update ended up being
surprisingly easy.
With this commit, the new PM passes the test-suite again. It hadn't
since MemorySSA was enabled for EarlyCSE as that also will find this bug
very quickly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307498 91177308-0d34-0410-b5e6-96231b3b80d8
the invalidation propagation logic from an SCC to a Function.
I wrote the infrastructure to test this but didn't actually use it in
the unit test where it was designed to be used. =[ My bad. Once
I actually added it to the test case I discovered that it also hadn't
been properly implemented, so I've implemented it. The logic in the FAM
proxy for an SCC pass to propagate invalidation follows the same ideas
as the FAM proxy for a Module pass, but the implementation is a bit
different to reflect the fact that it is forwarding just for an SCC.
However, implementing this correctly uncovered a surprising "bug" (it
was conservatively correct but relatively very expensive) in how we
handle invalidation when splitting one SCC into multiple SCCs. We did an
eager invalidation when in reality we should be deferring invaliadtion
for the *current* SCC to the CGSCC pass manager and just invaliating the
newly constructed SCCs. Otherwise we end up invalidating too much too
soon. This was exposed by the inliner test case that I've updated. Now,
we invalidate *just* the split off '(test1_f)' SCC when doing the CG
update, and then the inliner finishes and invalidates the '(test1_g,
test1_h)' SCC's analyses. The first few attempts at fixing this hit
still more bugs, but all of those are covered by existing tests. For
example, the inliner should also preserve the FAM proxy to avoid
unnecesasry invalidation, and this is safe because the CG update
routines it uses handle any necessary adjustments to the FAM proxy.
Finally, the unittests for the CGSCC pass manager needed a bunch of
updates where we weren't correctly preserving the FAM proxy because it
hadn't been fully implemented and failing to preserve it didn't matter.
Note that this doesn't yet fix the current crasher due to MemSSA finding
a stale dominator tree, but without this the fix to that crasher doesn't
really make any sense when testing because it relies on the proxy
behavior.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307487 91177308-0d34-0410-b5e6-96231b3b80d8
The patch was reverted due to a bug. The bug was that if the IV is the 2nd operand of the icmp
instruction, then the "Pred" variable gets swapped and differs from the instruction's predicate.
In this patch we use the original predicate to do the transformation.
Also added a test case that exercises this situation.
Differentian Revision: https://reviews.llvm.org/D35107
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307477 91177308-0d34-0410-b5e6-96231b3b80d8
I'm looking at a cmp transform in InstCombine that would affect these tests,
but it's hard to know if it makes things better or worse without seeing the
full IR. OTOH, maybe these tests shouldn't be running a bunch of transform
passes in the first place?
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307475 91177308-0d34-0410-b5e6-96231b3b80d8
x86 scalar select-of-constants (Cond ? C1 : C2) combining/lowering is a mess
with missing optimizations. We handle some patterns, but miss logical variants.
To clean that up, we should convert all select-of-constants to logic/math and
enhance the combining for the expected patterns from that. Selecting 0 or -1
needs extra attention to produce the optimal code as shown here.
Attempt to verify that all of these IR forms are logically equivalent:
http://rise4fun.com/Alive/plxs
Earlier steps in this series:
rL306040
rL306072
rL307404 (D34652)
As acknowledged in the earlier review, there's a possibility that some Intel
uarch would prefer to produce an xor to clear the fake register operand with
sbb %eax, %eax. This will likely need to be addressed in a separate pass.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307471 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: For interative sample-pgo, if a hot call site is inlined in the profiling binary, we should inline it in before profile annotation in the backend. Before that, the compile phase first collects all GUIDs that needs to be imported and creates virtual "hot" call edge in the summary. However, "hot" is not good enough to guarantee the callsites get inlined. This patch introduces "critical" call edge, and assign much higher importing threshold for those edges.
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: sanjoy, mehdi_amini, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D35096
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307439 91177308-0d34-0410-b5e6-96231b3b80d8
With the NFC refactoring in rL307417 (git SHA 987dd01), all the logic
is in place to support multiple exit/exiting blocks when prolog
remainder is generated.
This patch removed the assert that multiple exit blocks unrolling is only
supported when epilog remainder is generated.
Also, added test runs and checks with PROLOG prefix in
runtime-loop-multiple-exits.ll test cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307435 91177308-0d34-0410-b5e6-96231b3b80d8
When reusing a register for a new definition, the fast register allocator
used to insert a kill flag at the previous last use of that register to
inform later passes that this register is free between the redef and the
last use. However, this may be wrong when subregisters are involved.
Indeed, a partially redef would have trigger a kill of the full super
register, potentially wrongly marking all the other subregisters as
free. Given we don't track which lanes are still live, we cannot set the
kill flag in such case.
Note: This bug has been latent for about 7 years (r104056).
llvmg.org/PR33677
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307428 91177308-0d34-0410-b5e6-96231b3b80d8
r306334 fixed a bug in AArch64 dealing with wide interleaved accesses having
pointer types. The bug also exists in ARM, so this patch copies over the fix.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307409 91177308-0d34-0410-b5e6-96231b3b80d8
x86 scalar select-of-constants (Cond ? C1 : C2) combining/lowering is a mess
with missing optimizations. We handle some patterns, but miss logical variants.
To clean that up, we should convert all select-of-constants to logic/math and
enhance the combining for the expected patterns from that. DAGCombiner already
has the foundation to allow the transforms, so we just need to fill in the holes
for x86 math op lowering. Selecting 0 or -1 needs extra attention to produce the
optimal code as shown here.
Attempt to verify that all of these IR forms are logically equivalent:
http://rise4fun.com/Alive/plxs
Earlier steps in this series:
rL306040
rL306072
Differential Revision: https://reviews.llvm.org/D34652
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307404 91177308-0d34-0410-b5e6-96231b3b80d8
Prior to this commit both of the added test cases were passing. However, in the
latter case (test7) we were doing a lot more work to arrive at the same answer
(i.e., we were using isImpliedCondMatchingOperands() to determine the
implication.).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307400 91177308-0d34-0410-b5e6-96231b3b80d8
Today the safepoint IR verifier catches some unrelocated uses of base
pointers that are actually valid.
With this change, we narrow down the set of false positives.
Specifically, the verifier knows about compares to null and compares
between 2 unrelocated pointers.
Reviewed by: skatkov
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35057
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307392 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This change gives a 0.89% speed on execution time, a 0.94% improvement
in benchmark scores and a 0.62% increase in binary size on a Cortex-A57.
These numbers are the geomean results on a wide range of benchmarks from
the test-suite, SPEC2000, SPEC2006 and a range of proprietary suites.
The software optimization guide for the Cortex-A57 recommends 16 byte
branch alignment.
Reviewers: t.p.northover, mcrosier, javed.absar, kristof.beyls, sbaranga
Reviewed By: kristof.beyls
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D34954
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307389 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This change gives a 0.34% speed on execution time, a 0.61% improvement
in benchmark scores and a 0.57% increase in binary size on a Cortex-A72.
These numbers are the geomean results on a wide range of benchmarks from
the test-suite, SPEC2000, SPEC2006 and a range of proprietary suites.
The software optimization guide for the Cortex-A72 recommends 16 byte
branch alignment.
Reviewers: t.p.northover, kristof.beyls, rengolin, sbaranga, mcrosier, javed.absar
Reviewed By: kristof.beyls
Subscribers: llvm-commits, aemerson
Differential Revision: https://reviews.llvm.org/D34961
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307380 91177308-0d34-0410-b5e6-96231b3b80d8
We lower to a sequence consisting of:
- MOVi 0 into a register
- VCMPS to do the actual comparison and set the VFP flags
- FMSTAT to move the flags out of the VFP unit
- MOVCCi to either use the "zero register" that we have previously set
with the MOVi, or move 1 into the result register, based on the values
of the flags
As was the case with soft-float, for some predicates (one, ueq) we
actually need two comparisons instead of just one. When that happens, we
generate two VCMPS-FMSTAT-MOVCCi sequences and chain them by means of
using the result of the first MOVCCi as the "zero register" for the
second one. This is a bit overkill, since one comparison followed by
two non-flag-setting conditional moves should be enough. In any case,
the backend manages to CSE one of the comparisons away so it doesn't
matter much.
Note that unlike SelectionDAG and FastISel, we always use VCMPS, and not
VCMPES. This makes the code a lot simpler, and it also seems correct
since the LLVM Lang Ref defines simple true/false returns if the
operands are QNaN's. For SNaN's, even VCMPS throws an Invalid Operand
exception, so they won't be slipping through unnoticed.
Implementation-wise, this introduces a template so we can share the same
code that we use for handling integer comparisons, since the only
differences are in the details (exact opcodes to be used etc). Hopefully
this will be easy to extend to s64 G_FCMP.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307365 91177308-0d34-0410-b5e6-96231b3b80d8
Based strictly on the name, this seems to have something to do
width edit & continue. The goal of this patch has nothing to do
with supporting edit and continue though. msvc link.exe writes
very basic information into this area even when *not* compiling
with support for E&C, and so the goal here is to bring lld-link
to parity. Since we cannot know what assumptions standard tools
make about the content of PDB files, we need to be as close as
possible.
This ECNames data structure is a standard PDB string hash table.
link.exe puts a single string into this hash table, which is the
full path to the PDB file on disk. It then references this string
from the module descriptor for the compiler generated `* Linker *`
module.
With this patch, lld-link will generate the exact same sequence of
bytes as MSVC link for this subsection for a given object file
input (as reported by `llvm-pdbutil bytes -ec`).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307356 91177308-0d34-0410-b5e6-96231b3b80d8
InferAddressSpaces does not check address space in collectFlatAddressExpressions,
which causes values with non flat address space put into Postorder and causes
assertion in cloneValueWithNewAddressSpace.
This patch fixes assertion in OpenCL 2.0 conformance test generic_address_space
subtest for amdgcn target.
Differential Revision: https://reviews.llvm.org/D34991
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307349 91177308-0d34-0410-b5e6-96231b3b80d8
Model weakly defined symbols as symbols that are both
exports and imported and marked as weak. Local references
to the symbols refer to the import but the linker can
resolve this to the weak export if not strong symbol
is found at link time.
Differential Revision: https://reviews.llvm.org/D35029
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307348 91177308-0d34-0410-b5e6-96231b3b80d8
Adds loop expansions for known-size and unknown-sized memcpy calls, allowing the
target to provide the operand types through TTI callbacks. The default values
for the TTI callbacks use int8 operand types and matches the existing behaviour
if they aren't overridden by the target.
Differential revision: https://reviews.llvm.org/D32536
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307346 91177308-0d34-0410-b5e6-96231b3b80d8
Revert "Copy arguments passed by value into explicit allocas for ASan."
Revert "[asan] Add end-to-end tests for overflows of byval arguments."
Build failure on lldb-x86_64-ubuntu-14.04-buildserver.
Test failure on clang-cmake-aarch64-42vma and sanitizer-x86_64-linux-android.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307345 91177308-0d34-0410-b5e6-96231b3b80d8
ASan determines the stack layout from alloca instructions. Since
arguments marked as "byval" do not have an explicit alloca instruction, ASan
does not produce red zones for them. This commit produces an explicit alloca
instruction and copies the byval argument into the allocated memory so that red
zones are produced.
Patch by Matt Morehouse.
Differential revision: https://reviews.llvm.org/D34789
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307342 91177308-0d34-0410-b5e6-96231b3b80d8
Using profile information to guide consthoisting is generally helpful for
performance, so the patch turns it on by default. No compile time or perf
regression were found using spec2000 and spec2006 on x86. Some significant
improvement (>20%) was seen on internal benchmarks.
Differential Revision: https://reviews.llvm.org/D35063
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307338 91177308-0d34-0410-b5e6-96231b3b80d8
The patch is to adjust the strategy of frequency based consthoisting:
Previously when the candidate block has the same frequency with the existing
blocks containing a const, it will not hoist the const to the candidate block.
For that case, now we change the strategy to hoist the const if only existing
blocks have more than one block member. This is helpful for reducing code size.
Differential Revision: https://reviews.llvm.org/D35084
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307328 91177308-0d34-0410-b5e6-96231b3b80d8
The patch adds support of i128 params lowering. The changes are quite trivial to
support i128 as a "special case" of integer type. With this patch, we lower i128
params the same way as aggregates of size 16 bytes: .param .b8 _ [16].
Currently, NVPTX can't deal with the 128 bit integers:
* in some cases because of failed assertions like
ValVTs.size() == OutVals.size() && "Bad return value decomposition"
* in other cases emitting PTX with .i128 or .u128 types (which are not valid [1])
[1] http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#fundamental-types
Differential Revision: https://reviews.llvm.org/D34555
Patch by: Denys Zariaiev (denys.zariaiev@gmail.com)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307326 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The capture() function was removed in r306625. This should fix PGO breakages
reported by Michael Zolotukhin.
Reviewers: mzolotukhin
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D35088
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307320 91177308-0d34-0410-b5e6-96231b3b80d8
Regardless of relaxation options such as -cl-fast-relaxed-math
we are producing rather long code for fdiv via amdgcn_fdiv_fast
intrinsic. This intrinsic is used to replace fdiv with 2.5ulp
metadata and does not handle denormals, thus believed to be fast.
An fdiv instruction can also have fast math flag either by itself
or together with fpmath metadata. Clang used with a relaxation flag
always produces both metadata and fast flag:
%div = fdiv fast float %v, %0, !fpmath !12!12 = !{float 2.500000e+00}
Current implementation ignores fast flag and favors metadata. An
instruction with just fast flag would be lowered to a fastest rcp +
mul, but that never happen on practice because of described mutual
clang and BE behavior.
This change allows an "fdiv fast" to be always lowered as rcp + mul.
Differential Revision: https://reviews.llvm.org/D34844
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307308 91177308-0d34-0410-b5e6-96231b3b80d8