Summary: This is in preparation for LoopSink pass which calls replaceDominatedUsesWith to update after sinking.
Reviewers: chandlerc, davidxl, danielcdh
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24170
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280427 91177308-0d34-0410-b5e6-96231b3b80d8
auto-brief format for doxygen comments. Most notable is switching to
that in the example doxygen comment. I've also tweaked the wording but
am happy to tweak it further if others have suggestions here.
Mostly doing this to capture something I and others have been writing
consistently and repeatedly in code reviews.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280419 91177308-0d34-0410-b5e6-96231b3b80d8
Prior to this, we could generate a vector_shuffle from an IR shuffle when the
size of the result was exactly the sum of the sizes of the input vectors.
If the output vector was narrower - e.g. a <12 x i8> being formed by a shuffle
with two <8 x i8> inputs - we would lower the shuffle to a sequence of extracts
and inserts.
Instead, we can form a larger vector_shuffle, and then extract a subvector
of the right size - e.g. shuffle the two <8 x i8> inputs into a <16 x i8>
and then extract a <12 x i8>.
This also includes a target-specific X86 combine that in the presence of
AVX2 combines:
(vector_shuffle <mask> (concat_vectors t1, undef)
(concat_vectors t2, undef))
into:
(vector_shuffle <mask> (concat_vectors t1, t2), undef)
in cases where this allows us to form VPERMD/VPERMQ.
(This is not a separate commit, as that pattern does not appear without
the DAGBuilder change.)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280418 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: This patch adds asm.js-style setjmp/longjmp handling support for WebAssembly. It also uses JavaScript's try and catch mechanism.
Reviewers: jpp, dschuff
Subscribers: jfb, dschuff
Differential Revision: https://reviews.llvm.org/D24121
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280415 91177308-0d34-0410-b5e6-96231b3b80d8
They're another source of generic vregs, which are going to need a type on the
definition when we remove the register width from MachineRegisterInfo.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280412 91177308-0d34-0410-b5e6-96231b3b80d8
With that change, images built with 'lld-link /debug' always have a
debug directory. If no PDB filename was passed on the command line, then
the filename in the executable is empty.
PDB information would never work anyway if the PDB file name is empty,
so go ahead and try DWARF in that case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280410 91177308-0d34-0410-b5e6-96231b3b80d8
We can now maintain scalar values in VectorLoopValueMap. Thus, we no longer
have to create temporary vectors with insertelement instructions when handling
pointer induction variables. This case was mistakenly missed from r279649 when
refactoring the other scalarization code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280405 91177308-0d34-0410-b5e6-96231b3b80d8
According to spec cvtdq2pd and cvtps2pd instructions don't require memory operand to be aligned
to 16 bytes. This patch removes this requirement from the memory folding table.
Differential Revision: https://reviews.llvm.org/D23919
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280402 91177308-0d34-0410-b5e6-96231b3b80d8
My previous attempt at this connected the sub-project check targets to the test-depends target instead of to the check-all target. That resulted in the tests running multiple times on bots that built "test-depends" and "check-all" in separate build invocations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280392 91177308-0d34-0410-b5e6-96231b3b80d8
This patch moves the allocation of VectorParts for PHI nodes into the actual
PHI widening code. Previously, we allocated these VectorParts in
vectorizeBlockInLoop, and passed them by reference to widenPHIInstruction. Upon
returning, we would then map the VectorParts in VectorLoopValueMap. This
behavior is problematic for the cases where we only want to generate a scalar
version of a PHI node. For example, if in the future we only generate a scalar
version of an induction variable, we would end up inserting an empty vector
entry into the map once we return to vectorizeBlockInLoop. We now no longer
need to pass VectorParts to the various PHI widening functions, and we can keep
VectorParts allocation as close as possible to the point at which they are
actually mapped in VectorLoopValueMap.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280390 91177308-0d34-0410-b5e6-96231b3b80d8
Legalization tends to create anyext(trunc) patterns. This should always be
combined - into either a single trunc, a single ext, or nothing if the
types match exactly. But if we happen to combine the trunc first, we may pull
the trunc away from the anyext or make it implicit (e.g. the truncate(extract)
-> extract(bitcast) fold).
To prevent this, we can avoid doing the fold, similarly to how we already handle
fpround(fpextend).
Differential Revision: https://reviews.llvm.org/D23893
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280386 91177308-0d34-0410-b5e6-96231b3b80d8
Apparently nobody evaluated multiprocessing on Windows since Daniel
enabled multiprocessing on Unix in r193279. It works so far as I can
tell.
Today this is worth about an 8x speedup (631.29s to 73.25s) on my 24
core Windows machine. Hopefully this will improve Windows buildbot cycle
time, where currently it takes more time to run check-all than it does
to self-host with assertions enabled:
http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/builds/20
build stage 2 ninja all ( 28 mins, 22 secs )
ninja check 2 stage 2 ( 37 mins, 38 secs )
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280382 91177308-0d34-0410-b5e6-96231b3b80d8
This is a partial revert of r280013. Brad King pointed out these variable names are matching CMake conventions, so we should preserve them.
I've also added a direct mapping of the LLVM_*_DIR variables which we need to make projects support building in and out of tree.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280380 91177308-0d34-0410-b5e6-96231b3b80d8
Previous change broke the C API for creating an EarlyCSE pass w/
MemorySSA by adding a bool parameter to control whether MemorySSA was
used or not. This broke the OCaml bindings. Instead, change the old C
API entry point back and add a new one to request an EarlyCSE pass with
MemorySSA.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280379 91177308-0d34-0410-b5e6-96231b3b80d8
While removing a scalar shackle from an icmp fold, I noticed that I couldn't find any tests to trigger
this code path.
The 'and' shrinking transform should be handled by InstCombiner::foldCastedBitwiseLogic()
or eliminated with InstSimplify. The icmp narrowing is part of InstCombiner::foldICmpWithCastAndCast().
Differential Revision: https://reviews.llvm.org/D24031
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280370 91177308-0d34-0410-b5e6-96231b3b80d8
This was a real restriction in the original version of SinkIfThenCodeToEnd. Now it's been rewritten, the restriction can be lifted.
As part of this, we handle a very common and useful case where one of the incoming branches is actually conditional. Consider:
if (a)
x(1);
else if (b)
x(2);
This produces the following CFG:
[if]
/ \
[x(1)] [if]
| | \
| | \
| [x(2)] |
\ | /
[ end ]
[end] has two unconditional predecessor arcs and one conditional. The conditional refers to the implicit empty 'else' arc. This same pattern can also be caused by an empty default block in a switch.
We can't sink the call to x() down to end because no call to x() happens on the third incoming arc (assume that x() has sideeffects for the sake of argument; if something is safe to speculate we could indeed sink nevertheless but this cannot happen in the general case and causes many extra selects).
We are now able to detect this case and split off the unconditional arcs to a common successor:
[if]
/ \
[x(1)] [if]
| | \
| | \
| [x(2)] |
\ / |
[sink.split] |
\ /
[ end ]
Now we can sink the call to x() into %sink.split. This can cause significant code simplification in many testcases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280364 91177308-0d34-0410-b5e6-96231b3b80d8
If an attribute name has special characters such as '\01', it is not
properly printed in LLVM assembly language format. Since the format
expects the special characters are printed as it is, it has to contain
escape characters to make it printable.
Before:
attributes #0 = { ... "counting-function"="^A__gnu_mcount_nc" ...
After:
attributes #0 = { ... "counting-function"="\01__gnu_mcount_nc" ...
Reviewers: hfinkel, rengolin, rjmccall, compnerd
Subscribers: nemanjai, mcrosier, hans, shenhan, majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D23792
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280357 91177308-0d34-0410-b5e6-96231b3b80d8
r279460 rewrote this function to be able to handle more than two incoming edges and took pains to ensure this didn't regress anything.
This time we change the logic for determining if an instruction should be sunk. Previously we used a single pass greedy algorithm - sink instructions until one requires more than one PHI node or we run out of instructions to sink.
This had the problem that sinking instructions that had non-identical but trivially the same operands needed extra logic so we sunk them aggressively. For example:
%a = load i32* %b %d = load i32* %b
%c = gep i32* %a, i32 0 %e = gep i32* %d, i32 1
Sinking %c and %e would naively require two PHI merges as %a != %d. But the loads are obviously equivalent (and maybe can't be hoisted because there is no common predecessor).
This is why we implemented the fairly complex function areValuesTriviallySame(), to look through trivial differences like this. However it's just not clever enough.
Instead, throw areValuesTriviallySame away, use pointer equality to check equivalence of operands and switch to a two-stage algorithm.
In the "scan" stage, we look at every sinkable instruction in isolation from end of block to front. If it's sinkable, we keep track of all operands that required PHI merging.
In the "sink" stage, we iteratively sink the last non-terminator in the source blocks. But when calculating how many PHIs are actually required to be inserted (to work out if we should stop or not) we remove any values that have already been sunk from the set of PHI-merges required, which allows us to be more aggressive.
This turns an algorithm with potentially recursive lookahead (looking through GEPs, casts, loads and any other instruction potentially not CSE'd) to two linear scans.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280351 91177308-0d34-0410-b5e6-96231b3b80d8
LLVM has an @llvm.eh.dwarf.cfa intrinsic, used to lower the GCC-compatible
__builtin_dwarf_cfa() builtin. As pointed out in PR26761, this is currently
broken on PowerPC (and likely on ARM as well). Currently, @llvm.eh.dwarf.cfa is
lowered using:
ADD(FRAMEADDR, FRAME_TO_ARGS_OFFSET)
where FRAME_TO_ARGS_OFFSET defaults to the constant zero. On x86,
FRAME_TO_ARGS_OFFSET is lowered to 2*SlotSize. This setup, however, does not
work for PowerPC. Because of the way that the stack layout works, the canonical
frame address is not exactly (FRAMEADDR + FRAME_TO_ARGS_OFFSET) on PowerPC
(there is a lower save-area offset as well), so it is not just a matter of
implementing FRAME_TO_ARGS_OFFSET for PowerPC (unless we redefine its
semantics -- We can do that, since it is currently used only for
@llvm.eh.dwarf.cfa lowering, but the better to directly lower the CFA construct
itself (since it can be easily represented as a fixed-offset FrameIndex)). Mips
currently does this, but by using a custom lowering for ADD that specifically
recognizes the (FRAMEADDR, FRAME_TO_ARGS_OFFSET) pattern.
This change introduces a ISD::EH_DWARF_CFA node, which by default expands using
the existing logic, but can be directly lowered by the target. Mips is updated
to use this method (which simplifies its implementation, and I suspect makes it
more robust), and updates PowerPC to do the same.
Fixes PR26761.
Differential Revision: https://reviews.llvm.org/D24038
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280350 91177308-0d34-0410-b5e6-96231b3b80d8
As discussed in https://reviews.llvm.org/D22666, our current mechanism to
support -pg profiling, where we insert calls to mcount(), or some similar
function, is fundamentally broken. We insert these calls in the frontend, which
means they get duplicated when inlining, and so the accumulated execution
counts for the inlined-into functions are wrong.
Because we don't want the presence of these functions to affect optimizaton,
they should be inserted in the backend. Here's a pass which would do just that.
The knowledge of the name of the counting function lives in the frontend, so
we're passing it here as a function attribute. Clang will be updated to use
this mechanism.
Differential Revision: https://reviews.llvm.org/D22825
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280347 91177308-0d34-0410-b5e6-96231b3b80d8
initializers not being in the same order as the members.
Specifically, 'preg' is the first member followed by 'error', so they
will be initialized in that order and should be written in the member
initializer list in that order.
For the constructor in question, there is no change in behavior.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280345 91177308-0d34-0410-b5e6-96231b3b80d8
We iterate over the result from SafeToMergeTerminators, so make it a SmallSetVector instead of a SmallPtrSet.
Should fix stage3 convergence builds.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280342 91177308-0d34-0410-b5e6-96231b3b80d8
A very important case is not handled here: multiple arcs to a single block with a PHI. Consider:
a:
%1 = icmp %b, 1
br %1, label %c, label %e
c:
%2 = icmp %b, 2
br %2, label %d, label %e
d:
br %e
e:
phi [0, %a], [1, %c], [2, %d]
FoldValueComparisonIntoPredecessors will refuse to fold this, as it doesn't know how to deal with two arcs to a common destination with different PHI values. The answer is obvious - just split all conflicting arcs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280338 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This change promotes the 'isTailCall(...)' member function to
TargetInstrInfo as a query interface for determining on a per-target
basis whether a given MachineInstr is a tail call instruction. We build
upon this in the XRay instrumentation pass to emit special sleds for
tail call optimisations, where we emit the correct kind of sled.
The tail call sleds look like a mix between the function entry and
function exit sleds. Form-wise, the sled comes before the "jmp"
instruction that implements the tail call similar to how we do it for
the function entry sled. Functionally, because we know this is a tail
call, it behaves much like an exit sled -- i.e. at runtime we may use
the exit trampolines instead of a different kind of trampoline.
A follow-up change to recognise these sleds will be done in compiler-rt,
so that we can start intercepting these initially as exits, but also
have the option to have different log entries to more accurately reflect
that this is actually a tail call.
Reviewers: echristo, rSerge, majnemer
Subscribers: mehdi_amini, dberris, llvm-commits
Differential Revision: https://reviews.llvm.org/D23986
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280334 91177308-0d34-0410-b5e6-96231b3b80d8
Older versions of clang defined __has_cpp_attribute in C mode, but
would choke on scoped attributes, as per llvm.org/PR23435. Since we
support building with clang all the way back to 3.1, we have to work
around this issue.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280326 91177308-0d34-0410-b5e6-96231b3b80d8