It is cleaner to have a callback based system where the logic of
whether an add recurrence is normalized or not lives on IVUsers.
This is one step in a multi-step cleanup.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300330 91177308-0d34-0410-b5e6-96231b3b80d8
MOVNTDQA non-temporal aligned vector loads can be correctly represented using generic builtin loads, allowing us to remove the existing x86 intrinsics.
Clang companion patch: D31766.
Differential Revision: https://reviews.llvm.org/D31767
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300325 91177308-0d34-0410-b5e6-96231b3b80d8
This patch is part of D28975's breakdown - no change in output intended.
LV's code currently assumes the vectorized loop is a single basic block up
until predicateInstructions() is called. This patch removes two manifestations
of this assumption (loop phi incoming values, dominator tree update) by
replacing the use of vectorLoopBody with the vectorized loop's latch/header.
Differential Revision: https://reviews.llvm.org/D32040
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300310 91177308-0d34-0410-b5e6-96231b3b80d8
The APInt was created from an 'unsigned' and we just wanted to know how many bits the value needed to represent it. We can just use Log2_32 from MathExtras.h to get the info.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300309 91177308-0d34-0410-b5e6-96231b3b80d8
Start using it in LLD to avoid needing to read bitcode again just to get the
target triple, and in llvm-lto2 to avoid printing symbol table information
that is inappropriate for the target.
Differential Revision: https://reviews.llvm.org/D32038
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300300 91177308-0d34-0410-b5e6-96231b3b80d8
The tests were failing due to an occasional deadlock in SerializationTraits
for Error: Both serializers and deserializers were protected by a single
mutex and in the unit test (where both ends of the RPC are in the same
process) one side might obtain the mutex, then block waiting for input,
leaving the other side of the connection unable to obtain the mutex to
write the data the first side was waiting for. Splitting the mutex into
two (one for serialization, one for deserialization) appears to have fixed the
issue.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300286 91177308-0d34-0410-b5e6-96231b3b80d8
Now that we have a type that can represent the attributes on a single
return, function, or parameter, we can pass it around directly rather
than passing around AttributeList and Idx. Removes some more one-based
argument attribute index counting.
NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300285 91177308-0d34-0410-b5e6-96231b3b80d8
This further improves Ahmed's change in rL299482. See the new comment for the
rationale.
The patch recovers most of the regression for bzip2 after D31965. We're down
to +2.68% from +6.97%.
Differential Revision: https://reviews.llvm.org/D32028
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300276 91177308-0d34-0410-b5e6-96231b3b80d8
Add hasParamAttribute() and use it instead of hasAttribute(ArgNo+1,
Kind) everywhere.
The fact that the AttributeList index for an argument is ArgNo+1 should
be a hidden implementation detail.
NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300272 91177308-0d34-0410-b5e6-96231b3b80d8
If the offset cannot fit into the instruction, an addition to the
pointer is emitted before the actual access. However, BPF offsets are
16-bit but LLVM considers them to be, for the matter of this check,
to be 32-bit long.
This causes the following program:
int bpf_prog1(void *ign)
{
volatile unsigned long t = 0x8983984739ull;
return *(unsigned long *)((0xffffffff8fff0002ull) + t);
}
To generate the following (wrong) code:
0: 18 01 00 00 39 47 98 83 00 00 00 00 89 00 00 00
r1 = 590618314553ll
2: 7b 1a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r1
3: 79 a1 f8 ff 00 00 00 00 r1 = *(u64 *)(r10 - 8)
4: 79 10 02 00 00 00 00 00 r0 = *(u64 *)(r1 + 2)
5: 95 00 00 00 00 00 00 00 exit
Fix it by changing the offset check to 16-bit.
Patch by Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Differential Revision: https://reviews.llvm.org/D32055
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300269 91177308-0d34-0410-b5e6-96231b3b80d8
- Refer to options by `-option` instead of `option`
- Use `-mtriple=` instead of `-march` in the example (-march will still
target the default operating system which is usually not what you want
in a test)
- Rephrase sentence because output does not go to stdout by default (you
need -o - for that as should be expected).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300268 91177308-0d34-0410-b5e6-96231b3b80d8
We call it unconditionally on the operands of the select. Then decide if its a min/max and call it on the min/max operands or on the select operands again. Either of those second calls will overwrite the results of the initial call so we can just delete the first call.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300256 91177308-0d34-0410-b5e6-96231b3b80d8
For LCSSA purposes, loop BBs not dominating any of the exits aren't
interesting, as none of the values defined in these blocks can be
used outside the loop.
The way the code computed this information was by comparing each
BB of the loop with each of the exit blocks and ask the dominator tree
about their dominance relation. This is slow.
A more efficient way, implemented here, is that of starting from the
exit blocks and walking the dom upwards until we hit an header. By
transitivity, all the blocks we encounter in our path dominate an exit.
For the testcase provided in PR31851, this reduces compile time on
`opt -O2` by ~25%, going from 1m47s to 1m22s.
Thanks to Dan/MichaelZ for discussions/suggesting the approach/review.
Differential Revision: https://reviews.llvm.org/D31843
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300255 91177308-0d34-0410-b5e6-96231b3b80d8
Switch from Euclid's algorithm to Stein's algorithm for computing GCD. This
avoids the (expensive) APInt division operation in favour of bit operations.
Remove all memory allocation from within the GCD loop by tweaking our `lshr`
implementation so it can operate in-place.
Differential Revision: https://reviews.llvm.org/D31968
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300252 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Bug noticed by inspection.
Extend the test to handle invokes as well as calls, and rewrite it to
not depend on the inliner and other passes.
Also simplify the call site replacement code with CallSite, similar to
what I did to dead arg elimination and arg promotion (rL300235 and
rL300229).
Reviewers: danielcdh, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32041
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300251 91177308-0d34-0410-b5e6-96231b3b80d8
We could otherwise add BBs not belonging to a loop in `formLCSSA`
and later crash when trying to iterate the loop blocks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300244 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: For iterative SamplePGO, an indirect call can be speculatively promoted to multiple direct calls and get inlined. All these promoted direct calls will share the same callsite location (offset+discriminator). With the current implementation, we cannot distinguish between different promotion candidates and its inlined instance. This patch adds callee_name to the key of the callsite sample map. And added helper functions to get all inlined callee samples for a given callsite location. This helps the profile annotator promote correct targets and inline it before annotation, and ensures all indirect call targets to be annotated correctly.
Reviewers: davidxl, dnovillo
Reviewed By: davidxl
Subscribers: andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D31950
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300240 91177308-0d34-0410-b5e6-96231b3b80d8