By default, git creates "llvm-project-20170507" directory,
but we want to create "llvm-project" directory.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303124 91177308-0d34-0410-b5e6-96231b3b80d8
We don't use section-relative relocations on AArch64, so all symbols must be at
least visible to the linker (i.e. properly global or l_whatever, but not
L_whatever).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303118 91177308-0d34-0410-b5e6-96231b3b80d8
ARM Neon has native support for half-sized vector registers (64 bits). This
is beneficial for example for 2D and 3D graphics. This patch adds the option
to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer.
*** Performance Analysis
This change was motivated by some internal benchmarks but it is also
beneficial on SPEC and the LLVM testsuite.
The results are with -O3 and PGO. A negative percentage is an improvement.
The testsuite was run with a sample size of 4.
** SPEC
* CFP2006/482.sphinx3 -3.34%
A pretty hot loop is SLP vectorized resulting in nice instruction reduction.
This used to be a +22% regression before rL299482.
* CFP2000/177.mesa -3.34%
* CINT2000/256.bzip2 +6.97%
My current plan is to extend the fix in rL299482 to i16 which brings the
regression down to +2.5%. There are also other problems with the codegen in
this loop so there is further room for improvement.
** LLVM testsuite
* SingleSource/Benchmarks/Misc/ReedSolomon -10.75%
There are multiple small SLP vectorizations outside the hot code. It's a bit
surprising that it adds up to 10%. Some of this may be code-layout noise.
* MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40%
The opt-viewer screenshot can be seen at F3218284. We start at a colder store
but the tree leads us into the hottest loop.
* MultiSource/Applications/lambda-0.1.3/lambda -2.68%
* MultiSource/Benchmarks/Bullet/bullet -2.18%
This is using 3D vectors.
* SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67%
Noise, binary is unchanged.
* MultiSource/Benchmarks/Ptrdist/anagram/anagram +4.90%
There is an additional SLP in the cold code. The test runs for ~1sec and
prints out over 2000 lines. This is most likely noise.
* MultiSource/Applications/aha/aha +1.63%
* MultiSource/Applications/JM/lencod/lencod +1.41%
* SingleSource/Benchmarks/Misc/richards_benchmark +1.15%
Differential Revision: https://reviews.llvm.org/D31965
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303116 91177308-0d34-0410-b5e6-96231b3b80d8
This caused PR33053.
Original commit message:
> The new experimental reduction intrinsics can now be used, so I'm enabling this
> for AArch64. We will need this for SVE anyway, so it makes sense to do this for
> NEON reductions as well.
>
> The existing code to match shufflevector patterns are replaced with a direct
> lowering of the reductions to AArch64-specific nodes. Tests updated with the
> new, simpler, representation.
>
> Differential Revision: https://reviews.llvm.org/D32247
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303115 91177308-0d34-0410-b5e6-96231b3b80d8
We were silently ignoring any features we couldn't match up, which led to
errors in an inline asm block missing the conventional "\n\t".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303108 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The following loops should be recognized:
i = 0;
while (n) {
n = n >> 1;
i++;
body();
}
use(i);
And replaced with builtin_ctlz(n) if body() is empty or
for CPUs that have CTLZ instruction converted to countable:
for (j = 0; j < builtin_ctlz(n); j++) {
n = n >> 1;
i++;
body();
}
use(builtin_ctlz(n));
Reviewers: rengolin, joerg
Differential Revision: http://reviews.llvm.org/D32605
From: Evgeny Stupachenko <evstupac@gmail.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303102 91177308-0d34-0410-b5e6-96231b3b80d8
verifyMemoryCongruency() filters out trivially dead MemoryDef(s),
as we find them immediately dead, before moving from TOP to a new
congruence class.
This fixes the same problem for PHI(s) skipping MemoryPhis if all
the operands are dead.
Differential Revision: https://reviews.llvm.org/D33044
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303100 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
All GlobalIndirectSymbol types (not just GlobalAlias) should return
their base object.
Without this patch LTO would warn "Unable to determine comdat of
alias!" for an ifunc.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D33202
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303096 91177308-0d34-0410-b5e6-96231b3b80d8
At O3 we are more willing to increase size if we believe it will improve
performance. The current threshold for tail-duplication of 2 instructions is
conservative, and can be relaxed at O3.
Benchmark results:
llvm test-suite:
6% improvement in aha, due to duplication of loop latch
3% improvement in hexxagon
2% slowdown in lpbench. Seems related, but couldn't completely diagnose.
Internal google benchmark:
Produces 4% improvement on internal google protocol buffer serialization
benchmarks.
Differential-Revision: https://reviews.llvm.org/D32324
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303084 91177308-0d34-0410-b5e6-96231b3b80d8
Follow up to D33147
NVPTXTargetLowering::LowerCall was trusting the default argument values.
Fixes another 17 of the NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146.
Differential Revision: https://reviews.llvm.org/D33189
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303082 91177308-0d34-0410-b5e6-96231b3b80d8
This patch enables fusing dependent AESE/AESMC and AESD/AESIMC
instruction pairs on Cortex-A72, as recommended in the Software
Optimization Guide, section 4.10.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303073 91177308-0d34-0410-b5e6-96231b3b80d8
Doing this means that if an LEApcrel is used in two places we will rematerialize
instead of generating two MOVs. This is particularly useful for printfs using
the same format string, where we want to generate an address into a register
that's going to get corrupted by the call.
Differential Revision: https://reviews.llvm.org/D32858
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303054 91177308-0d34-0410-b5e6-96231b3b80d8
Doing this lets us hoist it out of loops, and I've also marked it as
rematerializable the same as the thumb1 and thumb2 counterparts.
It looks like it being marked as such was just a mistake, as the commit that
made that change only mentions LEApcrelJT and in thumb1 and thumb2 only the
LEApcrelJT instructions were marked as having side-effects, so it looks like
the intent was to only mark LEApcrelJT as having side-effects but LEApcrel was
accidentally marked as such also.
Differential Revision: https://reviews.llvm.org/D32857
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303053 91177308-0d34-0410-b5e6-96231b3b80d8
I am working on a speedup of building .gdb_index in LLD and
noticed that relocations that are proccessed in DWARFContextInMemory often uses
the same symbol in a row. This patch introduces caching to reduce the relocations
proccessing time.
For benchmark,
I took debug LLC binary objects configured with -ggnu-pubnames and linked it using LLD.
Link time without --gdb-index is about 4,45s.
Link time with --gdb-index: a) Without patch: 19,16s b) With patch: 15,52s
That means time spent on --gdb-index in this configuration is
19,16s - 4,45s = 14,71s (without patch) vs 15,52s - 4,45s = 11,07s (with patch).
Differential revision: https://reviews.llvm.org/D31136
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303051 91177308-0d34-0410-b5e6-96231b3b80d8
Currently, when masked load, store, gather or scatter intrinsics are used, we check in CodeGenPrepare pass if the subtarget support these intrinsics, if not we replace them with scalar code - this is a functional transformation not an optimization (not optional).
CodeGenPrepare pass does not run when the optimization level is set to CodeGenOpt::None (-O0).
Functional transformation should run with all optimization levels, so here I created a new pass which runs on all optimization levels and does no more than this transformation.
Differential Revision: https://reviews.llvm.org/D32487
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303050 91177308-0d34-0410-b5e6-96231b3b80d8
NFC followup to D33147, this explicitly sets all the arguments (instead of relying on the defaults) to SelectionDAG::getMemIntrinsicNode to help identify -verify-machineinstrs issues.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303047 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
We were asserting in RegisterBankInfo if RBI.copyCost() returns
UINT_MAX. This is OK for RegBankSelect::Mode::Fast since we only
try one instruction mapping and can't recover from this, but for
RegBankSelect::Mode::Greedy we will be considering multiple
instruction mappings, so we can recover if we see a UNIT_MAX copy
cost.
The copy cost for one pair of register banks in the AMDGPU backend
will be UNIT_MAX, so this patch will prevent AMDGPU tests from
breaking.
Reviewers: ab, qcolombet, t.p.northover, dsanders
Reviewed By: qcolombet
Subscribers: tpr, llvm-commits
Differential Revision: https://reviews.llvm.org/D33144
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303043 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This was broken by r302499. Configuring with -DLLVM_BUILD_DOCS=ON would
cause the docs-llvm-man target not to be created.
Reviewers: anemet, beanz
Reviewed By: anemet
Subscribers: llvm-commits, mgorny
Differential Revision: https://reviews.llvm.org/D33146
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303042 91177308-0d34-0410-b5e6-96231b3b80d8
We were previously silently emitting bogus data in release mode,
making it very hard to diagnose the error, or crashing with an
assert in debug mode. A proper diagnostic is now always emitted
when the value to be emitted is out of range.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303041 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Merge overflow computation for signed add,
appearing both in InstCombine and ValueTracking.
As part of the merge,
cleanup the interface for overflow checks in InstCombine.
Patch by Yoav Ben-Shalom.
Reviewers: craig.topper, majnemer
Reviewed By: craig.topper
Subscribers: takuto.ikuta, llvm-commits
Differential Revision: https://reviews.llvm.org/D32946
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303029 91177308-0d34-0410-b5e6-96231b3b80d8