When vectorizing a tree rooted at a store bundle, we currently try to sort the
stores before building the tree, so that the stores can be vectorized. For other
trees, the order of the root bundle - which determines the order of all other
bundles - is arbitrary. That is bad, since if a leaf bundle of consecutive loads
happens to appear in the wrong order, we will not vectorize it.
This is partially mitigated when the root is a binary operator, by trying to
build a "reversed" tree when that's considered profitable. This patch extends the
workaround we have for binops to trees rooted in a horizontal reduction.
This fixes PR28474.
Differential Revision: https://reviews.llvm.org/D22554
llvm-svn: 276477
The error on clang-x86-win2008-selfhost is:
C:\buildbot\slave-config\clang-x86-win2008-selfhost\llvm\lib\Transforms\Vectorize\SLPVectorizer.cpp(955) : error C2248: 'llvm::slpvectorizer::BoUpSLP::ScheduleData' : cannot access private struct declared in class 'llvm::slpvectorizer::BoUpSLP'
C:\buildbot\slave-config\clang-x86-win2008-selfhost\llvm\lib\Transforms\Vectorize\SLPVectorizer.cpp(608) : see declaration of 'llvm::slpvectorizer::BoUpSLP::ScheduleData'
C:\buildbot\slave-config\clang-x86-win2008-selfhost\llvm\lib\Transforms\Vectorize\SLPVectorizer.cpp(337) : see declaration of 'llvm::slpvectorizer::BoUpSLP'
I reproduced this locally with both MSVC 2013 and MSVC 2015.
llvm-svn: 272772
This wasn't failing for me with clang as the compiler. I think GCC may
disagree with clang about whether a friend declaration introduces a
declaration in the enclosing namespace (or something).
Example error:
/home/uweigand/sandbox/buildbot/clang-s390x-linux/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:950:77: error: ‘llvm::raw_ostream& llvm::slpvectorizer::operator<<(llvm::raw_ostream&, const llvm::slpvectorizer::BoUpSLP::ScheduleData&)’ should have been declared inside ‘llvm::slpvectorizer’
const BoUpSLP::ScheduleData &SD) {
^
llvm-svn: 272767
This uses the "runImpl" approach to share code with the old PM.
Porting to the new PM meant abandoning the anonymous namespace enclosing
most of SLPVectorizer.cpp which is a bit of a bummer (but not a big deal
compared to having to pull the pass class into a header which the new PM
requires since it calls the constructor directly).
llvm-svn: 272766
Summary:
This fixes PR27617.
Bug description: The SLPVectorizer asserts on encountering GEPs with different index types, such as i8 and i64.
The patch includes a simple relaxation of the assert to allow constants being of different types, along with a regression test that will provoke the unrelaxed assert.
Reviewers: nadav, mzolotukhin
Subscribers: JesperAntonsson, llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D20685
Patch by Jesper Antonsson!
llvm-svn: 272206
This patch fixes bug https://llvm.org/bugs/show_bug.cgi?id=27897.
When query memory access cost, current SLP always passes in alignment value of 1 (unaligned), so it gets a very high cost of scalar memory access, and wrongly vectorize memory loads in the test case.
It can be fixed by simply giving correct alignment.
llvm-svn: 271333
Split FCMP//ICMP/SEL from the basic arithmetic cost functions. They were not sharing any notable code path (just the return) and were repeatedly testing the opcode.
llvm-svn: 269348
This is the first of two commits for extending SLP Vectorizer to deal with aggregates.
This commit merely refactors existing logic.
http://reviews.llvm.org/D14185
llvm-svn: 267748
This change adds a new hook for estimating the cost of vector extracts followed
by zero- and sign-extensions. The motivating example for this change is the
SMOV and UMOV instructions on AArch64. These instructions move data from vector
to general purpose registers while performing the corresponding extension
(sign-extend for SMOV and zero-extend for UMOV) at the same time. For these
operations, TargetTransformInfo can assume the extensions are free and only
report the cost of the vector extract. The SLP vectorizer has been updated to
make use of the new hook.
Differential Revision: http://reviews.llvm.org/D18523
llvm-svn: 267725
The original commit was reverted because of a buildbot problem with LazyCallGraph::SCC handling (not related to the OptBisect handling).
Differential Revision: http://reviews.llvm.org/D19172
llvm-svn: 267231
This patch implements a optimization bisect feature, which will allow optimizations to be selectively disabled at compile time in order to track down test failures that are caused by incorrect optimizations.
The bisection is enabled using a new command line option (-opt-bisect-limit). Individual passes that may be skipped call the OptBisect object (via an LLVMContext) to see if they should be skipped based on the bisect limit. A finer level of control (disabling individual transformations) can be managed through an addition OptBisect method, but this is not yet used.
The skip checking in this implementation is based on (and replaces) the skipOptnoneFunction check. Where that check was being called, a new call has been inserted in its place which checks the bisect limit and the optnone attribute. A new function call has been added for module and SCC passes that behaves in a similar way.
Differential Revision: http://reviews.llvm.org/D19172
llvm-svn: 267022
The functionality contained within getIntrinsicIDForCall is two-fold: it
checks if a CallInst's callee is a vectorizable intrinsic. If it isn't
an intrinsic, it attempts to map the call's target to a suitable
intrinsic.
Move the mapping functionality into getIntrinsicForCallSite and rename
getIntrinsicIDForCall to getVectorIntrinsicIDForCall while
reimplementing it in terms of getIntrinsicForCallSite.
llvm-svn: 266801
Removed some unused headers, replaced some headers with forward class declarations.
Found using simple scripts like this one:
clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' | xargs grep -L 'IndexedMap[<]' | xargs grep -n --color=auto 'IndexedMap'
Patch by Eugene Kosov <claprix@yandex.ru>
Differential Revision: http://reviews.llvm.org/D19219
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266595
The behavior of {MIN,MAX}NAN differs from that of {MIN,MAX}NUM when only
one of the inputs is NaN: -NUM will return the non-NaN argument while
-NAN would return NaN.
It is desirable to lower to @llvm.{min,max}num to -NAN if they don't
have a native instruction for -NUM. Notably, ARMv7 NEON's vmin has the
-NAN semantics.
N.B. Of course, it is only safe to do this if the intrinsic call is
marked nnan.
llvm-svn: 266279
A catchswitch cannot be preceded by another instruction in the same
basic block (other than a PHI node).
Instead, insert the extract element right after the materialization of
the vectorized value. This isn't optimal but is a reasonable compromise
given the constraints of WinEH.
This fixes PR27163.
llvm-svn: 265157
commit ae14bf6488e8441f0f6d74f00455555f6f3943ac
Author: Mehdi Amini <mehdi.amini@apple.com>
Date: Fri Mar 11 17:15:50 2016 +0000
Remove PreserveNames template parameter from IRBuilder
Summary:
Following r263086, we are now relying on a flag on the Context to
discard Value names in release builds.
Reviewers: chandlerc
Subscribers: mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18023
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263258
91177308-0d34-0410-b5e6-96231b3b80d8
until we can figure out what to do about clang and Release build testing.
This reverts commit 263258.
llvm-svn: 263321
Summary:
Following r263086, we are now relying on a flag on the Context to
discard Value names in release builds.
Reviewers: chandlerc
Subscribers: mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18023
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263258
MinVecRegSize is currently hardcoded to 128; this patch adds a cl::opt
to allow changing it. I tried not to change any existing behavior for the default
case.
Differential revision: http://reviews.llvm.org/D13278
llvm-svn: 263089
Commit r259357 was reverted because it caused PR26629. We were assuming all
roots of a vectorizable tree could be truncated to the same width, which is not
the case in general. This commit reapplies the patch along with a fix and a new
test case to ensure we don't regress because of this issue again. This should
fix PR26629.
llvm-svn: 261212
This patch is the second attempt to reapply commit r258404. There was bug in
the initial patch and subsequent fix (mentioned below).
The initial patch caused an assertion because we were computing smaller type
sizes for instructions that cannot be demoted. The fix first determines the
instructions that will be demoted, and then applies the smaller type size to
only those instructions.
This should fix PR26239 and PR26307.
llvm-svn: 258929
This commit exposes a crash in computeKnownBits on the Chromium buildbots.
Reverting to investigate.
Reference: https://llvm.org/bugs/show_bug.cgi?id=26307
llvm-svn: 258812
This is a recommit of r258620 which causes PR26293.
The original message:
Now LIR can turn following codes into memset:
typedef struct foo {
int a;
int b;
} foo_t;
void bar(foo_t *f, unsigned n) {
for (unsigned i = 0; i < n; ++i) {
f[i].a = 0;
f[i].b = 0;
}
}
void test(foo_t *f, unsigned n) {
for (unsigned i = 0; i < n; i += 2) {
f[i] = 0;
f[i+1] = 0;
}
}
llvm-svn: 258777
We were hitting an assertion because we were computing smaller type sizes for
instructions that cannot be demoted. The fix first determines the instructions
that will be demoted, and then applies the smaller type size to only those
instructions.
This should fix PR26239.
llvm-svn: 258705