mirror of
https://github.com/RPCS3/llvm.git
synced 2024-12-01 07:30:33 +00:00
Add support for llvm.vectorizer metadata
- llvm.loop.parallel metadata has been renamed to llvm.loop to be more generic by making it the root of additional loop metadata.
- Loop::isAnnotatedParallel now looks for llvm.loop and associated llvm.mem.parallel_loop_access
- document llvm.loop and update llvm.mem.parallel_loop_access
- add support for llvm.vectorizer.width and llvm.vectorizer.unroll
- document llvm.vectorizer.* metadata
- add utility class LoopVectorizeHints for getting/setting loop metadata
- use llvm.vectorizer.width=1 to indicate already vectorized instead of already_vectorized
- update existing tests that used llvm.loop.parallel and llvm.vectorizer.already_vectorized

Reviewed by: Nadav Rotem

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182802 91177308-0d34-0410-b5e6-96231b3b80d8
parent
a32edcfbc5
commit
ee21b6f7b4
132
docs/LangRef.rst
@@ -2554,8 +2554,8 @@ Examples:
|
||||
It is sometimes useful to attach information to loop constructs. Currently,
|
||||
loop metadata is implemented as metadata attached to the branch instruction
|
||||
in the loop latch block. This type of metadata refers to a metadata node that is
|
||||
guaranteed to be separate for each loop. The loop-level metadata is prefixed
|
||||
with ``llvm.loop``.
|
||||
guaranteed to be separate for each loop. The loop identifier metadata is
|
||||
specified with the name ``llvm.loop``.
|
||||
|
||||
The loop identifier metadata is implemented using a metadata that refers to
|
||||
itself to avoid merging it with any other identifier metadata, e.g.,
|
||||
@@ -2569,32 +2569,17 @@ constructs:
|
||||
!0 = metadata !{ metadata !0 }
|
||||
!1 = metadata !{ metadata !1 }
|
||||
|
||||
The loop identifier metadata can be used to specify additional per-loop
|
||||
metadata. Any operands after the first operand can be treated as user-defined
|
||||
metadata. For example the ``llvm.vectorizer.unroll`` metadata is understood
|
||||
by the loop vectorizer to indicate how many times to unroll the loop:
|
||||
|
||||
'``llvm.loop.parallel``' Metadata
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
.. code-block:: llvm
|
||||
|
||||
This loop metadata can be used to communicate that a loop should be considered
|
||||
a parallel loop. The semantics of parallel loops in this case is the one
|
||||
with the strongest cross-iteration instruction ordering freedom: the
|
||||
iterations in the loop can be considered completely independent of each
|
||||
other (also known as embarrassingly parallel loops).
|
||||
|
||||
This metadata can originate from a programming language with parallel loop
|
||||
constructs. In such a case it is completely the programmer's responsibility
|
||||
to ensure the instructions from the different iterations of the loop can be
|
||||
executed in an arbitrary order, in parallel, or intertwined. No loop-carried
|
||||
dependency checking at all must be expected from the compiler.
|
||||
|
||||
In order to fulfill the LLVM requirement for metadata to be safely ignored,
|
||||
it is important to ensure that a parallel loop is converted to
|
||||
a sequential loop in case an optimization (agnostic of the parallel loop
|
||||
semantics) converts the loop back to such. This happens when new memory
|
||||
accesses that do not fulfill the requirement of free ordering across iterations
|
||||
are added to the loop. Therefore, this metadata is required, but not
|
||||
sufficient, to consider the loop at hand a parallel loop. For a loop
|
||||
to be parallel, all its memory accessing instructions need to be
|
||||
marked with the ``llvm.mem.parallel_loop_access`` metadata that refer
|
||||
to the same loop identifier metadata that identify the loop at hand.
|
||||
br i1 %exitcond, label %._crit_edge, label %.lr.ph, !llvm.loop !0
|
||||
...
|
||||
!0 = metadata !{ metadata !0, metadata !1 }
|
||||
!1 = metadata !{ metadata !"llvm.vectorizer.unroll", i32 2 }
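As a rough illustration of how a consumer might read such operands, the sketch below models a loop identifier as a plain C++ struct and collects the entries that share a name prefix. It is a standalone model rather than LLVM code: `HintNode`, `LoopIdNode`, and `collectHints` are hypothetical names, and the self-reference held in operand 0 is left implicit.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one hint operand such as
// !{ metadata !"llvm.vectorizer.unroll", i32 2 }.
struct HintNode {
  std::string name;
  unsigned value;
};

// Hypothetical stand-in for a loop identifier node. Operand 0 (the
// self-reference) is implicit; `hints` holds the remaining operands.
struct LoopIdNode {
  std::vector<HintNode> hints;
};

// Collect the hints whose name starts with `prefix`, keyed by the name with
// the prefix stripped. Operands with any other prefix are skipped.
std::map<std::string, unsigned> collectHints(const LoopIdNode &id,
                                             const std::string &prefix) {
  std::map<std::string, unsigned> out;
  for (const HintNode &h : id.hints) {
    if (h.name.compare(0, prefix.size(), prefix) != 0)
      continue;
    out[h.name.substr(prefix.size())] = h.value;
  }
  return out;
}
```

For a loop identifier shaped like the example above, `collectHints(id, "llvm.vectorizer.")` would return a single entry that maps `unroll` to 2.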
|
||||
|
||||
'``llvm.mem``'
|
||||
^^^^^^^^^^^^^^^
|
||||
@@ -2606,29 +2591,28 @@ for optimizations are prefixed with ``llvm.mem``.
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
For a loop to be parallel, in addition to using
|
||||
the ``llvm.loop.parallel`` metadata to mark the loop latch branch instruction,
|
||||
the ``llvm.loop`` metadata to mark the loop latch branch instruction,
|
||||
also all of the memory accessing instructions in the loop body need to be
|
||||
marked with the ``llvm.mem.parallel_loop_access`` metadata. If there
|
||||
is at least one memory accessing instruction not marked with the metadata,
|
||||
the loop, despite it possibly using the ``llvm.loop.parallel`` metadata,
|
||||
must be considered a sequential loop. This causes parallel loops to be
|
||||
the loop must be considered a sequential loop. This causes parallel loops to be
|
||||
converted to sequential loops due to optimization passes that are unaware of
|
||||
the parallel semantics and that insert new memory instructions to the loop
|
||||
body.
|
||||
|
||||
Example of a loop that is considered parallel due to its correct use of
|
||||
both ``llvm.loop.parallel`` and ``llvm.mem.parallel_loop_access``
|
||||
both ``llvm.loop`` and ``llvm.mem.parallel_loop_access``
|
||||
metadata types that refer to the same loop identifier metadata.
|
||||
|
||||
.. code-block:: llvm
|
||||
|
||||
for.body:
|
||||
...
|
||||
%0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
|
||||
...
|
||||
store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
|
||||
...
|
||||
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop.parallel !0
|
||||
...
|
||||
%0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
|
||||
...
|
||||
store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
|
||||
...
|
||||
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0
|
||||
|
||||
for.end:
|
||||
...
|
||||
@@ -2644,27 +2628,73 @@ the loop identifier metadata node directly:
|
||||
...
|
||||
|
||||
inner.for.body:
|
||||
...
|
||||
%0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
|
||||
...
|
||||
store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
|
||||
...
|
||||
br i1 %exitcond, label %inner.for.end, label %inner.for.body, !llvm.loop.parallel !1
|
||||
...
|
||||
%0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
|
||||
...
|
||||
store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
|
||||
...
|
||||
br i1 %exitcond, label %inner.for.end, label %inner.for.body, !llvm.loop !1
|
||||
|
||||
inner.for.end:
|
||||
...
|
||||
%0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
|
||||
...
|
||||
store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
|
||||
...
|
||||
br i1 %exitcond, label %outer.for.end, label %outer.for.body, !llvm.loop.parallel !2
|
||||
...
|
||||
%0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
|
||||
...
|
||||
store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
|
||||
...
|
||||
br i1 %exitcond, label %outer.for.end, label %outer.for.body, !llvm.loop !2
|
||||
|
||||
outer.for.end: ; preds = %for.body
|
||||
...
|
||||
!0 = metadata !{ metadata !1, metadata !2 } ; a list of parallel loop identifiers
|
||||
!1 = metadata !{ metadata !1 } ; an identifier for the inner parallel loop
|
||||
!2 = metadata !{ metadata !2 } ; an identifier for the outer parallel loop
|
||||
!0 = metadata !{ metadata !1, metadata !2 } ; a list of loop identifiers
|
||||
!1 = metadata !{ metadata !1 } ; an identifier for the inner loop
|
||||
!2 = metadata !{ metadata !2 } ; an identifier for the outer loop
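The rule that every memory access must refer to the loop's identifier, either directly or through a list of identifiers, can be sketched as a small standalone check. This is only a model under stated assumptions: nodes are identified by integers, and `Node`, `MemAccess`, `refersToLoop`, and `allAccessesMarked` are hypothetical names rather than LLVM APIs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of a metadata node: an integer id plus the ids of its
// operands. A loop identifier lists its own id as the first operand.
struct Node {
  int id;
  std::vector<int> operands;
};

// One memory access. `parallelAccess` is null when the instruction carries
// no llvm.mem.parallel_loop_access metadata.
struct MemAccess {
  const Node *parallelAccess;
};

// True if `md` is the loop identifier itself or a list that contains it.
bool refersToLoop(const Node &md, int loopId) {
  if (md.id == loopId)
    return true;
  for (int op : md.operands)
    if (op == loopId)
      return true;
  return false;
}

// A loop is treated as parallel only if every access refers to its id.
bool allAccessesMarked(int loopId, const std::vector<MemAccess> &accesses) {
  for (const MemAccess &a : accesses)
    if (a.parallelAccess == nullptr ||
        !refersToLoop(*a.parallelAccess, loopId))
      return false;
  return true;
}
```

A single unmarked access is enough to make the check fail, which is how a pass that is unaware of the parallel semantics ends up turning the loop back into a sequential one.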

'``llvm.vectorizer``'
^^^^^^^^^^^^^^^^^^^^^

Metadata prefixed with ``llvm.vectorizer`` is used to control per-loop
vectorization parameters such as vectorization factor and unroll factor.

``llvm.vectorizer`` metadata should be used in conjunction with ``llvm.loop``
loop identification metadata.

'``llvm.vectorizer.unroll``' Metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This metadata instructs the loop vectorizer to unroll the specified
loop exactly ``N`` times.

The first operand is the string ``llvm.vectorizer.unroll`` and the second
operand is an integer specifying the unroll factor. For example:

.. code-block:: llvm

   !0 = metadata !{ metadata !"llvm.vectorizer.unroll", i32 4 }

Note that setting ``llvm.vectorizer.unroll`` to 1 disables unrolling of the
loop.

If ``llvm.vectorizer.unroll`` is set to 0 then the amount of unrolling will be
determined automatically.
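The three cases described for ``llvm.vectorizer.unroll`` can be summarized in a few lines of standalone C++. This is a sketch of the documented meaning only; `UnrollChoice` and `classifyUnrollHint` are hypothetical names and do not appear in the vectorizer.

```cpp
#include <cassert>

// Hypothetical outcome of reading an llvm.vectorizer.unroll value.
enum class UnrollChoice { Automatic, Disabled, Fixed };

// Mirrors the documented meaning: 0 lets the vectorizer choose, 1 disables
// unrolling, and any other N requests exactly N copies of the loop body.
UnrollChoice classifyUnrollHint(unsigned n) {
  if (n == 0)
    return UnrollChoice::Automatic;
  if (n == 1)
    return UnrollChoice::Disabled;
  return UnrollChoice::Fixed;
}
```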

'``llvm.vectorizer.width``' Metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This metadata forces the loop vectorizer to widen scalar values to a vector
width of ``N`` rather than computing the width using a cost model.

The first operand is the string ``llvm.vectorizer.width`` and the second
operand is an integer specifying the width. For example:

.. code-block:: llvm

   !0 = metadata !{ metadata !"llvm.vectorizer.width", i32 4 }

Note that setting ``llvm.vectorizer.width`` to 1 disables vectorization of the
loop.

If ``llvm.vectorizer.width`` is set to 0 then the width will be determined
automatically.
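The `LoopVectorizeHints` constructor in this commit also lets an explicitly passed command-line option override the metadata, except when the metadata width is 1. A minimal standalone sketch of that precedence follows; `Option` and `resolveWidth` are hypothetical names, and the real code reads the `VectorizationFactor` option rather than a struct like this.

```cpp
#include <cassert>

// Hypothetical model of a command-line option: its value and whether the
// user actually passed it on the command line.
struct Option {
  unsigned value;
  bool given;
};

// Resolve the effective width. Loop metadata replaces the option's default,
// an explicitly passed option replaces the metadata, and a metadata width of
// 1 is never overridden because it marks a loop as already vectorized.
unsigned resolveWidth(Option cmdline, bool hasMetadata,
                      unsigned metadataWidth) {
  unsigned width = cmdline.value;  // start from the option's current value
  if (hasMetadata)
    width = metadataWidth;         // loop metadata, if present
  if (cmdline.given && width != 1)
    width = cmdline.value;         // explicit flag wins, except over width 1
  return width;
}
```

Keeping width 1 sticky is what prevents a forced width from re-vectorizing the scalar remainder loop that the vectorizer has already processed.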

Module Flags Metadata
=====================

@@ -50,6 +50,7 @@ inline void RemoveFromVector(std::vector<T*> &V, T *N) {
|
||||
class DominatorTree;
|
||||
class LoopInfo;
|
||||
class Loop;
|
||||
class MDNode;
|
||||
class PHINode;
|
||||
class raw_ostream;
|
||||
template<class N, class M> class LoopInfoBase;
|
||||
@@ -391,6 +392,22 @@ public:
|
||||
/// iterations.
|
||||
bool isAnnotatedParallel() const;
|
||||
|
||||
/// Return the llvm.loop loop id metadata node for this loop if it is present.
|
||||
///
|
||||
/// If this loop contains the same llvm.loop metadata on each branch to the
|
||||
/// header then the node is returned. If any latch instruction does not
|
||||
/// contain llvm.loop or if multiple latches contain different nodes then
|
||||
/// 0 is returned.
|
||||
MDNode *getLoopID() const;
|
||||
/// Set the llvm.loop loop id metadata for this loop.
|
||||
///
|
||||
/// The LoopID metadata node will be added to each terminator instruction in
|
||||
/// the loop that branches to the loop header.
|
||||
///
|
||||
/// The LoopID metadata node should have one or more operands and the first
|
||||
/// operand should be the node itself.
|
||||
void setLoopID(MDNode *LoopID) const;
|
||||
|
||||
/// hasDedicatedExits - Return true if no exit block for the loop
|
||||
/// has a predecessor that is outside the loop.
|
||||
bool hasDedicatedExits() const;
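The agreement rule stated in the `getLoopID` comment (return the node only when every branch to the header carries the same one) can be modeled without any LLVM types. The sketch below is an illustration of that documented contract, not the implementation in this commit; `commonLoopId` is a hypothetical name and node identity is reduced to a nonzero integer.

```cpp
#include <cassert>
#include <vector>

// Each entry is the llvm.loop node id attached to one branch back to the
// loop header, or 0 when that branch has no such metadata. Returns the
// shared id, or 0 when a branch lacks one or two branches disagree.
int commonLoopId(const std::vector<int> &backEdgeIds) {
  int loopId = 0;
  for (int id : backEdgeIds) {
    if (id == 0)
      return 0;        // a back edge without llvm.loop
    if (loopId == 0)
      loopId = id;     // first id seen
    else if (id != loopId)
      return 0;        // two back edges carry different nodes
  }
  return loopId;
}
```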
|
||||
|
@@ -50,6 +50,9 @@ INITIALIZE_PASS_BEGIN(LoopInfo, "loops", "Natural Loop Information", true, true)
|
||||
INITIALIZE_PASS_DEPENDENCY(DominatorTree)
|
||||
INITIALIZE_PASS_END(LoopInfo, "loops", "Natural Loop Information", true, true)
|
||||
|
||||
// Loop identifier metadata name.
|
||||
static const char* LoopMDName = "llvm.loop";
|
||||
|
||||
//===----------------------------------------------------------------------===//
|
||||
// Loop implementation
|
||||
//
|
||||
@@ -234,14 +237,62 @@ bool Loop::isSafeToClone() const {
|
||||
return true;
|
||||
}
|
||||
|
||||
MDNode *Loop::getLoopID() const {
|
||||
MDNode *LoopID = 0;
|
||||
if (isLoopSimplifyForm()) {
|
||||
LoopID = getLoopLatch()->getTerminator()->getMetadata(LoopMDName);
|
||||
} else {
|
||||
// Go through each predecessor of the loop header and check the
|
||||
// terminator for the metadata.
|
||||
BasicBlock *H = getHeader();
|
||||
for (block_iterator I = block_begin(), IE = block_end(); I != IE; ++I) {
|
||||
TerminatorInst *TI = (*I)->getTerminator();
|
||||
MDNode *MD = 0;
|
||||
|
||||
// Check if this terminator branches to the loop header.
|
||||
for (unsigned i = 0, ie = TI->getNumSuccessors(); i != ie; ++i) {
|
||||
if (TI->getSuccessor(i) == H) {
|
||||
MD = TI->getMetadata(LoopMDName);
|
||||
break;
|
||||
}
|
||||
}
|
||||
if (!MD)
|
||||
return 0;
|
||||
|
||||
if (!LoopID)
|
||||
LoopID = MD;
|
||||
else if (MD != LoopID)
|
||||
return 0;
|
||||
}
|
||||
}
|
||||
if (!LoopID || LoopID->getNumOperands() == 0 ||
|
||||
LoopID->getOperand(0) != LoopID)
|
||||
return 0;
|
||||
return LoopID;
|
||||
}
|
||||
|
||||
void Loop::setLoopID(MDNode *LoopID) const {
|
||||
assert(LoopID && "Loop ID should not be null");
|
||||
assert(LoopID->getNumOperands() > 0 && "Loop ID needs at least one operand");
|
||||
assert(LoopID->getOperand(0) == LoopID && "Loop ID should refer to itself");
|
||||
|
||||
if (isLoopSimplifyForm()) {
|
||||
getLoopLatch()->getTerminator()->setMetadata(LoopMDName, LoopID);
|
||||
return;
|
||||
}
|
||||
|
||||
BasicBlock *H = getHeader();
|
||||
for (block_iterator I = block_begin(), IE = block_end(); I != IE; ++I) {
|
||||
TerminatorInst *TI = (*I)->getTerminator();
|
||||
for (unsigned i = 0, ie = TI->getNumSuccessors(); i != ie; ++i) {
|
||||
if (TI->getSuccessor(i) == H)
|
||||
TI->setMetadata(LoopMDName, LoopID);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
bool Loop::isAnnotatedParallel() const {
|
||||
|
||||
BasicBlock *latch = getLoopLatch();
|
||||
if (latch == NULL)
|
||||
return false;
|
||||
|
||||
MDNode *desiredLoopIdMetadata =
|
||||
latch->getTerminator()->getMetadata("llvm.loop.parallel");
|
||||
MDNode *desiredLoopIdMetadata = getLoopID();
|
||||
|
||||
if (!desiredLoopIdMetadata)
|
||||
return false;
|
||||
|
@@ -119,11 +119,11 @@ static const unsigned TinyTripCountUnrollThreshold = 128;
|
||||
/// than this number of comparisons.
|
||||
static const unsigned RuntimeMemoryCheckThreshold = 8;
|
||||
|
||||
/// We use a metadata with this name to indicate that a scalar loop was
|
||||
/// vectorized and that we don't need to re-vectorize it if we run into it
|
||||
/// again.
|
||||
static const char*
|
||||
AlreadyVectorizedMDName = "llvm.vectorizer.already_vectorized";
|
||||
/// Maximum simd width.
|
||||
static const unsigned MaxVectorWidth = 64;
|
||||
|
||||
/// Maximum vectorization unroll count.
|
||||
static const unsigned MaxUnrollFactor = 16;
|
||||
|
||||
namespace {
|
||||
|
||||
@@ -768,6 +768,127 @@ private:
|
||||
const TargetLibraryInfo *TLI;
|
||||
};
|
||||
|
||||
/// Utility class for getting and setting loop vectorizer hints in the form
|
||||
/// of loop metadata.
|
||||
struct LoopVectorizeHints {
|
||||
/// Vectorization width.
|
||||
unsigned Width;
|
||||
/// Vectorization unroll factor.
|
||||
unsigned Unroll;
|
||||
|
||||
LoopVectorizeHints(const Loop *L)
|
||||
: Width(VectorizationFactor)
|
||||
, Unroll(VectorizationUnroll)
|
||||
, LoopID(L->getLoopID()) {
|
||||
getHints(L);
|
||||
// The command line options override any loop metadata except for when
|
||||
// width == 1 which is used to indicate the loop is already vectorized.
|
||||
if (VectorizationFactor.getNumOccurrences() > 0 && Width != 1)
|
||||
Width = VectorizationFactor;
|
||||
if (VectorizationUnroll.getNumOccurrences() > 0)
|
||||
Unroll = VectorizationUnroll;
|
||||
}
|
||||
|
||||
/// Return the loop vectorizer metadata prefix.
|
||||
static StringRef Prefix() { return "llvm.vectorizer."; }
|
||||
|
||||
MDNode *createHint(LLVMContext &Context, StringRef Name, unsigned V) {
|
||||
SmallVector<Value*, 2> Vals;
|
||||
Vals.push_back(MDString::get(Context, Name));
|
||||
Vals.push_back(ConstantInt::get(Type::getInt32Ty(Context), V));
|
||||
return MDNode::get(Context, Vals);
|
||||
}
|
||||
|
||||
/// Mark the loop L as already vectorized by setting the width to 1.
|
||||
void setAlreadyVectorized(Loop *L) {
|
||||
LLVMContext &Context = L->getHeader()->getContext();
|
||||
|
||||
Width = 1;
|
||||
|
||||
// Create a new loop id with one more operand for the already_vectorized
|
||||
// hint. If the loop already has a loop id then copy the existing operands.
|
||||
SmallVector<Value*, 4> Vals(1);
|
||||
if (LoopID)
|
||||
for (unsigned i = 1, ie = LoopID->getNumOperands(); i < ie; ++i)
|
||||
Vals.push_back(LoopID->getOperand(i));
|
||||
|
||||
Twine Name = Prefix() + "width";
|
||||
Vals.push_back(createHint(Context, Name.str(), Width));
|
||||
|
||||
MDNode *NewLoopID = MDNode::get(Context, Vals);
|
||||
// Set operand 0 to refer to the loop id itself.
|
||||
NewLoopID->replaceOperandWith(0, NewLoopID);
|
||||
|
||||
L->setLoopID(NewLoopID);
|
||||
if (LoopID)
|
||||
LoopID->replaceAllUsesWith(NewLoopID);
|
||||
|
||||
LoopID = NewLoopID;
|
||||
}
|
||||
|
||||
private:
|
||||
MDNode *LoopID;
|
||||
|
||||
/// Find hints specified in the loop metadata.
|
||||
void getHints(const Loop *L) {
|
||||
if (!LoopID)
|
||||
return;
|
||||
|
||||
// First operand should refer to the loop id itself.
|
||||
assert(LoopID->getNumOperands() > 0 && "requires at least one operand");
|
||||
assert(LoopID->getOperand(0) == LoopID && "invalid loop id");
|
||||
|
||||
for (unsigned i = 1, ie = LoopID->getNumOperands(); i < ie; ++i) {
|
||||
const MDString *S = 0;
|
||||
SmallVector<Value*, 4> Args;
|
||||
|
||||
// The expected hint is either a MDString or a MDNode with the first
|
||||
// operand a MDString.
|
||||
if (const MDNode *MD = dyn_cast<MDNode>(LoopID->getOperand(i))) {
|
||||
if (!MD || MD->getNumOperands() == 0)
|
||||
continue;
|
||||
S = dyn_cast<MDString>(MD->getOperand(0));
|
||||
for (unsigned i = 1, ie = MD->getNumOperands(); i < ie; ++i)
|
||||
Args.push_back(MD->getOperand(i));
|
||||
} else {
|
||||
S = dyn_cast<MDString>(LoopID->getOperand(i));
|
||||
assert(Args.size() == 0 && "too many arguments for MDString");
|
||||
}
|
||||
|
||||
if (!S)
|
||||
continue;
|
||||
|
||||
// Check if the hint starts with the vectorizer prefix.
|
||||
StringRef Hint = S->getString();
|
||||
if (!Hint.startswith(Prefix()))
|
||||
continue;
|
||||
// Remove the prefix.
|
||||
Hint = Hint.substr(Prefix().size(), StringRef::npos);
|
||||
|
||||
if (Args.size() == 1)
|
||||
getHint(Hint, Args[0]);
|
||||
}
|
||||
}
|
||||
|
||||
// Check string hint with one operand.
|
||||
void getHint(StringRef Hint, Value *Arg) {
|
||||
const ConstantInt *C = dyn_cast<ConstantInt>(Arg);
|
||||
if (!C) return;
|
||||
unsigned Val = C->getZExtValue();
|
||||
|
||||
if (Hint == "width") {
|
||||
assert(isPowerOf2_32(Val) && Val <= MaxVectorWidth &&
|
||||
"Invalid width metadata");
|
||||
Width = Val;
|
||||
} else if (Hint == "unroll") {
|
||||
assert(isPowerOf2_32(Val) && Val <= MaxUnrollFactor &&
|
||||
"Invalid unroll metadata");
|
||||
Unroll = Val;
|
||||
} else
|
||||
DEBUG(dbgs() << "LV: ignoring unknown hint " << Hint);
|
||||
}
|
||||
};
|
||||
|
||||
/// The LoopVectorize Pass.
|
||||
struct LoopVectorize : public LoopPass {
|
||||
/// Pass identification, replacement for typeid
|
||||
@@ -806,6 +927,13 @@ struct LoopVectorize : public LoopPass {
|
||||
DEBUG(dbgs() << "LV: Checking a loop in \"" <<
|
||||
L->getHeader()->getParent()->getName() << "\"\n");
|
||||
|
||||
LoopVectorizeHints Hints(L);
|
||||
|
||||
if (Hints.Width == 1) {
|
||||
DEBUG(dbgs() << "LV: Not vectorizing.\n");
|
||||
return false;
|
||||
}
|
||||
|
||||
// Check if it is legal to vectorize the loop.
|
||||
LoopVectorizationLegality LVL(L, SE, DL, DT, TTI, AA, TLI);
|
||||
if (!LVL.canVectorize()) {
|
||||
@@ -833,10 +961,10 @@ struct LoopVectorize : public LoopPass {
|
||||
|
||||
// Select the optimal vectorization factor.
|
||||
LoopVectorizationCostModel::VectorizationFactor VF;
|
||||
VF = CM.selectVectorizationFactor(OptForSize, VectorizationFactor);
|
||||
VF = CM.selectVectorizationFactor(OptForSize, Hints.Width);
|
||||
// Select the unroll factor.
|
||||
unsigned UF = CM.selectUnrollFactor(OptForSize, VectorizationUnroll,
|
||||
VF.Width, VF.Cost);
|
||||
unsigned UF = CM.selectUnrollFactor(OptForSize, Hints.Unroll, VF.Width,
|
||||
VF.Cost);
|
||||
|
||||
if (VF.Width == 1) {
|
||||
DEBUG(dbgs() << "LV: Vectorization is possible but not beneficial.\n");
|
||||
@@ -851,6 +979,9 @@ struct LoopVectorize : public LoopPass {
|
||||
InnerLoopVectorizer LB(L, SE, LI, DT, DL, TLI, VF.Width, UF);
|
||||
LB.vectorize(&LVL);
|
||||
|
||||
// Mark the loop as already vectorized to avoid vectorizing again.
|
||||
Hints.setAlreadyVectorized(L);
|
||||
|
||||
DEBUG(verifyFunction(*L->getHeader()->getParent()));
|
||||
return true;
|
||||
}
|
||||
@@ -1318,11 +1449,6 @@ InnerLoopVectorizer::createEmptyLoop(LoopVectorizationLegality *Legal) {
|
||||
BasicBlock *ExitBlock = OrigLoop->getExitBlock();
|
||||
assert(ExitBlock && "Must have an exit block");
|
||||
|
||||
// Mark the old scalar loop with metadata that tells us not to vectorize this
|
||||
// loop again if we run into it.
|
||||
MDNode *MD = MDNode::get(OldBasicBlock->getContext(), None);
|
||||
OldBasicBlock->getTerminator()->setMetadata(AlreadyVectorizedMDName, MD);
|
||||
|
||||
// Some loops have a single integer induction variable, while other loops
|
||||
// don't. One example is c++ iterators that often have multiple pointer
|
||||
// induction variables. In the code below we also support a case where we
|
||||
@@ -2516,13 +2642,6 @@ bool LoopVectorizationLegality::canVectorizeInstrs() {
|
||||
BasicBlock *PreHeader = TheLoop->getLoopPreheader();
|
||||
BasicBlock *Header = TheLoop->getHeader();
|
||||
|
||||
// If we marked the scalar loop as "already vectorized" then no need
|
||||
// to vectorize it again.
|
||||
if (Header->getTerminator()->getMetadata(AlreadyVectorizedMDName)) {
|
||||
DEBUG(dbgs() << "LV: This loop was vectorized before\n");
|
||||
return false;
|
||||
}
|
||||
|
||||
// Look for the attribute signaling the absence of NaNs.
|
||||
Function &F = *Header->getParent();
|
||||
if (F.hasFnAttribute("no-nans-fp-math"))
|
||||
|
@@ -21,7 +21,7 @@ for.end.us: ; preds = %for.body3.us
|
||||
%indvars.iv.next34 = add i64 %indvars.iv33, 1
|
||||
%lftr.wideiv35 = trunc i64 %indvars.iv.next34 to i32
|
||||
%exitcond36 = icmp eq i32 %lftr.wideiv35, %m
|
||||
br i1 %exitcond36, label %for.end15, label %for.body3.lr.ph.us, !llvm.loop.parallel !5
|
||||
br i1 %exitcond36, label %for.end15, label %for.body3.lr.ph.us, !llvm.loop !5
|
||||
|
||||
for.body3.us: ; preds = %for.body3.us, %for.body3.lr.ph.us
|
||||
%indvars.iv29 = phi i64 [ 0, %for.body3.lr.ph.us ], [ %indvars.iv.next30, %for.body3.us ]
|
||||
@@ -35,7 +35,7 @@ for.body3.us: ; preds = %for.body3.us, %for.
|
||||
%indvars.iv.next30 = add i64 %indvars.iv29, 1
|
||||
%lftr.wideiv31 = trunc i64 %indvars.iv.next30 to i32
|
||||
%exitcond32 = icmp eq i32 %lftr.wideiv31, %m
|
||||
br i1 %exitcond32, label %for.end.us, label %for.body3.us, !llvm.loop.parallel !4
|
||||
br i1 %exitcond32, label %for.end.us, label %for.body3.us, !llvm.loop !4
|
||||
|
||||
for.body3.lr.ph.us: ; preds = %for.end.us, %entry
|
||||
%indvars.iv33 = phi i64 [ %indvars.iv.next34, %for.end.us ], [ 0, %entry ]
|
||||
|
@@ -35,7 +35,7 @@ for.body: ; preds = %for.body.for.body_c
|
||||
%indvars.iv.next.reload = load i64* %indvars.iv.next.reg2mem
|
||||
%lftr.wideiv = trunc i64 %indvars.iv.next.reload to i32
|
||||
%exitcond = icmp eq i32 %lftr.wideiv, 512
|
||||
br i1 %exitcond, label %for.end, label %for.body.for.body_crit_edge, !llvm.loop.parallel !3
|
||||
br i1 %exitcond, label %for.end, label %for.body.for.body_crit_edge, !llvm.loop !3
|
||||
|
||||
for.body.for.body_crit_edge: ; preds = %for.body
|
||||
%indvars.iv.next.reload2 = load i64* %indvars.iv.next.reg2mem
|
||||
|
@@ -65,7 +65,7 @@ for.body: ; preds = %for.body, %entry
|
||||
store i32 %2, i32* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3
|
||||
%lftr.wideiv = trunc i64 %indvars.iv.next to i32
|
||||
%exitcond = icmp eq i32 %lftr.wideiv, 512
|
||||
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop.parallel !3
|
||||
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !3
|
||||
|
||||
for.end: ; preds = %for.body
|
||||
ret void
|
||||
@@ -98,7 +98,7 @@ for.body: ; preds = %for.body, %entry
|
||||
store i32 %2, i32* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !6
|
||||
%lftr.wideiv = trunc i64 %indvars.iv.next to i32
|
||||
%exitcond = icmp eq i32 %lftr.wideiv, 512
|
||||
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop.parallel !6
|
||||
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !6
|
||||
|
||||
for.end: ; preds = %for.body
|
||||
ret void
|
||||
|
41
test/Transforms/LoopVectorize/metadata-unroll.ll
Normal file
@@ -0,0 +1,41 @@
|
||||
; RUN: opt < %s -loop-vectorize -force-vector-width=4 -dce -instcombine -S | FileCheck %s
|
||||
|
||||
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
|
||||
target triple = "x86_64-apple-macosx10.8.0"
|
||||
|
||||
@a = common global [2048 x i32] zeroinitializer, align 16
|
||||
|
||||
; This is the loop.
|
||||
; for (i=0; i<n; i++){
|
||||
; a[i] += i;
|
||||
; }
|
||||
;CHECK: @inc
|
||||
;CHECK: load <4 x i32>
|
||||
;CHECK: load <4 x i32>
|
||||
;CHECK: add nsw <4 x i32>
|
||||
;CHECK: add nsw <4 x i32>
|
||||
;CHECK: store <4 x i32>
|
||||
;CHECK: store <4 x i32>
|
||||
;CHECK: ret void
|
||||
define void @inc(i32 %n) nounwind uwtable noinline ssp {
|
||||
%1 = icmp sgt i32 %n, 0
|
||||
br i1 %1, label %.lr.ph, label %._crit_edge
|
||||
|
||||
.lr.ph: ; preds = %0, %.lr.ph
|
||||
%indvars.iv = phi i64 [ %indvars.iv.next, %.lr.ph ], [ 0, %0 ]
|
||||
%2 = getelementptr inbounds [2048 x i32]* @a, i64 0, i64 %indvars.iv
|
||||
%3 = load i32* %2, align 4
|
||||
%4 = trunc i64 %indvars.iv to i32
|
||||
%5 = add nsw i32 %3, %4
|
||||
store i32 %5, i32* %2, align 4
|
||||
%indvars.iv.next = add i64 %indvars.iv, 1
|
||||
%lftr.wideiv = trunc i64 %indvars.iv.next to i32
|
||||
%exitcond = icmp eq i32 %lftr.wideiv, %n
|
||||
br i1 %exitcond, label %._crit_edge, label %.lr.ph, !llvm.loop !0
|
||||
|
||||
._crit_edge: ; preds = %.lr.ph, %0
|
||||
ret void
|
||||
}
|
||||
|
||||
!0 = metadata !{metadata !0, metadata !1}
|
||||
!1 = metadata !{metadata !"llvm.vectorizer.unroll", i32 2}
|
31
test/Transforms/LoopVectorize/metadata-width.ll
Normal file
@@ -0,0 +1,31 @@
|
||||
; RUN: opt < %s -loop-vectorize -force-vector-unroll=1 -dce -instcombine -S | FileCheck %s
|
||||
|
||||
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
|
||||
target triple = "x86_64-unknown-linux-gnu"
|
||||
|
||||
; CHECK: @test1
|
||||
; CHECK: store <8 x i32>
|
||||
; CHECK: ret void
|
||||
define void @test1(i32* nocapture %a, i32 %n) #0 {
|
||||
entry:
|
||||
%cmp4 = icmp sgt i32 %n, 0
|
||||
br i1 %cmp4, label %for.body, label %for.end
|
||||
|
||||
for.body: ; preds = %entry, %for.body
|
||||
%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
|
||||
%arrayidx = getelementptr inbounds i32* %a, i64 %indvars.iv
|
||||
%0 = trunc i64 %indvars.iv to i32
|
||||
store i32 %0, i32* %arrayidx, align 4
|
||||
%indvars.iv.next = add i64 %indvars.iv, 1
|
||||
%lftr.wideiv = trunc i64 %indvars.iv.next to i32
|
||||
%exitcond = icmp eq i32 %lftr.wideiv, %n
|
||||
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0
|
||||
|
||||
for.end: ; preds = %for.body, %entry
|
||||
ret void
|
||||
}
|
||||
|
||||
attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-frame-pointer-elim-non-leaf"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" }
|
||||
|
||||
!0 = metadata !{metadata !0, metadata !1}
|
||||
!1 = metadata !{metadata !"llvm.vectorizer.width", i32 8}
|
@@ -11,7 +11,7 @@ target triple = "x86_64-apple-macosx10.8.0"
|
||||
; This test checks that we add metadata to vectorized loops
|
||||
; CHECK: _Z4foo1Pii
|
||||
; CHECK: <4 x i32>
|
||||
; CHECK: llvm.vectorizer.already_vectorized
|
||||
; CHECK: llvm.loop
|
||||
; CHECK: ret
|
||||
|
||||
; This test comes from the loop:
|
||||
@@ -40,10 +40,10 @@ _ZSt10accumulateIPiiET0_T_S2_S1_.exit: ; preds = %for.body.i, %entry
|
||||
ret i32 %__init.addr.0.lcssa.i
|
||||
}
|
||||
|
||||
; This test checks that we don't vectorize loops that are marked with the "already vectorized" metadata.
|
||||
; This test checks that we don't vectorize loops that are marked with the "width" == 1 metadata.
|
||||
; CHECK: _Z4foo2Pii
|
||||
; CHECK-NOT: <4 x i32>
|
||||
; CHECK: llvm.vectorizer.already_vectorized
|
||||
; CHECK: llvm.loop
|
||||
; CHECK: ret
|
||||
define i32 @_Z4foo2Pii(i32* %A, i32 %n) #0 {
|
||||
entry:
|
||||
@@ -59,7 +59,7 @@ for.body.i: ; preds = %entry, %for.body.i
|
||||
%add.i = add nsw i32 %0, %__init.addr.05.i
|
||||
%incdec.ptr.i = getelementptr inbounds i32* %__first.addr.04.i, i64 1
|
||||
%cmp.i = icmp eq i32* %incdec.ptr.i, %add.ptr
|
||||
br i1 %cmp.i, label %_ZSt10accumulateIPiiET0_T_S2_S1_.exit, label %for.body.i, !llvm.vectorizer.already_vectorized !3
|
||||
br i1 %cmp.i, label %_ZSt10accumulateIPiiET0_T_S2_S1_.exit, label %for.body.i, !llvm.loop !0
|
||||
|
||||
_ZSt10accumulateIPiiET0_T_S2_S1_.exit: ; preds = %for.body.i, %entry
|
||||
%__init.addr.0.lcssa.i = phi i32 [ 0, %entry ], [ %add.i, %for.body.i ]
|
||||
@@ -68,5 +68,9 @@ _ZSt10accumulateIPiiET0_T_S2_S1_.exit: ; preds = %for.body.i, %entry
|
||||
|
||||
attributes #0 = { nounwind readonly ssp uwtable "fp-contract-model"="standard" "no-frame-pointer-elim" "no-frame-pointer-elim-non-leaf" "realign-stack" "relocation-model"="pic" "ssp-buffers-size"="8" }
|
||||
|
||||
!3 = metadata !{}
|
||||
; CHECK: !0 = metadata !{metadata !0, metadata !1}
|
||||
; CHECK: !1 = metadata !{metadata !"llvm.vectorizer.width", i32 1}
|
||||
; CHECK: !2 = metadata !{metadata !2, metadata !1}
|
||||
|
||||
!0 = metadata !{metadata !0, metadata !1}
|
||||
!1 = metadata !{metadata !"llvm.vectorizer.width", i32 1}
|
||||
|