Add support for llvm.vectorizer metadata

- llvm.loop.parallel metadata has been renamed to llvm.loop to be more generic
  by making it the root of additional loop metadata.
  - Loop::isAnnotatedParallel now looks for llvm.loop and associated
    llvm.mem.parallel_loop_access
  - document llvm.loop and update llvm.mem.parallel_loop_access
- add support for llvm.vectorizer.width and llvm.vectorizer.unroll
  - document llvm.vectorizer.* metadata
  - add utility class LoopVectorizeHints for getting/setting loop metadata
  - use llvm.vectorizer.width=1 to indicate already vectorized instead of
    already_vectorized
- update existing tests that used llvm.loop.parallel and
  llvm.vectorizer.already_vectorized

Reviewed by: Nadav Rotem


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182802 91177308-0d34-0410-b5e6-96231b3b80d8
Paul Redmond 2013-05-28 20:00:34 +00:00
parent a32edcfbc5
commit ee21b6f7b4
10 changed files with 381 additions and 88 deletions


@ -2554,8 +2554,8 @@ Examples:
It is sometimes useful to attach information to loop constructs. Currently,
loop metadata is implemented as metadata attached to the branch instruction
in the loop latch block. This type of metadata refers to a metadata node that is
guaranteed to be separate for each loop. The loop-level metadata is prefixed
with ``llvm.loop``.
guaranteed to be separate for each loop. The loop identifier metadata is
specified with the name ``llvm.loop``.
The loop identifier metadata is implemented using a metadata that refers to
itself to avoid merging it with any other identifier metadata, e.g.,
@ -2569,32 +2569,17 @@ constructs:
!0 = metadata !{ metadata !0 }
!1 = metadata !{ metadata !1 }
The loop identifier metadata can be used to specify additional per-loop
metadata. Any operands after the first operand can be treated as user-defined
metadata. For example, the ``llvm.vectorizer.unroll`` metadata is understood
by the loop vectorizer to indicate how many times to unroll the loop:
'``llvm.loop.parallel``' Metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: llvm
This loop metadata can be used to communicate that a loop should be considered
a parallel loop. The semantics of parallel loops in this case is the one
with the strongest cross-iteration instruction ordering freedom: the
iterations in the loop can be considered completely independent of each
other (also known as embarrassingly parallel loops).
This metadata can originate from a programming language with parallel loop
constructs. In such a case it is completely the programmer's responsibility
to ensure the instructions from the different iterations of the loop can be
executed in an arbitrary order, in parallel, or intertwined. No loop-carried
dependency checking at all must be expected from the compiler.
In order to fulfill the LLVM requirement for metadata to be safely ignored,
it is important to ensure that a parallel loop is converted to
a sequential loop in case an optimization (agnostic of the parallel loop
semantics) converts the loop back to such. This happens when new memory
accesses that do not fulfill the requirement of free ordering across iterations
are added to the loop. Therefore, this metadata is required, but not
sufficient, to consider the loop at hand a parallel loop. For a loop
to be parallel, all its memory accessing instructions need to be
marked with the ``llvm.mem.parallel_loop_access`` metadata that refer
to the same loop identifier metadata that identify the loop at hand.
br i1 %exitcond, label %._crit_edge, label %.lr.ph, !llvm.loop !0
...
!0 = metadata !{ metadata !0, metadata !1 }
!1 = metadata !{ metadata !"llvm.vectorizer.unroll", i32 2 }
'``llvm.mem``'
^^^^^^^^^^^^^^^
@ -2606,29 +2591,28 @@ for optimizations are prefixed with ``llvm.mem``.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
For a loop to be parallel, in addition to using
the ``llvm.loop.parallel`` metadata to mark the loop latch branch instruction,
the ``llvm.loop`` metadata to mark the loop latch branch instruction,
also all of the memory accessing instructions in the loop body need to be
marked with the ``llvm.mem.parallel_loop_access`` metadata. If there
is at least one memory accessing instruction not marked with the metadata,
the loop, despite it possibly using the ``llvm.loop.parallel`` metadata,
must be considered a sequential loop. This causes parallel loops to be
the loop must be considered a sequential loop. This causes parallel loops to be
converted to sequential loops due to optimization passes that are unaware of
the parallel semantics and that insert new memory instructions to the loop
body.
Example of a loop that is considered parallel due to its correct use of
both ``llvm.loop.parallel`` and ``llvm.mem.parallel_loop_access``
both ``llvm.loop`` and ``llvm.mem.parallel_loop_access``
metadata types that refer to the same loop identifier metadata.
.. code-block:: llvm
for.body:
...
%0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
...
store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
...
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop.parallel !0
...
%0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
...
store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
...
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0
for.end:
...
@ -2644,27 +2628,73 @@ the loop identifier metadata node directly:
...
inner.for.body:
...
%0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
...
store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
...
br i1 %exitcond, label %inner.for.end, label %inner.for.body, !llvm.loop.parallel !1
...
%0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
...
store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
...
br i1 %exitcond, label %inner.for.end, label %inner.for.body, !llvm.loop !1
inner.for.end:
...
%0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
...
store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
...
br i1 %exitcond, label %outer.for.end, label %outer.for.body, !llvm.loop.parallel !2
...
%0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
...
store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
...
br i1 %exitcond, label %outer.for.end, label %outer.for.body, !llvm.loop !2
outer.for.end: ; preds = %for.body
...
!0 = metadata !{ metadata !1, metadata !2 } ; a list of parallel loop identifiers
!1 = metadata !{ metadata !1 } ; an identifier for the inner parallel loop
!2 = metadata !{ metadata !2 } ; an identifier for the outer parallel loop
!0 = metadata !{ metadata !1, metadata !2 } ; a list of loop identifiers
!1 = metadata !{ metadata !1 } ; an identifier for the inner loop
!2 = metadata !{ metadata !2 } ; an identifier for the outer loop
'``llvm.vectorizer``'
^^^^^^^^^^^^^^^^^^^^^
Metadata prefixed with ``llvm.vectorizer`` is used to control per-loop
vectorization parameters such as vectorization factor and unroll factor.
``llvm.vectorizer`` metadata should be used in conjunction with ``llvm.loop``
loop identification metadata.
'``llvm.vectorizer.unroll``' Metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This metadata instructs the loop vectorizer to unroll the specified
loop exactly ``N`` times.
The first operand is the string ``llvm.vectorizer.unroll`` and the second
operand is an integer specifying the unroll factor. For example:
.. code-block:: llvm
!0 = metadata !{ metadata !"llvm.vectorizer.unroll", i32 4 }
Note that setting ``llvm.vectorizer.unroll`` to 1 disables unrolling of the
loop.
If ``llvm.vectorizer.unroll`` is set to 0 then the amount of unrolling will be
determined automatically.
'``llvm.vectorizer.width``' Metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This metadata forces the loop vectorizer to widen scalar values to a vector
width of ``N`` rather than computing the width using a cost model.
The first operand is the string ``llvm.vectorizer.width`` and the second
operand is an integer specifying the width. For example:
.. code-block:: llvm
!0 = metadata !{ metadata !"llvm.vectorizer.width", i32 4 }
Note that setting ``llvm.vectorizer.width`` to 1 disables vectorization of the
loop.
If ``llvm.vectorizer.width`` is set to 0 then the width will be determined
automatically.
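The three cases described above for both ``llvm.vectorizer.unroll`` and ``llvm.vectorizer.width`` (0, 1, and any other value) can be summarized in a small standalone C++ sketch. The names ``HintAction`` and ``classifyHint`` are hypothetical and exist only for illustration. This models the documented behavior and is not code taken from the vectorizer.

```cpp
#include <cassert>

// Hypothetical classification of a hint value; not part of LLVM.
enum class HintAction {
  Automatic, // value 0: the vectorizer picks the factor itself
  Disabled,  // value 1: no unrolling (unroll) or no vectorization (width)
  Forced     // any other value: use the requested factor
};

// Map the integer operand of an llvm.vectorizer.* hint to its documented
// meaning. The same rule applies to both the unroll and the width hint.
inline HintAction classifyHint(unsigned Value) {
  if (Value == 0)
    return HintAction::Automatic;
  if (Value == 1)
    return HintAction::Disabled;
  return HintAction::Forced;
}
```

For instance, ``classifyHint(4)`` yields ``HintAction::Forced``, matching the ``i32 4`` examples above.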
Module Flags Metadata
=====================


@ -50,6 +50,7 @@ inline void RemoveFromVector(std::vector<T*> &V, T *N) {
class DominatorTree;
class LoopInfo;
class Loop;
class MDNode;
class PHINode;
class raw_ostream;
template<class N, class M> class LoopInfoBase;
@ -391,6 +392,22 @@ public:
/// iterations.
bool isAnnotatedParallel() const;
/// Return the llvm.loop loop id metadata node for this loop if it is present.
///
/// If this loop contains the same llvm.loop metadata on each branch to the
/// header then the node is returned. If any latch instruction does not
/// contain llvm.loop or if multiple latches contain different nodes then
/// 0 is returned.
MDNode *getLoopID() const;
/// Set the llvm.loop loop id metadata for this loop.
///
/// The LoopID metadata node will be added to each terminator instruction in
/// the loop that branches to the loop header.
///
/// The LoopID metadata node should have one or more operands and the first
/// operand should be the node itself.
void setLoopID(MDNode *LoopID) const;
/// hasDedicatedExits - Return true if no exit block for the loop
/// has a predecessor that is outside the loop.
bool hasDedicatedExits() const;
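
The contract documented for getLoopID above (every branch to the header must carry the same node, and that node must refer to itself in operand 0) can be illustrated with a minimal standalone sketch. It assumes a hypothetical `Node` type standing in for a metadata node and a hypothetical helper `commonLoopID`; neither is part of LLVM, and the input is assumed to list only the terminators that branch to the loop header.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a metadata node; not the LLVM MDNode class.
struct Node {
  std::vector<const Node *> Operands;
};

// LatchIDs holds, for each terminator that branches to the loop header, the
// attached llvm.loop node, or nullptr when the terminator has none.
// Returns the shared loop ID, or nullptr if any entry is missing, if the
// entries disagree, or if the node is not self-referential in operand 0.
inline const Node *commonLoopID(const std::vector<const Node *> &LatchIDs) {
  const Node *LoopID = nullptr;
  for (const Node *MD : LatchIDs) {
    if (!MD)
      return nullptr; // a branch to the header without llvm.loop
    if (!LoopID)
      LoopID = MD;    // first node seen becomes the candidate
    else if (MD != LoopID)
      return nullptr; // two latches carry different nodes
  }
  if (!LoopID || LoopID->Operands.empty() || LoopID->Operands[0] != LoopID)
    return nullptr;   // empty input or not a valid loop identifier
  return LoopID;
}
```

A node built as `Node A; A.Operands.push_back(&A);` is accepted when it is the only node present, which mirrors the self-referencing `!0 = metadata !{ metadata !0 }` form used for loop identifiers.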


@ -50,6 +50,9 @@ INITIALIZE_PASS_BEGIN(LoopInfo, "loops", "Natural Loop Information", true, true)
INITIALIZE_PASS_DEPENDENCY(DominatorTree)
INITIALIZE_PASS_END(LoopInfo, "loops", "Natural Loop Information", true, true)
// Loop identifier metadata name.
static const char* LoopMDName = "llvm.loop";
//===----------------------------------------------------------------------===//
// Loop implementation
//
@ -234,14 +237,62 @@ bool Loop::isSafeToClone() const {
return true;
}
MDNode *Loop::getLoopID() const {
MDNode *LoopID = 0;
if (isLoopSimplifyForm()) {
LoopID = getLoopLatch()->getTerminator()->getMetadata(LoopMDName);
} else {
// Go through each predecessor of the loop header and check the
// terminator for the metadata.
BasicBlock *H = getHeader();
for (block_iterator I = block_begin(), IE = block_end(); I != IE; ++I) {
TerminatorInst *TI = (*I)->getTerminator();
MDNode *MD = 0;
// Check if this terminator branches to the loop header.
for (unsigned i = 0, ie = TI->getNumSuccessors(); i != ie; ++i) {
if (TI->getSuccessor(i) == H) {
MD = TI->getMetadata(LoopMDName);
break;
}
}
if (!MD)
return 0;
if (!LoopID)
LoopID = MD;
else if (MD != LoopID)
return 0;
}
}
if (!LoopID || LoopID->getNumOperands() == 0 ||
LoopID->getOperand(0) != LoopID)
return 0;
return LoopID;
}
void Loop::setLoopID(MDNode *LoopID) const {
assert(LoopID && "Loop ID should not be null");
assert(LoopID->getNumOperands() > 0 && "Loop ID needs at least one operand");
assert(LoopID->getOperand(0) == LoopID && "Loop ID should refer to itself");
if (isLoopSimplifyForm()) {
getLoopLatch()->getTerminator()->setMetadata(LoopMDName, LoopID);
return;
}
BasicBlock *H = getHeader();
for (block_iterator I = block_begin(), IE = block_end(); I != IE; ++I) {
TerminatorInst *TI = (*I)->getTerminator();
for (unsigned i = 0, ie = TI->getNumSuccessors(); i != ie; ++i) {
if (TI->getSuccessor(i) == H)
TI->setMetadata(LoopMDName, LoopID);
}
}
}
bool Loop::isAnnotatedParallel() const {
BasicBlock *latch = getLoopLatch();
if (latch == NULL)
return false;
MDNode *desiredLoopIdMetadata =
latch->getTerminator()->getMetadata("llvm.loop.parallel");
MDNode *desiredLoopIdMetadata = getLoopID();
if (!desiredLoopIdMetadata)
return false;


@ -119,11 +119,11 @@ static const unsigned TinyTripCountUnrollThreshold = 128;
/// than this number of comparisons.
static const unsigned RuntimeMemoryCheckThreshold = 8;
/// We use a metadata with this name to indicate that a scalar loop was
/// vectorized and that we don't need to re-vectorize it if we run into it
/// again.
static const char*
AlreadyVectorizedMDName = "llvm.vectorizer.already_vectorized";
/// Maximum simd width.
static const unsigned MaxVectorWidth = 64;
/// Maximum vectorization unroll count.
static const unsigned MaxUnrollFactor = 16;
namespace {
@ -768,6 +768,127 @@ private:
const TargetLibraryInfo *TLI;
};
/// Utility class for getting and setting loop vectorizer hints in the form
/// of loop metadata.
struct LoopVectorizeHints {
/// Vectorization width.
unsigned Width;
/// Vectorization unroll factor.
unsigned Unroll;
LoopVectorizeHints(const Loop *L)
: Width(VectorizationFactor)
, Unroll(VectorizationUnroll)
, LoopID(L->getLoopID()) {
getHints(L);
// The command line options override any loop metadata except for when
// width == 1 which is used to indicate the loop is already vectorized.
if (VectorizationFactor.getNumOccurrences() > 0 && Width != 1)
Width = VectorizationFactor;
if (VectorizationUnroll.getNumOccurrences() > 0)
Unroll = VectorizationUnroll;
}
/// Return the loop vectorizer metadata prefix.
static StringRef Prefix() { return "llvm.vectorizer."; }
MDNode *createHint(LLVMContext &Context, StringRef Name, unsigned V) {
SmallVector<Value*, 2> Vals;
Vals.push_back(MDString::get(Context, Name));
Vals.push_back(ConstantInt::get(Type::getInt32Ty(Context), V));
return MDNode::get(Context, Vals);
}
/// Mark the loop L as already vectorized by setting the width to 1.
void setAlreadyVectorized(Loop *L) {
LLVMContext &Context = L->getHeader()->getContext();
Width = 1;
// Create a new loop id with one more operand for the already_vectorized
// hint. If the loop already has a loop id then copy the existing operands.
SmallVector<Value*, 4> Vals(1);
if (LoopID)
for (unsigned i = 1, ie = LoopID->getNumOperands(); i < ie; ++i)
Vals.push_back(LoopID->getOperand(i));
Twine Name = Prefix() + "width";
Vals.push_back(createHint(Context, Name.str(), Width));
MDNode *NewLoopID = MDNode::get(Context, Vals);
// Set operand 0 to refer to the loop id itself.
NewLoopID->replaceOperandWith(0, NewLoopID);
L->setLoopID(NewLoopID);
if (LoopID)
LoopID->replaceAllUsesWith(NewLoopID);
LoopID = NewLoopID;
}
private:
MDNode *LoopID;
/// Find hints specified in the loop metadata.
void getHints(const Loop *L) {
if (!LoopID)
return;
// First operand should refer to the loop id itself.
assert(LoopID->getNumOperands() > 0 && "requires at least one operand");
assert(LoopID->getOperand(0) == LoopID && "invalid loop id");
for (unsigned i = 1, ie = LoopID->getNumOperands(); i < ie; ++i) {
const MDString *S = 0;
SmallVector<Value*, 4> Args;
// The expected hint is either a MDString or a MDNode with the first
// operand a MDString.
if (const MDNode *MD = dyn_cast<MDNode>(LoopID->getOperand(i))) {
if (!MD || MD->getNumOperands() == 0)
continue;
S = dyn_cast<MDString>(MD->getOperand(0));
for (unsigned i = 1, ie = MD->getNumOperands(); i < ie; ++i)
Args.push_back(MD->getOperand(i));
} else {
S = dyn_cast<MDString>(LoopID->getOperand(i));
assert(Args.size() == 0 && "too many arguments for MDString");
}
if (!S)
continue;
// Check if the hint starts with the vectorizer prefix.
StringRef Hint = S->getString();
if (!Hint.startswith(Prefix()))
continue;
// Remove the prefix.
Hint = Hint.substr(Prefix().size(), StringRef::npos);
if (Args.size() == 1)
getHint(Hint, Args[0]);
}
}
// Check string hint with one operand.
void getHint(StringRef Hint, Value *Arg) {
const ConstantInt *C = dyn_cast<ConstantInt>(Arg);
if (!C) return;
unsigned Val = C->getZExtValue();
if (Hint == "width") {
assert(isPowerOf2_32(Val) && Val <= MaxVectorWidth &&
"Invalid width metadata");
Width = Val;
} else if (Hint == "unroll") {
assert(isPowerOf2_32(Val) && Val <= MaxUnrollFactor &&
"Invalid unroll metadata");
Unroll = Val;
} else
DEBUG(dbgs() << "LV: ignoring unknown hint " << Hint << "\n");
}
};
/// The LoopVectorize Pass.
struct LoopVectorize : public LoopPass {
/// Pass identification, replacement for typeid
@ -806,6 +927,13 @@ struct LoopVectorize : public LoopPass {
DEBUG(dbgs() << "LV: Checking a loop in \"" <<
L->getHeader()->getParent()->getName() << "\"\n");
LoopVectorizeHints Hints(L);
if (Hints.Width == 1) {
DEBUG(dbgs() << "LV: Not vectorizing.\n");
return false;
}
// Check if it is legal to vectorize the loop.
LoopVectorizationLegality LVL(L, SE, DL, DT, TTI, AA, TLI);
if (!LVL.canVectorize()) {
@ -833,10 +961,10 @@ struct LoopVectorize : public LoopPass {
// Select the optimal vectorization factor.
LoopVectorizationCostModel::VectorizationFactor VF;
VF = CM.selectVectorizationFactor(OptForSize, VectorizationFactor);
VF = CM.selectVectorizationFactor(OptForSize, Hints.Width);
// Select the unroll factor.
unsigned UF = CM.selectUnrollFactor(OptForSize, VectorizationUnroll,
VF.Width, VF.Cost);
unsigned UF = CM.selectUnrollFactor(OptForSize, Hints.Unroll, VF.Width,
VF.Cost);
if (VF.Width == 1) {
DEBUG(dbgs() << "LV: Vectorization is possible but not beneficial.\n");
@ -851,6 +979,9 @@ struct LoopVectorize : public LoopPass {
InnerLoopVectorizer LB(L, SE, LI, DT, DL, TLI, VF.Width, UF);
LB.vectorize(&LVL);
// Mark the loop as already vectorized to avoid vectorizing again.
Hints.setAlreadyVectorized(L);
DEBUG(verifyFunction(*L->getHeader()->getParent()));
return true;
}
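
The precedence applied by the LoopVectorizeHints constructor above (loop metadata replaces the option default, an explicit command-line option replaces the metadata, except that a width of 1 from metadata is kept) can be modeled in isolation. The sketch below is a simplified standalone illustration; `Setting`, `resolveWidth` and `resolveUnroll` are hypothetical names, and the `Default` parameter stands for whatever default the command-line option has, which this diff does not show.

```cpp
#include <cassert>

// Hypothetical description of one source of a value: whether it was
// supplied at all, and what it was. Not part of LLVM.
struct Setting {
  bool Present;
  unsigned Value;
};

// Width: start from the option (explicit or default), let metadata replace
// it, then let an explicit option win unless metadata set the width to 1,
// which marks the loop as already vectorized.
inline unsigned resolveWidth(unsigned Default, Setting Flag, Setting Meta) {
  unsigned Width = Flag.Present ? Flag.Value : Default;
  if (Meta.Present)
    Width = Meta.Value;
  if (Flag.Present && Width != 1)
    Width = Flag.Value;
  return Width;
}

// Unroll: same order, but an explicit option always wins.
inline unsigned resolveUnroll(unsigned Default, Setting Flag, Setting Meta) {
  unsigned Unroll = Flag.Present ? Flag.Value : Default;
  if (Meta.Present)
    Unroll = Meta.Value;
  if (Flag.Present)
    Unroll = Flag.Value;
  return Unroll;
}
```

Under this model, an explicit width option of 4 combined with width metadata of 8 resolves to 4, while the same option combined with width metadata of 1 resolves to 1, so a loop marked by setAlreadyVectorized is not vectorized again.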
@ -1318,11 +1449,6 @@ InnerLoopVectorizer::createEmptyLoop(LoopVectorizationLegality *Legal) {
BasicBlock *ExitBlock = OrigLoop->getExitBlock();
assert(ExitBlock && "Must have an exit block");
// Mark the old scalar loop with metadata that tells us not to vectorize this
// loop again if we run into it.
MDNode *MD = MDNode::get(OldBasicBlock->getContext(), None);
OldBasicBlock->getTerminator()->setMetadata(AlreadyVectorizedMDName, MD);
// Some loops have a single integer induction variable, while other loops
// don't. One example is c++ iterators that often have multiple pointer
// induction variables. In the code below we also support a case where we
@ -2516,13 +2642,6 @@ bool LoopVectorizationLegality::canVectorizeInstrs() {
BasicBlock *PreHeader = TheLoop->getLoopPreheader();
BasicBlock *Header = TheLoop->getHeader();
// If we marked the scalar loop as "already vectorized" then no need
// to vectorize it again.
if (Header->getTerminator()->getMetadata(AlreadyVectorizedMDName)) {
DEBUG(dbgs() << "LV: This loop was vectorized before\n");
return false;
}
// Look for the attribute signaling the absence of NaNs.
Function &F = *Header->getParent();
if (F.hasFnAttribute("no-nans-fp-math"))


@ -21,7 +21,7 @@ for.end.us: ; preds = %for.body3.us
%indvars.iv.next34 = add i64 %indvars.iv33, 1
%lftr.wideiv35 = trunc i64 %indvars.iv.next34 to i32
%exitcond36 = icmp eq i32 %lftr.wideiv35, %m
br i1 %exitcond36, label %for.end15, label %for.body3.lr.ph.us, !llvm.loop.parallel !5
br i1 %exitcond36, label %for.end15, label %for.body3.lr.ph.us, !llvm.loop !5
for.body3.us: ; preds = %for.body3.us, %for.body3.lr.ph.us
%indvars.iv29 = phi i64 [ 0, %for.body3.lr.ph.us ], [ %indvars.iv.next30, %for.body3.us ]
@ -35,7 +35,7 @@ for.body3.us: ; preds = %for.body3.us, %for.
%indvars.iv.next30 = add i64 %indvars.iv29, 1
%lftr.wideiv31 = trunc i64 %indvars.iv.next30 to i32
%exitcond32 = icmp eq i32 %lftr.wideiv31, %m
br i1 %exitcond32, label %for.end.us, label %for.body3.us, !llvm.loop.parallel !4
br i1 %exitcond32, label %for.end.us, label %for.body3.us, !llvm.loop !4
for.body3.lr.ph.us: ; preds = %for.end.us, %entry
%indvars.iv33 = phi i64 [ %indvars.iv.next34, %for.end.us ], [ 0, %entry ]


@ -35,7 +35,7 @@ for.body: ; preds = %for.body.for.body_c
%indvars.iv.next.reload = load i64* %indvars.iv.next.reg2mem
%lftr.wideiv = trunc i64 %indvars.iv.next.reload to i32
%exitcond = icmp eq i32 %lftr.wideiv, 512
br i1 %exitcond, label %for.end, label %for.body.for.body_crit_edge, !llvm.loop.parallel !3
br i1 %exitcond, label %for.end, label %for.body.for.body_crit_edge, !llvm.loop !3
for.body.for.body_crit_edge: ; preds = %for.body
%indvars.iv.next.reload2 = load i64* %indvars.iv.next.reg2mem


@ -65,7 +65,7 @@ for.body: ; preds = %for.body, %entry
store i32 %2, i32* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3
%lftr.wideiv = trunc i64 %indvars.iv.next to i32
%exitcond = icmp eq i32 %lftr.wideiv, 512
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop.parallel !3
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !3
for.end: ; preds = %for.body
ret void
@ -98,7 +98,7 @@ for.body: ; preds = %for.body, %entry
store i32 %2, i32* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !6
%lftr.wideiv = trunc i64 %indvars.iv.next to i32
%exitcond = icmp eq i32 %lftr.wideiv, 512
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop.parallel !6
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !6
for.end: ; preds = %for.body
ret void


@ -0,0 +1,41 @@
; RUN: opt < %s -loop-vectorize -force-vector-width=4 -dce -instcombine -S | FileCheck %s
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.8.0"
@a = common global [2048 x i32] zeroinitializer, align 16
; This is the loop.
; for (i=0; i<n; i++){
; a[i] += i;
; }
;CHECK: @inc
;CHECK: load <4 x i32>
;CHECK: load <4 x i32>
;CHECK: add nsw <4 x i32>
;CHECK: add nsw <4 x i32>
;CHECK: store <4 x i32>
;CHECK: store <4 x i32>
;CHECK: ret void
define void @inc(i32 %n) nounwind uwtable noinline ssp {
%1 = icmp sgt i32 %n, 0
br i1 %1, label %.lr.ph, label %._crit_edge
.lr.ph: ; preds = %0, %.lr.ph
%indvars.iv = phi i64 [ %indvars.iv.next, %.lr.ph ], [ 0, %0 ]
%2 = getelementptr inbounds [2048 x i32]* @a, i64 0, i64 %indvars.iv
%3 = load i32* %2, align 4
%4 = trunc i64 %indvars.iv to i32
%5 = add nsw i32 %3, %4
store i32 %5, i32* %2, align 4
%indvars.iv.next = add i64 %indvars.iv, 1
%lftr.wideiv = trunc i64 %indvars.iv.next to i32
%exitcond = icmp eq i32 %lftr.wideiv, %n
br i1 %exitcond, label %._crit_edge, label %.lr.ph, !llvm.loop !0
._crit_edge: ; preds = %.lr.ph, %0
ret void
}
!0 = metadata !{metadata !0, metadata !1}
!1 = metadata !{metadata !"llvm.vectorizer.unroll", i32 2}


@ -0,0 +1,31 @@
; RUN: opt < %s -loop-vectorize -force-vector-unroll=1 -dce -instcombine -S | FileCheck %s
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
; CHECK: @test1
; CHECK: store <8 x i32>
; CHECK: ret void
define void @test1(i32* nocapture %a, i32 %n) #0 {
entry:
%cmp4 = icmp sgt i32 %n, 0
br i1 %cmp4, label %for.body, label %for.end
for.body: ; preds = %entry, %for.body
%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
%arrayidx = getelementptr inbounds i32* %a, i64 %indvars.iv
%0 = trunc i64 %indvars.iv to i32
store i32 %0, i32* %arrayidx, align 4
%indvars.iv.next = add i64 %indvars.iv, 1
%lftr.wideiv = trunc i64 %indvars.iv.next to i32
%exitcond = icmp eq i32 %lftr.wideiv, %n
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0
for.end: ; preds = %for.body, %entry
ret void
}
attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-frame-pointer-elim-non-leaf"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" }
!0 = metadata !{metadata !0, metadata !1}
!1 = metadata !{metadata !"llvm.vectorizer.width", i32 8}


@ -11,7 +11,7 @@ target triple = "x86_64-apple-macosx10.8.0"
; This test checks that we add metadata to vectorized loops
; CHECK: _Z4foo1Pii
; CHECK: <4 x i32>
; CHECK: llvm.vectorizer.already_vectorized
; CHECK: llvm.loop
; CHECK: ret
; This test comes from the loop:
@ -40,10 +40,10 @@ _ZSt10accumulateIPiiET0_T_S2_S1_.exit: ; preds = %for.body.i, %entry
ret i32 %__init.addr.0.lcssa.i
}
; This test checks that we don't vectorize loops that are marked with the "already vectorized" metadata.
; This test checks that we don't vectorize loops that are marked with the "width" == 1 metadata.
; CHECK: _Z4foo2Pii
; CHECK-NOT: <4 x i32>
; CHECK: llvm.vectorizer.already_vectorized
; CHECK: llvm.loop
; CHECK: ret
define i32 @_Z4foo2Pii(i32* %A, i32 %n) #0 {
entry:
@ -59,7 +59,7 @@ for.body.i: ; preds = %entry, %for.body.i
%add.i = add nsw i32 %0, %__init.addr.05.i
%incdec.ptr.i = getelementptr inbounds i32* %__first.addr.04.i, i64 1
%cmp.i = icmp eq i32* %incdec.ptr.i, %add.ptr
br i1 %cmp.i, label %_ZSt10accumulateIPiiET0_T_S2_S1_.exit, label %for.body.i, !llvm.vectorizer.already_vectorized !3
br i1 %cmp.i, label %_ZSt10accumulateIPiiET0_T_S2_S1_.exit, label %for.body.i, !llvm.loop !0
_ZSt10accumulateIPiiET0_T_S2_S1_.exit: ; preds = %for.body.i, %entry
%__init.addr.0.lcssa.i = phi i32 [ 0, %entry ], [ %add.i, %for.body.i ]
@ -68,5 +68,9 @@ _ZSt10accumulateIPiiET0_T_S2_S1_.exit: ; preds = %for.body.i, %entry
attributes #0 = { nounwind readonly ssp uwtable "fp-contract-model"="standard" "no-frame-pointer-elim" "no-frame-pointer-elim-non-leaf" "realign-stack" "relocation-model"="pic" "ssp-buffers-size"="8" }
!3 = metadata !{}
; CHECK: !0 = metadata !{metadata !0, metadata !1}
; CHECK: !1 = metadata !{metadata !"llvm.vectorizer.width", i32 1}
; CHECK: !2 = metadata !{metadata !2, metadata !1}
!0 = metadata !{metadata !0, metadata !1}
!1 = metadata !{metadata !"llvm.vectorizer.width", i32 1}