Add support for llvm.vectorizer metadata

- llvm.loop.parallel metadata has been renamed to llvm.loop to be more generic by making the root of additional loop metadata. - Loop::isAnnotatedParallel now looks for llvm.loop and associated llvm.mem.parallel_loop_access - document llvm.loop and update llvm.mem.parallel_loop_access - add support for llvm.vectorizer.width and llvm.vectorizer.unroll - document llvm.vectorizer.* metadata - add utility class LoopVectorizerHints for getting/setting loop metadata - use llvm.vectorizer.width=1 to indicate already vectorized instead of already_vectorized - update existing tests that used llvm.loop.parallel and llvm.vectorizer.already_vectorized Reviewed by: Nadav Rotem git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182802 91177308-0d34-0410-b5e6-96231b3b80d8
2024-12-01 07:30:33 +00:00 · 2013-05-28 20:00:34 +00:00 · 2013-05-28 20:00:34 +00:00 · ee21b6f7b4
commit ee21b6f7b4
parent a32edcfbc5
10 changed files with 381 additions and 88 deletions
--- a/docs/LangRef.rst
+++ b/docs/LangRef.rst
@ -2554,8 +2554,8 @@ Examples:
 It is sometimes useful to attach information to loop constructs. Currently,
 loop metadata is implemented as metadata attached to the branch instruction
 in the loop latch block. This type of metadata refer to a metadata node that is
-guaranteed to be separate for each loop. The loop-level metadata is prefixed
-with ``llvm.loop``.
+guaranteed to be separate for each loop. The loop identifier metadata is 
+specified with the name ``llvm.loop``.

 The loop identifier metadata is implemented using a metadata that refers to
 itself to avoid merging it with any other identifier metadata, e.g.,
@ -2569,32 +2569,17 @@ constructs:
    !0 = metadata !{ metadata !0 }
    !1 = metadata !{ metadata !1 }

+The loop identifier metadata can be used to specify additional per-loop
+metadata. Any operands after the first operand can be treated as user-defined
+metadata. For example the ``llvm.vectorizer.unroll`` metadata is understood
+by the loop vectorizer to indicate how many times to unroll the loop:

-'``llvm.loop.parallel``' Metadata
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+.. code-block:: llvm

-This loop metadata can be used to communicate that a loop should be considered
-a parallel loop. The semantics of parallel loops in this case is the one
-with the strongest cross-iteration instruction ordering freedom: the
-iterations in the loop can be considered completely independent of each
-other (also known as embarrassingly parallel loops).
-
-This metadata can originate from a programming language with parallel loop
-constructs. In such a case it is completely the programmer's responsibility
-to ensure the instructions from the different iterations of the loop can be
-executed in an arbitrary order, in parallel, or intertwined. No loop-carried
-dependency checking at all must be expected from the compiler.
-
-In order to fulfill the LLVM requirement for metadata to be safely ignored,
-it is important to ensure that a parallel loop is converted to
-a sequential loop in case an optimization (agnostic of the parallel loop
-semantics) converts the loop back to such. This happens when new memory
-accesses that do not fulfill the requirement of free ordering across iterations
-are added to the loop. Therefore, this metadata is required, but not
-sufficient, to consider the loop at hand a parallel loop. For a loop
-to be parallel,  all its memory accessing instructions need to be
-marked with the ``llvm.mem.parallel_loop_access`` metadata that refer
-to the same loop identifier metadata that identify the loop at hand.
+      br i1 %exitcond, label %._crit_edge, label %.lr.ph, !llvm.loop !0
+    ...
+    !0 = metadata !{ metadata !0, metadata !1 }
+    !1 = metadata !{ metadata !"llvm.vectorizer.unroll", i32 2 }

 '``llvm.mem``'
 ^^^^^^^^^^^^^^^
@ -2606,29 +2591,28 @@ for optimizations are prefixed with ``llvm.mem``.
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 For a loop to be parallel, in addition to using
-the ``llvm.loop.parallel`` metadata to mark the loop latch branch instruction,
+the ``llvm.loop`` metadata to mark the loop latch branch instruction,
 also all of the memory accessing instructions in the loop body need to be
 marked with the ``llvm.mem.parallel_loop_access`` metadata. If there
 is at least one memory accessing instruction not marked with the metadata,
-the loop, despite it possibly using the ``llvm.loop.parallel`` metadata,
-must be considered a sequential loop. This causes parallel loops to be
+the loop must be considered a sequential loop. This causes parallel loops to be
 converted to sequential loops due to optimization passes that are unaware of
 the parallel semantics and that insert new memory instructions to the loop
 body.

 Example of a loop that is considered parallel due to its correct use of
-both ``llvm.loop.parallel`` and ``llvm.mem.parallel_loop_access``
+both ``llvm.loop`` and ``llvm.mem.parallel_loop_access``
 metadata types that refer to the same loop identifier metadata.

 .. code-block:: llvm

   for.body:
-   ...
-   %0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
-   ...
-   store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
-   ...
-   br i1 %exitcond, label %for.end, label %for.body, !llvm.loop.parallel !0
+     ...
+     %0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
+     ...
+     store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
+     ...
+     br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

   for.end:
   ...
@ -2644,27 +2628,73 @@ the loop identifier metadata node directly:
   ...

   inner.for.body:
-   ...
-   %0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
-   ...
-   store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
-   ...
-   br i1 %exitcond, label %inner.for.end, label %inner.for.body, !llvm.loop.parallel !1
+     ...
+     %0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
+     ...
+     store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
+     ...
+     br i1 %exitcond, label %inner.for.end, label %inner.for.body, !llvm.loop !1

   inner.for.end:
-   ...
-   %0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
-   ...
-   store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
-   ...
-   br i1 %exitcond, label %outer.for.end, label %outer.for.body, !llvm.loop.parallel !2
+     ...
+     %0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
+     ...
+     store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
+     ...
+     br i1 %exitcond, label %outer.for.end, label %outer.for.body, !llvm.loop !2

   outer.for.end:                                          ; preds = %for.body
   ...
-   !0 = metadata !{ metadata !1, metadata !2 } ; a list of parallel loop identifiers
-   !1 = metadata !{ metadata !1 } ; an identifier for the inner parallel loop
-   !2 = metadata !{ metadata !2 } ; an identifier for the outer parallel loop
+   !0 = metadata !{ metadata !1, metadata !2 } ; a list of loop identifiers
+   !1 = metadata !{ metadata !1 } ; an identifier for the inner loop
+   !2 = metadata !{ metadata !2 } ; an identifier for the outer loop

+'``llvm.vectorizer``'
+^^^^^^^^^^^^^^^^^^^^^
+
+Metadata prefixed with ``llvm.vectorizer`` is used to control per-loop
+vectorization parameters such as vectorization factor and unroll factor.
+
+``llvm.vectorizer`` metadata should be used in conjunction with ``llvm.loop``
+loop identification metadata.
+
+'``llvm.vectorizer.unroll``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This metadata instructs the loop vectorizer to unroll the specified
+loop exactly ``N`` times.
+
+The first operand is the string ``llvm.vectorizer.unroll`` and the second
+operand is an integer specifying the unroll factor. For example:
+
+.. code-block:: llvm
+
+   !0 = metadata !{ metadata !"llvm.vectorizer.unroll", i32 4 }
+
+Note that setting ``llvm.vectorizer.unroll`` to 1 disables unrolling of the
+loop.
+
+If ``llvm.vectorizer.unroll`` is set to 0 then the amount of unrolling will be
+determined automatically.
+
+'``llvm.vectorizer.width``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This metadata forces the loop vectorizer to widen scalar values to a vector
+width of ``N`` rather than computing the width using a cost model.
+
+The first operand is the string ``llvm.vectorizer.width`` and the second
+operand is an integer specifying the width. For example:
+
+.. code-block:: llvm
+
+   !0 = metadata !{ metadata !"llvm.vectorizer.width", i32 4 }
+
+Note that setting ``llvm.vectorizer.width`` to 1 disables vectorization of the
+loop.
+
+If ``llvm.vectorizer.width`` is set to 0 then the width will be determined
+automatically.

 Module Flags Metadata
 =====================
--- a/include/llvm/Analysis/LoopInfo.h
+++ b/include/llvm/Analysis/LoopInfo.h
@ -50,6 +50,7 @@ inline void RemoveFromVector(std::vector<T*> &V, T *N) {
 class DominatorTree;
 class LoopInfo;
 class Loop;
+class MDNode;
 class PHINode;
 class raw_ostream;
 template<class N, class M> class LoopInfoBase;
@ -391,6 +392,22 @@ public:
  /// iterations.
  bool isAnnotatedParallel() const;

+  /// Return the llvm.loop loop id metadata node for this loop if it is present.
+  ///
+  /// If this loop contains the same llvm.loop metadata on each branch to the
+  /// header then the node is returned. If any latch instruction does not
+  /// contain llvm.loop or or if multiple latches contain different nodes then
+  /// 0 is returned.
+  MDNode *getLoopID() const;
+  /// Set the llvm.loop loop id metadata for this loop.
+  ///
+  /// The LoopID metadata node will be added to each terminator instruction in
+  /// the loop that branches to the loop header.
+  ///
+  /// The LoopID metadata node should have one or more operands and the first
+  /// operand should should be the node itself.
+  void setLoopID(MDNode *LoopID) const;
+
  /// hasDedicatedExits - Return true if no exit block for the loop
  /// has a predecessor that is outside the loop.
  bool hasDedicatedExits() const;
--- a/lib/Analysis/LoopInfo.cpp
+++ b/lib/Analysis/LoopInfo.cpp
@ -50,6 +50,9 @@ INITIALIZE_PASS_BEGIN(LoopInfo, "loops", "Natural Loop Information", true, true)
 INITIALIZE_PASS_DEPENDENCY(DominatorTree)
 INITIALIZE_PASS_END(LoopInfo, "loops", "Natural Loop Information", true, true)

+// Loop identifier metadata name.
+static const char* LoopMDName = "llvm.loop";
+
 //===----------------------------------------------------------------------===//
 // Loop implementation
 //
@ -234,14 +237,62 @@ bool Loop::isSafeToClone() const {
  return true;
 }

+MDNode *Loop::getLoopID() const {
+  MDNode *LoopID = 0;
+  if (isLoopSimplifyForm()) {
+    LoopID = getLoopLatch()->getTerminator()->getMetadata(LoopMDName);
+  } else {
+    // Go through each predecessor of the loop header and check the
+    // terminator for the metadata.
+    BasicBlock *H = getHeader();
+    for (block_iterator I = block_begin(), IE = block_end(); I != IE; ++I) {
+      TerminatorInst *TI = (*I)->getTerminator();
+      MDNode *MD = 0;
+
+      // Check if this terminator branches to the loop header.
+      for (unsigned i = 0, ie = TI->getNumSuccessors(); i != ie; ++i) {
+        if (TI->getSuccessor(i) == H) {
+          MD = TI->getMetadata(LoopMDName);
+          break;
+        }
+      }
+      if (!MD)
+        return 0;
+
+      if (!LoopID)
+        LoopID = MD;
+      else if (MD != LoopID)
+        return 0;
+    }
+  }
+  if (!LoopID || LoopID->getNumOperands() == 0 ||
+      LoopID->getOperand(0) != LoopID)
+    return 0;
+  return LoopID;
+}
+
+void Loop::setLoopID(MDNode *LoopID) const {
+  assert(LoopID && "Loop ID should not be null");
+  assert(LoopID->getNumOperands() > 0 && "Loop ID needs at least one operand");
+  assert(LoopID->getOperand(0) == LoopID && "Loop ID should refer to itself");
+
+  if (isLoopSimplifyForm()) {
+    getLoopLatch()->getTerminator()->setMetadata(LoopMDName, LoopID);
+    return;
+  }
+
+  BasicBlock *H = getHeader();
+  for (block_iterator I = block_begin(), IE = block_end(); I != IE; ++I) {
+    TerminatorInst *TI = (*I)->getTerminator();
+    for (unsigned i = 0, ie = TI->getNumSuccessors(); i != ie; ++i) {
+      if (TI->getSuccessor(i) == H)
+        TI->setMetadata(LoopMDName, LoopID);
+    }
+  }
+}
+
 bool Loop::isAnnotatedParallel() const {
-
-  BasicBlock *latch = getLoopLatch();
-  if (latch == NULL)
-    return false;
-
-  MDNode *desiredLoopIdMetadata =
-    latch->getTerminator()->getMetadata("llvm.loop.parallel");
+  MDNode *desiredLoopIdMetadata = getLoopID();

  if (!desiredLoopIdMetadata)
      return false;
--- a/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/lib/Transforms/Vectorize/LoopVectorize.cpp
@ -119,11 +119,11 @@ static const unsigned TinyTripCountUnrollThreshold = 128;
 /// than this number of comparisons.
 static const unsigned RuntimeMemoryCheckThreshold = 8;

-/// We use a metadata with this name  to indicate that a scalar loop was
-/// vectorized and that we don't need to re-vectorize it if we run into it
-/// again.
-static const char*
-AlreadyVectorizedMDName = "llvm.vectorizer.already_vectorized";
+/// Maximum simd width.
+static const unsigned MaxVectorWidth = 64;
+
+/// Maximum vectorization unroll count.
+static const unsigned MaxUnrollFactor = 16;

 namespace {

@ -768,6 +768,127 @@ private:
  const TargetLibraryInfo *TLI;
 };

+/// Utility class for getting and setting loop vectorizer hints in the form
+/// of loop metadata.
+struct LoopVectorizeHints {
+  /// Vectorization width.
+  unsigned Width;
+  /// Vectorization unroll factor.
+  unsigned Unroll;
+
+  LoopVectorizeHints(const Loop *L)
+  : Width(VectorizationFactor)
+  , Unroll(VectorizationUnroll)
+  , LoopID(L->getLoopID()) {
+    getHints(L);
+    // The command line options override any loop metadata except for when
+    // width == 1 which is used to indicate the loop is already vectorized.
+    if (VectorizationFactor.getNumOccurrences() > 0 && Width != 1)
+      Width = VectorizationFactor;
+    if (VectorizationUnroll.getNumOccurrences() > 0)
+      Unroll = VectorizationUnroll;
+  }
+
+  /// Return the loop vectorizer metadata prefix.
+  static StringRef Prefix() { return "llvm.vectorizer."; }
+
+  MDNode *createHint(LLVMContext &Context, StringRef Name, unsigned V) {
+    SmallVector<Value*, 2> Vals;
+    Vals.push_back(MDString::get(Context, Name));
+    Vals.push_back(ConstantInt::get(Type::getInt32Ty(Context), V));
+    return MDNode::get(Context, Vals);
+  }
+
+  /// Mark the loop L as already vectorized by setting the width to 1.
+  void setAlreadyVectorized(Loop *L) {
+    LLVMContext &Context = L->getHeader()->getContext();
+
+    Width = 1;
+
+    // Create a new loop id with one more operand for the already_vectorized
+    // hint. If the loop already has a loop id then copy the existing operands.
+    SmallVector<Value*, 4> Vals(1);
+    if (LoopID)
+      for (unsigned i = 1, ie = LoopID->getNumOperands(); i < ie; ++i)
+        Vals.push_back(LoopID->getOperand(i));
+
+    Twine Name = Prefix() + "width";
+    Vals.push_back(createHint(Context, Name.str(), Width));
+
+    MDNode *NewLoopID = MDNode::get(Context, Vals);
+    // Set operand 0 to refer to the loop id itself.
+    NewLoopID->replaceOperandWith(0, NewLoopID);
+
+    L->setLoopID(NewLoopID);
+    if (LoopID)
+      LoopID->replaceAllUsesWith(NewLoopID);
+
+    LoopID = NewLoopID;
+  }
+
+private:
+  MDNode *LoopID;
+
+  /// Find hints specified in the loop metadata.
+  void getHints(const Loop *L) {
+    if (!LoopID)
+      return;
+  
+    // First operand should refer to the loop id itself.
+    assert(LoopID->getNumOperands() > 0 && "requires at least one operand");
+    assert(LoopID->getOperand(0) == LoopID && "invalid loop id");
+  
+    for (unsigned i = 1, ie = LoopID->getNumOperands(); i < ie; ++i) {
+      const MDString *S = 0;
+      SmallVector<Value*, 4> Args;
+
+      // The expected hint is either a MDString or a MDNode with the first
+      // operand a MDString.
+      if (const MDNode *MD = dyn_cast<MDNode>(LoopID->getOperand(i))) {
+        if (!MD || MD->getNumOperands() == 0)
+          continue;
+        S = dyn_cast<MDString>(MD->getOperand(0));
+        for (unsigned i = 1, ie = MD->getNumOperands(); i < ie; ++i)
+          Args.push_back(MD->getOperand(i));
+      } else {
+        S = dyn_cast<MDString>(LoopID->getOperand(i));
+        assert(Args.size() == 0 && "too many arguments for MDString");
+      }
+
+      if (!S)
+        continue;
+
+      // Check if the hint starts with the vectorizer prefix.
+      StringRef Hint = S->getString();
+      if (!Hint.startswith(Prefix()))
+        continue;
+      // Remove the prefix.
+      Hint = Hint.substr(Prefix().size(), StringRef::npos);
+  
+      if (Args.size() == 1)
+        getHint(Hint, Args[0]);
+    }
+  }
+
+  // Check string hint with one operand.
+  void getHint(StringRef Hint, Value *Arg) {
+    const ConstantInt *C = dyn_cast<ConstantInt>(Arg);
+    if (!C) return;
+    unsigned Val = C->getZExtValue();
+
+    if (Hint == "width") {
+      assert(isPowerOf2_32(Val) && Val <= MaxVectorWidth &&
+             "Invalid width metadata");
+      Width = Val;
+    } else if (Hint == "unroll") {
+      assert(isPowerOf2_32(Val) && Val <= MaxUnrollFactor &&
+             "Invalid unroll metadata");
+      Unroll = Val;
+    } else
+      DEBUG(dbgs() << "LV: ignoring unknown hint " << Hint);
+  }
+};
+
 /// The LoopVectorize Pass.
 struct LoopVectorize : public LoopPass {
  /// Pass identification, replacement for typeid
@ -806,6 +927,13 @@ struct LoopVectorize : public LoopPass {
    DEBUG(dbgs() << "LV: Checking a loop in \"" <<
          L->getHeader()->getParent()->getName() << "\"\n");

+    LoopVectorizeHints Hints(L);
+
+    if (Hints.Width == 1) {
+      DEBUG(dbgs() << "LV: Not vectorizing.\n");
+      return false;
+    }
+
    // Check if it is legal to vectorize the loop.
    LoopVectorizationLegality LVL(L, SE, DL, DT, TTI, AA, TLI);
    if (!LVL.canVectorize()) {
@ -833,10 +961,10 @@ struct LoopVectorize : public LoopPass {

    // Select the optimal vectorization factor.
    LoopVectorizationCostModel::VectorizationFactor VF;
-    VF = CM.selectVectorizationFactor(OptForSize, VectorizationFactor);
+    VF = CM.selectVectorizationFactor(OptForSize, Hints.Width);
    // Select the unroll factor.
-    unsigned UF = CM.selectUnrollFactor(OptForSize, VectorizationUnroll,
-                                        VF.Width, VF.Cost);
+    unsigned UF = CM.selectUnrollFactor(OptForSize, Hints.Unroll, VF.Width,
+                                        VF.Cost);

    if (VF.Width == 1) {
      DEBUG(dbgs() << "LV: Vectorization is possible but not beneficial.\n");
@ -851,6 +979,9 @@ struct LoopVectorize : public LoopPass {
    InnerLoopVectorizer LB(L, SE, LI, DT, DL, TLI, VF.Width, UF);
    LB.vectorize(&LVL);

+    // Mark the loop as already vectorized to avoid vectorizing again.
+    Hints.setAlreadyVectorized(L);
+
    DEBUG(verifyFunction(*L->getHeader()->getParent()));
    return true;
  }
@ -1318,11 +1449,6 @@ InnerLoopVectorizer::createEmptyLoop(LoopVectorizationLegality *Legal) {
  BasicBlock *ExitBlock = OrigLoop->getExitBlock();
  assert(ExitBlock && "Must have an exit block");

-  // Mark the old scalar loop with metadata that tells us not to vectorize this
-  // loop again if we run into it.
-  MDNode *MD = MDNode::get(OldBasicBlock->getContext(), None);
-  OldBasicBlock->getTerminator()->setMetadata(AlreadyVectorizedMDName, MD);
-
  // Some loops have a single integer induction variable, while other loops
  // don't. One example is c++ iterators that often have multiple pointer
  // induction variables. In the code below we also support a case where we
@ -2516,13 +2642,6 @@ bool LoopVectorizationLegality::canVectorizeInstrs() {
  BasicBlock *PreHeader = TheLoop->getLoopPreheader();
  BasicBlock *Header = TheLoop->getHeader();

-  // If we marked the scalar loop as "already vectorized" then no need
-  // to vectorize it again.
-  if (Header->getTerminator()->getMetadata(AlreadyVectorizedMDName)) {
-    DEBUG(dbgs() << "LV: This loop was vectorized before\n");
-    return false;
-  }
-
  // Look for the attribute signaling the absence of NaNs.
  Function &F = *Header->getParent();
  if (F.hasFnAttribute("no-nans-fp-math"))
--- a/test/Transforms/LoopVectorize/X86/illegal-parallel-loop-uniform-write.ll
+++ b/test/Transforms/LoopVectorize/X86/illegal-parallel-loop-uniform-write.ll
@ -21,7 +21,7 @@ for.end.us:                                       ; preds = %for.body3.us
  %indvars.iv.next34 = add i64 %indvars.iv33, 1
  %lftr.wideiv35 = trunc i64 %indvars.iv.next34 to i32
  %exitcond36 = icmp eq i32 %lftr.wideiv35, %m
-  br i1 %exitcond36, label %for.end15, label %for.body3.lr.ph.us, !llvm.loop.parallel !5
+  br i1 %exitcond36, label %for.end15, label %for.body3.lr.ph.us, !llvm.loop !5

 for.body3.us:                                     ; preds = %for.body3.us, %for.body3.lr.ph.us
  %indvars.iv29 = phi i64 [ 0, %for.body3.lr.ph.us ], [ %indvars.iv.next30, %for.body3.us ]
@ -35,7 +35,7 @@ for.body3.us:                                     ; preds = %for.body3.us, %for.
  %indvars.iv.next30 = add i64 %indvars.iv29, 1
  %lftr.wideiv31 = trunc i64 %indvars.iv.next30 to i32
  %exitcond32 = icmp eq i32 %lftr.wideiv31, %m
-  br i1 %exitcond32, label %for.end.us, label %for.body3.us, !llvm.loop.parallel !4
+  br i1 %exitcond32, label %for.end.us, label %for.body3.us, !llvm.loop !4

 for.body3.lr.ph.us:                               ; preds = %for.end.us, %entry
  %indvars.iv33 = phi i64 [ %indvars.iv.next34, %for.end.us ], [ 0, %entry ]
--- a/test/Transforms/LoopVectorize/X86/parallel-loops-after-reg2mem.ll
+++ b/test/Transforms/LoopVectorize/X86/parallel-loops-after-reg2mem.ll
@ -35,7 +35,7 @@ for.body:                                         ; preds = %for.body.for.body_c
  %indvars.iv.next.reload = load i64* %indvars.iv.next.reg2mem
  %lftr.wideiv = trunc i64 %indvars.iv.next.reload to i32
  %exitcond = icmp eq i32 %lftr.wideiv, 512
-  br i1 %exitcond, label %for.end, label %for.body.for.body_crit_edge, !llvm.loop.parallel !3
+  br i1 %exitcond, label %for.end, label %for.body.for.body_crit_edge, !llvm.loop !3

 for.body.for.body_crit_edge:                      ; preds = %for.body
  %indvars.iv.next.reload2 = load i64* %indvars.iv.next.reg2mem
--- a/test/Transforms/LoopVectorize/X86/parallel-loops.ll
+++ b/test/Transforms/LoopVectorize/X86/parallel-loops.ll
@ -65,7 +65,7 @@ for.body:                                         ; preds = %for.body, %entry
  store i32 %2, i32* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3
  %lftr.wideiv = trunc i64 %indvars.iv.next to i32
  %exitcond = icmp eq i32 %lftr.wideiv, 512
-  br i1 %exitcond, label %for.end, label %for.body, !llvm.loop.parallel !3
+  br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !3

 for.end:                                          ; preds = %for.body
  ret void
@ -98,7 +98,7 @@ for.body:                                         ; preds = %for.body, %entry
  store i32 %2, i32* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !6
  %lftr.wideiv = trunc i64 %indvars.iv.next to i32
  %exitcond = icmp eq i32 %lftr.wideiv, 512
-  br i1 %exitcond, label %for.end, label %for.body, !llvm.loop.parallel !6
+  br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !6

 for.end:                                          ; preds = %for.body
  ret void
--- a/test/Transforms/LoopVectorize/metadata-unroll.ll
+++ b/test/Transforms/LoopVectorize/metadata-unroll.ll
@ -0,0 +1,41 @@
+; RUN: opt < %s  -loop-vectorize -force-vector-width=4 -dce -instcombine -S | FileCheck %s
+
+target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
+target triple = "x86_64-apple-macosx10.8.0"
+
+@a = common global [2048 x i32] zeroinitializer, align 16
+
+; This is the loop.
+;  for (i=0; i<n; i++){
+;    a[i] += i;
+;  }
+;CHECK: @inc
+;CHECK: load <4 x i32>
+;CHECK: load <4 x i32>
+;CHECK: add nsw <4 x i32>
+;CHECK: add nsw <4 x i32>
+;CHECK: store <4 x i32>
+;CHECK: store <4 x i32>
+;CHECK: ret void
+define void @inc(i32 %n) nounwind uwtable noinline ssp {
+  %1 = icmp sgt i32 %n, 0
+  br i1 %1, label %.lr.ph, label %._crit_edge
+
+.lr.ph:                                           ; preds = %0, %.lr.ph
+  %indvars.iv = phi i64 [ %indvars.iv.next, %.lr.ph ], [ 0, %0 ]
+  %2 = getelementptr inbounds [2048 x i32]* @a, i64 0, i64 %indvars.iv
+  %3 = load i32* %2, align 4
+  %4 = trunc i64 %indvars.iv to i32
+  %5 = add nsw i32 %3, %4
+  store i32 %5, i32* %2, align 4
+  %indvars.iv.next = add i64 %indvars.iv, 1
+  %lftr.wideiv = trunc i64 %indvars.iv.next to i32
+  %exitcond = icmp eq i32 %lftr.wideiv, %n
+  br i1 %exitcond, label %._crit_edge, label %.lr.ph, !llvm.loop !0
+
+._crit_edge:                                      ; preds = %.lr.ph, %0
+  ret void
+}
+
+!0 = metadata !{metadata !0, metadata !1}
+!1 = metadata !{metadata !"llvm.vectorizer.unroll", i32 2}
--- a/test/Transforms/LoopVectorize/metadata-width.ll
+++ b/test/Transforms/LoopVectorize/metadata-width.ll
@ -0,0 +1,31 @@
+; RUN: opt < %s  -loop-vectorize -force-vector-unroll=1 -dce -instcombine -S | FileCheck %s
+
+target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+; CHECK: @test1
+; CHECK: store <8 x i32>
+; CHECK: ret void
+define void @test1(i32* nocapture %a, i32 %n) #0 {
+entry:
+  %cmp4 = icmp sgt i32 %n, 0
+  br i1 %cmp4, label %for.body, label %for.end
+
+for.body:                                         ; preds = %entry, %for.body
+  %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
+  %arrayidx = getelementptr inbounds i32* %a, i64 %indvars.iv
+  %0 = trunc i64 %indvars.iv to i32
+  store i32 %0, i32* %arrayidx, align 4
+  %indvars.iv.next = add i64 %indvars.iv, 1
+  %lftr.wideiv = trunc i64 %indvars.iv.next to i32
+  %exitcond = icmp eq i32 %lftr.wideiv, %n
+  br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0
+
+for.end:                                          ; preds = %for.body, %entry
+  ret void
+}
+
+attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-frame-pointer-elim-non-leaf"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" }
+
+!0 = metadata !{metadata !0, metadata !1}
+!1 = metadata !{metadata !"llvm.vectorizer.width", i32 8}
--- a/test/Transforms/LoopVectorize/vectorize-once.ll
+++ b/test/Transforms/LoopVectorize/vectorize-once.ll
@ -11,7 +11,7 @@ target triple = "x86_64-apple-macosx10.8.0"
 ; This test checks that we add metadata to vectorized loops
 ; CHECK: _Z4foo1Pii
 ; CHECK: <4 x i32>
-; CHECK: llvm.vectorizer.already_vectorized
+; CHECK: llvm.loop
 ; CHECK: ret

 ; This test comes from the loop:
@ -40,10 +40,10 @@ _ZSt10accumulateIPiiET0_T_S2_S1_.exit:            ; preds = %for.body.i, %entry
  ret i32 %__init.addr.0.lcssa.i
 }

-; This test checks that we don't vectorize loops that are marked with the "already vectorized" metadata.
+; This test checks that we don't vectorize loops that are marked with the "width" == 1 metadata.
 ; CHECK: _Z4foo2Pii
 ; CHECK-NOT: <4 x i32>
-; CHECK: llvm.vectorizer.already_vectorized
+; CHECK: llvm.loop
 ; CHECK: ret
 define i32 @_Z4foo2Pii(i32* %A, i32 %n) #0 {
 entry:
@ -59,7 +59,7 @@ for.body.i:                                       ; preds = %entry, %for.body.i
  %add.i = add nsw i32 %0, %__init.addr.05.i
  %incdec.ptr.i = getelementptr inbounds i32* %__first.addr.04.i, i64 1
  %cmp.i = icmp eq i32* %incdec.ptr.i, %add.ptr
-  br i1 %cmp.i, label %_ZSt10accumulateIPiiET0_T_S2_S1_.exit, label %for.body.i, !llvm.vectorizer.already_vectorized !3
+  br i1 %cmp.i, label %_ZSt10accumulateIPiiET0_T_S2_S1_.exit, label %for.body.i, !llvm.loop !0

 _ZSt10accumulateIPiiET0_T_S2_S1_.exit:            ; preds = %for.body.i, %entry
  %__init.addr.0.lcssa.i = phi i32 [ 0, %entry ], [ %add.i, %for.body.i ]
@ -68,5 +68,9 @@ _ZSt10accumulateIPiiET0_T_S2_S1_.exit:            ; preds = %for.body.i, %entry

 attributes #0 = { nounwind readonly ssp uwtable "fp-contract-model"="standard" "no-frame-pointer-elim" "no-frame-pointer-elim-non-leaf" "realign-stack" "relocation-model"="pic" "ssp-buffers-size"="8" }

-!3 = metadata !{}
+; CHECK: !0 = metadata !{metadata !0, metadata !1}
+; CHECK: !1 = metadata !{metadata !"llvm.vectorizer.width", i32 1}
+; CHECK: !2 = metadata !{metadata !2, metadata !1}

+!0 = metadata !{metadata !0, metadata !1}
+!1 = metadata !{metadata !"llvm.vectorizer.width", i32 1}