Files
archived-llvm/include/llvm/Transforms/Scalar/JumpThreading.h
Adam Nemet 6050b768b6 [JumpThreading] Only write back branch-weight MDs for blocks that originally had PGO info
Currently the pass updates branch weights in the IR if the function has
any PGO info (entry frequency is set).  However we could still have
regions of the CFG that does not have branch weights collected (e.g. a
cold region).  In this case we'd use static estimates.  Since static
estimates for branches are determined independently, they are
inconsistent.  Updating them can "randomly" inflate block frequencies.

I've run into this in a completely cold loop of h264ref from
SPEC.  -Rpass-with-hotness showed the loop to be completely cold during
inlining (before JT) but completely hot during vectorization (after JT).

The new testcase demonstrate the problem.  We check array elements
against 1, 2 and 3 in a loop.  The check against 3 is the loop-exiting
check.  The block names should be self-explanatory.

In this example, jump threading incorrectly updates the weight of the
loop-exiting branch to 0, drastically inflating the frequency of the
loop (in the range of billions).

There is no run-time profile info for edges inside the loop, so branch
probabilities are estimated.  These are the resulting branch and block
frequencies for the loop body:

                check_1 (16)
            (8) /  |
            eq_1   | (8)
                \  |
                check_2 (16)
            (8) /  |
            eq_2   | (8)
                \  |
                check_3 (16)
            (1) /  |
       (loop exit) | (15)
                   |
              (back edge)

First we thread eq_1 -> check_2 to check_3.  Frequencies are updated to
remove the frequency of eq_1 from check_2 and then from the false edge
leaving check_2.  Changed frequencies are highlighted with * *:

                check_1 (16)
            (8) /  |
           eq_1~   | (8)
           /       |
          /     check_2 (*8*)
         /  (8) /  |
         \  eq_2   | (*0*)
          \     \  |
           ` --- check_3 (16)
            (1) /  |
       (loop exit) | (15)
                   |
              (back edge)

Next we thread eq_1 -> check_3 and eq_2 -> check_3 to check_1 as new
back edges.  Frequencies are updated to remove the frequency of eq_1 and
eq_3 from check_3 and then the false edge leaving check_3 (changed
frequencies are highlighted with * *):

                  check_1 (16)
              (8) /  |
             eq_1~   | (8)
             /       |
            /     check_2 (*8*)
           /  (8) /  |
          /-- eq_2~  | (*0*)
  (back edge)        |
                  check_3 (*0*)
            (*0*) /  |
         (loop exit) | (*0*)
                     |
                (back edge)

As a result, the loop exit edge ends up with 0 frequency which in turn makes
the loop header to have maximum frequency.

There are a few potential problems here:

1. The profile data seems odd.  There is a single profile sample of the
loop being entered.  On the other hand, there are no weights inside the
loop.

2. Based on static estimation we shouldn't set edges to "extreme"
values, i.e. extremely likely or unlikely.

3. We shouldn't create profile metadata that is calculated from static
estimation.  I am not sure what policy is but it seems to make sense to
treat profile metadata as something that is known to originate from
profiling.  Estimated probabilities should only be reflected in BPI/BFI.

Any one of these would probably fix the immediate problem.  I went for 3
because I think it's a good policy to have and added a FIXME about 2.

Differential Revision: https://reviews.llvm.org/D24118

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280713 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-06 16:08:33 +00:00

144 lines
5.2 KiB
C++

//===- JumpThreading.h - thread control through conditional BBs -*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
/// \file
/// See the comments on JumpThreadingPass.
///
//===----------------------------------------------------------------------===//
#ifndef LLVM_TRANSFORMS_SCALAR_JUMPTHREADING_H
#define LLVM_TRANSFORMS_SCALAR_JUMPTHREADING_H
#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallSet.h"
#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/Analysis/BlockFrequencyInfoImpl.h"
#include "llvm/Analysis/BranchProbabilityInfo.h"
#include "llvm/Analysis/LazyValueInfo.h"
#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/ValueHandle.h"
namespace llvm {
/// A private "module" namespace for types and utilities used by
/// JumpThreading.
/// These are implementation details and should not be used by clients.
namespace jumpthreading {
// These are at global scope so static functions can use them too.
typedef SmallVectorImpl<std::pair<Constant *, BasicBlock *>> PredValueInfo;
typedef SmallVector<std::pair<Constant *, BasicBlock *>, 8> PredValueInfoTy;
// This is used to keep track of what kind of constant we're currently hoping
// to find.
enum ConstantPreference { WantInteger, WantBlockAddress };
}
/// This pass performs 'jump threading', which looks at blocks that have
/// multiple predecessors and multiple successors. If one or more of the
/// predecessors of the block can be proven to always jump to one of the
/// successors, we forward the edge from the predecessor to the successor by
/// duplicating the contents of this block.
///
/// An example of when this can occur is code like this:
///
/// if () { ...
/// X = 4;
/// }
/// if (X < 3) {
///
/// In this case, the unconditional branch at the end of the first if can be
/// revectored to the false side of the second if.
///
class JumpThreadingPass : public PassInfoMixin<JumpThreadingPass> {
TargetLibraryInfo *TLI;
LazyValueInfo *LVI;
std::unique_ptr<BlockFrequencyInfo> BFI;
std::unique_ptr<BranchProbabilityInfo> BPI;
bool HasProfileData = false;
#ifdef NDEBUG
SmallPtrSet<const BasicBlock *, 16> LoopHeaders;
#else
SmallSet<AssertingVH<const BasicBlock>, 16> LoopHeaders;
#endif
DenseSet<std::pair<Value *, BasicBlock *>> RecursionSet;
unsigned BBDupThreshold;
// RAII helper for updating the recursion stack.
struct RecursionSetRemover {
DenseSet<std::pair<Value *, BasicBlock *>> &TheSet;
std::pair<Value *, BasicBlock *> ThePair;
RecursionSetRemover(DenseSet<std::pair<Value *, BasicBlock *>> &S,
std::pair<Value *, BasicBlock *> P)
: TheSet(S), ThePair(P) {}
~RecursionSetRemover() { TheSet.erase(ThePair); }
};
public:
JumpThreadingPass(int T = -1);
// Hack for MSVC 2013 which seems like it can't synthesize this.
JumpThreadingPass(JumpThreadingPass &&Other)
: TLI(Other.TLI), LVI(Other.LVI), BFI(std::move(Other.BFI)),
BPI(std::move(Other.BPI)), HasProfileData(Other.HasProfileData),
LoopHeaders(std::move(Other.LoopHeaders)),
RecursionSet(std::move(Other.RecursionSet)),
BBDupThreshold(Other.BBDupThreshold) {}
// Glue for old PM.
bool runImpl(Function &F, TargetLibraryInfo *TLI_, LazyValueInfo *LVI_,
bool HasProfileData_, std::unique_ptr<BlockFrequencyInfo> BFI_,
std::unique_ptr<BranchProbabilityInfo> BPI_);
PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);
void releaseMemory() {
BFI.reset();
BPI.reset();
}
void FindLoopHeaders(Function &F);
bool ProcessBlock(BasicBlock *BB);
bool ThreadEdge(BasicBlock *BB, const SmallVectorImpl<BasicBlock *> &PredBBs,
BasicBlock *SuccBB);
bool DuplicateCondBranchOnPHIIntoPred(
BasicBlock *BB, const SmallVectorImpl<BasicBlock *> &PredBBs);
bool
ComputeValueKnownInPredecessors(Value *V, BasicBlock *BB,
jumpthreading::PredValueInfo &Result,
jumpthreading::ConstantPreference Preference,
Instruction *CxtI = nullptr);
bool ProcessThreadableEdges(Value *Cond, BasicBlock *BB,
jumpthreading::ConstantPreference Preference,
Instruction *CxtI = nullptr);
bool ProcessBranchOnPHI(PHINode *PN);
bool ProcessBranchOnXOR(BinaryOperator *BO);
bool ProcessImpliedCondition(BasicBlock *BB);
bool SimplifyPartiallyRedundantLoad(LoadInst *LI);
bool TryToUnfoldSelect(CmpInst *CondCmp, BasicBlock *BB);
bool TryToUnfoldSelectInCurrBB(BasicBlock *BB);
private:
BasicBlock *SplitBlockPreds(BasicBlock *BB, ArrayRef<BasicBlock *> Preds,
const char *Suffix);
void UpdateBlockFreqAndEdgeWeight(BasicBlock *PredBB, BasicBlock *BB,
BasicBlock *NewBB, BasicBlock *SuccBB);
/// Check if the block has profile metadata for its outgoing edges.
bool doesBlockHaveProfileData(BasicBlock *BB);
};
} // end namespace llvm
#endif