llvm/lib/Target/AMDGPU/Utils/AMDGPULaneDominator.cpp
Nicolai Haehnle 126cd7e831 AMDGPU: Fix copying i1 value out of loop with non-uniform exit
Summary:
When an i1-value is defined inside of a loop and used outside of it, we
cannot simply use the SGPR bitmask from the loop's last iteration.

There are also useful and correct cases of an i1-value being copied between
basic blocks, e.g. when a condition is computed outside of a loop and used
inside it. The concept of dominators is not sufficient to capture what is
going on, so I propose the notion of "lane-dominators".

Fixes a bug encountered in Nier: Automata.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103743
Change-Id: If37b969ddc71d823ab3004aeafb9ea050e45bd9a

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D40547

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329164 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-04 10:57:58 +00:00

76 lines
2.2 KiB
C++

//===-- AMDGPULaneDominator.cpp - Determine Lane Dominators ---------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// MBB A lane-dominates MBB B if
// 1. A dominates B in the usual sense, i.e. every path from the entry to B
// goes through A, and
// 2. whenever B executes, every active lane during that execution of B was
// also active during the most recent execution of A.
//
// The simplest example where A dominates B but does not lane-dominate it is
// where A is a loop:
//
// |
// +--+
// A |
// +--+
// |
// B
//
// Unfortunately, the second condition is not fully captured by the control
// flow graph when it is unstructured (as may happen when branch conditions are
// uniform).
//
// The following replacement of the second condition is a conservative
// approximation. It is an equivalent condition when the CFG is fully
// structured:
//
// 2'. every cycle in the CFG that contains A also contains B.
//
//===----------------------------------------------------------------------===//
#include "AMDGPULaneDominator.h"
#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/MachineBasicBlock.h"
namespace llvm {
namespace AMDGPU {
// Given machine basic blocks A and B where A dominates B, check whether
// A lane-dominates B.
//
// The check is conservative, i.e. there can be false-negatives.
bool laneDominates(MachineBasicBlock *A, MachineBasicBlock *B) {
// Check whether A is reachable from itself without going through B.
DenseSet<MachineBasicBlock *> Reachable;
SmallVector<MachineBasicBlock *, 8> Stack;
Stack.push_back(A);
do {
MachineBasicBlock *MBB = Stack.back();
Stack.pop_back();
for (MachineBasicBlock *Succ : MBB->successors()) {
if (Succ == A)
return false;
if (Succ != B && Reachable.insert(Succ).second)
Stack.push_back(Succ);
}
} while (!Stack.empty());
return true;
}
} // namespace AMDGPU
} // namespace llvm