Mirror of https://github.com/RPCS3/llvm.git (synced 2024-12-25 05:25:53 +00:00)
[MemoryDepAnalysis] Fix compile time slowdown
- Problem

One program takes ~3min to compile under -O2. This happens after a certain function A is inlined ~700 times in a function B, inserting thousands of new BBs. As a result, 80% of the compilation time is spent in GVN::processNonLocalLoad and MemoryDependenceAnalysis::getNonLocalPointerDependency, searching for nonlocal information for basic blocks.

To avoid spending a long time processing nonlocal loads, GVN usually bails out if MD->getNonLocalPointerDependency returns more than 100 deps. However, that check happens only *after* the nonlocal information for all BBs has been computed, and that computation is the bottleneck in this scenario. For instance, getNonLocalPointerDependency returns deps covering more than 100 BBs 8280 times, and in 600 of those cases it returns more than 1000 blocks.

- Solution

Bail out early during the nonlocal info computation whenever a specified threshold is reached. This patch proposes a threshold of 100 BBs, which reduces the compile time from 3min to 23s.

- Testing

The test-suite showed no compile-time or execution-time regressions.

Some numbers from my machine (x86_64 darwin):
- 17s under -Oz (which avoids inlining)
- 1.3s under -O1
- 2m51s under -O2 ToT
*** 23s under -O2 w/ Result.size() > 100
- 1m54s under -O2 w/ Result.size() > 500

With NumResultsLimit = 100, GVN yields the same outcome as the unlimited 3min version.

http://reviews.llvm.org/D5532

rdar://problem/18188041

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218792 91177308-0d34-0410-b5e6-96231b3b80d8
Parent: 72447214a6
Commit: 1610bcf8d9
@@ -51,6 +51,9 @@ STATISTIC(NumCacheCompleteNonLocalPtr,
 // Limit for the number of instructions to scan in a block.
 static const int BlockScanLimit = 100;
 
+// Limit on the number of memdep results to process.
+static const int NumResultsLimit = 100;
+
 char MemoryDependenceAnalysis::ID = 0;
 
 // Register this pass...
@@ -1133,6 +1136,25 @@ getNonLocalPointerDepFromBB(const PHITransAddr &Pointer,
   while (!Worklist.empty()) {
     BasicBlock *BB = Worklist.pop_back_val();
 
+    // If we do process a large number of blocks it becomes very expensive and
+    // likely it isn't worth worrying about
+    if (Result.size() > NumResultsLimit) {
+      Worklist.clear();
+      // Sort it now (if needed) so that recursive invocations of
+      // getNonLocalPointerDepFromBB and other routines that could reuse the
+      // cache value will only see properly sorted cache arrays.
+      if (Cache && NumSortedEntries != Cache->size()) {
+        SortNonLocalDepInfoCache(*Cache, NumSortedEntries);
+        NumSortedEntries = Cache->size();
+      }
+      // Since we bail out, the "Cache" set won't contain all of the
+      // results for the query. This is ok (we can still use it to accelerate
+      // specific block queries) but we can't do the fastpath "return all
+      // results from the set". Clear out the indicator for this.
+      CacheInfo->Pair = BBSkipFirstBlockPair();
+      return true;
+    }
+
     // Skip the first block if we have it.
     if (!SkipFirstBlock) {
       // Analyze the dependency of *Pointer in FromBB. See if we already have