llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-12-13 06:29:59 +00:00

Keno Fischer 4db92211d7 [ExecutionDepsFix] Improve clearance calculation for loops

Summary:
In revision rL278321, ExecutionDepsFix learned how to pick a better register for undef register reads, e.g. for instructions such as `vcvtsi2sdq`. While this revision improved performance on a good number of our benchmarks, it unfortunately also caused significant regressions (up to 3x) on others. This regression turned out to be caused by loops such as:

PH -> A -> B (xmm<Undef> -> xmm<Def>) -> C -> D -> EXIT
      ^                                  |
      +----------------------------------+

In the previous version of the clearance calculation, we would visit the blocks in order, remembering for each whether there were any incoming backedges from blocks that we hadn't processed yet and, if so, queuing up the block to be re-processed. However, for loop structures such as the above, this is clearly insufficient, since the block B does not have any unknown backedges, so we do not see the false dependency from the previous iteration's Def of xmm registers in B.

To fix this, we need to consider all blocks that are part of the loop and reprocess them once the correct clearance values are known. As an optimization, we also want to avoid reprocessing any later blocks that are not part of the loop.

In summary, the iteration order is as follows:

Before: PH A B C D A'
Corrected (Naive): PH A B C D A' B' C' D'
Corrected (w/ optimization): PH A B C A' B' C' D

To facilitate this optimization we introduce two new counters for each basic block. The first counts how many of its predecessors have completed primary processing. The second counts how many of its predecessors have completed all processing (we will call such a block done). Now, the criteria to reprocess a block are as follows:

- All predecessors have completed primary processing.
- Let x be the number of predecessors that had completed primary processing at the time of this block's primary processing; the number of predecessors that are done has reached x.

The intuition behind these criteria is as follows: We need to perform primary processing on all predecessors in order to find out any direct defs in those predecessors. When predecessors are done, we also know that we have information about indirect defs (e.g. defs in block B that were inherited through B->C->A->B). We can't wait for all predecessors to be done, since that would cause cyclic dependencies. It is guaranteed, however, that all those predecessors that are prior to us in reverse postorder will be done before us. Since we iterate over the basic blocks in reverse postorder, the number x above is precisely the count of predecessors prior to us in reverse postorder.

Reviewers: myatsina
Differential Revision: https://reviews.llvm.org/D28759
llvm-svn: 293571

2017-01-30 23:37:03 +00:00
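The visiting order described in the commit message above can be reproduced with a small toy model. The following Python sketch is not the LLVM implementation: it only tracks the order in which blocks are visited, it does not compute clearance values, and every name in it (`visit_order`, `incoming_processed`, `incoming_completed`, `primary_incoming`) is a hypothetical choice made for illustration. It also assumes that a block counts as "done" exactly when its own primary pass has run and both reprocessing criteria hold.

```python
def visit_order(successors, rpo):
    """Return the order in which blocks are visited (toy model).

    successors: dict mapping block name -> list of successor names
    rpo: list of block names in reverse postorder
    A revisit of a block is labelled with a trailing apostrophe.
    """
    # Number of incoming edges per block.
    num_preds = {b: 0 for b in rpo}
    for block in rpo:
        for succ in successors.get(block, []):
            num_preds[succ] += 1

    primary_completed = {b: False for b in rpo}
    # Counter 1: predecessors that have completed primary processing.
    incoming_processed = {b: 0 for b in rpo}
    # Counter 2: predecessors that are done.
    incoming_completed = {b: 0 for b in rpo}
    # The value "x": counter 1 as seen at this block's primary pass.
    primary_incoming = {b: 0 for b in rpo}

    def is_done(b):
        return (primary_completed[b]
                and incoming_processed[b] == num_preds[b]
                and incoming_completed[b] == primary_incoming[b])

    order = []
    for block in rpo:
        primary_completed[block] = True
        primary_incoming[block] = incoming_processed[block]
        primary = True
        work = [block]
        while work:
            active = work.pop()
            order.append(active if primary else active + "'")
            done = is_done(active)
            for succ in successors.get(active, []):
                if is_done(succ):
                    continue  # already final, nothing to update
                if primary:
                    incoming_processed[succ] += 1
                if done:
                    incoming_completed[succ] += 1
                if is_done(succ):
                    work.append(succ)  # criteria met: reprocess now
            primary = False  # anything else popped here is a revisit
    return order


# The loop from the commit message: C branches back to A.
cfg = {"PH": ["A"], "A": ["B"], "B": ["C"],
       "C": ["A", "D"], "D": ["EXIT"], "EXIT": []}
rpo = ["PH", "A", "B", "C", "D", "EXIT"]
print(" ".join(visit_order(cfg, rpo)))
# prints: PH A B C A' B' C' D EXIT
```

In this model, A cannot be finalized on its first visit because its predecessor C has not yet been seen. Once C's primary pass finishes, A meets both criteria and is revisited, which in turn lets B and C be revisited, while D is visited only once because both counters are already satisfied when its primary pass runs.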
..
AArch64	GlobalISel: correctly translate invoke when callee is a register.	2017-01-30 21:45:21 +00:00
AMDGPU	Re-commit AMDGPU/GlobalISel: Add support for simple shaders	2017-01-30 21:56:46 +00:00
ARM	DAG: Constant fold fp16_to_fp/fp16_to_fp	2017-01-30 16:57:41 +00:00
AVR	[AVR] Implement TargetLoweing::getRegisterByName	2017-01-07 23:39:47 +00:00
BPF	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."	2017-01-26 16:46:13 +00:00
Generic	Revert "Do not verify dominator tree if it has no roots"	2017-01-25 17:15:48 +00:00
Hexagon	[Hexagon] Add Hexagon-specific loop idiom recognition pass	2017-01-26 21:41:10 +00:00
Inputs
Lanai	[lanai] Simplify small section check in LowerGlobalAddress and treat ldata sections specially.	2016-12-15 16:56:16 +00:00
Mips	[mips] Recommit: "N64 static relocation model support"	2017-01-27 11:36:52 +00:00
MIR	[MIRParser] Allow generic register specification on operand.	2017-01-20 00:29:59 +00:00
MSP430	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."	2017-01-26 16:46:13 +00:00
NVPTX	NVPTX: Refactor NVPTXInferAddressSpaces to check TTI	2017-01-30 23:02:12 +00:00
PowerPC	[PPC] cleanup of mayLoad/mayStore flags and memory operands.	2017-01-26 18:59:15 +00:00
SPARC	Check for register clobbers when merging a vreg live range with a	2017-01-13 19:08:36 +00:00
SystemZ	SDAG: Update ChainNodesMatched during UpdateChains if a node is replaced	2017-01-30 18:29:46 +00:00
Thumb	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."	2017-01-26 16:46:13 +00:00
Thumb2	ARM: match GCC's behaviour for builtins	2017-01-13 16:25:33 +00:00
WebAssembly	[WebAssembly] Don't create bitcast-wrappers for varargs.	2017-01-20 20:50:29 +00:00
WinEH
X86	[ExecutionDepsFix] Improve clearance calculation for loops	2017-01-30 23:37:03 +00:00
XCore	DAG: Fold fneg into compare with constant into the constant	2017-01-30 17:57:28 +00:00