llvm/test/CodeGen/SPARC
Kyle Butt a466b368fe Codegen: Make chains from trellis-shaped CFGs
Lay out trellis-shaped CFGs optimally.
A trellis of the shape below:

  A     B
  |\   /|
  | \ / |
  |  X  |
  | / \ |
  |/   \|
  C     D

would be laid out A; B->C ; D by the current layout algorithm. Now we identify
trellises and lay them out either A->C; B->D or A->D; B->C. This scales with an
increasing number of predecessors. A trellis is a a group of 2 or more
predecessor blocks that all have the same successors.

because of this we can tail duplicate to extend existing trellises.

As an example consider the following CFG:

    B   D   F   H
   / \ / \ / \ / \
  A---C---E---G---Ret

Where A,C,E,G are all small (Currently 2 instructions).

The CFG preserving layout is then A,B,C,D,E,F,G,H,Ret.

The current code will copy C into B, E into D and G into F and yield the layout
A,C,B(C),E,D(E),F(G),G,H,ret

define void @straight_test(i32 %tag) {
entry:
  br label %test1
test1: ; A
  %tagbit1 = and i32 %tag, 1
  %tagbit1eq0 = icmp eq i32 %tagbit1, 0
  br i1 %tagbit1eq0, label %test2, label %optional1
optional1: ; B
  call void @a()
  br label %test2
test2: ; C
  %tagbit2 = and i32 %tag, 2
  %tagbit2eq0 = icmp eq i32 %tagbit2, 0
  br i1 %tagbit2eq0, label %test3, label %optional2
optional2: ; D
  call void @b()
  br label %test3
test3: ; E
  %tagbit3 = and i32 %tag, 4
  %tagbit3eq0 = icmp eq i32 %tagbit3, 0
  br i1 %tagbit3eq0, label %test4, label %optional3
optional3: ; F
  call void @c()
  br label %test4
test4: ; G
  %tagbit4 = and i32 %tag, 8
  %tagbit4eq0 = icmp eq i32 %tagbit4, 0
  br i1 %tagbit4eq0, label %exit, label %optional4
optional4: ; H
  call void @d()
  br label %exit
exit:
  ret void
}

here is the layout after D27742:
straight_test:                          # @straight_test
; ... Prologue elided
; BB#0:                                 # %entry ; A (merged with test1)
; ... More prologue elided
	mr 30, 3
	andi. 3, 30, 1
	bc 12, 1, .LBB0_2
; BB#1:                                 # %test2 ; C
	rlwinm. 3, 30, 0, 30, 30
	beq	 0, .LBB0_3
	b .LBB0_4
.LBB0_2:                                # %optional1 ; B (copy of C)
	bl a
	nop
	rlwinm. 3, 30, 0, 30, 30
	bne	 0, .LBB0_4
.LBB0_3:                                # %test3 ; E
	rlwinm. 3, 30, 0, 29, 29
	beq	 0, .LBB0_5
	b .LBB0_6
.LBB0_4:                                # %optional2 ; D (copy of E)
	bl b
	nop
	rlwinm. 3, 30, 0, 29, 29
	bne	 0, .LBB0_6
.LBB0_5:                                # %test4 ; G
	rlwinm. 3, 30, 0, 28, 28
	beq	 0, .LBB0_8
	b .LBB0_7
.LBB0_6:                                # %optional3 ; F (copy of G)
	bl c
	nop
	rlwinm. 3, 30, 0, 28, 28
	beq	 0, .LBB0_8
.LBB0_7:                                # %optional4 ; H
	bl d
	nop
.LBB0_8:                                # %exit ; Ret
	ld 30, 96(1)                    # 8-byte Folded Reload
	addi 1, 1, 112
	ld 0, 16(1)
	mtlr 0
	blr

The tail-duplication has produced some benefit, but it has also produced a
trellis which is not laid out optimally. With this patch, we improve the layouts
of such trellises, and decrease the cost calculation for tail-duplication
accordingly.

This patch produces the layout A,C,E,G,B,D,F,H,Ret. This layout does have
back edges, which is a negative, but it has a bigger compensating
positive, which is that it handles the case where there are long strings
of skipped blocks much better than the original layout. Both layouts
handle runs of executed blocks equally well. Branch prediction also
improves if there is any correlation between subsequent optional blocks.

Here is the resulting concrete layout:

straight_test:                          # @straight_test
; BB#0:                                 # %entry ; A (merged with test1)
	mr 30, 3
	andi. 3, 30, 1
	bc 12, 1, .LBB0_4
; BB#1:                                 # %test2 ; C
	rlwinm. 3, 30, 0, 30, 30
	bne	 0, .LBB0_5
.LBB0_2:                                # %test3 ; E
	rlwinm. 3, 30, 0, 29, 29
	bne	 0, .LBB0_6
.LBB0_3:                                # %test4 ; G
	rlwinm. 3, 30, 0, 28, 28
	bne	 0, .LBB0_7
	b .LBB0_8
.LBB0_4:                                # %optional1 ; B (Copy of C)
	bl a
	nop
	rlwinm. 3, 30, 0, 30, 30
	beq	 0, .LBB0_2
.LBB0_5:                                # %optional2 ; D (Copy of E)
	bl b
	nop
	rlwinm. 3, 30, 0, 29, 29
	beq	 0, .LBB0_3
.LBB0_6:                                # %optional3 ; F (Copy of G)
	bl c
	nop
	rlwinm. 3, 30, 0, 28, 28
	beq	 0, .LBB0_8
.LBB0_7:                                # %optional4 ; H
	bl d
	nop
.LBB0_8:                                # %exit

Differential Revision: https://reviews.llvm.org/D28522

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295223 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-15 19:49:14 +00:00
..
32abi.ll VirtRegMap: Replace some identity copies with KILL instructions. 2016-07-09 00:19:07 +00:00
64abi.ll [Sparc] Add Soft Float support 2016-05-18 09:14:13 +00:00
64bit.ll
64cond.ll
64spill.ll
2006-01-22-BitConvertLegalize.ll
2007-05-09-JumpTables.ll
2007-07-05-LiveIntervalAssert.ll
2008-10-10-InlineAsmMemoryOperand.ll
2008-10-10-InlineAsmRegOperand.ll
2009-08-28-PIC.ll
2009-08-28-WeakLinkage.ll
2011-01-11-Call.ll
2011-01-11-CC.ll Fix tests that used CHECK-NEXT-NOT and CHECK-DAG-NOT. 2016-02-26 19:40:34 +00:00
2011-01-11-FrameAddr.ll
2011-01-19-DelaySlot.ll ScheduleDAGInstrs: Add condjump deps to addSchedBarrierDeps() 2016-11-11 01:34:21 +00:00
2011-01-21-ByValArgs.ll
2011-01-22-SRet.ll
2011-12-03-TailDuplication.ll
2012-05-01-LowerArguments.ll
2013-05-17-CallFrame.ll [Sparc] Don't overlap variable-sized allocas with other stack variables. 2016-10-25 22:13:28 +00:00
analyze-branch.ll [SPARC] Revamp AnalyzeBranch and add ReverseBranchCondition. 2016-01-13 04:44:14 +00:00
atomics.ll Support expanding partial-word cmpxchg to full-word cmpxchg in AtomicExpandPass. 2016-06-17 18:11:48 +00:00
basictest.ll [Sparc] Implement UMUL_LOHI and SMUL_LOHI instead of MULHS/MULHU/MUL. 2016-10-05 20:54:17 +00:00
blockaddr.ll
constpool.ll
ctpop.ll
DbgValueOtherTargets.test
empty-functions.ll
exception.ll
fail-alloca-align.ll [Sparc] Don't overlap variable-sized allocas with other stack variables. 2016-10-25 22:13:28 +00:00
float-constants.ll
float.ll [Sparc] Fix double-float fabs and fneg on little endian CPUs. 2016-04-25 22:54:09 +00:00
fp128.ll [Sparc] Fix double-float fabs and fneg on little endian CPUs. 2016-04-25 22:54:09 +00:00
func-addr.ll [Sparc] Allow taking of function address into a register. 2016-05-04 12:11:05 +00:00
globals.ll
inlineasm.ll [Sparc] Enable more inline assembly constraints. 2016-05-20 09:03:01 +00:00
leafproc.ll
LeonCASAInstructionUT.ll [Myriad]: set LeonCASA processor feature 2016-09-13 17:51:41 +00:00
LeonDetectRoundChangePassUT.ll Sparc: fix test. 2016-10-19 15:55:11 +00:00
LeonFixAllFDIVSQRTPassUT.ll [Sparc][LEON] Test for FixFDIVSQRT erratum fix. 2016-11-01 14:23:37 +00:00
LeonInsertNOPLoadPassUT.ll [Sparc][LEON] Removed the parts of the errata fixes implemented using inline assembly as this is not the desired behaviour for end-users. Small change to a unit test to implement this without requiring the inline assembly. 2016-09-09 14:16:51 +00:00
LeonItinerariesUT.ll [Sparc][LEON] Itineraries unit test. 2016-05-10 09:09:20 +00:00
LeonReplaceFMULSPassUT.ll [Sparc][LEON] Removed the parts of the errata fixes implemented using inline assembly as this is not the desired behaviour for end-users. Small change to a unit test to implement this without requiring the inline assembly. 2016-09-09 14:16:51 +00:00
LeonReplaceSDIVPassUT.ll This pass, fixing an erratum in some LEON 2 processors ensures that the SDIV instruction is not issued, but replaced by SDIVcc instead, which does not exhibit the error. Unit test included. 2016-10-10 08:53:06 +00:00
LeonSMACUMACInstructionUT.ll [SPARC] Fixes for hardware errata on LEON processor. 2016-06-19 11:03:28 +00:00
lit.local.cfg
mature-mc-support.ll [LLC] Add an inline assembly diagnostics handler. 2017-02-03 11:14:39 +00:00
missing-sret.ll
missinglabel.ll Codegen: Fix broken assumption in Tail Merge. 2016-06-24 18:16:36 +00:00
mult-alt-generic-sparc.ll
multiple-div.ll
obj-relocs.ll
parts.ll
private.ll
register-clobber.ll Check for register clobbers when merging a vreg live range with a 2017-01-13 19:08:36 +00:00
rem.ll
reserved-regs.ll
select-mask.ll
setjmp.ll
sjlj.ll Codegen: Make chains from trellis-shaped CFGs 2017-02-15 19:49:14 +00:00
soft-float.ll [SPARC] Fix test so that it checks the correct label. 2017-01-04 14:01:58 +00:00
spill.ll
spillsize.ll
sret-secondary.ll
stack-align.ll [Sparc] Don't overlap variable-sized allocas with other stack variables. 2016-10-25 22:13:28 +00:00
stack-protector.ll [SPARC] [SSP] Add support for LOAD_STACK_GUARD. 2016-04-26 10:37:14 +00:00
thread-pointer.ll [SPARC] Add support for llvm.thread.pointer. 2016-04-26 10:37:01 +00:00
tls.ll
trap.ll
varargs.ll
vector-call.ll Fix Sparc 32bit Lowering to rebundle up v2i32 values. 2016-02-26 18:55:22 +00:00
vector-extract-elt.ll [DAG] Teach computeKnownBits and ComputeNumSignBits in SelectionDAG to look through EXTRACT_VECTOR_ELT. 2016-10-05 17:40:27 +00:00
zerostructcall.ll [Sparc] Allow passing of empty structs. 2016-06-01 08:48:56 +00:00