llvm/CodeGen at 4172b10ca1adfc1026428e5f522aaab98bd939ad - llvm

RPCSX/llvm

mirror of https://github.com/RPCSX/llvm.git synced 2024-12-09 04:36:49 +00:00


Chris Lattner c88d8e944d

Fix the #1 code quality problem that I have seen on X86 (and it also affects
PPC and other targets). In particular, consider code like this:

    struct Vector3 { double x, y, z; };
    struct Matrix3 { Vector3 a, b, c; };
    double dot(Vector3 &a, Vector3 &b) {
       return a.x * b.x + a.y * b.y + a.z * b.z;
    }
    Vector3 mul(Vector3 &a, Matrix3 &b) {
       Vector3 r;
       r.x = dot( a, b.a );
       r.y = dot( a, b.b );
       r.z = dot( a, b.c );
       return r;
    }
    void transform(Matrix3 &m, Vector3 *x, int n) {
       for (int i = 0; i < n; i++)
          x[i] = mul( x[i], m );
    }

we compile transform to a loop with all of the GEP instructions for indexing
into 'm' pulled out of the loop (9 of them). Because isel occurs a bb at a time
we are unable to fold the constant index into the loads in the loop, leading to
PPC code that looks like this:

    LBB3_1: ; no_exit.preheader
            li r2, 0
            addi r6, r3, 64        ;; 9 values live across the loop body!
            addi r7, r3, 56
            addi r8, r3, 48
            addi r9, r3, 40
            addi r10, r3, 32
            addi r11, r3, 24
            addi r12, r3, 16
            addi r30, r3, 8
    LBB3_2: ; no_exit
            lfd f0, 0(r30)
            lfd f1, 8(r4)
            fmul f0, f1, f0
            lfd f2, 0(r3)          ;; no constant indices folded into the loads!
            lfd f3, 0(r4)
            lfd f4, 0(r10)
            lfd f5, 0(r6)
            lfd f6, 0(r7)
            lfd f7, 0(r8)
            lfd f8, 0(r9)
            lfd f9, 0(r11)
            lfd f10, 0(r12)
            lfd f11, 16(r4)
            fmadd f0, f3, f2, f0
            fmul f2, f1, f4
            fmadd f0, f11, f10, f0
            fmadd f2, f3, f9, f2
            fmul f1, f1, f6
            stfd f0, 0(r4)
            fmadd f0, f11, f8, f2
            fmadd f1, f3, f7, f1
            stfd f0, 8(r4)
            fmadd f0, f11, f5, f1
            addi r29, r4, 24
            stfd f0, 16(r4)
            addi r2, r2, 1
            cmpw cr0, r2, r5
            or r4, r29, r29
            bne cr0, LBB3_2 ; no_exit

uh, yuck. With this patch, we now sink the constant offsets into the loop,
producing this code:

    LBB3_1: ; no_exit.preheader
            li r2, 0
    LBB3_2: ; no_exit
            lfd f0, 8(r3)
            lfd f1, 8(r4)
            fmul f0, f1, f0
            lfd f2, 0(r3)
            lfd f3, 0(r4)
            lfd f4, 32(r3)         ;; much nicer.
            lfd f5, 64(r3)
            lfd f6, 56(r3)
            lfd f7, 48(r3)
            lfd f8, 40(r3)
            lfd f9, 24(r3)
            lfd f10, 16(r3)
            lfd f11, 16(r4)
            fmadd f0, f3, f2, f0
            fmul f2, f1, f4
            fmadd f0, f11, f10, f0
            fmadd f2, f3, f9, f2
            fmul f1, f1, f6
            stfd f0, 0(r4)
            fmadd f0, f11, f8, f2
            fmadd f1, f3, f7, f1
            stfd f0, 8(r4)
            fmadd f0, f11, f5, f1
            addi r6, r4, 24
            stfd f0, 16(r4)
            addi r2, r2, 1
            cmpw cr0, r2, r5
            or r4, r6, r6
            bne cr0, LBB3_2 ; no_exit

This is much nicer as it reduces register pressure in the loop a lot. On X86,
this takes the function from having 9 spilled registers to 2. This should help
some spec programs on X86 (gzip?)

This is currently only enabled with -enable-gep-isel-opt to allow perf testing
tonight.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@24606 91177308-0d34-0410-b5e6-96231b3b80d8

2005-12-05 07:10:48 +00:00
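The displacements in the improved assembly (0 through 64 off r3) are the byte offsets of the nine doubles inside Matrix3. The following is a minimal sketch of that layout, not part of the commit. It assumes 8-byte doubles and no struct padding, and the helpers elementOffset and loadElement are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct Vector3 { double x, y, z; };
struct Matrix3 { Vector3 a, b, c; };

// Assumption: 8-byte doubles and no padding, which matches the offsets
// (0, 8, ..., 64) that appear in the PPC listing.
static_assert(sizeof(double) == 8, "sketch assumes 8-byte doubles");
static_assert(sizeof(Matrix3) == 72, "sketch assumes no padding");
static_assert(offsetof(Matrix3, b) + offsetof(Vector3, y) == 32,
              "m.b.y sits at the displacement used by 'lfd f4, 32(r3)'");
static_assert(offsetof(Matrix3, c) + offsetof(Vector3, z) == 64,
              "m.c.z sits at the displacement used by 'lfd f5, 64(r3)'");

// Hypothetical helper: byte offset of the idx-th double (0..8) in a Matrix3.
constexpr std::size_t elementOffset(std::size_t idx) {
  return idx * sizeof(double);
}

// Hypothetical helper: reads the idx-th double as "base + constant offset",
// the same addressing form as 'lfd fN, off(r3)'. No separate register is
// needed to hold a precomputed address for each element.
inline double loadElement(const Matrix3 &m, std::size_t idx) {
  assert(idx < 9);
  double value;
  std::memcpy(&value,
              reinterpret_cast<const char *>(&m) + elementOffset(idx),
              sizeof value);
  return value;
}
```

With the offset folded into the load, only the base pointer of 'm' has to stay live across the loop body, instead of one register per precomputed address.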
SelectionDAG	Fix the #1 code quality problem that I have seen on X86 (and it also affects	2005-12-05 07:10:48 +00:00
AsmPrinter.cpp	Allow target to customize directive used to switch to arbitrary section in SwitchSection,	2005-11-21 08:25:09 +00:00
BranchFolding.cpp
ELFWriter.cpp	nuke blank line	2005-11-10 18:49:46 +00:00
IntrinsicLowering.cpp	continued readcyclecounter support	2005-11-11 16:47:30 +00:00
LiveInterval.cpp	Fix LiveInterval::getOverlapingRanges to take things in the right order	2005-10-21 06:41:30 +00:00
LiveIntervalAnalysis.cpp	Fix some spello's pointed out by Gabor Greif	2005-10-26 18:41:41 +00:00
LiveVariables.cpp	Add section switching to common code generator code. Add a couple of	2005-11-21 07:06:27 +00:00
MachineBasicBlock.cpp
MachineCodeEmitter.cpp
MachineFunction.cpp	Added graphviz/gv support for MF.	2005-10-12 12:09:05 +00:00
MachineInstr.cpp
Makefile
Passes.cpp	Alkis agrees that that iterative scan allocator isn't going to be worked on	2005-10-24 04:14:30 +00:00
PHIElimination.cpp	clean up this code a bit, no functionality change	2005-10-03 07:22:07 +00:00
PhysRegTracker.h
PrologEpilogInserter.cpp	Always compute max align.	2005-11-06 17:43:20 +00:00
RegAllocLinearScan.cpp	I think I know what you meant here, but just to be safe I'll let you	2005-11-21 14:09:40 +00:00
RegAllocLocal.cpp	Nuke noop copies.	2005-11-09 18:22:42 +00:00
RegAllocSimple.cpp	Change this code to pass register classes into the stack slot spiller/reloader	2005-09-30 01:29:00 +00:00
TwoAddressInstructionPass.cpp	Fix some spello's pointed out by Gabor Greif	2005-10-26 18:41:41 +00:00
UnreachableBlockElim.cpp
VirtRegMap.cpp	Fix the LLC regressions on X86 last night. In particular, when undoing	2005-10-06 17:19:06 +00:00
VirtRegMap.h