llvm/lib/CodeGen
Chris Lattner c88d8e944d Fix the #1 code quality problem that I have seen on X86 (and it also affects
PPC and other targets).  In a particular, consider code like this:

struct Vector3 { double x, y, z; };
struct Matrix3 { Vector3 a, b, c; };
double dot(Vector3 &a, Vector3 &b) {
   return a.x * b.x  +  a.y * b.y  +  a.z * b.z;
}
Vector3 mul(Vector3 &a, Matrix3 &b) {
   Vector3 r;
   r.x = dot( a, b.a );
   r.y = dot( a, b.b );
   r.z = dot( a, b.c );
   return r;
}
void transform(Matrix3 &m, Vector3 *x, int n) {
   for (int i = 0; i < n; i++)
      x[i] = mul( x[i], m );
}

we compile transform to a loop with all of the GEP instructions for indexing
into 'm' pulled out of the loop (9 of them).  Because isel occurs a bb at a time
we are unable to fold the constant index into the loads in the loop, leading to
PPC code that looks like this:

LBB3_1: ; no_exit.preheader
        li r2, 0
        addi r6, r3, 64        ;; 9 values live across the loop body!
        addi r7, r3, 56
        addi r8, r3, 48
        addi r9, r3, 40
        addi r10, r3, 32
        addi r11, r3, 24
        addi r12, r3, 16
        addi r30, r3, 8
LBB3_2: ; no_exit
        lfd f0, 0(r30)
        lfd f1, 8(r4)
        fmul f0, f1, f0
        lfd f2, 0(r3)        ;; no constant indices folded into the loads!
        lfd f3, 0(r4)
        lfd f4, 0(r10)
        lfd f5, 0(r6)
        lfd f6, 0(r7)
        lfd f7, 0(r8)
        lfd f8, 0(r9)
        lfd f9, 0(r11)
        lfd f10, 0(r12)
        lfd f11, 16(r4)
        fmadd f0, f3, f2, f0
        fmul f2, f1, f4
        fmadd f0, f11, f10, f0
        fmadd f2, f3, f9, f2
        fmul f1, f1, f6
        stfd f0, 0(r4)
        fmadd f0, f11, f8, f2
        fmadd f1, f3, f7, f1
        stfd f0, 8(r4)
        fmadd f0, f11, f5, f1
        addi r29, r4, 24
        stfd f0, 16(r4)
        addi r2, r2, 1
        cmpw cr0, r2, r5
        or r4, r29, r29
        bne cr0, LBB3_2 ; no_exit

uh, yuck.  With this patch, we now sink the constant offsets into the loop, producing
this code:

LBB3_1: ; no_exit.preheader
        li r2, 0
LBB3_2: ; no_exit
        lfd f0, 8(r3)
        lfd f1, 8(r4)
        fmul f0, f1, f0
        lfd f2, 0(r3)
        lfd f3, 0(r4)
        lfd f4, 32(r3)       ;; much nicer.
        lfd f5, 64(r3)
        lfd f6, 56(r3)
        lfd f7, 48(r3)
        lfd f8, 40(r3)
        lfd f9, 24(r3)
        lfd f10, 16(r3)
        lfd f11, 16(r4)
        fmadd f0, f3, f2, f0
        fmul f2, f1, f4
        fmadd f0, f11, f10, f0
        fmadd f2, f3, f9, f2
        fmul f1, f1, f6
        stfd f0, 0(r4)
        fmadd f0, f11, f8, f2
        fmadd f1, f3, f7, f1
        stfd f0, 8(r4)
        fmadd f0, f11, f5, f1
        addi r6, r4, 24
        stfd f0, 16(r4)
        addi r2, r2, 1
        cmpw cr0, r2, r5
        or r4, r6, r6
        bne cr0, LBB3_2 ; no_exit

This is much nicer as it reduces register pressure in the loop a lot.  On X86,
this takes the function from having 9 spilled registers to 2.  This should help
some spec programs on X86 (gzip?)

This is currently only enabled with -enable-gep-isel-opt to allow perf testing
tonight.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@24606 91177308-0d34-0410-b5e6-96231b3b80d8
2005-12-05 07:10:48 +00:00
..
SelectionDAG Fix the #1 code quality problem that I have seen on X86 (and it also affects 2005-12-05 07:10:48 +00:00
AsmPrinter.cpp Allow target to customize directive used to switch to arbitrary section in SwitchSection, 2005-11-21 08:25:09 +00:00
BranchFolding.cpp
ELFWriter.cpp nuke blank line 2005-11-10 18:49:46 +00:00
IntrinsicLowering.cpp continued readcyclecounter support 2005-11-11 16:47:30 +00:00
LiveInterval.cpp Fix LiveInterval::getOverlapingRanges to take things in the right order 2005-10-21 06:41:30 +00:00
LiveIntervalAnalysis.cpp Fix some spello's pointed out by Gabor Greif 2005-10-26 18:41:41 +00:00
LiveVariables.cpp Add section switching to common code generator code. Add a couple of 2005-11-21 07:06:27 +00:00
MachineBasicBlock.cpp
MachineCodeEmitter.cpp
MachineFunction.cpp Added graphviz/gv support for MF. 2005-10-12 12:09:05 +00:00
MachineInstr.cpp
Makefile
Passes.cpp Alkis agrees that that iterative scan allocator isn't going to be worked on 2005-10-24 04:14:30 +00:00
PHIElimination.cpp clean up this code a bit, no functionality change 2005-10-03 07:22:07 +00:00
PhysRegTracker.h
PrologEpilogInserter.cpp Always compute max align. 2005-11-06 17:43:20 +00:00
RegAllocLinearScan.cpp I think I know what you meant here, but just to be safe I'll let you 2005-11-21 14:09:40 +00:00
RegAllocLocal.cpp Nuke noop copies. 2005-11-09 18:22:42 +00:00
RegAllocSimple.cpp Change this code ot pass register classes into the stack slot spiller/reloader 2005-09-30 01:29:00 +00:00
TwoAddressInstructionPass.cpp Fix some spello's pointed out by Gabor Greif 2005-10-26 18:41:41 +00:00
UnreachableBlockElim.cpp
VirtRegMap.cpp Fix the LLC regressions on X86 last night. In particular, when undoing 2005-10-06 17:19:06 +00:00
VirtRegMap.h