Commit Graph

4735 Commits

Author SHA1 Message Date
Chris Lattner
022b15083b Reimplement the inner loop of DSE. It now uniformly uses getDependence(),
doesn't do its own local caching, and is slightly more aggressive about
free/store dse (see testcase).  This eliminates the last external client 
of MemDep::getDependenceFrom().

llvm-svn: 60619
2008-12-06 00:53:22 +00:00
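
A source-level illustration of the free/store case (hypothetical code, not
the commit's testcase): a store whose memory is freed before any later read
is dead, and DSE can now remove it through the same getDependence() query it
uses everywhere else.

    #include <cstdlib>

    // The store is dead: the memory is freed immediately afterwards,
    // so no later read can observe the written value.
    void drop(int *p) {
      *p = 42;   // removable dead store
      free(p);   // ends the lifetime of the stored-to memory
    }
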
Dale Johannesen
f5a072c388 Make LoopStrengthReduce smarter about hoisting things out of
loops when they can be subsumed into addressing modes.

Change X86 addressing mode check to realize that
some PIC references need an extra register.
(I believe this is correct for Linux; if not, I'm sure
someone will tell me.)

llvm-svn: 60608
2008-12-05 21:47:27 +00:00
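
A hedged sketch of what "subsumed into addressing modes" means here
(illustrative C++, not from the commit): the per-iteration address
arithmetic below needs no separately hoisted register on x86, because
base + index*4 folds into a single memory operand.

    // The address a + 4*i fits one x86 addressing mode (base, index,
    // scale), so LSR should not hoist it into its own register, which
    // would only raise register pressure.
    int sum(const int *a, int n) {
      int s = 0;
      for (int i = 0; i < n; ++i)
        s += a[i];
      return s;
    }
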
Chris Lattner
2b5e1b5263 Make a few major changes to memdep and its clients:
1. Merge the 'None' result into 'Normal', making loads
   and stores return their dependencies on allocations as Normal.
2. Split the 'Normal' result into 'Clobber' and 'Def' to
   distinguish between the cases when memdep knows the value is
   produced and the cases when we just know it may be changed
   (see the sketch after this entry).
3. Move some of the logic for determining whether readonly calls
   are CSEs into memdep instead of it being in GVN.  This still
   leaves verification that the arguments are the same to GVN to
   let it know about value equivalences in different contexts.
4. Change memdep's call/call dependency analysis to use 
   getModRefInfo(CallSite,CallSite) instead of doing something 
   very weak.  This only really matters for things like DSA, but
   someday maybe we'll have some other decent context sensitive
   analyses :)
5. This reimplements the guts of memdep to handle the new results.
6. This simplifies GVN significantly:
   a) readonly call CSE is slightly simpler
   b) I eliminated the "getDependencyFrom" chaining for load
      elimination, and load CSE no longer has to worry about
      volatile loads (they are always clobbers).
   c) GVN no longer does any 'lastLoad' caching, leaving it to 
      memdep.
7. The logic in DSE is simplified a bit and sped up.  A potentially
   unsafe case was eliminated.

llvm-svn: 60607
2008-12-05 21:04:20 +00:00
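
The Clobber/Def split from point 2, illustrated with hypothetical source
code (ext stands in for any opaque call): a store that provably produces
the loaded value is a Def, while an instruction that merely may change it
is a Clobber.

    void ext(int *);     // hypothetical opaque external function

    int f(int *p) {
      *p = 1;    // Def for a load of *p: memdep knows the value (1)
      ext(p);    // Clobber: the value may change, but to what is unknown
      return *p; // this load's dependence is the call, a Clobber
    }
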
Anton Korobeynikov
30085a6f51 Revert invalid r60393. It causes llvm-gcc bootstrap failures in release builds.
See PR3160 for details

llvm-svn: 60604
2008-12-05 19:38:49 +00:00
Chris Lattner
211146e709 Fix test/Transforms/GVN/pre-load.ll
llvm-svn: 60594
2008-12-05 17:04:12 +00:00
Chris Lattner
35547ba5ca Make IsValueFullyAvailableInBlock safe.
llvm-svn: 60588
2008-12-05 07:49:08 +00:00
Devang Patel
4fcea36b8b Rewrite code that 1) filters loops and 2) calculates new loop bounds.
This fixes many bugs. I will add more test cases in a separate check-in.

Some day, the code that manipulates CFG and updates dom. info could use refactoring help.

llvm-svn: 60554
2008-12-04 21:38:42 +00:00
Chris Lattner
7b3576824a Start simplifying a switch that has a successor that is a switch.
llvm-svn: 60534
2008-12-04 06:31:07 +00:00
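
A hypothetical source-level shape of the pattern (not the commit's code):
the inner switch re-dispatches on the value the outer switch already
tested, so the outer cases can branch straight to the final targets.

    int classify(int x) {
      switch (x) {       // outer switch on x
      case 0:
      case 1:
        switch (x) {     // inner switch on the same x: simplifiable
        case 0:  return 10;
        default: return 11;  // reached only for x == 1 here
        }
      default:
        return 12;
      }
    }
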
Chris Lattner
2677286c25 add a debugging option to help track down j-t problems.
llvm-svn: 60514
2008-12-04 00:07:59 +00:00
Dale Johannesen
119036d435 Remove an unused field.
llvm-svn: 60508
2008-12-03 22:43:56 +00:00
Dale Johannesen
a0b1516bdc Fix a misspelled function name.
llvm-svn: 60506
2008-12-03 20:56:12 +00:00
Chris Lattner
420385f8c3 Factor some code into a new FoldSingleEntryPHINodes method.
llvm-svn: 60501
2008-12-03 19:44:02 +00:00
Dale Johannesen
a851280d26 Fix a really wrong comment.
llvm-svn: 60494
2008-12-03 19:25:46 +00:00
Chris Lattner
f00b2f3fb4 Teach jump threading some more simple tricks:
1) have it fold "br undef", which does occur with
   surprising frequency as jump threading iterates.
2) teach j-t to delete dead blocks.  This removes the successor
   edges, reducing the in-edges of other blocks, allowing 
   recursive simplification.
3) Fold things like:
     br COND, BBX, BBY
  BBX:
     br COND, BBZ, BBW

   which also happens because jump threading iterates (see the
   sketch after this entry).

llvm-svn: 60470
2008-12-03 07:48:08 +00:00
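
Case 3 in source form (illustrative only): the second test of COND sits in
a block reached only when COND was already true, so its outcome is known
and the branch can be threaded away.

    int f(bool cond) {
      int r = 0;
      if (cond) {   //   br COND, BBX, BBY
        if (cond)   // BBX: br COND, BBZ, BBW -- always true here
          r = 1;
      }
      return r;
    }
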
Chris Lattner
29326a6d1f third time is the charm.
llvm-svn: 60469
2008-12-03 07:45:15 +00:00
Chris Lattner
d03c1b5440 fix assertion.
llvm-svn: 60468
2008-12-03 07:43:05 +00:00
Chris Lattner
7a00825f57 Rename DeleteBlockIfDead to DeleteDeadBlock and make it
unconditionally delete the block.  All likely clients will
do the checking anyway.

llvm-svn: 60464
2008-12-03 06:40:52 +00:00
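
A sketch of the resulting division of labor, written against the
present-day BasicBlockUtils interface (the 2008 signature may have
differed): the caller checks for deadness, DeleteDeadBlock just deletes.

    #include "llvm/IR/BasicBlock.h"
    #include "llvm/IR/CFG.h"
    #include "llvm/IR/Function.h"
    #include "llvm/Transforms/Utils/BasicBlockUtils.h"
    using namespace llvm;

    // The caller now does the checking; DeleteDeadBlock deletes
    // unconditionally.
    static void removeIfDead(BasicBlock *BB) {
      if (pred_empty(BB) && BB != &BB->getParent()->getEntryBlock())
        DeleteDeadBlock(BB);
    }
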
Chris Lattner
12c3938837 Factor some code out of SimplifyCFG, forming a new
DeleteBlockIfDead method.

llvm-svn: 60463
2008-12-03 06:37:44 +00:00
Dale Johannesen
e06fb96c43 Minor rewrite per review feedback.
llvm-svn: 60442
2008-12-02 21:17:11 +00:00
Dale Johannesen
531472926e Make the code do what the comment says it does.
llvm-svn: 60431
2008-12-02 18:40:09 +00:00
Chris Lattner
2a9747548e Implement PRE of loads in the GVN pass with a pretty cheap and
straightforward implementation.  This does not require any extra
alias analysis queries beyond what we already do for non-local loads.

Some programs really really like load PRE.  For example, SPASS triggers
this ~1000 times, ~300 times in 255.vortex, and ~1500 times on 403.gcc.

The biggest limitation to the implementation is that it does not split
critical edges.  This is a huge killer on many programs and should be
addressed after the initial patch is enabled by default.

The implementation of this should incidentally speed up rejection of 
non-local loads because it avoids creating the repl densemap in cases 
when it won't be used for fully redundant loads.

This is currently disabled by default.
Before I turn this on, I need to fix a couple of miscompilations in
the testsuite, look at compile time performance numbers, and look at
perf impact.  This is pretty close to ready though.

llvm-svn: 60408
2008-12-02 08:16:11 +00:00
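
What load PRE buys, in hypothetical source form: the second load of *p is
redundant whenever c is true, so GVN inserts a load on the one path where
it is missing and replaces the original with a PHI.

    int g(bool c, int *p) {
      int v = 0;
      if (c)
        v = *p;       // *p already loaded on this path
      return v + *p;  // partially redundant: PRE inserts a load on the
    }                 // !c edge and turns this one into a PHI
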
Bill Wendling
f5798b5d6c Remove some errors that crept in. No functionality change.
llvm-svn: 60403
2008-12-02 06:24:20 +00:00
Bill Wendling
9981b7bcdc Merge two if-statements into one.
llvm-svn: 60402
2008-12-02 06:22:04 +00:00
Bill Wendling
109da8c135 More stylistic changes. No functionality change.
llvm-svn: 60401
2008-12-02 06:18:11 +00:00
Bill Wendling
654cc91c36 - Remove the buggy -X/C -> X/-C transform. This isn't valid when X isn't a
constant. If X is a constant, then this is folded elsewhere.

- Added a note to Target/README.txt to indicate that we'd like to implement
  this when we're able.

llvm-svn: 60399
2008-12-02 05:12:47 +00:00
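
Why the transform is unsound for a variable X: IR negation ("sub 0, X")
wraps, so negating the minimum value yields the minimum value again. A
small demonstration (plain -x would be signed-overflow UB in C++, hence
the explicit wrapping negation):

    #include <cstdint>
    #include <cstdio>

    int main() {
      int32_t x = INT32_MIN, c = 2;
      // Wrapping negation, matching IR "sub 0, x": yields INT32_MIN.
      int32_t negx = (int32_t)(0u - (uint32_t)x);
      printf("%d\n", negx / c);  // -1073741824
      printf("%d\n", x / -c);    //  1073741824: the rewrite changes the result
      return 0;
    }
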
Bill Wendling
a60e3e3539 Improve comment.
llvm-svn: 60398
2008-12-02 05:09:00 +00:00
Bill Wendling
e319ca5f21 - Reduce nesting.
- No need to do a swap on a canonicalized pattern.

No functionality change.

llvm-svn: 60397
2008-12-02 05:06:43 +00:00
Chris Lattner
cc084722e7 some random comment improvements.
llvm-svn: 60395
2008-12-02 04:52:26 +00:00
Owen Anderson
92e405b332 Fix an issue that Chris noticed, where local PRE was not properly instantiating
a new value numbering set after splitting a critical edge.  This increases
the number of instances of PRE on 403.gcc from ~60 to ~570.

llvm-svn: 60393
2008-12-02 04:09:22 +00:00
Dale Johannesen
f4362aae8c Consider only references to an IV within the loop when
figuring out the base of the IV.  This produces better
code in the example.  (Addresses use (IV) instead of 
(BASE,IV) - a significant improvement on low-register
machines like x86).

llvm-svn: 60374
2008-12-01 22:00:01 +00:00
Bill Wendling
33f3e77a5b Don't rebuild RHSNeg. Just use the one that's already there.
llvm-svn: 60370
2008-12-01 21:06:30 +00:00
Bill Wendling
d436da480d Document what this check is doing. Also, no need to cast to ConstantInt.
llvm-svn: 60369
2008-12-01 21:03:43 +00:00
Bill Wendling
1e4fb7a143 Use a simple comparison. Overflow on integer negation can only occur when the
integer is "minint".

llvm-svn: 60366
2008-12-01 19:46:27 +00:00
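
The observation in code (a trivial sketch): negation of a two's-complement
integer overflows only for the minimum value, so the general overflow check
collapses to one comparison.

    #include <climits>

    bool negationOverflows(int x) {
      return x == INT_MIN;  // -INT_MIN is not representable
    }
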
Bill Wendling
48b7cbbc01 Generalize the FoldOrWithConstant method to fold for any two constants which
don't have overlapping bits.

llvm-svn: 60344
2008-12-01 08:32:40 +00:00
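
The bit identity the generalization rests on (the exact pattern
FoldOrWithConstant matches is more specific; this just demonstrates the
non-overlapping-bits precondition):

    #include <cassert>
    #include <cstdint>

    int main() {
      uint32_t x = 0xDEADBEEF;
      uint32_t c1 = 0xFF00FF00, c2 = 0x00FF00FF;
      assert((c1 & c2) == 0);  // the constants share no bits
      assert(((x & c1) | (x & c2)) == (x & (c1 | c2)));
      return 0;
    }
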
Bill Wendling
2a182b838d Reduce copy-and-paste code by splitting out the code into its own function.
llvm-svn: 60343
2008-12-01 08:23:25 +00:00
Bill Wendling
a6e7dd2299 Use m_Specific() instead of double matching.
llvm-svn: 60341
2008-12-01 08:09:47 +00:00
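
A sketch of the idiom (not the commit's exact pattern): instead of matching
a subexpression twice and comparing the results, require an already-known
value directly with m_Specific.

    #include "llvm/IR/PatternMatch.h"
    using namespace llvm;
    using namespace PatternMatch;

    // Does V compute -X for the specific, already-bound value X?
    static bool isNegOf(Value *V, Value *X) {
      return match(V, m_Sub(m_Zero(), m_Specific(X)));
    }
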
Bill Wendling
8e484e9556 Move pattern check outside of the if-then statement. This prevents us from fiddling with constants unless we have to.
llvm-svn: 60340
2008-12-01 07:47:02 +00:00
Chris Lattner
3b908483b7 Rename some variables, only increment BI once at the start of the loop instead of throughout it.
llvm-svn: 60339
2008-12-01 07:35:54 +00:00
Chris Lattner
c6e6eaf6d3 pull the predMap densemap out of the inner loop of performPRE, so
that it isn't reallocated all the time.  This is a tiny speedup for
GVN: 3.90->3.88s

llvm-svn: 60338
2008-12-01 07:29:03 +00:00
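
The shape of the change, with illustrative standard-library stand-ins for
GVN's actual types: construct the container once outside the loop and
clear() it per iteration, so it is not constructed and destroyed every
time (with LLVM's DenseMap the bucket storage is reused as well).

    #include <map>
    #include <vector>

    void process(const std::vector<std::vector<int>> &blocks) {
      std::map<int, int> predMap;  // hoisted out of the loop
      for (const auto &bb : blocks) {
        predMap.clear();           // reset instead of reallocating
        for (int v : bb)
          ++predMap[v];
      }
    }
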
Chris Lattner
f72f8e3b74 switch a couple more calls to use array_pod_sort.
llvm-svn: 60337
2008-12-01 06:52:57 +00:00
Chris Lattner
80d0eff786 Introduce a new array_pod_sort function and switch LSR to use it
instead of std::sort.  This shrinks the release-asserts LSR.o file
by 1100 bytes of code on my system.

We should start using array_pod_sort where possible.

llvm-svn: 60335
2008-12-01 06:49:59 +00:00
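
Usage is a drop-in replacement for std::sort on POD-like elements; the
code-size win comes from array_pod_sort dispatching to qsort instead of
instantiating a full sort at every call site. A minimal sketch:

    #include "llvm/ADT/STLExtras.h"
    #include "llvm/ADT/SmallVector.h"
    using namespace llvm;

    void sortValues(SmallVectorImpl<unsigned> &V) {
      // Sorts with the default operator< ordering via a qsort callback.
      array_pod_sort(V.begin(), V.end());
    }
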
Chris Lattner
74f1e6d3ec Eliminate use of setvector for the DeadInsts set, just use a smallvector.
This is a lot cheaper and conceptually simpler.

llvm-svn: 60332
2008-12-01 06:27:41 +00:00
Chris Lattner
db86ff62f9 DeleteTriviallyDeadInstructions is always passed the
DeadInsts ivar, just use it directly.

llvm-svn: 60330
2008-12-01 06:14:28 +00:00
Chris Lattner
d6be279b4d simplify DeleteTriviallyDeadInstructions again; unlike my previous
buggy rewrite, this notifies ScalarEvolution of a pending instruction
about to be removed and then erases it, instead of erasing it then 
notifying.

llvm-svn: 60329
2008-12-01 06:11:32 +00:00
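
The ordering the rewrite restores, sketched against the current
ScalarEvolution interface (the 2008 entry point had a different name):
notify the analysis first, then erase, so SCEV never holds a dangling
pointer.

    #include "llvm/Analysis/ScalarEvolution.h"
    #include "llvm/IR/Instruction.h"
    using namespace llvm;

    static void deleteDeadInst(ScalarEvolution &SE, Instruction *I) {
      SE.forgetValue(I);     // tell SCEV the value is going away...
      I->eraseFromParent();  // ...then actually erase it
    }
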
Chris Lattner
e6c7ed156f simplify these patterns using m_Specific. No need to grep for
xor in the testcase ("or" is a substring of "xor").

llvm-svn: 60328
2008-12-01 05:16:26 +00:00
Chris Lattner
c92e1e104b Teach jump threading to clean up after itself, DCE'ing and constfolding the
new instructions it simplifies.  Because we're threading jumps on edges
with constants coming in from PHI's, we inherently are exposing a lot more
constants to the new block.  Folding them and deleting dead conditions
allows the cost model in jump threading to be more accurate as it iterates.

llvm-svn: 60327
2008-12-01 04:48:07 +00:00
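
A hedged sketch of the cleanup loop's shape against current APIs (not the
commit's code): after threading, try to constant fold each instruction in
the affected block and erase whatever folds away.

    #include "llvm/ADT/STLExtras.h"
    #include "llvm/Analysis/ConstantFolding.h"
    #include "llvm/IR/BasicBlock.h"
    #include "llvm/IR/DataLayout.h"
    #include "llvm/IR/Instruction.h"
    using namespace llvm;

    static void cleanupBlock(BasicBlock &BB, const DataLayout &DL) {
      for (Instruction &I : make_early_inc_range(BB)) {
        if (Constant *C = ConstantFoldInstruction(&I, DL)) {
          I.replaceAllUsesWith(C);  // fold users onto the constant
          I.eraseFromParent();      // the folded instruction is now dead
        }
      }
    }
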
Chris Lattner
13942f82c4 Change instcombine to use FoldPHIArgGEPIntoPHI to fold two operand PHIs
instead of using FoldPHIArgBinOpIntoPHI.  In addition to being more
obvious, this also fixes a problem where instcombine wouldn't merge two
phis that had different variable indices.  This prevented instcombine
from factoring big chunks of code in 403.gcc.  For example:

 insn_cuid.exit:                
-       %tmp336 = load i32** @uid_cuid, align 4      
-       %tmp337 = getelementptr %struct.rtx_def* %insn_addr.0.ph.i, i32 0, i32 3    
-       %tmp338 = bitcast [1 x %struct.rtunion]* %tmp337 to i32*               
-       %tmp339 = load i32* %tmp338, align 4           
-       %tmp340 = getelementptr i32* %tmp336, i32 %tmp339     
        br label %bb62
 
 bb61:       
-       %tmp341 = load i32** @uid_cuid, align 4     
-       %tmp342 = getelementptr %struct.rtx_def* %insn, i32 0, i32 3        
-       %tmp343 = bitcast [1 x %struct.rtunion]* %tmp342 to i32*           
-       %tmp344 = load i32* %tmp343, align 4        
-       %tmp345 = getelementptr i32* %tmp341, i32 %tmp344          
        br label %bb62
 
 bb62:      
-       %iftmp.62.0.in = phi i32* [ %tmp345, %bb61 ], [ %tmp340, %insn_cuid.exit ]         
+       %insn.pn2 = phi %struct.rtx_def* [ %insn, %bb61 ], [ %insn_addr.0.ph.i, %insn_cuid.exit ]         
+       %tmp344.pn.in.in = getelementptr %struct.rtx_def* %insn.pn2, i32 0, i32 3     
+       %tmp344.pn.in = bitcast [1 x %struct.rtunion]* %tmp344.pn.in.in to i32*  
+       %tmp341.pn = load i32** @uid_cuid     
+       %tmp344.pn = load i32* %tmp344.pn.in 
+       %iftmp.62.0.in = getelementptr i32* %tmp341.pn, i32 %tmp344.pn   
        %iftmp.62.0 = load i32* %iftmp.62.0.in     

llvm-svn: 60325
2008-12-01 03:42:51 +00:00
Chris Lattner
0e03e40a76 Teach inst combine to merge GEPs through PHIs. This is really
important because it is sinking the loads using the GEPs, but
not the GEPs themselves.  This triggers 647 times on 403.gcc
and makes the .s file much much nicer.  For example before:

        je      LBB1_87 ## bb78
LBB1_62:        ## bb77
        leal    84(%esi), %eax
LBB1_63:        ## bb79
        movl    (%eax), %eax
...
LBB1_87:        ## bb78
        movl    $0, 4(%esp)
        movl    %esi, (%esp)
        call    L_make_decl_rtl$stub
        jmp     LBB1_62 ## bb77


after:

        jne     LBB1_63 ## bb79
LBB1_62:        ## bb78
        movl    $0, 4(%esp)
        movl    %esi, (%esp)
        call    L_make_decl_rtl$stub
LBB1_63:        ## bb79
        movl    84(%esi), %eax

The input code was (and the GEPs are merged and
the PHI is now eliminated by instcombine):

        br i1 %tmp233, label %bb78, label %bb77
bb77:           
        %tmp234 = getelementptr %struct.tree_node* %t_addr.3, i32 0, i32 0, i32 22              
        br label %bb79
bb78:           
        call void @make_decl_rtl(%struct.tree_node* %t_addr.3, i8* null) nounwind
        %tmp235 = getelementptr %struct.tree_node* %t_addr.3, i32 0, i32 0, i32 22              
        br label %bb79
bb79:           
        %iftmp.12.0.in = phi %struct.rtx_def** [ %tmp235, %bb78 ], [ %tmp234, %bb77 ]           
        %iftmp.12.0 = load %struct.rtx_def** %iftmp.12.0.in             

llvm-svn: 60322
2008-12-01 02:34:36 +00:00
Chris Lattner
c1adf6fc51 Make GVN more intelligent about redundant load
elimination: when finding dependent loads/stores, treat them as
the same when alias analysis reports must-alias, instead of relying
on the pointers being exactly equal.  This makes load elimination
more aggressive.  For example, on 403.gcc, we had:

<     68 gvn    - Number of instructions PRE'd
< 152718 gvn    - Number of instructions deleted
<  49699 gvn    - Number of loads deleted
<   6153 memdep - Number of dirty cached non-local responses
< 169336 memdep - Number of fully cached non-local responses
< 162428 memdep - Number of uncached non-local responses

now we have:

>     64 gvn    - Number of instructions PRE'd
> 153623 gvn    - Number of instructions deleted
>  49856 gvn    - Number of loads deleted
>   5022 memdep - Number of dirty cached non-local responses
> 159030 memdep - Number of fully cached non-local responses
> 162443 memdep - Number of uncached non-local responses

That's an extra 157 loads deleted and extra 905 other instructions nuked.

This slows down GVN very slightly, from 3.91 to 3.96s.

llvm-svn: 60314
2008-12-01 01:31:36 +00:00
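
The key query, sketched against the current AAResults interface (names
differ from 2008): two loads are redundant with each other when alias
analysis proves their locations must alias, even if the pointer values
are distinct expressions.

    #include "llvm/Analysis/AliasAnalysis.h"
    using namespace llvm;

    static bool sameMemory(AAResults &AA, const MemoryLocation &A,
                           const MemoryLocation &B) {
      // MustAlias: the two locations are provably the same memory.
      return AA.alias(A, B) == AliasResult::MustAlias;
    }
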
Chris Lattner
bd1bc4a75e Reimplement the non-local dependency data structure in terms of a sorted
vector instead of a densemap.  This shrinks the memory usage of this thing
substantially (the high water mark) as well as making operations like
scanning it faster.  This speeds up memdep slightly, gvn goes from
3.9376 to 3.9118s on 403.gcc

This also splits out the statistics for the cached non-local case to
differentiate between the dirty and clean cached case.  Here's the stats
for 403.gcc:

  6153 memdep - Number of dirty cached non-local responses
169336 memdep - Number of fully cached non-local responses
162428 memdep - Number of uncached non-local responses

yay for caching :)

llvm-svn: 60313
2008-12-01 01:15:42 +00:00
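
The data-structure swap in miniature, with illustrative types (not
memdep's actual ones): keep key/value pairs sorted by key in one
contiguous vector, binary-search for lookups, and scan linearly when
visiting everything; the high-water memory mark stays far below a
DenseMap's.

    #include <algorithm>
    #include <utility>
    #include <vector>

    using Entry = std::pair<int, int>;  // (block id, dependence) stand-in

    // Binary search in the sorted vector; returns null if absent.
    int *lookup(std::vector<Entry> &deps, int key) {
      auto it = std::lower_bound(
          deps.begin(), deps.end(), key,
          [](const Entry &e, int k) { return e.first < k; });
      return (it != deps.end() && it->first == key) ? &it->second : nullptr;
    }
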