would scan the entire loop body, then scan all users of instructions in the
loop, looking for users outside the loop. Now, since we know that the
loop is in LCSSA form, we know that any users outside the loop will be LCSSA
phi nodes. Just scan them.
This speeds up indvars significantly.
llvm-svn: 34898
the order that instcombine processed instructions in the testcase. The end
result is that instcombine finished with:
define i16 @test1(i16 %a) {
%tmp = zext i16 %a to i32 ; <i32> [#uses=2]
%tmp21 = lshr i32 %tmp, 8 ; <i32> [#uses=1]
%tmp5 = shl i32 %tmp, 8 ; <i32> [#uses=1]
%tmp.upgrd.32 = or i32 %tmp21, %tmp5 ; <i32> [#uses=1]
%tmp.upgrd.3 = trunc i32 %tmp.upgrd.32 to i16 ; <i16> [#uses=1]
ret i16 %tmp.upgrd.3
}
which can't get matched as a bswap.
This patch makes instcombine more sophisticated about removing truncating
casts, allowing it to turn this into:
define i16 @test2(i16 %a) {
%tmp211 = lshr i16 %a, 8
%tmp52 = shl i16 %a, 8
%tmp.upgrd.323 = or i16 %tmp211, %tmp52
ret i16 %tmp.upgrd.323
}
which then matches as bswap. This fixes bswap.ll and implements
InstCombine/cast2.ll:test[12]. This also implements cast elimination of
add/sub.
llvm-svn: 34870
entry (0x8b056f0, LLVM BB @0x8b01b30, ID#0):
Live Ins: %r0 %r1 %r2 %r3
%reg1032 = tMOVrr %r3<kill>
%reg1033 = tMOVri8 1
%reg1034 = tMOVri8 0
tCMPi8 %reg1029<kill>, 0
tBcc mbb<entry,0x8b06a10>, 0
Successors according to CFG: 0x8b06980 0x8b06a10
entry (0x8b06980, LLVM BB @0x8b01b30, ID#12):
Predecessors according to CFG: 0x8b056f0
%reg1036 = tMOVrr %reg1034<kill>
Successors according to CFG: 0x8b06a10
entry (0x8b06a10, LLVM BB @0x8b01b30, ID#13):
Predecessors according to CFG: 0x8b056f0 0x8b06980
%reg1024<dead> = tMOVrr %reg1030<kill>
...
reg1030 and r1 have already been joined. When reg1024 and reg1030 are joined,
r1 live range from function entry to the tMOVrr instruction are dead. Eliminate
r1 from the livein set of the entry BB, not the BB where the copy is.
llvm-svn: 34866
a value from the worklist required scanning the entire worklist to remove all
entries. We now use a combination map+vector to prevent duplicates from
happening and prevent the scan. This speeds up instcombine on a large file
from the llvm-gcc bootstrap from 189.7s to 4.84s in a debug build and from
5.04s to 1.37s in a release build.
llvm-svn: 34848
this to a NOTE: because pari/gp results start to get rounded incorrectly
after 192 bits of precision. APInt and pari/gp never differ by more than
1, but APInt is more accurate because it does not lose precision after 192
bits as does pari/gp.
llvm-svn: 34834