Mirror of https://github.com/RPCS3/llvm.git (synced 2025-01-10 22:46:25 +00:00), commit 726942c8bb
In the X86 backend, matching an address is initiated by the 'addr' complex pattern and its friends. During this process we may reassociate and-of-shift into shift-of-and (FoldMaskedShiftToScaledMask) to allow folding of the shift into the scale of the address.

However, as demonstrated by the testcase, this can trigger CSE of not only the shift and the AND, which the code is prepared for, but also the underlying load node. In the testcase this node is sitting in the RecordedNodes and MatchScope data structures of the matcher and becomes a deleted node upon CSE. Returning from the complex pattern function, we try to access it again and hit an assert because the node is no longer a load, even though this was checked before.

Now obviously changing the DAG this late is bending the rules, but I think it makes sense somewhat. Outside of addresses we prefer and-of-shift because it may lead to smaller immediates (FoldMaskAndShiftToScale is an even better example because it creates a non-canonical node). We currently don't recognize addresses during DAGCombiner, where arguably this canonicalization should be performed. On the other hand, having this in the matcher allows us to cover all the cases where an address can be used in an instruction.

I've also talked a little bit to Dan Gohman on llvm-dev, who added the RAUW for the new shift node in FoldMaskedShiftToScaledMask. This RAUW is responsible for initiating the recursive CSE on users (http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/076903.html) but it is not strictly necessary since the shift is hooked into the visited user. Of course it's safer to keep the DAG consistent at all times (e.g. for accurate number of uses, etc.).

So rather than changing the fundamentals, I've decided to continue along the previous patches and detect the CSE. This patch installs a very targeted DAGUpdateListener for the duration of a complex-pattern match and updates the matching state accordingly. (Previous patches used HandleSDNode to detect the CSE but that's not practical here.)

The listener is only installed on X86. I tested that there is no measurable overhead due to this while running through the spec2k BC files with llc. The only thing we pay for is the creation of the listener. The callback never triggers in spec2k since this is a corner case.

Fixes rdar://problem/18206171

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219009 91177308-0d34-0410-b5e6-96231b3b80d8
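The reassociation described above is only sound because the two forms compute the same value, which is also why the rewritten address of %load1 collides with %load2 in the testcase. A minimal sketch of that arithmetic, assuming 32-bit wraparound and the constants from the testcase (shift by 2, masks 1020 and 255); the function names are hypothetical and are not part of LLVM:

```python
MASK32 = 0xFFFFFFFF  # model i32 wraparound


def and_of_shift(x):
    """(and (shl x, 2), 1020), the form used by %load1 in the testcase."""
    return ((x << 2) & MASK32) & 1020


def shift_of_and(x):
    """(shl (and x, 255), 2), the form used by %load2 in the testcase."""
    return ((x & 255) << 2) & MASK32


def forms_agree(samples):
    """True if both forms yield the same value for every sample."""
    return all(and_of_shift(x) == shift_of_and(x) for x in samples)


if __name__ == "__main__":
    samples = list(range(1024)) + [0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]
    print(forms_agree(samples))
```

Since 1020 is 255 shifted left by 2, masking after the shift selects the same eight bits of the input as masking before it.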
63 lines
1.8 KiB
LLVM
; RUN: llc < %s | FileCheck %s

; This testcase used to hit an assert during ISel. For details, see the big
; comment inside the function.

; CHECK-LABEL: foo:
; The AND should be turned into a subreg access.
; CHECK-NOT: and
; The shift (leal) should be folded into the scale of the address in the load.
; CHECK-NOT: leal
; CHECK: movl {{.*}},4),

target datalayout = "e-m:o-p:32:32-f64:32:64-f80:128-n8:16:32-S128"
target triple = "i386-apple-macosx10.6.0"

define void @foo(i32 %a) {
bb:
  br label %bb1692

bb1692:
  %tmp1694 = phi i32 [ 0, %bb ], [ %tmp1745, %bb1692 ]
  %xor = xor i32 0, %tmp1694

; %load1 = (load (and (shl %xor, 2), 1020))
  %tmp1701 = shl i32 %xor, 2
  %tmp1702 = and i32 %tmp1701, 1020
  %tmp1703 = getelementptr inbounds [1028 x i8]* null, i32 0, i32 %tmp1702
  %tmp1704 = bitcast i8* %tmp1703 to i32*
  %load1 = load i32* %tmp1704, align 4

; %load2 = (load (shl (and %xor, 255), 2))
  %tmp1698 = and i32 %xor, 255
  %tmp1706 = shl i32 %tmp1698, 2
  %tmp1707 = getelementptr inbounds [1028 x i8]* null, i32 0, i32 %tmp1706
  %tmp1708 = bitcast i8* %tmp1707 to i32*
  %load2 = load i32* %tmp1708, align 4

  %tmp1710 = or i32 %load2, %a

; While matching xor we address-match %load1. The and-of-shift reassociation
; in address matching transforms this into a shift-of-and and the resulting
; node becomes identical to %load2. CSE replaces %load1 which leaves its
; references in MatchScope and RecordedNodes stale.
  %tmp1711 = xor i32 %load1, %tmp1710

  %tmp1744 = getelementptr inbounds [256 x i32]* null, i32 0, i32 %tmp1711
  store i32 0, i32* %tmp1744, align 4
  %tmp1745 = add i32 %tmp1694, 1
  indirectbr i8* undef, [label %bb1756, label %bb1692]

bb1756:
  br label %bb2705

bb2705:
  indirectbr i8* undef, [label %bb5721, label %bb5736]

bb5721:
  br label %bb2705

bb5736:
  ret void
}