Summary:
This patch implements my promised optimization to reunites certain sexts from
operands after we extract the constant offset. See the header comment of
reuniteExts for its motivation.
One key building block that enables this optimization is Bjarke's poison value
analysis (D11212). That helps to prove "a +nsw b" can't overflow.
Reviewers: broune
Subscribers: jholewinski, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D12016
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245003 91177308-0d34-0410-b5e6-96231b3b80d8
This commit modifies the way the machine basic blocks are serialized - now the
machine basic blocks are serialized using a custom syntax instead of relying on
YAML primitives. Instead of using YAML mappings to represent the individual
machine basic blocks in a machine function's body, the new syntax uses a single
YAML block scalar which contains all of the machine basic blocks and
instructions for that function.
This is an example of a function's body that uses the old syntax:
body:
- id: 0
name: entry
instructions:
- '%eax = MOV32r0 implicit-def %eflags'
- 'RETQ %eax'
...
The same body is now written like this:
body: |
bb.0.entry:
%eax = MOV32r0 implicit-def %eflags
RETQ %eax
...
This syntax change is motivated by the fact that the bundled machine
instructions didn't map that well to the old syntax which was using a single
YAML sequence to store all of the machine instructions in a block. The bundled
machine instructions internally use flags like BundledPred and BundledSucc to
determine the bundles, and serializing them as MI flags using the old syntax
would have had a negative impact on the readability and the ease of editing
for MIR files. The new syntax allows me to serialize the bundled machine
instructions using a block construct without relying on the internal flags,
for example:
BUNDLE implicit-def dead %itstate, implicit-def %s1 ... {
t2IT 1, 24, implicit-def %itstate
%s1 = VMOVS killed %s0, 1, killed %cpsr, implicit killed %itstate
}
This commit also converts the MIR testcases to the new syntax. I developed
a script that can convert from the old syntax to the new one. I will post the
script on the llvm-commits mailing list in the thread for this commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244982 91177308-0d34-0410-b5e6-96231b3b80d8
We used to just say "invalid type suffix for instruction", which is
misleading. This is because we fallback to the long-form matcher if the
short-form matcher failed, losing the error information on the way.
Save it, so that we can provide a little better diagnostics when the
long-form matcher thinks a suffix is the cause of the error.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244955 91177308-0d34-0410-b5e6-96231b3b80d8
If <src> is non-zero we can safely set the flag to true, and this
results in less code generated for, e.g. ffs(x) + 1 on FreeBSD.
Thanks to majnemer for suggesting the fix and reviewing.
Code generated before the patch was applied:
0: 0f bc c7 bsf %edi,%eax
3: b9 20 00 00 00 mov $0x20,%ecx
8: 0f 45 c8 cmovne %eax,%ecx
b: 83 c1 02 add $0x2,%ecx
e: b8 01 00 00 00 mov $0x1,%eax
13: 85 ff test %edi,%edi
15: 0f 45 c1 cmovne %ecx,%eax
18: c3 retq
Code generated after the patch was applied:
0: 0f bc cf bsf %edi,%ecx
3: 83 c1 02 add $0x2,%ecx
6: 85 ff test %edi,%edi
8: b8 01 00 00 00 mov $0x1,%eax
d: 0f 45 c1 cmovne %ecx,%eax
10: c3 retq
It seems we can still use cmove and save another 'test' instruction, but
that can be tackled separately.
Differential Revision: http://reviews.llvm.org/D11989
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244947 91177308-0d34-0410-b5e6-96231b3b80d8
We used to be over-conservative about preserving inbounds. Actually, the second
GEP (which applies the constant offset) can inherit the inbounds attribute of
the original GEP, because the resultant pointer is equivalent to that of the
original GEP. For example,
x = GEP inbounds a, i+5
=>
y = GEP a, i // inbounds removed
x = GEP inbounds y, 5 // inbounds preserved
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244937 91177308-0d34-0410-b5e6-96231b3b80d8
This patch corresponds to review:
http://reviews.llvm.org/D11471
It improves the code generated for converting a scalar to a vector value. With
direct moves from GPRs to VSRs, we no longer require expensive stack operations
for this. Subsequent patches will handle the reverse case and more general
operations between vectors and their scalar elements.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244921 91177308-0d34-0410-b5e6-96231b3b80d8
They rely on global fast-math options, but soon ISel will rely only on fast-math flags on the instructions themselves. Rip the fast checks out into their own file so we can mark their instructions as fast.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244914 91177308-0d34-0410-b5e6-96231b3b80d8
These tests relied on -enable-no-nans-fp-math, whereas soon they'll take their no-nans hint
from the FCMP instruction itself, so split the no-nans stuff out into its own test.
Also do a slight rejig of instruction order. The old FMIN/MAX backend matching had to deal with looking through casts, which it never did particularly well. Now, instcombine will recognize such patterns and canonicalize the cast outside the select. So modify the test inputs to assume that instcombine has already run.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244913 91177308-0d34-0410-b5e6-96231b3b80d8
DeadStoreElimination does eliminate a store if it stores a value which was loaded from the same memory location.
So far this worked only if the store is in the same block as the load.
Now we can also handle stores which are in a different block than the load.
Example:
define i32 @test(i1, i32*) {
entry:
%l2 = load i32, i32* %1, align 4
br i1 %0, label %bb1, label %bb2
bb1:
br label %bb3
bb2:
; This store is redundant
store i32 %l2, i32* %1, align 4
br label %bb3
bb3:
ret i32 0
}
Differential Revision: http://reviews.llvm.org/D11854
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244901 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Update the demotion logic in WinEHPrepare to avoid creating new cleanups by
walking predecessors as necessary to insert stores for EH-pad PHIs.
Also avoid creating stores for EH-pad PHIs that have no uses.
The store/load placement is still pretty naive. Likely future improvements
(at least for optimized compiles) include:
- Share loads for related uses as possible
- Coalesce non-interfering use/def-related PHIs
- Store at definition point rather than each PHI pred for non-interfering
lifetimes.
Reviewers: rnk, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11955
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244894 91177308-0d34-0410-b5e6-96231b3b80d8
Recent mesa/llvmpipe crashes on SystemZ due to a failed assertion when
attempting to compile a routine with a return type of
{ <4 x float>, <4 x float>, <4 x float>, <4 x float> }
on a system without vector instruction support.
This is because after legalizing the vector type, we get a return value
consisting of 16 floats, which cannot all be returned in registers.
Usually, what should happen in this case is that the target's CanLowerReturn
routine rejects the return type, in which case SelectionDAG falls back to
implementing a structure return in memory via implicit reference.
However, the SystemZ target never actually implemented any CanLowerReturn
routine, and thus would accept any struct return type.
This patch fixes the crash by implementing CanLowerReturn. As a side effect,
this also handles fp128 return values, fixing a todo that was noted in
SystemZCallingConv.td.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244889 91177308-0d34-0410-b5e6-96231b3b80d8
Consider this code:
BB:
%i = phi i32 [ 0, %if.then ], [ %c, %if.else ]
%add = add nsw i32 %i, %b
...
In this common case the add can be moved to the %if.else basic block, because
adding zero is an identity operation. If we go though %if.then branch it's
always a win, because add is not executed; if not, the number of instructions
stays the same.
This pattern applies also to other instructions like sub, shl, shr, ashr | 0,
mul, sdiv, div | 1.
Patch by Jakub Kuderski!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244887 91177308-0d34-0410-b5e6-96231b3b80d8
Other than PC-relative loads/store the patterns that match the various
load/store addressing modes have the same complexity, so the order that they
are matched is the order that they appear in the .td file.
Rearrange the instruction definitions in ARMInstrThumb.td, and make use of
AddedComplexity for PC-relative loads, so that the instruction matching order
is the order that results in the simplest selection logic. This also makes
register-offset load/store be selected when it should, as previously it was
only selected for too-large immediate offsets.
Differential Revision: http://reviews.llvm.org/D11800
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244882 91177308-0d34-0410-b5e6-96231b3b80d8
Most SSE/AVX (non-constant) vector shift instructions only use the lower 64-bits of the 128-bit shift amount vector operand, this patch calls SimplifyDemandedVectorElts to optimize for this.
I had to refactor some of my recent InstCombiner work on the vector shifts to avoid quite a bit of duplicate code, it means that SimplifyX86immshift now (re)decodes the type of shift.
Differential Revision: http://reviews.llvm.org/D11938
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244872 91177308-0d34-0410-b5e6-96231b3b80d8
Now that we can properly promote mismatched FCOPYSIGNs (r244858), we
can mark the FP_ROUND on the result as truncating, to expose folding.
FCOPYSIGN doesn't change anything but the sign bit, so
(fp_round (fcopysign (fpext a), b))
is equivalent to (modulo the sign bit):
(fp_round (fpext a))
which is a no-op.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244862 91177308-0d34-0410-b5e6-96231b3b80d8
We can lower them using our cool tricks if we fpext/fptrunc the second
input, like we do for f32/f64.
Follow-up to r243924, r243926, and r244858.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244860 91177308-0d34-0410-b5e6-96231b3b80d8
We don't care about its type, and there's even a combine that'll fold
away the FP_EXTEND if we let it run. However, until it does, we'll have
something broken like:
(f32 (fp_extend (f64 v)))
Scalar f16 follow-up to r243924.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244858 91177308-0d34-0410-b5e6-96231b3b80d8
To be clear: this is an *optimization* not a correctness change.
CodeGenPrep likes to duplicate icmps feeding branch instructions to take advantage of x86's ability to fuze many comparison/branch patterns into a single micro-op and to reduce the need for materializing i1s into general registers. PlaceSafepoints likes to place safepoint polls right at the end of basic blocks (immediately before terminators) when inserting entry and backedge safepoints. These two heuristics interact in a somewhat unfortunate way where the branch terminating the original block will be controlled by a condition driven by unrelocated pointers. This forces the register allocator to keep both the relocated and unrelocated values of the pointers feeding the icmp alive over the safepoint poll.
One simple fix would have been to just adjust PlaceSafepoints to move one back in the basic block, but you can reach similar cases as a result of LICM or other hoisting passes. As a result, doing a post insertion fixup seems to be more robust.
I considered doing this in CodeGenPrep itself, but having to update the live sets of already rewritten safepoints gets complicated fast. In particular, you can't just use def/use information because by moving the icmp, we're extending the live range of it's inputs potentially.
Instead, this patch teaches RewriteStatepointsForGC to make the required adjustments before making the relocations explicit in the IR. This change really highlights the fact that RSForGC is a CodeGenPrep-like pass which is performing target specific lowering. In the long run, we may even want to combine the two though this would require a lot more smarts to be integrated into RSForGC first. We currently rely on being able to run a set of cleanup passes post rewriting because the IR RSForGC generates is pretty damn ugly.
Differential Revision: http://reviews.llvm.org/D11819
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244821 91177308-0d34-0410-b5e6-96231b3b80d8
This commit fixes a bug where MI parser couldn't resolve the named IR
references that referenced named global values.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244817 91177308-0d34-0410-b5e6-96231b3b80d8
When rewriting the IR such that base pointers are available for every live pointer, we potentially need to duplicate instructions to propagate the base. The original code had only handled PHI and Select under the belief those were the only instructions which would need duplicated. When I added support for vector instructions, I'd added a collection of hacks for ExtractElement which caught most of the common cases. Of course, I then found the one test case my hacks couldn't cover. :)
This change removes all of the early hacks for extract element. By defining extractelement as a BDV (rather than trying to look through it), we can extend the rewriting algorithm to duplicate the extract as needed. Note that a couple of peephole optimizations were left in for the moment, because while we now handle extractelement as a first class citizen, we're not yet handling insertelement. That change will follow in the near future.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244808 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
D11924 implemented part of the floating-point comparisons, this patch implements the rest:
* Tell ISelLowering that all booleans are either 0 or 1.
* Expand the eq/ne/lt/le/gt/ge floating-point comparisons to the canonical ones (similar to what Mips32r6InstrInfo.td does).
* Add tests for ord/uno.
* Add tests for ueq/one/ult/ule/ugt/uge.
* Fix existing comparison tests to remove the (res & 1) code, which setBooleanContents stops from generating.
Reviewers: sunfish
Subscribers: llvm-commits, jfb
Differential Revision: http://reviews.llvm.org/D11970
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244779 91177308-0d34-0410-b5e6-96231b3b80d8
r242520 was reverted in r244313 as the expected behaviour of the alias
attribute in C is that the alias has the same size as the aliasee. However
we can re-introduce adding the size on the alias when the aliasee does not,
from a source code or object perspective, exist as a discrete entity. This
happens when the aliasee is not a symbol, or when that symbol is private.
Differential Revision: http://reviews.llvm.org/D11943
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244752 91177308-0d34-0410-b5e6-96231b3b80d8
On Mach-O emitting aliases for the variables that make up a MergedGlobals
variable can cause problems when linking with dead stripping enabled so don't
do that, except for external variables where we must emit an alias.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244748 91177308-0d34-0410-b5e6-96231b3b80d8
This abstracts away the test for "when can we fold across a MachineInstruction"
into the the MI interface, and changes call-frame optimization use the same test
the peephole optimizer users.
Differential Revision: http://reviews.llvm.org/D11945
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244729 91177308-0d34-0410-b5e6-96231b3b80d8
As discussed in D11886, this patch moves the SSE/AVX vector blend folding to instcombiner from PerformINTRINSIC_WO_CHAINCombine (which allows us to remove this completely).
InstCombiner already had partial support for this, I just had to add support for zero (ConstantAggregateZero) masks and also the case where both selection inputs were the same (allowing us to ignore the mask).
I also moved all the relevant combine tests into InstCombine/blend_x86.ll
Differential Revision: http://reviews.llvm.org/D11934
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244723 91177308-0d34-0410-b5e6-96231b3b80d8
For NVPTX, try to use 32-bit division instead of 64-bit division when the dividend and divisor
fit in 32 bits. This speeds up some internal benchmarks significantly. The underlying reason
is that many index computations are carried out in 64-bits but never actually exceed the
capacity of a 32-bit word.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244684 91177308-0d34-0410-b5e6-96231b3b80d8
Mangled "linkage" names can be huge, and if the debugger (or other
tools) have no use for them, the size savings can be very impressive
(on the order of 40%).
Add one test for controlling behavior, and modify a number of tests to
either stop using linkage names, or make llc emit them (so these tests
will still run when the default triple is for PS4).
Differential Revision: http://reviews.llvm.org/D11374
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244678 91177308-0d34-0410-b5e6-96231b3b80d8
`InstCombiner::OptimizeOverflowCheck` was asserting an
invariant (operands to binary operations are ordered by decreasing
complexity) that wasn't really an invariant. Fix this by instead having
`InstCombiner::OptimizeOverflowCheck` establish the invariant if it does
not hold.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244676 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
For example:
s6 = s0*s5;
s2 = s6*s6 + s6;
...
s4 = s6*s3;
We notice that it is possible for s2 is folded to fma (s0, s5, fmul (s6 s6)).
This only happens when Aggressive is true, otherwise hasOneUse() check
already prevents from folding the multiplication with more uses.
Test Plan: test/CodeGen/NVPTX/fma-assoc.ll
Patch by Xuetian Weng
Reviewers: hfinkel, apazos, jingyue, ohsallen, arsenm
Subscribers: arsenm, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D11855
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244649 91177308-0d34-0410-b5e6-96231b3b80d8