addSchedBarrierDeps() is supposed to add use operands to the ExitSU
node. The current implementation adds uses for call/barrier instructions
and the MBB live-outs in all other cases, but the use operands of
conditional jump instructions were missed.
Also added code to macrofusion to set the latency between the fused
nodes to zero, to avoid problems with those nodes lingering in the
pending list.
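Roughly, the fixed function takes the shape below (close to the in-tree
code, but details may differ): operand uses are added for any exit
instruction, and the live-out fallback now also covers conditional
branches:

void ScheduleDAGInstrs::addSchedBarrierDeps() {
  MachineInstr *ExitMI = RegionEnd != BB->end() ? &*RegionEnd : nullptr;
  ExitSU.setInstr(ExitMI);
  // Add dependences on the uses of the exiting instruction, if any.
  if (ExitMI) {
    for (const MachineOperand &MO : ExitMI->operands()) {
      if (!MO.isReg() || MO.isDef())
        continue;
      unsigned Reg = MO.getReg();
      if (TargetRegisterInfo::isPhysicalRegister(Reg))
        Uses.insert(PhysRegSUOper(&ExitSU, -1, Reg));
      else if (MO.readsReg()) // ignore undef operands
        addVRegUseDeps(&ExitSU, ExitMI->getOperandNo(&MO));
    }
  }
  if (!ExitMI || (!ExitMI->isCall() && !ExitMI->isBarrier())) {
    // For others, e.g. fallthrough or conditional branch, assume the
    // exit uses all the registers that are live into the successors.
    for (const MachineBasicBlock *Succ : BB->successors())
      for (const auto &LI : Succ->liveins())
        if (!Uses.contains(LI.PhysReg))
          Uses.insert(PhysRegSUOper(&ExitSU, -1, LI.PhysReg));
  }
}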
Differential Revision: https://reviews.llvm.org/D25140
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286544 91177308-0d34-0410-b5e6-96231b3b80d8
When copying to/from a constant register, interferences can be ignored.
Also update the documentation for isConstantPhysReg() to make it more
obvious that this transformation is valid.
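A minimal sketch of the idea, assuming the isConstantPhysReg()
signature of the time (which took a MachineFunction) and a COPY whose
source is operand 1; this is not the exact in-tree change:

// A copy from a constant physical register always reads the same
// value, so overlapping live ranges are not real interference.
static bool canIgnoreCopyInterference(const MachineInstr &Copy,
                                      const TargetRegisterInfo &TRI,
                                      const MachineFunction &MF) {
  unsigned SrcReg = Copy.getOperand(1).getReg();
  return TargetRegisterInfo::isPhysicalRegister(SrcReg) &&
         TRI.isConstantPhysReg(SrcReg, MF);
}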
Differential Revision: https://reviews.llvm.org/D26106
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286503 91177308-0d34-0410-b5e6-96231b3b80d8
Currently, runtime metadata is emitted as an ELF section named .AMDGPU.runtime_metadata.
However, there is a standard way to convey vendor-specific information about how to run an ELF binary: the vendor-specific note element (http://www.netbsd.org/docs/kernel/elf-notes.html).
This patch lets the AMDGPU backend emit runtime metadata as a note element in the .note section.
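For reference, a note element has the standard layout sketched below
(field names are the conventional ELF ones; the struct is illustrative
and not code from the patch):

#include <cstdint>

// Standard ELF note element header; the name and descriptor payloads
// follow it, each padded to 4-byte alignment.
struct ElfNoteHeader {
  uint32_t n_namesz; // size of the vendor name, including the final '\0'
  uint32_t n_descsz; // size of the descriptor (here: the runtime metadata)
  uint32_t n_type;   // vendor-defined type of the descriptor
  // char name[n_namesz]; padded to a multiple of 4
  // char desc[n_descsz]; padded to a multiple of 4
};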
Differential Revision: https://reviews.llvm.org/D25781
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286502 91177308-0d34-0410-b5e6-96231b3b80d8
We were failing to extract a constant splat shift value if the shifted value was being masked.
The (shl (and (setcc) N01CV) N1CV) -> (and (setcc) N01CV<<N1CV) combine was unnecessarily preventing this.
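For context, that combine is only an identity because a vector setcc
lane is all-ones or zero; a standalone illustration in plain C++ (not
LLVM code):

#include <cassert>
#include <cstdint>

// For a lane that is all-ones or zero, masking then shifting equals
// masking with the pre-shifted constant, which is what
// (shl (and (setcc) C1) C2) -> (and (setcc) C1<<C2) relies on.
uint32_t foldShlOfMaskedSetcc(bool Cmp, uint32_t C1, uint32_t C2) {
  uint32_t Setcc = Cmp ? 0xFFFFFFFFu : 0u; // vector setcc lane
  uint32_t Shifted = (Setcc & C1) << C2;   // original expression
  uint32_t Folded = Setcc & (C1 << C2);    // combined form
  assert(Shifted == Folded);
  return Folded;
}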
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286454 91177308-0d34-0410-b5e6-96231b3b80d8
Teach X86InstrInfo::analyzeCompare() not to crash on CMP and SUB instructions
that take a global address operand.
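The guard amounts to something like the sketch below (the helper name
is hypothetical; the real analyzeCompare() returns its results through
out-parameters):

// Only read getImm() when the operand really is an immediate; on X86,
// CMP/SUB may instead carry a global address here.
static bool getCmpImmOperand(const MachineInstr &MI, unsigned OpNo,
                             int64_t &CmpValue) {
  const MachineOperand &MO = MI.getOperand(OpNo);
  if (!MO.isImm()) // e.g. a global address operand
    return false;
  CmpValue = MO.getImm();
  return true;
}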
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286420 91177308-0d34-0410-b5e6-96231b3b80d8
Pretty bare-bones support for exception handling (no weird MSVC stuff, no SjLj,
etc.), but it should get things going.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286407 91177308-0d34-0410-b5e6-96231b3b80d8
Add subregister indices for pairs of 32-bit registers (isub_lo, isub_hi)
and for pairs of vector registers (vsub_lo, vsub_hi).
Also add generic subreg indices ps_sub_lo and ps_sub_hi, and a function
HexagonRegisterInfo::getHexagonSubRegIndex(RegClass, GenericSubreg)
that returns the appropriate subreg index for RegClass.
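A rough sketch of what that mapping can look like (register class and
enumerator names follow the Hexagon backend, but the committed
implementation may differ):

unsigned HexagonRegisterInfo::getHexagonSubRegIndex(
      const TargetRegisterClass &RC, unsigned GenIdx) const {
  assert(GenIdx == Hexagon::ps_sub_lo || GenIdx == Hexagon::ps_sub_hi);
  static const unsigned ISub[] = { Hexagon::isub_lo, Hexagon::isub_hi };
  static const unsigned VSub[] = { Hexagon::vsub_lo, Hexagon::vsub_hi };
  switch (RC.getID()) {
  case Hexagon::DoubleRegsRegClassID:  // pairs of 32-bit registers
    return ISub[GenIdx == Hexagon::ps_sub_hi];
  case Hexagon::VecDblRegsRegClassID:  // pairs of vector registers
    return VSub[GenIdx == Hexagon::ps_sub_hi];
  }
  llvm_unreachable("Unexpected register class");
}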
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286377 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds support for fptoui to 2i32 from both 2f64 and 2f32, building on Simon's change for the signed version in r284459 and using AVX-512 instructions.
If we don't have VLX support, we need to use a 512-bit operation for v2f64->v2i32 and extract the result.
It also recognises that cvttpd2udq zeroes the upper 64 bits of the xmm result.
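A hedged sketch of that non-VLX path (the SelectionDAG calls are real
API, but the committed lowering may be structured differently):
zero-fill the v2f64 input out to 512 bits, convert, and take the low
128 bits of the result:

SDValue lowerFptouiV2F64NoVLX(SDValue Src, SelectionDAG &DAG,
                              const SDLoc &DL) {
  // Widen v2f64 -> v8f64, zero-filling so the extra lanes convert to 0.
  SDValue Zero = DAG.getConstantFP(0.0, DL, MVT::v8f64);
  SDValue Wide = DAG.getNode(ISD::INSERT_SUBVECTOR, DL, MVT::v8f64, Zero,
                             Src, DAG.getIntPtrConstant(0, DL));
  // 512-bit unsigned conversion: v8f64 -> v8i32.
  SDValue Cvt = DAG.getNode(ISD::FP_TO_UINT, DL, MVT::v8i32, Wide);
  // Extract the low v4i32; its upper two lanes are zero by construction.
  return DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, MVT::v4i32, Cvt,
                     DAG.getIntPtrConstant(0, DL));
}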
Differential Revision: https://reviews.llvm.org/D26331
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286345 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: This allows the SSE intrinsic to use the EVEX instruction when available. It also fixes the EVEX path to not use a weird (v4i32 (fp_to_sint v2f64)) node and merges some isel patterns. This also fixes some cases that weren't combining vzmovl with cvttpd2dq to remove extra moves.
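The vzmovl folding amounts to something like the sketch below (a
hypothetical helper; X86ISD::CVTTPD2DQ is assumed to be the dedicated
conversion node, and the combine as committed may check more cases).
Since cvttpd2dq already zeroes the upper 64 bits of its result, a
v2i64 VZEXT_MOVL of it is redundant:

static SDValue combineRedundantVZmovl(SDNode *N, SelectionDAG &DAG) {
  if (N->getValueType(0) != MVT::v2i64)
    return SDValue();
  SDValue Src = N->getOperand(0);
  if (Src.getOpcode() == ISD::BITCAST)
    Src = Src.getOperand(0);
  // cvttpd2dq already zeroed the upper 64 bits, so drop the vzmovl.
  if (Src.getOpcode() == X86ISD::CVTTPD2DQ)
    return DAG.getBitcast(MVT::v2i64, Src);
  return SDValue();
}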
Reviewers: delena, zvi, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26330
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286344 91177308-0d34-0410-b5e6-96231b3b80d8
The smallest tests that expose this are codegen tests (because SelectionDAGBuilder::visitSelect() uses matchSelectPattern
to create UMAX/UMIN nodes), but it's also possible to see the effects in IR alone with folds of min/max pairs.
If these are written as unsigned compares in IR, InstCombine canonicalizes the unsigned compares to signed compares.
I.e., without this patch, running the optimizer pessimizes the codegen for this case:
define <4 x i32> @umax_vec(<4 x i32> %x) {
%cmp = icmp ugt <4 x i32> %x, <i32 2147483647, i32 2147483647, i32 2147483647, i32 2147483647>
%sel = select <4 x i1> %cmp, <4 x i32> %x, <4 x i32> <i32 2147483647, i32 2147483647, i32 2147483647, i32 2147483647>
ret <4 x i32> %sel
}
$ ./opt umax.ll -S | ./llc -o - -mattr=avx
vpmaxud LCPI0_0(%rip), %xmm0, %xmm0
$ ./opt -instcombine umax.ll -S | ./llc -o - -mattr=avx
vpxor %xmm1, %xmm1, %xmm1
vpcmpgtd %xmm0, %xmm1, %xmm1
vmovaps LCPI0_0(%rip), %xmm2 ## xmm2 = [2147483647,2147483647,2147483647,2147483647]
vblendvps %xmm1, %xmm0, %xmm2, %xmm0
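For reference, the equivalence the matcher has to see through after
canonicalization, as a standalone C++ check (not LLVM code): the
unsigned compare against INT32_MAX tests exactly the sign bit, and
selecting with it yields umax(x, INT32_MAX).

#include <cassert>
#include <cstdint>

uint32_t umaxIntMax(uint32_t X) {
  bool UGT = X > 0x7FFFFFFFu;             // icmp ugt x, 2147483647
  bool SLT = static_cast<int32_t>(X) < 0; // canonical form: icmp slt x, 0
  assert(UGT == SLT);
  return UGT ? X : 0x7FFFFFFFu;           // the select => umax(x, INT32_MAX)
}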
Differential Revision: https://reviews.llvm.org/D26096
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286318 91177308-0d34-0410-b5e6-96231b3b80d8
Since IMPLICIT_DEF instructions are omitted in the output, when the output
of an IMPLICIT_DEF instruction is stackified, the resulting register lacks
an explicit push, leading to a push/pop mismatch. Fix this by converting
such IMPLICIT_DEFs into CONST_I32 0 instructions so that they have explicit
pushes.
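A minimal sketch of the conversion, assuming a hypothetical helper in
the WebAssembly backend (the in-tree pass may do this differently and
also handle other types):

// Turn IMPLICIT_DEF into "i32.const 0" so the stackified result has
// an explicit push.
static void rewriteImplicitDef(MachineInstr &MI,
                               const TargetInstrInfo &TII) {
  assert(MI.getOpcode() == TargetOpcode::IMPLICIT_DEF);
  MI.setDesc(TII.get(WebAssembly::CONST_I32));
  MI.addOperand(MachineOperand::CreateImm(0));
}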
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286274 91177308-0d34-0410-b5e6-96231b3b80d8