Summary: When selectScalarSSELoad is looking for a scalar_to_vector of a scalar load, it makes sure the load is only used by the scalar_to_vector. But it doesn't make sure the scalar_to_vector is only used once. This can cause the same load to be folded multiple times. This can be bad for performance. This also causes the chain output to be duplicated, but not connected to anything so chain dependencies will not be satisfied.
Reviewers: RKSimon, zvi, delena, spatel
Subscribers: andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D26790
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287983 91177308-0d34-0410-b5e6-96231b3b80d8
There are other spots where we can use this; we're currently dropping
metadata in some places, and there are proposed changes where we will
want to propagate metadata.
IRBuilder's CreateSelect() already has a parameter like this, so this
change makes the regular 'Create' API line up with that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287976 91177308-0d34-0410-b5e6-96231b3b80d8
The W bit distinquishes which operand is the memory operand. But if the mod bits are 3 then the memory operand is a register and there are two possible encodings. We already did this correctly for several other XOP instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287961 91177308-0d34-0410-b5e6-96231b3b80d8
Not sure this is truly needed but we had the floating point equivalents, the aligned equivalents, and the EVEX equivalents. So this just makes it complete.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287960 91177308-0d34-0410-b5e6-96231b3b80d8
We were a little sloppy with adding tailcall markers. Be more
consistent by using setTailCallKind instead of setTailCall.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287955 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Shuffle lowering may have widened the element size of a i32 shuffle to i64 before selecting X86ISD::SHUF128. If this shuffle was used by a vselect this can prevent us from selecting masked operations.
This patch detects this and changes the element size to match the vselect.
I don't handle changing integer to floating point or vice versa as its not clear if its better to push such a bitcast to the inputs of the shuffle or to the user of the vselect. So I'm ignoring that case for now.
Reviewers: delena, zvi, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27087
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287939 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The iterative algorithm for Loop Unswitching may render some of the branches unreachable in the unswitched loops.
Given the exponential nature of the algorithm, this is quite an overhead.
This patch fixes this problem by selectively unswitching only those branches within a loop that are reachable from the loop header.
Reviewers: Michael Zolothukin, Anna Thomas, Weiming Zhao.
Subscribers: llvm-commits.
Differential Revision: http://reviews.llvm.org/D26299
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287925 91177308-0d34-0410-b5e6-96231b3b80d8
This patch corrects the behaviour of code such as:
.local foo
jal foo
foo:
to use the correct jal expansion when writing ELF files.
Patch by: Daniel Sanders
Reviewers: zoran.jovanovic, seanbruno, vkalintiris
Differential Revision: https://reviews.llvm.org/D24722
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287918 91177308-0d34-0410-b5e6-96231b3b80d8
Vectorize UINT_TO_FP v2i32 -> v2f64 instead of scalarization (albeit still on the SIMD unit).
The codegen matches that generated by legalization (and is in fact used by AVX for UINT_TO_FP v4i32 -> v4f64), but has to be done in the x86 backend to account for legalization via 4i32.
Differential Revision: https://reviews.llvm.org/D26938
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287886 91177308-0d34-0410-b5e6-96231b3b80d8
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287882 91177308-0d34-0410-b5e6-96231b3b80d8
The bug arises during register allocation on i686 for
CMPXCHG8B instruction when base pointer is needed. CMPXCHG8B
needs 4 implicit registers (EAX, EBX, ECX, EDX) and a memory address,
plus ESI is reserved as the base pointer. With such constraints the only
way register allocator would do its job successfully is when the addressing
mode of the instruction requires only one register. If that is not the case
- we are emitting additional LEA instruction to compute the address.
It fixes PR28755.
Patch by Alexander Ivchenko <alexander.ivchenko@intel.com>
Differential Revision: https://reviews.llvm.org/D25088
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287875 91177308-0d34-0410-b5e6-96231b3b80d8
- It does not modify the input instruction
- Second operand of any address is always an Index Register,
make sure we actually check for that, instead of a check for
an immediate value
Patch by Alexander Ivchenko <alexander.ivchenko@intel.com>
Differential Revision: https://reviews.llvm.org/D24938
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287873 91177308-0d34-0410-b5e6-96231b3b80d8
Replace the CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes with general versions.
This is an initial step towards similar FP_TO_SINT/FP_TO_UINT and SINT_TO_FP/UINT_TO_FP lowering to AVX512 CVTTPS2QQ/CVTTPS2UQQ and CVTQQ2PS/CVTUQQ2PS with illegal types.
Differential Revision: https://reviews.llvm.org/D27072
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287870 91177308-0d34-0410-b5e6-96231b3b80d8
Change the IRObjectFile symbol iterator to be a pointer into a vector of
PointerUnions representing either IR symbols or asm symbols.
This change is in preparation for a future change for supporting multiple
modules in an IRObjectFile. Although it causes an increase in memory
consumption, we can deal with that issue separately by introducing a bitcode
symbol table.
Differential Revision: https://reviews.llvm.org/D26928
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287845 91177308-0d34-0410-b5e6-96231b3b80d8
The scavenger was not passed if requiresFrameIndexScavenging was
enabled. I need to be able to test for the availability of an
unallocatable register here, so I can't create a virtual register for
it.
It might be better to just always use the scavenger and stop
creating virtual registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287843 91177308-0d34-0410-b5e6-96231b3b80d8
m0 may need to be written for spill code, so
we don't want general code uses relying on the
value stored in it.
This introduces a few code quality regressions where copies
from m0 are not coalesced into copies of a copy of m0.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287841 91177308-0d34-0410-b5e6-96231b3b80d8
This patch makes AsmPrinter less reliant on DwarfDebug by relying on the DWARF version in the AsmPrinter's MCStreamer's MCContext. This allows us to remove the redundant DWARF version from DwarfDebug. It also lets us change code that used to access the AsmPrinter's DwarfDebug just to get to the DWARF version by changing the DWARF version accessor on AsmPrinter so that it grabs the version from its MCStreamer's MCContext.
Differential Revision: https://reviews.llvm.org/D27032
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287839 91177308-0d34-0410-b5e6-96231b3b80d8