This patch removes function 'CommuteVectorShuffle' from X86ISelLowering.cpp
and moves its logic into SelectionDAG.cpp as method 'getCommutedVectorShuffles'.
This refactoring is in preperation of an upcoming change to the DAGCombiner.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213503 91177308-0d34-0410-b5e6-96231b3b80d8
Since the result of a SETCC for X86 is 0 or -1 in each lane, we can
move unary operations, in this case [su]int_to_fp through the mask
operation and constant fold the operation away. Generally speaking:
UNARYOP(AND(VECTOR_CMP(x,y), constant))
--> AND(VECTOR_CMP(x,y), constant2)
where constant2 is UNARYOP(constant).
This implements the transform where UNARYOP is [su]int_to_fp.
For example, consider the simple function:
define <4 x float> @foo(<4 x float> %val, <4 x float> %test) nounwind {
%cmp = fcmp oeq <4 x float> %val, %test
%ext = zext <4 x i1> %cmp to <4 x i32>
%result = sitofp <4 x i32> %ext to <4 x float>
ret <4 x float> %result
}
Before this change, the SSE code is generated as:
LCPI0_0:
.long 1 ## 0x1
.long 1 ## 0x1
.long 1 ## 0x1
.long 1 ## 0x1
.section __TEXT,__text,regular,pure_instructions
.globl _foo
.align 4, 0x90
_foo: ## @foo
cmpeqps %xmm1, %xmm0
andps LCPI0_0(%rip), %xmm0
cvtdq2ps %xmm0, %xmm0
retq
After, the code is improved to:
LCPI0_0:
.long 1065353216 ## float 1.000000e+00
.long 1065353216 ## float 1.000000e+00
.long 1065353216 ## float 1.000000e+00
.long 1065353216 ## float 1.000000e+00
.section __TEXT,__text,regular,pure_instructions
.globl _foo
.align 4, 0x90
_foo: ## @foo
cmpeqps %xmm1, %xmm0
andps LCPI0_0(%rip), %xmm0
retq
The cvtdq2ps has been constant folded away and the floating point 1.0f
vector lanes are materialized directly via the ModRM operand of andps.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213342 91177308-0d34-0410-b5e6-96231b3b80d8
Clang tries to check the clobber list but doesn't list segment registers in its
x86 register list. This fixes PR20343.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213303 91177308-0d34-0410-b5e6-96231b3b80d8
There are two parts here. First is to modify tablegen to adjust the encoding
type ENCODING_RM with the scaling factor.
The second is to use the new encoding types to compute the correct
displacement in the decoder.
Fixes <rdar://problem/17608489>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213281 91177308-0d34-0410-b5e6-96231b3b80d8
Passes the computed scaling factor in TSFlags rather than the old attributes.
Also removes the C++ version of computing the scaling factor (MemObjSize)
along with the asserts added by the previous patch.
No functional change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213279 91177308-0d34-0410-b5e6-96231b3b80d8
This does not actually move the logic yet but reimplements it in the Tablegen
language. Then asserts that the new implementation results in the same value.
The next patch will remove the assert and the temporary use of the TSFlags and
remove the C++ implementation.
The formula requires a limited form of the logical left and right operators.
I implemented these with the bit-extract/insert operator (i.e. blah{bits}).
No functional change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213278 91177308-0d34-0410-b5e6-96231b3b80d8
Previously we asserted on this code. Currently compiler-rt doesn't
actually implement any of these new libcalls, but external help is
pretty much the only viable option for LLVM.
I've followed the much more generic "__truncST2" naming, as opposed to
the odd name for f32 -> f16 truncation. This can obviously be changed
later, or overridden by any targets that need to.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213252 91177308-0d34-0410-b5e6-96231b3b80d8
x86 has no native ability to extend an f16 to f64, but the same result
is obtained if we expand it into two separate extensions: f16 -> f32
-> f64.
Unfortunately the same is not true for truncate, so that still results
in a compilation failure.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213251 91177308-0d34-0410-b5e6-96231b3b80d8
This makes the two intrinsics @llvm.convert.from.f16 and
@llvm.convert.to.f16 accept types other than simple "float". This is
only strictly needed for the truncate operation, since otherwise
double rounding occurs and there's no way to represent the strict IEEE
conversion. However, for symmetry we allow larger types in the extend
too.
During legalization, we can expand an "fp16_to_double" operation into
two extends for convenience, but abort when the truncate isn't legal. A new
libcall is probably needed here.
Even after this commit, various target tweaks are needed to actually use the
extended intrinsics. I've put these into separate commits for clarity, so there
are no actual tests of f64 conversion here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213248 91177308-0d34-0410-b5e6-96231b3b80d8
It turns out that in most cases (the main exception being i1-related
types) once these operations are formed we cannot separate them and
the targets end up having to deal with them whether they want to or
not.
This is not a good situation, and a more reasonable default can be
formed by ackowledging this and having targets leave them as Legal.
Only x86 seems to be affected (other targets don't even try marking
the operation Expand).
Mostly there's no visible change here yet, but it will be useful to
have truly expanded EXTLOADS for MVT::f16 softening support.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213162 91177308-0d34-0410-b5e6-96231b3b80d8
Before this change, method 'isShuffleMaskLegal' didn't know that shuffles
implementing a 'movhlps' operation were perfectly legal for SSE targets.
This patch adds the missing check for 'isMOVHLPSMask' inside method
'isShuffleMaskLegal' to fix the problem.
The reason why it is important to do this is because the DAGCombiner
conservatively avoids combining a pair of shuffles if the resulting shuffle
node has an illegal mask. Before this patch, shuffles with a MOVHLPS mask were
wrongly considered not to be legal. This was the root cause of some poor-code
generation bugs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213137 91177308-0d34-0410-b5e6-96231b3b80d8
There exists a helper function to abstract away the various differences
between ConstantVector, ConstantDataVector, ConstantAggregateZero, etc.
Use it to simplify X86WindowsTargetObjectFile::getSectionForConstant.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213104 91177308-0d34-0410-b5e6-96231b3b80d8
Refactoring; no functional changes intended
Removed PostRAScheduler bits from subtargets (X86, ARM).
Added PostRAScheduler bit to MCSchedModel class.
This bit is set by a CPU's scheduling model (if it exists).
Removed enablePostRAScheduler() function from TargetSubtargetInfo and subclasses.
Fixed the existing enablePostMachineScheduler() method to use the MCSchedModel (was just returning false!).
Added methods to TargetSubtargetInfo to allow overrides for AntiDepBreakMode, CriticalPathRCs, and OptLevel for PostRAScheduling.
Added enablePostRAScheduler() function to PostRAScheduler class which queries the subtarget for the above values.
Preserved existing scheduler behavior for ARM, MIPS, PPC, and X86:
a. ARM overrides the CPU's postRA settings by enabling postRA for any non-Thumb or Thumb2 subtarget.
b. MIPS overrides the CPU's postRA settings by enabling postRA for everything.
c. PPC overrides the CPU's postRA settings by enabling postRA for everything.
d. X86 is the only target that actually has postRA specified via sched model info.
Differential Revision: http://reviews.llvm.org/D4217
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213101 91177308-0d34-0410-b5e6-96231b3b80d8
Fixes a gcc warning caused by a typo. A redundant assignment operation was
accidentally used as the third operand of a conditional expression.
No functional change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213061 91177308-0d34-0410-b5e6-96231b3b80d8
This implements the FastLowerCall hook, which is based on the DoSelectCall
function. The implementation is very similar, but the target-independent call
lowering part has been factored out.
This should also enable patchpoint intrinsic lowering for FastISel on X86.
Related to <rdar://problem/17427052>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213049 91177308-0d34-0410-b5e6-96231b3b80d8
Revert "[FastISel][X86] Implement the FastLowerIntrinsicCall hook."
Revert "[FastISel][X86] Implement the FastLowerCall hook."
This reverts commit r213035, r213036, and r213037 to make the
buildbots happy again.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213048 91177308-0d34-0410-b5e6-96231b3b80d8
The constant pool entry code for WinCOFF assumed that vector constants
would be formed using ConstantDataVector, it did not expect to see a
ConstantVector. Furthermore, it did not expect undef as one of the
elements of the vector.
ConstantVectors should be handled like ConstantDataVectors, treat Undef
as zero.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213038 91177308-0d34-0410-b5e6-96231b3b80d8
This implements the FastLowerCall hook, which is based on the DoSelectCall
function. The implementation is very similar, but the target-independent call
lowering part has been factored out.
This should also enable patchpoint intrinsic lowering for FastISel on X86.
Related to <rdar://problem/17427052>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213035 91177308-0d34-0410-b5e6-96231b3b80d8
No functional change.
The offsets for the other bitfields are specified symbolically. I need to
increase the size for one of the earlier fields which is easier after this
cleanup.
Why these bits are relative to VEXShift is a bit strange but that is for
another cleanup.
I made sure that the values for the enums are unchanged after this change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213011 91177308-0d34-0410-b5e6-96231b3b80d8
COFF lacks a feature that other object file formats support: mergeable
sections.
To work around this, MSVC sticks constant pool entries in special COMDAT
sections so that each constant is in it's own section. This permits
unused constants to be dropped and it also allows duplicate constants in
different translation units to get merged together.
This fixes PR20262.
Differential Revision: http://reviews.llvm.org/D4482
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213006 91177308-0d34-0410-b5e6-96231b3b80d8
We would emit a libcall for a 64-bit atomic on x86 after SVN r212119. This was
due to the misuse of hasCmpxchg16 to indicate if cmpxchg8b was supported on a
32-bit target. They were added at different times and would result in the
border condition being mishandled.
This fixes the border case to emit the cmpxchg8b instruction for 64-bit atomic
operations on x86 at the cost of restoring a long-standing bug in the codegen.
We emit a cmpxchg8b on all x86 targets even where the CPU does not support this
instruction (pre-Pentium CPUs). Although this bug should be fixed, this was
present prior to SVN r212119 and this change, so this is not really introducing
a regression.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212956 91177308-0d34-0410-b5e6-96231b3b80d8
We construct a temporary "atomicrmw xchg" instruction when lowering atomic
stores for widths that aren't supported natively. This isn't on the top-level
worklist though, so it won't be removed automatically and we have to do it
ourselves once that itself has been lowered.
Thanks Saleem for pointing this out!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212948 91177308-0d34-0410-b5e6-96231b3b80d8
Rename member variables and functions for the MCStreamer for DWARF-like
unwinding management. Rename the Windows ones as well and make the naming and
handling similar across the two. No functional change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212912 91177308-0d34-0410-b5e6-96231b3b80d8
No functional change. As I was trying to understand this function, I found
that variables were reused with confusing names and the broadcast case was a
bit too implicit. Hopefully, this is an improvement.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212795 91177308-0d34-0410-b5e6-96231b3b80d8
It was computing the VL/n case as:
MemObjSize = VectorByteSize / ElemByteSize / Divider * ElemByteSize
ElemByteSize not only falls out but VectorByteSize/Divider now actually
matches the definition of VL/n.
Also some formatting fixes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212794 91177308-0d34-0410-b5e6-96231b3b80d8
Also, add a case clause in X86InstrInfo::shouldScheduleAdjacent to enable
macro-fusion.
<rdar://problem/15680770>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212747 91177308-0d34-0410-b5e6-96231b3b80d8
shuffle lowering: match shuffle patterns equivalent to an unpcklwd or
unpckhwd instruction.
This allows us to use generic lowering code for v8i16 shuffles and match
the unpack pattern late.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212705 91177308-0d34-0410-b5e6-96231b3b80d8
combine into half-shuffles through unpack instructions that expand the
half to a whole vector without messing with the dword lanes.
This fixes some redundant instructions in splat-like lowerings for
v16i8, which are now getting to be *really* nice.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212695 91177308-0d34-0410-b5e6-96231b3b80d8
that splat i8s into i16s.
Previously, we would try much too hard to arrange a sequence of i8s in
one half of the input such that we could unpack them into i16s and
shuffle those into place. This isn't always going to be a cheaper i8
shuffle than our other strategies. The case where it is always going to
be cheaper is when we can arrange all the necessary inputs into one half
using just i16 shuffles. It happens that viewing the problem this way
also makes it much easier to produce an efficient set of shuffles to
move the inputs into one half and then unpack them.
With this, our splat code gets one step closer to being not terrible
with the new experimental lowering strategy. It also exposes two
combines missing which I will add next.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212692 91177308-0d34-0410-b5e6-96231b3b80d8
shuffles specifically for cases where a small subset of the elements in
the input vector are actually used.
This is specifically targetted at improving the shuffles generated for
trunc operations, but also helps out splat-like operations.
There is still some really low-hanging fruit here that I want to address
but this is a huge step in the right direction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212680 91177308-0d34-0410-b5e6-96231b3b80d8
don't need to set it manually.
This is based on feedback from Tom who pointed out that if every target
needs to handle this we need to reach out to those maintainers. In fact,
it doesn't make sense to duplicate everything when anything other than
expand seems unlikely at this stage.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212661 91177308-0d34-0410-b5e6-96231b3b80d8
This lets us experiment with 512-bit vectorization without passing
force-vector-width manually.
The code generated for a simple integer memset loop is properly vectorized.
Disassembly is still broken for it though :(.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212634 91177308-0d34-0410-b5e6-96231b3b80d8