Module::getOrInsertFunction is using C-style vararg instead of
variadic templates.
From a user prospective, it forces the use of an annoying nullptr
to mark the end of the vararg, and there's not type checking on the
arguments. The variadic template is an obvious solution to both
issues.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299925 91177308-0d34-0410-b5e6-96231b3b80d8
Module::getOrInsertFunction is using C-style vararg instead of
variadic templates.
From a user prospective, it forces the use of an annoying nullptr
to mark the end of the vararg, and there's not type checking on the
arguments. The variadic template is an obvious solution to both
issues.
Patch by: Serge Guelton <serge.guelton@telecom-bretagne.eu>
Differential Revision: https://reviews.llvm.org/D31070
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299699 91177308-0d34-0410-b5e6-96231b3b80d8
Before r294774, there was a problem when lowering broadcasts to use
128-bit subvectors.
When we looked through a bitcast to find the broadcast input, we'd keep
using the original type, so you'd end up with things like:
(v8f32 (broadcast
(v4f32 (extract_subvector
(v8i32 V),
...))
))
r294774 fixed it to always emit subvectors with the scalar type of the
original source.
It also introduced some asserts, to check that we use scalars with
the same size, and vectors with the same number of elements.
The scalar size equality is checked earlier when looking through bitcasts,
and is a useful assert.
However, the number of elements don't have to be identical: we're always
going to extract a 128-bit subvector, and we can have different size
inputs if we looked through a concat_vector to find a 256-bit source.
Relax the overzealous assert.
Replace it with a check of the original source vector being 256 or 512
bits. If it's 128 bits, we can't extract_subvector from it.
Fixes PR32371.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299490 91177308-0d34-0410-b5e6-96231b3b80d8
PSADBW pattern currently supports the 32 bit IR pattern and only GLT (greather than) comparison.
The patch extends the pattern to catch also 64 bit IR pattern and includes all other comparison types (not only GLT).
Differential Revision: https://reviews.llvm.org/D31577
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299425 91177308-0d34-0410-b5e6-96231b3b80d8
It can be costly to transfer from the gprs to the xmm registers and can prevent loads merging.
This patch splits vXi16/vXi32/vXi64 BUILD_VECTORS that use the same operand in multiple elements into a BUILD_VECTOR with only a single insertion of each of those elements and then performs an unary shuffle to duplicate the values.
There are a couple of minor regressions this patch unearths due to some missing MOVDDUP/BROADCAST folds that I will address in a future patch.
Note: Now that vector shuffle lowering and combining is pretty good we should be reusing that instead of duplicating so much in LowerBUILD_VECTOR - this is the first of several patches to address this.
Differential Revision: https://reviews.llvm.org/D31373
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299387 91177308-0d34-0410-b5e6-96231b3b80d8
The x86_64 ABI requires that the stack is 16 byte aligned on function calls. Thus, the 8-byte error code, which is pushed by the CPU for certain exceptions, leads to a misaligned stack. This results in bugs such as Bug 26413, where misaligned movaps instructions are generated.
This commit fixes the misalignment by adjusting the stack pointer in these cases. The adjustment is done at the beginning of the prologue generation by subtracting another 8 bytes from the stack pointer. These additional bytes are popped again in the function epilogue.
Fixes Bug 26413
Patch by Philipp Oppermann.
Differential Revision: https://reviews.llvm.org/D30049
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299383 91177308-0d34-0410-b5e6-96231b3b80d8
This moves the isMask and isShiftedMask functions to be class methods. They now use the MathExtras.h function for single word size and leading/trailing zeros/ones or countPopulation for the multiword size. The previous implementation made multiple temorary memory allocations to do the bitwise arithmetic operations to match the MathExtras.h implementation.
Differential Revision: https://reviews.llvm.org/D31565
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299362 91177308-0d34-0410-b5e6-96231b3b80d8
Our _MM_HINT_T0/T1 constant values are 3/2 which matches gcc, but not icc or Intel documentation. Interestingly gcc had this same bug on their implementation of the gather/scatter builtins at one point too.
Fixes PR32411.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299234 91177308-0d34-0410-b5e6-96231b3b80d8
Currently ComputeNumSignBits returns the minimum number of sign bits for all elements of vector data, when we may only be interested in one/some of the elements.
This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original ComputeNumSignBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1.
I've only added support for BUILD_VECTOR and EXTRACT_VECTOR_ELT so far, all others will default to demanding all elements but can be updated in due course.
Followup to D25691.
Differential Revision: https://reviews.llvm.org/D31311
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299219 91177308-0d34-0410-b5e6-96231b3b80d8
We currently perform the various fp_to_sint XMM conversion and then transfer to the MMX register (on 32-bit via the stack).
This patch improves support for MOVDQ2Q XMM to MMX transfers and adds the XMM->MMX fp_to_sint direct conversion patterns. The SSE2 specifications are the same as for XMM->XMM and XMM->MMX rounding/exceptions/etc.
Differential Revision: https://reviews.llvm.org/D30868
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298943 91177308-0d34-0410-b5e6-96231b3b80d8
Copy isn't necessary after the matchVectorShuffleWithUNPCK refactor and undef value will make some future undef/zero handling easier.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298910 91177308-0d34-0410-b5e6-96231b3b80d8
This is a patch for an on-going bugzilla bug 21281 on the generated X86 code for a matrix transpose8x8 subroutine which requires vector interleaving. The generated code in AVX2 is currently non-optimal and requires 60 instructions as opposed to only 40 instructions generated for AVX1.
The patch includes a fix for the AVX2 case where vector unpack instructions use less operations than the vector blend operations available in AVX2.
In this case using vector unpack instructions is more efficient.
Reviewers:
zvi
delena
igorb
craig.topper
guyblank
eladcohen
m_zuckerman
aymanmus
RKSimon
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298840 91177308-0d34-0410-b5e6-96231b3b80d8
Patch to generalize combinePCMPAnd1 (for handling SETCC + ZEXT cases) to work for any input that has zero/all bits set masked with an 'all low bits' mask.
Replaced the implicit assumption of shift availability with a call to SupportedVectorShiftWithImm.
Part 1 of 3.
Differential Revision: https://reviews.llvm.org/D31347
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298779 91177308-0d34-0410-b5e6-96231b3b80d8
This is the payoff for D31156 - if a target has efficient comparison instructions for vector-sized equality,
we can replace memcmp calls with inline code that is both smaller and faster.
Differential Revision: https://reviews.llvm.org/D31290
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298775 91177308-0d34-0410-b5e6-96231b3b80d8
Add support for widening narrow shuffle masks so we can directly extract from the relevant input vector of the shuffle.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298616 91177308-0d34-0410-b5e6-96231b3b80d8
Up until now, vpmovm2 instruction described its destination operand size
by the source operand size. This patch adds new pattern for the vpmovm2
instruction. The node describes new expansion of the destination (from
{128|256} to 512).
Differential Revision: https://reviews.llvm.org/D30654
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298586 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This class is a list of AttributeSetNodes corresponding the function
prototype of a call or function declaration. This class used to be
called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is
typically accessed by parameter and return value index, so
"AttributeList" seems like a more intuitive name.
Rename AttributeSetImpl to AttributeListImpl to follow suit.
It's useful to rename this class so that we can rename AttributeSetNode
to AttributeSet later. AttributeSet is the set of attributes that apply
to a single function, argument, or return value.
Reviewers: sanjoy, javed.absar, chandlerc, pete
Reviewed By: pete
Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits
Differential Revision: https://reviews.llvm.org/D31102
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298393 91177308-0d34-0410-b5e6-96231b3b80d8
We could do better by splitting any oversized type into whatever vector size the target supports,
but I left that for future work if it ever comes up. The motivating case is memcmp() calls on 16-byte
structs, so I think we can wire that up with a TLI hook that feeds into this.
Differential Revision: https://reviews.llvm.org/D31156
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298376 91177308-0d34-0410-b5e6-96231b3b80d8