vXi8/vXi64 vector shifts are often performed as shifts of vYi16/vYi32 types, but we weren't always remembering to bitcast the input first.
Tested with a new assert, as we don't currently manipulate these shifts enough for test cases to catch them.
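For illustration, a minimal standalone sketch (not the backend code itself) of the kind of lowering this applies to: x86 has no vXi8 shift instructions, so a v16i8 shift-left is performed as a v8i16 shift of the (bitcast) input, followed by a mask that clears the bits that crossed byte boundaries.

    #include <emmintrin.h>

    // Shift each byte of v left by c using a 16-bit shift plus a mask.
    __m128i shl_v16i8(__m128i v, int c) {
      __m128i wide = _mm_slli_epi16(v, c);               // shift as v8i16
      __m128i mask = _mm_set1_epi8((char)(0xFFu << c));  // clear cross-byte bits
      return _mm_and_si128(wide, mask);
    }

At the DAG level the vXi8 value must be explicitly bitcast to the vYi16 type before the shift; those missing bitcasts are what this commit adds.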
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294308 91177308-0d34-0410-b5e6-96231b3b80d8
Currently we only combine shuffle nodes if they have a single user, to avoid code bloat from splitting the shuffles into several different combines.
This doesn't account for cases where we will already have combined all of a node's users while recursing up the shuffle tree.
This patch keeps a list of all the shuffle nodes that have been combined so far, and permits combining a further shuffle node if all of its users are in that list.
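A hedged sketch of the new check (the types and names here are illustrative, not the actual LLVM code):

    #include <algorithm>
    #include <vector>

    struct Node { std::vector<const Node *> Users; };

    // A multi-user shuffle may now be combined if every user has itself
    // already been combined earlier in the recursion.
    bool canCombineShuffle(const Node &N,
                           const std::vector<const Node *> &CombinedNodes) {
      if (N.Users.size() == 1)
        return true;  // the old single-user rule still applies
      return std::all_of(N.Users.begin(), N.Users.end(),
                         [&](const Node *U) {
                           return std::find(CombinedNodes.begin(),
                                            CombinedNodes.end(),
                                            U) != CombinedNodes.end();
                         });
    }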
Differential Revision: https://reviews.llvm.org/D29399
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294183 91177308-0d34-0410-b5e6-96231b3b80d8
Similar to what we already do for zero elt insertion, we can quickly rematerialize 'allbits' vectors, avoiding an unnecessary GPR value and its insertion into a vector.
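An illustrative source-level equivalent (not the patch code): an all-ones vector can be materialized with a single PCMPEQD of a register against itself, with no GPR involved.

    #include <emmintrin.h>

    __m128i allbits() {
      __m128i z = _mm_setzero_si128();
      return _mm_cmpeq_epi32(z, z);  // 0 == 0 in every lane -> all ones
    }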
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294162 91177308-0d34-0410-b5e6-96231b3b80d8
Similar was already done for several other shuffles in this function.
The test changes are because the old code used explicit zeroing for elements that could have been undef.
While I was here I also changed other shuffles in the same function to use the same input twice instead of creating UNDEF nodes; getVectorShuffle can create the UNDEF for us.
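A minimal sketch of the pattern, assuming the usual SelectionDAG API (the helper name is hypothetical; this is not the exact code from the patch):

    #include "llvm/CodeGen/SelectionDAG.h"
    using namespace llvm;

    static SDValue lowerAsUnaryShuffle(SelectionDAG &DAG, const SDLoc &DL,
                                       EVT VT, SDValue V, ArrayRef<int> Mask) {
      // No explicit DAG.getUNDEF(VT) operand needed: if Mask never refers
      // to the second input, getVectorShuffle canonicalizes it to UNDEF.
      return DAG.getVectorShuffle(VT, DL, V, V, Mask);
    }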
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294130 91177308-0d34-0410-b5e6-96231b3b80d8
On one test this seems to have given DAG combine more opportunity to do other INSERT_SUBVECTOR/EXTRACT_SUBVECTOR combines before the BLENDI was created. It looks like we can still improve by teaching DAG combine to optimize INSERT_SUBVECTOR/EXTRACT_SUBVECTOR with BLENDI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293944 91177308-0d34-0410-b5e6-96231b3b80d8
This is a first attempt at using the MOVMSK instructions to replace all_of/any_of reduction patterns (i.e. an and/or + shuffle chain).
So far this only matches patterns where we are reducing an all/none-bits source vector (i.e. a comparison result), but we should be able to expand on this in conjunction with improvements to 'bool vector' handling, both in the x86 backend and in the vectorizers etc.
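An illustrative source-level equivalent of the reduction being matched (not the combine itself): for an all/none-bits vector such as a comparison result, all_of/any_of collapse to a single MOVMSK plus a scalar compare.

    #include <emmintrin.h>

    bool any_of_eq(__m128i a, __m128i b) {
      __m128i eq = _mm_cmpeq_epi32(a, b);  // lanes are all-ones or all-zeros
      return _mm_movemask_epi8(eq) != 0;   // any lane set?
    }

    bool all_of_eq(__m128i a, __m128i b) {
      __m128i eq = _mm_cmpeq_epi32(a, b);
      return _mm_movemask_epi8(eq) == 0xFFFF;  // every sign bit set
    }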
Differential Revision: https://reviews.llvm.org/D28810
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293880 91177308-0d34-0410-b5e6-96231b3b80d8
This moves the creation of SUBV_BROADCAST and the merging of adjacent loads that are being inserted together.
This is a step towards removing legalization of INSERT_SUBVECTOR except for the vXi1 cases.
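Illustrative source patterns for the two combines (hypothetical helpers, compiled for AVX2; not the patch code):

    #include <immintrin.h>

    // Inserting the same loaded subvector into both halves: becomes a
    // SUBV_BROADCAST, i.e. a single vbroadcasti128.
    __m256i splat_half(const __m128i *p) {
      __m128i v = _mm_loadu_si128(p);
      return _mm256_inserti128_si256(_mm256_castsi128_si256(v), v, 1);
    }

    // Inserting two adjacent loads: merged into one 256-bit load.
    __m256i concat_halves(const __m128i *p) {
      __m128i lo = _mm_loadu_si128(p);
      __m128i hi = _mm_loadu_si128(p + 1);
      return _mm256_inserti128_si256(_mm256_castsi128_si256(lo), hi, 1);
    }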
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293872 91177308-0d34-0410-b5e6-96231b3b80d8
Also add the ability to recognise PINSR(Vex, 0, Idx).
Target shuffle combines won't replace multiple insertions with a bit mask until a depth of 3 or more, so we avoid codesize bloat.
The unnecessary vpblendw in clearupper8xi16a will be fixed in an upcoming patch.
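An illustrative source pattern (not the patch code): each insertion of 0 is now recognised as PINSR of zero, so a chain of them reaches the depth at which the combiner forms a single AND mask.

    #include <emmintrin.h>

    // Zero the odd u16 lanes via four PINSRW(v, 0, idx) insertions; deep
    // enough for the shuffle combiner to replace them with one PAND.
    __m128i clear_odd_u16(__m128i v) {
      v = _mm_insert_epi16(v, 0, 1);
      v = _mm_insert_epi16(v, 0, 3);
      v = _mm_insert_epi16(v, 0, 5);
      v = _mm_insert_epi16(v, 0, 7);
      return v;
    }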
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293627 91177308-0d34-0410-b5e6-96231b3b80d8
combineX86ShufflesRecursively can still only handle a maximum of 2 shuffle inputs, but everything before it now supports any number of shuffle inputs.
This will be necessary for combining OR(SHUFFLE, SHUFFLE) patterns.
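An illustrative motivating pattern (not the patch code): an OR of two shuffles with disjoint live lanes, which folds to a single two-input shuffle but requires tracking more than 2 shuffle inputs along the way.

    #include <emmintrin.h>

    // OR(SHUFFLE(a, 0), SHUFFLE(0, b)) == unpacklo(a, b)
    __m128i interleave_lo(__m128i a, __m128i b) {
      __m128i z  = _mm_setzero_si128();
      __m128i lo = _mm_unpacklo_epi32(a, z);  // a0 0 a1 0
      __m128i hi = _mm_unpacklo_epi32(z, b);  // 0 b0 0 b1
      return _mm_or_si128(lo, hi);            // a0 b0 a1 b1
    }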
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293560 91177308-0d34-0410-b5e6-96231b3b80d8
Previously this test case triggered an assertion in getNode because we tried to create an insert_subvector with both input types being the same size and the index pointing to half the vector width.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293446 91177308-0d34-0410-b5e6-96231b3b80d8
Replaces an xor+movd/movq with an xorps, which is shorter in codesize, avoids an int-fpu transfer, allows modern cores to fast path the result during decode, and helps other combines recognise an all-zero vector.
The only reason I can think of that we'd want to keep scalar_to_vector in this case is to help recognise that the upper elts are undef, but this doesn't seem to be a problem.
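An illustrative source pattern (not the patch code): building a vector from a zero scalar previously went through a GPR.

    #include <emmintrin.h>

    __m128i zero_vec() {
      // Previously: xor eax,eax + movd xmm0,eax. Now: a single xorps.
      return _mm_cvtsi32_si128(0);
    }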
Differential Revision: https://reviews.llvm.org/D29097
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293438 91177308-0d34-0410-b5e6-96231b3b80d8
Pulled the code that removes unused inputs from a target shuffle mask out into a helper function, to allow it to be reused in a future commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293175 91177308-0d34-0410-b5e6-96231b3b80d8
As discussed on D28219, it is profitable to combine trunc(binop(s/zext(x), s/zext(y))) to binop(trunc(s/zext(x)), trunc(s/zext(y))), assuming the trunc(ext()) will simplify further.
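A small standalone check of why the combine is sound for add (not compiler code): truncating after the op equals the op on truncated operands, since the discarded high bits never influence the low ones.

    #include <cassert>
    #include <cstdint>

    int main() {
      for (int x = -128; x < 128; ++x)
        for (int y = -128; y < 128; ++y) {
          int32_t wide = int32_t(int8_t(x)) + int32_t(int8_t(y)); // binop(sext, sext)
          int16_t once = int16_t(wide);                           // trunc(binop(...))
          int16_t each = int16_t(int16_t(int8_t(x)) +             // binop(trunc(sext),
                                 int16_t(int8_t(y)));             //       trunc(sext))
          assert(once == each);
        }
      return 0;
    }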
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292493 91177308-0d34-0410-b5e6-96231b3b80d8