llvm/test
Simon Pilgrim 34963650b8 [X86][AVX] Lower shuffles as repeated lane shuffles then lane-crossing shuffles
This patch attempts to represent a shuffle as a repeating shuffle (recognisable by is128BitLaneRepeatedShuffleMask) with the source input(s) in their original lanes, followed by a single permutation of the 128-bit lanes to their final destinations.

On AVX2 we can additionally attempt to match using 64-bit sub-lane permutation. AVX2 can also now match a similar 'broadcasted' repeating shuffle.
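
The decomposition described above (one shuffle repeated in every 128-bit lane, followed by a single permutation of whole lanes) can be illustrated with a small standalone sketch. This is not the LLVM implementation: the function names `split_lane_repeated` and `apply_split`, the use of -1 for undefined mask elements, and the restriction to unary (single-input) masks with whole-lane permutation are all assumptions made for illustration. `lane_size` is counted in elements, e.g. 4 for a 256-bit vector of 32-bit elements.

```python
from typing import List, Optional, Sequence, Tuple

UNDEF = -1  # assumed convention for "don't care" mask elements


def split_lane_repeated(
    mask: Sequence[int], lane_size: int
) -> Optional[Tuple[List[int], List[int]]]:
    """Try to split a unary shuffle mask into (repeated_mask, lane_perm).

    repeated_mask[i] is the in-lane source offset used by slot i of every lane.
    lane_perm[d] is the source lane that ends up in destination lane d.
    Returns None when the mask cannot be expressed this way.
    """
    if lane_size <= 0 or len(mask) % lane_size != 0:
        return None
    num_lanes = len(mask) // lane_size
    repeated = [UNDEF] * lane_size
    lane_perm = [UNDEF] * num_lanes
    for dst, src in enumerate(mask):
        if src == UNDEF:
            continue
        if not 0 <= src < len(mask):
            return None
        dst_lane, slot = divmod(dst, lane_size)
        src_lane, offset = divmod(src, lane_size)
        # Every element of a destination lane must come from one source lane.
        if lane_perm[dst_lane] not in (UNDEF, src_lane):
            return None
        # Every lane must use the same in-lane offset for a given slot.
        if repeated[slot] not in (UNDEF, offset):
            return None
        lane_perm[dst_lane] = src_lane
        repeated[slot] = offset
    return repeated, lane_perm


def apply_split(values: Sequence, repeated: Sequence[int],
                lane_perm: Sequence[int]) -> list:
    """Apply the two-step form: in-lane repeated shuffle, then lane permute."""
    lane_size = len(repeated)
    # Step 1: shuffle within each lane, sources stay in their original lanes.
    in_lane = []
    for lane in range(len(lane_perm)):
        base = lane * lane_size
        for offset in repeated:
            in_lane.append(None if offset == UNDEF else values[base + offset])
    # Step 2: move whole lanes to their final destinations.
    out = []
    for src_lane in lane_perm:
        if src_lane == UNDEF:
            out.extend([None] * lane_size)
        else:
            start = src_lane * lane_size
            out.extend(in_lane[start:start + lane_size])
    return out
```

For example, the 8-element mask `[6, 4, 7, 5, 2, 0, 3, 1]` with `lane_size=4` splits into the repeated in-lane mask `[2, 0, 3, 1]` and the lane permutation `[1, 0]`, whereas `[0, 5, 2, 7, 4, 1, 6, 3]` does not split because each destination lane draws from both source lanes.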

This patch has several benefits:

 * Avoids prematurely matching with lowerVectorShuffleByMerging128BitLanes which can require both inputs to have their input lanes permuted before shuffling.
 * Can replace PERMPS/PERMD instructions - although these are useful for cross-lane unary shuffling, they require their shuffle mask to be pre-loaded (and increase register pressure).
 * Matching the repeating shuffle makes use of a lot of existing shuffle lowering.

There is an outstanding minor AVX1 regression (combine_unneeded_subvector1 in vector-shuffle-combining.ll): what was previously a 128-bit shuffle + subvector splat is now converted to a subvector splat + (2 instruction) 256-bit shuffle. I intend to fix this in a followup patch for review.

Differential Revision: http://reviews.llvm.org/D16537

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260834 91177308-0d34-0410-b5e6-96231b3b80d8
2016-02-13 21:54:04 +00:00
..
Analysis [attrs] Move the norecurse deduction to operate on the node set rather 2016-02-13 08:47:51 +00:00
Assembler
Bindings Remove LLVMGetTargetMachineData leftovers. 2016-02-12 20:26:46 +00:00
Bitcode Restore "[ThinLTO] Use MD5 hash in function index." with fix 2016-02-10 21:55:02 +00:00
BugPoint
CodeGen [X86][AVX] Lower shuffles as repeated lane shuffles then lane-crossing shuffles 2016-02-13 21:54:04 +00:00
DebugInfo [llvm-pdbdump] Start to decode some streams 2016-02-12 22:27:44 +00:00
Examples
ExecutionEngine Disable the new Orc lazy JIT tests on Windows, they do not pass 2016-02-10 18:46:42 +00:00
Feature [GMR/OperandBundles] Teach getModRefBehavior about operand bundles 2016-02-09 02:31:47 +00:00
FileCheck
Instrumentation [msan] Put msan constructor in a comdat. 2016-02-12 00:37:52 +00:00
Integer
JitListener
LibDriver
Linker [ThinLTO] Remove imported available externally defs from comdats. 2016-02-08 18:47:20 +00:00
LTO
MC [AMDGPU] Assembler: Swap operands of flat_store instructions to match AMD assembler 2016-02-12 17:57:54 +00:00
Object [readobj] Dump DT_JMPREL relocations when outputting dynamic relocations. 2016-02-11 04:59:53 +00:00
Other
SymbolRewriter
TableGen SelectionDAG: Make Properties a field of SDPatternOperator 2016-02-10 18:40:04 +00:00
tools [llvm-size] Make error handling uniform. 2016-02-13 01:38:16 +00:00
Transforms [attrs] Move the norecurse deduction to operate on the node set rather 2016-02-13 08:47:51 +00:00
Unit
Verifier
YAMLParser
.clang-format
CMakeLists.txt
lit.cfg
lit.site.cfg.in
TestRunner.sh