llvm/test
Chandler Carruth 4ea3097d08 [x86] Teach the vector shuffle lowering to make a more nuanced decision
between splitting a vector into 128-bit lanes and recombining them vs.
decomposing things into single-input shuffles and a final blend.

This handles a large number of cases in AVX1 where the cross-lane
shuffles would be much more expensive to represent even though we end up
with a fast blend at the root. Instead, we can do a better job of
shuffling in a single lane and then inserting it into the other lanes.

This fixes the remaining bits of Halide's regression captured in PR21281
for AVX1. However, the bug persists in AVX2 because I've made this
change reasonably conservative. The cases where it makes sense in AVX2
to split into 128-bit lanes are much more rare because we can often do
full permutations across all elements of the 256-bit vector. However,
the particular test case in PR21281 is an example of one of the rare
cases where it is *always* better to work in a single 128-bit lane. I'm
going to try to teach the logic to detect and form the good code even in
AVX2 next, but it will need to use a separate heuristic.

Finally, there is one pesky regression here where we previously would
craftily use vpermilps in AVX1 to shuffle both high and low halves at
the same time. We no longer pull that off, and not for any really good
reason. Ultimately, I think this is just another missing nuance to the
selection heuristic that I'll try to add in afterward, but this change
already seems strictly worth doing considering the magnitude of the
improvements in common matrix math shuffle patterns.

As always, please let me know if this causes a surprising regression for
you.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221861 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-13 04:06:10 +00:00
..
Analysis [X86] Custom lower UINT_TO_FP from v4f32 to v4i32, and for v8f32 to v8i32 if 2014-11-11 02:23:47 +00:00
Assembler Use FileCheck in a few tests. 2014-11-06 15:05:51 +00:00
Bindings [OCaml] Run tests twice, with ocamlc and ocamlopt (if available) 2014-11-03 09:50:53 +00:00
Bitcode Revert "Revert "DI: Fold constant arguments into a single MDString"" 2014-10-03 20:01:09 +00:00
BugPoint Revert "Revert "DI: Fold constant arguments into a single MDString"" 2014-10-03 20:01:09 +00:00
CodeGen [x86] Teach the vector shuffle lowering to make a more nuanced decision 2014-11-13 04:06:10 +00:00
DebugInfo Add an assert and a test that verify r221709's fix. 2014-11-13 03:20:23 +00:00
ExecutionEngine [MCJIT] Defer application of AArch64 MachO GOT relocations until resolve time. 2014-10-21 23:41:15 +00:00
Feature Delete -std-compile-opts. 2014-10-16 20:00:02 +00:00
FileCheck FileCheck: Add a flag to allow checking empty input 2014-08-07 18:40:37 +00:00
Instrumentation Move asan-coverage into a separate phase. 2014-11-11 22:14:37 +00:00
Integer
JitListener Revert "Revert "DI: Fold constant arguments into a single MDString"" 2014-10-03 20:01:09 +00:00
Linker Copy externally_initialized in GlobalVariable::copyAttributesFrom. 2014-11-10 18:41:59 +00:00
LTO Add Forward Control-Flow Integrity. 2014-11-11 21:08:02 +00:00
MC [mips] Add hardware register name "hwr_ulr" ($29) 2014-11-11 11:22:39 +00:00
Object Object, support both mach-o archive t.o.c file names 2014-11-12 01:37:45 +00:00
Other [lit] Parse all strings as UTF-8 rather than ASCII. 2014-09-12 16:46:05 +00:00
SymbolRewriter Transform: add SymbolRewriter pass 2014-11-07 21:32:08 +00:00
TableGen [AVX512] Added intrinsics for VPCMPEQB and VPCMPEQW. 2014-09-30 11:32:22 +00:00
tools llvm-readobj: Print out address table when dumping COFF delay-import table 2014-11-13 03:22:54 +00:00
Transforms Teach ScalarEvolution to sharpen range information. 2014-11-13 00:00:58 +00:00
Unit
Verifier Extend the verifier to validate range metadata on calls and invokes. 2014-10-20 23:52:07 +00:00
YAMLParser
.clang-format
CMakeLists.txt Make llvm-go test dependency optional. 2014-10-23 19:51:40 +00:00
lit.cfg Only run the gold plugin tests if gold supports the targets we test with. 2014-11-11 05:27:12 +00:00
lit.site.cfg.in [OCaml] [autoconf] Migrate to ocamlfind. 2014-10-30 08:29:45 +00:00
Makefile [OCaml] [autoconf] Migrate to ocamlfind. 2014-10-30 08:29:45 +00:00
Makefile.tests
TestRunner.sh