llvm/test/CodeGen
Mehdi Amini 5eed637b34 Improve DAG combine pass on certain IR vector patterns
Loading two 2x32-bit float vectors into the bottom half of a 256-bit vector
produced suboptimal code in AVX2 mode with certain IR combinations.

In particular, the IR optimizer folded the two-step sequence 2f32 + 2f32 -> 4f32,
4f32 + 4f32 (undef) -> 8f32 into a single 2f32 + 2f32 -> 8f32, which seems more
canonical, but code generation then produced rather bad code: the movq/movhpd
combination no longer matched.

The problem lay in the BUILD_VECTOR optimization path. The 2f32 inputs
would get promoted to 4f32 by the type legalizer, eventually resulting
in a BUILD_VECTOR of two 4f32 into an 8f32. The BUILD_VECTOR path then,
recognizing that both inputs were half the output size, concatenated them and
then produced a shuffle. However, the resulting concat + shuffle was more complex
than it should be; in the case where the upper half of the output is undef, we
probably want to generate shuffle + concat instead.
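
To make the two orderings concrete, here is a toy model in Python. It is only a sketch, not LLVM code: vectors are plain lists, None stands in for an undef lane, and the helper names (concat, shuffle, widen) and the masks are assumptions chosen for illustration. It shows that shuffling the two 4-wide inputs first and then concatenating with undef gives the same 8-wide result as concatenating first and shuffling the wide vector.

```python
UNDEF = None  # stands in for an undefined lane


def concat(a, b):
    """Model of a concat: lanes of a followed by lanes of b."""
    return list(a) + list(b)


def shuffle(a, b, mask):
    """Model of a two-input shuffle: each mask entry indexes into a ++ b.

    An UNDEF mask entry produces an UNDEF lane.
    """
    src = list(a) + list(b)
    return [UNDEF if m is UNDEF else src[m] for m in mask]


def widen(v, width):
    """Model of type legalization: pad v with UNDEF lanes up to width."""
    return list(v) + [UNDEF] * (width - len(v))


# Two 2f32 inputs, widened to 4f32 as the type legalizer would do.
x = widen([1.0, 2.0], 4)
y = widen([3.0, 4.0], 4)

# concat + shuffle: build the 8-wide vector first, then shuffle it.
wide = concat(x, y)
before = shuffle(wide, [UNDEF] * 8,
                 [0, 1, 4, 5, UNDEF, UNDEF, UNDEF, UNDEF])

# shuffle + concat: shuffle the 4-wide inputs, then append an undef upper half.
low = shuffle(x, y, [0, 1, 4, 5])
after = concat(low, [UNDEF] * 4)

print(before == after)
```

In this model the second form only ever shuffles 4-wide values, which is the shape the message above argues is preferable when the upper half of the output is undef.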

This enhancement causes the vector_shuffle combine step to recognize this
suboptimal pattern and correct it. I included it there instead of in BUILD_VECTOR
in case the same suboptimal pattern occurs for other reasons.

This results in the optimizer correctly producing the optimal movq + movhpd
sequence for all three variations on this IR, even with AVX2.

I've included a test case.

Radar link: rdar://problem/19287012
Fix for PR 21943.

From: Fiona Glaser <fglaser@apple.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226360 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-17 01:35:56 +00:00
AArch64   IR: Move MDLocation into place  2015-01-14 22:27:36 +00:00
ARM       Revert r226242 - Revert Revert Don't create new comdats in CodeGen  2015-01-16 08:38:45 +00:00
CPP
Generic   getMangledTypeStr: clarify how it mangles types, and add tests  2015-01-14 23:05:17 +00:00
Hexagon   [Hexagon] Converting halfword to doubleword multiply intrinsics.  2015-01-16 21:41:57 +00:00
Inputs    IR: Move MDLocation into place  2015-01-14 22:27:36 +00:00
Mips      [mips] Fix a typo in the compare patterns for MIPS32r6/MIPS64r6.  2015-01-15 15:41:03 +00:00
MSP430
NVPTX     Check that the TLI callback enableAggressiveFMAFusion has the desired effect on FMA folding.  2015-01-14 15:36:28 +00:00
PowerPC   [PowerPC] Adjust PatchPoints for ppc64le  2015-01-16 04:40:58 +00:00
R600      R600: Clean up floor tests  2015-01-16 22:11:00 +00:00
SPARC     Use the integrated assembler by default on SPARC.  2015-01-14 07:53:39 +00:00
SystemZ   Use the integrated assembler as default on SystemZ  2015-01-13 19:45:16 +00:00
Thumb     IR: Move MDLocation into place  2015-01-14 22:27:36 +00:00
Thumb2    [ARM] Fix a bug in constant island pass that was triggering an assertion.  2015-01-08 20:44:50 +00:00
X86       Improve DAG combine pass on certain IR vector patterns  2015-01-17 01:35:56 +00:00
XCore     IR: Move MDLocation into place  2015-01-14 22:27:36 +00:00