llvm/CodeGen at 5eed637b34df7a601b8231c6373d4b8237317fd8 - llvm

RPCSX/llvm

mirror of https://github.com/RPCSX/llvm.git synced 2025-01-07 12:30:44 +00:00

Mehdi Amini 5eed637b34 Improve DAG combine pass on certain IR vector patterns	2015-01-17 01:35:56 +00:00

Loading 2 2x32-bit float vectors into the bottom half of a 256-bit vector produced suboptimal code in AVX2 mode with certain IR combinations.

In particular, the IR optimizer folded 2f32 + 2f32 -> 4f32, 4f32 + 4f32 (undef) -> 8f32 into a 2f32 + 2f32 -> 8f32, which seems more canonical, but then mysteriously generated rather bad code; the movq/movhpd combination didn't match.

The problem lay in the BUILD_VECTOR optimization path. The 2f32 inputs would get promoted to 4f32 by the type legalizer, eventually resulting in a BUILD_VECTOR on two 4f32 into an 8f32. The BUILD_VECTOR then, recognizing these were both half the output size, concatted them and then produced a shuffle. However, the resulting concat + shuffle was more complex than it should be; in the case where the upper half of the output is undef, we probably want to generate shuffle + concat instead.

This enhancement causes the vector_shuffle combine step to recognize this suboptimal pattern and correct it. I included it there instead of in BUILD_VECTOR in case the same suboptimal pattern occurs for other reasons.

This results in the optimizer correctly producing the optimal movq + movhpd sequence for all three variations on this IR, even with AVX2.

I've included a test case.

Radar link: rdar://problem/19287012
Fix for PR 21943.

From: Fiona Glaser <fglaser@apple.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226360 91177308-0d34-0410-b5e6-96231b3b80d8
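
The rewrite described in the commit message above (concat + shuffle becoming shuffle + concat when the upper half of the result is undef) can be illustrated with a toy model. This is a minimal Python sketch over plain lists, not LLVM code: the helper names (`shuffle`, `concat`, `narrow_shuffle_of_concat`) are hypothetical, `None` stands in for an undef lane, and the mask convention (indices into the first operand followed by the second) is an assumption chosen to resemble a two-input vector shuffle.

```python
UNDEF = None  # stands in for an undefined lane (assumption of this sketch)


def shuffle(a, b, mask):
    """Two-input shuffle: mask index i selects from a followed by b."""
    src = a + b
    return [UNDEF if m is UNDEF else src[m] for m in mask]


def concat(*vecs):
    """Concatenate vectors lane by lane."""
    out = []
    for v in vecs:
        out.extend(v)
    return out


def narrow_shuffle_of_concat(a, b, mask):
    """Rewrite shuffle(concat(a, b), undef, mask) as
    concat(shuffle(a, b, low_mask), undef) when the upper half of the
    mask is entirely undef. Returns None if the pattern does not apply."""
    n = len(a)
    if len(b) != n or len(mask) != 2 * n:
        return None
    low, high = mask[:n], mask[n:]
    if any(m is not UNDEF for m in high):
        return None  # upper half of the result is actually used
    if any(m is not UNDEF and m >= 2 * n for m in low):
        return None  # lane reads the undef second operand; skipped here
    # Indices into concat(a, b) match the index space of shuffle(a, b, ...).
    return concat(shuffle(a, b, low), [UNDEF] * n)


# Two 2-element inputs widened to 4 lanes, as in the 2f32 -> 4f32 promotion.
a = ["a0", "a1", UNDEF, UNDEF]
b = ["b0", "b1", UNDEF, UNDEF]
mask = [0, 1, 4, 5, UNDEF, UNDEF, UNDEF, UNDEF]

wide = shuffle(concat(a, b), [UNDEF] * 8, mask)    # concat, then wide shuffle
narrow = narrow_shuffle_of_concat(a, b, mask)      # narrow shuffle, then concat
```

In this model both forms yield the same lanes, but the second one shuffles at half the width, which is the shape the commit message says it wants to reach.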
AArch64	IR: Move MDLocation into place	2015-01-14 22:27:36 +00:00
ARM	Revert r226242 - Revert Revert Don't create new comdats in CodeGen	2015-01-16 08:38:45 +00:00
CPP
Generic	getMangledTypeStr: clarify how it mangles types, and add tests	2015-01-14 23:05:17 +00:00
Hexagon	[Hexagon] Converting halfword to doubleword multiply intrinsics.	2015-01-16 21:41:57 +00:00
Inputs	IR: Move MDLocation into place	2015-01-14 22:27:36 +00:00
Mips	[mips] Fix a typo in the compare patterns for MIPS32r6/MIPS64r6.	2015-01-15 15:41:03 +00:00
MSP430
NVPTX	Check that the TLI callback enableAggressiveFMAFusion has the desired effect on FMA folding.	2015-01-14 15:36:28 +00:00
PowerPC	[PowerPC] Adjust PatchPoints for ppc64le	2015-01-16 04:40:58 +00:00
R600	R600: Clean up floor tests	2015-01-16 22:11:00 +00:00
SPARC	Use the integrated assembler by default on SPARC.	2015-01-14 07:53:39 +00:00
SystemZ	Use the integrated assembler as default on SystemZ	2015-01-13 19:45:16 +00:00
Thumb	IR: Move MDLocation into place	2015-01-14 22:27:36 +00:00
Thumb2	[ARM] Fix a bug in constant island pass that was triggering an assertion.	2015-01-08 20:44:50 +00:00
X86	Improve DAG combine pass on certain IR vector patterns	2015-01-17 01:35:56 +00:00
XCore	IR: Move MDLocation into place	2015-01-14 22:27:36 +00:00