llvm/test/CodeGen
Evan Cheng 48575f6ea7 Making use of VFP / NEON floating point multiply-accumulate / subtraction is
difficult on current ARM implementations for a few reasons.
1. Even though a single vmla has latency that is one cycle shorter than a pair
   of vmul + vadd, a RAW hazard during the first few cycles (4? on Cortex-A8)
   can cause an additional pipeline stall. So it's frequently better to simply
   codegen vmul + vadd.
2. A vmla followed by a vmul, vadd, or vsub causes the second fp instruction to
   stall for 4 cycles. We need to schedule them apart.
3. A vmla followed by a vmla is a special case. Obviously, issuing back to back
   RAW vmla + vmla is very bad. But this isn't ideal either:
     vmul
     vadd
     vmla
   Instead, we want to expand the second vmla:
     vmla
     vmul
     vadd
   Even with the 4 cycle vmul stall, the second sequence is still 2 cycles
   faster.
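
The expansion in case 3 can be sketched as a toy rewrite over an instruction
list. This is only an illustration, not the code in this patch: the Inst
struct, the expandBackToBackVmla function, and the scratch register name passed
as tmp are all hypothetical, and the sketch ignores the actual cycle counts.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy instruction, not an LLVM data structure.
struct Inst {
  std::string op;   // "vmul", "vadd", "vmla", ...
  std::string dst;  // destination; for vmla this is also the accumulator input
  std::string a, b; // source operands
};

// Rewrite a vmla that reads the result of the immediately preceding vmla
// (a back-to-back RAW pair) as vmul + vadd. `tmp` is a scratch register
// name that the caller assumes to be free.
std::vector<Inst> expandBackToBackVmla(const std::vector<Inst> &in,
                                       const std::string &tmp) {
  std::vector<Inst> out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    const Inst &cur = in[i];
    bool raw = false;
    if (i > 0 && cur.op == "vmla" && in[i - 1].op == "vmla") {
      const std::string &prevDst = in[i - 1].dst;
      // vmla reads its accumulator (dst) as well as both multiplicands.
      raw = cur.dst == prevDst || cur.a == prevDst || cur.b == prevDst;
    }
    if (raw) {
      out.push_back({"vmul", tmp, cur.a, cur.b});       // tmp = a * b
      out.push_back({"vadd", cur.dst, cur.dst, tmp});   // dst = dst + tmp
    } else {
      out.push_back(cur);                               // keep unchanged
    }
  }
  return out;
}
```

Given the pair vmla d0, d1, d2 followed by vmla d0, d3, d4, the sketch keeps the
first vmla and emits vmul + vadd for the second, matching the preferred
sequence above. Two vmla instructions with no register in common are left
untouched.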

Up to now, isel simply avoids codegen'ing fp vmla / vmls. This works well enough
but it isn't the optimal solution. This patch attempts to make it possible to
use vmla / vmls in cases where it is profitable.

A. Add missing isel predicates which cause vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to
   compute a fmul and a fmla.
C. Add additional isel checks for vmla to avoid cases where vmla is feeding into
   fp instructions (except for the special case described in #3).
D. Add ARM hazard recognizer to model the vmla / vmls hazards.
E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the
   vmla / vmls will trigger one of the special hazards.
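
The single-use condition in B can be illustrated with a small stand-alone
sketch. The Node struct, the Opc enum, and canFoldToMla are hypothetical names
made up for this example and are not the predicates added by this patch; in
LLVM itself the equivalent question is typically asked with hasOneUse() on the
fmul node.

```cpp
#include <vector>

// Hypothetical toy DAG node, not LLVM's SDNode.
enum class Opc { FAdd, FMul, Other };

struct Node {
  Opc opc;
  std::vector<const Node *> operands;
  unsigned numUses; // number of nodes that consume this node's value
};

// An fadd can be folded into a multiply-accumulate only if one of its
// operands is an fmul whose sole user is this fadd. If the fmul has other
// users, it must still be computed separately, so folding would compute
// both a fmul and a fmla.
bool canFoldToMla(const Node &add) {
  if (add.opc != Opc::FAdd)
    return false;
  for (const Node *op : add.operands)
    if (op->opc == Opc::FMul && op->numUses == 1)
      return true;
  return false;
}
```

With an fmul used only by the fadd, canFoldToMla returns true; if the same fmul
also feeds a second node, it returns false and the pair stays as
vmul + vadd.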

Work in progress, only A+B are enabled.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120960 91177308-0d34-0410-b5e6-96231b3b80d8
2010-12-05 22:04:16 +00:00
..
Alpha
ARM Making use of VFP / NEON floating point multiply-accumulate / subtraction is 2010-12-05 22:04:16 +00:00
Blackfin
CBackend
CellSPU Handle lshr for i128 correctly on SPU also when 2010-11-29 14:44:28 +00:00
CPP
Generic Removing the useless test that I added recently. It was meant as an example, but not complicated enough to merit another test. 2010-11-20 07:26:51 +00:00
MBlaze Implement branch analysis in the MBlaze backend. 2010-11-21 21:53:36 +00:00
Mips Enable mips32 mul instruction. Patch by Akira Hatanaka <ahatanaka@mips.com> 2010-11-12 00:38:32 +00:00
MSP430 Inline asm mult-alt constraint tests. 2010-11-02 23:01:44 +00:00
PowerPC remove a pointless testcase. 2010-11-15 05:07:03 +00:00
PTX ptx: add command-line options for gpu target and ptx version 2010-11-30 10:14:14 +00:00
SPARC filecheckize 2010-11-23 02:26:52 +00:00
SystemZ Correct bogus module triple specifications. 2010-08-30 10:48:29 +00:00
Thumb Fix epilogue codegen to avoid leaving the stack pointer in an invalid 2010-11-22 18:12:04 +00:00
Thumb2 The Thumb tADDrSPi instruction is not valid when the destination is SP. 2010-12-04 04:40:19 +00:00
X86 Teach X86ISelLowering that the second result of X86ISD::UMUL is a flags 2010-12-05 07:49:54 +00:00
XCore Enable machine sinking critical edge splitting. e.g. 2010-09-20 22:52:00 +00:00
thumb2-mul.ll Enable target-specific mul-lowering on ARM, even at -Os. Remove a test that this makes 2010-09-21 22:51:46 +00:00