llvm/CodeGen at 48575f6ea7d5cd21ab29ca370f58fcf9ca31400b - llvm

RPCSX/llvm

mirror of https://github.com/RPCSX/llvm.git synced 2024-12-26 05:56:12 +00:00

Evan Cheng 48575f6ea7	2010-12-05 22:04:16 +00:00

Making use of VFP / NEON floating point multiply-accumulate / subtraction is difficult on current ARM implementations for a few reasons.

1. Even though a single vmla has latency that is one cycle shorter than a pair of vmul + vadd, a RAW hazard during the first cycles (4? on Cortex-A8) can cause an additional pipeline stall. So it's frequently better to simply codegen vmul + vadd.
2. A vmla followed by a vmul, vadd, or vsub causes the second fp instruction to stall for 4 cycles. We need to schedule them apart.
3. A vmla followed by a vmla is a special case. Obviously, issuing back-to-back RAW vmla + vmla is very bad. But this isn't ideal either:
     vmul
     vadd
     vmla
   Instead, we want to expand the second vmla:
     vmla
     vmul
     vadd
   Even with the 4 cycle vmul stall, the second sequence is still 2 cycles faster.

Up to now, isel simply avoided codegen'ing fp vmla / vmls. This works well enough, but it isn't the optimal solution. This patch attempts to make it possible to use vmla / vmls in cases where it is profitable.

A. Add missing isel predicates which cause vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to compute a fmul and a fmla.
C. Add additional isel checks for vmla; avoid cases where vmla is feeding into fp instructions (except for the #3 exceptional case).
D. Add ARM hazard recognizer to model the vmla / vmls hazards.
E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the vmla / vmls will trigger one of the special hazards.

Work in progress, only A+B are enabled.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120960 91177308-0d34-0410-b5e6-96231b3b80d8
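As a rough illustration of reason 2 in the commit message, the sketch below flags a vmul, vadd, or vsub that is issued too soon after a vmla / vmls. It is a toy model, not the hazard recognizer added by the patch: the `Op` enum, `findMLxHazards`, and `kHazardWindow` are hypothetical names, and treating the 4-cycle stall as a window of 4 issue slots (one instruction per cycle) is an assumption made only for this example.

```cpp
#include <cstddef>
#include <vector>

// Toy opcode set for illustration; these are not LLVM's real opcodes.
enum class Op { VMUL, VADD, VSUB, VMLA, VMLS, Other };

// Multiply-accumulate / multiply-subtract instructions that open a hazard window.
static bool isMLx(Op op) { return op == Op::VMLA || op == Op::VMLS; }

// Instructions the commit message says stall when they follow a vmla / vmls.
static bool isHazardFollower(Op op) {
  return op == Op::VMUL || op == Op::VADD || op == Op::VSUB;
}

// Assumption: one instruction issues per cycle, so the "4 cycle" stall
// becomes a window of 4 issue slots after the vmla / vmls.
constexpr std::size_t kHazardWindow = 4;

// Returns the indices of instructions that fall inside the hazard window
// of the most recent vmla / vmls in the sequence.
std::vector<std::size_t> findMLxHazards(const std::vector<Op>& seq) {
  std::vector<std::size_t> hazards;
  bool seenMLx = false;
  std::size_t lastMLx = 0;
  for (std::size_t i = 0; i < seq.size(); ++i) {
    if (seenMLx && isHazardFollower(seq[i]) && i - lastMLx <= kHazardWindow)
      hazards.push_back(i);
    if (isMLx(seq[i])) {
      seenMLx = true;
      lastMLx = i;
    }
  }
  return hazards;
}
```

A scheduler or expansion pass could use a check of this shape to decide whether to move the flagged instruction further away or to expand the vmla into vmul + vadd; the real decision in the patch also depends on data dependences, which this sketch ignores.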
..
Alpha
ARM	Making use of VFP / NEON floating point multiply-accumulate / subtraction is	2010-12-05 22:04:16 +00:00
Blackfin
CBackend
CellSPU	Handle lshr for i128 correctly on SPU also when	2010-11-29 14:44:28 +00:00
CPP
Generic	Removing the useless test that I added recently. It was meant as an example, but not complicated enough to merit another test.	2010-11-20 07:26:51 +00:00
MBlaze	Implement branch analysis in the MBlaze backend.	2010-11-21 21:53:36 +00:00
Mips	Enable mips32 mul instruction. Patch by Akira Hatanaka <ahatanaka@mips.com>	2010-11-12 00:38:32 +00:00
MSP430	Inline asm mult-alt constraint tests.	2010-11-02 23:01:44 +00:00
PowerPC	remove a pointless testcase.	2010-11-15 05:07:03 +00:00
PTX	ptx: add command-line options for gpu target and ptx version	2010-11-30 10:14:14 +00:00
SPARC	filecheckize	2010-11-23 02:26:52 +00:00
SystemZ	Correct bogus module triple specifications.	2010-08-30 10:48:29 +00:00
Thumb	Fix epilogue codegen to avoid leaving the stack pointer in an invalid	2010-11-22 18:12:04 +00:00
Thumb2	The Thumb tADDrSPi instruction is not valid when the destination is SP.	2010-12-04 04:40:19 +00:00
X86	Teach X86ISelLowering that the second result of X86ISD::UMUL is a flags	2010-12-05 07:49:54 +00:00
XCore	Enable machine sinking critical edge splitting. e.g.	2010-09-20 22:52:00 +00:00
thumb2-mul.ll	Enable target-specific mul-lowering on ARM, even at -Os. Remove a test that this makes	2010-09-21 22:51:46 +00:00