llvm/test at 9b6ca9304cd7deb0598b0c3d7488023bfffd29d9 - llvm

RPCSX/llvm

mirror of https://github.com/RPCSX/llvm.git synced 2024-11-24 12:19:53 +00:00


Quentin Colombet 9b6ca9304c [CodeGenPrepare] Move extractelement close to store if they can be combined.

This patch adds an optimization in CodeGenPrepare to move an extractelement right before a store when the target can combine them. The optimization may promote any scalar operations to vector operations along the way to make that possible.

Context

Some targets use different register files for vector and scalar operations. This means that transitioning from one domain to another may incur a copy from one register file to another. These copies are not coalescable and may be expensive. For example, according to the scheduling model, on cortex-A8 a vector to GPR move is 20 cycles.

Motivating Example

Let us consider an example:

    define void @foo(<2 x i32>* %addr1, i32* %dest) {
      %in1 = load <2 x i32>* %addr1, align 8
      %extract = extractelement <2 x i32> %in1, i32 1
      %out = or i32 %extract, 1
      store i32 %out, i32* %dest, align 4
      ret void
    }

As it is, this IR generates the following assembly on armv7:

    vldr d16, [r0]                @ vector load
    vmov.32 r0, d16[1]            @ cross-register-file copy: 20 cycles
    orr r0, r0, #1                @ scalar bitwise or
    str r0, [r1]                  @ scalar store
    bx lr

Whereas we could generate much faster code:

    vldr d16, [r0]                @ vector load
    vorr.i32 d16, #0x1            @ vector bitwise or
    vst1.32 {d16[1]}, [r1:32]     @ vector extract + store
    bx lr

Half of the computation made in the vector is useless, but this lets us get rid of the expensive cross-register-file copy.

Proposed Solution

To avoid this cross-register-copy penalty, we promote the scalar operations to vector operations. The penalty will be removed if we manage to promote the whole chain of computation in the vector domain.

Currently, we do that only when the chain of computation ends with a store and the target is able to combine an extract with a store. Stores are the most likely candidates, because other instructions produce values that would need to be promoted and so, extracted at some point[1]. Moreover, it is customary for targets to feature stores that perform a vector extract (see AArch64 and X86 for instance).

The proposed implementation relies on the TargetTransformInfo to decide whether or not it is beneficial to promote a chain of computation in the vector domain. Unfortunately, this interface is rather inaccurate for this level of detail and, although this optimization may be beneficial for X86 and AArch64, the inaccuracy will lead to the optimization being too aggressive. Basically, in TargetTransformInfo everything that is legal has a cost of 1, whereas, even if a vector type is legal, a vector operation is usually slightly more expensive than its scalar counterpart. That will lead to too many promotions that may not be counterbalanced by the saving of the cross-register-file copy. For instance, on AArch64 this penalty is just 4 cycles.

For now, the optimization is just enabled for ARM prior to v8, since those processors have a larger penalty on cross-register-file copies, and the scope is limited to basic blocks. Because of these two factors, we limit the effects of the inaccuracy. Indeed, I did not want to build up a fancy cost model with block frequency and everything on top of that.

[1] We can imagine targets that can combine an extractelement with other instructions than just stores. If we want to go in that direction, the current interfaces must be augmented and, moreover, I think this becomes a global isel problem.

Differential Revision: http://reviews.llvm.org/D5921
<rdar://problem/14170854>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220978 91177308-0d34-0410-b5e6-96231b3b80d8

2014-10-31 17:52:53 +00:00
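The profitability trade-off described in the commit message can be illustrated with a small sketch. This is not the CodeGenPrepare implementation and does not use the TargetTransformInfo API; the function name `promotion_is_profitable`, the five-operation chain, and the vector cost of 2 per operation are hypothetical values chosen for illustration. Only the 20-cycle (cortex-A8) and 4-cycle (AArch64) copy penalties and the "everything legal costs 1" behaviour come from the message itself.

```python
def promotion_is_profitable(scalar_op_costs, vector_op_costs, cross_copy_cost):
    """Compare the two ways of computing a chain that ends in a store.

    Scalar path: pay the cross-register-file copy, then the scalar ops.
    Vector path: pay only the vector ops (the extract folds into the store).
    The store itself is assumed to cost the same on both paths.
    """
    scalar_total = cross_copy_cost + sum(scalar_op_costs)
    vector_total = sum(vector_op_costs)
    return vector_total < scalar_total


# Motivating example: a single `or`, with the 20-cycle copy quoted for cortex-A8.
cortex_a8 = promotion_is_profitable([1], [1], cross_copy_cost=20)

# Hypothetical five-operation chain with the 4-cycle AArch64 penalty.
# Unit-cost model: every legal operation costs 1, so promotion looks good.
unit_model = promotion_is_profitable([1] * 5, [1] * 5, cross_copy_cost=4)

# Same chain, assuming (hypothetically) vector ops really cost 2 each:
# the promotion no longer pays for itself.
realistic = promotion_is_profitable([1] * 5, [2] * 5, cross_copy_cost=4)

print(cortex_a8, unit_model, realistic)
```

Under these assumed numbers, the unit-cost model accepts a promotion that a more accurate model would reject, which is the over-aggressiveness the message gives as the reason for restricting the optimization to ARM prior to v8.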
Analysis	[SCEV] Improve Scalar Evolution's use of no {un,}signed wrap flags	2014-10-31 11:40:32 +00:00
Assembler	Delete -std-compile-opts.	2014-10-16 20:00:02 +00:00
Bindings	[OCaml] Ensure consistent naming.	2014-10-31 09:19:03 +00:00
Bitcode	Revert "Revert "DI: Fold constant arguments into a single MDString""	2014-10-03 20:01:09 +00:00
BugPoint	Revert "Revert "DI: Fold constant arguments into a single MDString""	2014-10-03 20:01:09 +00:00
CodeGen	[CodeGenPrepare] Move extractelement close to store if they can be combined.	2014-10-31 17:52:53 +00:00
DebugInfo	PR21408: Workaround the appearance of duplicate variables due to problems when inlining two calls to the same function from the same call site.	2014-10-30 20:20:11 +00:00
ExecutionEngine	[MCJIT] Defer application of AArch64 MachO GOT relocations until resolve time.	2014-10-21 23:41:15 +00:00
Feature	Delete -std-compile-opts.	2014-10-16 20:00:02 +00:00
FileCheck	FileCheck: Add a flag to allow checking empty input	2014-08-07 18:40:37 +00:00
Instrumentation	[asan] fix caller-calee instrumentation to emit new cache for every call site	2014-10-31 17:11:27 +00:00
Integer
JitListener	Revert "Revert "DI: Fold constant arguments into a single MDString""	2014-10-03 20:01:09 +00:00
Linker	Unify and update link-messages.ll and redefinition.ll. NFC.	2014-10-31 16:52:30 +00:00
LTO	LTO: Add missing target triple from r218784	2014-10-01 18:49:58 +00:00
MC	[AVX512] Added VBROADCAST{SS/SD} encoding for VL subset.	2014-10-30 14:21:47 +00:00
Object	Object, COFF: Cleanup symbol type code, improve binutils compatibility	2014-10-31 05:07:00 +00:00
Other	[lit] Parse all strings as UTF-8 rather than ASCII.	2014-09-12 16:46:05 +00:00
TableGen	[AVX512] Added intrinsics for VPCMPEQB and VPCMPEQW.	2014-09-30 11:32:22 +00:00
tools	Enable the slp vectorizer in the gold plugin.	2014-10-30 00:38:54 +00:00
Transforms	[SCEV] Improve Scalar Evolution's use of no {un,}signed wrap flags	2014-10-31 11:40:32 +00:00
Unit	Let test/Unit/lit.cfg add config.shlibdir to $PATH on DLL platforms like cygming.	2014-07-04 05:11:55 +00:00
Verifier	Extend the verifier to validate range metadata on calls and invokes.	2014-10-20 23:52:07 +00:00
YAMLParser
.clang-format
CMakeLists.txt	Make llvm-go test dependency optional.	2014-10-23 19:51:40 +00:00
lit.cfg	lit: PR21417: don't try to update OCAMLPATH if LibDir is empty.	2014-10-30 19:26:42 +00:00
lit.site.cfg.in	[OCaml] [autoconf] Migrate to ocamlfind.	2014-10-30 08:29:45 +00:00
Makefile	[OCaml] [autoconf] Migrate to ocamlfind.	2014-10-30 08:29:45 +00:00
Makefile.tests
TestRunner.sh