archived-llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2026-01-31 01:35:20 +01:00

Author	SHA1	Message	Date
Florian Hahn	3944b201cc	[SLP] Consider alternatives for cost of select instructions. Some architectures do not have general vector select instructions (e.g. AArch64). But some cmp/select patterns can be vectorized using other instructions/intrinsics. One example is using min/max instructions for certain patterns. This patch updates the cost calculations for selects in the SLP vectorizer to consider using min/max intrinsics. This patch does not change SLP vectorizer's codegen itself to actually generate those intrinsics, but relies on the backends to lower the vector cmps & selects. This keeps things simple on the SLP side and works well in practice for AArch64. This exposes additional SLP vectorization opportunities in some benchmarks on AArch64 (-O3 -flto). Metric: SLP.NumVectorInstructions (program: base, slp, diff): test-suite...ications/JM/ldecod/ldecod.test: 502.00, 697.00, 38.8%; test-suite...ications/JM/lencod/lencod.test: 1023.00, 1414.00, 38.2%; test-suite...-typeset/consumer-typeset.test: 56.00, 65.00, 16.1%; test-suite...6/464.h264ref/464.h264ref.test: 804.00, 822.00, 2.2%; test-suite...006/453.povray/453.povray.test: 3335.00, 3357.00, 0.7%; test-suite...CFP2000/177.mesa/177.mesa.test: 2110.00, 2121.00, 0.5%; test-suite...:: External/Povray/povray.test: 2378.00, 2382.00, 0.2%. Reviewed By: RKSimon, samparker Differential Revision: https://reviews.llvm.org/D89969	2020-10-29 20:39:50 +00:00
Simon Pilgrim	2c22c76b35	[SLP] optimizeGatherSequence - assert every Instruction in the worklist is non-null. Fixes clang static analyzer warning.	2020-10-08 20:02:18 +01:00
Sanjay Patel	a056cc9998	[SLP] clean up - use 'const' and ArrayRef constructor; NFC Follow-on tidying suggested in the post-commit review of 6a23668.	2020-09-24 15:31:07 -04:00
Craig Topper	56fc2e53b2	[SLP] Remove LHS and RHS from OperationData. These were only really used for 2 things. One was to check if the operand matches the phi if it exists. The other was for the createOp method to build the reduction. For the first case we still have the operation we just need to know how to index its operands. So I've modified getLHS/getRHS to just use the opcode/kind to know how to find the right operands on an instruction that is now passed in. For the other case we had to create an OperationData object to set the LHS/RHS values and copy the opcode/kind from another object. We would then just call createOp on that temporary object. Instead I've made LHS/RHS arguments to createOp and removed all these temporary objects. Differential Revision: https://reviews.llvm.org/D88193	2020-09-24 10:57:11 -07:00
Craig Topper	2192c7dd43	[SLP] Make HorizontalReduction::getOperationData take an Instruction* instead of a Value. NFCI All of the callers already have an Instruction. Many of them from a dyn_cast. Also update the OperationData constructor to use an Instruction& to remove a dyn_cast and make it clear that the pointer is non-null. Differential Revision: https://reviews.llvm.org/D88132	2020-09-23 10:51:03 -07:00
Alexey Bataev	47531738d7	[SLP]Fix coding style, NFC.	2020-09-22 17:44:29 -04:00
Sanjay Patel	db9cccedfd	[SLP] reduce code duplication for checking parent block; NFC	2020-09-22 09:21:20 -04:00
Sanjay Patel	87c6ed7e4f	[SLP] move misplaced code comments; NFC	2020-09-22 09:21:20 -04:00
Sanjay Patel	207c01f738	[SLP] clean up code in gather(); NFC 1. Use range for-loop to avoid repeatedly accessing end index. 2. Better variable names.	2020-09-22 09:21:20 -04:00
Simon Pilgrim	85b5840339	[SLP] Merge null and dyn_cast<> checks into dyn_cast_or_null<>. NFCI.	2020-09-22 14:01:47 +01:00
Sanjay Patel	c0733e8126	[SLP] use std::distance/find to reduce code; NFC We were already using this code pattern right after the loop, so this makes it consistent.	2020-09-21 16:22:55 -04:00
Sanjay Patel	7f64f1028d	[SLP] use unary shuffle creator to reduce code duplication; NFC	2020-09-21 13:54:06 -04:00
Simon Pilgrim	93c97aff7a	[SLP] Use for-range loops across ValueLists. NFCI. Also rename some existing loops that used a 'j' iterator to consistently use 'V'.	2020-09-21 18:24:23 +01:00
Sanjay Patel	5c7666e423	[SLP] simplify interface for gather(); NFC The implementation of gather() should be reduced too, but this change by itself makes things a little clearer: we don't try to gather to a different type or number-of-values than whatever is passed in as the value list itself.	2020-09-21 12:57:28 -04:00
Simon Pilgrim	40596098d9	SLPVectorizer.cpp - fix include ordering. NFCI.	2020-09-21 17:17:11 +01:00
Alexey Bataev	9d0d65e027	[SLP] Allow reordering of vectorization trees with reused instructions. If some leaves have the same instructions to be vectorized, we may incorrectly evaluate the best order for the root node (it is built for the vector of instructions without repeated instructions and, thus, has fewer elements than the root node). In this case we just cannot try to reorder the tree + we may calculate the wrong number of nodes that require the same reordering. For example, if the root node is <a+b, a+c, a+d, f+e>, then the leaves are <a, a, a, f> and <b, c, d, e>. When we try to vectorize the first leaf, it will be shrunk to <a, b>. If instructions in this leaf should be reordered, the best order will be <1, 0>. We need to extend this order for the root node. For the root node this order should look like <3, 0, 1, 2>. This patch allows extension of the orders of the nodes with the reused instructions. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D45263	2020-09-21 10:51:03 -04:00
Fangrui Song	64799c106c	Fix some clang-tidy bugprone-argument-comment issues	2020-09-19 20:41:25 -07:00
Eric Christopher	f617455845	Temporarily Revert "[SLP] Allow reordering of vectorization trees with reused instructions." as it's infinite looping on occasion. This reverts commit 455ca0ebb69210046928fedffe292420a30f89ad.	2020-09-18 12:50:04 -07:00
Alexey Bataev	50c5b40f69	[SLP] Allow reordering of vectorization trees with reused instructions. If some leaves have the same instructions to be vectorized, we may incorrectly evaluate the best order for the root node (it is built for the vector of instructions without repeated instructions and, thus, has fewer elements than the root node). In this case we just cannot try to reorder the tree + we may calculate the wrong number of nodes that require the same reordering. For example, if the root node is <a+b, a+c, a+d, f+e>, then the leaves are <a, a, a, f> and <b, c, d, e>. When we try to vectorize the first leaf, it will be shrunk to <a, b>. If instructions in this leaf should be reordered, the best order will be <1, 0>. We need to extend this order for the root node. For the root node this order should look like <3, 0, 1, 2>. This patch allows extension of the orders of the nodes with the reused instructions. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D45263	2020-09-18 09:34:59 -04:00
Sanjay Patel	2eeb2e6879	[SLP] sort candidates to increase chance of optimal compare reduction This is one (small) part of improving PR41312: https://llvm.org/PR41312 As shown there and in the smaller tests here, if we have some member of the reduction values that does not match the others, we want to push it to the end (bring the matching members forward and together). In the regression tests, we have 5 candidates for the 4 slots of the reduction. If the one "wrong" compare is grouped with the others, it prevents forming the ideal v4i1 compare reduction. Differential Revision: https://reviews.llvm.org/D87772	2020-09-17 08:49:27 -04:00
Sanjay Patel	4abaadfc37	[SLP] fix formatting; NFC Also move variable declarations closer to usage and add code comments.	2020-09-16 08:50:27 -04:00
Sanjay Patel	fe442597a7	[SLP] remove uses of 'auto' that obscure functionality; NFC	2020-09-16 08:26:21 -04:00
Sanjay Patel	838e7cb42b	[SLP] remove redundant size check; NFC We bail out on small array size anyway.	2020-09-16 08:11:19 -04:00
Sanjay Patel	2afe46fd57	[SLP] move loop index variable declaration to its use; NFC	2020-09-16 07:59:31 -04:00
Sanjay Patel	fec534536c	[SLP] change poorly named variable; NFC 'V' shadows a function argument.	2020-09-16 07:59:31 -04:00
Huihui Zhang	32cbc304b2	[SLPVectorizer][SVE] Skip scalable-vector instructions before vectorizeSimpleInstructions. For scalable type, the aggregated size is unknown at compile-time. Skip instructions with scalable type to ensure the list of instructions for vectorizeSimpleInstructions does not contain any scalable-vector instructions. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D87550	2020-09-15 13:10:15 -07:00
Simon Pilgrim	eba47c734c	SLPVectorizer.h - remove unnecessary AliasAnalysis.h include. NFCI. Forward declare AAResults instead of the (old) AliasAnalysis type. Remove includes from SLPVectorizer.cpp that are already included in SLPVectorizer.h.	2020-09-15 16:24:05 +01:00
Sanjay Patel	0a0854f6fe	[SLP] further limit bailout for load combine candidate (PR47450) The test example based on PR47450 shows that we can match non-byte-sized shifts, but those won't ever be bswap opportunities. This isn't a full fix (we'd still match if the shifts were by 8-bits for example), but this should be enough until there's evidence that we need to do more (this is a borderline case for vectorization in the first place).	2020-09-11 11:56:11 -04:00
Craig Topper	4537130369	[SLPVectorizer][X86][AMDGPU] Remove fcmp+select to fmin/fmax reduction support. Previously we could match fcmp+select to a reduction if the fcmp had the nonans fast math flag. But if the select had the nonans fast math flag, InstCombine would turn it into a fminnum/fmaxnum intrinsic before SLP gets to it. Seems fairly likely that if one of the fcmp+select pair have the fast math flag, they both would. My plan is to start vectorizing the fmaxnum/fminnum version soon, but I wanted to get this code out as it had some of the strangest fast math flag behaviors.	2020-09-10 11:49:19 -07:00
Sanjay Patel	8a96dfec70	[SLP] make commutative check apply only to binops; NFC As discussed in D86798, it's not clear if the caller code works with a more liberal definition of "commutative" that includes intrinsics like min/max. This makes the binop restriction (current functionality is unchanged) explicit until the code is audited/tested.	2020-08-30 10:55:44 -04:00
Christopher Tetreault	c8d8d4e8c9	[SVE] Remove calls to VectorType::getNumElements from Transforms/Vectorize Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D82056	2020-08-27 12:02:20 -07:00
Mehdi Amini	db235b2187	Revert "Revert "[NFC][llvm] Make the contructors of `ElementCount` private."" Was reverted because MLIR/Flang builds were broken, these APIs have been fixed in the meantime.	2020-08-19 17:26:36 +00:00
Mehdi Amini	4386b1823a	Revert "[NFC][llvm] Make the contructors of `ElementCount` private." This reverts commit 264afb9e6aebc98c353644dd0700bec808501cab. (and dependent 6b742cc48 and fc53bd610f) MLIR/Flang are broken.	2020-08-19 17:21:37 +00:00
Francesco Petrogalli	d75808bc7f	[NFC][llvm] Make the contructors of `ElementCount` private. Differential Revision: https://reviews.llvm.org/D86120	2020-08-19 16:26:44 +00:00
Dinar Temirbulatov	120cdeb9a9	[NFC] Guard the cost report block of debug outputs with NDEBUG and switch to SmallString, this is part of D57779.	2020-08-11 16:34:47 +02:00
Florian Hahn	d866e0aa79	[SLP] Make sure instructions are ordered when computing spill cost. The entries in VectorizableTree are not necessarily ordered by their position in basic blocks. Collect them and order them by dominance so later instructions are guaranteed to be visited first. For instructions in different basic blocks, we only scan to the beginning of the block, so their order does not matter, as long as all instructions in a basic block are grouped together. Using dominance ensures a deterministic order. The modified test case contains an example where we compute a wrong spill cost (2) without this patch, even though there is no call between any instruction in the bundle. This seems to have limited practical impact, e.g. on X86 with a recent Intel Xeon CPU with -O3 -march=native -flto on MultiSource,SPEC2000,SPEC2006 there are no binary changes. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D82444	2020-08-11 11:18:12 +02:00
Anton Afanasyev	c038d11ee3	[SLP] Fix order of `insertelement`/`insertvalue` seed operands Summary: This patch takes the index operands of `insertelement`/`insertvalue` into account while generating seed elements for `findBuildAggregate()`. This function has kept the original order of `insert`s before. Also this patch optimizes `findBuildAggregate()`, avoiding redundant temporary vector allocations and repeated reversals. Fixes llvm.org/pr44067 Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83779	2020-08-06 22:09:24 +03:00
Vitaly Buka	1bae08d2a5	[NFC] Remove unused GetUnderlyingObject paramenter Depends on D84617. Differential Revision: https://reviews.llvm.org/D84621	2020-07-31 02:10:03 -07:00
Vitaly Buka	4ee4573a60	[NFC] GetUnderlyingObject -> getUnderlyingObject I am going to touch them in the next patch anyway	2020-07-30 21:08:24 -07:00
David Sherwood	82faee9523	[SVE] Don't consider scalable vector types in SLPVectorizerPass::vectorizeChainsInBlock In vectorizeChainsInBlock we try to collect chains of PHI nodes that have the same element type, but the code is relying upon the implicit conversion from TypeSize -> uint64_t. For now, I have modified the code to ignore PHI nodes with scalable types. Differential Revision: https://reviews.llvm.org/D83542	2020-07-29 16:29:19 +01:00
David Green	49873f2449	[Analysis] TTI: Add CastContextHint for getCastInstrCost Currently, getCastInstrCost has limited information about the cast it's rating, often just the opcode and types. Sometimes there is a context instruction as well, but it isn't trustworthy: for instance, when the vectorizer is rating a plan, it calls getCastInstrCost with the old instructions when, in fact, it's trying to evaluate the cost of the instruction post-vectorization. Thus, the current system can get the cost of certain casts incorrect as the correct cost can vary greatly based on the context in which it's used. For example, if the vectorizer queries getCastInstrCost to evaluate the cost of a sext(load) with tail predication enabled, getCastInstrCost will think it's free most of the time, but it's not always free. On ARM MVE, a VLD2 group cannot be extended like a normal VLDR can. Similar situations can come up with how masked loads can be extended when being split. To fix that, this patch adds a new parameter to getCastInstrCost to give it a hint about the context of the cast. It adds a CastContextHint enum which contains the type of the load/store being created by the vectorizer - one for each of the types it can produce. Original patch by Pierre van Houtryve Differential Revision: https://reviews.llvm.org/D79162	2020-07-29 13:32:53 +01:00
Stanislav Mekhanoshin	27cd2159b8	Fixed warning about signed/unsigned comparison I've got the report clang11 issues signed/unsigned mismatch warning here. For some reason only clang11 seems to issue this warning. Differential Revision: https://reviews.llvm.org/D83916	2020-07-17 11:03:42 -07:00
Sanne Wouda	d79de12ea6	[NFC] rename to reflect F is not necessarily an Intrinsic	2020-07-13 15:28:46 +01:00
Sanne Wouda	a003603bd6	[SLPVectorizer] handle vectorizeable library functions Teaches the SLPVectorizer to use vectorized library functions for non-intrinsic calls. This already worked for intrinsics that have vectorized library functions, thanks to D75878, but schedules with library functions with a vector variant were being rejected early. - assume that there are no load/store dependencies between lib functions with a vector variant; this would otherwise prevent the bundle from becoming "ready" - check during legalization that the vector variant can be used - fix-up where we previously assumed that a call would be an intrinsic Differential Revision: https://reviews.llvm.org/D82550	2020-07-13 15:28:46 +01:00
Stanislav Mekhanoshin	6a04f24d67	SLP: honor requested max vector size merging PHIs At the moment this place does not check the maximum size set by TTI and just creates the maximum possible vectors. Differential Revision: https://reviews.llvm.org/D82227	2020-07-08 08:06:15 -07:00
Florian Hahn	1a54133388	Revert "[SLP] Make sure instructions are ordered when computing spill cost." This seems to break http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/24371 This reverts commit eb46137daa92723b75d828f2db959f2061612622.	2020-07-07 23:15:01 +01:00
Florian Hahn	fce4f3542f	[SLP] Make sure instructions are ordered when computing spill cost. The entries in VectorizableTree are not necessarily ordered by their position in basic blocks. Collect them and order them by dominance so later instructions are guaranteed to be visited first. For instructions in different basic blocks, we only scan to the beginning of the block, so their order does not matter, as long as all instructions in a basic block are grouped together. Using dominance ensures a deterministic order. The modified test case contains an example where we compute a wrong spill cost (2) without this patch, even though there is no call between any instruction in the bundle. This seems to have limited practical impact, e.g. on X86 with a recent Intel Xeon CPU with -O3 -march=native -flto on MultiSource,SPEC2000,SPEC2006 there are no binary changes. Reviewers: craig.topper, RKSimon, xbolva00, ABataev, spatel Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D82444	2020-07-03 17:30:17 +01:00
Florian Hahn	27cc9a4ca9	[SLP] Limit GEP lists based on width of index computation. D68667 introduced a tighter limit to the number of GEPs to simplify together. The limit was based on the vector element size of the pointer, but the pointers themselves are not actually put in vectors. IIUC we try to vectorize the index computations here, so we should base the limit on the vector element size of the computation of the index. This restores the test regression on AArch64 and also restores the vectorization for an important pattern in SPEC2006/464.h264ref on AArch64 (@test_i16_extend). We get a large benefit from doing a single load up front and then processing the index computations in vectors. Note that we could probably even further improve the AArch64 codegen, if we would do zexts to i32 instead of i64 for the sub operands and then do a single vector sext on the result of the subtractions. AArch64 provides dedicated vector instructions to do so. Sketch of proof in Alive: https://alive2.llvm.org/ce/z/A4xYAB Reviewers: craig.topper, RKSimon, xbolva00, ABataev, spatel Reviewed By: ABataev, spatel Differential Revision: https://reviews.llvm.org/D82418	2020-06-24 19:56:53 +01:00
Sanjay Patel	16ab4831de	[IRBuilder] add/use wrapper to create a generic compare based on predicate type; NFC The predicate can always be used to distinguish between icmp and fcmp, so we don't need to keep repeating this check in the callers.	2020-06-18 15:47:06 -04:00
Christopher Tetreault	59f1665fe9	[SVE] Eliminate calls to default-false VectorType::get() from Vectorize Reviewers: efriedma, fhahn, spatel, sdesmalen, kmclaughlin Reviewed By: efriedma Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81521	2020-06-16 12:50:13 -07:00

728 Commits