RPCS3/llvm

mirror of https://github.com/RPCS3/llvm.git synced 2025-04-06 23:31:48 +00:00

Author	SHA1	Message	Date
David Majnemer	951ea8be17	[LoopVectorize] Register cloned assumptions InstCombine cannot effectively remove redundant assumptions without them registered in the assumption cache. The vectorizer can create identical assumptions but doesn't register them with the cache, resulting in slower compile times because InstCombine tries to reason about a lot more assumptions. Fix this by registering the cloned assumptions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265800 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-08 16:37:10 +00:00
Silviu Baranga	d8cc816f81	Re-commit [SCEV] Introduce a guarded backedge taken count and use it in LAA and LV This re-commits r265535 which was reverted in r265541 because it broke the Windows bots. The problem was that we had a PointerIntPair which took a pointer to a struct allocated with new. The problem was that new doesn't provide sufficient alignment guarantees. This pattern was already present before r265535 and it just happened to work. To fix this, we now separate the PointerIntPair from the ExitNotTakenInfo struct into a pointer and a bool. Original commit message: Summary: When the backedge taken condition is computed from an icmp, SCEV can deduce the backedge taken count only if one of the sides of the icmp is an AddRecExpr. However, due to sign/zero extensions, we sometimes end up with something that is not an AddRecExpr. However, we can use SCEV predicates to produce a 'guarded' expression. This change adds a method to SCEV to get this expression, and the SCEV predicate associated with it. In HowManyGreaterThans and HowManyLessThans we will now add a SCEV predicate associated with the guarded backedge taken count when the analyzed SCEV expression is not an AddRecExpr. Note that we only do this as an alternative to returning a 'CouldNotCompute'. We use this new feature in Loop Access Analysis and LoopVectorize to analyze and transform more loops. Reviewers: anemet, mzolotukhin, hfinkel, sanjoy Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D17201 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265786 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-08 14:29:09 +00:00
Silviu Baranga	89e8236bfb	Revert r265535 until we know how we can fix the bots git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265541 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-06 14:06:32 +00:00
Silviu Baranga	39fbde60e1	[SCEV] Introduce a guarded backedge taken count and use it in LAA and LV Summary: When the backedge taken condition is computed from an icmp, SCEV can deduce the backedge taken count only if one of the sides of the icmp is an AddRecExpr. However, due to sign/zero extensions, we sometimes end up with something that is not an AddRecExpr. However, we can use SCEV predicates to produce a 'guarded' expression. This change adds a method to SCEV to get this expression, and the SCEV predicate associated with it. In HowManyGreaterThans and HowManyLessThans we will now add a SCEV predicate associated with the guarded backedge taken count when the analyzed SCEV expression is not an AddRecExpr. Note that we only do this as an alternative to returning a 'CouldNotCompute'. We use this new feature in Loop Access Analysis and LoopVectorize to analyze and transform more loops. Reviewers: anemet, mzolotukhin, hfinkel, sanjoy Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D17201 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265535 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-06 13:18:26 +00:00
Hal Finkel	dfdada0adb	[LoopVectorize] Don't vectorize loops when everything will be scalarized This change prevents the loop vectorizer from vectorizing when all of the vector types it generates will be scalarized. I've run into this problem on the PPC's QPX vector ISA, which only holds floating-point vector types. The loop vectorizer will, however, happily vectorize loops with purely integer computation. Here's an example: LV: The Smallest and Widest types: 32 / 32 bits. LV: The Widest register is: 256 bits. LV: Found an estimated cost of 0 for VF 1 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ] LV: Found an estimated cost of 0 for VF 1 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25 LV: Found an estimated cost of 0 for VF 1 For instruction: %2 = trunc i64 %indvars.iv25 to i32 LV: Found an estimated cost of 1 for VF 1 For instruction: store i32 %2, i32* %arrayidx, align 4 LV: Found an estimated cost of 1 for VF 1 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1 LV: Found an estimated cost of 1 for VF 1 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600 LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body LV: Scalar loop costs: 3. 
LV: Found an estimated cost of 0 for VF 2 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ] LV: Found an estimated cost of 0 for VF 2 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25 LV: Found an estimated cost of 0 for VF 2 For instruction: %2 = trunc i64 %indvars.iv25 to i32 LV: Found an estimated cost of 2 for VF 2 For instruction: store i32 %2, i32* %arrayidx, align 4 LV: Found an estimated cost of 1 for VF 2 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1 LV: Found an estimated cost of 1 for VF 2 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600 LV: Found an estimated cost of 0 for VF 2 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body LV: Vector loop of width 2 costs: 2. LV: Found an estimated cost of 0 for VF 4 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ] LV: Found an estimated cost of 0 for VF 4 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25 LV: Found an estimated cost of 0 for VF 4 For instruction: %2 = trunc i64 %indvars.iv25 to i32 LV: Found an estimated cost of 4 for VF 4 For instruction: store i32 %2, i32* %arrayidx, align 4 LV: Found an estimated cost of 1 for VF 4 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1 LV: Found an estimated cost of 1 for VF 4 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600 LV: Found an estimated cost of 0 for VF 4 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body LV: Vector loop of width 4 costs: 1. ... LV: Selecting VF: 8. 
LV: The target has 32 registers LV(REG): Calculating max register usage: LV(REG): At #0 Interval # 0 LV(REG): At #1 Interval # 1 LV(REG): At #2 Interval # 2 LV(REG): At #4 Interval # 1 LV(REG): At #5 Interval # 1 LV(REG): VF = 8 The problem is that the cost model here is not wrong, exactly. Since all of these operations are scalarized, their costs (aside from the uniform ones) are indeed VF*(scalar cost), just as the model suggests. In fact, the larger the VF picked, the lower the relative overhead from the loop itself (and the induction-variable update and check), and so in a sense, picking the largest VF here is the right thing to do. The problem is that vectorizing like this, where all of the vectors will be scalarized in the backend, isn't really vectorizing, but rather interleaving. By itself, this would be okay, but then the vectorizer itself also interleaves, and that's where the problem manifests itself. There aren't actually enough scalar registers to support the normal interleave factor multiplied by a factor of VF (8 in this example). In other words, the problem with this is that our register-pressure heuristic does not account for scalarization. While we might want to improve our register-pressure heuristic, I don't think this is the right motivating case for that work. Here we have a more-basic problem: The job of the vectorizer is to vectorize things (interleaving aside), and if the IR it generates won't generate any actual vector code, then something is wrong. Thus, if every type looks like it will be scalarized (i.e. will be split into VF or more parts), then don't consider that VF. This is not a problem specific to PPC/QPX, however. The problem comes up under SSE on x86 too, and as such, this change fixes PR26837 too. I've added Sanjay's reduced test case from PR26837 to this commit. 
Differential Revision: http://reviews.llvm.org/D18537 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264904 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-30 19:37:08 +00:00
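The rule stated in the commit above ("if every type looks like it will be scalarized, i.e. will be split into VF or more parts, then don't consider that VF") can be illustrated with a small standalone model. This is only a sketch: `ElemType`, `Target`, `numberOfParts`, and `isWorthConsidering` are hypothetical names invented for the illustration and are not the vectorizer's actual interfaces. The target is assumed to have one vector register width and, like the QPX case described, may hold only floating-point vectors.

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of one value type used in the loop body.
struct ElemType {
  unsigned bits;        // scalar width in bits
  bool isFloatingPoint; // true for float/double
};

// Hypothetical target model: a single vector register width, and a flag
// saying whether integer vectors are supported at all.
struct Target {
  unsigned vectorRegBits;
  bool vectorsHoldIntegers;
};

// Number of legal pieces a <vf x ty> vector would be split into.
// A type the target cannot keep in vector registers is modelled as
// vf scalar parts (fully scalarized).
unsigned numberOfParts(const Target &t, ElemType ty, unsigned vf) {
  assert(vf >= 1 && t.vectorRegBits > 0);
  if (!ty.isFloatingPoint && !t.vectorsHoldIntegers)
    return vf;
  unsigned totalBits = ty.bits * vf;
  return (totalBits + t.vectorRegBits - 1) / t.vectorRegBits; // round up
}

// A VF greater than 1 is worth considering only if at least one type is
// split into fewer than vf parts, i.e. something stays a real vector.
bool isWorthConsidering(const Target &t, const std::vector<ElemType> &types,
                        unsigned vf) {
  if (vf == 1)
    return true; // the scalar loop is always a candidate
  for (ElemType ty : types)
    if (numberOfParts(t, ty, vf) < vf)
      return true;
  return false;
}
```

Under this model, a loop containing only `i32` values on a 256-bit, floating-point-only target is rejected for VF 8, while a loop that also contains `double` values remains a candidate for VF 4.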
Nirav Dave	e0d3b8510d	Remove HasFnAttribute guards to getFnAttribute calls These checks are redundant and can be removed Reviewers: hans Subscribers: llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D18564 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264872 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-30 15:41:12 +00:00
Adam Nemet	367051414e	[LoopVectorize] Annotate versioned loop with noalias metadata Summary: Use the new LoopVersioning facility (D16712) to add noalias metadata in the vector loop if we versioned with memchecks. This can enable some optimization opportunities further down the pipeline (see the included test or the benchmark improvement quoted in D16712). The test also covers the bug I had in the initial version in D16712. The vectorizer did not previously use LoopVersioning. The reason is that the vectorizer performs its transformations in single shot. It creates an empty single-block vector loop that it then populates with the widened, if-converted instructions. Thus creating an intermediate versioned scalar loop seems wasteful. So this patch (rather than bringing in LoopVersioning fully) adds a special interface to LoopVersioning to allow the vectorizer to add no-alias annotation while still performing its own versioning. As the vectorizer propagates metadata from the instructions in the original loop to the vector instructions we also check the pointer in the original instruction and see if LoopVersioning can add no-alias metadata based on the issued memchecks. Reviewers: hfinkel, nadav, mzolotukhin Subscribers: mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D17191 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263744 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-17 20:32:37 +00:00
Adam Nemet	31bf7f9ec0	[LV] Preserve LoopInfo when store predication is used This was a latent bug that got exposed by the change to add LoopSimplify as a dependence to LoopLoadElimination. Since LoopInfo was corrupted after LV, LoopSimplify mis-compiled nbench in the test-suite (more details in the PR). The problem was that when we create the blocks for predicated stores we didn't add those to any loops. The original testcase for store predication provides coverage for this assuming we verify LI on the way out of LV. Fixes PR26952. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263565 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-15 18:06:20 +00:00
Hans Wennborg	1836552368	Revert r255691 "[LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions." It caused PR26509. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261368 91177308-0d34-0410-b5e6-96231b3b80d8	2016-02-19 21:40:12 +00:00
Matthew Simpson	3dd74513a8	[LV] Vectorize first-order recurrences This patch enables the vectorization of first-order recurrences. A first-order recurrence is a non-reduction recurrence relation in which the value of the recurrence in the current loop iteration equals a value defined in the previous iteration. The load PRE of the GVN pass often creates these recurrences by hoisting loads from within loops. In this patch, we add a new recurrence kind for first-order phi nodes and attempt to vectorize them if possible. Vectorization is performed by shuffling the values for the current and previous iterations. The vectorization cost estimate is updated to account for the added shuffle instruction. Contributed-by: Matthew Simpson and Chad Rosier <mcrosier@codeaurora.org> Differential Revision: http://reviews.llvm.org/D16197 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261346 91177308-0d34-0410-b5e6-96231b3b80d8	2016-02-19 17:56:08 +00:00
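The shuffle described in the commit above can be sketched in plain C++. This is an illustration only, not the vectorizer's generated code: `scalarRecurrence` and `blockedRecurrence` are hypothetical names, and `std::vector<int>` chunks of `vf` elements stand in for vector registers. The scalar loop carries the previous iteration's loaded value in `prev`; the blocked version builds a shifted vector from the last lane of the previous chunk and the first `vf - 1` lanes of the current one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar form: `prev` holds the value defined in the previous iteration,
// which is what makes this a first-order recurrence.
std::vector<int> scalarRecurrence(const std::vector<int> &a, int init) {
  std::vector<int> out(a.size());
  int prev = init;
  for (std::size_t i = 0; i < a.size(); ++i) {
    int cur = a[i];
    out[i] = cur + prev;
    prev = cur;
  }
  return out;
}

// Blocked form: process vf elements at a time, emulating vector lanes.
std::vector<int> blockedRecurrence(const std::vector<int> &a, int init,
                                   std::size_t vf) {
  assert(vf >= 1);
  std::vector<int> out(a.size());
  std::vector<int> prevVec(vf, init); // only the last lane is ever read
  std::size_t i = 0;
  for (; i + vf <= a.size(); i += vf) {
    std::vector<int> curVec(a.begin() + i, a.begin() + i + vf);
    // "Shuffle": lane 0 takes the last lane of the previous chunk,
    // lane k takes lane k-1 of the current chunk.
    std::vector<int> shifted(vf);
    shifted[0] = prevVec[vf - 1];
    for (std::size_t k = 1; k < vf; ++k)
      shifted[k] = curVec[k - 1];
    for (std::size_t k = 0; k < vf; ++k)
      out[i + k] = curVec[k] + shifted[k];
    prevVec = curVec;
  }
  // Scalar remainder loop, seeded from the last processed lane.
  int prev = prevVec[vf - 1];
  for (; i < a.size(); ++i) {
    out[i] = a[i] + prev;
    prev = a[i];
  }
  return out;
}
```

Both functions should produce the same output for any input length and any `vf >= 1`; the extra shuffle per chunk is the cost the commit says is added to the estimate.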
Silviu Baranga	865db3895b	[LV] Fix PR26600: avoid out of bounds loads for interleaved access vectorization Summary: If we don't have the first and last access of an interleaved load group, the first and last wide load in the loop can do an out of bounds access. Even though we discard results from speculative loads, this can cause problems, since it can technically generate page faults (or worse). We now discard interleaved load groups that don't have the first and last load in the group. Reviewers: hfinkel, rengolin Subscribers: rengolin, llvm-commits, mzolotukhin, anemet Differential Revision: http://reviews.llvm.org/D17332 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261331 91177308-0d34-0410-b5e6-96231b3b80d8	2016-02-19 15:46:10 +00:00
Elena Demikhovsky	2c7551bff2	Create masked gather and scatter intrinsics in Loop Vectorizer. Loop vectorizer now knows to vectorize GEP and create masked gather and scatter intrinsics for random memory access. The feature is enabled on AVX-512 target. Differential Revision: http://reviews.llvm.org/D15690 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261140 91177308-0d34-0410-b5e6-96231b3b80d8	2016-02-17 19:23:04 +00:00
Silviu Baranga	23340531a1	[LV] Add support for insertelt/extractelt processing during type truncation Summary: While shrinking types according to the required bits, we can encounter insert/extract element instructions. This will cause us to reach an llvm_unreachable statement. This change adds support for truncating insert/extract element operations, and adds a regression test. Reviewers: jmolloy Subscribers: mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D17078 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260893 91177308-0d34-0410-b5e6-96231b3b80d8	2016-02-15 15:38:17 +00:00
Silviu Baranga	e942cf87e8	[SCEV][LAA] Re-commit r260085 and r260086, this time with a fix for the memory sanitizer issue. The PredicatedScalarEvolution's copy constructor wasn't copying the Generation value, and was leaving it uninitialized. Original commit message: [SCEV][LAA] Add no wrap SCEV predicates and use them to improve strided pointer detection Summary: This change adds no wrap SCEV predicates with: - support for runtime checking - support for expression rewriting: sext({x,+,y}) -> {sext(x),+,sext(y)} zext({x,+,y}) -> {zext(x),+,sext(y)} Note that we are sign extending the increment of the SCEV, even for the zext case. This is needed to cover the fairly common case where y would be a (small) negative integer. In order to do this, this change adds two new flags: nusw and nssw that are applicable to AddRecExprs and permit the transformations above. We also change isStridedPtr in LAA to be able to make use of these predicates. With this feature we should now always be able to work around overflow issues in the dependence analysis. Reviewers: mzolotukhin, sanjoy, anemet Subscribers: mzolotukhin, sanjoy, llvm-commits, rengolin, jmolloy, hfinkel Differential Revision: http://reviews.llvm.org/D15412 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260112 91177308-0d34-0410-b5e6-96231b3b80d8	2016-02-08 17:02:45 +00:00
Silviu Baranga	bbaff75d11	Revert r260086 and r260085. They have broken the memory sanitizer bots. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260087 91177308-0d34-0410-b5e6-96231b3b80d8	2016-02-08 11:56:15 +00:00
Silviu Baranga	41fcf12691	[SCEV][LAA] Add no wrap SCEV predicates and use them to improve strided pointer detection Summary: This change adds no wrap SCEV predicates with: - support for runtime checking - support for expression rewriting: sext({x,+,y}) -> {sext(x),+,sext(y)} zext({x,+,y}) -> {zext(x),+,sext(y)} Note that we are sign extending the increment of the SCEV, even for the zext case. This is needed to cover the fairly common case where y would be a (small) negative integer. In order to do this, this change adds two new flags: nusw and nssw that are applicable to AddRecExprs and permit the transformations above. We also change isStridedPtr in LAA to be able to make use of these predicates. With this feature we should now always be able to work around overflow issues in the dependence analysis. Reviewers: mzolotukhin, sanjoy, anemet Subscribers: mzolotukhin, sanjoy, llvm-commits, rengolin, jmolloy, hfinkel Differential Revision: http://reviews.llvm.org/D15412 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260085 91177308-0d34-0410-b5e6-96231b3b80d8	2016-02-08 10:45:50 +00:00
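The remark that the increment is sign extended even in the zext case can be checked with a small arithmetic model. This is a worked example under assumptions, not SCEV code: an 8-bit recurrence {start,+,step} is widened to 32 bits, and the three helper functions below are hypothetical names chosen for the illustration.

```cpp
#include <cassert>
#include <cstdint>

// i-th value of the 8-bit recurrence {start,+,step}, then zero-extended
// to 32 bits. The 8-bit arithmetic wraps modulo 256.
uint32_t zextOfNarrow(uint8_t start, int8_t step, uint32_t i) {
  uint32_t wideStep = static_cast<uint32_t>(static_cast<int32_t>(step));
  uint8_t narrow = static_cast<uint8_t>(start + i * wideStep);
  return narrow;
}

// i-th value of {zext(start),+,sext(step)} evaluated in 32 bits.
uint32_t wideSextStep(uint8_t start, int8_t step, uint32_t i) {
  return static_cast<uint32_t>(start) +
         i * static_cast<uint32_t>(static_cast<int32_t>(step));
}

// i-th value of {zext(start),+,zext(step)} evaluated in 32 bits,
// shown only to contrast with the sign-extended step.
uint32_t wideZextStep(uint8_t start, int8_t step, uint32_t i) {
  return static_cast<uint32_t>(start) +
         i * static_cast<uint32_t>(static_cast<uint8_t>(step));
}
```

With start = 200 and step = -1, the sign-extended form agrees with the zero-extended narrow value for every iteration from 0 through 200, whereas zero-extending the step (255) already disagrees at iteration 1. At iteration 201 the narrow recurrence wraps and the two forms diverge, which is the situation a runtime no-wrap check has to rule out.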
Wei Mi	eafb39b656	[SCEV] Try to reuse existing value during SCEV expansion Current SCEV expansion will expand SCEV as a sequence of operations and doesn't reuse values that already exist. This will introduce redundant computation which may not be cleaned up thoroughly by following optimizations. This patch introduces an ExprValueMap which is a map from SCEV to the set of equal values with the same SCEV. When a SCEV is expanded, the set of values is checked and reused whenever possible before generating a sequence of operations. The original commit triggered regressions in Polly tests. The regressions exposed two problems which have been fixed in current version. 1. Polly will generate a new function based on the old one. To generate an instruction for the new function, it builds SCEV for the old instruction, applies some transformations on the SCEV generated, then expands the transformed SCEV and inserts the expanded value into the new function. Because SCEV expansion may reuse a value cached in ExprValueMap, the value in the old function may be inserted into the new function, which is wrong. In SCEVExpander::expand, there is logic to check that the cached value to be used dominates the insertion point. However, for the above case, the check always passes. That is because the insertion point is in a new function, which is unreachable from the old function, and DominatorTreeBase::dominates treats an unreachable node as dominated by any other node. The fix is to simply add a check that the cached value to be used in expansion should be in the same function as the insertion point instruction. 2. When the SCEV is of scConstant type, expanding it directly is cheaper than reusing a normal cached value. Although the cached value set in ExprValueMap contains a Constant type value, it is not easy to find -- the cached Value set is not sorted according to the potential cost. 
Existing reuse logic in SCEVExpander::expand simply chooses the first legal element from the cached value set. The fix is that when the SCEV is of scConstant type, don't try the reuse logic; simply expand it. Differential Revision: http://reviews.llvm.org/D12090 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259736 91177308-0d34-0410-b5e6-96231b3b80d8	2016-02-04 01:27:38 +00:00
Junmo Park	f38d8c901e	Minor code cleanups. NFC. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259725 91177308-0d34-0410-b5e6-96231b3b80d8	2016-02-03 23:16:39 +00:00
Wei Mi	dcbf7c311e	Revert r259662, which caused regressions on polly tests. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259675 91177308-0d34-0410-b5e6-96231b3b80d8	2016-02-03 18:05:57 +00:00
Wei Mi	e32bfe25a3	[SCEV] Try to reuse existing value during SCEV expansion Current SCEV expansion will expand SCEV as a sequence of operations and doesn't reuse values that already exist. This will introduce redundant computation which may not be cleaned up thoroughly by following optimizations. This patch introduces an ExprValueMap which is a map from SCEV to the set of equal values with the same SCEV. When a SCEV is expanded, the set of values is checked and reused whenever possible before generating a sequence of operations. Differential Revision: http://reviews.llvm.org/D12090 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259662 91177308-0d34-0410-b5e6-96231b3b80d8	2016-02-03 17:05:12 +00:00
Matthew Simpson	54a309e4ea	[LV] Rename RdxPHIsToFix to PHIsToFix (NFC) In the future, we will vectorize recurrences other than reductions. This patch renames a few variables and updates their associated comments to enable them to be reused for non-reduction PHI nodes. This change was requested in the review for D16197. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259364 91177308-0d34-0410-b5e6-96231b3b80d8	2016-02-01 16:07:01 +00:00
Matthew Simpson	cfa9b54c86	[LV] Avoid creating empty reduction entries (NFC) This patch prevents us from unintentionally creating entries in the reductions map for PHIs that are not actually reductions. This is currently not an issue since we bail out if we encounter PHIs other than inductions or reductions. However the behavior could become problematic as we add support for additional recurrence types. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@256930 91177308-0d34-0410-b5e6-96231b3b80d8	2016-01-06 12:50:29 +00:00
Sanjoy Das	4b892417a6	[SCEV] Add and use SCEVConstant::getAPInt; NFCI git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255921 91177308-0d34-0410-b5e6-96231b3b80d8	2015-12-17 20:28:46 +00:00
Cong Hou	e956465289	[LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions. (This is the third attempt to check in this patch, and the first two are r255454 and r255460. The once-failing test file reg-usage.ll is now moved to test/Transform/LoopVectorize/X86 directory with target datalayout and target triple indicated.) LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the register usage for specific VFs. However, it takes into account many instructions that won't be vectorized, such as induction variables, GetElementPtr instructions, etc. This makes the loop vectorizer too conservative when choosing VF. In this patch, the induction variables that won't be vectorized plus GetElementPtr instructions will be added to ValuesToIgnore set so that their register usage won't be considered any more. Differential revision: http://reviews.llvm.org/D15177 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255691 91177308-0d34-0410-b5e6-96231b3b80d8	2015-12-15 22:45:09 +00:00
Cong Hou	dbef3b079d	Revert r255460, which still causes test failures on some platforms. Further investigation on the failures is ongoing. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255463 91177308-0d34-0410-b5e6-96231b3b80d8	2015-12-13 17:15:38 +00:00
Cong Hou	f26946fa52	[LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions. (This is the second attempt to check in this patch: REQUIRES: asserts is added to reg-usage.ll now.) LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the register usage for specific VFs. However, it takes into account many instructions that won't be vectorized, such as induction variables, GetElementPtr instruction, etc.. This makes the loop vectorizer too conservative when choosing VF. In this patch, the induction variables that won't be vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set so that their register usage won't be considered any more. Differential revision: http://reviews.llvm.org/D15177 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255460 91177308-0d34-0410-b5e6-96231b3b80d8	2015-12-13 16:55:46 +00:00
Cong Hou	6f344e5da6	Revert r255454 as it leads to several test failures on buildbots. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255456 91177308-0d34-0410-b5e6-96231b3b80d8	2015-12-13 09:28:57 +00:00
Cong Hou	c731de4630	[LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions. LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the register usage for specific VFs. However, it takes into account many instructions that won't be vectorized, such as induction variables, GetElementPtr instruction, etc.. This makes the loop vectorizer too conservative when choosing VF. In this patch, the induction variables that won't be vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set so that their register usage won't be considered any more. Differential revision: http://reviews.llvm.org/D15177 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255454 91177308-0d34-0410-b5e6-96231b3b80d8	2015-12-13 08:44:08 +00:00
Silviu Baranga	90f6cd579a	Re-commit r255115, with the PredicatedScalarEvolution class moved to ScalarEvolution.h, in order to avoid cyclic dependencies between the Transform and Analysis modules: [LV][LAA] Add a layer over SCEV to apply run-time checked knowledge on SCEV expressions Summary: This change creates a layer over ScalarEvolution for LAA and LV, and centralizes the usage of SCEV predicates. The SCEVPredicatedLayer takes the statically deduced knowledge by ScalarEvolution and applies the knowledge from the SCEV predicates. The end goal is that both LAA and LV should use this interface everywhere. This also solves a problem involving the result of SCEV expression rewriting when the predicate changes. Suppose we have the expression (sext {a,+,b}) and two predicates P1: {a,+,b} has nsw P2: b = 1. Applying P1 and then P2 gives us {a,+,1}, while applying P2 and then P1 gives us sext({a,+,1}) (the AddRec expression was changed by P2 so P1 no longer applies). The SCEVPredicatedLayer maintains the order of transformations by feeding back the results of previous transformations into new transformations, and therefore avoiding this issue. The SCEVPredicatedLayer maintains a cache to remember the results of previous SCEV rewriting. This also has the benefit of reducing the overall number of expression rewrites. Reviewers: mzolotukhin, anemet Subscribers: jmolloy, sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D14296 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255122 91177308-0d34-0410-b5e6-96231b3b80d8	2015-12-09 16:06:28 +00:00
Silviu Baranga	bdd73bcbd7	Revert r255115 until we figure out how to fix the bot failures. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255117 91177308-0d34-0410-b5e6-96231b3b80d8	2015-12-09 15:25:28 +00:00
Silviu Baranga	69c30d5b6c	[LV][LAA] Add a layer over SCEV to apply run-time checked knowledge on SCEV expressions Summary: This change creates a layer over ScalarEvolution for LAA and LV, and centralizes the usage of SCEV predicates. The SCEVPredicatedLayer takes the statically deduced knowledge by ScalarEvolution and applies the knowledge from the SCEV predicates. The end goal is that both LAA and LV should use this interface everywhere. This also solves a problem involving the result of SCEV expression rewriting when the predicate changes. Suppose we have the expression (sext {a,+,b}) and two predicates P1: {a,+,b} has nsw P2: b = 1. Applying P1 and then P2 gives us {a,+,1}, while applying P2 and then P1 gives us sext({a,+,1}) (the AddRec expression was changed by P2 so P1 no longer applies). The SCEVPredicatedLayer maintains the order of transformations by feeding back the results of previous transformations into new transformations, and therefore avoiding this issue. The SCEVPredicatedLayer maintains a cache to remember the results of previous SCEV rewriting. This also has the benefit of reducing the overall number of expression rewrites. Reviewers: mzolotukhin, anemet Subscribers: jmolloy, sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D14296 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255115 91177308-0d34-0410-b5e6-96231b3b80d8	2015-12-09 15:03:52 +00:00
Cong Hou	c5cf58b8a7	Fix a typo in LoopVectorize.cpp. NFC. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254813 91177308-0d34-0410-b5e6-96231b3b80d8	2015-12-05 01:00:22 +00:00
Cong Hou	aaaedd7f8f	Fix a typo in LoopVectorize.cpp. NFC. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254549 91177308-0d34-0410-b5e6-96231b3b80d8	2015-12-02 21:33:47 +00:00
Charlie Turner	c8dc70b584	[LoopVectorize] Use MapVector rather than DenseMap for MinBWs. The order in which instructions are truncated in truncateToMinimalBitwidths affects code generation. Switch to a map with a deterministic order, since the iteration order over a DenseMap is not defined. This code is not hot, so the difference in container performance isn't interesting. Many thanks to David Blaikie for making me aware of MapVector! Fixes PR25490. Differential Revision: http://reviews.llvm.org/D14981 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254179 91177308-0d34-0410-b5e6-96231b3b80d8	2015-11-26 20:39:51 +00:00
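The idea behind switching to an insertion-ordered map can be sketched with standard-library containers. This is a stand-in for illustration, not LLVM's MapVector implementation: `InsertionOrderedMap` and `keysInOrder` are hypothetical names, and `std::unordered_map` plays the role of the hash map whose iteration order should not be relied on.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// A hash map for lookup plus a vector that records insertion order.
// Iteration walks the vector, so the order is the same on every run.
template <typename K, typename V> class InsertionOrderedMap {
  std::unordered_map<K, std::size_t> index_; // key -> position in entries_
  std::vector<std::pair<K, V>> entries_;     // entries in insertion order

public:
  typedef typename std::vector<std::pair<K, V>>::const_iterator const_iterator;

  // Returns the value for `key`, inserting a default value on first use.
  V &operator[](const K &key) {
    typename std::unordered_map<K, std::size_t>::iterator it = index_.find(key);
    if (it == index_.end()) {
      index_.emplace(key, entries_.size());
      entries_.emplace_back(key, V());
      return entries_.back().second;
    }
    return entries_[it->second].second;
  }

  const_iterator begin() const { return entries_.begin(); }
  const_iterator end() const { return entries_.end(); }
  std::size_t size() const { return entries_.size(); }
};

// Collects keys in iteration order; with the class above this is always
// the order in which keys were first inserted.
std::vector<std::string>
keysInOrder(const InsertionOrderedMap<std::string, unsigned> &m) {
  std::vector<std::string> keys;
  for (InsertionOrderedMap<std::string, unsigned>::const_iterator it = m.begin();
       it != m.end(); ++it)
    keys.push_back(it->first);
  return keys;
}
```

The trade-off is extra memory and an indirection per lookup, which matches the commit's remark that the code is not hot enough for container performance to matter.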
Chad Rosier	33e3b5e479	[LV] Add a helper function, isReductionVariable. NFC. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@253565 91177308-0d34-0410-b5e6-96231b3b80d8	2015-11-19 14:19:06 +00:00
Cong Hou	ce103d4605	Fix several long lines (>80) in LoopVectorize.cpp. NFC. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@253527 91177308-0d34-0410-b5e6-96231b3b80d8	2015-11-19 00:32:30 +00:00
Chad Rosier	08e6ab9d25	Typo. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@253336 91177308-0d34-0410-b5e6-96231b3b80d8	2015-11-17 13:58:10 +00:00
James Molloy	ae263d48b0	[LoopVectorize] Address post-commit feedback on r250032 Implemented as many of Michael's suggestions as were possible: * clang-format the added code while it is still fresh. * tried to change Value* to Instruction* in many places in computeMinimumValueSizes - unfortunately there are several places where Constants need to be handled so this wasn't possible. * Reduce the pass list on loop-vectorization-factors.ll. * Fix a bug where we were querying MinBWs for I->getOperand(0) but using MinBWs[I]. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252469 91177308-0d34-0410-b5e6-96231b3b80d8	2015-11-09 14:32:05 +00:00
Elena Demikhovsky	2c4a333422	LoopVectorizer - skip 'bitcast' between GEP and load. Skipping 'bitcast' in this case allows the load to be vectorized: %arrayidx = getelementptr inbounds double, double* %in, i64 %indvars.iv %tmp53 = bitcast double* %arrayidx to i64* %tmp54 = load i64, i64* %tmp53, align 8 Differential Revision: http://reviews.llvm.org/D14112 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251907 91177308-0d34-0410-b5e6-96231b3b80d8	2015-11-03 10:29:34 +00:00
Cong Hou	c895fd0d67	Add a flag vectorizer-maximize-bandwidth in loop vectorizer to enable using larger vectorization factor. To be able to maximize the bandwidth during vectorization, this patch provides a new flag vectorizer-maximize-bandwidth. When it is turned on, the vectorizer will determine the vectorization factor (VF) using the smallest instead of widest type in the loop. To avoid increasing register pressure too much, estimates of the register usage for different VFs are calculated so that we only choose a VF when its register usage doesn't exceed the number of available registers. This is the second attempt to submit this patch. The first attempt got a test failure on ARM. This patch is updated to try to fix the failure (more specifically, by handling the case when VF=1). Differential revision: http://reviews.llvm.org/D8943 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251850 91177308-0d34-0410-b5e6-96231b3b80d8	2015-11-02 22:53:48 +00:00
Silviu Baranga	a0b73c263e	[SCEV][LV] Add SCEV Predicates and use them to re-implement stride versioning Summary: SCEV Predicates represent conditions that typically cannot be derived from static analysis, but can be used to reduce SCEV expressions to forms which are usable for different optimizers. ScalarEvolution now has the rewriteUsingPredicate method which can simplify a SCEV expression using a SCEVPredicateSet. The normal workflow of a pass using SCEVPredicates would be to hold a SCEVPredicateSet and every time assumptions need to be made a new SCEV Predicate would be created and added to the set. Each time after calling getSCEV, the user will call the rewriteUsingPredicate method. We add two types of predicates SCEVPredicateSet - implements a set of predicates SCEVEqualPredicate - tests for equality between two SCEV expressions We use the SCEVEqualPredicate to re-implement stride versioning. Every time we version a stride, we will add a SCEVEqualPredicate to the context. Instead of adding specific stride checks, LoopVectorize now adds a more generic SCEV check. We only need to add support for this in the LoopVectorizer since this is the only pass that will do stride versioning. Reviewers: mzolotukhin, anemet, hfinkel, sanjoy Subscribers: sanjoy, hfinkel, rengolin, jmolloy, llvm-commits Differential Revision: http://reviews.llvm.org/D13595 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251800 91177308-0d34-0410-b5e6-96231b3b80d8	2015-11-02 14:41:02 +00:00
Cong Hou	e29e7e235a	Revert the revision 251592 as it fails a test on some platforms. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251617 91177308-0d34-0410-b5e6-96231b3b80d8	2015-10-29 05:35:22 +00:00
Cong Hou	72daf62570	Add a flag vectorizer-maximize-bandwidth in loop vectorizer to enable using larger vectorization factor. To be able to maximize the bandwidth during vectorization, this patch provides a new flag vectorizer-maximize-bandwidth. When it is turned on, the vectorizer will determine the vectorization factor (VF) using the smallest instead of widest type in the loop. To avoid increasing register pressure too much, estimates of the register usage for different VFs are calculated so that we only choose a VF when its register usage doesn't exceed the number of available registers. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251592 91177308-0d34-0410-b5e6-96231b3b80d8	2015-10-29 01:28:44 +00:00
NAKAMURA Takumi	c287d4cc99	Whitespace. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251437 91177308-0d34-0410-b5e6-96231b3b80d8	2015-10-27 19:02:52 +00:00
NAKAMURA Takumi	50aba1e345	Revert r251291, "Loop Vectorizer - skipping "bitcast" before GEP" It causes miscompilation of llvm/lib/ExecutionEngine/Interpreter/Execution.cpp. See also PR25324. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251436 91177308-0d34-0410-b5e6-96231b3b80d8	2015-10-27 19:02:36 +00:00
Elena Demikhovsky	df91a787c0	Loop Vectorizer - skipping "bitcast" before GEP Vectorization of a memory instruction (Load/Store) is possible when the pointer comes from a GEP. The GEP analysis makes it possible to estimate the profit. In some cases there is a "bitcast" between the GEP and the memory instruction. I added code that skips the "bitcast". http://reviews.llvm.org/D13886 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251291 91177308-0d34-0410-b5e6-96231b3b80d8	2015-10-26 13:42:41 +00:00
Michael Zolotukhin	df43fcd565	Refactor: Simplify boolean conditional return statements in lib/Transforms/Vectorize (NFC). Summary: Use clang-tidy to simplify boolean conditional return statements. Differential Revision: http://reviews.llvm.org/D10003 Patch by Richard <legalize@xmission.com> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251206 91177308-0d34-0410-b5e6-96231b3b80d8	2015-10-24 20:16:42 +00:00
Duncan P. N. Exon Smith	f792618d1c	Vectorize: Remove implicit ilist iterator conversions, NFC Besides the usual, I finally added an overload to `BasicBlock::splitBasicBlock()` that accepts an `Instruction*` instead of `BasicBlock::iterator`. Someone can go back and remove this overload later (after updating the callers I'm going to skip going forward), but the most common call seems to be `BB->splitBasicBlock(BB->getTerminator(), ...)` and I'm not sure it's better to add `->getIterator()` to every one than have the overload. It's pretty hard to get the usage wrong. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250745 91177308-0d34-0410-b5e6-96231b3b80d8	2015-10-19 22:06:09 +00:00
Elena Demikhovsky	da40167c02	Removed parameter "Consecutive" from isLegalMaskedLoad() / isLegalMaskedStore(). Originally I planned to use the same interface for masked gather/scatter and set isConsecutive to "false" in this case. Now I'm implementing masked gather/scatter and see that the interface is inconvenient. I want to add interfaces isLegalMaskedGather() / isLegalMaskedScatter() instead of using the "Consecutive" parameter in the existing interfaces. Differential Revision: http://reviews.llvm.org/D13850 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250686 91177308-0d34-0410-b5e6-96231b3b80d8	2015-10-19 07:43:38 +00:00
James Molloy	7dab7edf06	[LoopVectorize] Shrink integer operations into the smallest type possible C semantics force sub-int-sized values (e.g. i8, i16) to be promoted to int type (e.g. i32) whenever arithmetic is performed on them. For targets with native i8 or i16 operations, usually InstCombine can shrink the arithmetic type down again. However InstCombine refuses to create illegal types, so for targets without i8 or i16 registers, the lengthening and shrinking remains. Most SIMD ISAs (e.g. NEON) however support vectors of i8 or i16 even when their scalar equivalents do not, so during vectorization it is important to remove these lengthens and truncates when deciding the profitability of vectorization. The algorithm this uses starts at truncs and icmps, trawling their use-def chains until they terminate or instructions outside the loop are found (or unsafe instructions like inttoptr casts are found). If the use-def chains starting from different root instructions (truncs/icmps) meet, they are unioned. The demanded bits of each node in the graph are ORed together to form an overall mask of the demanded bits in the entire graph. The minimum bitwidth that graph can be truncated to is the bitwidth minus the number of leading zeroes in the overall mask. The intention is that this algorithm should "first do no harm", so it will never insert extra cast instructions. This is why the use-def graphs are unioned, so that subgraphs with different minimum bitwidths do not need casts inserted between them. This algorithm works hard to reduce compile time impact. DemandedBits are only queried if there are extends of illegal types and if a truncate to an illegal type is seen. In the general case, this results in a simple linear scan of the instructions in the loop. No non-noise compile time impact was seen on a clang bootstrap build. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250032 91177308-0d34-0410-b5e6-96231b3b80d8	2015-10-12 12:34:45 +00:00
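
The minimum-bitwidth rule described in the last commit above (OR the demanded bits of every node in a use-def graph, then subtract the number of leading zeroes of the combined mask from the bitwidth) can be sketched in a few lines. This is only an illustration of that arithmetic, not the code from the commit: the function name `min_bitwidth`, its parameters, and the representation of demanded bits as plain Python integers are assumptions made for the example. It leaves out the use-def graph walk, the unioning of graphs, and any rounding to a width the target supports.

```python
def min_bitwidth(demanded_masks, bitwidth=32):
    """Return the narrowest width that still holds every demanded bit.

    demanded_masks: iterable of ints, one demanded-bits mask per node
                    (hypothetical representation, for illustration only).
    bitwidth:       width of the original integer type, e.g. 32 for i32.
    """
    type_mask = (1 << bitwidth) - 1

    # OR all per-node masks together to get the overall demanded-bits mask.
    overall = 0
    for mask in demanded_masks:
        overall |= mask & type_mask

    # Leading zeroes of the overall mask within the original bitwidth.
    leading_zeros = bitwidth - overall.bit_length()

    # Minimum width = original width minus leading zeroes.
    return bitwidth - leading_zeros


if __name__ == "__main__":
    # Two nodes that only need the low 8 bits of an i32 value.
    print(min_bitwidth([0x0F, 0xFF]))
    # One node needs bit 8 as well, so 8 bits are no longer enough.
    print(min_bitwidth([0xFF, 0x100]))
```

Because the masks are ORed before the width is computed, a single node that demands a high bit widens the result for the whole graph, which matches the commit's stated goal of never needing casts between nodes of the same graph.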


559 Commits