archived-llvm

mirror of https://github.com/RPCS3/llvm.git synced 2026-01-31 01:25:19 +01:00

Author	SHA1	Message	Date
Max Kazantsev	eb3fd4c932	[LV] Do not create SCEVs on broken IR in emitTransformedIndex. PR39160 At the point when we perform `emitTransformedIndex`, we have a broken IR (in particular, we have Phis for which not every incoming value is properly set). On such IR, it is illegal to create SCEV expressions, because their internal simplification process may try to prove some predicates and break when it stumbles across some broken IR. The only purpose of using SCEV in this particular place is attempt to simplify the generated code slightly. It seems that the result isn't worth it, because some trivial cases (like addition of zero and multiplication by 1) can be handled separately if needed, but more generally InstCombine is able to achieve the goals we want to achieve by using SCEV. This patch fixes a functional crash described in PR39160, and as side-effect it also generates a bit smarter code in some simple cases. It also may cause some optimality loss (i.e. we will now generate `mul` by power of `2` instead of shift etc), but there is nothing what InstCombine could not handle later. In case of dire need, we can support more trivial cases just in place. Note that this patch only fixes one particular case of the general problem that LV misuses SCEV, attempting to create SCEVs or prove predicates on invalid IR. The general solution, however, seems complex enough. Differential Revision: https://reviews.llvm.org/D52881 Reviewed By: fhahn, hsaito git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343954 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-08 05:46:29 +00:00
Dorit Nuzman	2e1904606d	[IAI,LV] Avoid creating interleave-groups for predicated accesse This patch fixes PR39099. When strided loads are predicated, each of them will form an interleaved-group (with gaps). However, subsequent stages of vectorization (planning and transformation) assume that if a load is part of an Interleave-Group it is not predicated, resulting in wrong code - unmasked wide loads are created. The Interleaving Analysis does take care not to have conditional interleave groups of size > 1, but until we extend the planning and transformation stages to support masked-interleave-groups we should also avoid having them for size == 1. Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D52682 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343931 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-07 06:57:25 +00:00
Anna Thomas	edafc389f1	[LV][LAA] Vectorize loop invariant values stored into loop invariant address Summary: We are overly conservative in loop vectorizer with respect to stores to loop invariant addresses. More details in https://bugs.llvm.org/show_bug.cgi?id=38546 This is the first part of the fix where we start with vectorizing loop invariant values to loop invariant addresses. This also includes changes to ORE for stores to invariant address. Reviewers: anemet, Ayal, mkuper, mssimpso Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D50665 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343028 91177308-0d34-0410-b5e6-96231b3b80d8	2018-09-25 20:57:20 +00:00
Tim Northover	0d018d17b3	InstCombine: move hasOneUse check to the top of foldICmpAddConstant There were two combines not covered by the check before now, neither of which actually differed from normal in the benefit analysis. The most recent seems to be because it was just added at the top of the function (naturally). The older is from way back in 2008 (r46687) when we just didn't put those checks in so routinely, and has been diligently maintained since. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341831 91177308-0d34-0410-b5e6-96231b3b80d8	2018-09-10 14:26:44 +00:00
Anna Thomas	78b2eddfbc	[LV] Fix code gen for conditionally executed loads and stores Fix a latent bug in loop vectorizer which generates incorrect code for memory accesses that are executed conditionally. As pointed in review, this bug definitely affects uniform loads and may affect conditional stores that should have turned into scatters as well). The code gen for conditionally executed uniform loads on architectures that support masked gather instructions is broken. Without this patch, we were unconditionally executing the conditional load in the vectorized version. This patch does the following: 1. Uniform conditional loads on architectures with gather support will have correct code generated. In particular, the cost model (setCostBasedWideningDecision) is fixed. 2. For the recipes which are handled after the widening decision is set, we use the isScalarWithPredication(I, VF) form which is added in the patch. 3. Fix the vectorization cost model for scalarization (getMemInstScalarizationCost): implement and use isPredicatedInst to identify all predicated instructions, not just scalar+predicated. So, now the cost for scalarization will be increased for maskedloads/stores and gather/scatter operations. In short, we should be choosing the gather/scatter in place of scalarization on archs where it is profitable. 4. We needed to weaken the assert in useEmulatedMaskMemRefHack. Reviewers: Ayal, hsaito, mkuper Differential Revision: https://reviews.llvm.org/D51313 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341673 91177308-0d34-0410-b5e6-96231b3b80d8	2018-09-07 15:53:48 +00:00
Anna Thomas	a989ec75d4	[LV] First order recurrence phis should not be treated as uniform This is fix for PR38786. First order recurrence phis were incorrectly treated as uniform, which caused them to be vectorized as uniform instructions. Patch by Ayal Zaks and Orivej Desh! Reviewed by: Anna Differential Revision: https://reviews.llvm.org/D51639 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341416 91177308-0d34-0410-b5e6-96231b3b80d8	2018-09-04 22:12:23 +00:00
Nicola Zaghen	4cf831c0ee	[InstCombine] Fold icmp ugt/ult (add nuw X, C2), C --> icmp ugt/ult X, (C - C2) Support for sgt/slt was added in rL294898, this adds the same cases also for unsigned compares. This is the Alive proof: https://rise4fun.com/Alive/nyY Differential Revision: https://reviews.llvm.org/D50972 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341353 91177308-0d34-0410-b5e6-96231b3b80d8	2018-09-04 10:29:48 +00:00
Roman Tereshin	d52e35312a	Revert "[SCEV][NFC] Check NoWrap flags before lexicographical comparison of SCEVs" This reverts r319889. Unfortunately, wrapping flags are not a part of SCEV's identity (they do not participate in computing a hash value or in equality comparisons) and in fact they could be assigned after the fact w/o rebuilding a SCEV. Grep for const_cast's to see quite a few of examples, apparently all for AddRec's at the moment. So, if 2 expressions get built in 2 slightly different ways: one with flags set in the beginning, the other with the flags attached later on, we may end up with 2 expressions which are exactly the same but have their operands swapped in one of the commutative N-ary expressions, and at least one of them will have "sorted by complexity" invariant broken. 2 identical SCEV's won't compare equal by pointer comparison as they are supposed to. A real-world reproducer is added as a regression test: the issue described causes 2 identical SCEV expressions to have different order of operands and therefore compare not equal, which in its turn prevents LoadStoreVectorizer from vectorizing a pair of consecutive loads. On a larger example (the source of the test attached, which is a bugpoint) I have seen even weirder behavior: adding a constant to an existing SCEV changes the order of the existing terms, for instance, getAddExpr(1, ((A * B) + (C * D))) returns (1 + (C * D) + (A * B)). Differential Revision: https://reviews.llvm.org/D40645 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@340777 91177308-0d34-0410-b5e6-96231b3b80d8	2018-08-27 21:41:37 +00:00
Max Kazantsev	ed10960040	Revert "[InstCombine] Delay foldICmpUsingKnownBits until simple transforms are done" git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336410 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-06 04:04:13 +00:00
Gabor Buella	0aae914817	NFC - Various typo fixes in tests git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336268 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-04 13:28:39 +00:00
Max Kazantsev	6ac04e25a3	[InstCombine] Delay foldICmpUsingKnownBits until simple transforms are done This patch changes order of transform in InstCombineCompares to avoid performing transforms based on ranges which produce complex bit arithmetics before more simple things (like folding with constants) are done. See PR37636 for the motivating example. Differential Revision: https://reviews.llvm.org/D48584 Reviewed By: spatel, lebedev.ri git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336172 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-03 06:23:57 +00:00
Diego Caballero	3a32fa4e62	Move redundant-vf2-cost.ll test to X86 directory redundant-vf2-cost.ll is X86 specific. Moved from test/Transforms/LoopVectorize/redundant-vf2-cost.ll to test/Transforms/LoopVectorize/X86/redundant-vf2-cost.ll git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334854 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-15 18:46:03 +00:00
Sanjay Patel	bd9be7a1ed	[TargetLibraryInfo] add mappings from LLVM sin/cos intrinsics to SVML calls These weren't included in D19544 - probably just an oversight. D40044 made it more likely that we'll have LLVM math intrinsics rather than libcalls, so this bug was more easily exposed. As the tests/code show, we already have the complete mappings for pow/exp/log. I don't have any experience with SVML, so I don't know if anything else is missing. It's also not clear to me that we should be doing this transform in IR rather than DAG/isel, but that's a separate issue. Differential Revision: https://reviews.llvm.org/D47610 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334211 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-07 18:21:24 +00:00
Karl-Johan Karlsson	ad388af414	[ConstantFold] Disallow folding vector geps into bitcasts Summary: Getelementptr returns a vector of pointers, instead of a single address, when one or more of its arguments is a vector. In such case it is not possible to simplify the expression by inserting a bitcast of operand(0) into the destination type, as it will create a bitcast between different sizes. Reviewers: majnemer, mkuper, mssimpso, spatel Reviewed By: spatel Subscribers: lebedev.ri, llvm-commits Differential Revision: https://reviews.llvm.org/D46379 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333783 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-01 19:34:35 +00:00
Sanjay Patel	6da40ce2e2	[LoopVectorize, x86] add tests to show missing SVML transforms; NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333707 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-31 22:31:02 +00:00
Sanjay Patel	5a3ab98eb8	[LoopVectorize, x86] regenerate checks; NFC I removed the 'fast' flag from the calls because that's not required. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333695 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-31 21:30:36 +00:00
Alexander Ivchenko	1fcee954a4	[X86][CET] Changing -fcf-protection behavior to comply with gcc (LLVM part) This patch aims to match the changes introduced in gcc by https://gcc.gnu.org/ml/gcc-cvs/2018-04/msg00534.html. The IBT feature definition is removed, with the IBT instructions being freely available on all X86 targets. The shadow stack instructions are also being made freely available, and the use of all these CET instructions is controlled by the module flags derived from the -fcf-protection clang option. The hasSHSTK option remains since clang uses it to determine availability of shadow stack instruction intrinsics, but it is no longer directly used. Comes with a clang patch (D46881). Patch by mike.dvoretsky Differential Revision: https://reviews.llvm.org/D46882 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332705 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-18 11:58:25 +00:00
Karl-Johan Karlsson	ac455c2fd0	[LV] Add lit testcase for bitcast problem. NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331878 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-09 13:34:57 +00:00
Shiva Chen	a8a13bc662	[DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label. In order to set breakpoints on labels and list source code around labels, we need collect debug information for labels, i.e., label name, the function label belong, line number in the file, and the address label located. In order to keep these information in LLVM IR and to allow backend to generate debug information correctly. We create a new kind of metadata for labels, DILabel. The format of DILabel is !DILabel(scope: !1, name: "foo", file: !2, line: 3) We hope to keep debug information as much as possible even the code is optimized. So, we create a new kind of intrinsic for label metadata to avoid the metadata is eliminated with basic block. The intrinsic will keep existing if we keep it from optimized out. The format of the intrinsic is llvm.dbg.label(metadata !1) It has only one argument, that is the DILabel metadata. The intrinsic will follow the label immediately. Backend could get the label metadata through the intrinsic's parameter. We also create DIBuilder API for labels to be used by Frontend. Frontend could use createLabel() to allocate DILabel objects, and use insertLabel() to insert llvm.dbg.label intrinsic in LLVM IR. Differential Revision: https://reviews.llvm.org/D45024 Patch by Hsiangkai Wang. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331841 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-09 02:40:45 +00:00
Daniel Neilson	31d3d2138a	[LV] Move test/Transforms/LoopVectorize/pr23997.ll Summary: This fixes a build break with r331269. test/Transforms/LoopVectorize/pr23997.ll should be in: test/Transforms/LoopVectorize/X86/pr23997.ll git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331281 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-01 16:40:45 +00:00
Daniel Neilson	974321bd67	[LV] Preserve inbounds on created GEPs Summary: This is a fix for PR23997. The loop vectorizer is not preserving the inbounds property of GEPs that it creates. This is inhibiting some optimizations. This patch preserves the inbounds property in the case where a load/store is being fed by an inbounds GEP. Reviewers: mkuper, javed.absar, hsaito Reviewed By: hsaito Subscribers: dcaballe, hsaito, llvm-commits Differential Revision: https://reviews.llvm.org/D46191 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331269 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-01 15:35:08 +00:00
Evgeny Stupachenko	ca64890b85	Revert r325687 (workaround for PR36032). Summary: Revert r325687 workaround for PR36032 since a fix was committed in r326154. Reviewers: sbaranga Differential Revision: http://reviews.llvm.org/D44768 From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328257 91177308-0d34-0410-b5e6-96231b3b80d8	2018-03-22 22:04:39 +00:00
Andrei Elovikov	afd766bf9c	[LV] Let recordVectorLoopValueForInductionCast to check if IV was created from the cast. Summary: It turned out to be error-prone to expect the callers to handle that - better to leave the decision to this routine and make the required data to be explicitly passed to the function. This handles the case that was missed in the r322473 and fixes the assert mentioned in PR36524. Reviewers: dorit, mssimpso, Ayal, dcaballe Reviewed By: dcaballe Subscribers: Ka-Ka, hiraditya, dneilson, hsaito, llvm-commits Differential Revision: https://reviews.llvm.org/D43812 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327960 91177308-0d34-0410-b5e6-96231b3b80d8	2018-03-20 09:04:39 +00:00
Max Kazantsev	1eba8752d7	[SCEV] Smart range calculation for SCEVUnknown Phis The range of SCEVUnknown Phi which merges values `X1, X2, ..., XN` can be evaluated as `U(Range(X1), Range(X2), ..., Range(XN))`. Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D43810 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@326418 91177308-0d34-0410-b5e6-96231b3b80d8	2018-03-01 06:56:48 +00:00
Evgeny Stupachenko	f0454dcc5c	Fix r326154 buildbots test fail Summary: Add specific mtriples to tests added in r326154. From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@326158 91177308-0d34-0410-b5e6-96231b3b80d8	2018-02-27 01:33:11 +00:00
Alexey Bataev	bde3a91811	[LV] Fix test checks, NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@325699 91177308-0d34-0410-b5e6-96231b3b80d8	2018-02-21 16:48:23 +00:00
Sanjay Patel	1c629279f1	revert r325515: [TTI CostModel] change default cost of FP ops to 1 (PR36280) There are too many perf regressions resulting from this, so we need to investigate (and add tests for) targets like ARM and AArch64 before trying to reinstate. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@325658 91177308-0d34-0410-b5e6-96231b3b80d8	2018-02-21 01:42:52 +00:00
Alexey Bataev	1918edb644	[LV] Fix test checks, NFC. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@325617 91177308-0d34-0410-b5e6-96231b3b80d8	2018-02-20 19:49:25 +00:00
Sanjay Patel	1fceaab028	[TTI CostModel] change default cost of FP ops to 1 (PR36280) This change was mentioned at least as far back as: https://bugs.llvm.org/show_bug.cgi?id=26837#c26 ...and I found a real program that is harmed by this: Himeno running on AMD Jaguar gets 6% slower with SLP vectorization: https://bugs.llvm.org/show_bug.cgi?id=36280 ...but the change here appears to solve that bug only accidentally. The div/rem costs for x86 look very wrong in some cases, but that's already true, so we can fix those in follow-up patches. There's also evidence that more cost model changes are needed to solve SLP problems as shown in D42981, but that's an independent problem (though the solution may be adjusted after this change is made). Differential Revision: https://reviews.llvm.org/D43079 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@325515 91177308-0d34-0410-b5e6-96231b3b80d8	2018-02-19 16:11:44 +00:00
Sanjay Patel	a69986c1fa	[LoopVectorize] auto-generate complete checks; NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@324611 91177308-0d34-0410-b5e6-96231b3b80d8	2018-02-08 15:13:47 +00:00
Craig Topper	2f472871bc	[X86] Add support for passing 'prefer-vector-width' function attribute into X86Subtarget and exposing via X86's getRegisterWidth TTI interface. This will cause the vectorizers to do some limiting of the vector widths they create. This is not a strict limit. There are reasons I know of that the loop vectorizer will generate larger vectors for. I've written this in such a way that the interface will only return a properly supported width(0/128/256/512) even if the attribute says something funny like 384 or 10. This has been split from D41895 with the remainder in a follow up commit. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323015 91177308-0d34-0410-b5e6-96231b3b80d8	2018-01-20 00:26:08 +00:00
Craig Topper	36049c5e9c	[X86] Use vmovdqu64/vmovdqa64 for unmasked integer vector stores for consistency with loads. Previously we used 64 for vXi64 stores and 32 for everything else. This change uses 64 for everything just like do for loads. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322820 91177308-0d34-0410-b5e6-96231b3b80d8	2018-01-18 07:44:09 +00:00
Hal Finkel	c4ce27e207	Move Transforms/LoopVectorize/consecutive-ptr-cg-bug.ll into the X86 subdirectory This test depends on X86's TTI; move into the X86 subdirectory. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320914 91177308-0d34-0410-b5e6-96231b3b80d8	2017-12-16 05:10:20 +00:00
Dorit Nuzman	330c5d954f	[LV] Ignore the cost of values that will not appear in the vectorized loop VecValuesToIgnore holds values that will not appear in the vectorized loop. We should therefore ignore their cost when VF > 1. Differential Revision: https://reviews.llvm.org/D40883 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320463 91177308-0d34-0410-b5e6-96231b3b80d8	2017-12-12 08:57:43 +00:00
Gil Rapaport	c07df42b1a	[LV] Introduce VPBlendRecipe, VPWidenMemoryInstructionRecipe This patch is part of D38676. The patch introduces two new Recipes to handle instructions whose vectorization involves masking. These Recipes take VPlan-level masks in D38676, but still rely on ILV's existing createEdgeMask(), createBlockInMask() in this patch. VPBlendRecipe handles intra-loop phi nodes, which are vectorized as a sequence of SELECTs. Its execute() code is refactored out of ILV::widenPHIInstruction(), which now handles only loop-header phi nodes. VPWidenMemoryInstructionRecipe handles load/store which are to be widened (but are not part of an Interleave Group). In this patch it simply calls ILV::vectorizeMemoryInstruction on execute(). Differential Revision: https://reviews.llvm.org/D39068 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@318149 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-14 12:09:30 +00:00
Sanjay Patel	72428f5e04	[SimplifyCFG] use pass options and remove the latesimplifycfg pass This is no-functional-change-intended. This is repackaging the functionality of D30333 (defer switch-to-lookup-tables) and D35411 (defer folding unconditional branches) with pass parameters rather than a named "latesimplifycfg" pass. Now that we have individual options to control the functionality, we could decouple when these fire (but that's an independent patch if desired). The next planned step would be to add another option bit to disable the sinking transform mentioned in D38566. This should also make it clear that the new pass manager needs to be updated to limit simplifycfg in the same way as the old pass manager. Differential Revision: https://reviews.llvm.org/D38631 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316835 91177308-0d34-0410-b5e6-96231b3b80d8	2017-10-28 18:43:07 +00:00
Anna Thomas	fb0da33fc1	[LV] Avoid computing the register usage for default VF. NFC These are changes to reduce redundant computations when calculating a feasible vectorization factor: 1. early return when target has no vector registers 2. don't compute register usage for the default VF. Suggested during review for D37702. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@313176 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-13 19:35:45 +00:00
Anna Thomas	2691122302	[LV] Clamp the VF to the trip count Summary: When the MaxVectorSize > ConstantTripCount, we should just clamp the vectorization factor to be the ConstantTripCount. This vectorizes loops where the TinyTripCountThreshold >= TripCount < MaxVF. Earlier we were finding the maximum vector width, which could be greater than the trip count itself. The Loop vectorizer does all the work for generating a vectorizable loop, but in the end we would always choose the scalar loop (since the VF > trip count). This allows us to choose the VF keeping in mind the trip count if available. This is a fix on top of rL312472. Reviewers: Ayal, zvi, hfinkel, dneilson Reviewed by: Ayal Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37702 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@313046 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-12 16:32:45 +00:00
Zvi Rackover	3438d07f09	LoopVectorize: MaxVF should not be larger than the loop trip count Summary: Improve how MaxVF is computed while taking into account that MaxVF should not be larger than the loop's trip count. Other than saving on compile-time by pruning the possible MaxVF candidates, this patch fixes pr34438 which exposed the following flow: 1. Short trip count identified -> Don't bail out, set OptForSize:=True to avoid tail-loop and runtime checks. 2. Compute MaxVF returned 16 on a target supporting AVX512. 3. OptForSize -> choose VF:=MaxVF. 4. Bail out because TripCount = 8, VF = 16, TripCount % VF !=0 means we need a tail loop. With this patch step 2. will choose MaxVF=8 based on TripCount. Reviewers: Ayal, dorit, mkuper, hfinkel Reviewed By: hfinkel Subscribers: hfinkel, llvm-commits Differential Revision: https://reviews.llvm.org/D37425 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312472 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-04 08:35:13 +00:00
Elena Demikhovsky	4a05fa1f1b	Changed basic cost of store operation on X86 Store operation takes 2 UOps on X86 processors. The exact cost calculation affects several optimization passes including loop unroling. This change compensates performance degradation caused by https://reviews.llvm.org/D34458 and shows improvements on some benchmarks. Differential Revision: https://reviews.llvm.org/D35888 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311285 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-20 12:34:29 +00:00
Aditya Kumar	70fb4705b4	[Loop Vectorize] Added a separate metadata Added a separate metadata to indicate when the loop has already been vectorized instead of setting width and count to 1. Patch written by Divya Shanmughan and Aditya Kumar Differential Revision: https://reviews.llvm.org/D36220 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311281 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-20 10:32:41 +00:00
Sam Elliott	74a34d9193	Keep Optimization Remark Yaml in NewPM Summary: The New Pass Manager infrastructure was forgetting to keep around the optimization remark yaml file that the compiler might have been producing. This meant setting the option to '-' for stdout worked, but setting it to a filename didn't give file output (presumably it was deleted because compilation didn't explicitly keep it). This change just ensures that the file is kept if compilation succeeds. So far I have updated one of the optimization remark output tests to add a version with the new pass manager. It is my intention for this patch to also include changes to all tests that use `-opt-remark-output=` but I wanted to get the code patch ready for review while I was making all those changes. Fixes https://bugs.llvm.org/show_bug.cgi?id=33951 Reviewers: anemet, chandlerc Reviewed By: anemet, chandlerc Subscribers: javed.absar, chandlerc, fhahn, llvm-commits Differential Revision: https://reviews.llvm.org/D36906 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311271 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-20 01:30:45 +00:00
Balaram Makam	788841cb66	[SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure. Summary: When simplifying unconditional branches from empty blocks, we pre-test if the BB belongs to a set of loop headers and keep the block to prevent passes from destroying canonical loop structure. However, the current algorithm fails if the destination of the branch is a loop header. Especially when such a loop's latch block is folded into loop header it results in additional backedges and LoopSimplify turns it into a nested loop which prevent later optimizations from being applied (e.g., loop unrolling and loop interleaving). This patch augments the existing algorithm by further checking if the destination of the branch belongs to a set of loop headers and defer eliminating it if yes to LateSimplifyCFG. Fixes PR33605: https://bugs.llvm.org/show_bug.cgi?id=33605 Reviewers: efriedma, mcrosier, pacxx, hsung, davidxl Reviewed By: efriedma Subscribers: ashutosh.nema, gberry, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D35411 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308422 91177308-0d34-0410-b5e6-96231b3b80d8	2017-07-19 08:53:34 +00:00
Ayal Zaks	e1f7499ee7	[LV] Test once if vector trip count is zero, instead of twice Generate a single test to decide if there are enough iterations to jump to the vectorized loop, or else go to the scalar remainder loop. This test compares the Scalar Trip Count: if STC < VF * UF go to the scalar loop. If requiresScalarEpilogue() holds, at-least one iteration must remain scalar; the rest can be used to form vector iterations. So in this case the test checks instead if (STC - 1) < VF * UF by comparing STC <= VF * UF, and going to the scalar loop if so. Otherwise the vector loop is entered for at-least one vector iteration. This test covers the case where incrementing the backedge-taken count will overflow leading to an incorrect trip count of zero. In this (rare) case we will also avoid the vector loop and jump to the scalar loop. This patch simplifies the existing tests and effectively removes the basic-block originally named "min.iters.checked", leaving the single test in block "vector.ph". Original observation and initial patch by Evgeny Stupachenko. Differential Revision: https://reviews.llvm.org/D34150 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308421 91177308-0d34-0410-b5e6-96231b3b80d8	2017-07-19 05:16:39 +00:00
NAKAMURA Takumi	c49c5b222e	llvm/test/Transforms/LoopVectorize/X86/slm-no-vectorize.ll: -debug is available in +Asserts. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306979 91177308-0d34-0410-b5e6-96231b3b80d8	2017-07-02 14:25:27 +00:00
Mohammed Agabaria	ffb8f09571	[X86][CM] update add\sub costs of vectors of 64 in X86\SLM arch this patch updates the cost of addq\subq (add\subtract of vectors of 64bits) based on the performance numbers of SLM arch. Differential Revision: https://reviews.llvm.org/D33983 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306974 91177308-0d34-0410-b5e6-96231b3b80d8	2017-07-02 12:16:15 +00:00
Teresa Johnson	9d923b35aa	Revert "r306473 - re-commit r306336: Enable vectorizer-maximize-bandwidth by default." This still breaks PPC tests we have. I'll forward reproduction instructions to dehao. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306936 91177308-0d34-0410-b5e6-96231b3b80d8	2017-07-01 03:24:09 +00:00
Teresa Johnson	f7497bfb4a	re-commit r306336: Enable vectorizer-maximize-bandwidth by default. Differential Revision: https://reviews.llvm.org/D33341 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306935 91177308-0d34-0410-b5e6-96231b3b80d8	2017-07-01 03:24:08 +00:00
Teresa Johnson	423d09931a	revert r306336 for breaking ppc test. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306934 91177308-0d34-0410-b5e6-96231b3b80d8	2017-07-01 03:24:07 +00:00
Teresa Johnson	005cfad2e8	Enable vectorizer-maximize-bandwidth by default. Summary: vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu benchmarks on sandybridge machines. The result shows non-negative impact: spec/2006/fp/C++/444.namd 26.84 -0.31% spec/2006/fp/C++/447.dealII 46.19 +0.89% spec/2006/fp/C++/450.soplex 42.92 -0.44% spec/2006/fp/C++/453.povray 38.57 -2.25% spec/2006/fp/C/433.milc 24.54 -0.76% spec/2006/fp/C/470.lbm 41.08 +0.26% spec/2006/fp/C/482.sphinx3 47.58 -0.99% spec/2006/int/C++/471.omnetpp 22.06 +1.87% spec/2006/int/C++/473.astar 22.65 -0.12% spec/2006/int/C++/483.xalancbmk 33.69 +4.97% spec/2006/int/C/400.perlbench 33.43 +1.70% spec/2006/int/C/401.bzip2 23.02 -0.19% spec/2006/int/C/403.gcc 32.57 -0.43% spec/2006/int/C/429.mcf 40.35 +0.27% spec/2006/int/C/445.gobmk 26.96 +0.06% spec/2006/int/C/456.hmmer 24.4 +0.19% spec/2006/int/C/458.sjeng 27.91 -0.08% spec/2006/int/C/462.libquantum 57.47 -0.20% spec/2006/int/C/464.h264ref 46.52 +1.35% geometric mean +0.29% The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag. I started this patch to consult upstream opinions on this. It will be greatly appreciated if the community can help test the performance impact of this change on other architectures so that we can decided if this should be target-dependent. Reviewers: hfinkel, mkuper, davidxl, chandlerc Reviewed By: chandlerc Subscribers: rengolin, sanjoy, javed.absar, bjope, dorit, magabari, RKSimon, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D33341 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306933 91177308-0d34-0410-b5e6-96231b3b80d8	2017-07-01 03:24:06 +00:00

1 2 3 4 5 ...

264 Commits