archived-llvm

mirror of https://github.com/RPCS3/llvm.git synced 2026-01-31 01:25:19 +01:00

Author	SHA1	Message	Date
Michael Kruse	0bb9e7af42	Refactor setAlreadyUnrolled() and setAlreadyVectorized(). Loop::setAlreadyUnrolled() and LoopVectorizeHints::setLoopAlreadyUnrolled() both add loop metadata that stops the same loop from being transformed multiple times. This patch merges both implementations. In doing so we fix 3 potential issues: * setLoopAlreadyUnrolled() kept the llvm.loop.vectorize/interleave.* metadata even though it will not be used anymore. This already caused problems such as http://llvm.org/PR40546. Change the behavior to the one of setAlreadyUnrolled which deletes this loop metadata. * setAlreadyUnrolled() used to create a new LoopID by calling MDNode::get with nullptr as the first operand, then replacing it by the returned references using replaceOperandWith. It is possible that MDNode::get would instead return an existing node (due to de-duplication) that then gets modified. To avoid, use a fresh TempMDNode that does not get uniqued with anything else before replacing it with replaceOperandWith. * LoopVectorizeHints::matchesHintMetadataName() only compares the suffix of the attribute to set the new value for. That is, when called with "enable", would erase attributes such as "llvm.loop.unroll.enable", "llvm.loop.vectorize.enable" and "llvm.loop.distribute.enable" instead of the one to replace. Fortunately, function was only called with "isvectorized". Differential Revision: https://reviews.llvm.org/D57566 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@353738 91177308-0d34-0410-b5e6-96231b3b80d8	2019-02-11 19:45:44 +00:00
Florian Hahn	6bed14d8a4	[LV] Prevent interleaving if computeMaxVF returned None. As discussed in D57382, interleaving should be avoided if computeMaxVF returns None, same as we currently do for vectorization. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=6477 Reviewers: Ayal, dcaballe, hsaito, mkuper, rengolin Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D57837 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@353461 91177308-0d34-0410-b5e6-96231b3b80d8	2019-02-07 20:49:10 +00:00
Alina Sbirlea	db7033fef7	Check bool attribute value in getOptionalBoolLoopAttribute. Summary: Check the bool value of the attribute in getOptionalBoolLoopAttribute not just its existance. Eliminates the warning noise generated when vectorization is explicitly disabled. Reviewers: Meinersbur, hfinkel, dmgreen Subscribers: jlebar, sanjoy, llvm-commits Differential Revision: https://reviews.llvm.org/D57260 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@352555 91177308-0d34-0410-b5e6-96231b3b80d8	2019-01-29 22:33:20 +00:00
Johannes Doerfert	9e7dd29d66	[ValueTracking] Look through casts when determining non-nullness Bitcast and certain Ptr2Int/Int2Ptr instructions will not alter the value of their operand and can therefore be looked through when we determine non-nullness. Differential Revision: https://reviews.llvm.org/D54956 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@352293 91177308-0d34-0410-b5e6-96231b3b80d8	2019-01-26 23:40:35 +00:00
Simon Pilgrim	21d100aff8	[CostModel][X86] Add explicit vector select costs Prior to SSE41 (and sometimes on AVX1), vector select has to be performed as a ((X & C)\|(Y & ~C)) bit select. Exposes a couple of issues with the min/max reduction costs (which only go down to SSE42 for some reason). The increase pre-SSE41 selection costs also prevent a couple of tests from firing any longer, so I've either tweaked the target or added AVX tests as well to the existing SSE2 tests. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351685 91177308-0d34-0410-b5e6-96231b3b80d8	2019-01-20 13:55:01 +00:00
Sanjay Patel	d4e1d2f774	[LoopVectorizer] give more advice in remark about failure to vectorize call Something like this is requested by: https://bugs.llvm.org/show_bug.cgi?id=40265 ...and it seems like a common enough case that we should acknowledge it. Differential Revision: https://reviews.llvm.org/D56551 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351010 91177308-0d34-0410-b5e6-96231b3b80d8	2019-01-12 15:27:15 +00:00
Florian Hahn	2e8e13e712	[LAA] Avoid generating RT checks for known deps preventing vectorization. If we found unsafe dependences other than 'unknown', we already know at compile time that they are unsafe and the runtime checks should always fail. So we can avoid generating them in those cases. This should have no negative impact on performance as the runtime checks that would be created previously should always fail. As a sanity check, I measured the test-suite, spec2k and spec2k6 and there were no regressions. Reviewers: Ayal, anemet, hsaito Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D55798 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@349794 91177308-0d34-0410-b5e6-96231b3b80d8	2018-12-20 18:49:09 +00:00
Michael Kruse	42a382c204	Introduce llvm.loop.parallel_accesses and llvm.access.group metadata. The current llvm.mem.parallel_loop_access metadata has a problem in that it uses LoopIDs. LoopID unfortunately is not loop identifier. It is neither unique (there's even a regression test assigning the some LoopID to multiple loops; can otherwise happen if passes such as LoopVersioning make copies of entire loops) nor persistent (every time a property is removed/added from a LoopID's MDNode, it will also receive a new LoopID; this happens e.g. when calling Loop::setLoopAlreadyUnrolled()). Since most loop transformation passes change the loop attributes (even if it just to mark that a loop should not be processed again as llvm.loop.isvectorized does, for the versioned and unversioned loop), the parallel access information is lost for any subsequent pass. This patch unlinks LoopIDs and parallel accesses. llvm.mem.parallel_loop_access metadata on instruction is replaced by llvm.access.group metadata. llvm.access.group points to a distinct MDNode with no operands (avoiding the problem to ever need to add/remove operands), called "access group". Alternatively, it can point to a list of access groups. The LoopID then has an attribute llvm.loop.parallel_accesses with all the access groups that are parallel (no dependencies carries by this loop). This intentionally avoid any kind of "ID". Loops that are clones/have their attributes modifies retain the llvm.loop.parallel_accesses attribute. Access instructions that a cloned point to the same access group. It is not necessary for each access to have it's own "ID" MDNode, but those memory access instructions with the same behavior can be grouped together. The behavior of llvm.mem.parallel_loop_access is not changed by this patch, but should be considered deprecated. Differential Revision: https://reviews.llvm.org/D52116 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@349725 91177308-0d34-0410-b5e6-96231b3b80d8	2018-12-20 04:58:07 +00:00
Florian Hahn	efdc43373b	[LAA] Introduce enum for vectorization safety status (NFC). This patch adds a VectorizationSafetyStatus enum, which will be extended in a follow up patch to distinguish between 'safe with runtime checks' and 'known unsafe' dependences. Reviewers: anemet, anna, Ayal, hsaito Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D54892 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@349556 91177308-0d34-0410-b5e6-96231b3b80d8	2018-12-18 22:25:11 +00:00
Sanjay Patel	83544a50cc	[LoopVectorize] auto-generate complete checks; NFC The first test claims to show that the vectorizer will generate a vector load/loop, but then this file runs other passes which might scalarize that op. I'm removing instcombine from the RUN line here to break that dependency. Also, I'm generating full checks to make it clear exactly what the vectorizer has done. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@349554 91177308-0d34-0410-b5e6-96231b3b80d8	2018-12-18 22:23:04 +00:00
Michael Kruse	9a395de086	[Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes. When multiple loop transformation are defined in a loop's metadata, their order of execution is defined by the order of their respective passes in the pass pipeline. For instance, e.g. #pragma clang loop unroll_and_jam(enable) #pragma clang loop distribute(enable) is the same as #pragma clang loop distribute(enable) #pragma clang loop unroll_and_jam(enable) and will try to loop-distribute before Unroll-And-Jam because the LoopDistribute pass is scheduled after UnrollAndJam pass. UnrollAndJamPass only supports one inner loop, i.e. it will necessarily fail after loop distribution. It is not possible to specify another execution order. Also,t the order of passes in the pipeline is subject to change between versions of LLVM, optimization options and which pass manager is used. This patch adds 'followup' attributes to various loop transformation passes. These attributes define which attributes the resulting loop of a transformation should have. For instance, !0 = !{!0, !1, !2} !1 = !{!"llvm.loop.unroll_and_jam.enable"} !2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3} !3 = !{!"llvm.loop.distribute.enable"} defines a loop ID (!0) to be unrolled-and-jammed (!1) and then the attribute !3 to be added to the jammed inner loop, which contains the instruction to distribute the inner loop. Currently, in both pass managers, pass execution is in a fixed order and UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix this in the future by allowing pass managers to run passes until a fixpoint is reached, use Polly to perform these transformations, or add a loop transformation pass which takes the order issue into account. For mandatory/forced transformations (e.g. by having been declared by #pragma omp simd), the user must be notified when a transformation could not be performed. It is not possible that the responsible pass emits such a warning because the transformation might be 'hidden' in a followup attribute when it is executed, or it is not present in the pipeline at all. For this reason, this patche introduces a WarnMissedTransformations pass, to warn about orphaned transformations. Since this changes the user-visible diagnostic message when a transformation is applied, two test cases in the clang repository need to be updated. To ensure that no other transformation is executed before the intended one, the attribute `llvm.loop.disable_nonforced` can be added which should disable transformation heuristics before the intended transformation is applied. E.g. it would be surprising if a loop is distributed before a #pragma unroll_and_jam is applied. With more supported code transformations (loop fusion, interchange, stripmining, offloading, etc.), transformations can be used as building blocks for more complex transformations (e.g. stripmining+stripmining+interchange -> tiling). Reviewed By: hfinkel, dmgreen Differential Revision: https://reviews.llvm.org/D49281 Differential Revision: https://reviews.llvm.org/D55288 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348944 91177308-0d34-0410-b5e6-96231b3b80d8	2018-12-12 17:32:52 +00:00
Nikita Popov	38880e6df9	Reapply "[DemandedBits][BDCE] Support vectors of integers" DemandedBits and BDCE currently only support scalar integers. This patch extends them to also handle vector integer operations. In this case bits are not tracked for individual vector elements, instead a bit is demanded if it is demanded for any of the elements. This matches the behavior of computeKnownBits in ValueTracking and SimplifyDemandedBits in InstCombine. Unlike the previous iteration of this patch, getDemandedBits() can now again be called on arbirary (sized) instructions, even if they don't have integer or vector of integer type. (For vector types the size of the returned mask will now be the scalar size in bits though.) The added LoopVectorize test case shows a case which triggered an assertion failure with the previous attempt, because getDemandedBits() was called on a pointer-typed instruction. Differential Revision: https://reviews.llvm.org/D55297 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348602 91177308-0d34-0410-b5e6-96231b3b80d8	2018-12-07 15:38:13 +00:00
David L. Jones	ad6bed67ac	Revert r347934 "[SCEV] Guard movement of insertion point for loop-invariants" This change caused SEGVs in instcombine. (The r347934 change seems to me to be a precipitating cause, not a root cause. Details are on the llvm-commits thread for r347934.) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348426 91177308-0d34-0410-b5e6-96231b3b80d8	2018-12-05 23:13:50 +00:00
Craig Topper	26f6d0a901	[X86][LoopVectorize] Replace -mcpu=skylake-avx512 with -mattr=avx512f in some tests that failed when experimenting with defaulting to -mprefer-vector-width=256 for skylake-avx512. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348063 91177308-0d34-0410-b5e6-96231b3b80d8	2018-12-01 01:38:44 +00:00
Renato Golin	4e25c19165	Add a new reduction pattern match Adding a new reduction pattern match for vectorizing code similar to TSVC s3111: for (int i = 0; i < N; i++) if (a[i] > b) sum += a[i]; This patch adds support for fadd, fsub and fmull, as well as multiple branches and different (but compatible) instructions (ex. add+sub) in different branches. The difference from the previous patch(https://reviews.llvm.org/D49168) is as follows: - Added check of fast-math property of fp-instruction to the previous patch - Fix/add some pattern for if-reduction.ll Differential Revision: https://reviews.llvm.org/D54464 Patch by Takahiro Miyoshi <takahiro.miyoshi@linaro.org> and Masakazu Ueno <masakazu.ueno@linaro.org> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347989 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-30 13:40:10 +00:00
Warren Ristow	28ade4e6f8	[SCEV] Guard movement of insertion point for loop-invariants r320789 suppressed moving the insertion point of SCEV expressions with dev/rem operations to the loop header in non-loop-invariant situations. This, and similar, hoisting is also unsafe in the loop-invariant case, since there may be a guard against a zero denominator. This is an adjustment to the fix of r320789 to suppress the movement even in the loop-invariant case. This fixes PR30806. Differential Revision: https://reviews.llvm.org/D54713 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347934 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-30 00:02:54 +00:00
Martin Storsjo	b8e45d4727	Revert "[LICM] Enable control flow hoisting by default" and "[LICM] Reapply r347190 "Make LICM able to hoist phis" with fix" This reverts commits r347776 and r347778. The first one, r347776, caused significant compile time regressions for certain input files, see PR39836 for details. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347867 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-29 14:39:39 +00:00
John Brawn	3ed031b2c7	[LICM] Enable control flow hoisting by default Differential Revision: https://reviews.llvm.org/D54949 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347778 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-28 17:23:03 +00:00
Joel Jones	9951b07907	Revert unapproved commit git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347511 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-24 07:26:55 +00:00
Joel Jones	8ae5719795	[AArch64] Enable libm vectorized functions via SLEEF This changeset is modeled after Intel's submission for SVML. It enables trigonometry functions vectorization via SLEEF: http://sleef.org/. * A new vectorization library enum is added to TargetLibraryInfo.h: SLEEF. * A new option is added to TargetLibraryInfoImpl - ClVectorLibrary: SLEEF. * A comprehensive test case is included in this changeset. * In a separate changeset (for clang), a new vectorization library argument is added to -fveclib: -fveclib=SLEEF. Trigonometry functions that are vectorized by sleef: acos asin atan atanh cos cosh exp exp2 exp10 lgamma log10 log2 log sin sinh sqrt tan tanh tgamma Patch by Stefan Teleman Differential Revision: https://reviews.llvm.org/D53927 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347510 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-24 06:41:39 +00:00
Benjamin Kramer	e0270602c3	Revert "[LICM] Make LICM able to hoist phis" This reverts commit r347190. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347225 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-19 16:51:57 +00:00
Anna Thomas	0048f91a87	[LV] Avoid vectorizing unsafe dependencies in uniform address Summary: Currently, when vectorizing stores to uniform addresses, the only instance we prevent vectorization is if there are multiple stores to the same uniform address causing an unsafe dependency. This patch teaches LAA to avoid vectorizing loops that have an unsafe cross-iteration dependency between a load and a store to the same uniform address. Fixes PR39653. Reviewers: Ayal, efriedma Subscribers: rkruppe, llvm-commits Differential Revision: https://reviews.llvm.org/D54538 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347220 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-19 15:39:59 +00:00
John Brawn	ae7ddf0c35	[LICM] Make LICM able to hoist phis The general approach taken is to make note of loop invariant branches, then when we see something conditional on that branch, such as a phi, we create a copy of the branch and (empty versions of) its successors and hoist using that. This has no impact by itself that I've been able to see, as LICM typically doesn't see such phis as they will have been converted into selects by the time LICM is run, but once we start doing phi-to-select conversion later it will be important. Differential Revision: https://reviews.llvm.org/D52827 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347190 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-19 11:31:24 +00:00
Simon Pilgrim	bb61c72405	[CostModel] Add more realistic SK_InsertSubvector generic costs. Instead of defaulting to a cost = 1, expand to element extract/insert like we do for other shuffles. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346662 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-12 15:20:24 +00:00
Sanjay Patel	aeeabb506a	[VectorUtils] add funnel-shifts to the list of vectorizable intrinsics This just identifies the intrinsics as candidates for vectorization. It does not mean we will attempt to vectorize under normal conditions (the test file is forcing vectorization). The cost model must be fixed to show that the transform is profitable in general. Allowing vectorization with these intrinsics is required to avoid potential regressions from canonicalizing to the intrinsics from generic IR: https://bugs.llvm.org/show_bug.cgi?id=37417 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346661 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-12 15:20:14 +00:00
Sanjay Patel	af76058428	[LoopVectorize] add tests for funnel shifts; NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346658 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-12 14:52:01 +00:00
Jonas Paulsson	7e36a98252	[SystemZ] Rework getInterleavedMemoryOpCost() Model this function more closely after the BasicTTIImpl version, with separate handling of loads and stores. For loads, the set of actually loaded vectors is checked. This makes it more readable and just slightly more accurate generally. Review: Ulrich Weigand https://reviews.llvm.org/D53071 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345998 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-02 17:15:36 +00:00
Ayal Zaks	6d0de65682	[LV] Avoid vectorizing loops under opt for size that involve SCEV checks Fix PR39417, PR39497 The loop vectorizer may generate runtime SCEV checks for overflow and stride==1 cases, leading to execution of original scalar loop. The latter is forbidden when optimizing for size. An assert introduced in r344743 triggered the above PR's showing it does happen. This patch fixes this behavior by preventing vectorization in such cases. Differential Revision: https://reviews.llvm.org/D53612 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345959 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-02 09:16:12 +00:00
Dorit Nuzman	06bac6c858	[LV] Support vectorization of interleave-groups that require an epilog under optsize using masked wide loads Under Opt for Size, the vectorizer does not vectorize interleave-groups that have gaps at the end of the group (such as a loop that reads only the even elements: a[2*i]) because that implies that we'll require a scalar epilogue (which is not allowed under Opt for Size). This patch extends the support for masked-interleave-groups (introduced by D53011 for conditional accesses) to also cover the case of gaps in a group of loads; Targets that enable the masked-interleave-group feature don't have to invalidate interleave-groups of loads with gaps; they could now use masked wide-loads and shuffles (if that's what the cost model selects). Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53668 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345705 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-31 09:57:56 +00:00
Jonas Paulsson	ccd4f446eb	[LoopVectorizer] Fix for cost values of memory accesses. This commit is a combination of two patches: * "Fix in getScalarizationOverhead()" If target returns false in TTI.prefersVectorizedAddressing(), it means the address registers will not need to be extracted. Therefore, there should be no operands scalarization overhead for a load instruction. * "Don't pass the instruction pointer from getMemInstScalarizationCost." Since VF is always > 1, this is a cost query for an instruction in the vectorized loop and it should not be evaluated within the scalar context of the instruction. Review: Ulrich Weigand, Hal Finkel https://reviews.llvm.org/D52351 https://reviews.llvm.org/D52417 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345603 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-30 14:34:15 +00:00
Renato Golin	a8bdd2f238	Revert r344172: [LV] Add a new reduction pattern match This patch has caused fast-math issues in the reduction pattern. Will re-work and land again. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345465 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-27 22:13:43 +00:00
Simon Pilgrim	8a10b6b077	[CostModel][X86] Add realistic vXi64 uitofp vXf64 costs Match codegen improvements from D53649/rL345256 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345263 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-25 13:06:20 +00:00
Dorit Nuzman	63aae622d8	[LV] Don't have fold-tail under optsize invalidate interleave-groups when masked-interleaving is enabled Enable interleave-groups under fold-tail scenario for Opt for size compilation; D50480 added support for vectorizing loops of arbitrary trip-count without a remiander, which in turn makes everything in the loop conditional, including interleave-groups if any. It therefore invalidated all interleave-groups because we didn't have support for vectorizing predicated interleaved-groups at the time. In the meantime, D53011 introduced this support, so we don't have to invalidate interleave-groups when masked-interleaved support is enabled. Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: hsaito Differential Revision: https://reviews.llvm.org/D53559 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345115 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-24 07:11:38 +00:00
Sanjay Patel	f46dd75b53	[InstCombine] use 'match' to handle vectors and simplify code This is another step towards completely removing the fake binop queries for not/neg/fneg. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345036 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-23 15:05:12 +00:00
Dorit Nuzman	c7a8ddb849	[IAI,LV] Avoid creating a scalar epilogue due to gaps in interleave-groups when optimizing for size LV is careful to respect -Os and not to create a scalar epilog in all cases (runtime tests, trip-counts that require a remainder loop) except for peeling due to gaps in interleave-groups. This patch fixes that; -Os will now have us invalidate such interleave-groups and vectorize without an epilog. The patch also removes a related FIXME comment that is now obsolete, and was also inaccurate: "FIXME: return None if loop requiresScalarEpilog(<MaxVF>), or look for a smaller MaxVF that does not require a scalar epilog." (requiresScalarEpilog() has nothing to do with VF). Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53420 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344883 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-22 06:17:09 +00:00
Thomas Lively	97a6779252	[LoopVectorize] Loop vectorization for minimum and maximum Summary: Depends on D52766. Reviewers: aheejin, dschuff Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52767 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344816 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-19 21:11:43 +00:00
Ayal Zaks	9e75857c92	[LV] Fold tail by masking to vectorize loops of arbitrary trip count under opt for size When optimizing for size, a loop is vectorized only if the resulting vector loop completely replaces the original scalar loop. This holds if no runtime guards are needed, if the original trip-count TC does not overflow, and if TC is a known constant that is a multiple of the VF. The last two TC-related conditions can be overcome by 1. rounding the trip-count of the vector loop up from TC to a multiple of VF; 2. masking the vector body under a newly introduced "if (i <= TC-1)" condition. The patch allows loops with arbitrary trip counts to be vectorized under -Os, subject to the existing cost model considerations. It also applies to loops with small trip counts (under -O2) which are currently handled as if under -Os. The patch does not handle loops with reductions, live-outs, or w/o a primary induction variable, and disallows interleave groups. (Third, final and main part of -) Differential Revision: https://reviews.llvm.org/D50480 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344743 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-18 15:03:15 +00:00
Anna Thomas	c2874102cb	[LV] Teach vectorizer about variant value store into uniform address Summary: Teach vectorizer about vectorizing variant value stores to uniform address. Similar to rL343028, we do not allow vectorization if we have multiple stores to the same uniform address. Cost model already has the change for considering the extract instruction cost for a variant value store. See added test cases for how vectorization is done. The patch also contains changes to the ORE messages. Reviewers: Ayal, mkuper, anemet, hsaito Subscribers: rkruppe, llvm-commits Differential Revision: https://reviews.llvm.org/D52656 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344613 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-16 15:46:26 +00:00
Ayal Zaks	dacda52aca	[LV] Add test checks when vectorizing loops under opt for size; NFC Landing this as a separate part of https://reviews.llvm.org/D50480, recording current behavior more accurately, to clarify subsequent diff ([LV] Vectorizing loops of arbitrary trip count without remainder under opt for size). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344606 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-16 14:25:02 +00:00
Dorit Nuzman	7d7250490b	recommit 344472 after fixing build failure on ARM and PPC. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344475 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-14 08:50:06 +00:00
Dorit Nuzman	473da03560	revert 344472 due to failures. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344473 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-14 07:21:20 +00:00
Dorit Nuzman	a3ff03e8e2	[IAI,LV] Add support for vectorizing predicated strided accesses using masked interleave-group The vectorizer currently does not attempt to create interleave-groups that contain predicated loads/stores; predicated strided accesses can currently be vectorized only using masked gather/scatter or scalarization. This patch makes predicated loads/stores candidates for forming interleave-groups during the Loop-Vectorizer's analysis, and adds the proper support for masked-interleave- groups to the Loop-Vectorizer's planning and transformation stages. The patch also extends the TTI API to allow querying the cost of masked interleave groups (which each target can control); Targets that support masked vector loads/ stores may choose to enable this feature and allow vectorizing predicated strided loads/stores using masked wide loads/stores and shuffles. Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53011 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344472 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-14 07:06:16 +00:00
Sanjay Patel	66a2c5ecaa	[InstCombine] reverse 'trunc X to <N x i1>' canonicalization; 2nd try Re-trying r344082 because it unintentionally included extra diffs. Original commit message: icmp ne (and X, 1), 0 --> trunc X to N x i1 Ideally, we'd do the same for scalars, but there will likely be regressions unless we add more trunc folds as we're doing here for vectors. The motivating vector case is from PR37549: https://bugs.llvm.org/show_bug.cgi?id=37549 define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) { %c = fcmp ole <4 x float> %x, %y %s = sext <4 x i1> %c to <4 x i32> %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1> %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3> %cond = or <4 x i32> %s1, %s2 %condtr = trunc <4 x i32> %cond to <4 x i1> %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w ret <4 x float> %r } Here's a sampling of the vector codegen for that case using mask+icmp (current behavior) vs. trunc (with this patch): AVX before: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vandps LCPI0_0(%rip), %xmm0, %xmm0 vxorps %xmm1, %xmm1, %xmm1 vpcmpeqd %xmm1, %xmm0, %xmm0 vblendvps %xmm0, %xmm3, %xmm2, %xmm0 AVX after: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vblendvps %xmm0, %xmm2, %xmm3, %xmm0 AVX512f before: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vpbroadcastd LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1] vptestnmd %zmm1, %zmm0, %k1 vblendmps %zmm3, %zmm2, %zmm0 {%k1} AVX512f after: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vpslld $31, %xmm0, %xmm0 vptestmd %zmm0, %zmm0, %k1 vblendmps %zmm2, %zmm3, %zmm0 {%k1} AArch64 before: fcmge v0.4s, v1.4s, v0.4s zip1 v1.4s, v0.4s, v0.4s zip2 v0.4s, v0.4s, v0.4s orr v0.16b, v1.16b, v0.16b movi v1.4s, #1 and v0.16b, v0.16b, v1.16b cmeq v0.4s, v0.4s, #0 bsl v0.16b, v3.16b, v2.16b AArch64 after: fcmge v0.4s, v1.4s, v0.4s zip1 v1.4s, v0.4s, v0.4s zip2 v0.4s, v0.4s, v0.4s orr v0.16b, v1.16b, v0.16b bsl v0.16b, v2.16b, v3.16b PowerPC-le before: xvcmpgesp 34, 35, 34 vspltisw 0, 1 vmrglw 3, 2, 2 vmrghw 2, 2, 2 xxlor 0, 35, 34 xxlxor 35, 35, 35 xxland 34, 0, 32 vcmpequw 2, 2, 3 xxsel 34, 36, 37, 34 PowerPC-le after: xvcmpgesp 34, 35, 34 vmrglw 3, 2, 2 vmrghw 2, 2, 2 xxlor 0, 35, 34 xxsel 34, 37, 36, 0 Differential Revision: https://reviews.llvm.org/D52747 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344181 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-10 20:47:46 +00:00
Sanjay Patel	9f5daa2df0	revert r344082: [InstCombine] reverse 'trunc X to <N x i1>' canonicalization This commit accidentally included the diffs from D53057. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344178 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-10 20:39:39 +00:00
Renato Golin	ca0c32a3b8	[LV] Add a new reduction pattern match Adding a new reduction pattern match for vectorizing code similar to TSVC s3111: for (int i = 0; i < N; i++) if (a[i] > b) sum += a[i]; This patch adds support for fadd, fsub and fmull, as well as multiple branches and different (but compatible) instructions (ex. add+sub) in different branches. I have forwarded to trunk, added fsub and fmul functionality and additional tests, but the credit goes to Takahiro, who did most of the actual work. Differential Revision: https://reviews.llvm.org/D49168 Patch by Takahiro Miyoshi <takahiro.miyoshi@linaro.org>. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344172 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-10 18:49:49 +00:00
Justin Bogner	a219e952b7	[LV] Move test for r343954 into x86 subdirectory This test uses an x86 triple, so it needs to be in the x86 specific test directory. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344087 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-09 22:40:04 +00:00
Sanjay Patel	1dd3c06445	[InstCombine] reverse 'trunc X to <N x i1>' canonicalization icmp ne (and X, 1), 0 --> trunc X to N x i1 Ideally, we'd do the same for scalars, but there will likely be regressions unless we add more trunc folds as we're doing here for vectors. The motivating vector case is from PR37549: https://bugs.llvm.org/show_bug.cgi?id=37549 define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) { %c = fcmp ole <4 x float> %x, %y %s = sext <4 x i1> %c to <4 x i32> %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1> %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3> %cond = or <4 x i32> %s1, %s2 %condtr = trunc <4 x i32> %cond to <4 x i1> %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w ret <4 x float> %r } Here's a sampling of the vector codegen for that case using mask+icmp (current behavior) vs. trunc (with this patch): AVX before: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vandps LCPI0_0(%rip), %xmm0, %xmm0 vxorps %xmm1, %xmm1, %xmm1 vpcmpeqd %xmm1, %xmm0, %xmm0 vblendvps %xmm0, %xmm3, %xmm2, %xmm0 AVX after: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vblendvps %xmm0, %xmm2, %xmm3, %xmm0 AVX512f before: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vpbroadcastd LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1] vptestnmd %zmm1, %zmm0, %k1 vblendmps %zmm3, %zmm2, %zmm0 {%k1} AVX512f after: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vpslld $31, %xmm0, %xmm0 vptestmd %zmm0, %zmm0, %k1 vblendmps %zmm2, %zmm3, %zmm0 {%k1} AArch64 before: fcmge v0.4s, v1.4s, v0.4s zip1 v1.4s, v0.4s, v0.4s zip2 v0.4s, v0.4s, v0.4s orr v0.16b, v1.16b, v0.16b movi v1.4s, #1 and v0.16b, v0.16b, v1.16b cmeq v0.4s, v0.4s, #0 bsl v0.16b, v3.16b, v2.16b AArch64 after: fcmge v0.4s, v1.4s, v0.4s zip1 v1.4s, v0.4s, v0.4s zip2 v0.4s, v0.4s, v0.4s orr v0.16b, v1.16b, v0.16b bsl v0.16b, v2.16b, v3.16b PowerPC-le before: xvcmpgesp 34, 35, 34 vspltisw 0, 1 vmrglw 3, 2, 2 vmrghw 2, 2, 2 xxlor 0, 35, 34 xxlxor 35, 35, 35 xxland 34, 0, 32 vcmpequw 2, 2, 3 xxsel 34, 36, 37, 34 PowerPC-le after: xvcmpgesp 34, 35, 34 vmrglw 3, 2, 2 vmrghw 2, 2, 2 xxlor 0, 35, 34 xxsel 34, 37, 36, 0 Differential Revision: https://reviews.llvm.org/D52747 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344082 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-09 21:26:01 +00:00
Max Kazantsev	eb3fd4c932	[LV] Do not create SCEVs on broken IR in emitTransformedIndex. PR39160 At the point when we perform `emitTransformedIndex`, we have a broken IR (in particular, we have Phis for which not every incoming value is properly set). On such IR, it is illegal to create SCEV expressions, because their internal simplification process may try to prove some predicates and break when it stumbles across some broken IR. The only purpose of using SCEV in this particular place is attempt to simplify the generated code slightly. It seems that the result isn't worth it, because some trivial cases (like addition of zero and multiplication by 1) can be handled separately if needed, but more generally InstCombine is able to achieve the goals we want to achieve by using SCEV. This patch fixes a functional crash described in PR39160, and as side-effect it also generates a bit smarter code in some simple cases. It also may cause some optimality loss (i.e. we will now generate `mul` by power of `2` instead of shift etc), but there is nothing what InstCombine could not handle later. In case of dire need, we can support more trivial cases just in place. Note that this patch only fixes one particular case of the general problem that LV misuses SCEV, attempting to create SCEVs or prove predicates on invalid IR. The general solution, however, seems complex enough. Differential Revision: https://reviews.llvm.org/D52881 Reviewed By: fhahn, hsaito git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343954 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-08 05:46:29 +00:00
Dorit Nuzman	2e1904606d	[IAI,LV] Avoid creating interleave-groups for predicated accesse This patch fixes PR39099. When strided loads are predicated, each of them will form an interleaved-group (with gaps). However, subsequent stages of vectorization (planning and transformation) assume that if a load is part of an Interleave-Group it is not predicated, resulting in wrong code - unmasked wide loads are created. The Interleaving Analysis does take care not to have conditional interleave groups of size > 1, but until we extend the planning and transformation stages to support masked-interleave-groups we should also avoid having them for size == 1. Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D52682 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343931 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-07 06:57:25 +00:00
Anna Thomas	edafc389f1	[LV][LAA] Vectorize loop invariant values stored into loop invariant address Summary: We are overly conservative in loop vectorizer with respect to stores to loop invariant addresses. More details in https://bugs.llvm.org/show_bug.cgi?id=38546 This is the first part of the fix where we start with vectorizing loop invariant values to loop invariant addresses. This also includes changes to ORE for stores to invariant address. Reviewers: anemet, Ayal, mkuper, mssimpso Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D50665 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343028 91177308-0d34-0410-b5e6-96231b3b80d8	2018-09-25 20:57:20 +00:00

1 2 3 4 5 ...

779 Commits