archived-llvm

mirror of https://github.com/RPCS3/llvm.git synced 2026-01-31 01:25:19 +01:00

Author	SHA1	Message	Date
Orlando Cazalet-Hyams	b1ba8a93bc	[DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion Summary: Bug: https://bugs.llvm.org/show_bug.cgi?id=39024 The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here: A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins. B) Instructions in the middle block have different line numbers which give the impression of another iteration. In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks. I have set up a separate review D61933 for a fix which is required for this patch. Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel, jmorse Reviewed By: hfinkel, jmorse Subscribers: jmorse, javed.absar, eraman, kcc, bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits Tags: #llvm, #debug-info Differential Revision: https://reviews.llvm.org/D60831 llvm-svn: 363046 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363786 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-19 10:50:47 +00:00
Warren Ristow	31868b92df	[LV] Suppress vectorization in some nontemporal cases When considering a loop containing nontemporal stores or loads for vectorization, suppress the vectorization if the corresponding vectorized store or load with the aligment of the original scaler memory op is not supported with the nontemporal hint on the target. This adds two new functions: bool isLegalNTStore(Type DataType, unsigned Alignment) const; bool isLegalNTLoad(Type DataType, unsigned Alignment) const; to TTI, leaving the target independent default implementation as returning true, but with overriding implementations for X86 that check the legality based on available Subtarget features. This fixes https://llvm.org/PR40759 Differential Revision: https://reviews.llvm.org/D61764 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363581 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-17 17:20:08 +00:00
Bjorn Pettersson	784000b4c2	[LV] Deny irregular types in interleavedAccessCanBeWidened Summary: Avoid that loop vectorizer creates loads/stores of vectors with "irregular" types when interleaving. An example of an irregular type is x86_fp80 that is 80 bits, but that may have an allocation size that is 96 bits. So an array of x86_fp80 is not bitcast compatible with a vector of the same type. Not sure if interleavedAccessCanBeWidened is the best place for this check, but it solves the problem seen in the added test case. And it is the same kind of check that already exists in memoryInstructionCanBeWidened. Reviewers: fhahn, Ayal, craig.topper Reviewed By: fhahn Subscribers: hiraditya, rkruppe, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63386 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363547 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-17 12:02:24 +00:00
Fangrui Song	1002960b9d	[lit] Delete empty lines at the end of lit.local.cfg NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363538 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-17 09:51:07 +00:00
Sam Parker	d021f415d1	[SCEV] Pass NoWrapFlags when expanding an AddExpr InsertBinop now accepts NoWrapFlags, so pass them through when expanding a simple add expression. This is the first re-commit of the functional changes from rL362687, which was previously reverted. Differential Revision: https://reviews.llvm.org/D61934 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363364 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-14 09:19:41 +00:00
Orlando Cazalet-Hyams	f36c1c031e	Revert "[DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion" This reverts commit 1a0f7a2077b70c9864faa476e15b048686cf1ca7. See phabricator thread for D60831. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363132 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-12 08:34:51 +00:00
Orlando Cazalet-Hyams	6c60c71a4e	[DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion Summary: Bug: https://bugs.llvm.org/show_bug.cgi?id=39024 The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here: A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins. B) Instructions in the middle block have different line numbers which give the impression of another iteration. In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks. I have set up a separate review D61933 for a fix which is required for this patch. Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel, jmorse Reviewed By: hfinkel, jmorse Subscribers: jmorse, javed.absar, eraman, kcc, bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits Tags: #llvm, #debug-info Differential Revision: https://reviews.llvm.org/D60831 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363046 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-11 10:37:20 +00:00
Benjamin Kramer	600f7b5a8d	Revert "[SCEV] Use wrap flags in InsertBinop" This reverts commit r362687. Miscompiles llvm-profdata during selfhost. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362699 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-06 12:35:46 +00:00
Sam Parker	6892c31cf9	[SCEV] Use wrap flags in InsertBinop If the given SCEVExpr has no (un)signed flags attached to it, transfer these to the resulting instruction or use them to find an existing instruction. Differential Revision: https://reviews.llvm.org/D61934 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362687 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-06 08:56:26 +00:00
Simon Pilgrim	3748537115	[CostModel][X86] Improve masked load/store AVX1/AVX2 costs A mixture of internal tests and review of the scheduler models indicates we're overestimating the cost of a masked load, which we're estimating at 4x regular memory ops - more realistic values indicates that its closer to 2x. Masked stores costs are a lot more diverse but 8x is roughly in the middle of the range. e.g. SandyBridge defm : X86WriteRes<WriteFMaskedLoad, [SBPort23,SBPort05], 8, [1,2], 3>; defm : X86WriteRes<WriteFMaskedLoadY, [SBPort23,SBPort05], 9, [1,2], 3>; defm : X86WriteRes<WriteFMaskedStore, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>; defm : X86WriteRes<WriteFMaskedStoreY, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>; e.g. Btver2 defm : X86WriteRes<WriteFMaskedLoad, [JLAGU, JFPU01, JFPX], 6, [1, 2, 2], 1>; defm : X86WriteRes<WriteFMaskedLoadY, [JLAGU, JFPU01, JFPX], 6, [2, 4, 4], 2>; defm : X86WriteRes<WriteFMaskedStore, [JSAGU, JFPU01, JFPX], 6, [1, 1, 4], 1>; defm : X86WriteRes<WriteFMaskedStoreY, [JSAGU, JFPU01, JFPX], 6, [2, 2, 4], 2>; Differential Revision: https://reviews.llvm.org/D61257 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362338 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-02 20:37:02 +00:00
Craig Topper	8c0b3da9d4	[LoopVectorize] Add FNeg instruction support Differential Revision: https://reviews.llvm.org/D62510 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362124 91177308-0d34-0410-b5e6-96231b3b80d8	2019-05-30 18:19:35 +00:00
Craig Topper	13480b8846	[LoopVectorize] Precommit tests for D62510. NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362060 91177308-0d34-0410-b5e6-96231b3b80d8	2019-05-30 06:48:13 +00:00
Sanjay Patel	1a7c631eb2	[LoopVectorizer] fix test file to not run the entire -O3 pipeline This test file has a long history of edits from changes outside of vectorization, and it would happen again with the proposal in D61726. End-to-end testing shouldn't be happening in a test file that is specifically checking for vector masked load/store ops. Larger-scale testing goes in PhaseOrdering or the test-suite. I've hopefully preserved the intent by taking what was completely unoptimized IR in some tests and passing that through the -O1 pipeline. That becomes the input IR, and now we just run the loop vectorizer and verify that the vector masked ops are produced as expected. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@360340 91177308-0d34-0410-b5e6-96231b3b80d8	2019-05-09 13:43:22 +00:00
Kostya Serebryany	53dcce4331	revert r360162 as it breaks most of the buildbots git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@360190 91177308-0d34-0410-b5e6-96231b3b80d8	2019-05-07 20:57:11 +00:00
Orlando Cazalet-Hyams	8c101025dc	[DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion Summary: Bug: https://bugs.llvm.org/show_bug.cgi?id=39024 The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here: A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins. B) Instructions in the middle block have different line numbers which give the impression of another iteration. In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks. Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel Reviewed By: hfinkel Subscribers: bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits Tags: #llvm, #debug-info Differential Revision: https://reviews.llvm.org/D60831 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@360162 91177308-0d34-0410-b5e6-96231b3b80d8	2019-05-07 15:37:38 +00:00
Keno Fischer	35a5cc8b49	[SCEV] Add explicit representations of umin/smin Summary: Currently we express umin as `~umax(~x, ~y)`. However, this becomes a problem for operands in non-integral pointer spaces, because `~x` is not something we can compute for `x` non-integral. However, since comparisons are generally still allowed, we are actually able to express `umin(x, y)` directly as long as we don't try to express is as a umax. Support this by adding an explicit umin/smin representation to SCEV. We do this by factoring the existing getUMax/getSMax functions into a new function that does all four. The previous two functions were largely identical. Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D50167 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@360159 91177308-0d34-0410-b5e6-96231b3b80d8	2019-05-07 15:28:47 +00:00
Alina Sbirlea	1e33097313	[PassManagerBuilder] Add option for interleaved loops, for loop vectorize. Summary: Match NewPassManager behavior: add option for interleaved loops in the old pass manager, and use that instead of the flag used to disable loop unroll. No changes in the defaults. Reviewers: chandlerc Subscribers: mehdi_amini, jlebar, dmgreen, hsaito, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61030 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359615 91177308-0d34-0410-b5e6-96231b3b80d8	2019-04-30 21:29:20 +00:00
Alina Sbirlea	6ac87016ff	Enable LoopVectorization by default. Summary: When refactoring vectorization flags, vectorization was disabled by default in the new pass manager. This patch re-enables is for both managers, and changes the assumptions opt makes, based on the new defaults. Comments in opt.cpp should clarify the intended use of all flags to enable/disable vectorization. Reviewers: chandlerc, jgorbe Subscribers: jlebar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61091 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359167 91177308-0d34-0410-b5e6-96231b3b80d8	2019-04-25 04:49:48 +00:00
Eric Christopher	598198edbc	Revert "Temporarily Revert "Add basic loop fusion pass."" The reversion apparently deleted the test/Transforms directory. Will be re-reverting again. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@358552 91177308-0d34-0410-b5e6-96231b3b80d8	2019-04-17 04:52:47 +00:00
Eric Christopher	02cc44c1b9	Temporarily Revert "Add basic loop fusion pass." As it's causing some bot failures (and per request from kbarton). This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@358546 91177308-0d34-0410-b5e6-96231b3b80d8	2019-04-17 02:12:23 +00:00
Florian Hahn	3259337afd	[VPlan] Determine Vector Width programmatically. With this change, the VPlan native path is triggered with the directive: #pragma clang loop vectorize(enable) There is no need to specify the vectorize_width(N) clause. Patch by Francesco Petrogalli <francesco.petrogalli@arm.com> Differential Revision: https://reviews.llvm.org/D57598 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357156 91177308-0d34-0410-b5e6-96231b3b80d8	2019-03-28 10:37:12 +00:00
Simon Pilgrim	ce23689b40	[SLPVectorizer] reorderInputsAccordingToOpcode - remove non-Instruction canonicalization Remove attempts to commute non-Instructions to the LHS - the codegen changes appear to rely on chance more than anything else and also have a tendency to fight existing instcombine canonicalization which moves constants to the RHS of commutable binary ops. This is prep work towards: (a) reusing reorderInputsAccordingToOpcode for alt-shuffles and removing the similar reorderAltShuffleOperands (b) improving reordering to optimized cases with commutable and non-commutable instructions to still find splat/consecutive ops. Differential Revision: https://reviews.llvm.org/D59738 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356913 91177308-0d34-0410-b5e6-96231b3b80d8	2019-03-25 15:53:55 +00:00
Craig Topper	8ff8e3360b	[InstCombine] Don't transform ((C1 OP zext(X)) & C2) -> zext((C1 OP X) & C2) if either zext or OP has another use. If they have other users we'll just end up increasing the instruction count. We might be able to weaken this to only one of them having a single use if we can prove that the and will be removed. Fixes PR41164. Differential Revision: https://reviews.llvm.org/D59630 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356690 91177308-0d34-0410-b5e6-96231b3b80d8	2019-03-21 17:50:49 +00:00
Nikita Popov	cd6f62dcd4	[ValueTracking] Use computeConstantRange() for unsigned add/sub overflow Improve computeOverflowForUnsignedAdd/Sub in ValueTracking by intersecting the computeConstantRange() result into the ConstantRange created from computeKnownBits(). This allows us to detect some additional never/always overflows conditions that can't be determined from known bits. This revision also adds basic handling for constants to computeConstantRange(). Non-splat vectors will be handled in a followup. The signed case will also be handled in a followup, as it needs some more groundwork. Differential Revision: https://reviews.llvm.org/D59386 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356489 91177308-0d34-0410-b5e6-96231b3b80d8	2019-03-19 17:53:56 +00:00
Sanjoy Das	2d9ad10711	Reland "Relax constraints for reduction vectorization" Change from original commit: move test (that uses an X86 triple) into the X86 subdirectory. Original description: Gating vectorizing reductions on all fastmath flags seems unnecessary; `reassoc` should be sufficient. Reviewers: tvvikram, mkuper, kristof.beyls, sdesmalen, Ayal Reviewed By: sdesmalen Subscribers: dcaballe, huntergr, jmolloy, mcrosier, jlebar, bixia, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57728 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355889 91177308-0d34-0410-b5e6-96231b3b80d8	2019-03-12 01:31:44 +00:00
Florian Hahn	b11b60da2d	[InterleavedAccessAnalysis] Fix integer overflow in insertMember. Without checking for integer overflow, invalid members can be added e.g. if the calculated key overflows, becomes positive and the largest key. This fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=7560 https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13128 https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13229 Reviewers: Ayal, anna, hsaito, efriedma Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D55538 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355613 91177308-0d34-0410-b5e6-96231b3b80d8	2019-03-07 17:50:16 +00:00
Nikita Popov	29ba81ded0	[ValueTracking] More accurate unsigned sub overflow detection Second part of D58593. Compute precise overflow conditions based on all known bits, rather than just the sign bits. Unsigned a - b overflows iff a < b, and we can determine whether this always/never happens based on the minimal and maximal values achievable for a and b subject to the known bits constraint. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355109 91177308-0d34-0410-b5e6-96231b3b80d8	2019-02-28 18:04:20 +00:00
Michael Kruse	0bb9e7af42	Refactor setAlreadyUnrolled() and setAlreadyVectorized(). Loop::setAlreadyUnrolled() and LoopVectorizeHints::setLoopAlreadyUnrolled() both add loop metadata that stops the same loop from being transformed multiple times. This patch merges both implementations. In doing so we fix 3 potential issues: * setLoopAlreadyUnrolled() kept the llvm.loop.vectorize/interleave.* metadata even though it will not be used anymore. This already caused problems such as http://llvm.org/PR40546. Change the behavior to the one of setAlreadyUnrolled which deletes this loop metadata. * setAlreadyUnrolled() used to create a new LoopID by calling MDNode::get with nullptr as the first operand, then replacing it by the returned references using replaceOperandWith. It is possible that MDNode::get would instead return an existing node (due to de-duplication) that then gets modified. To avoid, use a fresh TempMDNode that does not get uniqued with anything else before replacing it with replaceOperandWith. * LoopVectorizeHints::matchesHintMetadataName() only compares the suffix of the attribute to set the new value for. That is, when called with "enable", would erase attributes such as "llvm.loop.unroll.enable", "llvm.loop.vectorize.enable" and "llvm.loop.distribute.enable" instead of the one to replace. Fortunately, function was only called with "isvectorized". Differential Revision: https://reviews.llvm.org/D57566 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@353738 91177308-0d34-0410-b5e6-96231b3b80d8	2019-02-11 19:45:44 +00:00
Johannes Doerfert	9e7dd29d66	[ValueTracking] Look through casts when determining non-nullness Bitcast and certain Ptr2Int/Int2Ptr instructions will not alter the value of their operand and can therefore be looked through when we determine non-nullness. Differential Revision: https://reviews.llvm.org/D54956 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@352293 91177308-0d34-0410-b5e6-96231b3b80d8	2019-01-26 23:40:35 +00:00
Simon Pilgrim	21d100aff8	[CostModel][X86] Add explicit vector select costs Prior to SSE41 (and sometimes on AVX1), vector select has to be performed as a ((X & C)\|(Y & ~C)) bit select. Exposes a couple of issues with the min/max reduction costs (which only go down to SSE42 for some reason). The increase pre-SSE41 selection costs also prevent a couple of tests from firing any longer, so I've either tweaked the target or added AVX tests as well to the existing SSE2 tests. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351685 91177308-0d34-0410-b5e6-96231b3b80d8	2019-01-20 13:55:01 +00:00
Michael Kruse	42a382c204	Introduce llvm.loop.parallel_accesses and llvm.access.group metadata. The current llvm.mem.parallel_loop_access metadata has a problem in that it uses LoopIDs. LoopID unfortunately is not loop identifier. It is neither unique (there's even a regression test assigning the some LoopID to multiple loops; can otherwise happen if passes such as LoopVersioning make copies of entire loops) nor persistent (every time a property is removed/added from a LoopID's MDNode, it will also receive a new LoopID; this happens e.g. when calling Loop::setLoopAlreadyUnrolled()). Since most loop transformation passes change the loop attributes (even if it just to mark that a loop should not be processed again as llvm.loop.isvectorized does, for the versioned and unversioned loop), the parallel access information is lost for any subsequent pass. This patch unlinks LoopIDs and parallel accesses. llvm.mem.parallel_loop_access metadata on instruction is replaced by llvm.access.group metadata. llvm.access.group points to a distinct MDNode with no operands (avoiding the problem to ever need to add/remove operands), called "access group". Alternatively, it can point to a list of access groups. The LoopID then has an attribute llvm.loop.parallel_accesses with all the access groups that are parallel (no dependencies carries by this loop). This intentionally avoid any kind of "ID". Loops that are clones/have their attributes modifies retain the llvm.loop.parallel_accesses attribute. Access instructions that a cloned point to the same access group. It is not necessary for each access to have it's own "ID" MDNode, but those memory access instructions with the same behavior can be grouped together. The behavior of llvm.mem.parallel_loop_access is not changed by this patch, but should be considered deprecated. Differential Revision: https://reviews.llvm.org/D52116 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@349725 91177308-0d34-0410-b5e6-96231b3b80d8	2018-12-20 04:58:07 +00:00
Sanjay Patel	83544a50cc	[LoopVectorize] auto-generate complete checks; NFC The first test claims to show that the vectorizer will generate a vector load/loop, but then this file runs other passes which might scalarize that op. I'm removing instcombine from the RUN line here to break that dependency. Also, I'm generating full checks to make it clear exactly what the vectorizer has done. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@349554 91177308-0d34-0410-b5e6-96231b3b80d8	2018-12-18 22:23:04 +00:00
Michael Kruse	9a395de086	[Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes. When multiple loop transformation are defined in a loop's metadata, their order of execution is defined by the order of their respective passes in the pass pipeline. For instance, e.g. #pragma clang loop unroll_and_jam(enable) #pragma clang loop distribute(enable) is the same as #pragma clang loop distribute(enable) #pragma clang loop unroll_and_jam(enable) and will try to loop-distribute before Unroll-And-Jam because the LoopDistribute pass is scheduled after UnrollAndJam pass. UnrollAndJamPass only supports one inner loop, i.e. it will necessarily fail after loop distribution. It is not possible to specify another execution order. Also,t the order of passes in the pipeline is subject to change between versions of LLVM, optimization options and which pass manager is used. This patch adds 'followup' attributes to various loop transformation passes. These attributes define which attributes the resulting loop of a transformation should have. For instance, !0 = !{!0, !1, !2} !1 = !{!"llvm.loop.unroll_and_jam.enable"} !2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3} !3 = !{!"llvm.loop.distribute.enable"} defines a loop ID (!0) to be unrolled-and-jammed (!1) and then the attribute !3 to be added to the jammed inner loop, which contains the instruction to distribute the inner loop. Currently, in both pass managers, pass execution is in a fixed order and UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix this in the future by allowing pass managers to run passes until a fixpoint is reached, use Polly to perform these transformations, or add a loop transformation pass which takes the order issue into account. For mandatory/forced transformations (e.g. by having been declared by #pragma omp simd), the user must be notified when a transformation could not be performed. It is not possible that the responsible pass emits such a warning because the transformation might be 'hidden' in a followup attribute when it is executed, or it is not present in the pipeline at all. For this reason, this patche introduces a WarnMissedTransformations pass, to warn about orphaned transformations. Since this changes the user-visible diagnostic message when a transformation is applied, two test cases in the clang repository need to be updated. To ensure that no other transformation is executed before the intended one, the attribute `llvm.loop.disable_nonforced` can be added which should disable transformation heuristics before the intended transformation is applied. E.g. it would be surprising if a loop is distributed before a #pragma unroll_and_jam is applied. With more supported code transformations (loop fusion, interchange, stripmining, offloading, etc.), transformations can be used as building blocks for more complex transformations (e.g. stripmining+stripmining+interchange -> tiling). Reviewed By: hfinkel, dmgreen Differential Revision: https://reviews.llvm.org/D49281 Differential Revision: https://reviews.llvm.org/D55288 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348944 91177308-0d34-0410-b5e6-96231b3b80d8	2018-12-12 17:32:52 +00:00
Craig Topper	26f6d0a901	[X86][LoopVectorize] Replace -mcpu=skylake-avx512 with -mattr=avx512f in some tests that failed when experimenting with defaulting to -mprefer-vector-width=256 for skylake-avx512. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348063 91177308-0d34-0410-b5e6-96231b3b80d8	2018-12-01 01:38:44 +00:00
Simon Pilgrim	bb61c72405	[CostModel] Add more realistic SK_InsertSubvector generic costs. Instead of defaulting to a cost = 1, expand to element extract/insert like we do for other shuffles. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346662 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-12 15:20:24 +00:00
Ayal Zaks	6d0de65682	[LV] Avoid vectorizing loops under opt for size that involve SCEV checks Fix PR39417, PR39497 The loop vectorizer may generate runtime SCEV checks for overflow and stride==1 cases, leading to execution of original scalar loop. The latter is forbidden when optimizing for size. An assert introduced in r344743 triggered the above PR's showing it does happen. This patch fixes this behavior by preventing vectorization in such cases. Differential Revision: https://reviews.llvm.org/D53612 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345959 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-02 09:16:12 +00:00
Dorit Nuzman	06bac6c858	[LV] Support vectorization of interleave-groups that require an epilog under optsize using masked wide loads Under Opt for Size, the vectorizer does not vectorize interleave-groups that have gaps at the end of the group (such as a loop that reads only the even elements: a[2*i]) because that implies that we'll require a scalar epilogue (which is not allowed under Opt for Size). This patch extends the support for masked-interleave-groups (introduced by D53011 for conditional accesses) to also cover the case of gaps in a group of loads; Targets that enable the masked-interleave-group feature don't have to invalidate interleave-groups of loads with gaps; they could now use masked wide-loads and shuffles (if that's what the cost model selects). Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53668 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345705 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-31 09:57:56 +00:00
Simon Pilgrim	8a10b6b077	[CostModel][X86] Add realistic vXi64 uitofp vXf64 costs Match codegen improvements from D53649/rL345256 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345263 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-25 13:06:20 +00:00
Dorit Nuzman	63aae622d8	[LV] Don't have fold-tail under optsize invalidate interleave-groups when masked-interleaving is enabled Enable interleave-groups under fold-tail scenario for Opt for size compilation; D50480 added support for vectorizing loops of arbitrary trip-count without a remiander, which in turn makes everything in the loop conditional, including interleave-groups if any. It therefore invalidated all interleave-groups because we didn't have support for vectorizing predicated interleaved-groups at the time. In the meantime, D53011 introduced this support, so we don't have to invalidate interleave-groups when masked-interleaved support is enabled. Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: hsaito Differential Revision: https://reviews.llvm.org/D53559 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345115 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-24 07:11:38 +00:00
Dorit Nuzman	c7a8ddb849	[IAI,LV] Avoid creating a scalar epilogue due to gaps in interleave-groups when optimizing for size LV is careful to respect -Os and not to create a scalar epilog in all cases (runtime tests, trip-counts that require a remainder loop) except for peeling due to gaps in interleave-groups. This patch fixes that; -Os will now have us invalidate such interleave-groups and vectorize without an epilog. The patch also removes a related FIXME comment that is now obsolete, and was also inaccurate: "FIXME: return None if loop requiresScalarEpilog(<MaxVF>), or look for a smaller MaxVF that does not require a scalar epilog." (requiresScalarEpilog() has nothing to do with VF). Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53420 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344883 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-22 06:17:09 +00:00
Ayal Zaks	9e75857c92	[LV] Fold tail by masking to vectorize loops of arbitrary trip count under opt for size When optimizing for size, a loop is vectorized only if the resulting vector loop completely replaces the original scalar loop. This holds if no runtime guards are needed, if the original trip-count TC does not overflow, and if TC is a known constant that is a multiple of the VF. The last two TC-related conditions can be overcome by 1. rounding the trip-count of the vector loop up from TC to a multiple of VF; 2. masking the vector body under a newly introduced "if (i <= TC-1)" condition. The patch allows loops with arbitrary trip counts to be vectorized under -Os, subject to the existing cost model considerations. It also applies to loops with small trip counts (under -O2) which are currently handled as if under -Os. The patch does not handle loops with reductions, live-outs, or w/o a primary induction variable, and disallows interleave groups. (Third, final and main part of -) Differential Revision: https://reviews.llvm.org/D50480 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344743 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-18 15:03:15 +00:00
Anna Thomas	c2874102cb	[LV] Teach vectorizer about variant value store into uniform address Summary: Teach vectorizer about vectorizing variant value stores to uniform address. Similar to rL343028, we do not allow vectorization if we have multiple stores to the same uniform address. Cost model already has the change for considering the extract instruction cost for a variant value store. See added test cases for how vectorization is done. The patch also contains changes to the ORE messages. Reviewers: Ayal, mkuper, anemet, hsaito Subscribers: rkruppe, llvm-commits Differential Revision: https://reviews.llvm.org/D52656 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344613 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-16 15:46:26 +00:00
Ayal Zaks	dacda52aca	[LV] Add test checks when vectorizing loops under opt for size; NFC Landing this as a separate part of https://reviews.llvm.org/D50480, recording current behavior more accurately, to clarify subsequent diff ([LV] Vectorizing loops of arbitrary trip count without remainder under opt for size). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344606 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-16 14:25:02 +00:00
Dorit Nuzman	7d7250490b	recommit 344472 after fixing build failure on ARM and PPC. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344475 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-14 08:50:06 +00:00
Dorit Nuzman	473da03560	revert 344472 due to failures. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344473 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-14 07:21:20 +00:00
Dorit Nuzman	a3ff03e8e2	[IAI,LV] Add support for vectorizing predicated strided accesses using masked interleave-group The vectorizer currently does not attempt to create interleave-groups that contain predicated loads/stores; predicated strided accesses can currently be vectorized only using masked gather/scatter or scalarization. This patch makes predicated loads/stores candidates for forming interleave-groups during the Loop-Vectorizer's analysis, and adds the proper support for masked-interleave- groups to the Loop-Vectorizer's planning and transformation stages. The patch also extends the TTI API to allow querying the cost of masked interleave groups (which each target can control); Targets that support masked vector loads/ stores may choose to enable this feature and allow vectorizing predicated strided loads/stores using masked wide loads/stores and shuffles. Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53011 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344472 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-14 07:06:16 +00:00
Sanjay Patel	66a2c5ecaa	[InstCombine] reverse 'trunc X to <N x i1>' canonicalization; 2nd try Re-trying r344082 because it unintentionally included extra diffs. Original commit message: icmp ne (and X, 1), 0 --> trunc X to N x i1 Ideally, we'd do the same for scalars, but there will likely be regressions unless we add more trunc folds as we're doing here for vectors. The motivating vector case is from PR37549: https://bugs.llvm.org/show_bug.cgi?id=37549 define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) { %c = fcmp ole <4 x float> %x, %y %s = sext <4 x i1> %c to <4 x i32> %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1> %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3> %cond = or <4 x i32> %s1, %s2 %condtr = trunc <4 x i32> %cond to <4 x i1> %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w ret <4 x float> %r } Here's a sampling of the vector codegen for that case using mask+icmp (current behavior) vs. trunc (with this patch): AVX before: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vandps LCPI0_0(%rip), %xmm0, %xmm0 vxorps %xmm1, %xmm1, %xmm1 vpcmpeqd %xmm1, %xmm0, %xmm0 vblendvps %xmm0, %xmm3, %xmm2, %xmm0 AVX after: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vblendvps %xmm0, %xmm2, %xmm3, %xmm0 AVX512f before: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vpbroadcastd LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1] vptestnmd %zmm1, %zmm0, %k1 vblendmps %zmm3, %zmm2, %zmm0 {%k1} AVX512f after: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vpslld $31, %xmm0, %xmm0 vptestmd %zmm0, %zmm0, %k1 vblendmps %zmm2, %zmm3, %zmm0 {%k1} AArch64 before: fcmge v0.4s, v1.4s, v0.4s zip1 v1.4s, v0.4s, v0.4s zip2 v0.4s, v0.4s, v0.4s orr v0.16b, v1.16b, v0.16b movi v1.4s, #1 and v0.16b, v0.16b, v1.16b cmeq v0.4s, v0.4s, #0 bsl v0.16b, v3.16b, v2.16b AArch64 after: fcmge v0.4s, v1.4s, v0.4s zip1 v1.4s, v0.4s, v0.4s zip2 v0.4s, v0.4s, v0.4s orr v0.16b, v1.16b, v0.16b bsl v0.16b, v2.16b, v3.16b PowerPC-le before: xvcmpgesp 34, 35, 34 vspltisw 0, 1 vmrglw 3, 2, 2 vmrghw 2, 2, 2 xxlor 0, 35, 34 xxlxor 35, 35, 35 xxland 34, 0, 32 vcmpequw 2, 2, 3 xxsel 34, 36, 37, 34 PowerPC-le after: xvcmpgesp 34, 35, 34 vmrglw 3, 2, 2 vmrghw 2, 2, 2 xxlor 0, 35, 34 xxsel 34, 37, 36, 0 Differential Revision: https://reviews.llvm.org/D52747 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344181 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-10 20:47:46 +00:00
Sanjay Patel	9f5daa2df0	revert r344082: [InstCombine] reverse 'trunc X to <N x i1>' canonicalization This commit accidentally included the diffs from D53057. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344178 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-10 20:39:39 +00:00
Justin Bogner	a219e952b7	[LV] Move test for r343954 into x86 subdirectory This test uses an x86 triple, so it needs to be in the x86 specific test directory. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344087 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-09 22:40:04 +00:00
Sanjay Patel	1dd3c06445	[InstCombine] reverse 'trunc X to <N x i1>' canonicalization icmp ne (and X, 1), 0 --> trunc X to N x i1 Ideally, we'd do the same for scalars, but there will likely be regressions unless we add more trunc folds as we're doing here for vectors. The motivating vector case is from PR37549: https://bugs.llvm.org/show_bug.cgi?id=37549 define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) { %c = fcmp ole <4 x float> %x, %y %s = sext <4 x i1> %c to <4 x i32> %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1> %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3> %cond = or <4 x i32> %s1, %s2 %condtr = trunc <4 x i32> %cond to <4 x i1> %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w ret <4 x float> %r } Here's a sampling of the vector codegen for that case using mask+icmp (current behavior) vs. trunc (with this patch): AVX before: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vandps LCPI0_0(%rip), %xmm0, %xmm0 vxorps %xmm1, %xmm1, %xmm1 vpcmpeqd %xmm1, %xmm0, %xmm0 vblendvps %xmm0, %xmm3, %xmm2, %xmm0 AVX after: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vblendvps %xmm0, %xmm2, %xmm3, %xmm0 AVX512f before: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vpbroadcastd LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1] vptestnmd %zmm1, %zmm0, %k1 vblendmps %zmm3, %zmm2, %zmm0 {%k1} AVX512f after: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vpslld $31, %xmm0, %xmm0 vptestmd %zmm0, %zmm0, %k1 vblendmps %zmm2, %zmm3, %zmm0 {%k1} AArch64 before: fcmge v0.4s, v1.4s, v0.4s zip1 v1.4s, v0.4s, v0.4s zip2 v0.4s, v0.4s, v0.4s orr v0.16b, v1.16b, v0.16b movi v1.4s, #1 and v0.16b, v0.16b, v1.16b cmeq v0.4s, v0.4s, #0 bsl v0.16b, v3.16b, v2.16b AArch64 after: fcmge v0.4s, v1.4s, v0.4s zip1 v1.4s, v0.4s, v0.4s zip2 v0.4s, v0.4s, v0.4s orr v0.16b, v1.16b, v0.16b bsl v0.16b, v2.16b, v3.16b PowerPC-le before: xvcmpgesp 34, 35, 34 vspltisw 0, 1 vmrglw 3, 2, 2 vmrghw 2, 2, 2 xxlor 0, 35, 34 xxlxor 35, 35, 35 xxland 34, 0, 32 vcmpequw 2, 2, 3 xxsel 34, 36, 37, 34 PowerPC-le after: xvcmpgesp 34, 35, 34 vmrglw 3, 2, 2 vmrghw 2, 2, 2 xxlor 0, 35, 34 xxsel 34, 37, 36, 0 Differential Revision: https://reviews.llvm.org/D52747 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344082 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-09 21:26:01 +00:00

1 2 3 4 5 ...

314 Commits