RPCS3/llvm

mirror of https://github.com/RPCS3/llvm.git synced 2025-05-28 08:16:14 +00:00

Author	SHA1	Message	Date
Matthew Simpson	ccc38cc5e7	Reapply r298620: [LV] Vectorize GEPs This patch reapplies r298620. The original patch was reverted because of two issues. First, the patch exposed a bug in InstCombine that caused the Chromium builds to fail (PR32414). This issue was fixed in r299017. Second, the patch introduced a bug in the vectorizer's scalars analysis that caused test suite builds to fail on SystemZ. The scalars analysis was too aggressive and marked a memory instruction scalar, even though it was going to be vectorized. This issue has been fixed in the current patch and several new test cases for the scalars analysis have been added. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299770 91177308-0d34-0410-b5e6-96231b3b80d8	2017-04-07 14:15:34 +00:00
Ivan Krasin	76f79b8934	Revert r298620: [LV] Vectorize GEPs Reason: breaks linking Chromium with LLD + ThinLTO (a pass crashes) LLVM bug: https://bugs.llvm.org//show_bug.cgi?id=32413 Original change description: [LV] Vectorize GEPs This patch adds support for vectorizing GEPs. Previously, we only generated vector GEPs on-demand when creating gather or scatter operations. All GEPs from the original loop were scalarized by default, and if a pointer was to be stored to memory, we would have to build up the pointer vector with insertelement instructions. With this patch, we will vectorize all GEPs that haven't already been marked for scalarization. The patch refines collectLoopScalars to more exactly identify the scalar GEPs. The function now more closely resembles collectLoopUniforms. And the patch moves vector GEP creation out of vectorizeMemoryInstruction and into the main vectorization loop. The vector GEPs needed for gather and scatter operations will have already been generated before vectoring the memory accesses. Original Differential Revision: https://reviews.llvm.org/D30710 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298735 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-24 20:49:43 +00:00
Matthew Simpson	77a842c0a2	[LV] Vectorize GEPs This patch adds support for vectorizing GEPs. Previously, we only generated vector GEPs on-demand when creating gather or scatter operations. All GEPs from the original loop were scalarized by default, and if a pointer was to be stored to memory, we would have to build up the pointer vector with insertelement instructions. With this patch, we will vectorize all GEPs that haven't already been marked for scalarization. The patch refines collectLoopScalars to more exactly identify the scalar GEPs. The function now more closely resembles collectLoopUniforms. And the patch moves vector GEP creation out of vectorizeMemoryInstruction and into the main vectorization loop. The vector GEPs needed for gather and scatter operations will have already been generated before vectoring the memory accesses. Differential Revision: https://reviews.llvm.org/D30710 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298620 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-23 16:29:58 +00:00
Adam Nemet	90c4f1ee16	[LV] These remarks should have been missed remarks The practice in LV is that we emit analysis remarks and then finally report either a missed or applied remark on the final decision whether vectorization is taking place. On this code path, we were closing with an analysis remark. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296578 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-01 04:31:15 +00:00
Craig Topper	f46e01bb23	[AVX-512] Fix the execution domain for AVX-512 integer broadcasts. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296290 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-26 06:45:51 +00:00
Dehao Chen	1ae1089dec	Increases full-unroll threshold. Summary: The default threshold for fully unroll is too conservative. This patch doubles the full-unroll threshold This change will affect the following speccpu2006 benchmarks (performance numbers were collected from Intel Sandybridge): Performance: 403 0.11% 433 0.51% 445 0.48% 447 3.50% 453 1.49% 464 0.75% Code size: 403 0.56% 433 0.96% 445 2.16% 447 2.96% 453 0.94% 464 8.02% The compiler time overhead is similar with code size. Reviewers: davidxl, mkuper, mzolotukhin, hfinkel, chandlerc Reviewed By: hfinkel, chandlerc Subscribers: mehdi_amini, zzheng, efriedma, haicheng, hfinkel, llvm-commits Differential Revision: https://reviews.llvm.org/D28368 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295538 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-18 03:46:51 +00:00
Elena Demikhovsky	92cc2185b8	[Loop Vectorizer] Cost-based decision for vectorization form of memory instruction. Makes the cost model select between the Interleave, GatherScatter or Scalar vectorization form of a memory instruction. The right decision should be made for non-consecutive memory access instructions that may have more than one vectorization solution. This patch includes the following changes: - Cost Model calculates the cost of Load/Store vector form and chooses the better option between Widening, Interleave, GatherScatter and Scalarization. Cost Model keeps the widening decision. - Arrays of Uniform and Scalar values are moved from Legality to Cost Model. - Cost Model collects Uniforms and Scalars per VF. The collection is based on the CM decision map of Loads/Stores vectorization form. - Vectorization of memory instruction is performed according to the CM decision. Differential Revision: https://reviews.llvm.org/D27919 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294503 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-08 19:25:23 +00:00
Craig Topper	c9ff831ead	[X86] Remove PCOMMIT instruction support since Intel has deprecated this instruction with no plans to release products with it. Intel's documentation for the deprecation https://software.intel.com/en-us/blogs/2016/09/12/deprecate-pcommit-instruction git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294405 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-08 05:45:39 +00:00
Adam Nemet	49e45b454f	[LV] Also port failure remarks to new OptimizationRemarkEmitter API git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293866 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-02 05:41:51 +00:00
Mohammed Agabaria	e0bafdf059	[X86] enable memory interleaving for X86\SLM arch. Differential Revision: https://reviews.llvm.org/D28547 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293040 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-25 09:14:48 +00:00
Matthew Simpson	1b2cd634db	[LV] Add requires asserts to test case git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292280 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-17 22:21:33 +00:00
Matthew Simpson	caab3a817f	[LV] Mark non-consecutive-like pointers non-uniform If a memory instruction will be vectorized, but its pointer operand is non-consecutive-like, the instruction is a gather or scatter operation. Its pointer operand will be non-uniform. This should fix PR31671. Reference: https://llvm.org/bugs/show_bug.cgi?id=31671 Differential Revision: https://reviews.llvm.org/D28819 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292254 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-17 20:51:39 +00:00
Mohammed Agabaria	208cabd074	[X86] fixing failed test in commit: r291657 Missing Requires asserts. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291659 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-11 09:03:11 +00:00
Mohammed Agabaria	9c6b24cc3a	[X86] updating TTI costs for arithmetic instructions on X86\SLM arch. updated instructions: pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd. special optimization case which replaces pmulld with pmullw\pmulhw\pshuf seq. In case if the real operands bitwidth <= 16. Differential Revision: https://reviews.llvm.org/D28104 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291657 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-11 08:23:37 +00:00
Mohammed Agabaria	6bf7471dbc	Currently isLikelyComplexAddressComputation tries to figure out if the given stride seems to be 'complex' and needs some extra cost for address computation handling. This code seems to be target dependent and may not be the same for all targets. Passed the decision whether the given stride is complex or not to the target by sending stride information via SCEV to getAddressComputationCost instead of 'IsComplex'. Specifically, on X86 targets we don't see any significant address computation cost for strided access in general. Differential Revision: https://reviews.llvm.org/D27518 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291106 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-05 14:03:41 +00:00
Matthew Simpson	b6b20e1aa2	[LV] Scalarize operands of predicated instructions This patch attempts to scalarize the operand expressions of predicated instructions if they were conditionally executed in the original loop. After scalarization, the expressions will be sunk inside the blocks created for the predicated instructions. The transformation essentially performs un-if-conversion on the operands. The cost model has been updated to determine if scalarization is profitable. It compares the cost of a vectorized instruction, assuming it will be if-converted, to the cost of the scalarized instruction, assuming that the instructions corresponding to each vector lane will be sunk inside a predicated block, possibly avoiding execution. If it's more profitable to scalarize the entire expression tree feeding the predicated instruction, the expression will be scalarized; otherwise, it will be vectorized. We only consider the cost of the entire expression to accurately estimate the cost of the required insertelement and extractelement instructions. Differential Revision: https://reviews.llvm.org/D26083 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288909 91177308-0d34-0410-b5e6-96231b3b80d8	2016-12-07 15:03:32 +00:00
Robert Lougher	75d009930b	[LoopVectorizer] When estimating reg usage, unused insts may "end" another use The register usage algorithm incorrectly treats instructions whose value is not used within the loop (e.g. those that do not produce a value). The algorithm first calculates the usages within the loop. It iterates over the instructions in order, and records at which instruction index each use ends (in fact, they're actually recorded against the next index, as this is when we want to delete them from the open intervals). The algorithm then iterates over the instructions again, adding each instruction in turn to a list of open intervals. Instructions are then removed from the list of open intervals when they occur in the list of uses ended at the current index. The problem is, instructions which are not used in the loop are skipped. However, although they aren't used, the last use of a value may have been recorded against that instruction index. In this case, the use is not deleted from the open intervals, which may then bump up the estimated register usage. This patch fixes the issue by simply moving the "is used" check after the loop which erases the uses at the current index. Differential Revision: https://reviews.llvm.org/D26554 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286969 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-15 14:27:33 +00:00
Simon Pilgrim	2e6f35ab88	[X86][AVX] Fixed v16i16/v32i8 ADD/SUB costs on AVX1 subtargets Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32 - they were missing for some reason. This has side effects on the LV max bandwidth tests (AVX1 now prefers 128-bit vectors vs AVX2 which still prefers 256-bit) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286832 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-14 14:45:16 +00:00
Adam Nemet	7f0fc37d0d	[LV] Stop saying "use -Rpass-analysis=loop-vectorize" This is PR28376. Unfortunately given the current structure of optimization diagnostics we lack the capability to tell whether the user has passed -Rpass-analysis=loop-vectorize since this is local to the front-end (BackendConsumer::OptimizationRemarkHandler). So rather than printing this even if the user has already passed -Rpass-analysis, this patch just punts and stops recommending this option. I don't think that getting this right is worth the complexity. Differential Revision: https://reviews.llvm.org/D26563 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286662 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-11 22:51:46 +00:00
Dorit Nuzman	3aa311854a	Second attempt at r285517. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285568 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-31 13:17:31 +00:00
Dorit Nuzman	6d3c9bdc8f	Revert r285517 due to build failures. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285518 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-30 14:34:57 +00:00
Dorit Nuzman	b10d927158	[LoopVectorize] Make interleaved-accesses analysis less conservative about possible pointer-wrap-around concerns, in some cases. Before this patch, collectConstStridedAccesses (part of interleaved-accesses analysis) called getPtrStride with [Assume=false, ShouldCheckWrap=true] when examining all candidate pointers. This is too conservative. Instead, this patch makes collectConstStridedAccesses use an optimistic approach, calling getPtrStride with [Assume=true, ShouldCheckWrap=false], and then, once the candidate interleave groups have been formed, revisits the pointer-wrapping analysis but only where it matters: namely, in groups that have gaps, and where the gaps are not at the very end of the group (in which case the loop is peeled). This second time getPtrStride is called with [Assume=false, ShouldCheckWrap=true], but this could further be improved to using Assume=true, once we also add the logic to track that we are not going to meet the scev runtime checks threshold. Differential Revision: https://reviews.llvm.org/D25276 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285517 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-30 12:23:26 +00:00
Michael Kuperstein	fe040323c6	[X86] Enable interleaved memory access by default This lets the loop vectorizer generate interleaved memory accesses on x86. Differential Revision: https://reviews.llvm.org/D25350 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284779 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-20 21:04:31 +00:00
Matthew Simpson	1254de0c5c	[LV] Move insertelement sequence after scalar definitions After r279649 when getting a vector value from VectorLoopValueMap, we create an insertelement sequence on-demand if the value has been scalarized instead of vectorized. We previously inserted this insertelement sequence before the value's first vector user. However, this insert location is problematic if that user is the phi node of a first-order recurrence. With this patch, we move the insertelement sequence after the last scalar instruction we created when scalarizing the value. Thus, the value's vector definition in the new loop will immediately follow its scalar definitions. This should fix PR30183. Reference: https://llvm.org/bugs/show_bug.cgi?id=30183 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280001 91177308-0d34-0410-b5e6-96231b3b80d8	2016-08-29 20:14:04 +00:00
Matthew Simpson	428e79c2bb	[LV] Unify vector and scalar maps This patch unifies the data structures we use for mapping instructions from the original loop to their corresponding instructions in the new loop. Previously, we maintained two distinct maps for this purpose: WidenMap and ScalarIVMap. WidenMap maintained the vector values each instruction from the old loop was represented with, and ScalarIVMap maintained the scalar values each scalarized induction variable was represented with. With this patch, all values created for the new loop are maintained in VectorLoopValueMap. The change allows for several simplifications. Previously, when an instruction was scalarized, we had to insert the scalar values into vectors in order to maintain the mapping in WidenMap. Then, if a user of the scalarized value was also scalar, we had to extract the scalar values from the temporary vector we created. We now avoid these unnecessary scalar-to-vector-to-scalar conversions. If a scalarized value is used by a scalar instruction, the scalar value is used directly. However, if the scalarized value is needed by a vector instruction, we generate the needed insertelement instructions on-demand. A common idiom in several locations in the code (including the scalarization code), is to first get the vector values an instruction from the original loop maps to, and then extract a particular scalar value. This patch adds getScalarValue for this purpose alongside getVectorValue as an interface into VectorLoopValueMap. These functions work together to return the requested values if they're available or to produce them if they're not. The mapping has also been made less permissive. Entries can be added to VectorLoopValueMap with the new initVector and initScalar functions. getVectorValue has been modified to return a constant reference to the mapped entries. There's no real functional change with this patch; however, in some cases we will generate slightly different code. For example, instead of an insertelement sequence following the definition of an instruction, it will now precede the first use of that instruction. This can be seen in the test case changes. Differential Revision: https://reviews.llvm.org/D23169 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@279649 91177308-0d34-0410-b5e6-96231b3b80d8	2016-08-24 18:23:17 +00:00
Michael Kuperstein	e788186982	[LV, X86] Be more optimistic about vectorizing shifts. Shifts with a uniform but non-constant count were considered very expensive to vectorize, because the splat of the uniform count and the shift would tend to appear in different blocks. That made the splat invisible to ISel, and we'd scalarize the shift at codegen time. Since r201655, CodeGenPrepare sinks those splats to be next to their use, and we are able to select the appropriate vector shifts. This updates the cost model to take this into account by making shifts by a uniform cheap again. Differential Revision: https://reviews.llvm.org/D23049 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277782 91177308-0d34-0410-b5e6-96231b3b80d8	2016-08-04 22:48:03 +00:00
Wei Mi	ba9543ccae	[LoopVectorize] Change comment for isOutOfScope in collectLoopUniforms, NFC Update comment for isOutOfScope and add a testcase for uniform value being used out of scope. Differential Revision: https://reviews.llvm.org/D23073 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277515 91177308-0d34-0410-b5e6-96231b3b80d8	2016-08-02 20:27:49 +00:00
Matthew Simpson	155b8551c6	[LV] Generate both scalar and vector integer induction variables This patch enables the vectorizer to generate both scalar and vector versions of an integer induction variable for a given loop. Previously, we only generated a scalar induction variable if we knew all its users were going to be scalar. Otherwise, we generated a vector induction variable. In the case of a loop with both scalar and vector users of the induction variable, we would generate the vector induction variable and extract scalar values from it for the scalar users. With this patch, we now generate both versions of the induction variable when there are both scalar and vector users and select which version to use based on whether the user is scalar or vector. Differential Revision: https://reviews.llvm.org/D22869 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277474 91177308-0d34-0410-b5e6-96231b3b80d8	2016-08-02 15:25:16 +00:00
Igor Breger	0691cf23e8	[AVX512] Don't use i128 masked gather/scatter/load/store. Perform a more accurate dataWidth check. Differential Revision: http://reviews.llvm.org/D23055 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277435 91177308-0d34-0410-b5e6-96231b3b80d8	2016-08-02 09:15:28 +00:00
Craig Topper	a2cd077470	[AVX-512] Fix a test missed in r277327. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277330 91177308-0d34-0410-b5e6-96231b3b80d8	2016-08-01 08:15:30 +00:00
Matt Masten	bbbcccbfc4	Initial support for vectorization using svml (short vector math library). Differential Revision: https://reviews.llvm.org/D19544 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277166 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-29 16:42:44 +00:00
Elena Demikhovsky	ba55955caa	[Loop Vectorizer] Handling loops FP induction variables. Allowed loop vectorization with secondary FP IVs. Like this: float *A; float x = init; for (int i=0; i < N; ++i) { A[i] = x; x -= fp_inc; } The auto-vectorization is possible when the induction binary operator is "fast" or the function has "unsafe" attribute. Differential Revision: https://reviews.llvm.org/D21330 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276554 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-24 07:24:54 +00:00
Matthew Simpson	0414f48742	[LV] Move vector int induction update to end of latch This patch moves the update instruction for vectorized integer induction phi nodes to the end of the latch block. This ensures consistent placement of all induction updates across all the kinds of int inductions we create (scalar, splat vector, or vector phi). Differential Revision: https://reviews.llvm.org/D22416 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276339 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-21 21:20:15 +00:00
Adam Nemet	42a372e9b8	[OptDiag,LV] Add hotness attribute to applied-optimization remarks Test coverage is provided by modifying the function in the FP-math testcase that we are allowed to vectorize. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276223 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-21 01:07:13 +00:00
Adam Nemet	cebe016761	[OptDiag,LV] Add hotness attribute to the derived analysis remarks This includes FPCompute and Aliasing. Testcase is based on no_fpmath.ll. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276211 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-20 23:50:32 +00:00
Wei Mi	92a8d601a3	Recommit the patch "Use uniforms set to populate VecValuesToIgnore". For instructions in uniform set, they will not have vector versions so add them to VecValuesToIgnore. For induction vars, those only used in uniform instructions or consecutive ptrs instructions have already been added to VecValuesToIgnore above. For those induction vars which are only used in uniform instructions or non-consecutive/non-gather scatter ptr instructions, the related phi and update will also be added into VecValuesToIgnore set. The change will make the vector RegUsages estimation less conservative. Differential Revision: https://reviews.llvm.org/D20474 The recommit fixed the testcase global_alias.ll. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275936 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-19 00:50:43 +00:00
Wei Mi	fba236f858	Revert rL275912. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275915 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-18 21:14:43 +00:00
Wei Mi	1938056381	Use uniforms set to populate VecValuesToIgnore. For instructions in uniform set, they will not have vector versions so add them to VecValuesToIgnore. For induction vars, those only used in uniform instructions or consecutive ptrs instructions have already been added to VecValuesToIgnore above. For those induction vars which are only used in uniform instructions or non-consecutive/non-gather scatter ptr instructions, the related phi and update will also be added into VecValuesToIgnore set. The change will make the vector RegUsages estimation less conservative. Differential Revision: https://reviews.llvm.org/D20474 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275912 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-18 20:59:53 +00:00
Michael Kuperstein	b1fce5cc4c	[X86] Make some cast costs more precise Make some AVX and AVX512 cast costs more precise. Based on part of a patch by Elena Demikhovsky (D15604). Differential Revision: http://reviews.llvm.org/D22064 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275106 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-11 21:39:44 +00:00
Elena Demikhovsky	407fc99045	Fixed a bug in vectorizing GEP before gather/scatter intrinsic. Vectorizing GEP was incorrect and broke SSA in some cases. The patch fixes PR27997 https://llvm.org/bugs/show_bug.cgi?id=27997. Differential revision: http://reviews.llvm.org/D22035 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274735 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-07 06:06:46 +00:00
Michael Kuperstein	c7432f9ad3	[TTI] The cost model should not assume vector casts get completely scalarized The cost model should not assume vector casts get completely scalarized, since on targets that have vector support, the common case is a partial split up to the legal vector size. So, when a vector cast gets split, the resulting casts end up legal and cheap. Instead of pessimistically assuming scalarization, base TTI can use the costs the concrete TTI provides for the split vector, plus a fudge factor to account for the cost of the split itself. This fudge factor is currently 1 by default, except on AMDGPU where inserts and extracts are considered free. Differential Revision: http://reviews.llvm.org/D21251 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274642 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-06 17:30:56 +00:00
Matt Arsenault	6bcca1a915	SLPVectorizer: Move propagateMetadata to VectorUtils This will be re-used by the LoadStoreVectorizer. Fix handling of range metadata and testcase by Justin Lebar. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274281 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-30 21:17:59 +00:00
Wei Mi	693c332887	Refine the set of UniformAfterVectorization instructions. Except the seed uniform instructions (conditional branch and consecutive ptr instructions), dependencies to be added into uniform set should only be used by existing uniform instructions or instructions outside of the current loop. Differential Revision: http://reviews.llvm.org/D21755 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274262 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-30 18:42:56 +00:00
Artur Pilipenko	48917c9e44	Support arbitrary addrspace pointers in masked load/store intrinsics This is a resubmission of the 263158 change after fixing the existing problem with intrinsics mangling (see LTO and intrinsics mangling llvm-dev thread for details). This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace. The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics. Reviewed By: reames Differential Revision: http://reviews.llvm.org/D17270 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274043 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-28 18:27:25 +00:00
Artur Pilipenko	be0da39a48	Revert -r273892 "Support arbitrary addrspace pointers in masked load/store intrinsics" since some of the clang tests don't expect to see the updated signatures. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273895 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-27 16:54:33 +00:00
Artur Pilipenko	9227558e8e	Support arbitrary addrspace pointers in masked load/store intrinsics This is a resubmission of the 263158 change after fixing the existing problem with intrinsics mangling (see LTO and intrinsics mangling llvm-dev thread for details). This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace. The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics. Reviewed By: reames Differential Revision: http://reviews.llvm.org/D17270 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273892 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-27 16:29:26 +00:00
Michael Kuperstein	01d6c3dbf9	[LV] For some IVs, use vector phis instead of widening in the loop body Previously, whenever we needed a vector IV, we would create it on the fly, by splatting the scalar IV and adding a step vector. Instead, we can create a real vector IV. This tends to save a couple of instructions per iteration. This only changes the behavior for the most basic case - integer primary IVs with a constant step. Differential Revision: http://reviews.llvm.org/D20315 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271410 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-01 17:16:46 +00:00
Tim Northover	5b363367fe	Move test to X86 directory: I think it depends on X86 TTI. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271019 91177308-0d34-0410-b5e6-96231b3b80d8	2016-05-27 16:56:54 +00:00
Hal Finkel	d86e7af14a	Look for a loop's starting location in the llvm.loop metadata Getting accurate locations for loops is important, because those locations are used by the frontend to generate optimization remarks. Currently, optimization remarks for loops often appear on the wrong line, often the first line of the loop body instead of the loop itself. This is confusing because that line might itself be another loop, or might be somewhere else completely if the body was an inlined function call. This happens because of the way we find the loop's starting location. First, we look for a preheader, and if we find one, and its terminator has a debug location, then we use that. Otherwise, we look for a location on an instruction in the loop header. The fallback heuristic is not bad, but will almost always find the beginning of the body, and not the loop statement itself. The preheader location search often fails because there's often not a preheader, and even when there is a preheader, depending on how it was formed, it sometimes carries the location of some preceding code. I don't see any good theoretical way to fix this problem. On the other hand, this seems like a straightforward solution: Put the debug location in the loop's llvm.loop metadata. A companion Clang patch will cause Clang to insert llvm.loop metadata with appropriate locations when generating debugging information. With these changes, our loop remarks have much more accurate locations. Differential Revision: http://reviews.llvm.org/D19738 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@270771 91177308-0d34-0410-b5e6-96231b3b80d8	2016-05-25 21:42:37 +00:00
Sanjay Patel	cab076f44c	[x86] avoid code explosion from LoopVectorizer for gather loop (PR27826) By making pointer extraction from a vector more expensive in the cost model, we avoid the vectorization of a loop that is very likely to be memory-bound: https://llvm.org/bugs/show_bug.cgi?id=27826 There are still bugs related to this, so we may need a more general solution to avoid vectorizing obviously memory-bound loops when we don't have HW gather support. Differential Revision: http://reviews.llvm.org/D20601 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@270729 91177308-0d34-0410-b5e6-96231b3b80d8	2016-05-25 17:27:54 +00:00
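
The register-usage fix described in commit 75d009930b (Robert Lougher, 2016-11-15) is easier to follow with a small model. The following is a minimal C++ sketch of the two-pass interval scheme as that commit message describes it; it is not LLVM's code. The `Inst` struct, `maxOpenIntervals`, `exampleLoop`, `checkExample` and the `checkUsedFirst` flag are all hypothetical names introduced for illustration, and "register usage" is simplified to a count of open intervals.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Hypothetical, simplified instruction: each operand is the index of an
// earlier instruction in the same loop body.
struct Inst {
    std::vector<std::size_t> operands;
};

// Returns the largest number of simultaneously open intervals.
// checkUsedFirst == true models the ordering described as buggy;
// checkUsedFirst == false models the ordering after the fix.
std::size_t maxOpenIntervals(const std::vector<Inst>& loop, bool checkUsedFirst) {
    // Pass 1: for every value, remember the index *after* its last use.
    std::map<std::size_t, std::size_t> endPoint;
    for (std::size_t i = 0; i < loop.size(); ++i)
        for (std::size_t op : loop[i].operands)
            endPoint[op] = i + 1;  // later uses overwrite earlier ones

    // Transpose: index -> values whose interval ends at that index.
    std::map<std::size_t, std::vector<std::size_t>> endsAt;
    for (const auto& kv : endPoint)
        endsAt[kv.second].push_back(kv.first);

    // Pass 2: walk the instructions, closing and opening intervals.
    std::set<std::size_t> open;
    std::size_t maxOpen = 0;
    for (std::size_t i = 0; i < loop.size(); ++i) {
        const bool used = endPoint.count(i) != 0;
        if (checkUsedFirst && !used)
            continue;  // buggy order: the erase below is skipped too
        for (std::size_t v : endsAt[i])
            open.erase(v);  // close intervals that end at this index
        if (!used)
            continue;  // fixed order: skip only after closing intervals
        maxOpen = std::max(maxOpen, open.size());
        open.insert(i);
    }
    return maxOpen;
}

// Two independent "load, load, add, store" groups. The stores (indices 3
// and 7) produce no value, so they are never used inside the loop.
std::vector<Inst> exampleLoop() {
    return {
        {{}}, {{}}, {{0, 1}}, {{2}},  // a, b, c = a + b, store c
        {{}}, {{}}, {{4, 5}}, {{6}},  // d, e, f = d + e, store f
    };
}

// Hand-traced expectations for the example above.
void checkExample() {
    const std::vector<Inst> loop = exampleLoop();
    assert(maxOpenIntervals(loop, /*checkUsedFirst=*/false) == 2);
    assert(maxOpenIntervals(loop, /*checkUsedFirst=*/true) == 4);
}
```

In this toy loop the intervals of `a` and `b` are recorded as ending at the index of the first store. With the "is used" check placed first, that index is skipped entirely, so `a` and `b` stay open while `d` and `e` become live and the estimate rises from 2 to 4. Moving the check after the erase loop, as the commit message describes, closes them on time.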

202 Commits