archived-llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2026-01-31 01:35:20 +01:00

Author	SHA1	Message	Date
Sanjay Patel	4efc3d6d2c	[IRBuilder][VectorCombine] make and use a convenience function for unary shuffle; NFC This reduces code duplication for a common construct. Follow-ups can use this in SLP, LoopVectorizer, and other passes.	2020-09-21 13:47:01 -04:00
Sanjay Patel	ffe3499266	[VectorCombine] limit load+insert transform to one-use As discussed in: https://llvm.org/PR47558 ...there are several potential fixes/follow-ups visible in the test case, but this is the quickest and safest fix of the perf regression.	2020-09-17 14:29:15 -04:00
Sanjay Patel	c89bcabba1	[VectorCombine] rearrange bailouts for load insert for efficiency; NFC	2020-09-17 13:50:37 -04:00
Fangrui Song	306ab784d5	[VectorCombine] Don't vectorize scalar load under asan/hwasan/memtag/tsan Similar to the tsan suppression in `Utils/VNCoercion.cpp:getLoadLoadClobberFullWidthSize` (rL175034; load widening used by GVN), the D81766 optimization should be suppressed under tsan due to potential spurious data race reports: struct A { int i; const short s; // the load cannot be vectorized because int modify; // it overlaps with bytes being concurrently modified long pad1, pad2; }; // __tsan_read16 does not know that some bytes are undef and accessing is safe Similarly, under asan, users can mark memory regions with `__asan_poison_memory_region`. A widened load can lead to a spurious use-after-poison error. hwasan/memtag should be similarly suppressed. `mustSuppressSpeculation` suppresses asan/hwasan/tsan but not memtag, so we need to exclude memtag in `vectorizeLoadInsert`. Note, memtag suppression can be relaxed if the load is aligned to its granule (usually 16), but that is out of scope of this patch. Reviewed By: spatel, vitalybuka Differential Revision: https://reviews.llvm.org/D87538	2020-09-15 09:47:21 -07:00
Huihui Zhang	dc1e85f7b4	[VectorCombine][SVE] Do not fold bitcast shuffle for scalable type. First, shuffle cost is not known for scalable type; Second, we cannot reason if the narrowed shuffle mask for scalable type is a splat or not. E.g., Bitcast splat vector from type <vscale x 4 x i32> to <vscale x 8 x i16> will involve narrowing shuffle mask <vscale x 4 x i32> zeroinitializer to <vscale x 8 x i32> with element sequence of <0, 1, 0, 1, ...>, which cannot be reasoned if it's a valid splat or not. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D86995	2020-09-02 15:02:16 -07:00
Sanjay Patel	4e9822e551	[VectorCombine] allow vector loads with mismatched insert type This is an enhancement to D81766 to allow loading the minimum target vector type into an IR vector with a different number of elements. In one of the motivating tests from PR16739, SLP creates <2 x float> load ops mixed with <4 x float> insert ops, so we want to handle that pattern in addition to potential oversized vectors created by the vectorizers. For now, we are assuming the insert/extract subvector with undef is free because there is no exact corresponding TTI modeling for that. Differential Revision: https://reviews.llvm.org/D86160	2020-09-02 08:11:36 -04:00
Christopher Tetreault	c8d8d4e8c9	[SVE] Remove calls to VectorType::getNumElements from Transforms/Vectorize Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D82056	2020-08-27 12:02:20 -07:00
Bjorn Pettersson	457f9b7c13	[VectorCombine] Fix for non-zero addrspace when creating vector load from scalar load This is a fixup to commit 43bdac290663f4424f9fb, to make sure the address space from the original load pointer is retained in the vector pointer. Resolves problem with Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed. due to address space mismatch. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D85912	2020-08-13 18:25:32 +02:00
Sanjay Patel	00ebcf5eea	[VectorCombine] early exit if target has no vector registers Based on post-commit discussion in: D81766 Other vectorization passes (SLP and Loop) use this TTI API similarly.	2020-08-12 09:22:31 -04:00
Sanjay Patel	b6c5255819	[VectorCombine] add safety check for 0-width register Based on post-commit discussion in D81766, Hexagon sets this to "0". I'll see if I can come up with a test, but making the obvious code fix first to unblock that target.	2020-08-11 20:30:02 -04:00
Sanjay Patel	0cee2d7cd7	[VectorCombine] try to create vector loads from scalar loads This patch was adjusted to match the most basic pattern that starts with an insertelement (so there's no extract created here). Hopefully, that removes any concern about interfering with other passes. Ie, the transform should almost always be profitable. We could make an argument that this could be part of canonicalization, but we conservatively try not to create vector ops from scalar ops in passes like instcombine. If the transform is not profitable, the backend should be able to re-scalarize the load. Differential Revision: https://reviews.llvm.org/D81766	2020-08-09 09:05:06 -04:00
Benjamin Kramer	996e94e98e	Make helpers static. NFC.	2020-07-09 13:48:56 +02:00
Sanjay Patel	8ded996357	[VectorCombine] try to form vector compare and binop to eliminate scalar ops binop i1 (cmp Pred (ext X, Index0), C0), (cmp Pred (ext X, Index1), C1) --> vcmp = cmp Pred X, VecC ext (binop vNi1 vcmp, (shuffle vcmp, Index1)), Index0 This is a larger pattern than the existing extractelement folds because we can't reasonably vectorize the sub-patterns with constants based on cost model calcs (it doesn't usually make sense to replace a single extracted scalar op with constant operand with a vector op). I salvaged as much of the existing logic as I could, but there might be better ways to share and reduce code. The motivating case from PR43745: https://bugs.llvm.org/show_bug.cgi?id=43745 ...is the special case of a 2-way reduction. We tried to get SLP to handle that particular pattern in D59710, but that caused crashing and regressions. This patch is more general, but hopefully safer. The v2f64 test with SSE2 surprised me - the cost model accounting looks like this: OldCost = 0 (free extract of f64 at index 0) + 1 (extract of f64 at index 1) + 2 (scalar fcmps) + 1 (and of bools) = 4 NewCost = 2 (vector fcmp) + 1 (shuffle) + 1 (vector 'and') + 1 (extract of bool) = 5 Differential Revision: https://reviews.llvm.org/D82474	2020-06-29 10:38:52 -04:00
Sanjay Patel	68105631e7	[VectorCombine] refactor - make helper function for extract to shuffle logic; NFC Preliminary for D82474	2020-06-29 09:55:34 -04:00
Sanjay Patel	71738deebf	[VectorCombine] give invalid index value a name; NFC	2020-06-24 11:10:36 -04:00
Sanjay Patel	92d013ae17	[VectorCombine] do not use magic number for undef mask element; NFC	2020-06-22 20:47:09 -04:00
Sanjay Patel	f0d13192c9	[VectorCombine] make helper function for shift-shuffle; NFC This will probably be useful for other extract patterns.	2020-06-22 12:23:52 -04:00
Sanjay Patel	ad12362f7d	[VectorCombine] add helper to replace uses and rename The tests are regenerated to show a path that missed renaming, but there should be no functional difference from this patch.	2020-06-22 09:58:49 -04:00
Sanjay Patel	4aa4c0ae7d	[VectorCombine] add/use pass-level IRBuilder This saves creating/destroying a builder every time we perform some transform. The tests show instruction ordering diffs resulting from always inserting at the root instruction now, but those should be benign.	2020-06-22 09:01:29 -04:00
Sanjay Patel	b59dba09e0	[VectorCombine] improve IR debugging by providing/salvaging value names The tests are regenerated to show the diffs, but there should be no functional change from this patch.	2020-06-22 08:35:47 -04:00
Sanjay Patel	357def70f5	[VectorCombine] create class for pass to hold analyses, etc; NFC This doesn't change anything currently, but it would make sense to create a class-level IRBuilder instead of recreating that everywhere. As we expand to more optimizations, we will probably also want to hold things like the DataLayout or other constant refs in here too.	2020-06-21 16:07:33 -04:00
Sanjay Patel	cbbf7aa25a	[VectorCombine] fix assert for type of compare operand As shown in the post-commit comment for D81661 - we need to loosen the type assertion to allow scalarization of a compare for vectors of pointers.	2020-06-20 15:20:17 -04:00
Sanjay Patel	46f661f4a4	[VectorCombine] refactor extract-extract logic; NFCI	2020-06-19 14:52:27 -04:00
Sanjay Patel	54234e8ffc	[VectorCombine] fix crash while transforming constants This is a variation of the proposal in D82049 with an extra test.	2020-06-19 12:30:32 -04:00
Sanjay Patel	16ab4831de	[IRBuilder] add/use wrapper to create a generic compare based on predicate type; NFC The predicate can always be used to distinguish between icmp and fcmp, so we don't need to keep repeating this check in the callers.	2020-06-18 15:47:06 -04:00
Sanjay Patel	fe400c6e8c	[VectorCombine] scalarize compares with insertelement operand(s) Generalize scalarization (recently enhanced with D80885) to allow compares as well as binops. Similar to binops, we are avoiding scalarization of a loaded value because that could avoid a register transfer in codegen. This requires 1 extra predicate that I am aware of: we do not want to scalarize the condition value of a vector select. That might also invert a transform that we do in instcombine that prefers a vector condition operand for a vector select. I think this is the final step in solving PR37463: https://bugs.llvm.org/show_bug.cgi?id=37463 Differential Revision: https://reviews.llvm.org/D81661	2020-06-16 13:48:10 -04:00
Roman Lebedev	d7bb9dc47c	[NFCI] VectorCombine: add statistic for bitcast(shuf()) -> shuf(bitcast()) xform	2020-06-12 23:10:53 +03:00
Sanjay Patel	85a5a1a823	[VectorCombine] remove unused parameters; NFC	2020-06-11 19:15:03 -04:00
Simon Pilgrim	7cc81c9a51	[VectorCombine] scalarizeBinop - support an all-constant src vector operand scalarizeBinop currently folds vec_bo((inselt VecC0, V0, Index), (inselt VecC1, V1, Index)) -> inselt(vec_bo(VecC0, VecC1), scl_bo(V0,V1), Index) This patch extends this to account for cases where one of the vec_bo operands is already all-constant and performs similar cost checks to determine if the scalar binop with a constant still makes sense: vec_bo((inselt VecC0, V0, Index), VecC1) -> inselt(vec_bo(VecC0, VecC1), scl_bo(V0,extractelt(V1,Index)), Index) Fixes PR42174 Differential Revision: https://reviews.llvm.org/D80885	2020-06-09 19:02:05 +01:00
Simon Pilgrim	4dedca4724	LoopAnalysisManager.h - reduce includes to forward declarations. NFC. Move implicit include dependencies down to header/source files.	2020-06-06 14:06:46 +01:00
Sanjay Patel	a405d76236	[PatternMatch] abbreviate vector inst matchers; NFC Readability is not reduced with these opcodes/match lines, so reduce odds of awkward wrapping from 80-col limit.	2020-05-24 09:19:47 -04:00
Sanjay Patel	13a25dc6e3	[VectorCombine] set preserve alias analysis As noted in D80236, moving the pass in the pipeline exposed this shortcoming. Extra work to recalculate the alias results showed up as a compile-time slowdown.	2020-05-22 16:25:16 -04:00
Sanjay Patel	a80619608b	[VectorCombine] forward walk through instructions to improve chaining of transforms This is split off from D79799 - where I was proposing to fully iterate over a function until there are no more transforms. I suspect we are still going to want to do something like that eventually. But we can achieve the same gains much more efficiently on the current set of regression tests just by reversing the order that we visit the instructions. This may also reduce the motivation for D79078, but we are still not getting the optimal pattern for a reduction.	2020-05-16 13:08:01 -04:00
Sanjay Patel	27365d9ef6	[VectorCombine] account for extra uses in scalarization cost Follow-up to D79452. Mimics the extra use cost formula for the inverse transform with extracts.	2020-05-11 15:20:57 -04:00
Sanjay Patel	ba0bcdfe21	[VectorCombine] scalarize binop of inserted elements into vector constants As with the extractelement patterns that are currently in vector-combine, there are going to be several possible variations on this theme. This should be the clearest, simplest example. Scalarization is the right direction for target-independent canonicalization, and InstCombine has some of those folds already, but it doesn't do this. I proposed a similar transform in D50992. Here in vector-combine, we can check the cost model to be sure it's profitable, so there should be less risk. Differential Revision: https://reviews.llvm.org/D79452	2020-05-08 16:31:12 -04:00
Sam Parker	55f623649f	[NFC][TTI] Explicit use of VectorType The API for shuffles and reductions uses generic Type parameters, instead of VectorType, and so assertions and casts are used a lot. This patch makes those types explicit, which means that the clients can't be lazy, but results in less ambiguity, and that can only be a good thing. Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=45562 Differential Revision: https://reviews.llvm.org/D78357	2020-04-20 09:16:52 +01:00
Sanjay Patel	166ecaf93d	[VectorCombine] transform bitcasted shuffle to wider elements bitcast (shuf V, MaskC) --> shuf (bitcast V), MaskC' This is the widen shuffle elements enhancement to D76727. It builds on the analysis and simplifications in D77881 and rG6a7e958a423e. The phase ordering tests show that we can simplify inverse shuffles across a binop in both directions (widen/narrow or narrow/widen) now. There's another potential transform visible in some of the remaining TODOs - move a bitcasted operand of a shuffle after the shuffle. Differential Revision: https://reviews.llvm.org/D78371	2020-04-19 08:24:38 -04:00
Benjamin Kramer	b51ce9af45	Upgrade calls to CreateShuffleVector to use the preferred form of passing an array of ints No functionality change intended.	2020-04-15 12:51:38 +02:00
Christopher Tetreault	6525db3b03	Clean up usages of asserting vector getters in Type Summary: Remove usages of asserting vector getters in Type in preparation for the VectorType refactor. The existence of these functions complicates the refactor while adding little value. Reviewers: rriddle, sdesmalen, efriedma Reviewed By: efriedma Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77259	2020-04-13 12:29:43 -07:00
Sanjay Patel	8105dac4cc	[VectorUtils] rename scaleShuffleMask to narrowShuffleMaskElts; NFC As proposed in D77881, we'll have the related widening operation, so this name becomes too vague. While here, change the function signature to take an 'int' rather than 'size_t' for the scaling factor, add an assert for overflow of 32-bits, and improve the documentation comments.	2020-04-11 10:05:49 -04:00
Sanjay Patel	24269f9eb6	[VectorCombine] try to form a better extractelement Extracting to the same index that we are going to insert back into allows forming select ("blend") shuffles and enables further transforms. Admittedly, this is a quick-fix for a more general problem that I'm hoping to solve by adding transforms for patterns that start with an insertelement. But this might resolve some regressions known to be caused by the extract-extract transform (although I have not gotten more details on those yet). In the motivating case from PR34724: https://bugs.llvm.org/show_bug.cgi?id=34724 The combination of subsequent instcombine and codegen transforms gets us this improvement: vmovshdup %xmm0, %xmm2 ## xmm2 = xmm0[1,1,3,3] vhaddps %xmm1, %xmm1, %xmm4 vmovshdup %xmm1, %xmm3 ## xmm3 = xmm1[1,1,3,3] vaddps %xmm0, %xmm2, %xmm0 vaddps %xmm1, %xmm3, %xmm1 vshufps $200, %xmm4, %xmm0, %xmm0 ## xmm0 = xmm0[0,2],xmm4[0,3] vinsertps $177, %xmm1, %xmm0, %xmm0 ## xmm0 = zero,xmm0[1,2],xmm1[2] --> vmovshdup %xmm0, %xmm2 ## xmm2 = xmm0[1,1,3,3] vhaddps %xmm1, %xmm1, %xmm1 vaddps %xmm0, %xmm2, %xmm0 vshufps $200, %xmm1, %xmm0, %xmm0 ## xmm0 = xmm0[0,2],xmm1[0,3] Differential Revision: https://reviews.llvm.org/D76623	2020-04-03 13:55:13 -04:00
Sanjay Patel	1e7408d565	[VectorCombine] transform bitcasted shuffle to narrower elements bitcast (shuf V, MaskC) --> shuf (bitcast V), MaskC' We do not attempt this in InstCombine because we do not want to change types and create new shuffle ops that are potentially not lowered as well as the original code. Here, we can check the cost model to see if it is worthwhile. I've aggressively enabled this transform even if the types are the same size and/or equal cost because moving the bitcast allows InstCombine to make further simplifications. In the motivating cases from PR35454: https://bugs.llvm.org/show_bug.cgi?id=35454 ...this is enough to let instcombine and the backend eliminate the redundant shuffles, but we probably want to extend VectorCombine to handle the inverse pattern (shuffle-of-bitcast) to get that simplification directly in IR. Differential Revision: https://reviews.llvm.org/D76727	2020-04-02 13:30:22 -04:00
Sanjay Patel	c46bba2092	[VectorCombine] skip debug intrinsics first for efficiency	2020-03-29 13:58:04 -04:00
Sanjay Patel	2a676af4c8	[VectorCombine] fold extract-extract-op with different extraction indexes opcode (extelt V0, Ext0), (ext V1, Ext1) --> extelt (opcode (splat V0, Ext0), V1), Ext1 The first part of this patch generalizes the cost calculation to accept different extraction indexes. The second part creates a shuffle+extract before feeding into the existing code to create a vector op+extract. The patch conservatively uses "TargetTransformInfo::SK_PermuteSingleSrc" rather than "TargetTransformInfo::SK_Broadcast" (splat specifically from element 0) because we do not have a more general "SK_Splat" currently. That does not affect any of the current regression tests, but we might be able to find some cost model target specialization where that comes into play. I suspect that we can expose some missing x86 horizontal op codegen with this transform, so I'm speculatively adding a debug flag to disable the binop variant of this transform to allow easier testing. The test changes show that we're sensitive to cost model diffs (as we should be), so that means that patches like D74976 should have better coverage. Differential Revision: https://reviews.llvm.org/D75689	2020-03-08 09:57:55 -04:00
Austin Kerbow	6f849d2fc0	[VectorCombine] Fix assert on compare extract index Extract index could be a different integral type. Differential Revision: https://reviews.llvm.org/D75327	2020-02-28 10:37:08 -08:00
Sanjay Patel	98b9a418a9	[VectorCombine] add a debug flag to skip all transforms As suggested in D75145 - I'm not sure why, but several passes have this kind of disable/enable flag implemented at the pass manager level. But that means we have to duplicate the flag for both pass managers and add code to check the flag every time the pass appears in the pipeline. We want a debug option to see if this pass is misbehaving regardless of the pass managers, so just add a disablement check at the single point before any transforms run. Differential Revision: https://reviews.llvm.org/D75204	2020-02-26 15:15:42 -05:00
Sanjay Patel	653c6caf15	[VectorCombine] make cost calc consistent for binops and cmps Code duplication (subsequently removed by refactoring) allowed a logic discrepancy to creep in here. We were being conservative about creating a vector binop -- but not a vector cmp -- in the case where a vector op has the same estimated cost as the scalar op. We want to be more aggressive here because that can allow other combines based on reduced instruction count/uses. We can reverse the transform in DAGCombiner (potentially with a more accurate cost model) if this causes regressions. AFAIK, this does not conflict with InstCombine. We have a scalarize transform there, but it relies on finding a constant operand or a matching insertelement, so that means it eliminates an extractelement from the sequence (so we won't have 2 extracts by the time we get here if InstCombine succeeds). Differential Revision: https://reviews.llvm.org/D75062	2020-02-25 08:41:59 -05:00
Sanjay Patel	d00cc088dd	[VectorCombine] refactor to reduce duplicated code; NFC This should be the last step in the current cleanup. Follow-ups should resolve the TODO about cost calc and enable the more general case where we extract different elements.	2020-02-21 15:56:00 -05:00
Sanjay Patel	dd89b4293b	[VectorCombine] refactor cost calcs to reduce duplication; NFC More cleanup is possible now, but we probably need to resolve the TODO about the existing difference between compares and binops.	2020-02-21 15:12:00 -05:00
Sanjay Patel	e7f26faadd	[VectorCombine] refactor matching code to reduce duplication; NFC cmp/binop were already diverging even though they are largely the same logic.	2020-02-21 12:06:51 -05:00
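
Illustration for commits dc1e85f7b4 and 8105dac4cc (mask narrowing). This is a minimal sketch, not the LLVM implementation: narrowMask and allLanesEqual are hypothetical standalone helpers written for this note, using std::vector rather than LLVM's containers. It shows why a splat mask stops looking like a splat once each wide element is split into narrower ones, which is the <0, 1, 0, 1, ...> sequence mentioned in dc1e85f7b4.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (assumption, not an LLVM API): rewrite a shuffle mask
// so that it addresses elements that are `scale` times narrower.
std::vector<int> narrowMask(int scale, const std::vector<int> &mask) {
  assert(scale > 0 && "scale must be positive");
  std::vector<int> narrowed;
  narrowed.reserve(mask.size() * scale);
  for (int elt : mask) {
    for (int slice = 0; slice < scale; ++slice) {
      // Negative entries (undef lanes) are repeated unchanged; any other
      // entry selects the matching narrow slice of the wide element.
      narrowed.push_back(elt < 0 ? elt : elt * scale + slice);
    }
  }
  return narrowed;
}

// Hypothetical helper: true when every lane selects the same source element.
bool allLanesEqual(const std::vector<int> &mask) {
  for (int elt : mask)
    if (elt != mask.front())
      return false;
  return true;
}
```

With a fixed-width mask such as {0, 0, 0, 0} and a scale of 2, the narrowed mask alternates between 0 and 1, so the simple all-lanes-equal test no longer holds. For scalable types the mask length is not known at compile time, which is the difficulty the commit message describes.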

55 Commits
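
Illustration for commit 8ded996357 (cost accounting for the v2f64 SSE2 test). This is a sketch under stated assumptions: the struct and function names are hypothetical, the individual costs are the ones quoted in the commit message, and the decision rule (transform unless the new cost exceeds the old cost) is an assumption based on the equal-cost discussion in commit 653c6caf15, not a copy of the pass's code.

```cpp
#include <cassert>

// Hypothetical cost records (assumption): the fields mirror the terms listed
// in the commit message, not any TargetTransformInfo structure.
struct ScalarCosts {
  int extract0;    // extract of element at index 0
  int extract1;    // extract of element at index 1
  int scalarCmps;  // both scalar compares together
  int scalarBinop; // binop on the two i1 results
};

struct VectorCosts {
  int vectorCmp;   // one vector compare
  int shuffle;     // shuffle to line up the second lane
  int vectorBinop; // vector binop on the compare results
  int extract;     // final extract of the i1 result
};

int oldCost(const ScalarCosts &c) {
  return c.extract0 + c.extract1 + c.scalarCmps + c.scalarBinop;
}

int newCost(const VectorCosts &c) {
  return c.vectorCmp + c.shuffle + c.vectorBinop + c.extract;
}

// Assumed rule: keep the scalar form only when the vector form is strictly
// more expensive; ties go to the vector form.
bool shouldVectorize(int oldC, int newC) { return newC <= oldC; }
```

Plugging in the numbers from the commit message, the scalar sequence costs 0 + 1 + 2 + 1 = 4 and the vector sequence costs 2 + 1 + 1 + 1 = 5, so under the assumed rule the transform would not fire for that test.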