archived-llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2026-01-31 01:35:20 +01:00

Author	SHA1	Message	Date
Alexey Bataev	2b896fc0d5	[SLP]Improved isGatherShuffledEntry, NFC. Reworked isGatherShuffledEntry function, simplified and moved common code to the lambda (it will go away once the non-power-2 patch lands).	2021-04-27 05:59:46 -07:00
Florian Hahn	eed27336cf	[LV] Hoist code to get vector loop latch (NFC). Address suggestion from D99294.	2021-04-27 13:30:17 +01:00
Sanjay Patel	5d538c6cd1	[IndVars] avoid crash in LFTR when assuming an add recurrence The test is a crasher reduced from: https://llvm.org/PR49993 linearFunctionTestReplace() assumes that we have an add recurrence, so check for that as a condition of matching a loop counter. Differential Revision: https://reviews.llvm.org/D101291	2021-04-27 08:26:02 -04:00
Florian Hahn	1eea0872a4	[VPlan] Use recursive traversal iterator in VPSlotTracker. This patch simplifies VPSlotTracker by using the recursive traversal iterator to traverse all blocks in a VPlan in reverse post-order when numbering VPValues in a plan. This depends on a fix to RPOT (D100169). It also extends the traversal unit tests to check RPOT. Reviewed By: a.elovikov Differential Revision: https://reviews.llvm.org/D100176	2021-04-27 12:39:06 +01:00
Vitaly Buka	519ec0d9ab	[NFC] Fix "not used" warning	2021-04-26 22:09:23 -07:00
Arthur Eubanks	1f40f70a19	[Inliner] Make ModuleInlinerWrapperPass return PreservedAnalyses::all() The ModulePassManager should already have taken care of all analysis invalidation. Without this change, upcoming changes will cause more invalidation than necessary. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D101320	2021-04-26 17:22:35 -07:00
William S. Moses	e7084f2810	[NVPTX] Enable lowering of atomics on local memory LLVM does not have valid assembly backends for atomicrmw on local memory. However, as this memory is thread local, we should be able to lower this to the relevant load/store. Differential Revision: https://reviews.llvm.org/D98650	2021-04-26 20:12:12 -04:00
Hongtao Yu	7416a011e2	[CSSPGO] Unblock optimizations with pseudo probe instrumentation part 2. As a follow-up to D95982, this patch continues unblocking optimizations that are blocked by pseudo probe instrumentation. The optimizations unblocked are: - In-block load propagation. - In-block dead store elimination - Memory copy optimization that turns stores to consecutive memories into a memset. These optimizations are local to a block, so they shouldn't affect the profile quality. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D100075	2021-04-26 16:52:33 -07:00
Fangrui Song	d0dc7f2344	[ADT] Remove StatisticBase and make NoopStatistic empty In LLVM_ENABLE_STATS=0 builds, `llvm::Statistic` maps to `llvm::NoopStatistic` but has 3 mostly unused pointers. GlobalOpt considers that the pointers can potentially retain allocated objects, so GlobalOpt cannot optimize out the `NoopStatistic` variables (see D69428 for more context), wasting 23KiB for stage 2 clang. This patch makes `NoopStatistic` empty and thus reclaims the wasted space. The clang size is even smaller than applying D69428 (slightly smaller in both .bss and .text). ``` # This means the D69428 optimization on clang is mostly nullified by this patch. HEAD+D69428: size(.bss) = 0x0725a8 HEAD+D101211: size(.bss) = 0x072238 # bloaty - HEAD+D69428 vs HEAD+D101211 # With D101211, we also save a lot of string table space (.rodata). FILE SIZE VM SIZE -------------- -------------- -0.0% -32 -0.0% -24 .eh_frame -0.0% -336 [ = ] 0 .symtab -0.0% -360 [ = ] 0 .strtab [ = ] 0 -0.2% -880 .bss -0.0% -2.11Ki -0.0% -2.11Ki .rodata -0.0% -2.89Ki -0.0% -2.89Ki .text -0.0% -5.71Ki -0.0% -5.88Ki TOTAL ``` Note: LoopFuse is a disabled pass. For now this patch adds `#if LLVM_ENABLE_STATS` so `OptimizationRemarkMissed` is skipped in LLVM_ENABLE_STATS==0 builds. If these `OptimizationRemarkMissed` are useful in LLVM_ENABLE_STATS==0 builds, we can replace `llvm::Statistic` with `llvm::TrackingStatistic`, or use a different abstraction to keep track of the strings. Similarly, skip the code in `mlir/lib/Pass/PassStatistics.cpp` which calls `getName`/`getDesc`/`getValue`. Reviewed By: lattner Differential Revision: https://reviews.llvm.org/D101211	2021-04-26 16:47:32 -07:00
William S. Moses	5b35b95712	Revert "[NVPTX] Enable lowering of atomics on local memory" This reverts commit fede99d386ec9e7bab2762aa16cb10c0513ae464.	2021-04-26 19:33:01 -04:00
William S. Moses	9ac62ee58d	[NVPTX] Enable lowering of atomics on local memory LLVM does not have valid assembly backends for atomicrmw on local memory. However, as this memory is thread local, we should be able to lower this to the relevant load/store. Differential Revision: https://reviews.llvm.org/D98650	2021-04-26 19:27:27 -04:00
Lei Zhang	6f2c652025	Revert "[ADT] Remove StatisticBase and make NoopStatistic empty" This reverts commit b5403117814a7c39b944839e10492493f2ceb4ac because it breaks MLIR build: https://buildkite.com/mlir/mlir-core/builds/13299#ad0f8901-dfa4-43cf-81b8-7940e2c6c15b	2021-04-26 18:31:04 -04:00
Michael Kruse	7bf3404c76	[SimplifyCFG] Preserve metadata when unconditionalizing branches (same target). When replacing a conditional branch by an unconditional one because the targets are identical, transfer the metadata to the new branch instruction. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D101226	2021-04-26 17:23:01 -05:00
Fangrui Song	7fa56f879f	[ADT] Remove StatisticBase and make NoopStatistic empty In LLVM_ENABLE_STATS=0 builds, `llvm::Statistic` maps to `llvm::NoopStatistic` but has 3 unused pointers. GlobalOpt considers that the pointers can potentially retain allocated objects, so GlobalOpt cannot optimize out the `NoopStatistic` variables (see D69428 for more context), wasting 23KiB for stage 2 clang. This patch makes `NoopStatistic` empty and thus reclaims the wasted space. The clang size is even smaller than applying D69428 (slightly smaller in both .bss and .text). ``` # This means the D69428 optimization on clang is mostly nullified by this patch. HEAD+D69428: size(.bss) = 0x0725a8 HEAD+D101211: size(.bss) = 0x072238 # bloaty - HEAD+D69428 vs HEAD+D101211 # With D101211, we also save a lot of string table space (.rodata). FILE SIZE VM SIZE -------------- -------------- -0.0% -32 -0.0% -24 .eh_frame -0.0% -336 [ = ] 0 .symtab -0.0% -360 [ = ] 0 .strtab [ = ] 0 -0.2% -880 .bss -0.0% -2.11Ki -0.0% -2.11Ki .rodata -0.0% -2.89Ki -0.0% -2.89Ki .text -0.0% -5.71Ki -0.0% -5.88Ki TOTAL ``` Note: LoopFuse is a disabled pass. This patch adds `#if LLVM_ENABLE_STATS` so `OptimizationRemarkMissed` is skipped in LLVM_ENABLE_STATS==0 builds. If these `OptimizationRemarkMissed` are useful and not noisy, we can replace `llvm::Statistic` with `llvm::TrackingStatistic` in the future. Reviewed By: lattner Differential Revision: https://reviews.llvm.org/D101211	2021-04-26 13:39:35 -07:00
Fangrui Song	4cb0dafe51	[gcov] Set nounwind and respect module flags metadata "frame-pointer" & "uwtable" for synthesized functions This applies the D100251 mechanism to the gcov instrumentation pass. With this patch, `-fno-omit-frame-pointer` in `clang -fprofile-arcs -O1 -fno-omit-frame-pointer` will be respected for synthesized `__llvm_gcov_writeout,__llvm_gcov_reset,__llvm_gcov_init` functions: the frame pointer will be kept (note: on many targets -O1 eliminates the frame pointer by default). `clang -fno-exceptions -fno-asynchronous-unwind-tables -g -fprofile-arcs` will produce .debug_frame instead of .eh_frame. Fix: https://github.com/ClangBuiltLinux/linux/issues/955 Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D101129	2021-04-26 13:30:21 -07:00
Michael Kruse	11f3dcfec5	[SimplifyCFG] Preserve metadata when unconditionalizing branches (constant condition). When replacing a conditional branch by an unconditional one because the condition is a constant, transfer the metadata to the new branch instruction. Part of fix for llvm.org/PR50060 Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D101141	2021-04-26 10:57:31 -05:00
Dávid Bolvanský	8752b55ca6	[InstCombine] C - ctpop(a) -> ctpop(~a) if C is bitwidth (PR50104) Proof: https://alive2.llvm.org/ce/z/mncA9K Solves https://bugs.llvm.org/show_bug.cgi?id=50104 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D101257	2021-04-26 15:40:54 +02:00
Yuanbo Li	5b6b066173	[LSR][DebugInfo] Don't unnecessarily drop DebugLocs When transforming a loop terminating condition into a "max" comparison, the DebugLoc from the old condition should be set on the newly created comparison. They are the same operation, just optimized. Fixes PR48067. Differential Revision: https://reviews.llvm.org/D98218	2021-04-26 13:14:42 +01:00
Florian Hahn	cc6f98f11f	[VPlan] Make blocksOnly work properly with ranges over const pointers. When iterating over const blocks, the base type in the lambdas needs to use const VPBlockBase *, otherwise it cannot be used with input iterators over const VPBlockBase. Also adjust the type of the input iterator range to const &, as it does not take ownership of the input range.	2021-04-26 10:52:35 +01:00
Florian Hahn	55f97d3d98	[VPlan] Add VPBlockUtils::blocksOnly helper. This patch adds a blocksOnly helper which takes an iterator range over VPBlockBase * or const VPBlockBase * and returns an iterator range that only includes BlockTy blocks. The accesses are cast to BlockTy. Reviewed By: a.elovikov Differential Revision: https://reviews.llvm.org/D101093	2021-04-25 17:38:09 +01:00
Florian Hahn	b7e8ac7d19	[NewGVN] Properly transfer PredDep in move constructor.	2021-04-25 11:22:59 +01:00
Florian Hahn	1f7961e68e	[NewGVN] Use ExprResult to add extra predicate users. This patch updates performSymbolicPredicateInfoEvaluation to manage registering additional dependencies using ExprResult. Similar to D99987, this fixes an issue where we failed to track the correct dependency for a phi-of-ops value, which is marked as temporary. Fixes PR49873. Reviewed By: asbirlea, ruiling Differential Revision: https://reviews.llvm.org/D100560	2021-04-25 11:13:32 +01:00
Florian Hahn	2aa8af916f	[NewGVN] Use performSymbolicEvaluation instead of createExpression. performSymbolicEvaluation is used to obtain the symbolic expression when visiting instructions and this is used to determine their congruence class. performSymbolicEvaluation only creates expressions for certain instructions (via createExpression). For unsupported instructions, 'unknown' expression are created. The use of createExpression in processOutgoingEdges means we may simplify the condition in processOutgoingEdges to a constant in the initial round of processing, but we use Unknown(I) for the congruence class. If an operand of I changes the expression Unknown(I) stays the same, so there is no update of the congruence class of I. Hence it won't get re-visited. So if an operand of I changes in a way that causes createExpression to return different result, this update is missed. This patch updates the code to use performSymbolicEvaluation, to be symmetric with the congruence class updating code. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D99990	2021-04-24 18:49:07 +01:00
Dávid Bolvanský	abe87e4bc2	[InstCombine] Fixed UB in foldCtpop	2021-04-24 19:44:16 +02:00
Dávid Bolvanský	3884c3dc13	[InstCombine] ctpop(rot(X)) -> ctpop(X) Proof: https://alive2.llvm.org/ce/z/ss2zyt - rotl https://alive2.llvm.org/ce/z/ZM7Aue - rotr Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D101235	2021-04-24 18:25:03 +02:00
Dávid Bolvanský	e156538759	[InstCombine] ctpop(X) + ctpop(Y) => ctpop(X | Y) if X and Y have no common bits (PR48999) For example: ``` int src(unsigned int a, unsigned int b) { return __builtin_popcount(a << 16) + __builtin_popcount(b >> 16); } int tgt(unsigned int a, unsigned int b) { return __builtin_popcount((a << 16) | (b >> 16)); } ``` Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D101210	2021-04-24 17:52:10 +02:00
dfukalov	4398288fa7	[GVN] Clobber partially aliased loads. Use offsets stored in `AliasResult` implemented in D98718. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D95543	2021-04-24 14:14:20 +03:00
wlei	1f3e5b2188	[CSSPGO] Fix missing debug info of dangling pseudo probe While doing speculative execution opt, it conservatively drops all insn's debug info in the merged `ThenBB`(see the loop at line 2384) including the dangling probe. The missing debug info of the dangling probe will cause the wrong inference computation. So we should avoid dropping the debug info from the pseudo probe; this change tries to fix this by moving the to-be dangling probe to the merging target BB before the debug info is dropped. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D101195	2021-04-23 14:26:47 -07:00
Dávid Bolvanský	a2061d4adb	[InstCombine] X - usub.sat(X, Y) => umin(X, Y) Pattern regressed in LLVM 9 with the introduction of usub.sat. Fixes https://bugs.llvm.org/show_bug.cgi?id=42178#c2 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D101184	2021-04-23 21:13:07 +02:00
Hongtao Yu	cbb480a8eb	[CSSPGO] Fix incorrect prorating indirect call distribution factor that leads to target count loss. Pseudo probe distribution factor is used to scale down profile samples to avoid misleading the counts inference due to the usage of "maximum" in `getBlockWeight`. For callsites, the scaling down can come from code duplication prior to the sample profile loader (prelink or postlink), or due to the indirect call promotion in sample loader inliner. This patch fixes an issue in sample loader ICP where the leftover indirect callsite scaling down causes the loss of non-promoted call target samples unexpectedly. While the scaling down is to favor BFI/BPI with an accurate callsite count, it doesn't fit in the current distribution factor that represents code duplication changes. Ideally, we would need two factors, one is for code duplication, the other is for ICP. However this seems overcomplicated. I'm going to trade one usage (callsite counts) for the other (call target counts). Seeing perf win on one benchmark (mcf) of SPEC2017 with others unchanged. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D100993	2021-04-23 11:09:22 -07:00
Sanjay Patel	173c7ee255	[InstCombine] fold 'not' of ctpop in parity pattern As discussed in https://llvm.org/PR50096 , we could convert the 'not' into a 'sub' and see the same fold. That's because we already have another demanded bits optimization for 'sub'. We could add a related transform for odd-number-of-type-bits, but that seems unlikely to be practical. https://alive2.llvm.org/ce/z/TWJZXr	2021-04-23 13:23:24 -04:00
Florian Hahn	4b94f2fec6	[VPlan] Add GraphTraits impl to traverse through VPRegionBlock. This patch adds a new iterator to traverse through VPRegionBlocks and a GraphTraits specialization using the iterator to traverse through VPRegionBlocks. Because there is already a GraphTraits specialization for VPBlockBase * and co, a new VPBlockRecursiveTraversalWrapper helper is introduced. This allows us to provide a new GraphTraits specialization for that type. Users can use the new recursive traversal by using this wrapper. The graph trait visits both the entry block of a region, as well as all its successors. Exit blocks of a region implicitly have their parent region's successors. This ensures all blocks in a region are visited before any blocks in a successor region when doing a reverse post-order traversal of the graph. Reviewed By: a.elovikov Differential Revision: https://reviews.llvm.org/D100175	2021-04-23 17:26:47 +01:00
Sander de Smalen	b71b6a828f	[TTI] NFC: Change getIntImmCost[Inst|Intrin] to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D100565	2021-04-23 16:06:36 +01:00
Sander de Smalen	b447679db3	[TTI] NFC: Change getScalingFactorCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D100564	2021-04-23 16:06:36 +01:00
Timm Bäder	438fece2aa	[llvm][NFC] Fix assert indentation This triggers GCC's misleading-indentation checker.	2021-04-23 14:44:05 +02:00
Dávid Bolvanský	30eb4998c0	[InstCombine] Fixed crash when setting align attr for memalign	2021-04-23 14:04:08 +02:00
Florian Hahn	baa2054364	Recommit "[NewGVN] Track simplification dependencies for phi-of-ops." This recommits 4f5da356ff35a218f23f0b0c4d08aee90da7de6e, including explicit implementations of a move constructor and deleted copy constructors/assignment operators, to fix failures with some compilers. This reverts the revert 74854d00e854196445727a49df58fe5768d9ed5b.	2021-04-23 11:27:43 +01:00
Stephen Tozer	8b8275b1fc	Re-reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands" Previous build failures were caused by an error in bitcode reading and writing for DIArgList metadata, which has been fixed in e5d844b587. There were also some unnecessary asserts that were being triggered on certain builds, which have been removed. This reverts commit dad5caa59e6b2bde8d6cf5b64a972c393c526c82.	2021-04-23 10:54:01 +01:00
Florian Hahn	3a8b302d64	Revert "[NewGVN] Track simplification dependencies for phi-of-ops." This reverts commit 4f5da356ff35a218f23f0b0c4d08aee90da7de6e. This causes some buildbot failures, e.g. https://lab.llvm.org/buildbot/#/builders/139/builds/3019	2021-04-23 09:56:17 +01:00
Florian Hahn	a088ff213f	[NewGVN] Track simplification dependencies for phi-of-ops. If we are using a simplified value, we need to add an extra dependency on this value, because changes to the class of the simplified value may require us to invalidate any decision based on that value. This is done by adding such values as additional users, however the current code does not exclude temporary instructions. At the moment, this means that we miss those dependencies for phi-of-ops, because they are temporary instructions at this point. We instead need to add the extra dependencies to the root instruction of the phi-of-ops. This patch pushes the responsibility of adding extra users to the callers of createExpression & performSymbolicEvaluation. At those points, it is clearer which real instruction to pick. Alternatively we could either pass the 'real' instruction as additional argument or use another map, but I think the approach in the patch makes things a bit easier to follow. Fixes PR35074. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D99987	2021-04-23 09:48:38 +01:00
KAWASHIMA Takahiro	47186c3ead	[LoopReroll] Fix rerolling loop with extra instructions Fixes PR47627 This fix suppresses rerolling a loop which has an unrerollable instruction. Sample IR for the explanation below: ``` define void @foo([2 x i32]* nocapture %a) { entry: br label %loop loop: ; base instruction %indvar = phi i64 [ 0, %entry ], [ %indvar.next, %loop ] ; unrerollable instructions %stptrx = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %indvar, i64 0 store i32 999, i32* %stptrx, align 4 ; extra simple arithmetic operations, used by root instructions %plus20 = add nuw nsw i64 %indvar, 20 %plus10 = add nuw nsw i64 %indvar, 10 ; root instruction 0 %ldptr0 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus20, i64 0 %value0 = load i32, i32* %ldptr0, align 4 %stptr0 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus10, i64 0 store i32 %value0, i32* %stptr0, align 4 ; root instruction 1 %ldptr1 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus20, i64 1 %value1 = load i32, i32* %ldptr1, align 4 %stptr1 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus10, i64 1 store i32 %value1, i32* %stptr1, align 4 ; loop-increment and latch %indvar.next = add nuw nsw i64 %indvar, 1 %exitcond = icmp eq i64 %indvar.next, 5 br i1 %exitcond, label %exit, label %loop exit: ret void } ``` In the loop rerolling pass, `%indvar` and `%indvar.next` are appended to the `LoopIncs` vector in the `LoopReroll::DAGRootTracker::findRoots` function. Before this fix, two instructions with `unrerollable instructions` comment above are marked as `IL_All` at the end of the `LoopReroll::DAGRootTracker::collectUsedInstructions` function, as well as instructions with `extra simple arithmetic operations` comment and `loop-increment and latch` comment. It is incorrect because `IL_All` means that the instruction should be executed in all iterations of the rerolled loop but the `store` instruction should not. 
This fix rejects instructions which may have side effects and don't belong to def-use chains of any root instructions and reductions. See https://bugs.llvm.org/show_bug.cgi?id=47627 for more information.	2021-04-23 15:14:46 +09:00
Elia Geretto	99885567cb	[dfsan] Fix Len argument type in call to __dfsan_mem_transfer_callback This patch is supposed to solve: https://bugs.llvm.org/show_bug.cgi?id=50075 The function `__dfsan_mem_transfer_callback` takes a `Len` argument of type `i64`; however, when processing a `MemTransferInst` such as `llvm.memcpy.p0i8.p0i8.i32`, the `len` argument has type `i32`. In order to make the type of `len` compatible with the one of the callback argument, this change zero-extends it when necessary. Reviewed By: stephan.yichao.zhao, gbalats Differential Revision: https://reviews.llvm.org/D101048	2021-04-22 21:12:20 +00:00
Arthur Eubanks	21048e7590	[GlobalOpt] Don't replace alias with aliasee if aliasee is interposable Both the alias and aliasee linkage are important. PR27866 provides some background. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D99629	2021-04-22 13:12:34 -07:00
Philip Reames	8361e53fbe	Revert "[instcombine] Exploit UB implied by nofree attributes" This change effectively reverts 86664638, but since there have been some changes on top and I wanted to leave the tests in, it's not a mechanical revert. Why revert this now? Two main reasons: 1) There is continuing discussion around the semantics of nofree. I am getting increasingly uncomfortable with the seeming possibility we might redefine nofree in a way incompatible with these changes. 2) There was a reported miscompile triggered by this change (https://github.com/emscripten-core/emscripten/issues/9443). At first, I was making good progress on tracking down the issues exposed and those issues appeared to be unrelated latent bugs. Now that we've found at least one bug in the original change, and the investigation has stalled, I'm no longer comfortable leaving this in tree. In retrospect, I probably should have reverted this earlier and investigated the issues once the triggering change was out of tree.	2021-04-22 10:53:17 -07:00
Jianzhou Zhao	94cf740f57	[dfsan] Track origin at loads The first version of origin tracking tracks only memory stores. Although this is sufficient for understanding correct flows, it is hard to figure out where an undefined value is read from. To find reading undefined values, we still have to do a reverse binary search from the last store in the chain with printing and logging at possible code paths. This is quite inefficient. Tracking memory load instructions can help this case. The main issues of tracking loads are performance and code size overheads. With tracking only stores, the code size overhead is 38%, memory overhead is 1x, and cpu overhead is 3x. In practice #load is much larger than #store, so both code size and cpu overhead increases. The first blocker is code size overhead: link fails if we inline tracking loads. The workaround is using external function calls to propagate metadata. This is also the workaround ASan uses. The cpu overhead is ~10x. This is a trade off between debuggability and performance, and will be used only when debugging cases that tracking only stores is not enough. Reviewed By: gbalats Differential Revision: https://reviews.llvm.org/D100967	2021-04-22 16:25:24 +00:00
Alexey Bataev	dab3d7322e	[SLP]Skip undefs trying to find perfect/shuffled tree entries matching. We can skip check for undefs trying to find perfect/shuffled tree entries matching, they can be ignored completely improving the final cost/vectorization results. Differential Revision: https://reviews.llvm.org/D101061	2021-04-22 08:59:07 -07:00
Joe Ellis	0427e8801a	[LoopVectorize] Fix bug where predicated loads/stores were dropped This commit fixes a bug where the loop vectoriser fails to predicate loads/stores when interleaving for targets that support masked loads and stores. Code such as: void foo(int *restrict data1, int *restrict data2) { int counter = 1024; while (counter--) if (data1[counter] > data2[counter]) data1[counter] = data2[counter]; } ... could previously be transformed in such a way that the predicated store implied by: if (data1[counter] > data2[counter]) data1[counter] = data2[counter]; ... was lost, resulting in miscompiles. This bug was causing some tests in llvm-test-suite to fail when built for SVE. Differential Revision: https://reviews.llvm.org/D99569	2021-04-22 15:05:54 +00:00
Alexey Bataev	be1b72b8b6	[SLP]Replace more `TTI` with `TTIRef`, NFC. To pacify MSVC buildbots.	2021-04-22 07:53:20 -07:00
Alexey Bataev	b09b5f35d0	[SLP]Added explicit ref to TargetTransformInfo to try to pacify MSVC buildbots, NFC.	2021-04-22 07:49:48 -07:00
Alexey Bataev	a899f9f408	[SLP]Improve cost model for the vectorized extractelements. 1. No need to call `areAllUsersVectorized` as later the cost is calculated only if the instruction has one use and gets vectorized. 2. Need to calculate the cost of the dead extractelement more precisely, taking the vector type of the vector operand, not the resulting vector type. Part of D57059. Differential Revision: https://reviews.llvm.org/D99980	2021-04-22 07:40:17 -07:00
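
Several of the [InstCombine] commits listed above state bit-level identities: `C - ctpop(a) -> ctpop(~a)` when C is the bit width, `ctpop(rot(X)) -> ctpop(X)`, `ctpop(X) + ctpop(Y) => ctpop(X | Y)` when X and Y share no bits, and `X - usub.sat(X, Y) => umin(X, Y)`. The sketch below spot-checks them on random 32-bit values. It is not LLVM code and not a proof (the commits link Alive2 proofs for that). The helper names `ctpop`, `rotl`, `usub_sat` and `check_folds` are hypothetical Python stand-ins for the corresponding operations, and the 32-bit width is an assumption made for illustration.

```python
import random

BITS = 32                 # assumed bit width for this illustration
MASK = (1 << BITS) - 1    # keeps Python ints inside 32 bits


def ctpop(x):
    """Population count of x, treated as an unsigned 32-bit value."""
    return bin(x & MASK).count("1")


def rotl(x, k):
    """Rotate the 32-bit value x left by k bits."""
    k %= BITS
    x &= MASK
    return ((x << k) | (x >> (BITS - k))) & MASK


def usub_sat(x, y):
    """Unsigned saturating subtraction: clamps at zero instead of wrapping."""
    return x - y if x > y else 0


def check_folds(trials=1000, seed=0):
    """Spot-check the four folds on random inputs; returns True if all hold."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = rng.getrandbits(BITS)
        b = rng.getrandbits(BITS)
        k = rng.randrange(BITS)

        # C - ctpop(a) -> ctpop(~a) when C is the bit width
        assert BITS - ctpop(a) == ctpop(~a & MASK)

        # ctpop(rot(X)) -> ctpop(X): rotation only moves bits around
        assert ctpop(rotl(a, k)) == ctpop(a)

        # ctpop(X) + ctpop(Y) => ctpop(X | Y) when X and Y share no bits;
        # x occupies the high half and y the low half, as in the src/tgt example
        x = (a << 16) & MASK
        y = b >> 16
        assert x & y == 0
        assert ctpop(x) + ctpop(y) == ctpop(x | y)

        # X - usub.sat(X, Y) => umin(X, Y)
        assert a - usub_sat(a, b) == min(a, b)
    return True
```

Calling `check_folds()` runs every identity on each random triple and raises `AssertionError` on the first counterexample. Random testing of this kind can only build confidence; it cannot establish the folds for all inputs.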

27173 Commits