llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-03-02 09:16:40 +00:00

Author	SHA1	Message	Date
Roman Lebedev	bf8ef29ff5	[NFC][InstCombine] Add a PHI-of-insertvalues test with different base aggregate types	2020-08-26 09:57:50 +03:00
Roman Lebedev	a2f15b5ff5	Revert "[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad" This reverts commit fcb51d8c2460faa23b71e06abb7e826243887dd6. As buildbots report, there's apparently some missing check to ensure that the types of incoming values match the type of PHI. Let's revert for a moment.	2020-08-26 09:23:22 +03:00
Roman Lebedev	5ec7b497bf	[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad
Since D86306 we do its sibling fold for `insertvalue`; we should also do this for `extractvalue`s. And unlike that one, the results here are, quite honestly, shocking, as can be observed in the vanilla llvm test-suite + RawSpeed results:
```
| statistic name | baseline | proposed | Δ | % | \|%\| |
|----------------------------------------------------|-----------|-----------|--------:|--------:|-------:|
| asm-printer.EmittedInsts | 7945095 | 7942507 | -2588 | -0.03% | 0.03% |
| assembler.ObjectBytes | 273209920 | 273069800 | -140120 | -0.05% | 0.05% |
| early-cse.NumCSE | 2183363 | 2183398 | 35 | 0.00% | 0.00% |
| early-cse.NumSimplify | 541847 | 550017 | 8170 | 1.51% | 1.51% |
| instcombine.NumAggregateReconstructionsSimplified | 2139 | 108 | -2031 | -94.95% | 94.95% |
| instcombine.NumCombined | 3601364 | 3635448 | 34084 | 0.95% | 0.95% |
| instcombine.NumConstProp | 27153 | 27157 | 4 | 0.01% | 0.01% |
| instcombine.NumDeadInst | 1694521 | 1765022 | 70501 | 4.16% | 4.16% |
| instcombine.NumPHIsOfExtractValues | 0 | 37546 | 37546 | 0.00% | 0.00% |
| instcombine.NumSunkInst | 63158 | 63686 | 528 | 0.84% | 0.84% |
| instcount.NumBrInst | 874304 | 871857 | -2447 | -0.28% | 0.28% |
| instcount.NumCallInst | 1757657 | 1758402 | 745 | 0.04% | 0.04% |
| instcount.NumExtractValueInst | 45623 | 11483 | -34140 | -74.83% | 74.83% |
| instcount.NumInsertValueInst | 4983 | 580 | -4403 | -88.36% | 88.36% |
| instcount.NumInvokeInst | 61018 | 59478 | -1540 | -2.52% | 2.52% |
| instcount.NumLandingPadInst | 35334 | 34215 | -1119 | -3.17% | 3.17% |
| instcount.NumPHIInst | 344428 | 331116 | -13312 | -3.86% | 3.86% |
| instcount.NumRetInst | 100773 | 100772 | -1 | 0.00% | 0.00% |
| instcount.TotalBlocks | 1081154 | 1077166 | -3988 | -0.37% | 0.37% |
| instcount.TotalFuncs | 101443 | 101442 | -1 | 0.00% | 0.00% |
| instcount.TotalInsts | 8890201 | 8833747 | -56454 | -0.64% | 0.64% |
| instsimplify.NumSimplified | 75822 | 75707 | -115 | -0.15% | 0.15% |
| simplifycfg.NumHoistCommonCode | 24203 | 24197 | -6 | -0.02% | 0.02% |
| simplifycfg.NumHoistCommonInstrs | 48201 | 48195 | -6 | -0.01% | 0.01% |
| simplifycfg.NumInvokes | 2785 | 4298 | 1513 | 54.33% | 54.33% |
| simplifycfg.NumSimpl | 997332 | 1018189 | 20857 | 2.09% | 2.09% |
| simplifycfg.NumSinkCommonCode | 7088 | 6464 | -624 | -8.80% | 8.80% |
| simplifycfg.NumSinkCommonInstrs | 15117 | 14021 | -1096 | -7.25% | 7.25% |
```
... which tells us that this new fold fires a whopping 38k times, increasing the number of SimplifyCFG's `invoke`->`call` transforms by 54% (+1513) (again, D85787 did that last time), decreasing the total instruction count by 0.64% (-56454), and sharply decreasing the count of `insertvalue`s (-88.36%, i.e. about 9 times fewer) and `extractvalue`s (-74.83%, i.e. four times fewer).
This causes a geomean -0.01% binary size decrease: http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=size-text
and, ignoring `O0-g`, a geomean -0.01%..-0.05% compile-time improvement: http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=instructions
The other thing this tells us is that, while this is a massive win for the `invoke`->`call` transform, the `InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse()` fold, which is supposed to deal with such aggregate reconstructions, fires a lot less now. There are two reasons why:
1. After this fold, as can be seen in the tests, we may (will) end up with trivially redundant PHI nodes. We don't CSE them in InstCombine presently, which means that EarlyCSE needs to run and then InstCombine needs to be rerun.
2. But then, EarlyCSE not only manages to fold such redundant PHIs, it also sees that the extract-insert chain recreates the original aggregate, and replaces it with the original aggregate.
The take-aways are:
1. We maybe should do the most trivial, same-BB PHI CSE in InstCombine.
2. I need to check what other patterns remain, and how they can be resolved (i.e. I wonder if `foldAggregateConstructionIntoAggregateReuse()` might go away).
Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D86530	2020-08-26 09:08:24 +03:00
Mircea Trofin	b186c6758c	[MLInliner] Simplify TFUTILS_SUPPORTED_TYPES We only need the C++ type and the corresponding TF Enum. The other parameter was used for the output spec json file, but we can just standardize on the C++ type name there. Differential Revision: https://reviews.llvm.org/D86549	2020-08-25 14:19:39 -07:00
Juneyoung Lee	65c4cb9e7b	[ValueTracking] Let getGuaranteedNonPoisonOp find multiple non-poison operands This patch helps getGuaranteedNonPoisonOp find multiple non-poison operands. Instead of special-casing llvm.assume, I think it is also a viable option to add noundef to Intrinsics.td. If it makes sense, I'll make a patch for that. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86477	2020-08-26 04:40:21 +09:00
Juneyoung Lee	e44db8bec4	[ValueTracking] Add a noundef test for D86477; NFC	2020-08-26 04:40:21 +09:00
Arthur Eubanks	de0524e48b	[test] Add -inject-tli-mapping to -loop-vectorize -vector-library tests The legacy LoopVectorize has a dependency on InjectTLIMappingsLegacy. That cannot be expressed in the new PM since they are both normal passes. Explicitly add -inject-tli-mappings as a pass. Follow-up to https://reviews.llvm.org/D86492. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D86561	2020-08-25 11:55:11 -07:00
Arthur Eubanks	b28c9e3931	[NewPM][test] Fix accelerate-vector-functions.ll under NPM The legacy SLPVectorizer has a dependency on InjectTLIMappingsLegacy. That cannot be expressed in the new PM since they are both normal passes. Explicitly add -inject-tli-mappings as a pass. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D86492	2020-08-25 10:50:14 -07:00
Sanjay Patel	66453001f1	[InstCombine] improve demanded element analysis for vector insert-of-extract (2nd try) The 1st attempt (rG557b890) was reverted because it caused miscompiles. That bug is avoided here by changing the order of folds and as verified in the new tests. Original commit message: InstCombine currently has odd rules for folding insert-extract chains to shuffles, so we miss collapsing seemingly simple cases as shown in the tests here. But poison makes this not quite as easy as we might have guessed. Alive2 tests to show the subtle difference (similar to the regression tests): https://alive2.llvm.org/ce/z/hp4hv3 (this is ok) https://alive2.llvm.org/ce/z/ehEWaN (poison leakage) SLP tends to create these patterns (as shown in the SLP tests), and this could help with solving PR16739. Differential Revision: https://reviews.llvm.org/D86460	2020-08-25 11:19:36 -04:00
Sanjay Patel	c2dd03cd0f	[InstCombine] add vector demanded elements tests with shuffles; NFC The 1st draft of D86460 (reverted) would show miscompiles with these tests because the undef element tracking went wrong and became visible in the shuffle masks.	2020-08-25 11:19:35 -04:00
Sjoerd Meijer	1cd139275c	[LV] get.active.lane.mask consuming tripcount instead of backedge-taken count This adapts LV to the new semantics of get.active.lane.mask as discussed in D86147, which means that the LV now emits intrinsic get.active.lane.mask with the loop tripcount instead of the backedge-taken count as its second argument. The motivation for this is described in D86147. Differential Revision: https://reviews.llvm.org/D86304	2020-08-25 13:49:19 +01:00
Sam Parker	50697e16b0	[NFC][SimplifyCFG] More tests for Arm	2020-08-25 12:13:48 +01:00
Sam Parker	2ad592033d	[NFC][SimplifyCFG] Add some more tests for Arm.	2020-08-25 11:44:17 +01:00
Roman Lebedev	8908aecdf3	[NFC][InstCombine] Tests for PHI-of-extractvalues Much like with its sibling fold PHI-of-insertvalues, it appears to be much more worthwhile than it would seem.	2020-08-25 13:01:07 +03:00
Benjamin Kramer	cc40144ebc	Revert "[InstCombine] improve demanded element analysis for vector insert-of-extract" This reverts commit 557b890ff4f4dd5fa979c232df5b31cf3fef04c1. Causing miscompiles, test case is on llvm-commits.	2020-08-25 11:31:31 +02:00
David Sherwood	82c9874179	[SVE] Fix TypeSize related warnings with IR truncates of scalable vectors In getCastInstrCost when the instruction is a truncate we were relying upon the implicit TypeSize -> uint64_t cast when asking if a given type has the same size as a legal integer. I've changed the code to only ask the question if the type is fixed length. I have also changed InstCombinerImpl::SimplifyDemandedUseBits to bail out for now if the type is a scalable vector. I've added the following new tests: Analysis/CostModel/AArch64/sve-trunc.ll Transforms/InstCombine/AArch64/sve-trunc.ll for both of these fixes. Differential revision: https://reviews.llvm.org/D86432	2020-08-25 09:17:56 +01:00
Roman Lebedev	ed8ecc651f	[InstCombine] PHI-of-insertvalues -> insertvalue-of-PHI's
According to the statistics, this happens exceedingly rarely, but I have seen it in exactly the situations that the PHI-aware aggregate reconstruction would eventually have handled, allowing an invoke -> call fold later on. So while this might be something that the other fold will have to learn about, I believe we should be doing this transform in general.
Here, we are okay with adding two PHIs to get both the base aggregate and the inserted value. I'm not sure it makes much sense to restrict it to a single PHI (to just the inserted value?), because originally we'd be receiving the final aggregate already.
llvm test-suite + RawSpeed:
```
| statistic name | baseline | proposed | Δ | % | \|%\| |
|--------------------------------------------|-----------|-----------|-----:|-------:|------:|
| instcombine.NumPHIsOfInsertValues | 0 | 12 | 12 | 0.00% | 0.00% |
| asm-printer.EmittedInsts | 8926643 | 8926595 | -48 | 0.00% | 0.00% |
| instcombine.NumCombined | 3846614 | 3846640 | 26 | 0.00% | 0.00% |
| instcombine.NumConstProp | 24302 | 24293 | -9 | -0.04% | 0.04% |
| instcombine.NumDeadInst | 1620140 | 1620112 | -28 | 0.00% | 0.00% |
| instcount.NumBrInst | 898466 | 898464 | -2 | 0.00% | 0.00% |
| instcount.NumCallInst | 1760819 | 1760875 | 56 | 0.00% | 0.00% |
| instcount.NumExtractValueInst | 45659 | 45649 | -10 | -0.02% | 0.02% |
| instcount.NumInsertValueInst | 4991 | 4981 | -10 | -0.20% | 0.20% |
| instcount.NumIntToPtrInst | 27084 | 27087 | 3 | 0.01% | 0.01% |
| instcount.NumPHIInst | 371435 | 371429 | -6 | 0.00% | 0.00% |
| instcount.NumStoreInst | 906011 | 906019 | 8 | 0.00% | 0.00% |
| instcount.TotalBlocks | 1105520 | 1105518 | -2 | 0.00% | 0.00% |
| instcount.TotalInsts | 9795737 | 9795776 | 39 | 0.00% | 0.00% |
| simplifycfg.NumInvokes | 2784 | 2786 | 2 | 0.07% | 0.07% |
| simplifycfg.NumSimpl | 1001840 | 1001850 | 10 | 0.00% | 0.00% |
| simplifycfg.NumSinkCommonInstrs | 15174 | 15170 | -4 | -0.03% | 0.03% |
```
Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D86306	2020-08-25 10:38:11 +03:00
Mircea Trofin	9c71d4e1d1	[MLInliner] Support training that doesn't require partial rewards If we use training algorithms that don't need partial rewards, we don't need to worry about an ir2native model. In that case, training logs won't contain a 'delta_size' feature either (since that's the partial reward). Differential Revision: https://reviews.llvm.org/D86481	2020-08-24 17:36:29 -07:00
Sanjay Patel	8f9cb71b9c	[InstCombine] improve demanded element analysis for vector insert-of-extract InstCombine currently has odd rules for folding insert-extract chains to shuffles, so we miss collapsing seemingly simple cases as shown in the tests here. But poison makes this not quite as easy as we might have guessed. Alive2 tests to show the subtle difference (similar to the regression tests): https://alive2.llvm.org/ce/z/hp4hv3 (this is ok) https://alive2.llvm.org/ce/z/ehEWaN (poison leakage) SLP tends to create these patterns (as shown in the SLP tests), and this could help with solving PR16739. Differential Revision: https://reviews.llvm.org/D86460	2020-08-24 17:00:16 -04:00
Sanjay Patel	663055a339	[SLP] avoid 'tmp' names in regression tests; NFC That can cause problems for update_test_checks.py (it warns when updating this file).	2020-08-24 17:00:16 -04:00
Sanjay Patel	07008dabee	[InstCombine] add tests for insert+extract demanded elements; NFC	2020-08-24 17:00:16 -04:00
Bjorn Pettersson	d32d86bc38	[Scalarizer] Avoid updating the name of globals The "takeName" logic at the end of ScalarizerVisitor::finish could end up renaming global variables when having simplified an extractelement instruction to simply pick a single vector element. If the input vector to the extractelement instruction held pointers to global variables we ended up renaming the global variable. The patch makes sure we only take the name of the replaced Op when we have added new instructions that might need a useful name. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D86472	2020-08-24 21:55:03 +02:00
Roman Lebedev	c9233e6b8a	[NFC][InstCombine] Multi-level aggregate test for phi-of-insertvalue pattern See https://reviews.llvm.org/D86306	2020-08-24 22:39:34 +03:00
Fangrui Song	9e7d7f3f68	Revert D85812 "[coroutine] should disable inline before calling coro split" This reverts commit 2e43acfed89b1903de473f682c65878bdebc395a. LLVMCoroutines (the library which contains Coroutines.h) depends on LLVMipo (the library which contains SampleProfile.cpp). It is inappropriate for SampleProfile.cpp to depend on Coroutines.h (circular dependency). The test inverted dependencies as well: llvm/test/Transforms/Coroutines/coro-inline.ll uses -sample-profile.	2020-08-24 11:41:05 -07:00
dongAxis	7a35eee5d4	[coroutine] should disable inline before calling coro split summary: When a callee coroutine function is inlined into a caller coroutine function before the coro-split pass, llvm emits "coroutine should have exactly one defining @llvm.coro.begin". It seems that the coro-early pass cannot handle this quite well, so we believe that an unsplit coroutine function should not be inlined. This patch fixes the issue by not inlining a function if it has the attribute "coroutine.presplit" (meaning the function has not been split yet). TestPlan: check-llvm Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D85812	2020-08-24 22:22:08 +08:00
Florian Hahn	726de666d8	[DSE,MemorySSA] Delay PointerMayBeCaptured calls until actually needed. Avoid computing InvisibleToCallerBefore/AfterRet up front. In most cases, this information is not really needed. Instead, introduce helper functions to compute and cache the result on demand. Notably, this also does not use PointerMayBeCapturedBefore for isInvisibleToCallerBeforeRet, as it requires the killing MemoryDef as starting instruction, making the caching ineffective. But it appears the use of PointerMayBeCapturedBefore has very limited benefits in practice (e.g. on SPEC2000/SPEC2006/MultiSource there are no binary changes with -O3 -flto). Refrain from using it for now, to limit compile time. This gives some nice compile-time improvements: http://llvm-compile-time-tracker.com/compare.php?from=db9345f6810f379a36752dc52caf5230585d0ebd&to=b4d091047e1b8a3d377d200137b79d03aca65663&stat=instructions	2020-08-24 14:05:44 +01:00
Anna Welker	1f8e3db230	[ARM][MVE] Allow tail predication for strides !=1 with gather/scatters If gather/scatters are enabled, ARMTargetTransformInfo now allows tail predication for loops with a much wider range of strides, up to anything that is loop invariant. Differential Revision: https://reviews.llvm.org/D85410	2020-08-24 13:54:47 +01:00
Florian Hahn	8ba75d1814	[DSE,MemorySSA] Regenerate some check lines. The check lines were generated before align was added for all instructions. Re-generate them, to reduce diff noise for actual functional changes.	2020-08-24 13:24:44 +01:00
Florian Hahn	fd197bfffa	[DSE,MemorySSA] Limit elimination at end of function to single UO. Limit elimination of stores at the end of a function to MemoryDefs with a single underlying object, to save compile time. In practice, the case with multiple underlying objects seems not very important. For -O3 -flto on MultiSource/SPEC2000/SPEC2006 this results in a total of 2 more stores being eliminated. We can always re-visit that in the future.	2020-08-24 13:00:17 +01:00
Sanjay Patel	8e77949af5	[InstCombine] fold abs of select with negated op (PR39474)
Similar to the existing transform - peek through a select to match a value and its negation.
https://alive2.llvm.org/ce/z/MXi5KG
  define i8 @src(i1 %b, i8 %x) {
  %0:
    %neg = sub i8 0, %x
    %sel = select i1 %b, i8 %x, i8 %neg
    %abs = abs i8 %sel, 1
    ret i8 %abs
  }
  =>
  define i8 @tgt(i1 %b, i8 %x) {
  %0:
    %abs = abs i8 %x, 1
    ret i8 %abs
  }
  Transformation seems to be correct!	2020-08-24 07:37:55 -04:00
Sanjay Patel	ee2a844238	[InstCombine] add tests for abs of select with negated op; NFC (PR39474)	2020-08-24 07:37:54 -04:00
Roman Lebedev	6072801a3e	[InstCombine] Negator: freeze is freely negatible if its operand is negatible	2020-08-23 23:28:19 +03:00
Roman Lebedev	324948b801	[NFC][InstCombine] Add tests for negation of freeze	2020-08-23 23:28:19 +03:00
Sanjay Patel	8998944672	[InstCombine] canonicalize 'not' ops before logical shifts
This reverses the existing transform that would uniformly canonicalize any 'xor' after any shift. In the case of logical shifts, that turns a 'not' into an arbitrary 'xor' with constant, and that's probably not as good for analysis, SCEV, or codegen.
The SCEV motivating case is discussed in: http://bugs.llvm.org/PR47136
There's an analysis motivating case at: http://bugs.llvm.org/PR38781
I did draft a patch that would do the same for 'ashr' but that's questionable because it's just swapping the position of a 'not' and uncovers at least 2 missing folds that we would probably need to deal with as preliminary steps.
Alive proofs: https://rise4fun.com/Alive/BBV
  Name: shift right of 'not'
  Pre: C2 == (-1 u>> C1)
  %a = lshr i8 %x, C1
  %r = xor i8 %a, C2
  =>
  %n = xor i8 %x, -1
  %r = lshr i8 %n, C1
  Name: shift left of 'not'
  Pre: C2 == (-1 << C1)
  %a = shl i8 %x, C1
  %r = xor i8 %a, C2
  =>
  %n = xor i8 %x, -1
  %r = shl i8 %n, C1
  Name: ashr of 'not'
  %a = ashr i8 %x, C1
  %r = xor i8 %a, -1
  =>
  %n = xor i8 %x, -1
  %r = ashr i8 %n, C1
Differential Revision: https://reviews.llvm.org/D86243	2020-08-22 09:38:13 -04:00
Fangrui Song	bd670142b0	[Attributor][test] Add REQUIRES: asserts after D86129	2020-08-21 16:20:41 -07:00
Roman Lebedev	29a87631f2	Temporarily revert "[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline" As discussed in post-commit review starting with https://reviews.llvm.org/D84108#2227365 while this appears to be mostly a win overall, especially code-size-wise, this appears to shake //certain// code patterns in a way that is extremely unfavorable for performance (+30% runtime regression) on certain CPUs (I personally can't reproduce). So until the behaviour is better understood, and a path forward is mapped, let's back this out for now. This reverts commit 1d51dc38d89bd33fb8874e242ab87b265b4dec1c.	2020-08-22 00:33:22 +03:00
Arthur Eubanks	5e4555a20b	[opt][NewPM] Add basic-aa in legacy PM compatibility mode The legacy PM alias analysis pipeline by default includes basic-aa. When running `opt -foo-pass` under the NPM and -disable-basic-aa is not specified, use basic-aa. This decreases the number of check-llvm failures under NPM from 913 to 752. Reviewed By: ychen, asbirlea Differential Revision: https://reviews.llvm.org/D86167	2020-08-21 14:05:07 -07:00
kuterd	bc2c6ef6db	[Attributor] Function seed allow list - Adds a command line option to seed only selected functions. - Makes seed allow listing exclusive to assertions enabled builds. Reviewed By: sstefan1 Differential Revision: https://reviews.llvm.org/D86129	2020-08-21 23:55:26 +03:00
Shinji Okumura	a7f07cdc6b	[Attributor] fix AANoUndef initialization Currently, `AANoUndefImpl::initialize` mistakenly always indicates optimistic fixpoint for function returned position. This is because an associated value is `Function` in the case, and `isGuaranteedNotToBeUndefOrPoison` returns true for Function. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86361	2020-08-22 05:06:14 +09:00
Serguei Katkov	facbe0803e	[InstCombine] Remove unused entries in gc-live bundle of statepoint If some of the gc live values are not used in gc.relocate, we can remove them from the gc-live bundle of the statepoint instruction. The CL also removes duplicated Values in the gc-live bundle. Reviewers: reames, dantrushin Reviewed By: dantrushin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D85959	2020-08-22 01:36:22 +07:00
Florian Hahn	5440177a2e	[LoopIdiom,LSR] Add additional tests for SCEVExpander cleanups.	2020-08-21 13:48:31 +01:00
Sam Parker	25faf23408	[NFC] Add SimplifyCFG for ARM Add some phi elimination threshold testing.	2020-08-21 11:52:31 +01:00
Mirko Brkusanin	08706e7bce	[AMDGPU] Reorganize GCN subtarget features for unaligned access Features UnalignedBufferAccess and UnalignedDSAccess are now used to determine whether hardware supports such access. UnalignedAccessMode should be used to enable them. hasUnalignedBufferAccessEnabled() and hasUnalignedDSAccessEnabled() can be now used to quickly check both. Differential Revision: https://reviews.llvm.org/D84522	2020-08-21 12:26:31 +02:00
Mirko Brkusanin	49f2d14543	[AMDGPU] Fix alignment requirements for 96bit and 128bit local loads and stores Adjust alignment requirements for ds_read/write_b96/b128. GFX9 and onwards allow misaligned access for reads and writes but only if SH_MEM_CONFIG.alignment_mode allows it. UnalignedDSAccess is set on GCN subtargets from GFX9 onward to let us know if we can relax alignment requirements. UnalignedAccessMode acts similarly to UnalignedBufferAccess for DS instructions but only from GFX9 onward and is supposed to match alignment_mode. By default alignment of 4 is required. Differential Revision: https://reviews.llvm.org/D82788	2020-08-21 12:26:31 +02:00
Florian Hahn	48a439a77a	[DSE,MemorySSA] Handle atomicrmw/cmpxchg conservatively. This adds conservative handling of AtomicRMW/AtomicCmpXChg to isDSEBarrier, similar to atomic loads and stores.	2020-08-21 10:42:42 +01:00
Florian Hahn	20d85a73f8	[DSE,MemorySSA] Regenerate check lines for atomic.ll tests.	2020-08-21 10:18:06 +01:00
sstefan1	8f1b61f465	[Attributor][NFC] run update_test_checks with --check-attributes.	2020-08-21 11:12:41 +02:00
Sam Parker	76932d3b0f	[SimplifyCFG] Cost required selects Before we speculatively execute a basic block, query the cost of inserting the necessary select instructions against the phi folding threshold. For non-trivial insertions, a more accurate decision can probably be made during machine if-conversion. With minsize we query the CodeSize cost, otherwise we use SizeAndLatency. Differential Revision: https://reviews.llvm.org/D82438	2020-08-21 09:52:52 +01:00
David Green	53fac1f9ad	[ARM][LV] Add a preferPredicatedReductionSelect target hook As part of D84741, this adds a target hook for the preferPredicatedReductionSelect option and makes use of it under MVE, allowing us to tail predicate most reduction loops. Differential Revision: https://reviews.llvm.org/D85980	2020-08-21 08:48:12 +01:00
Roman Lebedev	c0a69dfec4	[NFC][InstCombine] Tests for PHI-of-insertvalue's Currently we don't do anything about these, neither in InstCombine, nor in SimplifyCFG's sinking. These happen exceedingly rarely, but i've seen them in the cases where PHI-aware aggregate reconstruction would have fired if not for them.	2020-08-20 20:16:31 +03:00
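
The integer identities behind two of the folds listed above (8e77949af5, abs of select with negated op, and 8998944672, 'not' before shifts) are small enough to check exhaustively for `i8`. The following is a minimal sketch that assumes a plain Python model of 8-bit arithmetic; the helper names (`to_signed`, `abs_i8`, `check_abs_of_select`, `check_not_before_shifts`) are hypothetical and not part of LLVM, and poison is modeled simply as `None`. It is not a substitute for the Alive/Alive2 proofs linked in the commit messages.

```python
BITS = 8
MASK = (1 << BITS) - 1           # 0xFF, the i8 all-ones pattern (-1)
INT_MIN = -(1 << (BITS - 1))     # -128


def to_signed(v):
    """Wrap an arbitrary Python int to a signed i8 value."""
    v &= MASK
    return v - (1 << BITS) if v & (1 << (BITS - 1)) else v


def abs_i8(x):
    """Model of `abs i8 %x, 1`: INT_MIN yields poison (modeled as None)."""
    if x == INT_MIN:
        return None
    return -x if x < 0 else x


def check_abs_of_select():
    """abs(select b, x, 0 - x) == abs(x) for every i8 x and both values of b."""
    for x in range(INT_MIN, -INT_MIN):
        neg = to_signed(0 - x)   # sub i8 0, %x (wrapping)
        for b in (False, True):
            sel = x if b else neg
            if abs_i8(sel) != abs_i8(x):
                return False
    return True


def check_not_before_shifts():
    """The three rewrites from the Alive proofs, for every i8 x and shift 0..7."""
    for x in range(1 << BITS):   # x as an unsigned bit pattern
        for c in range(BITS):
            # lshr: (x u>> c) ^ (-1 u>> c) == (~x) u>> c
            if ((x >> c) ^ (MASK >> c)) != ((x ^ MASK) >> c):
                return False
            # shl: (x << c) ^ (-1 << c) == (~x) << c
            lhs = ((x << c) & MASK) ^ ((MASK << c) & MASK)
            if lhs != (((x ^ MASK) << c) & MASK):
                return False
            # ashr: ~(x s>> c) == (~x) s>> c (Python's >> is arithmetic on ints)
            sx = to_signed(x)
            if to_signed(~(sx >> c)) != (to_signed(~sx) >> c):
                return False
    return True


if __name__ == "__main__":
    print("abs of select with negated op:", check_abs_of_select())
    print("'not' before shifts:", check_not_before_shifts())
```

Comparing results for equality is stricter than the refinement check that Alive2 performs, so a passing run here only confirms the arithmetic for this one bit width.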


15950 Commits