
RPCS3/llvm

mirror of https://github.com/RPCS3/llvm.git synced 2026-07-01 21:04:04 -04:00

Author	SHA1	Message	Date
Roman Lebedev	d8ca3c0d57	[CVP] No-wrap deduction for `shl` Summary: This is the last `OverflowingBinaryOperator` for which we don't deduce flags. D69217 taught `ConstantRange::makeGuaranteedNoWrapRegion()` about it. The effect is better than of the `mul` patch (D69203): \| statistic \| old \| new \| delta \| % change \| \| correlated-value-propagation.NumAddNUW \| 7145 \| 7144 \| -1 \| -0.0140% \| \| correlated-value-propagation.NumAddNW \| 12126 \| 12125 \| -1 \| -0.0082% \| \| correlated-value-propagation.NumAnd \| 443 \| 446 \| 3 \| 0.6772% \| \| correlated-value-propagation.NumNSW \| 5986 \| 7158 \| 1172 \| 19.5790% \| \| correlated-value-propagation.NumNUW \| 10512 \| 13304 \| 2792 \| 26.5601% \| \| correlated-value-propagation.NumNW \| 16498 \| 20462 \| 3964 \| 24.0272% \| \| correlated-value-propagation.NumShlNSW \| 0 \| 1172 \| 1172 \| \| \| correlated-value-propagation.NumShlNUW \| 0 \| 2793 \| 2793 \| \| \| correlated-value-propagation.NumShlNW \| 0 \| 3965 \| 3965 \| \| \| instcount.NumAShrInst \| 13824 \| 13790 \| -34 \| -0.2459% \| \| instcount.NumAddInst \| 277584 \| 277586 \| 2 \| 0.0007% \| \| instcount.NumAndInst \| 66061 \| 66056 \| -5 \| -0.0076% \| \| instcount.NumBrInst \| 709153 \| 709147 \| -6 \| -0.0008% \| \| instcount.NumICmpInst \| 483709 \| 483708 \| -1 \| -0.0002% \| \| instcount.NumSExtInst \| 79497 \| 79496 \| -1 \| -0.0013% \| \| instcount.NumShlInst \| 40691 \| 40654 \| -37 \| -0.0909% \| \| instcount.NumSubInst \| 61997 \| 61996 \| -1 \| -0.0016% \| \| instcount.NumZExtInst \| 68208 \| 68211 \| 3 \| 0.0044% \| \| instcount.TotalBlocks \| 843916 \| 843910 \| -6 \| -0.0007% \| \| instcount.TotalInsts \| 7387528 \| 7387448 \| -80 \| -0.0011% \| Reviewers: nikic, reames, sanjoy, timshen Reviewed By: nikic Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69277 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375455 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-21 21:31:19 +00:00
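The idea behind the `shl` no-wrap deduction above can be sketched with plain integers. This is only an illustration, assuming 8-bit values and a shift amount known to lie in `[0, max_shift]`; the helper names are hypothetical and this is not the code of `ConstantRange::makeGuaranteedNoWrapRegion()`. It computes the range of left-hand values for which the shift cannot wrap, then checks that range by brute force.

```python
BITS = 8                          # illustrative bit width, not taken from the commit
UMAX = (1 << BITS) - 1
SMAX = (1 << (BITS - 1)) - 1
SMIN = -(1 << (BITS - 1))


def nuw_region(max_shift):
    """Unsigned x such that x << s does not wrap for every s in [0, max_shift]."""
    return (0, UMAX >> max_shift)


def nsw_region(max_shift):
    """Signed x such that x << s does not wrap for every s in [0, max_shift]."""
    # Python's >> on negative ints is an arithmetic (flooring) shift.
    return (SMIN >> max_shift, SMAX >> max_shift)


def brute_force_ok(max_shift):
    """Compare both regions against the definition of wrapping."""
    lo_u, hi_u = nuw_region(max_shift)
    lo_s, hi_s = nsw_region(max_shift)
    for x in range(0, UMAX + 1):
        no_wrap = all((x << s) <= UMAX for s in range(max_shift + 1))
        if no_wrap != (lo_u <= x <= hi_u):
            return False
    for x in range(SMIN, SMAX + 1):
        no_wrap = all(SMIN <= (x << s) <= SMAX for s in range(max_shift + 1))
        if no_wrap != (lo_s <= x <= hi_s):
            return False
    return True
```

If the known range of the left operand fits inside the region, the corresponding flag can be set; for example, `nuw_region(3)` is `(0, 31)` for 8-bit values.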
Roman Lebedev	c0f32bd552	[NFC][CVP] Add `shl` no-wrap deduction test coverage git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375441 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-21 18:35:26 +00:00
Jay Foad	a3253c0261	Pre-commit test cases for D64713. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375418 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-21 15:01:59 +00:00
Sam Elliott	7fc698b06a	[MemCpyOpt] Fixing Incorrect Code Motion while Handling Aggregate Type Values Summary: When MemCpyOpt is handling aggregate type values, if an instruction (let's call it P) between the targeting load (L) and store (S) clobbers the source pointer of L, it will try to hoist S before P. This process will also hoist S's data dependency instructions. However, the current implementation has a bug: if one of S's dependency instructions is //also// a user of P, MemCpyOpt will not prevent it from being hoisted above P, causing a use-before-define error. For example, in the newly added test file (i.e. `aggregate-type-crash.ll`), it will try to hoist both `store %my_struct %1, %my_struct* %3` and its dependent, `%3 = bitcast i8* %2 to %my_struct*`, above `%2 = call i8* @my_malloc(%my_struct* %0)`. Creating the following BB: ``` entry: %1 = bitcast i8* %4 to %my_struct* %2 = bitcast %my_struct* %1 to i8* %3 = bitcast %my_struct* %0 to i8* call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %2, i8* align 4 %3, i64 8, i1 false) %4 = call i8* @my_malloc(%my_struct* %0) ret void ``` Where there is a use-before-define error between `%1` and `%4`. Update: The compiler for the Pony Programming Language [also encountered the same bug](https://github.com/ponylang/ponyc/issues/3140) Patch by Min-Yih Hsu (myhsu) Reviewers: eugenis, pcc, dblaikie, dneilson, t.p.northover, lattner Reviewed By: eugenis Subscribers: lenary, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66060 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375403 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-21 10:00:34 +00:00
Roman Lebedev	3ddad0b0b1	[CVP] Deduce no-wrap on `mul` Summary: `ConstantRange::makeGuaranteedNoWrapRegion()` knows how to deal with `mul` since rL335646, there is exhaustive test coverage. This is already used by CVP's `processOverflowIntrinsic()`, and by SCEV's `StrengthenNoWrapFlags()` That being said, currently, this doesn't help much in the end: \| statistic \| old \| new \| delta \| percentage \| \| correlated-value-propagation.NumMulNSW \| 4 \| 275 \| 271 \| 6775.00% \| \| correlated-value-propagation.NumMulNUW \| 4 \| 1323 \| 1319 \| 32975.00% \| \| correlated-value-propagation.NumMulNW \| 8 \| 1598 \| 1590 \| 19875.00% \| \| correlated-value-propagation.NumNSW \| 5715 \| 5986 \| 271 \| 4.74% \| \| correlated-value-propagation.NumNUW \| 9193 \| 10512 \| 1319 \| 14.35% \| \| correlated-value-propagation.NumNW \| 14908 \| 16498 \| 1590 \| 10.67% \| \| instcount.NumAddInst \| 275871 \| 275869 \| -2 \| 0.00% \| \| instcount.NumBrInst \| 708234 \| 708232 \| -2 \| 0.00% \| \| instcount.NumMulInst \| 43812 \| 43810 \| -2 \| 0.00% \| \| instcount.NumPHIInst \| 316786 \| 316784 \| -2 \| 0.00% \| \| instcount.NumTruncInst \| 62165 \| 62167 \| 2 \| 0.00% \| \| instcount.NumUDivInst \| 2528 \| 2526 \| -2 \| -0.08% \| \| instcount.TotalBlocks \| 842995 \| 842993 \| -2 \| 0.00% \| \| instcount.TotalInsts \| 7376486 \| 7376478 \| -8 \| 0.00% \| (^ test-suite plain, tests still pass) Reviewers: nikic, reames, luqmana, sanjoy, timshen Reviewed By: reames Subscribers: hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69203 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375396 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-21 08:21:44 +00:00
Piotr Sobczak	ec10cb25d9	[InstCombine] Allow values with multiple users in SimplifyDemandedVectorElts Summary: Allow for ignoring the check for a single use in SimplifyDemandedVectorElts to be able to simplify operands if DemandedElts is known to contain the union of elements used by all users. It is a responsibility of a caller of SimplifyDemandedVectorElts to supply correct DemandedElts. Simplify a series of extractelement instructions if only a subset of elements is used. Reviewers: reames, arsenm, majnemer, nhaehnle Reviewed By: nhaehnle Subscribers: wdng, jvesely, nhaehnle, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67345 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375395 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-21 08:12:47 +00:00
Yevgeny Rouban	53fb41197c	[IR] Fix mayReadFromMemory() for writeonly calls Current implementation of Instruction::mayReadFromMemory() returns !doesNotAccessMemory() which is !ReadNone. This does not take into account that the writeonly attribute also indicates that the call does not read from memory. The patch changes the predicate to !doesNotReadMemory() that reflects the intended behavior. Differential Revision: https://reviews.llvm.org/D69086 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375389 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-21 06:52:08 +00:00
Johannes Doerfert	463d8212b2	[Attributor] Teach AANoCapture to use information in-flight more aggressively AAReturnedValues, AAMemoryBehavior, and AANoUnwind can provide information that helps during the tracking or even justifies no-capture. We now use this information and enable no-capture in some test cases designed a long while ago for these cases. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375382 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-21 00:48:42 +00:00
Philip Reames	9efd72ad02	[IndVars] Eliminate loop exits with equivalent exit counts We can end up with two loop exits whose exit counts are equivalent, but whose textual representation is different and non-obvious. For the sub-case where we have a series of exits which dominate one another (common), eliminate any exits which would iterate after a previous exit on the exiting iteration. As noted in the TODO being removed, I'd always thought this was a good idea, but I've now seen this in a real workload as well. Interestingly, in review, Nikita pointed out there's yet another opportunity to leverage SCEV's reasoning. If we kept track of the min of dominating exits so far, we could discharge exits with EC >= MDE. This is less powerful than the existing transform (since later exits aren't considered), but potentially more powerful for any case where SCEV can prove a >= b, but neither a == b nor a > b. I don't have an example to illustrate that opportunity, but won't be surprised if we find one and return to handle that case as well. Differential Revision: https://reviews.llvm.org/D69009 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375379 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-20 23:38:02 +00:00
Roman Lebedev	b93a52f5fe	[InstCombine] conditional sign-extend of high-bit-extract: 'or' pattern. In this pattern, all the "magic" bits that we'd `add` are all high sign bits, and in the value we'd be adding to they are all unset, not unexpectedly, so we can have an `or` there: https://rise4fun.com/Alive/ups It is possible that `haveNoCommonBitsSet()` should be taught about this pattern so that we never have an `add` variant, but the reasoning would need to be recursive (because of that `select`), so i'm not really sure that would be worth it just yet. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375378 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-20 20:52:06 +00:00
Roman Lebedev	150b0bedb7	[NFC][InstCombine] conditional sign-extend of high-bit-extract: 'and' pat. can be 'or' pattern. In this pattern, all the "magic" bits that we'd add are all high sign bits, and in the value we'd be adding to they are all unset, not unexpectedly, so we can have an `or` there: https://rise4fun.com/Alive/ups git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375377 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-20 20:51:37 +00:00
Nikita Popov	e568120da3	[InstCombine] Fold uadd.sat(a, b) == 0 and usub.sat(a, b) == 0 This adds folds for comparing uadd.sat/usub.sat with zero: * uadd.sat(a, b) == 0 => a == 0 && b == 0 => (a \| b) == 0 * usub.sat(a, b) == 0 => a <= b And inverted forms for !=. Differential Revision: https://reviews.llvm.org/D69224 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375374 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-20 20:19:42 +00:00
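The two folds listed in the commit above are easy to confirm exhaustively on a small bit width. The following is a minimal sketch, assuming 8-bit unsigned operands and hypothetical helper names; it models the saturating intrinsics with ordinary integers rather than using any LLVM API.

```python
BITS = 8                      # illustrative width; the folds are width-independent
UMAX = (1 << BITS) - 1


def uadd_sat(a, b):
    """Unsigned saturating add: clamps at the maximum value."""
    return min(a + b, UMAX)


def usub_sat(a, b):
    """Unsigned saturating subtract: clamps at zero."""
    return max(a - b, 0)


def folds_hold():
    """Check both comparisons against zero for every pair of inputs."""
    for a in range(UMAX + 1):
        for b in range(UMAX + 1):
            # uadd.sat(a, b) == 0  <=>  (a | b) == 0
            if (uadd_sat(a, b) == 0) != ((a | b) == 0):
                return False
            # usub.sat(a, b) == 0  <=>  a <= b
            if (usub_sat(a, b) == 0) != (a <= b):
                return False
    return True
```

Because each equivalence holds for `==`, the inverted `!=` forms follow by negating both sides.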
Nikita Popov	3ab0cb15c0	[InstCombine] Add tests for uadd/sub.sat(a, b) == 0; NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375372 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-20 19:50:31 +00:00
Roman Lebedev	32d24a3249	[InstCombine] Shift amount reassociation in shifty sign bit test (PR43595) Summary: This problem consists of several parts: * Basic sign bit extraction - `trunc? (?shr %x, (bitwidth(x)-1))`. This is trivial, and easy to do, we have a fold for it. * Shift amount reassociation - if we have two identical shifts, and we can simplify-add their shift amounts together, then we likely can just perform them as a single shift. But this is finicky, has one-use restrictions, and shift opcodes must be identical. But there is a super-pattern where both of these work together to produce sign bit test from two shifts + comparison. We do indeed already handle this in most cases. But since we get that fold transitively, it has one-use restrictions. And what's worse, in this case the right-shifts aren't required to be identical, and we can't handle that transitively: If the total shift amount is bitwidth-1, only a sign bit will remain in the output value. But if we look at this from the perspective of two shifts, we can't fold - we can't possibly know what bit pattern we'd produce via two shifts, it will be some kind of a mask produced from original sign bit, but we just can't tell its shape: https://rise4fun.com/Alive/cM0 https://rise4fun.com/Alive/9IN But it will only contain sign bit and zeros. So from the perspective of sign bit test, we're good: https://rise4fun.com/Alive/FRz https://rise4fun.com/Alive/qBU Superb! So the simplest solution is to extend `reassociateShiftAmtsOfTwoSameDirectionShifts()` to also have a pseudo-analysis mode that will ignore extra-uses, and will only check whether a) those are two right shifts and b) they end up with bitwidth(x)-1 shift amount and return either the original value that we are sign-checking, or null. This does not have any functionality change for the existing `reassociateShiftAmtsOfTwoSameDirectionShifts()`. 
All that being said, as discussed in the review, this yet again increases usage of instsimplify in instcombine as utility. Some day that may need to be reevaluated. https://bugs.llvm.org/show_bug.cgi?id=43595 Reviewers: spatel, efriedma, vsk Reviewed By: spatel Subscribers: xbolva00, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68930 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375371 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-20 19:38:50 +00:00
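The claim in the commit above, that two right shifts totalling `bitwidth-1` leave only the sign bit and zeros, can be checked by brute force. This is a sketch on 8-bit values with hypothetical helper names; it models `lshr` and `ashr` with Python integers and does not reproduce the InstCombine code.

```python
BITS = 8                      # illustrative width
MASK = (1 << BITS) - 1


def lshr(x, s):
    """Logical shift right on a BITS-wide bit pattern."""
    return (x & MASK) >> s


def ashr(x, s):
    """Arithmetic shift right on a BITS-wide bit pattern."""
    x &= MASK
    signed = x - (1 << BITS) if x & (1 << (BITS - 1)) else x
    return (signed >> s) & MASK


def two_shifts_test_sign_bit():
    """For every split s1 + s2 == BITS - 1 and every mix of lshr/ashr,
    the result is non-zero exactly when the sign bit of x is set."""
    for x in range(MASK + 1):
        is_negative = bool(x >> (BITS - 1))
        for s1 in range(BITS):
            s2 = BITS - 1 - s1
            for first in (lshr, ashr):
                for second in (lshr, ashr):
                    if (second(first(x, s1), s2) != 0) != is_negative:
                        return False
    return True
```

The resulting bit patterns do differ between shift kinds (for `0x80`, `ashr` by 3 then `lshr` by 4 gives `0x0F`, while two `lshr` give `0x01`), which is why only the comparison against zero can be folded, not the shifts themselves.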
Wei Mi	762a6a2124	[SampleFDO] Add profile remapping support for profile on-demand loading used by ExtBinary format profile Profile on-demand loading was added for ExtBinary format profile in rL374233, but currently profile on-demand loading doesn't work well with profile remapping. The patch adds the support. Suppose a function in the current module has outline instance in the profile. The function name in the module is different from the name of the outline instance, but remapper knows the two names are equal. When loading profile on-demand, the outline instance has to be loaded with remapper's help. At the same time SampleProfileReaderItaniumRemapper is changed from a proxy of SampleProfileReader to a helper member in SampleProfileReader. Differential Revision: https://reviews.llvm.org/D68901 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375295 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-18 22:35:20 +00:00
Roman Lebedev	a82e2a53ab	[NFC][CVP] Some tests for `mul` no-wrap deduction git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375285 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-18 20:36:19 +00:00
Roman Lebedev	e1659b963e	[CVP] After proving that @llvm.with.overflow()/@llvm.sat() don't overflow, also try to prove other no-wrap Summary: CVP, unlike InstCombine, does not run till exhaustion. It only does a single pass. When dealing with those special binops, if we prove that they can safely be demoted into their usual binop form, we do set the no-wrap we deduced. But when dealing with usual binops, we try to deduce both no-wraps. So if we convert e.g. @llvm.uadd.with.overflow() to `add nuw`, we won't attempt to check whether it can be `add nuw nsw`. This patch proposes to call `processBinOp()` on newly-created binop, which is identical to what we do for div/rem already. Reviewers: nikic, spatel, reames Reviewed By: nikic Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69183 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375273 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-18 19:32:47 +00:00
Roman Lebedev	07fddc38a9	[NFC][CVP] Add @llvm.*.sat tests where we could prove both no-overflows git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375260 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-18 17:18:12 +00:00
Bjorn Pettersson	c395575313	[InstCombine] Fix miscompile bug in canEvaluateShuffled Summary: Add restrictions in canEvaluateShuffled to prevent that we for example transform %0 = insertelement <2 x i16> undef, i16 %a, i32 0 %1 = srem <2 x i16> %0, <i16 2, i16 1> %2 = shufflevector <2 x i16> %1, <2 x i16> undef, <2 x i32> <i32 undef, i32 0> into %1 = insertelement <2 x i16> undef, i16 %a, i32 1 %2 = srem <2 x i16> %1, <i16 undef, i16 2> as having an undef denominator makes the srem undefined (for all vector elements). Fixes: https://bugs.llvm.org/show_bug.cgi?id=43689 Reviewers: spatel, lebedev.ri Reviewed By: spatel, lebedev.ri Subscribers: lebedev.ri, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69038 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375208 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-18 07:42:02 +00:00
Bjorn Pettersson	acb3705a54	[InstCombine] Pre-commit of test case showing miscompile bug in canEvaluateShuffled Adding the reproducer from https://bugs.llvm.org/show_bug.cgi?id=43689, showing that instcombine is doing a bad transform. It transforms %0 = insertelement <2 x i16> undef, i16 %a, i32 0 %1 = srem <2 x i16> %0, <i16 2, i16 1> %2 = shufflevector <2 x i16> %1, <2 x i16> undef, <2 x i32> <i32 undef, i32 0> into %1 = insertelement <2 x i16> undef, i16 %a, i32 1 %2 = srem <2 x i16> %1, <i16 undef, i16 2> The undef denominator makes the whole srem undefined. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375207 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-18 07:41:53 +00:00
Roman Lebedev	5600873160	[NFC][InstCombine] Tests for "fold variable mask before variable shift-of-trunc" (PR42563) https://bugs.llvm.org/show_bug.cgi?id=42563 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375135 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-17 17:20:12 +00:00
Roman Lebedev	f7a287e981	[LoopIdiom] BCmp: check, not assert that loop exits exit out of the loop (PR43687) We can't normally stumble into that assertion because a tautological conditional `br` in loop body is required, one that always branches to loop latch. But that should have been always folded to an unconditional branch before we get it. But that is not guaranteed if the pass is run standalone. So let's just promote the assertion into a proper check. Fixes https://bugs.llvm.org/show_bug.cgi?id=43687 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375100 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-17 11:01:29 +00:00
Oliver Stannard	3400920f53	Reland: Dead Virtual Function Elimination Remove dead virtual functions from vtables with replaceNonMetadataUsesWith, so that CGProfile metadata gets cleaned up correctly. Original commit message: Currently, it is hard for the compiler to remove unused C++ virtual functions, because they are all referenced from vtables, which are referenced by constructors. This means that if the constructor is called from any live code, then we keep every virtual function in the final link, even if there are no call sites which can use it. This patch allows unused virtual functions to be removed during LTO (and regular compilation in limited circumstances) by using type metadata to match virtual function call sites to the vtable slots they might load from. This information can then be used in the global dead code elimination pass instead of the references from vtables to virtual functions, to more accurately determine which functions are reachable. To make this transformation safe, I have changed clang's code-generation to always load virtual function pointers using the llvm.type.checked.load intrinsic, instead of regular load instructions. I originally tried writing this using clang's existing code-generation, which uses the llvm.type.test and llvm.assume intrinsics after doing a normal load. However, it is possible for optimisations to obscure the relationship between the GEP, load and llvm.type.test, causing GlobalDCE to fail to find virtual function call sites. The existing linkage and visibility types don't accurately describe the scope in which a virtual call could be made which uses a given vtable. This is wider than the visibility of the type itself, because a virtual function call could be made using a more-visible base class. I've added a new !vcall_visibility metadata type to represent this, described in TypeMetadata.rst. The internalization pass and libLTO have been updated to change this metadata when linking is performed. 
This doesn't currently work with ThinLTO, because it needs to see every call to llvm.type.checked.load in the linkage unit. It might be possible to extend this optimisation to be able to use the ThinLTO summary, as was done for devirtualization, but until then that combination is rejected in the clang driver. To test this, I've written a fuzzer which generates random C++ programs with complex class inheritance graphs, and virtual functions called through object and function pointers of different types. The programs are spread across multiple translation units and DSOs to test the different visibility restrictions. I've also tried doing bootstrap builds of LLVM to test this. This isn't ideal, because only classes in anonymous namespaces can be optimised with -fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not work correctly with -fvisibility=hidden. However, there are only 12 test failures when building with -fvisibility=hidden (and an unmodified compiler), and this change does not cause any new failures for either value of -fvisibility. On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size reduction of ~6%, over a baseline compiled with "-O2 -flto -fvisibility=hidden -fwhole-program-vtables". The best cases are reductions of ~14% in 450.soplex and 483.xalancbmk, and there are no code size increases. I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which show a geomean size reduction of ~3%, again with no size increases. I had hoped that this would have no effect on performance, which would allow it to always be enabled (when using -fwhole-program-vtables). However, the changes in clang to use the llvm.type.checked.load intrinsic are causing ~1% performance regression in the C++ parts of SPEC2006. 
It should be possible to recover some of this perf loss by teaching optimisations about the llvm.type.checked.load intrinsic, which would make it worth turning this on by default (though it's still dependent on -fwhole-program-vtables). Differential revision: https://reviews.llvm.org/D63932 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375094 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-17 09:58:57 +00:00
Mikhail Maltsev	faeea2dc5e	[Analysis] Don't assume that unsigned overflow can't happen in EmitGEPOffset (PR42699) Summary: Currently when computing a GEP offset using the function EmitGEPOffset for the following instruction getelementptr inbounds i32, i32* %p, i64 %offs we get mul nuw i64 %offs, 4 Unfortunately we cannot assume that unsigned wrapping won't happen here because %offs is allowed to be negative. Making such assumptions can lead to miscompilations: see the new test test24_neg_offs in InstCombine/icmp.ll. Without the patch InstCombine would generate the following comparison: icmp eq i64 %offs, 4611686018427387902; 0x3ffffffffffffffe Whereas the correct value to compare with is -2. This patch replaces the NUW flag with NSW in the multiplication instructions generated by EmitGEPOffset and adjusts the test suite. https://bugs.llvm.org/show_bug.cgi?id=42699 Reviewers: chandlerc, craig.topper, ostannard, lebedev.ri, spatel, efriedma, nlopes, aqjune Reviewed By: lebedev.ri Subscribers: reames, lebedev.ri, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68342 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375089 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-17 08:59:06 +00:00
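The two constants quoted in the commit above can be reproduced with a few lines of integer arithmetic. This is a sketch under an assumption: the test compares the scaled offset against a byte offset of -8, which is what the pair `0x3ffffffffffffffe` and `-2` implies for `i32` elements. The function names are hypothetical and the code only models the arithmetic, not `EmitGEPOffset` itself.

```python
BITS = 64
MASK = (1 << BITS) - 1


def to_signed(x):
    """Interpret a BITS-wide bit pattern as a two's-complement integer."""
    x &= MASK
    return x - (1 << BITS) if x >> (BITS - 1) else x


def solve_assuming_nuw(target_bytes, scale):
    """Solve offs * scale == target_bytes, trusting that the multiply
    cannot wrap unsigned (so an unsigned division is 'safe')."""
    return (target_bytes & MASK) // scale


def solve_assuming_nsw(target_bytes, scale):
    """Same equation, trusting only that the multiply cannot wrap signed."""
    return to_signed(target_bytes) // scale


def wraps_unsigned(offs, scale):
    """True if offs * scale overflows when offs is read as unsigned."""
    return (offs & MASK) * scale > MASK
```

With `scale = 4` and a target of -8 bytes, the unsigned reading yields `0x3ffffffffffffffe` while the signed reading yields -2. Since `wraps_unsigned(-2, 4)` is true, the `nuw` flag was not justified for negative offsets, which matches the switch to `nsw` described in the message.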
Sam Parker	3a4bfa616e	[DAGCombine][ARM] Enable extending masked loads Add generic DAG combine for extending masked loads. Allow us to generate sext/zext masked loads which can access v4i8, v8i8 and v4i16 memory to produce v4i32, v8i16 and v4i32 respectively. Differential Revision: https://reviews.llvm.org/D68337 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375085 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-17 07:55:55 +00:00
Philip Reames	91e38641d2	Remove a stale comment, noted in post commit review for rL375038 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375040 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-16 20:27:10 +00:00
Philip Reames	25ddbc8218	[IndVars] Fix a miscompile in off-by-default loop predication implementation The problem is that we can have two loop exits, 'a' and 'b', where 'a' and 'b' would exit at the same iteration, 'a' precedes 'b' along some path, and 'b' is predicated while 'a' is not. In this case (see the previously submitted test case), we were causing the loop to exit through 'b' whereas it should have exited through 'a'. This only applies to loop exits where the exit counts are not provably unequal, but that isn't as much of a restriction as it appears. If we could order the exit counts, we'd have already removed one of the two exits. In theory, we might be able to prove inequality w/o ordering, but I didn't really explore that piece. Instead, I went for the obvious restriction and ensured we didn't predicate exits following non-predicateable exits. Credit goes to Evgeny Brevnov for figuring out the problematic case. Fuzzing probably also found it (failures seen), but due to some silly infrastructure problems I hadn't gotten to the results before Evgeny hand reduced it from a benchmark (he manually enabled the transform). Once this is fixed, I'll try to filter through the fuzzer failures to see if there's anything additional lurking. Differential Revision: https://reviews.llvm.org/D68956 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375038 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-16 19:58:26 +00:00
Sanjay Patel	b531baef64	[SLP] avoid reduction transform on patterns that the backend can load-combine (2nd try) The 1st attempt at this modified the cost model in a bad way to avoid the vectorization, but that caused problems for other users (the loop vectorizer) of the cost model. I don't see an ideal solution to these 2 related, potentially large, perf regressions: https://bugs.llvm.org/show_bug.cgi?id=42708 https://bugs.llvm.org/show_bug.cgi?id=43146 We decided that load combining was unsuitable for IR because it could obscure other optimizations in IR. So we removed the LoadCombiner pass and deferred to the backend. Therefore, preventing SLP from destroying load combine opportunities requires that it recognizes patterns that could be combined later, but not do the optimization itself ( it's not a vector combine anyway, so it's probably out-of-scope for SLP). Here, we add a cost-independent bailout with a conservative pattern match for a multi-instruction sequence that can probably be reduced later. In the x86 tests shown (and discussed in more detail in the bug reports), SDAG combining will produce a single instruction on these tests like: movbe rax, qword ptr [rdi] or: mov rax, qword ptr [rdi] Not some (half) vector monstrosity as we currently do using SLP: vpmovzxbq ymm0, dword ptr [rdi + 1] # ymm0 = mem[0],zero,zero,.. vpsllvq ymm0, ymm0, ymmword ptr [rip + .LCPI0_0] movzx eax, byte ptr [rdi] movzx ecx, byte ptr [rdi + 5] shl rcx, 40 movzx edx, byte ptr [rdi + 6] shl rdx, 48 or rdx, rcx movzx ecx, byte ptr [rdi + 7] shl rcx, 56 or rcx, rdx or rcx, rax vextracti128 xmm1, ymm0, 1 vpor xmm0, xmm0, xmm1 vpshufd xmm1, xmm0, 78 # xmm1 = xmm0[2,3,0,1] vpor xmm0, xmm0, xmm1 vmovq rax, xmm0 or rax, rcx vzeroupper ret Differential Revision: https://reviews.llvm.org/D67841 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375025 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-16 18:06:24 +00:00
Piotr Sobczak	5f2beaf914	[InstCombine][AMDGPU] Fix crash with v3i16/v3f16 buffer intrinsics Summary: This is something of a workaround to avoid a crash later on in type legalizer (WidenVectorResult()). Also added some f16 tests, including a non-working v3f16 case with a FIXME. Reviewers: arsenm, tpr, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68865 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374993 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-16 11:14:01 +00:00
Sjoerd Meijer	d773329d75	Revert "[HardwareLoops] Optimisation remarks" while I investigate the PPC build bot failures. This reverts commit ad763751565b9663bc338fa2ca5ade86c6ca22ec. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374992 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-16 10:55:06 +00:00
Sjoerd Meijer	52192eb65c	[HardwareLoops] Optimisation remarks This adds the initial plumbing to support optimisation remarks in the IR hardware-loop pass. I have left a todo in a comment where we can improve the reporting, and will iterate on that now that we have this initial support in. Differential Revision: https://reviews.llvm.org/D68579 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374980 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-16 09:09:55 +00:00
Alina Sbirlea	b09882b096	[NewGVN] Check that call has an access. Check that a call has an attached MemoryAccess before calling getClobbering on the instruction. If no access is attached, the instruction does not access memory. Resolves PR43441. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374920 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-15 17:25:36 +00:00
Sanjay Patel	245690544d	[InstCombine] fold a shifted bool zext to a select (2nd try) The 1st attempt at rL374828 inserted the code at the wrong position (outside of the constant-shift-amount block). Trying again with an additional test to verify const-ness. For a constant shift amount, add the following fold. shl (zext (i1 X)), ShAmt --> select (X, 1 << ShAmt, 0) https://rise4fun.com/Alive/IZ9 Fixes PR42257. Based on original patch by @zvi (Zvi Rackover) Differential Revision: https://reviews.llvm.org/D63382 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374886 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-15 13:12:44 +00:00
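The fold `shl (zext (i1 X)), ShAmt --> select (X, 1 << ShAmt, 0)` from the commit above can be checked directly for every constant shift amount. This is a minimal sketch, assuming a 32-bit result type and hypothetical function names; it models the two forms with Python integers only.

```python
BITS = 32                     # illustrative result width
MASK = (1 << BITS) - 1


def shl_of_zext_bool(x, sh_amt):
    """Original form: zero-extend the bool, then shift left."""
    return (int(x) << sh_amt) & MASK


def select_form(x, sh_amt):
    """Folded form: pick between the pre-shifted constant and zero."""
    return ((1 << sh_amt) & MASK) if x else 0


def fold_is_valid():
    """Both forms agree for each bool and each in-range shift amount."""
    return all(
        shl_of_zext_bool(x, s) == select_form(x, s)
        for x in (False, True)
        for s in range(BITS)
    )
```

The shift amount has to be a constant for `1 << ShAmt` to fold to a constant operand of the `select`, which is consistent with the message's note about placing the code inside the constant-shift-amount block.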
David L. Jones	75cbca09a4	Revert [SROA] Reuse existing lifetime markers if possible This reverts r374692 (git commit 92694eba933ef4ea0b1b6377809ff266df37d61b) Reproducer sent to commit thread on llvm-commits. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374859 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-15 04:32:07 +00:00
Sanjay Patel	820385076f	Revert [InstCombine] fold a shifted bool zext to a select This reverts r374828 (git commit 1f40f15d54aac06421448b6de131231d2d78bc75) due to bot breakage git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374851 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-14 23:55:39 +00:00
Alina Sbirlea	caaecec384	[MemorySSA] Update for partial unswitch. Update MSSA for blocks cloned when doing partial unswitching. Enable additional testing with MSSA. Resolves PR43641. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374850 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-14 23:52:39 +00:00
Jorge Gorbe Moya	9a694d933a	Revert "Dead Virtual Function Elimination" This reverts commit 9f6a873268e1ad9855873d9d8007086c0d01cf4f. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374844 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-14 23:25:25 +00:00
Sanjay Patel	17f93c96ec	[InstCombine] fold a shifted bool zext to a select For a constant shift amount, add the following fold. shl (zext (i1 X)), ShAmt --> select (X, 1 << ShAmt, 0) https://rise4fun.com/Alive/IZ9 Fixes PR42257. Based on original patch by @zvi (Zvi Rackover) Differential Revision: https://reviews.llvm.org/D63382 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374828 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-14 21:56:40 +00:00
Sanjay Patel	f4932ab6c4	[InstCombine] add tests for select/shift transforms; NFC A transform proposal for the shift form is in D63382. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374818 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-14 20:28:03 +00:00
Philip Reames	a8b14d4c58	[Tests] Add a test demonstrating a miscompile in the off-by-default loop-pred transform Credit goes to Evgeny Brevnov for figuring out the problematic case. Fuzzing probably also found it (lots of failures), but due to some silly infrastructure problems I hadn't gotten to the results before Evgeny hand reduced it from a benchmark. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374812 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-14 19:49:40 +00:00
Roman Lebedev	da8e68c8aa	[LoopIdiom] BCmp: loop exit count must not be wider than size_t that `bcmp` takes As reported by Joerg Sonnenberger in IRC, for 32-bit systems, where pointer and size_t are 32-bit, if you use 64-bit-wide variable in the loop, you could end up with loop exit count being of the type wider than the size_t. Now, i'm not sure if we can produce `bcmp` from that (just truncate?), but we certainly should not assert/miscompile. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374811 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-14 19:46:34 +00:00
Philip Reames	f7875e639f	[Tests] Add a few more tests for idioms with FP induction variables git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374807 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-14 19:10:39 +00:00
Simon Pilgrim	7cb9be5d19	[CostModel][X86] Add CTLZ scalar costs Add specific scalar costs for CTLZ instructions, we can't discriminate between CTLZ and CTLZ_ZERO_UNDEF so we have to assume the worst. Given how BSR is often a microcoded nightmare on some older targets we might still be underestimating it. For targets supporting LZCNT (Intel Haswell+ or AMD Fam10+), we provide overrides that assume 1cy costs. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374786 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-14 16:30:17 +00:00
Joerg Sonnenberger	732f95ff9a	Reapply r374743 with a fix for the ocaml binding Add a pass to lower is.constant and objectsize intrinsics This pass lowers is.constant and objectsize intrinsics not simplified by earlier constant folding, i.e. if the object given is not constant or if not using the optimized pass chain. The result is recursively simplified and constant conditionals are pruned, so that dead blocks are removed even for -O0. This allows inline asm blocks with operand constraints to work all the time. The new pass replaces the existing lowering in the codegen-prepare pass and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert on the intrinsics. Differential Revision: https://reviews.llvm.org/D65280 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374784 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-14 16:15:14 +00:00
Cameron McInally	00c70bcbce	[IRBuilder] Update IRBuilder::CreateFNeg(...) to return a UnaryOperator Reapply r374240 with fix for Ocaml test, namely Bindings/OCaml/core.ml. Differential Revision: https://reviews.llvm.org/D61675 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374782 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-14 15:35:01 +00:00
Simon Pilgrim	13da61c8c4	[CostModel][X86] Add CTPOP scalar costs (PR43656) Add specific scalar costs for ctpop instructions, these are based on the llvm-mca's SLM throughput numbers (the oldest model we have). For targets supporting POPCNT, we provide overrides that assume 1cy costs. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374775 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-14 14:07:43 +00:00
Dmitri Gribenko	e0cea29324	Revert "Add a pass to lower is.constant and objectsize intrinsics" This reverts commit r374743. It broke the build with Ocaml enabled: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/19218 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374768 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-14 12:22:48 +00:00
Joerg Sonnenberger	314e3cde15	Add a pass to lower is.constant and objectsize intrinsics This pass lowers is.constant and objectsize intrinsics not simplified by earlier constant folding, i.e. if the object given is not constant or if not using the optimized pass chain. The result is recursively simplified and constant conditionals are pruned, so that dead blocks are removed even for -O0. This allows inline asm blocks with operand constraints to work all the time. The new pass replaces the existing lowering in the codegen-prepare pass and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert on the intrinsics. Differential Revision: https://reviews.llvm.org/D65280 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374743 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-13 23:00:15 +00:00
Johannes Doerfert	b5c6f41f8c	[Attributor] Shortcut no-return through will-return No-return and will-return are exclusive, assuming the latter is more prominent we can avoid updates of the former unless will-return is not known for sure. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374739 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-13 21:25:53 +00:00
Johannes Doerfert	6f5d69720f	[Attributor][FIX] NullPointerIsDefined needs the pointer AS (AANonNull) Also includes a shortcut via AADereferenceable if possible. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374737 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-13 20:48:26 +00:00
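
Note on commit 17f93c96ec above: the fold `shl (zext (i1 X)), ShAmt --> select (X, 1 << ShAmt, 0)` can be sanity-checked outside LLVM with a small brute-force model. The sketch below is not LLVM code. The function names, the fixed-width masking, and the restriction to shift amounts below the bit width (larger amounts yield poison in LLVM IR) are assumptions made for illustration only.

```python
def shl_of_zext(x: bool, sh_amt: int, width: int = 32) -> int:
    """Model of `shl (zext (i1 X)), ShAmt` on a `width`-bit integer."""
    mask = (1 << width) - 1
    # zext of an i1 gives 0 or 1; the shift result is truncated to `width` bits.
    return (int(x) << sh_amt) & mask


def select_form(x: bool, sh_amt: int, width: int = 32) -> int:
    """Model of `select (X, 1 << ShAmt, 0)` on a `width`-bit integer."""
    mask = (1 << width) - 1
    return ((1 << sh_amt) & mask) if x else 0


def fold_holds(width: int = 32) -> bool:
    """Check both forms agree for every i1 value and every in-range shift amount."""
    return all(
        shl_of_zext(x, s, width) == select_form(x, s, width)
        for x in (False, True)
        for s in range(width)  # amounts >= width are excluded (poison in LLVM IR)
    )
```

Calling `fold_holds(8)` or `fold_holds(32)` enumerates every case for that width. This only confirms the arithmetic identity; it says nothing about the bot breakage that led to the revert in 820385076f.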

14809 Commits