If we generate the gc.relocate, and then later prove two arguments to the statepoint are equivalent, we should canonicalize the gc.relocate to the form we would have produced if this had been known before rewriting.
llvm-svn: 372771
Summary:
This is again motivated by D67122 sanitizer check enhancement.
That patch seemingly worsens `-fsanitize=pointer-overflow`
overhead from 25% to 50%, which strongly implies missing folds.
For
```
#include <cassert>
char* test(char& base, signed long offset) {
__builtin_assume(offset < 0);
return &base + offset;
}
```
We produce
https://godbolt.org/z/r40U47
and again those two icmp's can be merged:
```
Name: 0
Pre: C != 0
%adjusted = add i8 %base, C
%not_null = icmp ne i8 %adjusted, 0
%no_underflow = icmp ult i8 %adjusted, %base
%r = and i1 %not_null, %no_underflow
=>
%neg_offset = sub i8 0, C
%r = icmp ugt i8 %base, %neg_offset
```
https://rise4fun.com/Alive/ALaphttps://rise4fun.com/Alive/slnN
There are 3 other variants of this pattern,
i believe they all will go into InstSimplify.
https://bugs.llvm.org/show_bug.cgi?id=43259
Reviewers: spatel, xbolva00, nikic
Reviewed By: spatel
Subscribers: efriedma, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67849
llvm-svn: 372768
Summary:
This is again motivated by D67122 sanitizer check enhancement.
That patch seemingly worsens `-fsanitize=pointer-overflow`
overhead from 25% to 50%, which strongly implies missing folds.
This pattern isn't exactly what we get there
(strict vs. non-strict predicate), but this pattern does not
require known-bits analysis, so it is best to handle it first.
```
Name: 0
%adjusted = add i8 %base, %offset
%not_null = icmp ne i8 %adjusted, 0
%no_underflow = icmp ule i8 %adjusted, %base
%r = and i1 %not_null, %no_underflow
=>
%neg_offset = sub i8 0, %offset
%r = icmp ugt i8 %base, %neg_offset
```
https://rise4fun.com/Alive/knp
There are 3 other variants of this pattern,
they all will go into InstSimplify:
https://rise4fun.com/Alive/bIDZhttps://bugs.llvm.org/show_bug.cgi?id=43259
Reviewers: spatel, xbolva00, nikic
Reviewed By: spatel
Subscribers: hiraditya, majnemer, vsk, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67846
llvm-svn: 372767
The static analyzer is warning about a potential null dereference, but we should be able to use cast<CmpInst> directly and if not assert will fire for us.
llvm-svn: 372732
The static analyzer is warning about a potential null dereference, but we should be able to use cast<LandingPadInst> directly and if not assert will fire for us.
llvm-svn: 372727
The static analyzer is warning about a potential null dereference, but we should be able to use cast<Instruction> directly and if not assert will fire for us.
llvm-svn: 372726
When vectorisation is forced with a pragma, we optimise for min size, and we
need to emit runtime memory checks, then allow this code growth and don't run
in an assert like we currently do.
This is the result of D65197 and D66803, and was a use-case not really
considered before. If this now happens, we emit an optimisation remark warning
about the code-size expansion, which can be avoided by not forcing
vectorisation or possibly source-code modifications.
Differential Revision: https://reviews.llvm.org/D67764
llvm-svn: 372694
Summary:
Fold
or(ashr(subNSW(Y, X), ScalarSizeInBits(Y)-1), X)
into
X s> Y ? -1 : X
https://rise4fun.com/Alive/d8Ab
clamp255 is a common operator in image processing, can be implemented
in a shifty way "(255 - X) >> 31 | X & 255". Fold shift into select
enables more optimization, e.g., vmin generation for ARM target.
Reviewers: lebedev.ri, efriedma, spatel, kparzysz, bcahoon
Reviewed By: lebedev.ri
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67800
llvm-svn: 372678
When a cold path is outlined, the value tracking in the assumption cache may be
invalidated due to the code motion. We would previously trip an assertion in
subsequent passes (but required the passes to happen in a single run as the
assumption cache is shared across the passes). Invalidating the cache ensures
that we get the correct information when needed with the legacy pass manager as
well.
llvm-svn: 372667
is available
In rL372232, we treated names showing up in profile as not cold when
profile-sample-accurate is enabled. This caused 70k size regression in
Chrome/Android. The patch put a guard and only enable the change when
profile symbol list is available, i.e., keep the old behavior when profile
symbol list is not available.
Differential Revision: https://reviews.llvm.org/D67931
llvm-svn: 372665
"Implementations are free to malloc() a buffer containing either (size + 1) bytes or (strnlen(s, size) + 1) bytes. Applications should not assume that strndup() will allocate (size + 1) bytes when strlen(s) is smaller than size."
llvm-svn: 372647
Summary:
Motivation:
- If we can fold it to strdup, we should (strndup does more things than strdup).
- Annotation mechanism. (Works for strdup well).
strdup and strndup are part of C 20 (currently posix fns), so we should optimize them.
Reviewers: efriedma, jdoerfert
Reviewed By: jdoerfert
Subscribers: lebedev.ri, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67679
llvm-svn: 372636
Summary:
If we have a pattern `(x & (-1 >> maskNbits)) << shiftNbits`,
we already know (have a fold) that will drop the `& (-1 >> maskNbits)`
mask iff `(shiftNbits-maskNbits) s>= 0` (i.e. `shiftNbits u>= maskNbits`).
So even if `(shiftNbits-maskNbits) s< 0`, we can still
fold, we will just need to apply a **constant** mask afterwards:
```
Name: c, normal+mask
%t0 = lshr i32 -1, C1
%t1 = and i32 %t0, %x
%r = shl i32 %t1, C2
=>
%n0 = shl i32 %x, C2
%n1 = i32 ((-(C2-C1))+32)
%n2 = zext i32 %n1 to i64
%n3 = lshr i64 -1, %n2
%n4 = trunc i64 %n3 to i32
%r = and i32 %n0, %n4
```
https://rise4fun.com/Alive/gslRa
Naturally, old `%masked` will have to be one-use.
This is not valid for pattern f - where "masking" is done via `ashr`.
https://bugs.llvm.org/show_bug.cgi?id=42563
Reviewers: spatel, nikic, xbolva00
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67725
llvm-svn: 372630
Summary:
And this is **finally** the interesting part of that fold!
If we have a pattern `(x & (~(-1 << maskNbits))) << shiftNbits`,
we already know (have a fold) that will drop the `& (~(-1 << maskNbits))`
mask iff `(maskNbits+shiftNbits) u>= bitwidth(x)`.
But that is actually ignorant, there's more general fold here:
In this pattern, `(maskNbits+shiftNbits)` actually correlates
with the number of low bits that will remain in the final value.
So even if `(maskNbits+shiftNbits) u< bitwidth(x)`, we can still
fold, we will just need to apply a **constant** mask afterwards:
```
Name: a, normal+mask
%onebit = shl i32 -1, C1
%mask = xor i32 %onebit, -1
%masked = and i32 %mask, %x
%r = shl i32 %masked, C2
=>
%n0 = shl i32 %x, C2
%n1 = add i32 C1, C2
%n2 = zext i32 %n1 to i64
%n3 = shl i64 -1, %n2
%n4 = xor i64 %n3, -1
%n5 = trunc i64 %n4 to i32
%r = and i32 %n0, %n5
```
https://rise4fun.com/Alive/F5R
Naturally, old `%masked` will have to be one-use.
Similar fold exists for patterns c,d,e, will post patch later.
https://bugs.llvm.org/show_bug.cgi?id=42563
Reviewers: spatel, nikic, xbolva00
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67677
llvm-svn: 372629
Summary:
Initially SLP vectorizer replaced all going-to-be-vectorized
instructions with Undef values. It may break ScalarEvaluation and may
cause a crash.
Reworked SLP vectorizer so that it does not replace vectorized
instructions by UndefValue anymore. Instead vectorized instructions are
marked for deletion inside if BoUpSLP class and deleted upon class
destruction.
Reviewers: mzolotukhin, mkuper, hfinkel, RKSimon, davide, spatel
Subscribers: RKSimon, Gerolf, anemet, hans, majnemer, llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D29641
llvm-svn: 372626
Enable flag introduced in rL294998. Security concerns are no longer valid, since function signatures for mentioned libc functions has no nonnull attribute (Clang does not generate them? I see no nonnull attr in LLVM IR for these functions) and since rL372091 we carefully annotate the callsites where we know that size is static, non zero. So let's enable this flag again..
llvm-svn: 372573
This has the potential to uncover missed analysis/folds as shown in the
min/max code comment/test, but fewer restrictions on icmp folds should
be better in general to solve cases like:
https://bugs.llvm.org/show_bug.cgi?id=43310
llvm-svn: 372510
While Promoting alloca instruction of Vector Type,
Check total size in bits of its slices too.
If they don't match, don't promote the alloca instruction.
Bug : https://bugs.llvm.org/show_bug.cgi?id=42585
llvm-svn: 372480
Summary:
This patch introduces `norecurse` function attribute deduction.
`norecurse` will be deduced if the following conditions hold:
* The size of SCC in which the function belongs equals to 1.
* The function doesn't have self-recursion.
* We have `norecurse` for all call site.
To avoid a large change, SCC is calculated using scc_iterator in InfoCache initialization for now.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67751
llvm-svn: 372475
The static analyzer is warning about potential null dereference, but we can use cast<ConstantInt> directly and if not assert will fire for us.
llvm-svn: 372429
objc_release calls
This fixes a bug where the presence of debug instructions would cause
ARC optimizer to change the order of retain and release calls.
rdar://problem/55319419
llvm-svn: 372352
Summary:
FlattenCFG may erase unnecessary blocks, which also invalidates iterators to those erased blocks.
Before this patch, `iterativelyFlattenCFG` could try to increment a BB iterator after that BB has been removed and crash.
This patch makes FlattenCFGPass use `WeakVH` to skip over erased blocks.
Reviewers: dblaikie, tstellar, davide, sanjoy, asbirlea, grosser
Reviewed By: asbirlea
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67672
llvm-svn: 372347
Summary:
This is again motivated by D67122 sanitizer check enhancement.
That patch seemingly worsens `-fsanitize=pointer-overflow`
overhead from 25% to 50%, which strongly implies missing folds.
In this particular case, given
```
char* test(char& base, unsigned long offset) {
return &base - offset;
}
```
it will end up producing something like
https://godbolt.org/z/luGEju
which after optimizations reduces down to roughly
```
declare void @use64(i64)
define i1 @test(i8* dereferenceable(1) %base, i64 %offset) {
%base_int = ptrtoint i8* %base to i64
%adjusted = sub i64 %base_int, %offset
call void @use64(i64 %adjusted)
%not_null = icmp ne i64 %adjusted, 0
%no_underflow = icmp ule i64 %adjusted, %base_int
%no_underflow_and_not_null = and i1 %not_null, %no_underflow
ret i1 %no_underflow_and_not_null
}
```
Without D67122 there was no `%not_null`,
and in this particular case we can "get rid of it", by merging two checks:
Here we are checking: `Base u>= Offset && (Base u- Offset) != 0`, but that is simply `Base u> Offset`
Alive proofs:
https://rise4fun.com/Alive/QOs
The `@llvm.usub.with.overflow` pattern itself is not handled here
because this is the main pattern, that we currently consider canonical.
https://bugs.llvm.org/show_bug.cgi?id=43251
Reviewers: spatel, nikic, xbolva00, majnemer
Reviewed By: xbolva00, majnemer
Subscribers: vsk, majnemer, xbolva00, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67356
llvm-svn: 372341
In the example from:
https://bugs.llvm.org/show_bug.cgi?id=38502
...we hit infinite looping/crashing because we have non-standard IR -
an instruction operand is used before defined.
This and other unusual constructs are allowed in unreachable blocks,
so avoid the problem by using DominatorTree to step around landmines.
Differential Revision: https://reviews.llvm.org/D67766
llvm-svn: 372339
Add an ability to specify the max full unroll count for LoopUnrollPass pass
in pass options.
Reviewers: fhahn, fedor.sergeev
Reviewed By: fedor.sergeev
Subscribers: hiraditya, zzheng, dmgreen, llvm-commits
Differential Revision: https://reviews.llvm.org/D67701
llvm-svn: 372305
MSAN bot complains that there is use-of-uninitialized-value
of this FreeStores later in IsWorthwhile().
Perhaps FreeStores needs to be stored in a vector?
llvm-svn: 372262
Summary:
I don't have a direct motivational case for this,
but it would be good to have this for completeness/symmetry.
This pattern is basically the motivational pattern from
https://bugs.llvm.org/show_bug.cgi?id=43251
but with different predicate that requires that the offset is non-zero.
The completeness bit comes from the fact that a similar pattern (offset != zero)
will be needed for https://bugs.llvm.org/show_bug.cgi?id=43259,
so it'd seem to be good to not overlook very similar patterns..
Proofs: https://rise4fun.com/Alive/21b
Also, there is something odd with `isKnownNonZero()`, if the non-zero
knowledge was specified as an assumption, it didn't pick it up (PR43267)
With this, i see no other missing folds for
https://bugs.llvm.org/show_bug.cgi?id=43251
Reviewers: spatel, nikic, xbolva00
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67412
llvm-svn: 372257
Summary:
As it can be see in the changed test, while `div` is really costly,
we were speculating it. This does not seem correct.
Also, the old code would run for every single insturuction in BB,
instead of eagerly bailing out as soon as there are too many instructions.
This function still has a problem that `PHINodeFoldingThreshold` is
per-basic-block, while it should be for all the basic blocks.
Reviewers: efriedma, craig.topper, dmgreen, jmolloy
Reviewed By: jmolloy
Subscribers: xbolva00, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67315
llvm-svn: 372255
is enabled.
We can save memory and reduce binary size significantly by enabling
ProfileSampleAccurate. However when ProfileSampleAccurate is true,
function without sample will be regarded as cold and this could
potentially cause performance regression.
To minimize the potential negative performance impact, we want to be
a little conservative here saying if a function shows up in the profile,
no matter as outline instance, inline instance or call targets, treat
the function as not being cold. This will handle the cases such as most
callsites of a function are inlined in sampled binary (thus outline copy
don't get any sample) but not inlined in current build (because of source
code drift, imprecise debug information, or the callsites are all cold
individually but not cold accumulatively...), so that the outline function
showing up as cold in sampled binary will actually not be cold after current
build. After the change, such function will be treated as not cold even
profile-sample-accurate is enabled.
At the same time we lower the hot criteria of callsiteIsHot check when
profile-sample-accurate is enabled. callsiteIsHot is used to determined
whether a callsite is hot and qualified for early inlining. When
profile-sample-accurate is enabled, functions without profile will be
regarded as cold and much less inlining will happen in CGSCC inlining pass,
so we can worry less about size increase and be aggressive to allow more
early inlining to happen for warm callsites and it is helpful for performance
overall.
Differential Revision: https://reviews.llvm.org/D67561
llvm-svn: 372232
Summary:
The PGO counter reading will add cold and inlinehint (hot) attributes
to functions that are very cold or hot. This was using hardcoded
thresholds, instead of the profile summary cutoffs which are used in
other hot/cold detection and are more dynamic and adaptable. Switch
to using the summary-based cold/hot detection.
The hardcoded limits were causing some code that had a medium level of
hotness (per the summary) to be incorrectly marked with a cold
attribute, blocking inlining.
Reviewers: davidxl
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67673
llvm-svn: 372189
For COFF, a comdat group is really a symbol marked
IMAGE_COMDAT_SELECT_ANY and zero or more other symbols marked
IMAGE_COMDAT_SELECT_ASSOCIATIVE. Typically the associative symbols in
the group are not external and are not referenced by other TUs, they are
things like debug info, C++ dynamic initializers, or other section
registration schemes. The Visual C++ linker reports a duplicate symbol
error for symbols marked IMAGE_COMDAT_SELECT_ASSOCIATIVE even if they
would be discarded after handling the leader symbol.
Fixes coverage-inline.cpp in check-profile after r372020.
llvm-svn: 372182
The static analyzer is warning about potential null dereferences of dyn_cast<> results, we can use cast<> directly as we know that these cases should all be CastInst, which is why its working atm and anyway cast<> will assert if they aren't.
llvm-svn: 372116