Commit Graph

27173 Commits

Author SHA1 Message Date
Alexey Bataev
2b896fc0d5 [SLP]Improved isGatherShuffledEntry, NFC.
Reworked isGatherShuffledEntry function, simplified and moved
common code to the lambda (it shall go away when non-power-2 patch will
be landed).
2021-04-27 05:59:46 -07:00
Florian Hahn
eed27336cf [LV] Hoist code to get vector loop latch (NFC).
Address suggestion from D99294.
2021-04-27 13:30:17 +01:00
Sanjay Patel
5d538c6cd1 [IndVars] avoid crash in LFTR when assuming an add recurrence
The test is a crasher reduced from:
https://llvm.org/PR49993

linearFunctionTestReplace() assumes that we have an add recurrence,
so check for that as a condition of matching a loop counter.

Differential Revision: https://reviews.llvm.org/D101291
2021-04-27 08:26:02 -04:00
Florian Hahn
1eea0872a4 [VPlan] Use recursive traversal iterator in VPSlotTracker.
This patch simplifies VPSlotTracker by using the recursive traversal
iterator to traverse all blocks in a VPlan in reverse post-order when
numbering VPValues in a plan.

This depends on a fix to RPOT (D100169). It also extends the traversal
unit tests to check RPOT.

Reviewed By: a.elovikov

Differential Revision: https://reviews.llvm.org/D100176
2021-04-27 12:39:06 +01:00
Vitaly Buka
519ec0d9ab [NFC] Fix "not used" warning 2021-04-26 22:09:23 -07:00
Arthur Eubanks
1f40f70a19 [Inliner] Make ModuleInlinerWrapperPass return PreservedAnalyses::all()
The ModulePassManager should already have taken care of all analysis
invalidation. Without this change, upcoming changes will cause more
invalidation than necessary.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D101320
2021-04-26 17:22:35 -07:00
William S. Moses
e7084f2810 [NVPTX] Enable lowering of atomics on local memory
LLVM does not have valid assembly backends for atomicrmw on local memory. However, as this memory is thread local, we should be able to lower this to the relevant load/store.

Differential Revision: https://reviews.llvm.org/D98650
2021-04-26 20:12:12 -04:00
Hongtao Yu
7416a011e2 [CSSPGO] Unblock optimizations with pseudo probe instrumentation part 2.
As a follow-up to D95982, this patch continues unblocking optimizations that are blocked by pseudu probe instrumention.

The optimizations unblocked are:
		- In-block load propagation.
		- In-block dead store elimination
		- Memory copy optimization that turns stores to consecutive memories into a memset.

These optimizations are local to a block, so they shouldn't affect the profile quality.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D100075
2021-04-26 16:52:33 -07:00
Fangrui Song
d0dc7f2344 [ADT] Remove StatisticBase and make NoopStatistic empty
In LLVM_ENABLE_STATS=0 builds, `llvm::Statistic` maps to `llvm::NoopStatistic`
but has 3 mostly unused pointers. GlobalOpt considers that the pointers can
potentially retain allocated objects, so GlobalOpt cannot optimize out the
`NoopStatistic` variables (see D69428 for more context), wasting 23KiB for stage
2 clang.

This patch makes `NoopStatistic` empty and thus reclaims the wasted space.  The
clang size is even smaller than applying D69428 (slightly smaller in both .bss and
.text).
```
# This means the D69428 optimization on clang is mostly nullified by this patch.
HEAD+D69428: size(.bss) = 0x0725a8
HEAD+D101211: size(.bss) = 0x072238

# bloaty - HEAD+D69428 vs HEAD+D101211
# With D101211, we also save a lot of string table space (.rodata).
    FILE SIZE        VM SIZE
 --------------  --------------
  -0.0%     -32  -0.0%     -24    .eh_frame
  -0.0%    -336  [ = ]       0    .symtab
  -0.0%    -360  [ = ]       0    .strtab
  [ = ]       0  -0.2%    -880    .bss
  -0.0% -2.11Ki  -0.0% -2.11Ki    .rodata
  -0.0% -2.89Ki  -0.0% -2.89Ki    .text
  -0.0% -5.71Ki  -0.0% -5.88Ki    TOTAL
```

Note: LoopFuse is a disabled pass. For now this patch adds
`#if LLVM_ENABLE_STATS` so `OptimizationRemarkMissed` is skipped in
LLVM_ENABLE_STATS==0 builds.  If these `OptimizationRemarkMissed` are useful in
LLVM_ENABLE_STATS==0 builds, we can replace `llvm::Statistic` with
`llvm::TrackingStatistic`, or use a different abstraction to keep track of the strings.

Similarly, skip the code in `mlir/lib/Pass/PassStatistics.cpp` which
calls `getName`/`getDesc`/`getValue`.

Reviewed By: lattner

Differential Revision: https://reviews.llvm.org/D101211
2021-04-26 16:47:32 -07:00
William S. Moses
5b35b95712 Revert "[NVPTX] Enable lowering of atomics on local memory"
This reverts commit fede99d386ec9e7bab2762aa16cb10c0513ae464.
2021-04-26 19:33:01 -04:00
William S. Moses
9ac62ee58d [NVPTX] Enable lowering of atomics on local memory
LLVM does not have valid assembly backends for atomicrmw on local memory. However, as this memory is thread local, we should be able to lower this to the relevant load/store.

Differential Revision: https://reviews.llvm.org/D98650
2021-04-26 19:27:27 -04:00
Lei Zhang
6f2c652025 Revert "[ADT] Remove StatisticBase and make NoopStatistic empty"
This reverts commit b5403117814a7c39b944839e10492493f2ceb4ac
because it breaks MLIR build:

https://buildkite.com/mlir/mlir-core/builds/13299#ad0f8901-dfa4-43cf-81b8-7940e2c6c15b
2021-04-26 18:31:04 -04:00
Michael Kruse
7bf3404c76 [SimplifyCFG] Preserve metadata when unconditionalizing branches (same target).
When replacing a conditional branch by an unconditional one because the targets are identical, transfer the metadata to the new branch instruction.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D101226
2021-04-26 17:23:01 -05:00
Fangrui Song
7fa56f879f [ADT] Remove StatisticBase and make NoopStatistic empty
In LLVM_ENABLE_STATS=0 builds, `llvm::Statistic` maps to `llvm::NoopStatistic`
but has 3 unused pointers. GlobalOpt considers that the pointers can potentially
retain allocated objects, so GlobalOpt cannot optimize out the `NoopStatistic`
variables (see D69428 for more context), wasting 23KiB for stage 2 clang.

This patch makes `NoopStatistic` empty and thus reclaims the wasted space.  The
clang size is even smaller than applying D69428 (slightly smaller in both .bss and
.text).
```
# This means the D69428 optimization on clang is mostly nullified by this patch.
HEAD+D69428: size(.bss) = 0x0725a8
HEAD+D101211: size(.bss) = 0x072238

# bloaty - HEAD+D69428 vs HEAD+D101211
# With D101211, we also save a lot of string table space (.rodata).
    FILE SIZE        VM SIZE
 --------------  --------------
  -0.0%     -32  -0.0%     -24    .eh_frame
  -0.0%    -336  [ = ]       0    .symtab
  -0.0%    -360  [ = ]       0    .strtab
  [ = ]       0  -0.2%    -880    .bss
  -0.0% -2.11Ki  -0.0% -2.11Ki    .rodata
  -0.0% -2.89Ki  -0.0% -2.89Ki    .text
  -0.0% -5.71Ki  -0.0% -5.88Ki    TOTAL
```

Note: LoopFuse is a disabled pass. This patch adds `#if LLVM_ENABLE_STATS` so
`OptimizationRemarkMissed` is skipped in LLVM_ENABLE_STATS==0 builds.  If these
`OptimizationRemarkMissed` are useful and not noisy, we can replace
`llvm::Statistic` with `llvm::TrackingStatistic` in the future.

Reviewed By: lattner

Differential Revision: https://reviews.llvm.org/D101211
2021-04-26 13:39:35 -07:00
Fangrui Song
4cb0dafe51 [gcov] Set nounwind and respect module flags metadata "frame-pointer" & "uwtable" for synthesized functions
This applies the D100251 mechanism to the gcov instrumentation pass.

With this patch, `-fno-omit-frame-pointer` in
`clang -fprofile-arcs -O1 -fno-omit-frame-pointer` will be respected for synthesized
`__llvm_gcov_writeout,__llvm_gcov_reset,__llvm_gcov_init` functions: the frame pointer
will be kept (note: on many targets -O1 eliminates the frame pointer by default).

`clang -fno-exceptions -fno-asynchronous-unwind-tables -g -fprofile-arcs` will
produce .debug_frame instead of .eh_frame.

Fix: https://github.com/ClangBuiltLinux/linux/issues/955

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D101129
2021-04-26 13:30:21 -07:00
Michael Kruse
11f3dcfec5 [SimplifyCFG] Preserve metadata when unconditionalizing branches (constant condition).
When replacing a conditional branch by an unconditional one because the condition is a constant, transfer the metadata to the new branch instruction.

Part of fix for llvm.org/PR50060

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D101141
2021-04-26 10:57:31 -05:00
Dávid Bolvanský
8752b55ca6 [InstCombine] C - ctpop(a) - > ctpop(~a)) if C is bitwidth (PR50104)
Proof: https://alive2.llvm.org/ce/z/mncA9K
Solves https://bugs.llvm.org/show_bug.cgi?id=50104

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D101257
2021-04-26 15:40:54 +02:00
Yuanbo Li
5b6b066173 [LSR][DebugInfo] Don't unnecessarily drop DebugLocs
When transforming a loop terminating condition into a "max" comparison,
the DebugLoc from the old condition should be set on the newly created
comparison. They are the same operation, just optimized. Fixes PR48067.

Differential Revision: https://reviews.llvm.org/D98218
2021-04-26 13:14:42 +01:00
Florian Hahn
cc6f98f11f [VPlan] Make blocksOnly work properly with ranges over const pointers.
When iterating over const blocks, the base type in the lambdas needs
to use const VPBlockBase *, otherwise it cannot be used with input
iterators over const VPBlockBase.

Also adjust the type of the input iterator range to const &, as it
does not take ownership of the input range.
2021-04-26 10:52:35 +01:00
Florian Hahn
55f97d3d98 [VPlan] Add VPBlockUtils::blocksOnly helper.
This patch adds a blocksOnly helpers which take an iterator range
over VPBlockBase * or const VPBlockBase * and returns an interator
range that only include BlockTy blocks. The accesses are casted to
BlockTy.

Reviewed By: a.elovikov

Differential Revision: https://reviews.llvm.org/D101093
2021-04-25 17:38:09 +01:00
Florian Hahn
b7e8ac7d19 [NewGVN] Properly transfer PredDep in move constructor. 2021-04-25 11:22:59 +01:00
Florian Hahn
1f7961e68e [NewGVN] Use ExprResult to add extra predicate users.
This patch updates performSymbolicPredicateInfoEvaluation to manage
registering additional dependencies using ExprResult. Similar to D99987,
this fixes an issues where we failed to track the correct dependency for
a phi-of-ops value, which is marked as temporary.

Fixes PR49873.

Reviewed By: asbirlea, ruiling

Differential Revision: https://reviews.llvm.org/D100560
2021-04-25 11:13:32 +01:00
Florian Hahn
2aa8af916f [NewGVN] Use performSymbolicEvaluation instead of createExpression.
performSymbolicEvaluation is used to obtain the symbolic expression when
visiting instructions and this is used to determine their congruence
class.

performSymbolicEvaluation only creates expressions for certain
instructions (via createExpression). For unsupported instructions,
'unknown' expression are created.

The use of createExpression in processOutgoingEdges means we may
simplify the condition in processOutgoingEdges to a constant in the
initial round of processing, but we use Unknown(I) for the congruence
class. If an operand of I changes the expression Unknown(I) stays the
same, so there is no update of the congruence class of I. Hence it
won't get re-visited. So if an operand of I changes in a way that causes
createExpression to return different result, this update is missed.

This patch updates the code to use performSymbolicEvaluation, to be
symmetric with the congruence class updating code.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D99990
2021-04-24 18:49:07 +01:00
Dávid Bolvanský
abe87e4bc2 [InstCombine] Fixed UB in foldCtpop 2021-04-24 19:44:16 +02:00
Dávid Bolvanský
3884c3dc13 [InstCombine] ctpop(rot(X)) -> ctpop(X)
Proof:
https://alive2.llvm.org/ce/z/ss2zyt - rotl
https://alive2.llvm.org/ce/z/ZM7Aue - rotr

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101235
2021-04-24 18:25:03 +02:00
Dávid Bolvanský
e156538759 [InstCombine] ctpop(X) + ctpop(Y) => ctpop(X | Y) if X and Y have no common bits (PR48999)
For example:

```
int src(unsigned int a, unsigned int b)
{
    return __builtin_popcount(a << 16) + __builtin_popcount(b >> 16);
}

int tgt(unsigned int a, unsigned int b)
{
    return __builtin_popcount((a << 16)  | (b >> 16));
}
```

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101210
2021-04-24 17:52:10 +02:00
dfukalov
4398288fa7 [GVN] Clobber partially aliased loads.
Use offsets stored in `AliasResult` implemented in D98718.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D95543
2021-04-24 14:14:20 +03:00
wlei
1f3e5b2188 [CSSPGO] Fix missing debug info of dangling pseudo probe
While doing speculative execution opt, it conservatively drops all insn's debug info in the merged `ThenBB`(see the loop at line 2384) including the dangling probe. The missing debug info of the dangling probe will cause the wrong inference computation.

So we should avoid dropping the debug info from pseudo probe, this change try to fix this by moving the to-be dangling probe to the merging target BB before the debug info is dropped.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D101195
2021-04-23 14:26:47 -07:00
Dávid Bolvanský
a2061d4adb [InstCombine] X - usub.sat(X, Y) => umin(X, Y)
Pattern regressed in LLVM 9 with the introduction of usub.sat.

Fixes https://bugs.llvm.org/show_bug.cgi?id=42178#c2

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101184
2021-04-23 21:13:07 +02:00
Hongtao Yu
cbb480a8eb [CSSPGO] Fix incorrect prorating indirect call distribution factor that leads to target count loss.
Pseudo probe distribution factor is used to scale down profile samples to avoid misleading the counts inference due to the usage of "maximum" in `getBlockWeight`. For callsites, the scaling down can come from code duplication prior to the sample profile loader (prelink or postlink), or due to the indirect call promotion in sample loader inliner. This patch fixes an issue in sample loader ICP where the leftover indirect callsite scaling down causes the loss of non-promoted call target samples unexpectedly. While the scaling down is to favor BFI/BPI with accurate an callsite count, it doesn't fit in the current distribution factor that represents code duplication changes. Ideally,  we would need two factors, one is for code duplication, the other is for ICP. However this seems over complicated. I'm going to trade one usage (callsite counts) for the other (call target counts).

Seeing perf win on one benchmark (mcf) of SPEC2017 with others unchanged.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D100993
2021-04-23 11:09:22 -07:00
Sanjay Patel
173c7ee255 [InstCombine] fold 'not' of ctpop in parity pattern
As discussed in https://llvm.org/PR50096 , we could
convert the 'not' into a 'sub' and see the same
fold. That's because we already have another demanded
bits optimization for 'sub'.

We could add a related transform for
odd-number-of-type-bits, but that seems unlikely
to be practical.

https://alive2.llvm.org/ce/z/TWJZXr
2021-04-23 13:23:24 -04:00
Florian Hahn
4b94f2fec6 [VPlan] Add GraphTraits impl to traverse through VPRegionBlock.
This patch adds a new iterator to traverse through VPRegionBlocks and a
GraphTraits specialization using the iterator to traverse through
VPRegionBlocks.

Because there is already a GraphTraits specialization for VPBlockBase *
and co, a new VPBlockRecursiveTraversalWrapper helper is introduced.
This allows us to provide a new GraphTraits specialization for that
type. Users can use the new recursive traversal by using this wrapper.

The graph trait visits both the entry block of a region, as well as all
its successors. Exit blocks of a region implicitly have their parent
region's successors. This ensures all blocks in a region are visited
before any blocks in a successor region when doing a reverse post-order
traversal of the graph.

Reviewed By: a.elovikov

Differential Revision: https://reviews.llvm.org/D100175
2021-04-23 17:26:47 +01:00
Sander de Smalen
b71b6a828f [TTI] NFC: Change getIntImmCost[Inst|Intrin] to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100565
2021-04-23 16:06:36 +01:00
Sander de Smalen
b447679db3 [TTI] NFC: Change getScalingFactorCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100564
2021-04-23 16:06:36 +01:00
Timm Bäder
438fece2aa [llvm][NFC] Fix assert indentation
This triggers GCC's misleading-indentation checker.
2021-04-23 14:44:05 +02:00
Dávid Bolvanský
30eb4998c0 [InstCombine] Fixed crash when setting align attr for memalign 2021-04-23 14:04:08 +02:00
Florian Hahn
baa2054364 Recommit "[NewGVN] Track simplification dependencies for phi-of-ops."
This recommits 4f5da356ff35a218f23f0b0c4d08aee90da7de6e, including
explicit implementations of move a constructor and deleted copy
constructors/assignment operators, to fix failures with some compilers.

This reverts the revert 74854d00e854196445727a49df58fe5768d9ed5b.
2021-04-23 11:27:43 +01:00
Stephen Tozer
8b8275b1fc Re-reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands"
Previous build failures were caused by an error in bitcode reading and
writing for DIArgList metadata, which has been fixed in e5d844b587.
There were also some unnecessary asserts that were being triggered on
certain builds, which have been removed.

This reverts commit dad5caa59e6b2bde8d6cf5b64a972c393c526c82.
2021-04-23 10:54:01 +01:00
Florian Hahn
3a8b302d64 Revert "[NewGVN] Track simplification dependencies for phi-of-ops."
This reverts commit 4f5da356ff35a218f23f0b0c4d08aee90da7de6e.

This causes some  buildbot failures, e.g.
https://lab.llvm.org/buildbot/#/builders/139/builds/3019
2021-04-23 09:56:17 +01:00
Florian Hahn
a088ff213f [NewGVN] Track simplification dependencies for phi-of-ops.
If we are using a simplified value, we need to add an extra
dependency this value , because changes to the class of the
simplified value may require us to invalidate any decision based on
that value.

This is done by adding such values as additional users, however the
current code does not excludes temporary instructions.

At the moment, this means that we miss those dependencies for
phi-of-ops, because they are temporary instructions at this point. We
instead need to add the extra dependencies to the root instruction of
the phi-of-ops.

This patch pushes the responsibility of adding extra users to the
callers of createExpression & performSymbolicEvaluation. At those
points, it is clearer which real instruction to pick.

Alternatively we could either pass the 'real' instruction as additional
argument or use another map, but I think the approach in the patch makes
things a bit easier to follow.

Fixes PR35074.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D99987
2021-04-23 09:48:38 +01:00
KAWASHIMA Takahiro
47186c3ead [LoopReroll] Fix rerolling loop with extra instructions
Fixes PR47627

This fix suppresses rerolling a loop which has an unrerollable
instruction.

Sample IR for the explanation below:

```
define void @foo([2 x i32]* nocapture %a) {
entry:
  br label %loop

loop:
  ; base instruction
  %indvar = phi i64 [ 0, %entry ], [ %indvar.next, %loop ]

  ; unrerollable instructions
  %stptrx = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %indvar, i64 0
  store i32 999, i32* %stptrx, align 4

  ; extra simple arithmetic operations, used by root instructions
  %plus20 = add nuw nsw i64 %indvar, 20
  %plus10 = add nuw nsw i64 %indvar, 10

  ; root instruction 0
  %ldptr0 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus20, i64 0
  %value0 = load i32, i32* %ldptr0, align 4
  %stptr0 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus10, i64 0
  store i32 %value0, i32* %stptr0, align 4

  ; root instruction 1
  %ldptr1 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus20, i64 1
  %value1 = load i32, i32* %ldptr1, align 4
  %stptr1 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus10, i64 1
  store i32 %value1, i32* %stptr1, align 4

  ; loop-increment and latch
  %indvar.next = add nuw nsw i64 %indvar, 1
  %exitcond = icmp eq i64 %indvar.next, 5
  br i1 %exitcond, label %exit, label %loop

exit:
  ret void
}
```

In the loop rerolling pass, `%indvar` and `%indvar.next` are appended
to the `LoopIncs` vector in the `LoopReroll::DAGRootTracker::findRoots`
function.

Before this fix, two instructions with `unrerollable instructions`
comment above are marked as `IL_All` at the end of the
`LoopReroll::DAGRootTracker::collectUsedInstructions` function,
as well as instructions with `extra simple arithmetic operations`
comment and `loop-increment and latch` comment. It is incorrect
because `IL_All` means that the instruction should be executed in all
iterations of the rerolled loop but the `store` instruction should
not.

This fix rejects instructions which may have side effects and don't
belong to def-use chains of any root instructions and reductions.

See https://bugs.llvm.org/show_bug.cgi?id=47627 for more information.
2021-04-23 15:14:46 +09:00
Elia Geretto
99885567cb [dfsan] Fix Len argument type in call to __dfsan_mem_transfer_callback
This patch is supposed to solve: https://bugs.llvm.org/show_bug.cgi?id=50075

The function `__dfsan_mem_transfer_callback` takes a `Len` argument of type `i64`; however, when processing a `MemTransferInst` such as `llvm.memcpy.p0i8.p0i8.i32`, the `len` argument has type `i32`. In order to make the type of `len` compatible with the one of the callback argument, this change zero-extends it when necessary.

Reviewed By: stephan.yichao.zhao, gbalats

Differential Revision: https://reviews.llvm.org/D101048
2021-04-22 21:12:20 +00:00
Arthur Eubanks
21048e7590 [GlobalOpt] Don't replace alias with aliasee if aliasee is interposable
Both the alias and aliasee linkage are important.

PR27866 provides some background.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D99629
2021-04-22 13:12:34 -07:00
Philip Reames
8361e53fbe Revert "[instcombine] Exploit UB implied by nofree attributes"
This change effectively reverts 86664638, but since there have been some changes on top and I wanted to leave the tests in, it's not a mechanical revert.

Why revert this now?  Two main reasons:
1) There are continuing discussion around what the semantics of nofree.  I am getting increasing uncomfortable with the seeming possibility we might redefine nofree in a way incompatible with these changes.
2) There was a reported miscompile triggered by this change (https://github.com/emscripten-core/emscripten/issues/9443).  At first, I was making good progress on tracking down the issues exposed and those issues appeared to be unrelated latent bugs.  Now that we've found at least one bug in the original change, and the investigation has stalled, I'm no longer comfortable leaving this in tree.  In retrospect, I probably should have reverted this earlier and investigated the issues once the triggering change was out of tree.
2021-04-22 10:53:17 -07:00
Jianzhou Zhao
94cf740f57 [dfsan] Track origin at loads
The first version of origin tracking tracks only memory stores. Although
    this is sufficient for understanding correct flows, it is hard to figure
    out where an undefined value is read from. To find reading undefined values,
    we still have to do a reverse binary search from the last store in the chain
    with printing and logging at possible code paths. This is
    quite inefficient.

    Tracking memory load instructions can help this case. The main issues of
    tracking loads are performance and code size overheads.

    With tracking only stores, the code size overhead is 38%,
    memory overhead is 1x, and cpu overhead is 3x. In practice #load is much
    larger than #store, so both code size and cpu overhead increases. The
    first blocker is code size overhead: link fails if we inline tracking
    loads. The workaround is using external function calls to propagate
    metadata. This is also the workaround ASan uses. The cpu overhead
    is ~10x. This is a trade off between debuggability and performance,
    and will be used only when debugging cases that tracking only stores
    is not enough.

Reviewed By: gbalats

Differential Revision: https://reviews.llvm.org/D100967
2021-04-22 16:25:24 +00:00
Alexey Bataev
dab3d7322e [SLP]Skip undefs trying to find perfect/shuffled tree entries matching.
We can skip check for undefs trying to find perfect/shuffled tree
entries matching, they can be ignored completely improving the final
cost/vectorization results.

Differential Revision: https://reviews.llvm.org/D101061
2021-04-22 08:59:07 -07:00
Joe Ellis
0427e8801a [LoopVectorize] Fix bug where predicated loads/stores were dropped
This commit fixes a bug where the loop vectoriser fails to predicate
loads/stores when interleaving for targets that support masked
loads and stores.

Code such as:

     1  void foo(int *restrict data1, int *restrict data2)
     2  {
     3    int counter = 1024;
     4    while (counter--)
     5      if (data1[counter] > data2[counter])
     6        data1[counter] = data2[counter];
     7  }

... could previously be transformed in such a way that the predicated
store implied by:

    if (data1[counter] > data2[counter])
       data1[counter] = data2[counter];

... was lost, resulting in miscompiles.

This bug was causing some tests in llvm-test-suite to fail when built
for SVE.

Differential Revision: https://reviews.llvm.org/D99569
2021-04-22 15:05:54 +00:00
Alexey Bataev
be1b72b8b6 [SLP]Replace more TTI with TTIRef, NFC.
To pacify MSVC buildbots.
2021-04-22 07:53:20 -07:00
Alexey Bataev
b09b5f35d0 [SLP]Added explicit ref to TargetTransformInfo to try to pacify MSVC
buildbots, NFC.
2021-04-22 07:49:48 -07:00
Alexey Bataev
a899f9f408 [SLP]Improve cost model for the vectorized extractelements.
1. No need to call `areAllUsersVectorized` as later the cost is
   calculated only if the instruction has one use and gets vectorized.
2. Need to calculate the cost of the dead extractelement more precisely,
   taking the vector type of the vector operand, not the resulting
   vector type.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D99980
2021-04-22 07:40:17 -07:00