Although LLVM supports vectorization of loops containing log10/sqrt, it did not support using SVML implementation of it. Added support so that when clang is invoked with -fveclib=SVML now an appropriate SVML library log2 implementation will be invoked.
Follow up on: https://reviews.llvm.org/D77114
Tests:
Added unit tests to svml-calls.ll, svml-calls-finite.ll. Can be run with llvm-lint.
Created a simple c++ file that tests log10/sqrt, and used clang+ to build it, and output final assembly.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D87169
D65060 was reverted because it introduced non-determinism by using BFI counts from already freed blocks. The parent of this revision fixes that by using a VH callback on blocks to prevent this from happening and makes sure BFI data is passed correctly in LoopStandardAnalysisResults.
This re-introduces the previous optimization of using BFI data to prevent LICM from hoisting/sinking if the instruction will end up moving to a colder block.
Internally at Facebook this change results in a ~7% win in a CPU related metric in one of our big services by preventing hoisting cold code into a hot pre-header like the added test case demonstrates.
Testing:
ninja check
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D87551
These functions were extremely similar:
- `emitADD`
- `emitADDS`
- `emitCMN`
Refactor them a little, introducing a more generic `emitInstr` function to
do most of the work.
Also add support for the immediate + shifted register addressing modes in each
of them.
Update select-uaddo.mir to show that selecing ADDS now supports folding
immediates + shifts. (I don't think this can impact CMN, because the CMN checks
require a G_SUB with a non-constant on the RHS.)
This is around a 0.02% code size improvement on CTMark at -O3.
Differential Revision: https://reviews.llvm.org/D87529
When adding a new function via addNewFunctionIntoRefSCC(), it creates a
new node and immediately populates the edges. Since populateSlow() calls
G->get() on all referenced functions, it will create a node (but not
populate it) for functions that haven't yet been added. If we add two
mutually recursive functions, the assert that the node should never have
been created will fire when the second function is added. So here we
remove that assert since the node may have already been created (but not
yet populated).
createNode() is only called from addNewFunctionInto{,Ref}SCC().
https://bugs.llvm.org/show_bug.cgi?id=47502
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D87623
test/CodeGen/AArch64/GlobalISel/combine-trunc.mir was failing
due to the different order for evaluating function arguments.
This patch updates the related code to fix the issue.
~~D65060 uncovered that trying to use BFI in loop passes can lead to non-deterministic behavior when blocks are re-used while retaining old BFI data.~~
~~To make sure BFI is preserved through loop passes a Value Handle (VH) callback is registered on blocks themselves. When a block is freed it now also wipes out the accompanying BFI entry such that stale BFI data can no longer persist resolving the determinism issue. ~~
~~An optimistic approach would be to incrementally update BFI information throughout the loop passes rather than only invalidating them on removed blocks. The issues with that are:~~
~~1. It is not clear how BFI information should be incrementally updated: If a block is duplicated does its BFI information come with? How about if it's split/modified/moved around? ~~
~~2. Assuming we can address these problems the implementation here will be a massive undertaking. ~~
~~There's a known need of BFI in LICM analysis which requires correct but not incrementally updated BFI data. A follow-up change can register BFI in all loop passes so this preserved but potentially lossy data is available to any loop pass that wants it.~~
See: D75341 for an identical implementation of preserving BFI via VH callbacks. The previous statements do still apply but this change no longer has to be in this diff because it's already upstream 😄 .
This diff also moves BFI to be a part of LoopStandardAnalysisResults since the previous method using getCachedResults now (correctly!) statically asserts (D72893) that this data isn't static through the loop passes.
Testing
Ninja check
Reviewed By: asbirlea, nikic
Differential Revision: https://reviews.llvm.org/D86156
https://reviews.llvm.org/D87668
Patch adds two new GICombinerRules, one for G_MUL(X, 1) and another for G_MUL(X, -1).
G_MUL(X, 1) is an identity combine, and G_MUL(X, -1) gets replaced with G_SUB(0, X).
Patch additionally adds new combiner tests for the AArch64 target to test these
new combiner rules, as well as updates AMDGPU GISel tests.
Patch by mkitzan
This will embed bitcode after (Thin)LTO merge, but before optimizations.
In the case the thinlto backend is called from clang, the .llvmcmd
section is also produced. Doing so in the case where the caller is the
linker doesn't yet have a motivation, and would require plumbing through
command line args.
Differential Revision: https://reviews.llvm.org/D87636
We have a single noret intrinsic an a lot of special handling
around it. Declare it just as any other but do not define rtn
instructions itself instead.
Differential Revision: https://reviews.llvm.org/D87719
Call instructions with musttail tag must be optimized as a tailcall, otherwise could lead to incorrect program behavior.
When TSAN is instrumenting functions, it broke the contract by adding a call to the tsan exit function inbetween the musttail call and return instruction, and also inserted exception handling code.
This happend throguh EscapeEnumerator, which adds exception handling code and returns ret instructions as the place to insert instrumentation calls.
This becomes especially problematic for coroutines, because coroutines rely on tail calls to do symmetric transfers properly.
To fix this, this patch moves the location to insert instrumentation calls prior to the musttail call for ret instructions that are following musttail calls, and also does not handle exception for musttail calls.
Differential Revision: https://reviews.llvm.org/D87620
For scalable type, the aggregated size is unknown at compile-time.
Skip instructions with scalable type to ensure the list of instructions
for vectorizeSimpleInstructions does not contains any scalable-vector instructions.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D87550
Building LLVM with -DEXPENSIVE_CHECKS fails with the following error
message with libstdc++ in debug mode:
Error: comparison doesn't meet irreflexive requirements,
assert(!(a < a)).
The patch fixes the comparison function SizeOrder by returning false
when comparing two equal items.
Invalid IR in unreachable code is technically valid IR. In this case,
the address space of the value was never inferred, and we tried to
rewrite it with an invalid address space value which would assert.
This patch is a first draft of a new pass that adds a more flexible way
to eliminate compares based on more complex constraints collected from
dominating conditions.
In particular, it aims at simplifying conditions of the forms below
using a forward propagation approach, rather than instcomine-style
ad-hoc backwards walking of def-use chains.
if (x < y)
if (y < z)
if (x < z) <- simplify
or
if (x + 2 < y)
if (x + 1 < y) <- simplify assuming no wraps
The general approach is to collect conditions and blocks, sort them by
dominance and then iterate over the sorted list. Conditions are turned
into a linear inequality and add it to a system containing the linear
inequalities that hold on entry to the block. For blocks, we check each
compare against the system and see if it is implied by the constraints
in the system.
We also keep a stack of processed conditions and remove conditions from
the stack and the constraint system once they go out-of-scope (= do not
dominate the current block any longer).
Currently there still are the least the following areas for improvements
* Currently large unsigned constants cannot be added to the system
(coefficients must be represented as integers)
* The way constraints are managed currently is not very optimized.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D84547
Bugpoint has lots of assumptions and hacks around the legacy PM, put off migrating it to NPM until later.
Fixes tests under BugPoint under NPM.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D87655
Was missing MODULE_ALIAS_ANALYSIS, previously only FUNCTION_ALIAS_ANALYSIS was taken into account.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D87664
Pin RUN lines with -analyze to legacy PM, add corresponding NPM RUN line if missing.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D87658
Similar to the tsan suppression in
`Utils/VNCoercion.cpp:getLoadLoadClobberFullWidthSize` (rL175034; load widening used by GVN),
the D81766 optimization should be suppressed under tsan due to potential
spurious data race reports:
struct A {
int i;
const short s; // the load cannot be vectorized because
int modify; // it overlaps with bytes being concurrently modified
long pad1, pad2;
};
// __tsan_read16 does not know that some bytes are undef and accessing is safe
Similarly, under asan, users can mark memory regions with
`__asan_poison_memory_region`. A widened load can lead to a spurious
use-after-poison error. hwasan/memtag should be similarly suppressed.
`mustSuppressSpeculation` suppresses asan/hwasan/tsan but not memtag, so
we need to exclude memtag in `vectorizeLoadInsert`.
Note, memtag suppression can be relaxed if the load is aligned to the
its granule (usually 16), but that is out of scope of this patch.
Reviewed By: spatel, vitalybuka
Differential Revision: https://reviews.llvm.org/D87538
Make it possible to run the script command with a different language
than currently selected.
$ ./bin/lldb -l python
(lldb) script -l lua
>>> io.stdout:write("Hello, World!\n")
Hello, World!
When passing the language option and a raw command, you need to separate
the flag from the script code with --.
$ ./bin/lldb -l python
(lldb) script -l lua -- io.stdout:write("Hello, World!\n")
Hello, World!
Differential revision: https://reviews.llvm.org/D86996
PR47534 exposes a case where calling lowerShuffleWithSHUFPS directly from a derived repeated mask (found by is128BitLaneRepeatedShuffleMask) results in us using an non-canonicalized mask.
The missed canonicalization in this case is trivial - just commute the mask so we have more (swapped) LHS than RHS references so lowerShuffleWithSHUFPS can handle it.