There is currently no upper cap on the amount of work the Schedule
Optimizer may spend computing a schedule. In some cases computing the
schedule takes so long that compilation appears to hang. This patch
introduces a flag 'polly-schedule-computeout' to pass the cap, which is
initialized to 300000, and handles the compute-out case by bailing out
and exiting gracefully.
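A rough sketch of the mechanism (the option name and default come from
this patch; the helper below and the exact isl calls are an
illustration, not the literal Polly code):

```cpp
#include "llvm/Support/CommandLine.h"
#include "isl/ctx.h"
#include "isl/options.h"
#include "isl/schedule.h"

static llvm::cl::opt<int> ScheduleComputeOut(
    "polly-schedule-computeout",
    llvm::cl::desc("Bound the scheduler by this maximal amount of "
                   "computational steps"),
    llvm::cl::init(300000));

// Illustrative helper: cap the number of isl operations spent computing
// the schedule and bail out gracefully if the quota is hit.
isl_schedule *computeScheduleWithCap(isl_ctx *Ctx,
                                     isl_schedule_constraints *SC) {
  isl_ctx_set_max_operations(Ctx, ScheduleComputeOut);
  isl_options_set_on_error(Ctx, ISL_ON_ERROR_CONTINUE);
  isl_schedule *Sched = isl_schedule_constraints_compute_schedule(SC);
  isl_ctx_reset_operations(Ctx);
  isl_ctx_set_max_operations(Ctx, 0); // Remove the cap again.
  if (isl_ctx_last_error(Ctx) == isl_error_quota) {
    // Compute-out: discard the partial result and exit gracefully
    // instead of appearing to hang.
    isl_schedule_free(Sched);
    return nullptr;
  }
  return Sched;
}
```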
Fixed the test that failed in the previous commit.
Fixes #69090
There is currently no upper cap on the amount of work the Schedule
Optimizer may spend computing a schedule. In some cases computing the
schedule takes so long that compilation appears to hang. This patch
introduces a flag 'polly-schedule-computeout' to pass the cap, which is
initialized to 300000, and handles the compute-out case by bailing out
and exiting gracefully.
Fixes #69090
zext nneg was recently added to the IR in #67982. Teaching SCEVExpander
to emit nneg when possible is valuable since SCEV may have proved
non-trivial facts about loop bounds which would otherwise be lost when
materializing the value.
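A minimal sketch of the idea (not the actual SCEVExpander change; the
helper name is made up and the setNonNeg call assumes the
Instruction-level flag added alongside #67982):

```cpp
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// Hypothetical helper: emit a zext for S's value V and tag it nneg when
// SCEV already knows the operand is non-negative (e.g. from loop-bound
// reasoning), so that fact is not lost in the materialized IR.
static Value *expandZExt(IRBuilder<> &Builder, ScalarEvolution &SE,
                         const SCEV *S, Value *V, Type *DestTy) {
  Value *ZExt = Builder.CreateZExt(V, DestTy);
  if (auto *I = dyn_cast<Instruction>(ZExt))
    if (SE.isKnownNonNegative(S))
      I->setNonNeg(); // Preserve the proven fact in the IR.
  return ZExt;
}
```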
After D154102 multi-line labels would get split incorrectly.
When the CFG is generated for a function with a basic block name longer
than 80 characters, the header separator is placed after the line break
within the label name instead of after the whole label name.
The fix is simple: move the insertion of the '|' character to before the
line splitting happens.
Differential Revision: https://reviews.llvm.org/D159207
This change adds separators for basic block names, which makes it
easier to find a basic block based on its name and separates it
from the code.
Currently there is also a chance that the basic block label will
be present twice, namely when the basic block has explicit
numbering; this change fixes that bug.
Differential Revision: https://reviews.llvm.org/D154102
Before this patch, we could only use the MaxBECount for an AddRec's range
computation if the MaxBECount's bit width was <= the bit width of the
AddRec. This patch reasons that if a MaxBECount has a greater bit width
but its value is <= the maximum value of the AddRec's bit width, we can
still use the MaxBECount.
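A small illustration of the check (my own sketch, not the actual
ScalarEvolution code; the helper name is made up):

```cpp
#include "llvm/ADT/APInt.h"
using namespace llvm;

// Returns true if MaxBECount may be used for range computation of an
// AddRec whose type has AddRecBitWidth bits.
static bool isUsableMaxBECount(const APInt &MaxBECount,
                               unsigned AddRecBitWidth) {
  if (MaxBECount.getBitWidth() <= AddRecBitWidth)
    return true; // The only case accepted before this patch.
  // New case: a wider count is fine as long as its value still fits in
  // the AddRec's type, e.g. an i16 count of 200 for an i8 AddRec
  // (200 <= 255).
  APInt MaxValue =
      APInt::getMaxValue(AddRecBitWidth).zext(MaxBECount.getBitWidth());
  return MaxBECount.ule(MaxValue);
}
```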
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D151698
This is an ongoing series of commits that are reformatting our
Python code. This catches the last of the Python files to
reformat. Since they were so few, I bunched them together.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a Python file, the best way to handle that
is to run `git checkout --ours <yourfile>` and then reformat it
with `black`.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: jhenderson, #libc, Mordante, sivachandra
Differential Revision: https://reviews.llvm.org/D150784
As long as the aliasee has `@llvm.used` or `@llvm.compiler.used` references, we cannot perform the related replace or delete operations. Even if it has local linkage, we cannot infer that there is no other use of it, such as from asm or other cases added in the future.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D145293
Polly-ACC is unmaintained and has never been ported to the NPM pipeline; since D136621 it is not even accessible anymore without manually specifying the passes on the `opt` command line.
Since there is no plan to bring it into a maintainable state, remove it from Polly.
Reviewed By: grosser
Differential Revision: https://reviews.llvm.org/D142580
Polly's internal vectorizer is not well maintained and is known not to work in some cases, such as region ScopStmts. Unlike LLVM's LoopVectorize pass, it also has no target-dependent cost heuristics, and we recommend using LoopVectorize instead of -polly-vectorizer=polly.
In the future we hope that Polly can collaborate better with LoopVectorize, such as Polly marking a loop as safe to vectorize with a specific SIMD width, instead of replicating its functionality.
Reviewed By: grosser
Differential Revision: https://reviews.llvm.org/D142640
Two lit tests used overaligned i8, without the test cases actually
depending on i8 alignment.
Change the datalayout string to use naturally aligned i8,
preparing for the upcoming requirement of naturally aligned i8.
IR is now always parsed in opaque pointer mode, unless
-opaque-pointers=0 is explicitly given. There is no automatic
detection of typed pointers anymore.
The -opaque-pointers=0 option is added to any remaining IR tests
that haven't been migrated yet.
Differential Revision: https://reviews.llvm.org/D141912
These have been effectively disabled ever since 'nvptx' was added to
the REQUIRES clauses, because REQUIRES does not support triple checks.
The new 'target=<triple>' is supported, so switch to that scheme.
Fix up XFAIL annotations, now that these tests are actually run.
Part of the project to eliminate special handling for triples in lit
expressions.
Differential Revision: https://reviews.llvm.org/D139728
InstCombine prefers this canonical form (see getPreferredVectorIndex),
as does IRBuilder when passing the index as an integer, so we may as
well use the preferred form from creation.
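As a small illustration (my own sketch, not code from the patch),
IRBuilder's integer-index overload already emits the canonical i64 form:

```cpp
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// Extract lane 0 of a vector; the uint64_t overload builds the canonical
// "extractelement <N x T> %Vec, i64 0" rather than an i32 index.
static Value *extractLane0(IRBuilder<> &Builder, Value *Vec) {
  return Builder.CreateExtractElement(Vec, uint64_t(0));
}
```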
NOTE: All test changes are mechanical with nothing else expected
beyond a change of index type from i32 to i64.
Differential Revision: https://reviews.llvm.org/D140983
Clang's lit.cfg.py reads this to add an "enable-shared" feature that
three of clang's lit tests use. Nothing else reads enable_shared, so
remove it from most lit.site.cfg.py.in files.
Differential Revision: https://reviews.llvm.org/D138301
A simple sed doing these substitutions:
- `${LLVM_BINARY_DIR}/lib${LLVM_LIBDIR_SUFFIX}\>` -> `${LLVM_LIBRARY_DIR}`
- `${LLVM_BINARY_DIR}/bin\>` -> `${LLVM_TOOLS_BINARY_DIR}`
where `\>` means "word boundary".
The only manual modifications were reverting changes in
- `runtimes/CMakeLists.txt`
because these were "entry points" where we wanted to tread carefully so as not to introduce a "loop" which would end with an undefined variable being expanded to nothing.
There are some occurrences of `${LLVM_BINARY_DIR}/lib` without the `${LLVM_LIBDIR_SUFFIX}`, but these refer to the lib subdirectory of the source (`llvm/lib`). That `lib` is automatically appended to form the local `CMAKE_CURRENT_BINARY_DIR` value by `add_subdirectory`; since the directory name in the source tree is fixed without any suffix, the corresponding `CMAKE_CURRENT_BINARY_DIR` will be too. We therefore do not replace it but leave it as-is.
This picks up where D133828 left off, getting the occurrences with*out* `CMAKE_CFG_INTDIR`. But this is difficult to do correctly and so not done in the (retroactively) previous diff.
This hopefully increases readability overall, and also decreases the usages of `LLVM_LIBDIR_SUFFIX`, preparing us for D130586.
Reviewed By: sebastian-ne
Differential Revision: https://reviews.llvm.org/D132316
The unittests are already included in check-polly, so check-all was
running them twice. Running them twice causes a race on the output
files, which led to intermittent failures on the reverse-iteration
buildbot.
A simple sed doing these substitutions:
- `${LLVM_BINARY_DIR}/(\$\{CMAKE_CFG_INTDIR}/)?lib(${LLVM_LIBDIR_SUFFIX})?\>` -> `${LLVM_LIBRARY_DIR}`
- `${LLVM_BINARY_DIR}/(\$\{CMAKE_CFG_INTDIR}/)?bin\>` -> `${LLVM_TOOLS_BINARY_DIR}`
where `\>` means "word boundary".
The only manual modifications were reverting changes in
- `compiler-rt/cmake/Modules/CompilerRTUtils.cmake`
- `runtimes/CMakeLists.txt`
because these were "entry points" where we wanted to tread carefully so as not to introduce a "loop" which would end with an undefined variable being expanded to nothing.
This hopefully increases readability overall, and also decreases the usages of `LLVM_LIBDIR_SUFFIX`, preparing us for D130586.
Reviewed By: sebastian-ne
Differential Revision: https://reviews.llvm.org/D132316
We held off on this before as `LLVM_LIBDIR_SUFFIX` conflicted with it.
Now we return to this.
`LLVM_LIBDIR_SUFFIX` is kept as a deprecated way to set
`CMAKE_INSTALL_LIBDIR`. The other `*_LIBDIR_SUFFIX` variables are just
removed entirely.
I imagine this is too potentially-breaking to make LLVM 15. That's fine.
I have a more minimal version of this in the distro (NixOS) patches for
LLVM 15 (like previous versions). I will test this more expansive
version harder after the release is cut.
Reviewed By: sebastian-ne, ldionne, #libc, #libc_abi
Differential Revision: https://reviews.llvm.org/D130586
I went over the output of the following mess of a command:
`(ulimit -m 2000000; ulimit -v 2000000; git ls-files -z | parallel --xargs -0 cat | aspell list --mode=none --ignore-case | grep -E '^[A-Za-z][a-z]*$' | sort | uniq -c | sort -n | grep -vE '.{25}' | aspell pipe -W3 | grep : | cut -d' ' -f2 | less)`
and proceeded to spend a few days looking at it to find probable typos
and fixed a few hundred of them in all of the llvm project (note, the
ones I found are not anywhere near all of them, but it seems like a
good start).
Reviewed By: inclyc
Differential Revision: https://reviews.llvm.org/D131167
The pattern matching optimization of Polly detects and optimizes dense general
matrix-matrix multiplication. The generated code is close to high performance
implementations of matrix-matrix multiplications, which are contained in
manually tuned libraries. The described pattern matching optimization is
a particular case of tensor contraction optimization, which was
introduced in [1].
This patch generalizes the pattern matching to the case of tensor contractions
using the form of data dependencies and memory accesses produced by tensor
contractions [1].
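To illustrate the shape of computation being matched (an example of my
own, not taken from the patch or from [1]), a tensor contraction sums
over indices shared between its operands; matrix multiplication is the
special case with a single index in each index bundle:

```cpp
// Illustrative tensor contraction C[i][j][k] += A[i][l][j] * B[l][k],
// contracting over the index l. The sizes are arbitrary placeholders.
constexpr int I = 32, J = 32, K = 32, L = 32;
float A[I][L][J], B[L][K], C[I][J][K];

void contract() {
  for (int i = 0; i < I; ++i)
    for (int j = 0; j < J; ++j)
      for (int k = 0; k < K; ++k)
        for (int l = 0; l < L; ++l)
          C[i][j][k] += A[i][l][j] * B[l][k];
}
```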
Optimization of tensor contractions will be added in the next patch. Following
the ideas introduced in [2], it will logically represent tensor contraction
operands as matrix multiplication operands and use an approach for
optimization of matrix-matrix multiplications.
[1] - Gareev R., Grosser T., Kruse M. High-Performance Generalized Tensor
Operations: A Compiler-Oriented Approach // ACM Transactions on
Architecture and Code Optimization (TACO). 2018. Vol. 15, no. 3.
P. 34:1–34:27. DOI: 10.1145/3235029.
[2] - Matthews D. High-Performance Tensor Contraction without BLAS // SIAM
Journal on Scientific Computing. 2018. Vol. 40, no. 1. P. C1–C24.
DOI: 10.1137/16m108968x.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D114336
The IR Verifier requires that every call instruction to an inlinable
function (among other things, its implementation must be visible in the
translation unit) have !dbg metadata attached to it. When parallelizing,
Polly emits calls to OpenMP runtime functions out of thin air, or at
least not directly derived from a bounded list of previous instructions.
While we could search for instructions in the SCoP that have some debug
info attached to them, there is no guarantee that we would find any.
Our solution is to generate a new DILocation that points to line 0 to
represent optimized code.
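Sketched in code (simplified; the helper name is made up and this is not
the literal Polly change), the fix attaches an artificial line-0
location scoped to the surrounding function's DISubprogram:

```cpp
#include "llvm/IR/DebugInfoMetadata.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Attach a "line 0" location to a runtime call that was created out of
// thin air, so the Verifier's rule for calls to inlinable functions is
// satisfied. Line 0 conventionally marks compiler-generated code.
static void annotateRuntimeCall(CallInst *Call, DISubprogram *SP) {
  if (!SP)
    return; // The enclosing function has no debug info; nothing to attach.
  LLVMContext &Ctx = Call->getContext();
  Call->setDebugLoc(DILocation::get(Ctx, /*Line=*/0, /*Column=*/0, SP));
}
```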
The OpenMP function implementation is usually not available in the
user's translation unit, but can become visible in an LTO build. For
the bug to appear, libomp must also be built with debug symbols.
IMHO, the IR verifier rule is too strict. Runtime functions can
also be inserted by other optimization passes, such as
LoopIdiomRecognize. When inserting a call to e.g. memset, it uses the
DebugLoc from a StoreInst in the unoptimized code, which is not
required to have !dbg metadata attached either.
Fixes #56692
Following some recent discussions, this changes the representation
of callbrs in IR. The current blockaddress arguments are replaced
with `!` label constraints that refer directly to callbr indirect
destinations:
; Before:
%res = callbr i8* asm "", "=r,r,i"(i8* %x, i8* blockaddress(@test8, %foo))
to label %asm.fallthrough [label %foo]
; After:
%res = callbr i8* asm "", "=r,r,!i"(i8* %x)
to label %asm.fallthrough [label %foo]
The benefit of this is that we can easily update the successors of
a callbr, without having to worry about also updating blockaddress
references. This should allow us to remove some limitations:
* Allow unrolling/peeling/rotation of callbr, or any other
clone-based optimizations
(https://github.com/llvm/llvm-project/issues/41834)
* Allow duplicate successors
(https://github.com/llvm/llvm-project/issues/45248)
This is just the IR representation change though; I will follow up
with patches to remove limitations in various transformation passes
that are no longer needed.
Differential Revision: https://reviews.llvm.org/D129288
The copy statements inserted by the matrix-multiplication optimization
introduce new dependencies between the copy statements and other
statements. As a result, the DependenceInfo must be recomputed.
Not recomputing them caused IslAstInfo to deduce that some loops are
parallel even though they would cause race conditions when accessing
the packed arrays.
As a result, matrix-matrix multiplication currently cannot be
parallelized.
Also see discussion at https://reviews.llvm.org/D125202
This enables opaque pointers by default in LLVM. The effect of this
is twofold:
* If IR that contains *neither* explicit ptr nor %T* types is passed
to tools, we will now use opaque pointer mode, unless
-opaque-pointers=0 has been explicitly passed.
* Users of LLVM as a library will now default to opaque pointers.
It is possible to opt-out by calling setOpaquePointers(false) on
LLVMContext.
A cmake option to toggle this default will not be provided. Frontends
or other tools that want to (temporarily) keep using typed pointers
should disable opaque pointers via LLVMContext.
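For library users, the opt-out mentioned above looks roughly like this
(minimal sketch using the LLVMContext API referred to in this message):

```cpp
#include "llvm/IR/LLVMContext.h"

int main() {
  llvm::LLVMContext Ctx;
  // Opt back out of the new default before creating or parsing any
  // modules in this context.
  Ctx.setOpaquePointers(false);
  // ... construct or parse typed-pointer IR using Ctx ...
  return 0;
}
```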
Differential Revision: https://reviews.llvm.org/D126689