One-Shot Bufferize correctly handles RaW conflicts around repetitive regions (loops). Specical handling is needed for parallel regions. These are a special kind of repetitive regions that can have additional RaW conflicts that would not be present if the regions would be executed sequentially.
Example:
```
%0 = bufferization.alloc_tensor()
scf.forall ... {
%1 = linalg.fill ins(...) outs(%0)
...
scf.forall.in_parallel {
tensor.parallel_insert_slice %1 into ...
}
}
```
A separate (private) buffer must be allocated for each iteration of the `scf.forall` loop.
This change adds a new interface method to `BufferizableOpInterface` to detect parallel regions. By default, regions are assumed to be sequential.
A buffer is privatized if an OpOperand bufferizes to a memory read inside a parallel region that is different from the parallel region where operand's value is defined.
Differential Revision: https://reviews.llvm.org/D159286
Assuming the ADD is nsw then it may be sign-extended to merge with a SHL op in a similar fold to the existing (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) fold.
This is most useful for helping to expose address math for X86, but has also touched several aarch64 test cases as well.
Alive2: https://alive2.llvm.org/ce/z/2UpSbJ
Differential Revision: https://reviews.llvm.org/D159198
On z/OS, the following error message is not matched correctly in lit
tests.
`EDC5129I No such file or directory.`
This patch uses a lit config substitution to check for platform specific
error messages.
SITargetLowering::adjustWritemask calls SelectionDAG::UpdateNodeOperands
to update an EXTRACT_SUBREG node in-place to refer to a new IMAGE_LOAD
instruction, before we delete the old IMAGE_LOAD instruction. But in
UpdateNodeOperands can do CSE on the fly and return a different
EXTRACT_SUBREG node, so the original EXTRACT_SUBREG node would still
exist and would refer to the old deleted IMAGE_LOAD instruction. This
caused errors like:
t31: v3i32,ch = <<Deleted Node!>> # D:1
This target-independent node should have been selected!
UNREACHABLE executed at lib/CodeGen/SelectionDAG/InstrEmitter.cpp:1209!
Fix it by detecting the CSE case and replacing all uses of the original
EXTRACT_SUBREG node with the CSE'd one.
scf.forall ops without shared outputs (i.e., fully bufferized ops) are
lowered to scf.parallel. scf.forall ops are typically lowered by an
earlier pass depending on the execution target. E.g., there are
optimized lowerings for GPU execution. This new lowering is for
completeness (convert-scf-to-cf can now lower all SCF loop constructs)
and provides a simple CPU lowering strategy for testing purposes.
scf.parallel is currently lowered to scf.for, which executes
sequentially. The scf.parallel lowering could be improved in the future
to run on multiple threads.
Anything that produces a hlfir.expr should have an allocation side
effect so that it is not removed by CSE (which would result in two
hlfir.destroy operations for the same expression). Similarly for
hlfir.associate, which has hlfir.end_associate.
Also adds read effects on arguments which are pointer-like or boxes.
I see no regressions from this change when running llvm-testsuite with
optimization enabled, or from SPEC2017 rate benchmarks.
To test this, I have added MLIR's pass for testing side effect
interfaces to fir-opt.
Differential Revision: https://reviews.llvm.org/D158662
m_x86_Features_Group always turn `mno-xxxx` into `-target-feature
-xxxx`. In this case, we don't have `-gather/-scatter` but
`+prefer-no-gather/scatter`.
This patch solves unexpected warning when using
`mno-gather/mno-scatter`:
```
'-gather' is not a recognized feature for this target (ignoring feature)
'-scatter' is not a recognized feature for this target (ignoring feature)
```
V implies Zvl128b, but a lot of the fixed vector tests also redundantly
specify -riscv-v-vector-bits-min=128. This patch removes them where
there isn't another minimum vlen being tested for, and for cases where
Zve* is being used Zvl128b was added to maintain the old test diff (and
because an awkward vlen probably isn't interesting to test for). Other
places where -risc-v-vector-bits-min were being used were replaced with
Zvl.
To maintain the convention of Scudo names starting with "scudo:",
which is used by some tooling to categorize memory usage.
Reviewed By: Chia-hungDuan
Differential Revision: https://reviews.llvm.org/D157102
`type_traits` and `utility` refer to each other in their
implementations. Also `type_traits` starts to become too big to be
manageable. This PR splits each function into individual files. FTR this
is [how libcxx handles large headers as
well](https://github.com/llvm/llvm-project/tree/main/libcxx/include/__type_traits).
The reland adds two missing functions : is_destructible_v and is_reference_v
Assuming the ADD is nsw then it may be sign-extended to merge with a SHL op in a similar fold to the existing (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) fold.
This is most useful for helping to expose address math for X86, but has also touched several aarch64 test cases as well.
Alive2: https://alive2.llvm.org/ce/z/2UpSbJ
Differential Revision: https://reviews.llvm.org/D159198
We started seeing a lot of timeouts that align with the change in lit to
execute gtests in shards. The logic there assumes tests are
single-threaded, which is the case for most of the LLVM, hence they
pick #shards ~ #cores (by slightly overshooting).
There are enough unittests in clangd that rely on multi-threading, they
can create arbitrarily many threads but we limit amount of meaningful
work to ~4 thread per process.
This change ensures that we're accounting for that paralelism when
executing clangd tests and not overloading test executors.
In theory the change overestimates the requirements, not all tests are
multi-threaded, but it doesn't seem to be resulting in any regressions
on my local runs.
Fixes https://github.com/llvm/llvm-project/issues/64964.
Fixes https://github.com/clangd/clangd/issues/1712.
In D158086, we limit all floating point scalar move and splat can't fuse
vsetvli with different SEW, and this patch try to relax the constraint
as possible by introducing new SEW demand type:
SEWGreaterThanOrEqualAndLessThan64, that allow SEW fused with larger
SEW, but constraint it can't fused with SEW=64.
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D158177
`type_traits` and `utility` refer to each other in their
implementations. Also `type_traits` starts to become too big to be
manageable. This PR splits each function into individual files. FTR this
is [how libcxx handles large headers as
well](https://github.com/llvm/llvm-project/tree/main/libcxx/include/__type_traits).
The main way I cross build lldb is to point CMake at an existing host
build to get the native tablegen tools. This is what we had documented
before.
There is another option where you start from scratch and the host tools
are built for you. This patch documents that and explains which one to
choose.
Added another arm64 example which uses this. So the frst one is the
"automatic" build and the second is the traditional approach.
For ease of copy paste and understanding, I've kept the full command in
each section and noted the one difference between them.
Along the way I updated some of the preamble to explain the two
approaches and updated some language e.g. removing "just ...". Eveyone's
"just" is different, doubly so when cross-compiling.
We want to work towards eliminating the `RecordStorageLocation` from
`RecordValue`. These particular uses of `RecordValue::getChild()` can
simply
be replaced with `RecordStorageLocation::getChild()`.
This finishes the work of replacing OperandMatchResultTy with
ParseStatus, started in D154101.
As a drive-by change, rename some RegNo variables to just Reg
(a leftover from the days when RegNo had 'unsigned' type).
This patch adds attribute builders for all buildable attributes from the
builtin dialect that did not previously have any. These builders can be
used to construct attributes of a particular type identified by a string
from a Python argument without knowing the details of how to pass that
Python argument to the attribute constructor. This is used, for example,
in the generated code of the Python bindings of ops.
The list of "all" attributes was produced with:
(
grep -h "ods_ir.AttrBuilder.get" $(find ../build/ -name "*_ops_gen.py") \
| cut -f2 -d"'"
git grep -ho "^def [a-zA-Z0-9_]*" -- include/mlir/IR/CommonAttrConstraints.td \
| cut -f2 -d" "
) | sort -u
Then, I only retained those that had an occurence in
`mlir/include/mlir/IR`. In particular, this drops many dialect-specific
attributes; registering those builders is something that those dialects
should do. Finally, I removed those attrbiutes that had a match in
`mlir/python/mlir/ir.py` already and implemented the remaining ones. The
only ones that still miss a builder now are the following:
* Represent more than one possible attribute type:
- `Any.*Attr` (9x)
- `IntNonNegative`
- `IntPositive`
- `IsNullAttr`
- `ElementsAttr`
* I am not sure what "constant attributes" are:
- `ConstBoolAttrFalse`
- `ConstBoolAttrTrue`
- `ConstUnitAttr`
* `Location` not exposed by Python bindings:
- `LocationArrayAttr`
- `LocationAttr`
* `get` function not implemented in Python bindings:
- `StringElementsAttr`
This patch also fixes a compilation problem with
`I64SmallVectorArrayAttr`.
Reviewed By: makslevental, rkayaith
Differential Revision: https://reviews.llvm.org/D159403
When a module variable is referenced inside an internal procedure, but
the use statement for the module is inside the host, semantics may not
create any symbols with HostAssocDetails directly under the internal
procedure scope.
So pft::getScopeVariableList, that is called in the bridge when lowering
the internal procedure scope, failed to instantiate the module
variables. This lead to "symbol is not mapped to any IR value" compile
time errors.
This patch fixes the issue by adding the variables to the list of
"captured" global variables from the host program, so that they are
instantiated as part of the `internalProcedureBindings` in the bridge.
The rational of doing it that way instead of changing
`getScopeVariableList` is that `getScopeVariableList` would have to
import all the module variables used inside the host since it cannot
know which ones are referenced inside the internal procedure from the
semantics::Scope information. The fix in this patch only instantiates
the module variables from the host that are actually referenced inside
the internal procedure.
addi sp, sp, 512 may be used to recover the sp in the epilogue
when stack size is larger than 2047(2^11 - 1), however, it can
not be compressed using C extension, and addi sp, sp, 496 is
able to be compressed, so try to use 496 as the ajust amount of
the fisrt sp if function doesn't need extra instructions after
adjust.
Reviewed By: wangpc
Differential Revision: https://reviews.llvm.org/D159431
ParseStatus is slightly more convenient to use due to implicit
conversion from bool, which allows to do something like:
```
return Error(L, "msg");
```
when with MatchOperandResultTy it had to be:
```
Error(L, "msg");
return MatchOperand_ParseFail;
```
It also has more appropriate name since parse* methods are not only for
parsing operands.
Reviewed By: brad
Differential Revision: https://reviews.llvm.org/D154321
On PPC there are instructions to store element from vector(e.g.
stxsdx/stxsiwx), and these instructions can be leveraged to avoid tail
constant in memset and constant splat array initialization.
This patch tries to explore these opportunities.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D138883
After D154102 multi-line labels would get split incorrectly.
When CFG is generated for a function with basic block name longer
than 80 lines, then the header separator will be placed after the
line break for the label name instead of after the whole label name.
The fix is simple by just moving the insert of | character before the
line splitting happens.
Differential Revision: https://reviews.llvm.org/D159207
Reductions specified the correct formula in the comments but was
implemented incorrectly.
Ordered reductions formula was refined to be more accurate to actual
hardware.
A llvm-mca test case for reductions is added so that we have a better
idea of how the model is performing.
/data/llvm-project/compiler-rt/lib/dfsan/dfsan_custom.cpp:2546:37: error: parameter 'va_labels' set but not used [-Werror,-Wunused-but-set-parameter]
dfsan_label *va_labels, dfsan_label *ret_label,
^
1 error generated.
The goal of this change is to clean up some of the code surrounding
HLSL using CXXThisExpr as a non-pointer l-value. This change cleans up
a bunch of assumptions and inconsistencies around how the type of
`this` is handled through the AST and code generation.
This change is be mostly NFC for HLSL, and completely NFC for other
language modes.
This change introduces a new member to query for the this object's type
and seeks to clarify the normal usages of the this type.
With the introudction of HLSL to clang, CXXThisExpr may now be an
l-value and behave like a reference type rather than C++'s normal
method of it being an r-value of pointer type.
With this change there are now three ways in which a caller might need
to query the type of `this`:
* The type of the `CXXThisExpr`
* The type of the object `this` referrs to
* The type of the implicit (or explicit) `this` argument
This change codifies those three ways you may need to query
respectively as:
* CXXMethodDecl::getThisType()
* CXXMethodDecl::getThisObjectType()
* CXXMethodDecl::getThisArgType()
This change then revisits all uses of `getThisType()`, and in cases
where the only use was to resolve the pointee type, it replaces the
call with `getThisObjectType()`. In other cases it evaluates whether
the desired returned type is the type of the `this` expr, or the type
of the `this` function argument. The `this` expr type is used for
creating additional expr AST nodes and for member lookup, while the
argument type is used mostly for code generation.
Additionally some cases that used `getThisType` in simple queries could
be substituted for `getThisObjectType`. Since `getThisType` is
implemented in terms of `getThisObjectType` calling the later should be
more efficient if the former isn't needed.
Reviewed By: aaron.ballman, bogner
Differential Revision: https://reviews.llvm.org/D159247
Our LLDB parser didn't correctly handle archives of all flavors on different systems, it currently only correctly handled BSD archives, normal and thin, on macOS, but I noticed that it was getting incorrect information when decoding a variety of archives on linux. There were subtle changes to how names were encoded that we didn't handle correctly and we also didn't set the result of GetObjectSize() correctly as there was some bad math. This didn't matter when exracting .o files from .a files for LLDB because the size was always way too big, but it was big enough to at least read enough bytes for each object within the archive.
This patch does the following:
- switch over to use LLVM's archive parser and avoids previous code duplication
- remove values from ObjectContainerBSDArchive::Object that we don't use like:
- uid
- gid
- mode
- fix ths ObjectContainerBSDArchive::Object::file_size value to be correct
- adds tests to test that we get the correct module specifications
Differential Revision: https://reviews.llvm.org/D159408
[[nodiscard]] on constructors is a defect report against C++17. That
means that it should be applied retroactively, though older compilers
might not know about it and emit warnings. This adds a
back-compatibility macro.