Verification of support for lowering private/firstprivate clauses
on unstructured sections.
Differential Revision: https://reviews.llvm.org/D145352
Reviewed By: TIFitis
i1 inserts will need an extra cset, and i1 extracts need a cmp (or tst) in
order to be used. This increases their cost a little to account for those
extra instructions.
https://godbolt.org/z/3c5z4G7Mh
Differential Revision: https://reviews.llvm.org/D151189
We need to add the replaced instruction itself to the worklist as
well. We want to remove the old instructions, but can't easily do
so directly, as the icmp is also one of the users and we need to
retain it until the fold has finished.
Adds a dynamic stack alignment to functions under the interrupt call
convention on x86-32. This fixes the issue where the stack can be
misaligned on entry, since x86-32 makes no guarantees about the stack
pointer position when the interrupt service routine is called.
The alignment is done by overriding X86RegisterInfo::shouldRealignStack,
and by setting the correct alignment in X86FrameLowering::calculateMaxStackAlign.
This forces the interrupt handler to be dynamically aligned, generating
the appropriate `and` instruction in the prologue and `lea` in the
epilogue. The `no-realign-stack` attribute can be used as an opt-out.
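As a rough illustration of what the prologue `and` achieves, the sketch
below rounds a stack pointer down to a power-of-two boundary. The helper
name `alignStackDown` is hypothetical and not part of the patch; it only
models the arithmetic, not the actual frame lowering.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the patch): models the effect of an
// `and esp, -Align` instruction. The x86 stack grows downwards, so
// realigning means clearing the low bits of the stack pointer.
inline uint32_t alignStackDown(uint32_t StackPtr, uint32_t Align) {
  // The mask trick is only valid for a nonzero power-of-two alignment.
  assert(Align != 0 && (Align & (Align - 1)) == 0 && "expected power of two");
  return StackPtr & ~(Align - 1);
}
```

For example, a stack pointer of 0x1003 realigned to 16 bytes becomes
0x1000, while an already aligned pointer is left unchanged.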
Fixes #26851
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D151400
Our comparison opcodes always produce a Boolean value and push it on the
stack. However, the result of such a comparison in C is int, so the
later code expects an integer value on the stack.
Work around this problem by casting the boolean value to int in those
cases. This is not ideal for C however. The comparison is usually
wrapped in an IntegerToBool cast anyway.
Differential Revision: https://reviews.llvm.org/D149645
Currently, if an offset overflow happens, we silently ignore it and
generate a bad dwp file. That file can crash gdb or trigger undefined
behavior in it, and the root cause is hard to track down. So, we
need to emit a diagnostic when an overflow happens.
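A minimal sketch of the kind of check this implies, assuming a running
32-bit section offset to which a contribution length is added. The name
`addOffsetChecked` and its signature are hypothetical, not the code in
the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not from the patch): adds Length to a 32-bit
// offset. Returns false instead of silently wrapping when the sum does
// not fit in 32 bits; Result is only written on success.
inline bool addOffsetChecked(uint32_t Offset, uint64_t Length,
                             uint32_t &Result) {
  constexpr uint64_t Max = std::numeric_limits<uint32_t>::max();
  // Compare against the remaining headroom so the check itself cannot
  // overflow, even for very large Length values.
  if (Length > Max - Offset)
    return false;
  Result = static_cast<uint32_t>(Offset + Length);
  return true;
}
```

A caller would report an error and stop when the helper returns false,
rather than continuing with a truncated offset.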
Reviewed By: ayermolo, dblaikie, steven.zhang
Differential Revision: https://reviews.llvm.org/D144565
This expands the reduction cost of i1 and/or/xor, so that larger type sizes get
handled by the existing code. For i1 reductions, `and` will use maxv, `or` will
use minv and `xor` will use addv, plus the cost of legalizing the type for larger
vectors using and/or/xor. The i1 vectors will be legalized to wider integer
vectors (say v16i8), whose cost this overrides. As with all i1 vectors, there is
a chance that the types the i1 vector is created with and how it is used will
not match, introducing extra extends that are not necessarily cost-modelled.
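One way to see why min/max/add reductions can stand in for and/or/xor on
i1 lanes is sketched below. It assumes each i1 lane has been
sign-extended to a byte holding 0 (false) or -1 (true) and that the
min/max are signed; with an unsigned interpretation the roles of min and
max swap. The function names are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: every lane is 0 (false) or -1 (true).

// All lanes are true exactly when the signed maximum is -1.
inline bool reduceAnd(const std::vector<int8_t> &Lanes) {
  assert(!Lanes.empty() && "reduction needs at least one lane");
  return *std::max_element(Lanes.begin(), Lanes.end()) == -1;
}

// Some lane is true exactly when the signed minimum is -1.
inline bool reduceOr(const std::vector<int8_t> &Lanes) {
  assert(!Lanes.empty() && "reduction needs at least one lane");
  return *std::min_element(Lanes.begin(), Lanes.end()) == -1;
}

// Each true lane contributes 0xFF (an odd value) to the sum, so the low
// bit of the sum is the parity of the number of true lanes.
inline bool reduceXor(const std::vector<int8_t> &Lanes) {
  unsigned Sum = 0;
  for (int8_t Lane : Lanes)
    Sum += static_cast<uint8_t>(Lane);
  return (Sum & 1u) != 0;
}
```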
https://godbolt.org/z/6Gc9K6b7T
Differential Revision: https://reviews.llvm.org/D151184
This patch enables specifying scalable tile sizes when using the
Transform dialect to drive tiling, e.g.:
```
%1, %loop = transform.structured.tile %0 [[4]]
```
This is implemented by extending the TileOp with a dedicated attribute
for "scalability" and by updating various parsing hooks. At the moment,
only the trailing tile size can be scalable. The following is not yet
supported:
```
%1, %loop = transform.structured.tile %0 [[4], [4]]
```
This change is part of a larger effort to enable scalable vectorisation
in Linalg. See this RFC for more context:
* https://discourse.llvm.org/t/rfc-scalable-vectorisation-in-linalg/
Differential Revision: https://reviews.llvm.org/D150944
The main reason for this change is that these checkers were implemented in the same class
but had different dependency ordering. NonNullParamChecker should run before StdCLibraryFunctionArgs
to get more specific warnings about null arguments, but apiModeling.StdCLibraryFunctions was a modeling
checker that should run before other non-modeling checkers. The modeling checker changes the state in a way
that makes it impossible for NonNullParamChecker to detect a null argument.
To simplify this, the modeling part is removed as a separate checker and can only be used if the
StdCLibraryFunctions checker, which also produces the warnings, is turned on. Modeling the functions
without bug detection (for invalid arguments) is no longer possible. From this change on, the standard
functions are not modeled by default.
Reviewed By: Szelethus
Differential Revision: https://reviews.llvm.org/D151225
Make ValueTracking directly call the KnownBits shift helpers, which
provides more precise results.
Unfortunately, ValueTracking has a special case where sometimes we
determine non-zero shift amounts using isKnownNonZero(). I have my
doubts about the usefulness of that special-case (it is only tested
in a single unit test), but I've reproduced the special-case via an
extra parameter to the KnownBits methods.
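To illustrate why a known-non-zero shift amount can add precision, here
is a simplified 8-bit model of known-bits tracking for shl. The
`SimpleKnownBits` struct and both functions are hypothetical stand-ins,
not the real KnownBits API.

```cpp
#include <cassert>
#include <cstdint>

// Simplified 8-bit model (hypothetical, not llvm::KnownBits).
struct SimpleKnownBits {
  uint8_t Zero; // bits known to be 0
  uint8_t One;  // bits known to be 1
};

// Known bits of (LHS << Amt) for a constant shift amount.
inline SimpleKnownBits shlByConstant(SimpleKnownBits LHS, unsigned Amt) {
  assert(Amt < 8 && "shift amount out of range");
  assert((LHS.Zero & LHS.One) == 0 && "conflicting known bits");
  SimpleKnownBits R;
  R.One = static_cast<uint8_t>(LHS.One << Amt);
  // The low Amt bits are shifted in as zeros, so they become known zero.
  R.Zero = static_cast<uint8_t>((LHS.Zero << Amt) | ((1u << Amt) - 1));
  return R;
}

// Known bits of (LHS << Amt) for an unknown amount: keep only what is
// known for every possible amount. If the amount is known to be
// non-zero, amount 0 is excluded from the intersection.
inline SimpleKnownBits shlUnknownAmount(SimpleKnownBits LHS,
                                        bool ShAmtNonZero) {
  SimpleKnownBits R{0xFF, 0xFF}; // start from "everything known"
  for (unsigned Amt = ShAmtNonZero ? 1 : 0; Amt < 8; ++Amt) {
    SimpleKnownBits S = shlByConstant(LHS, Amt);
    R.Zero &= S.Zero;
    R.One &= S.One;
  }
  return R;
}
```

With a fully unknown LHS, nothing is known about the result in general,
but once the shift amount is known to be non-zero the lowest bit of the
result is known to be zero.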
Differential Revision: https://reviews.llvm.org/D151816
Certain ExtractSliceOps that extract all elements from the destination are treated like casts when looking for replacement ops. Such ExtractSliceOps are typically rank expansions.
Differential Revision: https://reviews.llvm.org/D151804
A zero-ranked tensor (say tensor<i1>), when used as arith.select's condition,
crashes the optimizer during bufferization. This patch constrains the
condition to be either a scalar or of the same shape as the result.
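The constraint can be pictured with the following sketch, which models
shapes as plain dimension lists. The function name and parameters are
hypothetical; the real check lives in the op's verifier and works on
MLIR types.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of the constraint (not the MLIR verifier itself).
// CondIsScalar is true for a plain i1 condition; otherwise CondShape
// holds the dimensions of the condition tensor (empty for rank 0).
inline bool isValidSelectCondition(bool CondIsScalar,
                                   const std::vector<int64_t> &CondShape,
                                   const std::vector<int64_t> &ResultShape) {
  // A scalar condition is always accepted; a shaped condition must have
  // exactly the shape of the result.
  return CondIsScalar || CondShape == ResultShape;
}
```

Under this model a rank-0 condition paired with, say, a 4-element
result is rejected up front instead of reaching bufferization.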
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D151270
The existing BOLT install targets are broken on Windows because they
don't properly handle the output extension. We cannot use the existing
LLVM macros since those make assumptions that don't hold for BOLT. This
change instead implements custom macros following the approach used by
Clang and LLD.
Differential Revision: https://reviews.llvm.org/D151595
Encountered an ASAN crash caused by dereferencing a pointer without checking it first.
Reviewed By: kito-cheng, eklepilkina
Differential Revision: https://reviews.llvm.org/D151716
Before this patch, we could only use the MaxBECount for an AddRec's range
computation if the MaxBECount's bit width was <= that of the AddRec. This
patch reasons that if a MaxBECount has a larger bit width, but its value is
<= the max value representable in the AddRec's bit width, we can still use
the MaxBECount.
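The value-based condition can be sketched as follows, using a 64-bit
integer as a stand-in for the wider MaxBECount. The helper
`fitsInBitWidth` is hypothetical; the real code operates on APInt and
SCEV objects.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if Value is representable as an unsigned
// integer of BitWidth bits, i.e. Value <= 2^BitWidth - 1.
inline bool fitsInBitWidth(uint64_t Value, unsigned BitWidth) {
  assert(BitWidth >= 1 && BitWidth <= 64 && "unsupported bit width");
  if (BitWidth == 64)
    return true; // every uint64_t value fits in 64 bits
  return Value <= ((uint64_t{1} << BitWidth) - 1);
}
```

For instance, a 64-bit backedge count of 200 can still be used with an
8-bit AddRec, whereas a count of 300 cannot.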
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D151698
Currently the desired bytecode version is clamped to the maximum. This allows requesting bytecode versions that do not exist. We have added callsite validation for this in StableHLO to ensure we don't pass an invalid version number, but it is probably better if this is managed upstream. If a user wants to use the current version, then omitting `setDesiredBytecodeVersion` is the best way to do that (as opposed to providing a large number).
Adding this check will also properly error on older version numbers as we increment the minimum supported version. Silently clamping to the minimum version would likely lead to unintentional forward incompatibilities.
Separately, due to bytecode version being `int64_t` and using methods to read/write uints, we can generate payloads with invalid version numbers:
```
mlir-opt file.mlir --emit-bytecode --emit-bytecode-version=-1 | mlir-opt
<stdin>:0:0: error: bytecode version 18446744073709551615 is newer than the current version 5
```
This is fixed with version bounds checking as well.
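A small sketch of both points: why -1 shows up as
18446744073709551615, and what a bounds check looks like. The constant
and function names are hypothetical; the current version 5 is taken
from the error message above, and the minimum of 0 is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bounds. kCurrentVersion matches the error message above;
// kMinSupportedVersion is an assumed value for illustration.
constexpr int64_t kMinSupportedVersion = 0;
constexpr int64_t kCurrentVersion = 5;

// A signed version written through an unsigned writer is reinterpreted
// modulo 2^64, which is how -1 turns into the maximum uint64_t value.
inline uint64_t asWrittenToPayload(int64_t Version) {
  return static_cast<uint64_t>(Version);
}

// Reject anything outside [min, current] instead of clamping.
inline bool isValidDesiredVersion(int64_t Version) {
  return Version >= kMinSupportedVersion && Version <= kCurrentVersion;
}
```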
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D151838
On AArch64, it is possible to have a program that accesses both low
(0x000...) and high (0xfff...) memory, and with pointer authentication,
you can have different numbers of bits used for pointer authentication
depending on whether the address is in high or low memory.
This adds a new target.process.highmem-virtual-addressable-bits
setting which the AArch64 Mac ABI plugin will use, when set, to
always set those unaddressable high bits for high memory addresses,
and will use the existing target.process.virtual-addressable-bits
setting for low memory addresses.
This patch does not change the existing behavior when only
target.process.virtual-addressable-bits is set. In that case, the
value will apply to all addresses.
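The behavior described above can be modeled roughly as follows. The
function `fixAddress` is a hypothetical sketch, not the ABI plugin's
code, and the use of bit 55 to tell high from low memory is an
assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model (not the plugin's code). LowBits and HighBits are
// the number of addressable bits for low and high memory respectively.
inline uint64_t fixAddress(uint64_t Addr, unsigned LowBits,
                           unsigned HighBits) {
  assert(LowBits > 0 && LowBits < 64 && "unsupported low bit count");
  assert(HighBits > 0 && HighBits < 64 && "unsupported high bit count");
  // Assumption: bit 55 selects between the high and low address ranges.
  const bool IsHigh = ((Addr >> 55) & 1) != 0;
  if (IsHigh) {
    // High memory: set every bit above the addressable range.
    const uint64_t Unaddressable = ~uint64_t{0} << HighBits;
    return Addr | Unaddressable;
  }
  // Low memory: clear every bit above the addressable range.
  const uint64_t Unaddressable = ~uint64_t{0} << LowBits;
  return Addr & ~Unaddressable;
}
```

When only one setting is in use, passing the same value for both
parameters models the existing behavior in which one bit count applies
to all addresses.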
Not yet done is recognizing metadata in a live process connection
(gdb-remote qHostInfo) or a Mach-O corefile LC_NOTE to set the
correct number of addressing bits for both memory ranges. That
will be a future change.
Differential Revision: https://reviews.llvm.org/D151292
rdar://109746900
The tests introduced by https://reviews.llvm.org/D151589 were failing
because I guess some test platforms don't have `lld`. Similar tests add
`-B%S/Inputs/lld` to the clang commands so let's try this here to fix the
tests.
```
clang: error: invalid linker name in argument '-fuse-ld=lld'
```
This simplifies the code inside copy/move and makes it easier to apply the optimization to other algorithms.
Reviewed By: ldionne, Mordante, #libc
Spies: arichardson, libcxx-commits
Differential Revision: https://reviews.llvm.org/D151265
Enable support for CSPGO for lld MachO targets.
Since lld MachO does not support `-plugin-opt=`, we need to create the `--cs-profile-generate` and `--cs-profile-path=` options and propagate them in `Darwin.cpp`. These flags are not supported by ld64.
Also outline code into `getLastCSProfileGenerateArg()` to share between `CommonArgs.cpp` and `Darwin.cpp`.
CSPGO is already implemented for ELF (https://reviews.llvm.org/D56675) and COFF (https://reviews.llvm.org/D98763).
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D151589
Split DWARF doesn't handle LTO of any form (roughly there's an
assumption that each dwo file will have one CU - it's not explicitly
documented, nor explicitly handled, so the ecosystem isn't really well
understood/tested/etc).
This had previously been handled by implementing (& disabling by
default) the `-split-dwarf-cross-cu-references` flag, which would
disable use of ref_addr across two dwo CUs.
This worked for a while, at least in LTO (it didn't address Split
DWARF+Full LTO, but that's an unlikely combination, as the benefits of
Split DWARF are more limited in a full LTO build) - because the only
source of cross-CU references was inlined functions, so by making those
non-cross-CU (by moving the referenced inlined function DWARF
description into the referencing CU) the result was one CU per dwo.
But recently the Function Specialization pass was added to the ThinLTO
pipeline, which caused imported functions that may not be inlined to be
emitted by a backend compile. This meant foreign CU entities (not just
abstract origins/cross-CU referenced entities)/standalone foreign CUs
could be emitted by a backend compile.
The end result was, due to a bug* in binutils dwp (I think basically
it saw two CUs in a single dwo and reprocessed the offsets in the shared
debug_str_offsets.dwo section) this situation led to corrupted strings.
So to make this more robust, I've generalized the definition of the
`-split-dwarf-cross-cu-references` flag (perhaps it should be renamed at
this point, but it's /really/ niche, doubt anyone's using it - more or
less there for experimentation when we get around to figuring out
spec'ing LTO+Split DWARF) to mean "single CU in a dwo file" and added
more general handling for this.
There are certainly some weird corner cases that could come up in terms of
"how do we choose which CU to put everything in" - for now it's "first
come, first served" which is probably going to be OK for ThinLTO - the
base module will have the first functions and first CU, imported
fragments will come after that. For LTO the choice will be fairly
arbitrary - but, again, essentially whichever module comes first.
* Arguably a bug in binutils dwp, but since the feature isn't well
specified, I'd rather avoid dabbling in this uncertain area and ensure
LLVM doesn't produce especially novel DWARF (dwos with multiple CUs)
regardless of whether binutils dwp would/should be fixed. I'm not
confident debuggers could read such a dwo file well, etc.
The partial move from JITTargetAddress to ExecutorAddr in 8b1771bd9f30 did not
update the ORC or Kaleidoscope documents. This patch fixes the inconsistency.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D150458
The OpenMP DeviceRTL uses a hacky workaround to keep certain runtime
calls alive. This used a function that prevented them from being
optimized out. We needed this hack because the 'OpenMPOpt' pass likes to
introduce new runtime calls into the TU. This then interacted badly with
the method of linking the bitcode file per-TU like we do with Nvidia.
The OpenMPOpt pass would then generate a runtime call to a function that
was never linked in.
This should not be a problem anymore because we unconditionally link in
the `libomptarget.devicertl.a` runtime library. This should thus only
extract symbols that are undefined. So, if we do end up with an
unresolved reference it will be resolved by the static library.
The downside to this is that if we are doing non-LTO NVPTX compilation
that introduces one of these calls it will be linked outside the module
and therefore incur the overhead of an external function call.
However, removing this flag should make optimizing things easier. We
will need to see if that performance is a problem.
Reviewed By: ye-luo
Differential Revision: https://reviews.llvm.org/D151324