The Expand-Atomic pass emits the CAS loop for FP operations,
which limits the optimizations offered by the atomic optimizer.
Moving the atomic optimizer before expand-atomics allows
better codegen.
Reviewed By: arsenm, #amdgpu
Differential Revision: https://reviews.llvm.org/D157265
Reduction and Scan are implemented using the `Iterative`
and `DPP` strategies for the `float` type.
Reviewed By: arsenm, #amdgpu
Differential Revision: https://reviews.llvm.org/D156301
Also picks up a few improvements. (Some of the fcmp.ll
test names imply they aren't quite testing what was intended:
checking the sign bit can't be performed with a compare to 0.)
Much of the logic in here is the same as the class detection
logic of fcmpToClassTest. We could unify more with a weaker
version of fcmpToClassTest which returns implied classes rather
than exact class-like compares. Also could unify more with detection
of possible classes in non-splat vectors.
One problem here is that folds that used to always work
now require a context instruction. This is
because fcmpToClassTest requires the parent function.
Either fcmpToClassTest could tolerate a missing context
function, or we could require passing in one to simplifyFCmpInst.
Without this it's possible to hit the !isNan assert (which feels like
an unnecessary assert). In any case, these cases don't appear in
any tests.
https://reviews.llvm.org/D151887
Extend `GlobPattern` to support brace expansions, e.g., `foo.{c,cpp}` as discussed in https://reviews.llvm.org/D152762#4425203.
The high level change was to turn `Tokens` into a list that gets larger when we see a new brace expansion term. Then in `GlobPattern::match()` we must check against each token group.
This is a breaking change since `{` will no longer match a literal without escaping. However, `\{` will match the literal `{` before and after this change. Also, from a brief survey of LLVM, it seems that `GlobPattern` is mostly used for symbol and path matching, which likely won't need `{` in their patterns.
See https://github.com/devongovett/glob-match#syntax for a nice glob reference.
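The idea of growing a list of patterns for each brace term can be sketched as follows. This is only an illustration: `expandBraces` is a hypothetical helper, not the `GlobPattern` implementation. It assumes non-nested braces, and it simply keeps an unterminated `{` as a literal, which may differ from what the real code does.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Expands non-nested brace groups: "foo.{c,cpp}" -> {"foo.c", "foo.cpp"}.
// A backslash escapes the next character, so "\{" stays a literal brace.
// Hypothetical helper for illustration only.
std::vector<std::string> expandBraces(const std::string &Pattern) {
  std::vector<std::string> Results{""};
  for (std::size_t I = 0; I < Pattern.size(); ++I) {
    char C = Pattern[I];
    if (C == '\\' && I + 1 < Pattern.size()) {
      // Keep the escape sequence intact for a later matching stage.
      for (std::string &R : Results) {
        R += C;
        R += Pattern[I + 1];
      }
      ++I;
      continue;
    }
    std::size_t End = (C == '{') ? Pattern.find('}', I) : std::string::npos;
    if (End == std::string::npos) {
      // Ordinary character (or an unterminated brace): append as-is.
      for (std::string &R : Results)
        R += C;
      continue;
    }
    // Split the text between the braces on commas.
    std::vector<std::string> Terms;
    std::string Term;
    for (std::size_t J = I + 1; J < End; ++J) {
      if (Pattern[J] == ',') {
        Terms.push_back(Term);
        Term.clear();
      } else {
        Term += Pattern[J];
      }
    }
    Terms.push_back(Term);
    // Every partial result is combined with every term.
    std::vector<std::string> Next;
    for (const std::string &R : Results)
      for (const std::string &T : Terms)
        Next.push_back(R + T);
    Results = std::move(Next);
    I = End; // Continue after the closing brace.
  }
  return Results;
}
```

With this sketch, a match against `foo.{c,cpp}` would succeed if the input matches any of the expanded patterns.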
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D153587
This patch adds two new macros to setjmp (STORE, STORE_FP) and two new
macros to longjmp (LOAD, LOAD_FP) that take a register and a buffer, then
select the correct asm instruction for rv32 and rv64.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D158640
This patch changes the instruction in set_thread_ptr from ld to mv,
as rv32 doesn't have the ld instruction, and mv is supported by both
rv32 and rv64.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D159110
This patch fixes a test case that tests for overflow when time_t is
32 bits wide; it was checking the size of size_t instead of time_t.
This is on par with other test cases that correctly check the size of
time_t (asctime_test.cpp, gmtime_r_test.cpp and gmtime_test.cpp).
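Why the width of time_t (and not size_t) decides whether the overflow can happen can be sketched as below. `overflowsTime` is a hypothetical helper written for this note, not part of the libc tests; the fixed-width types stand in for a 32-bit and a 64-bit time_t.

```cpp
#include <cstdint>
#include <limits>

// Returns true when Seconds cannot be represented in the time type TimeT.
// Hypothetical helper; TimeT is assumed to be a signed integer type.
template <typename TimeT> bool overflowsTime(std::int64_t Seconds) {
  return Seconds > static_cast<std::int64_t>(std::numeric_limits<TimeT>::max()) ||
         Seconds < static_cast<std::int64_t>(std::numeric_limits<TimeT>::min());
}
```

For example, `overflowsTime<std::int32_t>(2147483648LL)` is true while `overflowsTime<std::int64_t>(2147483648LL)` is false, regardless of how wide size_t is on the target.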
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D159113
From the discussion in https://reviews.llvm.org/D158853, moving the truncate
into the splat helps more splatted scalar operands get selected on RISC-V, and
also avoids the need for splat_vector_parts on RV32.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D159147
0-D vectors are now supported, so the special case of returning just
the element type can now be removed.
A few callers that relied on the old behaviour have been updated.
Reviewed By: awarzynski, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D159122
Updates optimization-remark.f90. Makes sure that every RUN line:
  * discards the actual output of the compilation (we only care about the
    optimisation remarks),
  * re-uses the same definition of the output (better code re-use),
  * doesn't generate object files - no need to use `-c` if `-emit-llvm` is
    sufficient.
Differential Revision: https://reviews.llvm.org/D158951
Do not inline IR with multiple blocks into ops that may not support unstructured control flow.
This fixes #64978.
Differential Revision: https://reviews.llvm.org/D159072
`SingleBlockImplicitTerminator` is now a combination of two traits: `SingleBlock` and `SingleBlockImplicitTerminatorImpl` (the original `SingleBlockImplicitTerminator`).
This change makes it possible to check if the `SingleBlock` op trait is implemented. Until now, `Operation::hasTrait<OpTrait::SingleBlock>()` returned `false` for ops that implement `SingleBlockImplicitTerminator`.
Differential Revision: https://reviews.llvm.org/D159078
This patch adds a CI job for Clang on Windows that is separate from
the monolithic job that gets added automatically via the Phabricator
integration with Buildkite. This way, we will retain the Windows testing
for Clang when we move to GitHub Pull Requests.
Differential Revision: https://reviews.llvm.org/D158995
Change the LLVM dialect to LLVM IR translation to convert the alias
scope attributes lazily to LLVM IR metadata. Previously, the alias
scopes have been translated upfront walking the alias scopes of
operations that implement the AliasAnalysisOpInterface. As a result,
the translation of a module that contains only a noalias scope
intrinsic failed, since its alias scope attribute has not been
translated due to the intrinsic not implementing
AliasAnalysisOpInterface.
Reviewed By: zero9178
Differential Revision: https://reviews.llvm.org/D159187
When sending a file from a Linux host to a Windows remote, the Linux host will try to copy the source file's permission bits, which will contain `_S_I?GRP` and `_S_I?OTH` bits. Those bits are rejected by `_wsopen_s`, causing it to return EINVAL.
This patch masks out the rejected bits.
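The masking can be sketched as follows. The constants and the `maskForWindows` name are hypothetical and use POSIX-style octal values; the sketch assumes that only the owner read/write bits are acceptable on the Windows side, which is an assumption about `_wsopen_s` rather than something taken from the patch.

```cpp
// Hypothetical constants mirroring the owner read/write permission bits.
constexpr unsigned OwnerRead = 0400;
constexpr unsigned OwnerWrite = 0200;

// Keeps only the owner read/write bits and drops everything else,
// including the group and other bits copied from the Linux host.
constexpr unsigned maskForWindows(unsigned Mode) {
  return Mode & (OwnerRead | OwnerWrite);
}
```

For example, a source file with mode 0644 would be opened with 0600 on the remote.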
GitHub issue: #64313
Reviewed By: jasonmolenda, DavidSpickett
Differential Revision: https://reviews.llvm.org/D156817
The current tests in iv-select-cmp.ll are not representative of clang
output of common real-world C programs, which are often written with i32
induction vars, as opposed to i64 induction vars. Hence, add five tests
corresponding to the following programs:
  int test(int *a, int n) {
    int rdx = 331;
    for (int i = 0; i < n; i++) {
      if (a[i] > 3)
        rdx = i;
    }
    return rdx;
  }

  int test(int *a) {
    int rdx = 331;
    for (int i = 0; i < 20000; i++) {
      if (a[i] > 3)
        rdx = i;
    }
    return rdx;
  }

  int test(int *a, long n) {
    int rdx = 331;
    for (int i = 0; i < n; i++) {
      if (a[i] > 3)
        rdx = i;
    }
    return rdx;
  }

  int test(int *a, unsigned n) {
    int rdx = 331;
    for (int i = 0; i < n; i++) {
      if (a[i] > 3)
        rdx = i;
    }
    return rdx;
  }

  int test(int *a) {
    int rdx = 331;
    for (long i = INT_MIN - 1; i < UINT_MAX; i++) {
      if (a[i] > 3)
        rdx = i;
    }
    return rdx;
  }

The first two can theoretically be vectorized without a runtime-check,
while the third and fourth cannot. The fifth cannot be vectorized, even
with a runtime-check.
This issue was found while reviewing D150851.
Differential Revision: https://reviews.llvm.org/D156124
This upstreams a part of the C++ namespaces support in Clang API Notes.
The complete patch was recently merged downstream in the Apple fork: https://github.com/apple/llvm-project/pull/7230.
This patch only adds the parts of the namespace support that can be cleanly applied on top of the API Notes infrastructure that was upstreamed previously.
Differential Revision: https://reviews.llvm.org/D159092
If llvm-symbolizer finds a malformed command, it echoes it to the
standard output. New versions of binutils (starting from 2.39) allow
specifying an address by a symbol. Implementing this feature in
llvm-symbolizer makes the current reaction to invalid input
inappropriate. Almost any invalid command may be treated as a symbol
name, so the right reaction in such a case should be "symbol not found".
The exceptions are commands that are recognized but have incorrect
syntax, like "FILE:FILE:". The utility must produce a descriptive
diagnostic for such input and route it to stderr.
This change implements the new reaction to invalid input and is a
prerequisite for the implementation of symbol lookup in llvm-symbolizer.
Differential Revision: https://reviews.llvm.org/D157210
FMADD, FMSUB instructions perform better or the same compared to indexed
FMLA, FMLS.
For example, the Arm Cortex-A55 Software Optimization Guide lists "FP
multiply accumulate" FMADD, FMSUB instructions with a throughput of 2
IPC, whereas it lists "ASIMD FP multiply accumulate, by element" FMLA,
FMLS with a throughput of 1 IPC.
The Arm Cortex-A77 Software Optimization Guide, however, does not
separately list "by element" variants of the "ASIMD FP multiply
accumulate" instructions, which are listed with the same throughput of 2
IPC as "FP multiply accumulate" instructions.
Reviewed By: samtebbs, dzhidzhoev
Differential Revision: https://reviews.llvm.org/D158008
Protect against accidentally passing an invalid MCFixupKind value, which
can cause an out-of-bounds access in the array.
Reviewed by: arsenm
Differential Revision: https://reviews.llvm.org/D158725
A rotate of 8 bits of an e16 vector in either direction is equivalent to a
byteswap, i.e. vrev8. There is a generic combine on ISD::ROT{L,R} to
canonicalize these rotations to byteswaps, but on fixed vectors they are
legalized before they have the chance to be combined. This patch teaches the
rotate vector_shuffle lowering to emit these rotations as byteswaps to match
the scalable vector behaviour.
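The equivalence between an 8-bit rotate of a 16-bit element and a byteswap can be checked exhaustively on scalars. The helper names below are made up for this sketch and are unrelated to the lowering code.

```cpp
#include <cstdint>

// Rotates a 16-bit value left by N bits (N must be in 1..15).
std::uint16_t rotl16(std::uint16_t X, unsigned N) {
  return static_cast<std::uint16_t>((X << N) | (X >> (16 - N)));
}

// Rotates a 16-bit value right by N bits (N must be in 1..15).
std::uint16_t rotr16(std::uint16_t X, unsigned N) {
  return static_cast<std::uint16_t>((X >> N) | (X << (16 - N)));
}

// Swaps the two bytes of a 16-bit value.
std::uint16_t bswap16(std::uint16_t X) {
  return static_cast<std::uint16_t>((X << 8) | (X >> 8));
}

// Checks every 16-bit value: rotating by 8 in either direction must give
// the same result as swapping the bytes.
bool rotateBy8IsByteswap() {
  for (std::uint32_t V = 0; V <= 0xFFFF; ++V) {
    std::uint16_t X = static_cast<std::uint16_t>(V);
    if (rotl16(X, 8) != bswap16(X) || rotr16(X, 8) != bswap16(X))
      return false;
  }
  return true;
}
```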
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D158195
Given a shuffle mask like <3, 0, 1, 2, 7, 4, 5, 6> for v8i8, we can
reinterpret it as a shuffle of v2i32 where the two i32s are bit rotated, and
lower it as a vror.vi (if legal with zvbb enabled).
We also need to make sure that the larger element type is a valid SEW, hence
the tests for zve32x.
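The reinterpretation can be illustrated on plain arrays. This is a sketch with hypothetical helper names, assuming little-endian packing of bytes into 32-bit lanes; under that assumption the mask above corresponds to a left rotate by 8 bits (equivalently, a right rotate by 24 bits) of each i32.

```cpp
#include <array>
#include <cstdint>

using V8i8 = std::array<std::uint8_t, 8>;

// Applies a shuffle mask: Result[I] = Src[Mask[I]].
V8i8 shuffle(const V8i8 &Src, const std::array<int, 8> &Mask) {
  V8i8 Result{};
  for (int I = 0; I < 8; ++I)
    Result[I] = Src[Mask[I]];
  return Result;
}

// Treats Src as two little-endian 32-bit lanes and rotates each lane left
// by Bits (Bits must be in 1..31).
V8i8 rotateLanesLeft(const V8i8 &Src, unsigned Bits) {
  V8i8 Result{};
  for (int Lane = 0; Lane < 2; ++Lane) {
    std::uint32_t X = 0;
    for (int B = 0; B < 4; ++B)
      X |= static_cast<std::uint32_t>(Src[Lane * 4 + B]) << (8 * B);
    X = (X << Bits) | (X >> (32 - Bits));
    for (int B = 0; B < 4; ++B)
      Result[Lane * 4 + B] = static_cast<std::uint8_t>(X >> (8 * B));
  }
  return Result;
}
```

Shuffling any v8i8 value with the mask <3, 0, 1, 2, 7, 4, 5, 6> gives the same bytes as `rotateLanesLeft(Src, 8)` in this sketch.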
X86 already did this, so I've extracted the logic for it and put it inside
ShuffleVectorSDNode so it could be reused by RISC-V. I originally tried to add
this as a generic combine in DAGCombiner.cpp, but it ended up causing worse
codegen on X86 and PPC.
Reviewed By: reames, pengfei
Differential Revision: https://reviews.llvm.org/D157417
This allows adding facts even if no corresponding ICmp instruction
exists in the IR.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D158837
The SVal argument 'Cond' passed in is corrupted in release mode with
exception handling enabled (resulting in an UndefinedSVal). Changing the
lambda capture inside the callee works around this.
Known problematic VS Versions:
- VS 2022 17.4.4
- VS 2022 17.5.4
- VS 2022 17.7.2
Verified working VS Version:
- VS 2019 16.11.25
Fixes https://github.com/llvm/llvm-project/issues/62130
Reviewed By: steakhal
Differential Revision: https://reviews.llvm.org/D159163
The legalizer could keep the original mask type of a masked load combined with
a sign/zero extend, but we have to extend the mask to a type similar to our
combined load; otherwise instruction selection cannot lower the load.
Differential Revision: https://reviews.llvm.org/D158386
The BLAKE3 implementation does not support using Arm NEON on big-endian hosts: see
blake3_neon.c. Setting `BLAKE3_USE_NEON` to 1 by default for all AArch64
hosts broke builds for big endian hosts. This patch fixes the behavior
by introducing an additional check against `__ARM_BIG_ENDIAN` before
setting `BLAKE3_USE_NEON`.
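The shape of such a check can be sketched as below. `SKETCH_USE_NEON` is a hypothetical stand-in for `BLAKE3_USE_NEON`, and the sketch assumes `__aarch64__` is the macro used to detect AArch64; the actual header may detect the architecture differently.

```cpp
// Only pick a default when the macro was not set manually.
#if !defined(SKETCH_USE_NEON)
#if defined(__aarch64__) && !defined(__ARM_BIG_ENDIAN)
// Little-endian AArch64: NEON can be enabled by default.
#define SKETCH_USE_NEON 1
#else
// Big-endian AArch64 or any other host: leave NEON off.
#define SKETCH_USE_NEON 0
#endif
#endif

// Convenience wrapper so callers can query the decision as a bool.
constexpr bool useNeon() { return SKETCH_USE_NEON != 0; }
```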
Differential Revision: https://reviews.llvm.org/D159156
This is complementary to D156237.
These attributes have custom parsing logic.
Reviewed By: cor3ntin
Differential Revision: https://reviews.llvm.org/D159024
This patch deletes the unused `addDefaultFunctionDefinitionAttributes(llvm::Function);` function,
while it still keeps `void addDefaultFunctionDefinitionAttributes(llvm::AttrBuilder &attrs);` which is being used.
Differential Revision: https://reviews.llvm.org/D158990