481862 Commits

Author SHA1 Message Date
h-vetinari
80403e9fee
[libc++] fix some inconsistencies on libcxx status pages (#73471)
Minor things I've noticed from looking at the pages. For the missing
space I actually searched for "``\\\s\w" to see where they were hiding.
2023-11-27 10:14:18 -05:00
Tom Eccles
caba0314cf
[flang] Enable alias tags pass by default (#73111)
Enable by default for optimization levels higher than 0 (same behavior
as clang).

For simplicity, only forward the flag to the frontend driver when it
contradicts what is implied by the optimization level.

Since https://github.com/llvm/llvm-project/pull/72903 there are now no
known performance regressions.
2023-11-27 15:10:21 +00:00
Guillaume Chatelet
9539cbf033
[libc] Add detection support for float16 (#73372) 2023-11-27 16:08:17 +01:00
Simon Pilgrim
286905351f [X86] vector-interleaved tests - add AVX512F/AVX512DQ/AVX512BW/AVX512DQBW-ONLY common prefixes to merge more SLOW/FAST checks
Not used by many vector-interleaved tests, but it's a LOT easier to maintain if we use the same prefixes for all of them.
2023-11-27 15:06:24 +00:00
Dmitri Gribenko
934efd2c9b Revert "[Bazel] Fix llvm-exegesis build post 12b0ab2"
This reverts commit 1449b349ac4072adb1f593234c1d6f67986d3b6a. The
corresponding CMake commit (12b0ab2) was reverted.
2023-11-27 15:53:48 +01:00
Erich Keane
ba1c869f00
[OpenACC] Implement 'routine' construct parsing (#73143)
The 'routine' construct applies either to a function directly, or, when
provided a name, applies to the function named (and is visible in the
current scope). This patch implements the parsing for this.  The
identifier provided (or Id Expression) is required to be a valid,
declared identifier, though the semantic analysis portion of the Routine
directive will need to enforce it being a function/overload set.
2023-11-27 06:49:29 -08:00
Joseph Huber
bf02c84cb8
[libc] Use file lock to join newline on RPC puts call (#73373)
Summary:
The puts call appends a newline. With multiple threads, this can be done
out of order such that another thread puts something before we finish
appending the newline. Add a flockfile and funlockfile to ensure that
the whole string is printed before another string can appear.
2023-11-27 08:41:15 -06:00
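The locking idea in bf02c84cb8 can be modelled outside libc. The following is a minimal Python sketch, not the libc RPC code: `threading.Lock` stands in for `flockfile`/`funlockfile`, and `puts_locked` and `run_writers` are hypothetical names chosen for illustration.

```python
import io
import threading


def puts_locked(stream, lock, text):
    # Hold the lock across both writes so the text and its trailing
    # newline reach the stream as one unit.
    with lock:
        stream.write(text)
        stream.write("\n")


def run_writers(n_threads=8, n_lines=200):
    stream = io.StringIO()
    lock = threading.Lock()  # stand-in for the per-FILE lock

    def worker(index):
        for _ in range(n_lines):
            puts_locked(stream, lock, f"thread-{index}")

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return stream.getvalue().splitlines()


lines = run_writers()
```

Because the string and its newline are written under one lock, every entry in `lines` is a complete `thread-N` string, never two strings joined on one line.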
Igor Kirillov
839abdb0d2
[MachineLICM] Fix incorrect CSE on hoisted const load (#73007)
When hoisting an invariant load, we should not combine it with an
existing load through common subexpression elimination (CSE). This is
because there might be memory-changing instructions between the existing
load and the end of the block entering the loop.

Fixes https://github.com/llvm/llvm-project/issues/72855
2023-11-27 14:37:18 +00:00
Michael Buch
ae10baf0a0
[libcxxabi][test][NFC] Turn off clang-format for demangler test-case array (#73503)
Adding test-cases to the `cases` array causes `git clang-format` to
split the strings of many of the existing test-cases, making them harder
to read/work with in most cases.

This patch disables `clang-format` for the `cases` array so it doesn't
catch anyone off-guard in the future.
2023-11-27 09:24:37 -05:00
Joseph Huber
79b03306af
[llvm] Disable HandleLLVMOptions in runtimes mode (#73031)
Summary:
There are a few default options that LLVM adds that can be problematic
for runtimes builds. These options are generally intended to handle
building LLVM itself, but are also added when building in a runtimes
mode. One such issue I've run into is that in `libc` we deliberately use
`--target` to use a different device toolchain, which doesn't support
some linker arguments passed via `-Wl`. This is observed in
https://github.com/llvm/llvm-project/pull/73030 when attempting to use
these options.

This patch completely removes these default arguments.

The consensus is that any issues created by this patch should ultimately
be solved on a per-runtime basis.
2023-11-27 08:12:32 -06:00
Nikita Popov
6778dbe502 [DomTree] Avoid duplicate hash lookup (NFC)
We're performing the same lookup twice here, just to access
different fields. Only perform it once.

Also prefer using find() over operator[], as we do not want to
perform an insert if the node does not exist.
2023-11-27 15:05:03 +01:00
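The difference between the two access styles in 6778dbe502 can be shown with a Python analogy, assuming a `defaultdict` as a stand-in for a map whose indexing operator inserts on a miss. The table layout and function names are hypothetical and are not taken from the DomTree code.

```python
from collections import defaultdict


def make_table():
    # Indexing a defaultdict with a missing key inserts a default entry,
    # similar to operator[] on a C++ map.
    table = defaultdict(dict)
    table["entry"] = {"level": 0, "idom": None}
    return table


def lookup_with_index(table, node):
    # Two lookups for two fields, and a miss silently grows the table.
    level = table[node].get("level")
    idom = table[node].get("idom")
    return level, idom


def lookup_with_find(table, node):
    # One lookup, reused for both fields; a miss leaves the table untouched.
    info = table.get(node)
    if info is None:
        return None
    return info["level"], info["idom"]


indexed = make_table()
lookup_with_index(indexed, "missing")

found = make_table()
lookup_with_find(found, "missing")
```

After these calls `indexed` holds a stray entry for `"missing"`, while `found` still holds only `"entry"`.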
Sander de Smalen
adb130ccad
[llvm][TypeSize] Consider TypeSize of '0' to be fixed/scalable-agnostic. (#72994)
This patch allows adding any quantity to a zero-initialized TypeSize,
such that e.g.:

  TypeSize::Scalable(0) + TypeSize::Fixed(4) == TypeSize::Fixed(4)
  TypeSize::Fixed(0) + TypeSize::Scalable(4) == TypeSize::Scalable(4)

This makes it easier to implement add-reductions using TypeSize where
the 'scalable' flag is not yet known before starting the reduction.

(this PR follows on from #72979)
2023-11-27 14:04:52 +00:00
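The addition rule from adb130ccad can be sketched as a small Python model. This is an illustration of the two identities in the commit message, not LLVM's `TypeSize` class; the field and method names are assumptions, and the choice to raise on mixed non-zero operands is also an assumption.

```python
from dataclasses import dataclass
from functools import reduce
import operator


@dataclass(frozen=True)
class TypeSize:
    quantity: int
    is_scalable: bool

    @staticmethod
    def fixed(quantity):
        return TypeSize(quantity, False)

    @staticmethod
    def scalable(quantity):
        return TypeSize(quantity, True)

    def __add__(self, other):
        # A zero size carries no fixed/scalable information, so the
        # result takes the kind of the other operand.
        if self.quantity == 0:
            return other
        if other.quantity == 0:
            return self
        if self.is_scalable != other.is_scalable:
            raise ValueError("cannot add non-zero fixed and scalable sizes")
        return TypeSize(self.quantity + other.quantity, self.is_scalable)


# An add-reduction can start from a zero size before the kind is known.
total = reduce(operator.add, [TypeSize.scalable(4), TypeSize.scalable(8)], TypeSize.fixed(0))
```

Here `total` ends up scalable even though the accumulator started as a fixed zero, which is the use case the commit message describes.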
martinboehme
5bd643e145
[clang][dataflow] Strengthen widening of boolean values. (#73484)
Before we widen to top, we now check if both values can be proved either
true or false in their respective environments; if so, widening returns a
true or false literal. The idea is that we avoid losing information if
possible.

This patch includes a test that fails without this change to widening.

This change does mean that we call the SAT solver in more places, but
this seems acceptable given the additional precision we gain.

In tests on an internal codebase, the number of SAT solver timeouts we
observe with Crubit's nullability checker does increase by about 25%.
They can be brought back to the previous level by doubling the SAT solver
work limit.
2023-11-27 14:55:49 +01:00
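The widening rule in 5bd643e145 can be modelled in a few lines of Python. This is a toy sketch, not the clang dataflow API: each environment is a plain dict from a value name to its proved truth value, standing in for a SAT solver query, and `TOP` and `widen_bool` are hypothetical names.

```python
TOP = "top"  # hypothetical marker for "no information"


def widen_bool(prev_value, prev_env, cur_value, cur_env):
    # Ask each environment what it can prove about its own value.
    # None means neither true nor false could be proved.
    prev_truth = prev_env.get(prev_value)
    cur_truth = cur_env.get(cur_value)
    # Both proved, and to the same truth value: keep that literal.
    if prev_truth is not None and prev_truth == cur_truth:
        return prev_truth
    # Otherwise fall back to top, as before the change.
    return TOP
```

For example, `widen_bool("a", {"a": True}, "b", {"b": True})` keeps the literal `True`, while a disagreement or an unprovable value still widens to `TOP`.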
Simon Pilgrim
edf645616f [X86] Regenerate vector-interleaved-store-i64-stride-4.ll 2023-11-27 13:48:36 +00:00
Shengchen Kan
cb112eb16c
[X86][CodeGen] Teach frame lowering to spill/reload registers w/ PUSHP/POPP, PUSH2[P]/POP2[P] (#73292)
#73092 supported the encoding/decoding for PUSHP/POPP
#73233 supported the encoding/decoding for PUSH2[P]/POP2[P]

In this patch, we teach frame lowering to spill/reload registers w/
these instructions.

1. Use PPX for balanced spill/reload
2. Use PUSH2/POP2 for continuous spills/reloads
3. PUSH2/POP2 must be 16B-aligned on the stack, so pad when necessary
2023-11-27 21:37:07 +08:00
David Spickett
d7c03a196e [clang][AArch64][NFC] Remove trailing space in SME intrinsics header 2023-11-27 13:31:15 +00:00
Hans Wennborg
a9e3d232a5 Revert "[runtimes] Add missing test dependencies to check-all (#72955)"
This caused some runtimes builds to fail with:
error: unknown target 'runtimes-test-depends'

See comments on the PR.

> The test-depends target contained all the dependencies needed to run the
> runtimes tests, but it was never added as a dependency of check-all.
> This caused some of the tsan tests to fail, since the custom libcxx
> build the tests were looking for was never built. Besides the tsan
> failures, this fixes all the other test failures I was seeing with:
> cmake -G Ninja -B release-build -S llvm \
>         -DCMAKE_POSITION_INDEPENDENT_CODE=ON \
>         -DCMAKE_BUILD_TYPE=Release \
>         -DLLVM_ENABLE_ASSERTIONS=OFF \
>         -DLLVM_ENABLE_PROJECTS="clang;lld" \
>         -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;libunwind;compiler-rt"
>
> This is the same configuration the test-release.sh script uses, so I'm
> hoping this will also fix all the test failures we've been seeing when
> building the releases.
>
> Fixes #58680

This reverts commit 7f215b1380da49dccbf57da3040a40d25ed898f4.
2023-11-27 13:05:02 +01:00
Momchil Velikov
ac06d4e4cb Re-commit "[MachineSink][AArch64] Enable sink-and-fold by default (#72132)"
This re-commits 13fe0386454d after fixing a couple of issues in the LLDB
testsuite in ef9bcace834e and 6b87d84ff45d
2023-11-27 11:28:22 +00:00
Guray Ozen
f21a70f9fe
[mlir][cuda] Guard mgpuLaunchClusterKernel for Cuda 12.0+ (NFC) (#73495) 2023-11-27 11:50:46 +01:00
Nico Weber
344b5346a0 Revert "[gn] port 92b821f2dcdd"
This reverts commit 580858ad0accee171256d720cff465e7016f018d.
92b821f2dcdd was reverted in ea5de6021cf69a23
2023-11-27 19:44:43 +09:00
Hans Wennborg
ea5de6021c Revert "Reland "[llvm-exegesis] Switch from MCJIT to LLJIT (#72838)""
This broke

  tools/llvm-exegesis/X86/latency/dummy-counters.test

on Mac, see comment on the PR.

> There was an issue with certain configurations failing to build due to a
> deleted Error constructor that should be fixed with this relanding.

This reverts commit 92b821f2dcddbfa934689e10d8f11df150ab1043.
2023-11-27 11:34:55 +01:00
Simon Pilgrim
11276563c8 [X86] X86DAGToDAGISel - attempt to merge XMM/YMM loads with YMM/ZMM loads of the same ptr (#73126)
If we are loading the same ptr at different vector widths, then reuse the largest load and just extract the low subvector.

Unlike the equivalent VBROADCAST_LOAD/SUBV_BROADCAST_LOAD folds which can occur in DAG, we have to wait until DAGISel otherwise we can hit infinite loops if constant folding recreates the original constant value.

This is mainly useful for better constant sharing.
2023-11-27 10:26:26 +00:00
Timm Bäder
489df61a29 [clang][Interp][NFC] const qualify a local variable 2023-11-27 11:17:49 +01:00
Rik Huijzer
56cf3ff479
[mlir][spirv][doc] Remove duplicate syntax formats (#73386)
Some operations defined their syntax both in the documentation and via
`assemblyFormat`. This leads to two syntax descriptions in the
documentation for SPIR-V, see for example the documentation for
[`spirv.mlir.yield`](https://mlir.llvm.org/docs/Dialects/SPIR-V/#spirvmliryield-spirvyieldop).
Since the `assemblyFormat` is used to generate the actual parsers and
printer implementations, this PR removes the manual syntax descriptions.
(Similar to https://github.com/llvm/llvm-project/pull/73343.)

The strategy that I used to find the duplicates was pretty
uncomplicated. I scrolled through the [SPIR-V
Dialect](https://mlir.llvm.org/docs/Dialects/SPIR-V) to find all
duplicates and then remove the duplicate text from the `td` file.

Note that the `Syntax:` block in the docs is a good proxy for whether
`assemblyFormat` is defined because it will only be generated if the op
has defined `assemblyFormat` (`op.hasAssemblyFormat()`):


e970652776/mlir/tools/mlir-tblgen/OpDocGen.cpp (L108-L124)


e970652776/mlir/tools/mlir-tblgen/OpDocGen.cpp (L197-L199)

Related issue https://github.com/llvm/llvm-project/issues/73359.
2023-11-27 11:17:07 +01:00
Nikita Popov
272085f10b
[DomTree] Remove unnecessary domtree level check in SemiNCA (NFC) (#73107)
runSemiNCA() currently checks that the ReverseChildren are below
MinLevel in the DT, which is used when performing incremental updates.

However, ReverseChildren is populated during runDFS with only the
predecessors that are part of that DFS walk, which will itself be level
limited in the relevant cases. As such, I don't believe that this should
be checked during runSemiNCA().

This code probably dates back to a time when predecessors were not
cached during runDFS and as such not limited to the visited subtree
only.
2023-11-27 11:12:16 +01:00
Florian Hahn
17139f38e5
[LAA] Check HasSameSize before couldPreventStoreLoadForward.
After 9645267, TypeByteSize is 0 if both accesses do not have the same
size (i.e. HasSameSize will be false). This can cause an infinite loop
in couldPreventStoreLoadForward, if HasSameSize is not checked first.

So check HasSameSize first instead of after
couldPreventStoreLoadForward. Checking HasSameSize first is also
cheaper.
2023-11-27 10:10:41 +00:00
Timm Baeder
0e86d3ea9b
[clang] Print static_assert values of arithmetic binary operators (#71671)
These are actually quite useful to print.
2023-11-27 11:10:02 +01:00
Florian Hahn
2fda8ca6da
[LAA] Auto-generate checks for forward-loop-carried.ll
Auto-generate checks for -loop-carried.ll to make it easier to update in
follow-on patch. As this test only checks the dependence, mark pointers
as noalias to avoid also checking various runtime pointer check groups.
2023-11-27 10:06:17 +00:00
Guray Ozen
edf5cae739
[mlir][gpu] Support Cluster of Thread Blocks in gpu.launch_func (#72871)
NVIDIA Hopper architecture introduced the Cooperative Group Array (CGA).
It is a new level of parallelism, allowing clustering of Cooperative
Thread Arrays (CTA) to synchronize and communicate through shared memory
while running concurrently.

This PR enables support for CGA within the `gpu.launch_func` in the GPU
dialect. It extends `gpu.launch_func` to accommodate this functionality.

The GPU dialect remains architecture-agnostic, so we've added CGA
functionality as optional parameters. We want to leverage mechanisms
that we have in the GPU dialects such as outlining and kernel launching,
making it a practical and convenient choice.

An example of this implementation can be seen below:

```
gpu.launch_func @kernel_module::@kernel
                clusters in (%1, %0, %0) // <-- Optional
                blocks in (%0, %0, %0)
                threads in (%0, %0, %0)
```

The PR also introduces index and dimensions Ops specific to clusters,
binding them to NVVM Ops:

```
%cidX = gpu.cluster_id  x
%cidY = gpu.cluster_id  y
%cidZ = gpu.cluster_id  z

%cdimX = gpu.cluster_dim  x
%cdimY = gpu.cluster_dim  y
%cdimZ = gpu.cluster_dim  z
```

We will introduce cluster support in `gpu.launch` Op in an upcoming PR. 

See [the
documentation](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-of-cooperative-thread-arrays)
provided by NVIDIA for details.
2023-11-27 11:05:07 +01:00
Kerry McLaughlin
d1652ff080
[Clang][SME2] Add outer product and accumulate/subtract builtins (#71176)
Adds the following SME2 builtins:
 - svmop(a|s)_za32,
 - svbmop(a|s)_za32

See https://github.com/ARM-software/acle/pull/217
2023-11-27 09:54:23 +00:00
Tim Northover
1726a59d46 AArch64: remove duplicate SHA3 feature from Apple CPUs. NFC. 2023-11-27 09:44:53 +00:00
David Spickett
772f296214 [lldb][AArch64][Linux] Correct name of FPCR field
It should be "RMode" as in "rounding mode" not "RMMode".
2023-11-27 09:10:56 +00:00
David Spickett
8167934480
[lldb] Improve error message for script commands when there's no interpreter (#73321)
It was:
```
error: there is no embedded script interpreter in this mode.
```

1. What does "mode" mean?
2. It implies there might be an embedded script interpreter for some
other "mode", whatever that would be.

So I'm simplifying it and noting the most common reason for this which
is that lldb wasn't built with a scripting language enabled in the first
place.

There are other tips for dealing with this, but I'm not sure this
message is the best place for them.
2023-11-27 09:10:39 +00:00
Florian Hahn
d045e23c2d
[ConstraintElim] Refactor GEP offset collection.
Move GEP offset collection to separate helper function and collect
variable and constant offsets in OffsetResult. For now, this only
supports 1 VariableOffset, but the new code structure can be more easily
extended to handle more offsets in the future.

The refactoring drops the check that the VariableOffset >= -1 * constant
offset. This is not needed to check whether the constraint is
monotonically increasing. The constant factors can be ignored, the
constraint will be monotonically increasing if all variables are
positive.

See https://alive2.llvm.org/ce/z/ah2uSQ,
    https://alive2.llvm.org/ce/z/NCADNZ
2023-11-27 09:05:58 +00:00
Nikita Popov
57a0a9aadf [InstCombine] Add more inbounds tests for indexed compare fold (NFC)
Don't only test the case where all GEPs are missing inbounds, also
test inbounds only being present on some of them. The fold should
not be performed in either case.
2023-11-27 09:42:19 +01:00
Nikita Popov
553f8853db
[DomTree] Reduce number of hash table lookups (NFC) (#73097)
Inside runSemiNCA(), create a direct mapping from node number to node
info, so we can save the node number -> node pointer -> node info lookup
in many cases.

To allow this in more cases, change Label to a node number instead of
node pointer.

I've limited this to runSemiNCA() for now, because we have the
convenient property there that no new node infos will be added, so we
don't have to worry about pointer invalidation.

This gives a pretty nice compile-time improvement of about 0.4%.
2023-11-27 09:36:34 +01:00
David Green
295edaab13 [AArch64][GlobalISel] Better vecreduce.fadd lowering. (PR #73294)
This changes the fadd legalization to handle fp16 types, and treats more types
as legal so that the backend can produce the correct patterns. There is
currently a missing identity fold for `fadd x -0.0 -> x`
2023-11-27 08:20:54 +00:00
Aiden Grossman
5eb85c052e
[JumpThreading] Remove LVI printer flag (#73426)
This patch removes the -print-lvi-after-jump-threading flag now that we
can print everything in the LVI cache using the print<lazy-value-info>
pass.
2023-11-27 00:19:23 -08:00
Nikita Popov
2b646b5989
[CVP] Don't try to fold load/store operands to constant (#73338)
CVP currently tries to fold load/store pointer operands to constants
using LVI. If there is a dominating condition of the form `icmp eq ptr
%p, @g`, then `%p` will be replaced with `@g`.

LVI is geared towards range-based optimizations, and is *very*
inefficient at handling simple pointer equality conditions. We have
other passes that can handle this optimization in a more efficient way,
such as IPSCCP and GVN.

Removing this optimization gives a geomean 0.4-1.2% compile-time
improvement depending on configuration. At the same time, there
is no impact on codegen.
2023-11-27 09:17:03 +01:00
Hsiangkai Wang
477c0b67a3
[mlir][affine][gpu] Replace DivSIOp to CeilDivSIOp when lowering to GPU launch (#73328)
When converting affine.for to a GPU launch operator, we have to calculate
the block dimension and thread dimension for the launch operator.

The formula of the dimension size is

(upper_bound - lower_bound) / step_size

When the difference is not divisible by step_size, the division rounds
toward zero. However, the block and thread dimensions are right-open
ranges, i.e., [0, block_dim) and [0, thread_dim), so DivSIOp gives the
wrong result. In this patch, we replace it with CeilDivSIOp to get the
correct block and thread dimension values.
2023-11-27 08:05:54 +00:00
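The arithmetic behind 477c0b67a3 can be checked with a short Python sketch. The helper names are hypothetical; `trunc_div` models division that rounds toward zero and `ceil_div` models division that rounds up, and the example assumes a positive step.

```python
def trunc_div(a, b):
    # Signed division rounding toward zero.
    quotient = abs(a) // abs(b)
    return quotient if (a >= 0) == (b >= 0) else -quotient


def ceil_div(a, b):
    # Signed division rounding toward positive infinity (b > 0 assumed).
    return -((-a) // b)


def launch_dim(lower_bound, upper_bound, step_size, divide):
    # (upper_bound - lower_bound) / step_size, with a pluggable rounding mode.
    return divide(upper_bound - lower_bound, step_size)


# A loop from 0 to 10 with step 3 visits 0, 3, 6 and 9: four iterations.
iterations = len(range(0, 10, 3))
truncated = launch_dim(0, 10, 3, trunc_div)
rounded_up = launch_dim(0, 10, 3, ceil_div)
```

With truncation the launch would cover only three of the four iterations, since the dimension is the exclusive upper end of `[0, dim)`; rounding up covers all four. When the difference is an exact multiple of the step, both modes agree.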
Shengchen Kan
27c0bc9cae
[X86][MC] Allow to specify any of the 8/16/32/64 register names interchangeably for R16-R31 (#73421) 2023-11-27 15:25:19 +08:00
Owen Pan
659e4017b7 [clang-format][NFC] Improve an if conditional in the annotator 2023-11-26 22:54:44 -08:00
LLVM GN Syncbot
681d02d09e [gn build] Port f8afc53d641c 2023-11-27 06:00:31 +00:00
Dmitry Vyukov
f8afc53d64
[libc++] Speed up classic locale (#72112)
Locale objects use atomic reference counting, which may be very
expensive in parallel applications. The classic locale is used by
default by all streams and can be very contended. But it's never
destroyed, so the reference counting is also completely pointless on the
classic locale. Currently ~70% of time in the parallel stringstream
benchmarks is spent in locale ctor/dtor. And the execution radically
slows down with more threads.

Avoid reference counting on the classic locale. With this change
parallel benchmarks start to scale with threads.

Co-authored-by: Louis Dionne <ldionne.2@gmail.com>

```
                              │   baseline   │    optimized                            │
                              │    sec/op    │    sec/op      vs base                  │
Istream_numbers/0/threads:1      4.672µ ± 0%   4.419µ ± 0%     -5.42% (p=0.000 n=30+39)
Istream_numbers/0/threads:72   539.817µ ± 0%   9.842µ ± 1%    -98.18% (p=0.000 n=30+40)
Istream_numbers/1/threads:1      4.890µ ± 0%   4.750µ ± 0%     -2.85% (p=0.000 n=30+40)
Istream_numbers/1/threads:72     66.44µ ± 1%   10.14µ ± 1%    -84.74% (p=0.000 n=30+40)
Istream_numbers/2/threads:1      4.888µ ± 0%   4.746µ ± 0%     -2.92% (p=0.000 n=30+40)
Istream_numbers/2/threads:72     494.8µ ± 0%   410.2µ ± 1%    -17.11% (p=0.000 n=30+40)
Istream_numbers/3/threads:1      4.697µ ± 0%   4.695µ ± 5%          ~ (p=0.391 n=30+37)
Istream_numbers/3/threads:72     421.5µ ± 7%   421.9µ ± 9%          ~ (p=0.665 n=30)
Ostream_number/0/threads:1       183.0n ± 0%   141.0n ± 2%    -22.95% (p=0.000 n=30)
Ostream_number/0/threads:72    24196.5n ± 1%   343.5n ± 3%    -98.58% (p=0.000 n=30)
Ostream_number/1/threads:1       250.0n ± 0%   196.0n ± 2%    -21.60% (p=0.000 n=30)
Ostream_number/1/threads:72    16260.5n ± 0%   407.0n ± 2%    -97.50% (p=0.000 n=30)
Ostream_number/2/threads:1       254.0n ± 0%   196.0n ± 1%    -22.83% (p=0.000 n=30)
Ostream_number/2/threads:72      28.49µ ± 1%   18.89µ ± 5%    -33.72% (p=0.000 n=30)
Ostream_number/3/threads:1       185.0n ± 0%   185.0n ± 0%      0.00% (p=0.017 n=30)
Ostream_number/3/threads:72      19.38µ ± 4%   19.33µ ± 5%          ~ (p=0.425 n=30)
```
2023-11-27 07:00:21 +01:00
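The optimization in f8afc53d64 can be illustrated with a toy Python model. It is not libc++'s implementation: `LocaleImp`, `Locale`, the `immortal` flag, and the `refcount_ops` counter (a stand-in for the cost of atomic operations) are all assumptions made for the sketch.

```python
class LocaleImp:
    def __init__(self, name, immortal=False):
        self.name = name
        self.immortal = immortal  # True for an object that is never destroyed
        self.refcount = 1
        self.refcount_ops = 0     # counts simulated atomic updates

    def acquire(self):
        if self.immortal:
            return  # never destroyed, so counting is pointless
        self.refcount += 1
        self.refcount_ops += 1

    def release(self):
        if self.immortal:
            return
        self.refcount -= 1
        self.refcount_ops += 1


class Locale:
    def __init__(self, imp):
        self._imp = imp
        imp.acquire()

    def close(self):
        self._imp.release()


classic = LocaleImp("C", immortal=True)
named = LocaleImp("en_US")
for imp in (classic, named):
    for _ in range(1000):
        Locale(imp).close()
```

In this model the classic locale performs no counter updates at all, while the ordinary locale pays for one update per construction and one per destruction.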
Zi Xuan Wu (Zeson)
e89324219a
[RISCV] Don't combine store of vmv.x.s/vfmv.f.s to vp_store with VL of 1 when it's indexed store (#73219)
Because we can't support vp_store with indexed address mode by lowering to vse intrinsic later.
2023-11-27 13:39:35 +08:00
Kazu Hirata
6318dd8273 [mlir] Fix a warning
This patch fixes:

 mlir/lib/Pass/PassRegistry.cpp:376:37: error: ISO C++ requires the
 name after '::~' to be found in the same scope as the name before
 '::~' [-Werror,-Wdtor-name]
2023-11-26 20:40:56 -08:00
Jakub Kuderski
771676878a
[mlir][spirv] Add missing group non-uniform bitwise and logical ops (#73475)
This covers the following ops:
`spirv.GroupNonUniform` x {`Bitwise`, `Logical`} x {`And`, `Or`, `Xor`}

We need these to efficiently lower from the `gpu.subgroup_reduce` op.
2023-11-26 23:23:19 -05:00
Wang Pengcheng
2e6c01be0d
[SelectionDAG] Add instantiated OPC_EmitInteger and OPC_EmitStringInteger (#73241)
These two opcodes used to be followed by an MVT operand, which is
always one of i8/i16/i32/i64.

We add instantiated `OPC_EmitInteger` and `OPC_EmitStringInteger` with
i8/i16/i32/i64 so that we can reduce one byte.

We keep `OPC_EmitInteger` and `OPC_EmitStringInteger` in case we need
them someday, though I haven't found any remaining use after this
change.

Overall this reduces the llc binary size with all in-tree targets by
about 200K.
2023-11-27 11:08:28 +08:00
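The one-byte saving described in 2e6c01be0d can be modelled with a toy encoder in Python. The opcode numbers, type codes, and single-byte value below are invented for illustration and do not match the real matcher table format.

```python
OPC_EMIT_INTEGER = 0  # generic form: opcode, type byte, value
SPECIALIZED_OPCODE = {"i8": 1, "i16": 2, "i32": 3, "i64": 4}  # hypothetical numbering
TYPE_CODE = {"i8": 10, "i16": 11, "i32": 12, "i64": 13}       # hypothetical numbering

OPCODE_TO_TYPE = {code: vt for vt, code in SPECIALIZED_OPCODE.items()}
CODE_TO_TYPE = {code: vt for vt, code in TYPE_CODE.items()}


def encode_generic(vt, value):
    # Three bytes: the opcode, an explicit type operand, then the value.
    return bytes([OPC_EMIT_INTEGER, TYPE_CODE[vt], value])


def encode_specialized(vt, value):
    # Two bytes: the type is folded into the opcode itself.
    return bytes([SPECIALIZED_OPCODE[vt], value])


def decode(table):
    opcode = table[0]
    if opcode == OPC_EMIT_INTEGER:
        return CODE_TO_TYPE[table[1]], table[2]
    return OPCODE_TO_TYPE[opcode], table[1]


generic = encode_generic("i32", 7)
specialized = encode_specialized("i32", 7)
```

Both encodings decode to the same type and value, but the specialized one is a byte shorter per emitted integer, which adds up across all the generated tables.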
Fangrui Song
282201dc63
[Driver] Allow -e entry but reject -eentry (#72804)
This short option taking an argument is unfortunate.

* If a cc1-only option starts with `-e`, using it for driver will not be
  reported as an error (e.g. commit
  6cd9886c88d16d288c74846495d89f2fe84ff827).
* If another `-e` driver option is intended but a typo is made, the
  option will be recognized as a `-e`.

`gcc -export-dynamic` passes `-export-dynamic` to ld. It's not clear
whether some options behave this way.

It seems `-Wl,-eentry` and `-Wl,--entry=entry` are primarily used. There
may also be a few `gcc -e entry`, but `gcc -eentry` is extremely rare or
not used at all. Therefore, we probably should reject the Joined form of
`-e`.
2023-11-27 11:04:29 +08:00
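The behavior change in 282201dc63 can be sketched with a toy argument parser in Python. This is not the clang driver: the parser knows only `-e` and one other `-e`-prefixed option listed in a hypothetical `KNOWN_OPTIONS` set, and the error strings are made up.

```python
KNOWN_OPTIONS = {"-emit-llvm"}  # hypothetical: other options that start with -e


def parse_entry(args):
    entry = None
    index = 0
    while index < len(args):
        arg = args[index]
        if arg == "-e":
            # Separate form: the entry symbol is the next argument.
            if index + 1 >= len(args):
                raise ValueError("missing argument to '-e'")
            entry = args[index + 1]
            index += 2
        elif arg in KNOWN_OPTIONS:
            index += 1
        elif arg.startswith("-e"):
            # Joined form (-eentry) is rejected, so typos of other
            # -e... options surface as errors instead of setting an entry.
            raise ValueError(f"unknown argument: '{arg}'")
        else:
            index += 1
    return entry


def is_rejected(args):
    try:
        parse_entry(args)
    except ValueError:
        return True
    return False
```

With this rule `["-e", "entry"]` yields `"entry"`, while `["-eentry"]` and a misspelling such as `["-emit-lvm"]` are reported as errors.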
XinWang10
a77ea94c3c
[X86][MC] Update condition about ExplicitVEXPrefix (#73312)
After #72835, ExplicitVEXPrefix has changed: it is no longer a single bit
but lies within the scope of ExplicitOpPrefix, so the bitwise operations
on ExplicitVEXPrefix may need to be updated.
2023-11-27 10:39:46 +08:00