471499 Commits

Author SHA1 Message Date
Chris Bieneman
0c3f51c042 Re-land [DX] Add support for PSV signature elements
The pipeline state data captured in the PSV0 section of the DXContainer
file encodes signature elements which are read by the runtime to map
inputs and outputs from the GPU program.

This change adds support for generating and parsing signature elements
with testing driven through the ObjectYAML tooling.

Reviewed By: bogner

Differential Revision: https://reviews.llvm.org/D157671

Initially landed as 8c567e64f808f7a818965c6bc123fedf7db7336f, and
reverted in 4d800633b2683304a5431d002d8ffc40a1815520.

../llvm/include/llvm/BinaryFormat/DXContainerConstants.def
../llvm/test/ObjectYAML/DXContainer/PSVv1-amplification.yaml
../llvm/test/ObjectYAML/DXContainer/PSVv1-compute.yaml
../llvm/test/ObjectYAML/DXContainer/PSVv1-domain.yaml
../llvm/test/ObjectYAML/DXContainer/PSVv1-geometry.yaml
../llvm/test/ObjectYAML/DXContainer/PSVv1-vertex.yaml
../llvm/test/ObjectYAML/DXContainer/PSVv2-amplification.yaml
../llvm/test/ObjectYAML/DXContainer/PSVv2-compute.yaml
../llvm/test/ObjectYAML/DXContainer/PSVv2-domain.yaml
../llvm/test/ObjectYAML/DXContainer/PSVv2-geometry.yaml
../llvm/test/ObjectYAML/DXContainer/PSVv2-vertex.yaml
2023-08-16 14:26:13 -05:00
Jim Ingham
d268ba3808 Test follow-up to 2e7aa2ee34eb53347396731dc8a3b2dbc6a3df45
The TestEvents.py test I added for ShadowListeners fails on Windows.
Since there's no reason to believe the ShadowListeners feature has
different behavior from the other event-based tests here, I copied
the skips & expected_flakey's from the other tests in that file to
this one.
2023-08-16 12:19:07 -07:00
Lei Zhang
73ddc4474b [mlir][vector] Enable distribution over multiple dimensions
This commit starts enabling vector distribution over multiple
dimensions. It requires delinearizing the lane ID to match the
expected rank. shape_cast and transfer_read can now properly
handle multiple dimensions.
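As a rough illustration of what delinearizing the lane ID means, the C sketch below splits a linear lane ID into row-major coordinates for a given distribution shape. The helper name, the row-major order, and the 4×8 shape used in the example are assumptions for illustration; the patch itself emits MLIR, not C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: split a linear lane ID into row-major coordinates
 * for a distribution shape shape[0..rank-1]. The last dimension varies
 * fastest, as in the usual row-major convention. */
static void delinearize_lane_id(long lane_id, const long *shape, size_t rank,
                                long *coords) {
  assert(lane_id >= 0);
  for (size_t i = rank; i-- > 0;) {
    assert(shape[i] > 0);
    coords[i] = lane_id % shape[i]; /* position within this dimension */
    lane_id /= shape[i];            /* carry the rest to the outer dims */
  }
  /* The lane ID must lie inside the product of the shape. */
  assert(lane_id == 0);
}
```

With a 4×8 shape, lane 13 maps to coordinates (1, 5), since 13 = 1 × 8 + 5.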

Reviewed By: hanchung

Differential Revision: https://reviews.llvm.org/D157931
2023-08-16 12:08:43 -07:00
Craig Topper
42dad521e3 [RISCV] Add RISCVII::getRoundModeOpNum to reduce code duplication. NFC 2023-08-16 12:00:02 -07:00
Chris Bieneman
4d800633b2 Revert "[DX] Add support for PSV signature elements"
This reverts commit 8c567e64f808f7a818965c6bc123fedf7db7336f.
2023-08-16 13:52:26 -05:00
Chris Bieneman
8c567e64f8 [DX] Add support for PSV signature elements
The pipeline state data captured in the PSV0 section of the DXContainer
file encodes signature elements which are read by the runtime to map
inputs and outputs from the GPU program.

This change adds support for generating and parsing signature elements
with testing driven through the ObjectYAML tooling.

Reviewed By: bogner

Differential Revision: https://reviews.llvm.org/D157671
2023-08-16 13:38:20 -05:00
Blue Gaston
b5c2075081 [Sanitizers][Driverkit] Stop using Sanitizer Allocator64 on Driverkit
Before this code was refactored, all arm64 platforms were set to use the 32-bit allocator. This patch restores that behavior for DriverKit.

Because we target DriverKit as the target OS, rather than a specific platform, reverting to the previous behavior is the preferred way to fix a failure we are seeing on embedded platforms.
In the future, it may be more correct to match the allocator to the platform being used.

rdar://113649286

Differential Revision: https://reviews.llvm.org/D158028
2023-08-16 11:29:36 -07:00
Valentin Clement
1640b80d6f [flang][openacc] Lower gang, vector, worker, seq and nohost for acc routine
Lower clauses to the routine info op.

Reviewed By: razvanlupusoru

Differential Revision: https://reviews.llvm.org/D158007
2023-08-16 11:22:40 -07:00
Daniel Hoekwater
2c43d591c6 [CodeGen] Move function splitting tests from X86 to Generic (NFC)
Machine function splitting will become available for AArch64; since MFS
is no longer X86-only, the tests for generic behavior should live
somewhere other than tests/CodeGen/X86.

MFS implementation doesn't vary much across platforms, and most tests
should be identical between X86 and AArch64 besides instruction
selection, so the tests can live together in tests/CodeGen/Generic.

Differential Revision: https://reviews.llvm.org/D157563
2023-08-16 18:11:23 +00:00
Valentin Clement
0e7649698a [flang][openacc] Fix post deallocate suffix
The wrong suffix was applied.

Reviewed By: razvanlupusoru

Differential Revision: https://reviews.llvm.org/D158098
2023-08-16 11:09:42 -07:00
Hanhan Wang
8b68cec9c0 [mlir][tensor] Add producer fusion for tensor.pack op.
We are able to fuse the pack op only if inner tiles are not tiled or
they are fully used. Otherwise, it could generate a sequence of
non-trivial ops.

Differential Revision: https://reviews.llvm.org/D157932
2023-08-16 11:02:59 -07:00
Owen Pan
063c42e919 [clang-format] Handle NamespaceMacro string arg for FixNamespaceComments
Fixes #63795.

Differential Revision: https://reviews.llvm.org/D157568
2023-08-16 10:45:54 -07:00
Matt Arsenault
c9d0d15e69 AMDGPU: Refine some rsq formation tests
Drop unnecessary flags and metadata, add contract flags that should be
necessary.
2023-08-16 13:37:03 -04:00
Jim Ingham
2e7aa2ee34 Replace the singleton "ShadowListener" with a primary and N secondary Listeners
Before the addition of the process "Shadow Listener" you could only have one
Listener observing the Process Broadcaster.  That was necessary because fetching the
Process event is what switches the public process state, and for the execution
control logic to be manageable you needed to keep other listeners from causing
this to happen before the main process control engine was ready.

Ismail added the notion of a "ShadowListener" - which allowed you ONE
extra process listener.  This patch inverts that setup by designating the
first listener as primary - and giving it priority in fetching events.

Differential Revision: https://reviews.llvm.org/D157556
2023-08-16 10:35:32 -07:00
LLVM GN Syncbot
329979cf37 [gn build] Port 2459ed67805c 2023-08-16 17:25:23 +00:00
Nico Weber
e87d68ce8f [gn] port 23d1b6577a50 2023-08-16 13:25:02 -04:00
Dhruv Chawla
de059a2ea2 [NFC][ValueTracking] Remove calls to computeKnownBits for non-intrinsic CallInsts in isKnownNonZeroFromOperator
For non-intrinsic CallInsts, computeKnownBits only handles range
metadata and checking getReturnedArgOperand(). Both of these are now
handled in isKnownNonZero, so there is no need to fall through to
a call to computeKnownBits anymore.
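For the range-metadata half of that argument, the question is simply whether zero lies outside the call's range. A minimal C sketch, assuming a single 32-bit range pair with LLVM's half-open, possibly wrapping `[Lo, Hi)` semantics (where `Lo == Hi` is not allowed); the function names are hypothetical and this is not the ValueTracking code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Does the half-open, possibly wrapping range [lo, hi) over 32-bit
 * integers contain x? lo == hi is rejected as not a valid range pair. */
static bool range_contains(uint32_t lo, uint32_t hi, uint32_t x) {
  assert(lo != hi);
  /* Rebase so the range starts at 0; unsigned arithmetic wraps mod 2^32,
   * which handles wrapped ranges such as [10, 5) for free. */
  return (uint32_t)(x - lo) < (uint32_t)(hi - lo);
}

/* Hypothetical check: a call result with range [lo, hi) is known
 * non-zero exactly when 0 is outside the range. */
static bool call_result_known_nonzero(uint32_t lo, uint32_t hi) {
  return !range_contains(lo, hi, 0);
}
```

For example, `[1, 10)` excludes zero, while the wrapped range `[10, 5)` includes it.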

Differential Revision: https://reviews.llvm.org/D158095
2023-08-16 22:52:13 +05:30
Kazushi (Jam) Marukawa
922ac64b04 [VE] Avoid vectorizing store/load in scalar mode
Avoid vectorizing store and load instructions in scalar mode.

Reviewed By: efocht

Differential Revision: https://reviews.llvm.org/D158049
2023-08-17 02:15:54 +09:00
V Donaldson
1fd72321a4 [flang] Runtime assigned format errors
Generate a runtime error message for a reference to an invalid
assigned format such as:

if (.true.) print n
end
2023-08-16 10:14:34 -07:00
Craig Topper
0805310b50 [RISCV] Fix spelling Ctypto->Crypto. NFC 2023-08-16 10:11:05 -07:00
Kazu Hirata
6e6014a260 [Analysis] Fix an unused variable warning
This patch fixes:

  llvm/lib/Analysis/LoopAccessAnalysis.cpp:2001:12: error: unused
  variable 'MinDepDistBytesOld' [-Werror,-Wunused-variable]
2023-08-16 10:09:40 -07:00
V Donaldson
04e6129d32 [flang] Separate module procedure variant
Accept "module procedure" (as well as module function/subroutine)
in a separate module procedure definition, such as "bb1" in:

module mm
  interface
    module subroutine mm1
    end subroutine
  end interface
end module

submodule(mm) bb
  interface
    module subroutine bb1
    end subroutine
  end interface
contains
  module procedure mm1
    call bb1
  end procedure
  module procedure bb1
    print*, 'bb1'
  end procedure
end submodule

  use mm
  call mm1
end
2023-08-16 10:07:07 -07:00
Michael Maitland
87ddd3a191 [LAA] Rename and fix semantics of MaxSafeDepDistBytes to MinDepDistBytes
`MaxSafeDepDistBytes` was not correct based on its name and semantics
in instances when there was a non-unit stride loop. For example,

```
for (int k = 0; k < len; k+=3) {
  a[k] = a[k+4];
  a[k+2] = a[k+6];
}
```

Here, the smallest dependence distance is 24 bytes, but only vectorizing 8 bytes
is safe. `MaxSafeVectorWidthInBits` reported the correct number of bits
that could be vectorized as 64 bits.

The semantics of `MaxSafeDepDistBytes` should be:
  The smallest dependence distance in bytes in the loop. This may not be
  the same as the maximum number of bytes that are safe to operate on
  simultaneously.

The variable should be renamed to `MinDepDistBytes` to reflect those
semantics, and its docstring should be updated accordingly.

A debug message that used `MaxSafeDepDistBytes` to signify to the user
how many bytes could be accessed in parallel is updated to use
`MaxSafeVectorWidthInBits` instead. That way, the same message is
communicated to the user, just in different units.

This patch makes sure that when `MinDepDistBytes` is modified in a way
that should impact `MaxSafeVectorWidthInBits`, the latter is updated
accordingly. This patch also clarifies why `MaxSafeVectorWidthInBits`
does not need to be updated when `MinDepDistBytes` is (i.e. in the case of a
forward dependency).
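The gap between the two quantities can be reproduced with a small calculation. This sketch assumes the maximum vectorization factor is derived as the minimum dependence distance divided by (element size × stride), a simplification of what LAA does, and assumes 4-byte `int` elements with the loop's stride of 3. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model (assumption): how many bits of data may be processed
 * at once, given the minimum dependence distance in bytes, the element
 * size in bytes, and the access stride in elements. */
static uint64_t max_safe_vector_width_in_bits(uint64_t min_dep_dist_bytes,
                                              uint64_t type_byte_size,
                                              uint64_t stride) {
  assert(type_byte_size > 0 && stride > 0);
  /* Number of strided elements that fit inside the dependence distance. */
  uint64_t max_vf = min_dep_dist_bytes / (type_byte_size * stride);
  /* Convert that element count back into bits of element data. */
  return max_vf * type_byte_size * 8;
}
```

With a 24-byte minimum distance, 4-byte elements, and stride 3, this gives 24 / (4 × 3) = 2 elements, i.e. 2 × 4 × 8 = 64 bits (8 bytes), while `MinDepDistBytes` itself stays at 24.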

Differential Revision: https://reviews.llvm.org/D156158
2023-08-16 09:53:35 -07:00
Nicholas Guy
d65feccb12 [ARM] Set preferred function alignment
Aligning functions yields small performance gains on
embedded cores, more so when there are numerous small function calls.
Similar to aligning loops, if the function can fit within
a single cache line then the performance overhead of
fetching more instructions can be limited.
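The cache-line argument can be made concrete with a small calculation. This sketch assumes a hypothetical 32-byte cache line and 32-byte alignment; the alignment value the patch actually chooses is not stated in this message, and the helper names are invented.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (addr + align - 1) & ~(align - 1);
}

/* Number of cache lines of `line` bytes touched by a function of `size`
 * bytes (size > 0) that starts at address `start`. */
static uint64_t cache_lines_touched(uint64_t start, uint64_t size,
                                    uint64_t line) {
  assert(size > 0 && line > 0);
  return (start + size - 1) / line - start / line + 1;
}
```

For example, a 24-byte function placed at 0x1030 straddles two 32-byte lines, while the same function aligned up to 0x1040 fits in one.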

Differential Revision: https://reviews.llvm.org/D157514
2023-08-16 17:31:21 +01:00
Ingo Müller
d7e26b5620 [mlir][linalg][transform][python] Fix mix-in for MaskedVectorize.
Forward fix for a bug in dac19b457e2cfd139e0e5cc29872ba3c65b7510f, which
uses the vertical bar operator for type hints. That syntax is only supported
by Python 3.10 and later, so it breaks the builds on Python 3.8.
2023-08-16 16:27:46 +00:00
Siu Chi Chan
d40fd9e1d9 Fix typo in module inliner priority flag
Change-Id: If4a830fdacf1b0e7b7634f48f648427d5ec7ea21

Reviewed By: kazu, arsenm

Differential Revision: https://reviews.llvm.org/D158013
2023-08-16 12:26:06 -04:00
Jonas Devlieghere
5afa519c1a [lldb] Print better error message when sphinx_automodapi is not installed
Print an error message with instructions on how to install
sphinx_automodapi.

Differential revision: https://reviews.llvm.org/D158022
2023-08-16 09:14:42 -07:00
David Green
a047dfe0d5 [AArch64][GISel] Lower EXT of 0 to a COPY
This allows us to select G_SHUFFLE_VECTOR with identity masks (possibly
including undef elements), but avoid the actual EXT instruction if the shift
amount is 0.
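Why an EXT with a shift amount of 0 is just a copy follows from a simplified model of the 128-bit form: result byte i is taken from the concatenation of the two sources, starting `imm` bytes into the first source. The C model below (names hypothetical, not the GlobalISel code) takes the copy fast path exactly when `imm` is 0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified model of a 16-byte EXT: bytes come from the concatenated
 * pair (first source, then second source), starting at byte imm. */
static void ext16(const uint8_t n[16], const uint8_t m[16], unsigned imm,
                  uint8_t out[16]) {
  assert(imm < 16);
  if (imm == 0) {
    /* Shift amount 0: the result is exactly the first source, a copy. */
    memcpy(out, n, 16);
    return;
  }
  for (unsigned i = 0; i < 16; ++i)
    out[i] = (i + imm < 16) ? n[i + imm] : m[i + imm - 16];
}
```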
2023-08-16 17:12:15 +01:00
Dhruv Chawla
d53b3df570 [InstCombine] Remove unneeded isa<PHINode> check in foldOpIntoPhi
This check is redundant as it is covered by the call to
isPotentiallyReachable.

Depends on D155726.

Differential Revision: https://reviews.llvm.org/D155718
2023-08-16 21:09:08 +05:30
Dhruv Chawla
e549d578cc [InstCombine] Test cases for D155718
Differential Revision: https://reviews.llvm.org/D155726
2023-08-16 21:09:04 +05:30
Akash Banerjee
5d9ccd7a96 [OpenMP] Migrate dispatch related utility functions from Clang codegen to OMPIRBuilder
Migrate createForStaticInitFunction, createDispatchInitFunction, createDispatchNextFunction and createDispatchFiniFunction from Clang CodeGen to OMPIRBuilder.

Differential Revision: https://reviews.llvm.org/D157994
2023-08-16 16:35:28 +01:00
Joseph Huber
5717329f1a [Libomptarget] Disable deadlocking bug49334.cpp test on AMDGPU
This test hangs on AMDGPU sporadically, disable it for the time being.

Fixes: https://github.com/llvm/llvm-project/issues/64733

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D158082
2023-08-16 10:24:00 -05:00
Benjamin Maxwell
0d3abdc263 [mlir][Linalg] Fix formatting of generated docs markdown
This patch prevents `mlir-linalg-ods-yaml-gen` from adding extra
whitespace around the summary and description fields. This broke the
_italics_ of the summary as _ this _ is not recognised by markdown.
It also meant the first line of the description was in a code block
as it was indented two spaces.

The separator between summary and description has also been updated to
two newlines. This matches the existing convention and prevents a
line-wrapped summary from spilling part of it into the description.

These issues can be currently seen at: https://mlir.llvm.org/docs/Dialects/Linalg/

Reviewed By: awarzynski

Differential Revision: https://reviews.llvm.org/D157853
2023-08-16 15:08:51 +00:00
Ingo Müller
67c092c8c8 [mlir][transform][python] Add test for AnyValueType binding.
I had forgotten to commit that test as part of
https://reviews.llvm.org/D157638.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D158074
2023-08-16 15:07:48 +00:00
Ingo Müller
dac19b457e [mlir][linalg][transform][python] Add mix-in for MaskedVectorize.
Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D157735
2023-08-16 15:07:46 +00:00
Ingo Müller
2d3dcd4aec [mlir][linalg][transform][python] Add mix-in for BufferizeToAllocOp.
Re-apply https://reviews.llvm.org/D157704.

The original patch broke the tests on Python 3.8 and got reverted by
0c4aad050c23254c3c612e860e1278961d161aef. This patch replaces the usage
of the vertical bar operator for type hints with `Union`.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D158075
2023-08-16 15:07:43 +00:00
Felix
a94c44cc0a [clang-tidy] Added a new option to lambda-function-name to ignore warnings in macro expansion
Improved check lambda-function-name with option IgnoreMacros to ignore warnings in macro expansion.
Relates to #62857 (https://github.com/llvm/llvm-project/issues/62857)

Reviewed By: PiotrZSL

Differential Revision: https://reviews.llvm.org/D157829
2023-08-16 15:02:56 +00:00
Eduard Zingerman
8f28e8069c [BPF] support for BPF_ST instruction in codegen
Generate store immediate instruction when CPUv4 is enabled.
For example:

    $ cat test.c
    struct foo {
      unsigned char  b;
      unsigned short h;
      unsigned int   w;
      unsigned long  d;
    };
    void bar(volatile struct foo *p) {
      p->b = 1;
      p->h = 2;
      p->w = 3;
      p->d = 4;
    }

    $ clang -O2 --target=bpf -mcpu=v4 test.c -c -o - | llvm-objdump -d -
    ...
    0000000000000000 <bar>:
           0:	72 01 00 00 01 00 00 00	*(u8 *)(r1 + 0x0) = 0x1
           1:	6a 01 02 00 02 00 00 00	*(u16 *)(r1 + 0x2) = 0x2
           2:	62 01 04 00 03 00 00 00	*(u32 *)(r1 + 0x4) = 0x3
           3:	7a 01 08 00 04 00 00 00	*(u64 *)(r1 + 0x8) = 0x4
           4:	95 00 00 00 00 00 00 00	exit

Take special care to:
- apply `BPFMISimplifyPatchable::checkADDrr` rewrite for BPF_ST
- validate immediate value when BPF_ST write is 64-bit:
  BPF interprets `(BPF_ST | BPF_MEM | BPF_DW)` writes as writes with
  sign extension. Thus it is fine to generate such write when
  immediate is -1, but it is incorrect to generate such write when
  immediate is +0xffff_ffff.
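The 64-bit immediate check described above amounts to asking whether the value survives a round trip through a sign-extended 32-bit field. A minimal sketch with hypothetical helper names; it illustrates the rule, not the backend's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A (BPF_ST | BPF_MEM | BPF_DW) store carries a 32-bit immediate that is
 * sign-extended to 64 bits, so it can only encode values in int32 range. */
static bool fits_st_dw_imm(int64_t value) {
  return value >= INT32_MIN && value <= INT32_MAX;
}

/* Encode value as the 32-bit immediate; callers must check the fit first,
 * otherwise a store should be emitted through a register instead. */
static int32_t encode_st_dw_imm(int64_t value) {
  assert(fits_st_dw_imm(value));
  return (int32_t)value;
}
```

So -1 is encodable (it sign-extends back to 0xffff_ffff_ffff_ffff), but +0xffff_ffff is not, because its 32-bit pattern would sign-extend to -1.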

This commit was previously reverted in e66affa17e32.
The reason for revert was an unrelated bug in BPF backend,
triggered by test case added in this commit if LLVM is built
with LLVM_ENABLE_EXPENSIVE_CHECKS.
The bug was fixed in D157806.

Differential Revision: https://reviews.llvm.org/D140804
2023-08-16 17:51:28 +03:00
Philip Reames
3c2a66973e [RISCVInsertVSETVLI] Generalize scalar extract (vmv.x.s, and vmv.f.s) handling
vmv.x.s and vmv.f.s are unconditional. They read the low element of a vector
register (not vector group), and function even when VL=0 or VSTART>0. As such,
both VL and LMUL are "don't care" for them.

We'd previously had handling in the forward pass only via the NoRegister
mechanism.  (The only instructions with SEW but without VL are these extracts.)
This patch moves that handling into getDemanded so that the backwards pass
benefits as well.
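The "don't care" reasoning can be modelled as a set of demanded fields: a prior VL/vtype state is compatible with an instruction if it matches every field that instruction demands. This C sketch is hypothetical (all names invented, LMUL treated as an opaque encoded value) and far simpler than the pass's real bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { OP_VADD, OP_SCALAR_EXTRACT } Opcode;

/* Which parts of the VL/vtype state an instruction depends on. */
typedef struct { bool vl, sew, lmul; } Demanded;

/* A VL/vtype state; lmul is an opaque encoded value in this model. */
typedef struct { unsigned vl, sew, lmul; } VState;

static Demanded get_demanded(Opcode op) {
  assert(op == OP_VADD || op == OP_SCALAR_EXTRACT);
  Demanded d = {true, true, true}; /* conservative default: demand all */
  if (op == OP_SCALAR_EXTRACT) {
    /* vmv.x.s / vmv.f.s read element 0 of a single register even when
     * VL=0, and ignore register grouping: only SEW matters. */
    d.vl = false;
    d.lmul = false;
  }
  return d;
}

/* An existing state can be reused if it agrees on every demanded field. */
static bool compatible(Demanded d, VState have, VState want) {
  return (!d.vl || have.vl == want.vl) &&
         (!d.sew || have.sew == want.sew) &&
         (!d.lmul || have.lmul == want.lmul);
}
```

Because the query depends only on the instruction, the same answer is available to a forward and a backward pass alike.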

Differential Revision: https://reviews.llvm.org/D157991
2023-08-16 07:50:59 -07:00
Soumi Manna
bd1ddc5850 [NFC][OpenMP] Initialize pointer field
Reviewed By: tahonermann

Differential Revision: https://reviews.llvm.org/D157989
2023-08-16 07:47:24 -07:00
Philip Reames
b06e52c32f [RISCVInsertVSETVLI] Default to VL=1 for scalar extracts
We were defaulting to VL=0 when we didn't otherwise have a vsetv
nearby. Instead, let's use VL=1. VL=0 is very much a corner case
in hardware, and we should avoid it if we can.

Differential Revision: https://reviews.llvm.org/D158015
2023-08-16 07:35:00 -07:00
Joseph Huber
0f386e693b [libc][fix] Fix test after changing logic for generic stdio
Summary:
The previous patch accidentally broke the logic for adding the `generic`
subdirectory. Fix this so the CPU build works properly.
2023-08-16 09:29:29 -05:00
Joseph Huber
1e573f378c [libc] Implement fopen, fclose, and fread on the GPU
This patch implements the `fopen`, `fclose`, and `fread` functions on
the GPU. These are pretty much re-implemented from what existed but
using the new interface. Having this subset allows us to test the
interface a bit more strenuously since we can write and read to a file.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D157622
2023-08-16 09:14:38 -05:00
Matt Arsenault
7c4aa3b37e AMDGPU: InstCombine amdgcn.rcp(amdgcn.sqrt) -> amdgcn.rsq
We currently have some wrong combines in the backend that
approximately do this.

https://reviews.llvm.org/D158002
2023-08-16 10:04:13 -04:00
Matt Arsenault
f19ee76f35 AMDGPU: Add baseline tests for rcp to rsq fold 2023-08-16 10:03:49 -04:00
Florian Hahn
5816d2ab28 [SimplifyCFG] Add tests for sinking load/store with swifterror operand.
Add test coverage for sinking/hoisting loads/stores with swifterror
pointers. Currently this isn't handled correctly by SimplifyCFG and
causes a verifier error.
2023-08-16 14:51:29 +01:00
Matt Arsenault
66ee794064 AMDGPU: Fix verifier error on splatted opencl fmin/fmax and ldexp calls
Apparently the spec has overloads for fmin/fmax and ldexp with one of
the operands as scalar. We need to broadcast the scalars to the vector
type.
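Broadcasting the scalar operand can be sketched in C as an explicit splat followed by the elementwise operation. The helper names and the fmin-like NaN handling are assumptions for illustration; the patch itself does this on LLVM IR.

```c
#include <assert.h>
#include <stddef.h>

/* fmin-like helper: if exactly one operand is NaN, return the other.
 * (x != x is true only when x is NaN.) */
static float fmin_scalar(float a, float b) {
  if (a != a) return b;
  if (b != b) return a;
  return a < b ? a : b;
}

/* Mixed vector/scalar overload: splat the scalar across all lanes first,
 * so both operands of the elementwise op have the same vector shape. */
static void fmin_vec_scalar(const float *x, float y, float *out, size_t n) {
  float splat[16];
  assert(n <= 16); /* OpenCL vector types have at most 16 lanes */
  for (size_t i = 0; i < n; ++i)
    splat[i] = y; /* broadcast the scalar operand */
  for (size_t i = 0; i < n; ++i)
    out[i] = fmin_scalar(x[i], splat[i]);
}
```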

https://reviews.llvm.org/D158077
2023-08-16 09:42:26 -04:00
Bjorn Pettersson
0c4c961008 [LinkAllPasses] Remove unused header includes. NFCI
This patch removes some includes from LinkAllPasses.h that appear
to be unused. Those should have been removed earlier when the
corresponding legacy PM passes were removed.

InstSimplifyPass is a bit special since the legacy PM version of the
pass still exists. But since createInstSimplifyLegacyPass is defined
in Scalar.h and not in InstSimplifyPass.h that particular include
isn't needed anyway.
2023-08-16 15:24:19 +02:00
Timm Bäder
871ee94141 [clang][ExprConst] Use call source range for 'in call to' diags
Differential Revision: https://reviews.llvm.org/D156604
2023-08-16 15:22:29 +02:00
Matthias Springer
878950b82c [mlir][bufferization] Simplify getBufferType
`getBufferType` computes the bufferized type of an SSA value without bufferizing any IR. This is useful for predicting the bufferized type of iter_args of a loop.

To avoid endless recursion (e.g., in the case of "scf.for", the type of the iter_arg depends on the type of init_arg and the type of the yielded value; the type of the yielded value depends on the type of the iter_arg again), `fixedTypes` was used to fall back to a "fixed" type. A simpler way is to maintain an "invocation stack". `getBufferType` implementations can then inspect the invocation stack to detect repetitive computations (typically when computing the bufferized type of a block argument).
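The invocation-stack idea can be sketched on a toy model in which each value either has a fixed memory space or derives it from other values. When a dependency is already on the stack, recursing into it would only recompute the current query, so it is skipped; disagreeing spaces are reported as a conflict. Everything here (`Value`, `get_space`, integer memory spaces, the `UNKNOWN`/`CONFLICT` codes) is hypothetical; the real implementation works on MLIR types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPS 4
#define MAX_STACK 16
#define UNKNOWN (-1)
#define CONFLICT (-2)

/* A value has a fixed memory space (>= 0) or derives it from deps. */
typedef struct {
  int known_space;
  size_t num_deps;
  size_t deps[MAX_DEPS];
} Value;

/* Values whose space is currently being computed. */
typedef struct {
  size_t items[MAX_STACK];
  size_t size;
} InvocationStack;

static bool on_stack(const InvocationStack *s, size_t v) {
  for (size_t i = 0; i < s->size; ++i)
    if (s->items[i] == v)
      return true;
  return false;
}

static int get_space(const Value *vals, size_t v, InvocationStack *s) {
  if (vals[v].known_space >= 0)
    return vals[v].known_space;
  assert(s->size < MAX_STACK);
  s->items[s->size++] = v; /* push: v is being computed */
  int result = UNKNOWN;
  for (size_t i = 0; i < vals[v].num_deps; ++i) {
    size_t dep = vals[v].deps[i];
    if (on_stack(s, dep))
      continue; /* repetitive computation: skip instead of looping */
    int space = get_space(vals, dep, s);
    if (space == UNKNOWN)
      continue; /* nothing learned from this dependency */
    if (space == CONFLICT || (result != UNKNOWN && result != space)) {
      result = CONFLICT; /* inconsistent memory spaces */
      break;
    }
    result = space;
  }
  --s->size; /* pop */
  return result;
}
```

In a loop where the iter_arg depends on the init_arg (space 0) and on the yielded value, which in turn depends only on the iter_arg, the query resolves to space 0 without infinite recursion.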

Also improve error messages in case of inconsistent memory spaces inside of a loop.

Differential Revision: https://reviews.llvm.org/D158060
2023-08-16 15:02:07 +02:00