435563 Commits

Author SHA1 Message Date
Slava Zakharin
02f3fec391 [flang] Compute type allocation size based on the actual target representation.
This change makes sure that we compute the element size and the byte stride
based on the target representation of the element type.

For example, when REAL*10 is mapped to x86_fp80 each element occupies
16 bytes rather than 10 because of the padding.

Note that the size computation method used here actually returns
the distance between two adjacent element of the *same* type in memory
(which is equivalent to llvm::DataLayout::getTypeAllocSize()).
It does not return the number of bytes that may be overwritten
by storing a value of the specified type (e.g. what can be computed
via llvm::DataLayout::getTypeStoreSize(), but not available in
mlir::DataLayout).

Differential Revision: https://reviews.llvm.org/D133508
2022-09-09 08:39:15 -07:00
Joseph Huber
83fcba82cc [Libomptarget] Add proper LLVM libraries now that the AMDGPU plugin uses them
Summary:
The AMDGPU and CUDA plugins now relies on the Object and Support
libraries. This patch adds them explicitly rather than hoping that they
share the symbols loaded from the standard `libomptarget`.
2022-09-09 10:33:26 -05:00
Guray Ozen
61a4b228f5 [mlir][linalg] Fix tiling interface implementation ordering of parallel_insert_slice
The tiling interface generates the order of parallel_insert_slice incorrectly when there are multiple destionation operands. This revision fixes that and adds a test for it. It depends on D132937

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D133204
2022-09-09 17:31:55 +02:00
Philip Reames
4e295cb1ce [LV] Autogen a test for ease of update 2022-09-09 08:16:22 -07:00
Michał Górny
bdb4468d39 [gdb-remote] Move broadcasting logic down to GDBRemoteClientBase
Move the broadcasting support from GDBRemoteCommunication
to GDBRemoteClientBase since this is where it is actually used.  Remove
GDBRemoteCommunication and subclass constructor arguments left over
after Communication cleanup.

Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.llvm.org/D133427
2022-09-09 17:13:08 +02:00
mydeveloperday
28bd7945ea [clang-format] NFC remove incorrect whitespace causing documentation issue 2022-09-09 16:03:48 +01:00
Jay Foad
8901f7cebc [AMDGPU] Fix crash legalizing G_EXTRACT_VECTOR_ELT with negative index
Fixes https://github.com/llvm/llvm-project/issues/57408

Differential Revision: https://reviews.llvm.org/D132938
2022-09-09 15:53:34 +01:00
Guray Ozen
a367c57141 [mlir][linalg] Relax tiling constraint when there are multiple destination operands
This revision relaxes constraint of tiling when there are multiple destination operands. It also adds a test.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D132937
2022-09-09 16:38:33 +02:00
Philip Reames
a33d98e20a [LV] Pull out common expression [nfc] 2022-09-09 07:31:46 -07:00
Philip Reames
edb26268ce [VPlan] Only generate single instr for stores uniform across all parts.
Extend the approach taken by D133019 to store instructions.

Differential Revision: https://reviews.llvm.org/D133497
2022-09-09 07:15:12 -07:00
Nikita Popov
ebbac868b5 [AST] Fix unit test to use BatchAA (NFC) 2022-09-09 16:07:04 +02:00
Chris Bieneman
d3c54a172d [HLSL] Call global constructors inside entry
HLSL doesn't have a runtime loader model that supports global
construction by a loader or runtime initializer. To allow us to leverage
global constructors with minimal code generation impact we put calls to
the global constructors inside the generated entry function.

Differential Revision: https://reviews.llvm.org/D132977
2022-09-09 09:01:28 -05:00
Tue Ly
463dcc8749 [libc][math] Implement acosf function correctly rounded for all rounding modes.
Implement acosf function correctly rounded for all rounding modes.

We perform range reduction as follows:

- When `|x| < 2^(-10)`, we use cubic Taylor polynomial:
```
  acos(x) = pi/2 - asin(x) ~ pi/2 - x - x^3 / 6.
```
- When `2^(-10) <= |x| <= 0.5`, we use the same approximation that is used for `asinf(x)` when `|x| <= 0.5`:
```
  acos(x) = pi/2 - asin(x) ~ pi/2 - x - x^3 * P(x^2).
```
- When `0.5 < x <= 1`, we use the double angle formula: `cos(2y) = 1 - 2 * sin^2 (y)` to reduce to:
```
  acos(x) = 2 * asin( sqrt( (1 - x)/2 ) )
```
- When `-1 <= x < -0.5`, we reduce to the positive case above using the formula:
```
  acos(x) = pi - acos(-x)
```

Performance benchmark using perf tool from the CORE-MATH project on Ryzen 1700:
```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh acosf
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput   : 28.613
System LIBC reciprocal throughput : 29.204
LIBC reciprocal throughput        : 24.271

$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh asinf --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency   : 55.554
System LIBC latency : 76.879
LIBC latency        : 62.118
```

Reviewed By: orex, zimmermann6

Differential Revision: https://reviews.llvm.org/D133550
2022-09-09 09:55:30 -04:00
Nikita Popov
a9f312c7f4 [AST] Use BatchAA in aliasesUnknownInst() (NFCI) 2022-09-09 15:54:48 +02:00
Nicolas Vasilache
20df17fd2d [mlir][vector] Extend WarpExecutionOnLane0 pattern support to allow deduplicating identical yield values.
Differential Revision: https://reviews.llvm.org/D133573
2022-09-09 06:53:36 -07:00
Peter Steinfeld
49dab97e3c [Flang] Update build documentation
Changes to build instructions based on the latest requirements with
compiler-rt.

Differential Revision: https://reviews.llvm.org/D131041
2022-09-09 06:51:53 -07:00
Sebastian Neubauer
c7750c522e Add helper func to get first non-alloca position
The LLVM performance tips suggest that allocas should be placed at the
beginning of the entry block. So far, llvm doesn’t provide any helper to
find that position.

Add BasicBlock::getFirstNonPHIOrDbgOrAlloca and IRBuilder::SetInsertPointPastAllocas(Function*)
that get an insert position after the (static) allocas at the start of a
function and use it in ShadowStackGCLowering.

Differential Revision: https://reviews.llvm.org/D132554
2022-09-09 15:39:53 +02:00
Carlos Alberto Enciso
f671eb17be Add command line argument parsing to the Windows packaging script.
As discussed here:
https://discourse.llvm.org/t/build-llvm-release-bat-script-options

Add a function to parse command line arguments: `parse_args`.

The format for the arguments is:
  Boolean: --option
  Value:   --option<separator>value
    with `<separator>` being: space, colon, semicolon or equal sign

Command line usage example:
  my-batch-file.bat --build --type=release --version 123

It will create 3 variables:
  `build` with the value `true`
  `type` with the value `release`
  `version` with the value `123`

Usage:
  set "build="
  set "type="
  set "version="

  REM Parse arguments.
  call :parse_args %*

  if defined build (
    ...
  )
  if %type%=='release' (
    ...
  )
  if %version%=='123' (
    ...
  )
2022-09-09 14:36:40 +01:00
Nikita Popov
5b1df2e951 [LICM] Regenerate test checks (NFC) 2022-09-09 15:30:17 +02:00
Guray Ozen
1e2b4ff936 [mlir][linalg] Retire LinalgStrategyEnablePass
This revision retires LinalgStrategyEnablePass.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D133557
2022-09-09 15:13:52 +02:00
Pavel Labath
681d0d9e5f [lldb-server] Report launch error in vRun packets
Uses our existing "error string" extension to provide a better
indication of why the launch failed (the client does not make use of the
error yet).

Also, fix the way we obtain the launch error message (make sure we read
the whole message, and skip trailing garbage), and reduce the size of
TestLldbGdbServer by splitting some tests into a separate file.

Differential Revision: https://reviews.llvm.org/D133352
2022-09-09 15:10:38 +02:00
Pavel Labath
89a3691b79 [lldb] Fix ThreadedCommunication races
The Read function could end up blocking if data (or EOF) arrived just as
it was about to start waiting for the events. This was only discovered
now, because we did not have unit tests for this functionality before.
We need to check for data *after* we start listening for incoming
events. There were no changes to the read thread code needed, as we
already use this pattern in SynchronizeWithReadThread, so I just updated
the comments to make it clear that it is used for reading as well.

Differential Revision: https://reviews.llvm.org/D133410
2022-09-09 15:10:38 +02:00
Simon Pilgrim
05f56f10ed [X86] Fix VPPERM load folding latency
Noticed while investigating BITREVERSE cost numbers with the D103695 script - VPPERM folded loads was using the WriteVarShuffleX defaults and was missing an override like the VPPERM reg-reg variants
2022-09-09 13:57:39 +01:00
Thomas Symalla
72730c3f0e [NFC][AMDGPU] Pre-commit test for D132837. 2022-09-09 14:09:02 +02:00
Aaron Ballman
55d626f852 Fix LLVM sphinx build
Addresses the issue found by:
https://lab.llvm.org/buildbot/#/builders/30/builds/25791

We can use anonymous references rather than explicit ones.
2022-09-09 07:55:12 -04:00
Namhyung Kim
43efb5e445 [llvm-objdump] Create name for fake sections
It doesn't have a section header string table so add a vector to have
the strings and create name based on the program header type and the
index.

Differential Revision: https://reviews.llvm.org/D131290
2022-09-09 12:27:07 +01:00
Serge Pavlov
7b9fae05b4 [Clang] Use virtual FS in processing config files
Clang has support of virtual file system for the purpose of testing, but
treatment of config files did not use it. This change enables VFS in it
as well.

Differential Revision: https://reviews.llvm.org/D132867
2022-09-09 18:24:45 +07:00
Nikita Popov
4ab77d1677 [LICM] Allow promotion with non-load/store users
If there are non-load/store users of the promoted pointer, we
currently abort promotion. However, having such users isn't really
relevant to the transform. We already separately check that a)
there are no instructions that modref the promoted pointer and
b) that a pointer capture disables store promotion.

In the affected @test_captured_in_loop test case we have a readnone
capture of the promoted pointer, which means that load promotion
can be performed (while store promotion cannot).

Differential Revision: https://reviews.llvm.org/D133485
2022-09-09 13:09:59 +02:00
Dmitry Preobrazhensky
6d63a531e2 [AMDGPU][MC][GFX11][NFC] Update disassembler tests for VOPD instructions
Differential Revision: https://reviews.llvm.org/D133414
2022-09-09 13:10:55 +03:00
Dmitry Preobrazhensky
c07ea46f21 [AMDGPU][MC][GFX11][NFC] Update disassembler tests for VOP3P instructions
Differential Revision: https://reviews.llvm.org/D133412
2022-09-09 13:06:44 +03:00
Dmitry Preobrazhensky
6b79610fd5 [AMDGPU][MC][GFX11][NFC] Correct VOPD parsing
Differential Revision: https://reviews.llvm.org/D133492
2022-09-09 13:03:29 +03:00
Simon Pilgrim
55b78e28d8 [CostModel][X86] Add missing i8 throughput cost 2022-09-09 10:58:51 +01:00
Nicolas Vasilache
27cc31b64c [mlir][vector] NFC - Clean up vector patterns and propagate benefit through populate functions
Differential Revision: https://reviews.llvm.org/D133559
2022-09-09 02:45:22 -07:00
Serge Pavlov
55e1441f7b Revert "[Clang] Use virtual FS in processing config files"
This reverts commit 9424497e43aff088e014d65fd952ec557e28e6cf.
Some buildbots failed, reverted for investigation.
2022-09-09 16:43:15 +07:00
Brad Smith
9b4c3c2c5b [mlir] Bump building CRunnerUtils from C++11 to C++17
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D133553
2022-09-09 05:34:22 -04:00
Serge Pavlov
9424497e43 [Clang] Use virtual FS in processing config files
Clang has support of virtual file system for the purpose of testing, but
treatment of config files did not use it. This change enables VFS in it
as well.

Differential Revision: https://reviews.llvm.org/D132867
2022-09-09 16:28:51 +07:00
Graham Hunter
1f639d1bd2 [NFC][LV] Convert masked call tests to use update script 2022-09-09 10:07:39 +01:00
Djordje Todorovic
df868edee5 "Recommit "[AggressiveInstCombine] Lower Table Based CTTZ""
This reverts commit 053841c5624ca7eacd108a26071d8a1cefe1bebd.

We faced a use-after-free after pushing the D113291, since the
foldSqrt() has a call to eraseFromParent(). The function
should be at the end of the main loop that folds the patterns.
This patch fixes that.
2022-09-09 10:29:39 +02:00
serge-sans-paille
6f2ed8fd3f [OpenMP] Install ompt-multiplex.h alongside omp.h
The default install direction may not be in the compiler search path.

Differential Revision: https://reviews.llvm.org/D133420
2022-09-09 09:42:08 +02:00
Vitaly Buka
7dc0734567 [msan] Insert simplification passes after instrumentation
This resolves TODO from D96406.
InstCombine issue is fixed with D133394.

Save 4.5% of .text on CTMark.
2022-09-09 00:33:04 -07:00
Benjamin Kramer
f055d9c549 [bazel] Port 7fa1d743d073 2022-09-09 09:31:13 +02:00
Alvin Wong
a3a8bd00c8 [clang][MinGW] Add -mguard=cf and -mguard=cf-nochecks
This option can be used to enable Control Flow Guard checks and
generation of address-taken function table. They are equivalent to
`/guard:cf` and `/guard:cf,nochecks` in clang-cl. Passing this flag to
the Clang driver will also pass `--guard-cf` to the MinGW linker.

This feature is disabled by default. The option `-mguard=none` is also
available to explicitly disable this feature.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D132810
2022-09-09 09:55:40 +03:00
Alvin Wong
bf7c5f1fae [LLD][MinGW] Add --[no-]guard-cf and --[no-]guard-longjmp
These will be LLD-specific options to support Control Flow Guard for the
MinGW target. They are disabled by default, but enabling `--guard-cf`
will also enable `--guard-longjmp` unless `--no-guard-longjmp` is also
specified. These options maps to `-guard:cf,[no]longjmp`.

Note that these features require the `_load_config_used` symbol to
contain the load config directory and be filled with the required
symbols. While current versions of mingw-w64 do not supply this symbol,
the user can provide their own version of it.

Reviewed By: MaskRay, rnk

Differential Revision: https://reviews.llvm.org/D132808
2022-09-09 09:55:40 +03:00
Thomas Raoux
06413618ea [mlir][vector] Don't duplicate transfer_read during vector distribution
Only apply the pattern if the transfer_read can be distributed for all
its uses.

Differential Revision: https://reviews.llvm.org/D133538
2022-09-09 06:35:40 +00:00
gonglingqin
da8c9521ee [LoongArch] Add codegen support for frint
According to the revised description in `LoongArch Reference Manual v1.02`,
frint.[s/d] does not judge whether floating-point inexact exceptions are
allowed indicated by FCSR, i.e. always executes roundToIntegralExact(x).
What's more, the manual also specifically defines that frint.s/d is only
necessary to be defined in LA64. So ISD::FRINT is legal for LA64.

Differential Revision: https://reviews.llvm.org/D133337
2022-09-09 14:25:34 +08:00
Craig Topper
aa83bdd198 [DAGCombiner][X86] Fold (sub (subcarry X, 0, Carry), Y) -> (subcarry X, Y, Carry)
Fixes PR57576.

Differential Revision: https://reviews.llvm.org/D133471
2022-09-08 22:56:46 -07:00
Jakub Kuderski
864236d1c1 [mlir][arith] Support wide integer constant emulation
Reviewed By: antiagainst, Mogball

Differential Revision: https://reviews.llvm.org/D133136
2022-09-09 00:04:06 -04:00
Christopher Bate
f4a478cd01 [mlir][Tensor] Add rewrites to extract slices through tensor.collape_shape
This change adds a set of utilities to replace the result of a
`tensor.collapse_shape -> tensor.extract_slice` chain with the
equivalent result formed by aggregating slices of the
`tensor.collapse_shape` source. In general, it is not possible to
commute `extract_slice` and `collapse_shape` if linearized dimensions
are sliced. The i-th dimension of the `tensor.collapse_shape`
result is a "linearized sliced dimension" if:

1) Reassociation indices of tensor.collapse_shape in the i'th position
   is greater than size 1 (multiple dimensions of the input are collapsed)
2) The i-th dimension is sliced by `tensor.extract_slice`.

We can work around this by stitching together the result of
`tensor.extract_slice` by iterating over any linearized sliced dimensions.
This is equivalent to "tiling" the linearized-and-sliced dimensions of
the `tensor.collapse_shape` operation in order to manifest the result
tile (the result of the `tensor.extract_slice`). The user of the
utilities must provide the mechanism to create the tiling (e.g. a loop).
In the tests, it is demonstrated how to apply the utilities using either
`scf.for` or `scf.foreach_thread`.

The below example illustrates the pattern using `scf.for`:

```
%0 = linalg.generic ... -> tensor<3x7x11x10xf32>
%1 = tensor.collapse_shape %0 [[0, 1, 2], [3]] : ... to tensor<341x10xf32>
%2 = tensor.extract_slice %1 [13, 0] [10, 10] [2, 1] : .... tensor<10x10xf32>
```

We can construct %2 by generating the following IR:

```
%dest = linalg.init_tensor() : tensor<10x10xf32>
%2 = scf.for %iv = %c0 to %c10 step %c1 iter_args(%arg0) -> tensor<10x10xf32> {
   // Step 1: Map this output idx (%iv) to a multi-index for the input (%3):
   %linear_index = affine.apply affine_map<(d0)[]->(d0*2 + 11)>(%iv)
   %3:3 = arith.delinearize_index %iv into (3, 7, 11)
   // Step 2: Extract the slice from the input
   %4 = tensor.extract_slice %0 [%3#0, %3#1, %3#2, 0] [1, 1, 1, 10] [1, 1, 1, 1] :
         tensor<3x7x11x10xf32> to tensor<1x1x1x10xf32>
   %5 = tensor.collapse_shape %4 [[0, 1, 2], [3]] :
         tensor<1x1x1x10xf32> into tensor<1x10xf32>
   // Step 3: Insert the slice into the destination
   %6 = tensor.insert_slice %5 into %arg0 [%iv, 0] [1, 10] [1, 1] :
         tensor<1x10xf32> into tensor<10x10xf32>
   scf.yield %6 : tensor<10x10xf32>
}
```

The pattern was discussed in the RFC here: https://discourse.llvm.org/t/rfc-tensor-extracting-slices-from-tensor-collapse-shape/64034

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D129699
2022-09-08 21:58:21 -06:00
Jakub Kuderski
7fa1d743d0 Reland "[mlir][arith] Add wide integer emulation pass"
This reverts commit 45b5e8abe56d7f28c88b0c6cdd60ff741874fb1d.

Relands https://reviews.llvm.org/D133135 after fixing shared libs
builds.
2022-09-08 23:30:47 -04:00
Sheng
88bdc4687d [NFC][M68k] Correct debug message. 2022-09-09 10:58:37 +08:00