474868 Commits

Author SHA1 Message Date
Arthur Eubanks
0a1aa6cda2
[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295)
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.

This matches other nearby enums.

For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
2023-09-14 14:10:14 -07:00
Jakub Kuderski
ae84b160d2
[GitHub][mlir][spirv] Add missing patterns for SPIR-V in mlir (#66423)
Include conversion passes and tools.
2023-09-14 17:06:03 -04:00
Joseph Huber
bbe7eb92b4 [libc][Obvious] Fix missing entrypoints after moving to generic
Summary:
The previous patch moved the implementations of these to generic/ and
accidentally did not add the unlocked variants. This patch fixes that
2023-09-14 15:59:08 -05:00
Jakub Kuderski
12175bcbce
[mlir][spirv] Support coop matrix in spirv.CompositeConstruct (#66399)
Also improve the documentation (code and website).
2023-09-14 16:57:59 -04:00
eric
571e4f233b Mark LWG 2426 as complete.
Atomic is implemented by Clang, and Clang already implements the
behavior in the DR. This was initially noticed by Zoe Carver.
2023-09-14 16:42:57 -04:00
Joseph Huber
a1be5d69df
[libc] Implement more input functions on the GPU (#66288)
Summary:
This patch implements the `fgets`, `getc`, `fgetc`, and `getchar`
functions on the GPU. Their implementations are straightforward enough.
One thing worth noting is that the implementation of `fgets` will be
extremely slow due to the high latency to read a single char. A faster
solution would be to make a new RPC call to call `fgets` (due to the
special rule that newline or null breaks the stream). But this is left
out because performance isn't the primary concern here.
2023-09-14 15:39:29 -05:00
Adrian Prantl
96ccc81fb8
Add comments (NFC) (#66427) 2023-09-14 12:56:28 -07:00
Yaxun (Sam) Liu
d7e1932f85
[HIP] Fix comdat of template kernel handle (#66283)
Currently, clang emits LLVM IR that fails verifier for the following
code:

```
template<typename T>
__global__ void foo(T x);

void bar() {
  foo<<<1, 1>>>(0);
}
```
This is due to clang putting the kernel handle for foo into comdat,
which is not allowed, since the kernel handle is a declaration.

The siutation is similar to calling a declaration-only template
function. The callee will be a declaration in LLVM IR and won't be put
into comdat. This is in contrast to calling a template function with
body, which will be put into comdat.

Fixes: SWDEV-419769
2023-09-14 15:56:02 -04:00
Christopher Bate
cafb6284d1 [mlir][VectorToGPU] Update memref stride preconditions on nvgpu.mma.sync path
This change removes the requirement that the row stride be statically known when
converting `vector.transfer_read` and `vector.transfer_write` to distributed
SIMT operations in the `nvgpu` lowering path. It also adds a check to verify
that the last dimension of the source memref is statically known to have stride
1 since this is assumed in the conversion logic.  No other change should be
required since the generated `vector.load` operations are never created across
dimensions other than the last. The routines for checking preconditions on
`vector.transfer_read/write` are moved to under nvgpu utilities.

The change is NFC with respect to the GPU dialect lowering path.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D155753
2023-09-14 13:51:42 -06:00
Adrian Prantl
e8ad9b0e33
Add .swift_ast to list of Swift sections (#66426) 2023-09-14 12:51:13 -07:00
Alexey Bataev
c15c1e5dd5 [SLP]Do not account non-instructions for external use.
If the non-instruction gets vectorized, no need to account its extract
cost, it won't be removed and replaced by extractelement instruction.
2023-09-14 12:40:33 -07:00
Alexey Bataev
1034405486 [SLP][NFC]Add a test for non-instruction with external use. 2023-09-14 12:34:14 -07:00
Christopher Ferris
fd1721d860
[scudo] Add -Wconversion for tests and clean-up warnings. (#66147)
Fix all the places where the tests are doing implicit conversions.
2023-09-14 12:32:16 -07:00
Louis Dionne
85f27d126d
[libc++] Make sure LWG2070 is implemented as a DR (#65998)
When we implemented C++20's P0674R1, we didn't enable the part of
P0674R1 that was resolving LWG2070 as a DR. This patch fixes that and
makes sure that we consistently go through the allocator when
constructing and destroying the underlying object in
std::allocate_shared.

Fixes #54365.
2023-09-14 15:12:06 -04:00
Amir Ayupov
4a6426a802 [BOLT][NFC] Simplify RI::selectFunctionsToProcess
Reviewed By: #bolt, maksfb

Differential Revision: https://reviews.llvm.org/D159516
2023-09-14 11:57:44 -07:00
Kinuko Yasuda
0612c9b09a
[clang][dataflow] Ignore assignment where base class's operator is used (#66364)
In C++ it seems it is legit to use base class's operator (e.g. `using
Base::operator=`) to perform copy if the base class is the common
ancestor of the source and destination object. In such a case we
shouldn't try to access fields beyond that of the base class, however
such a case seems to be very rare (typical code would implement a copy
constructor instead), and could add complexities, so in this patch we
simply bail if the method operator's parent class is different from the
type of the destination object that this framework recognizes.
2023-09-14 20:45:56 +02:00
Leonard Chan
f45f1c3585 Reland "[clang] Add experimental option to omit the RTTI component from the vtable when -fno-rtti is used"
This reverts commit 070493ddbd9473499d6f00ca62bc6aa92808ed79 (and
relands the original change). This removes a test run that makes an
assumption of RTTI being on by default for a given target.
2023-09-14 18:28:37 +00:00
LLVM GN Syncbot
a126d61ede [gn build] Port 71e3642619dd 2023-09-14 18:15:29 +00:00
Kuba (Brecka) Mracek
454cc36630
[AArch64] Relax binary format switch in AArch64MCInstLower::LowerSymbolOperand to allow non-Darwin Mach-O files (#66011)
Trying to use a arm64-apple-none-macho target triple today crashes with
an assertion, this patch fixes that.
2023-09-14 11:12:30 -07:00
Matthias Braun
b0c8c45423
Avoid BlockFrequency overflow problems (#66280)
Multiplying raw block frequency with an integer carries a high risk
of overflow.

- Add `BlockFrequency::mul` return an std::optional with the product
  or `nullopt` to indicate an overflow.
- Fix two instances where overflow was likely.
2023-09-14 11:11:27 -07:00
Danila Malyutin
e80a8b4ab6 [NFC] Add test for #66382 2023-09-14 21:10:00 +03:00
Justin Bogner
11f0a63267 [github] Simplify DirectX backend labeling (#66407)
Based on how AMDGPU and RISCV set up their labels - this way does a
better job of catching everything with less maintenance burden.
2023-09-14 11:05:25 -07:00
Justin Bogner
71e3642619
[Transforms][DXIL] Wire up a basic DXILUpgrade pass (#66275)
This pass will upgrade DXIL-style llvm constructs (which are mostly
metadata) into the representations we use in LLVM for the same concepts.

For now we just strip the valver metadata, which we don't need. Later
changes will make this pass more useful, and then we should be able to
wire it into clang and possibly the DirectX backend's AsmParser.
2023-09-14 11:02:31 -07:00
Alex Langford
a5a2a5a3ec [lldb][NFCI] Remove use of ConstString in StructuredData
The remaining use of ConstString in StructuredData is the Dictionary
class. Internally it's backed by a `std::map<ConstString, ObjectSP>`.
I propose that we replace it with a `llvm::StringMap<ObjectSP>`.

Many StructuredData::Dictionary objects are ephemeral and only exist for
a short amount of time. Many of these Dictionaries are only produced
once and are never used again. That leaves us with a lot of string data
in the ConstString StringPool that is sitting there never to be used
again. Even if the same string is used many times for keys of different
Dictionary objects, that is something we can measure and adjust for
instead of assuming that every key may be reused at some point in the
future.

Quick comparisons of key data is likely not a concern with Dictionary,
but the use of `llvm::StringMap` means that lookups should be fast with
its hashing strategy.

Switching to a llvm::StringMap meant that the iteration order may be
different. To account for this when serializing/dumping the dictionary,
I added some code to sort the output by key before emitting anything.

Differential Revision: https://reviews.llvm.org/D159313
2023-09-14 10:53:39 -07:00
Christopher Bate
e2d39f799b [mlir][Transform] Add updateConversionTarget to ConversionPatternDescriptorOpInterface
This change adds a method to modify the ConversionTarget used during
`transform.apply_conversion_patterns` to the
`ConversionPatternDescriptorOpInterface`. This is needed when the TypeConverter
is used to dictate the dynamic legality of operations, as in "structural"
conversion patterns present in, for example, the SCF and func dialects.

As a first use case/test, this change also adds a
`transform.apply_patterns.scf.structural_conversions` operation to the SCF
dialect.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D158672
2023-09-14 11:39:47 -06:00
Fangrui Song
5a58e98c20
[ELF] Align the end of PT_GNU_RELRO associated PT_LOAD to a common-page-size boundary (#66042)
Close #57618: currently we align the end of PT_GNU_RELRO to a
common-page-size
boundary, but do not align the end of the associated PT_LOAD. This is
benign
when runtime_page_size >= common-page-size.

However, when runtime_page_size < common-page-size, it is possible that
`alignUp(end(PT_LOAD), page_size) < alignDown(end(PT_GNU_RELRO),
page_size)`.
In this case, rtld's mprotect call for PT_GNU_RELRO will apply to
unmapped
regions and lead to an error, e.g.

```
error while loading shared libraries: cannot apply additional memory protection after relocation: Cannot allocate memory
```

To fix the issue, add a padding section .relro_padding like mold, which
is contained in the PT_GNU_RELRO segment and the associated PT_LOAD
segment. The section also prevents strip from corrupting PT_LOAD program
headers.

.relro_padding has the largest `sortRank` among RELRO sections.
Therefore, it is naturally placed at the end of `PT_GNU_RELRO` segment
in the absence of `PHDRS`/`SECTIONS` commands.

In the presence of `SECTIONS` commands, we place .relro_padding
immediately before a symbol assignment using DATA_SEGMENT_RELRO_END (see
also https://reviews.llvm.org/D124656), if present.
DATA_SEGMENT_RELRO_END is changed to align to max-page-size instead of
common-page-size.

Some edge cases worth mentioning:

* ppc64-toc-addis-nop.s: when PHDRS is present, do not append
.relro_padding
* avoid-empty-program-headers.s: when the only RELRO section is .tbss,
it is not part of PT_LOAD segment, therefore we do not append
.relro_padding.

---

Close #65002: GNU ld from 2.39 onwards aligns the end of PT_GNU_RELRO to
a
max-page-size boundary (https://sourceware.org/PR28824) so that the last
page is
protected even if runtime_page_size > common-page-size.

In my opinion, losing protection for the last page when the runtime page
size is
larger than common-page-size is not really an issue. Double mapping a
page of up
to max-common-page for the protection could cause undesired VM waste.
Internally
we had users complaining about 2MiB max-page-size applying to shared
objects.

Therefore, the end of .relro_padding is padded to a common-page-size
boundary. Users who are really anxious can set common-page-size to match
their runtime page size.

---

17 tests need updating as there are lots of change detectors.
2023-09-14 10:33:11 -07:00
Sam McCall
21ab252f97
[dataflow] Add global invariant condition to DataflowAnalysisContext (#65949)
This records facts that are not sensitive to the current flow condition,
and should apply to all environments.

The motivating case is recording information about where a Value
originated, such as nullability:
 - we may see the same Value for multiple expressions (e.g. reads of the
   same field) in multiple environments (multiple blocks or iterations)
 - we want to record information only when we first see the Value
   (e.g. Nullability annotations on fields only add information if we
   don't know where the value came from)
 - this information should be expressible as a SAT condition
 - we must add this SAT condition to every environment where the
   Value may appear

We solve this by recording the information in the global condition.
This doesn't seem particularly elegant, but solves the problem and is
a fairly small and natural extension of the Environment.

Alternatives considered:
 - store the constraint directly as a property on the Value.
   But it's more composable for such properties to always be variables
   (AtomicBoolValue), and constrain them with SAT conditions.
 - add a hook whenever values are created, giving the analysis the
   chance to populate them.
   However the framework relies on/provides the ability to construct
   values in arbitrary places without providing the context such a hook
   would need, this would be a very invasive change.
2023-09-14 19:30:04 +02:00
Alex Langford
2f377c5bd7 [lldb][NFCI] Remove use of ConstString from UnixSignals
The majority of UnixSignals strings are static in the sense that they do
not change. The overwhelming majority of these strings are string
literals. Using ConstString to manage their lifetime does not make
sense. The only exception to this is one of the subclasses of
UnixSignals, for which I have created a StringSet local to that file
which will guarantee the lifetimes of these StringRefs.

As for the other benefits of ConstString, string uniqueness is not a
concern (as many of them are already string literals) and comparing
signal names and aliases should not be a hot path.

Differential Revision: https://reviews.llvm.org/D159011
2023-09-14 10:19:53 -07:00
Ye Luo
8c2da6bb7f
[libomptarget] document ActionFunctions in the amdgpu plugin. (#66397) 2023-09-14 12:18:49 -05:00
Jingu Kang
5ec9699c4d [MachineLICM] Handle Subloops
Following discussion on https://reviews.llvm.org/D154205, make MachineLICM pass
handle subloops with only visiting outermost loop's blocks once.

Differential Revision: https://reviews.llvm.org/D154205
2023-09-14 18:07:31 +01:00
Aart Bik
156a4ba9b4
[mlir][sparse] deprecate the convert{To,From}MLIRSparseTensor methods (#66304)
Rationale:
These libraries provided COO input and output at external boundaries
which, since then, has been generalized to the much more powerful pack
and unpack operations of the sparse tensor dialect.
2023-09-14 10:02:29 -07:00
Adrian Prantl
9dfc6d37da
Clean up test case (#66400) 2023-09-14 09:48:36 -07:00
Matt Arsenault
07acfe3a4d
ADT: Replace FPClassTest fabs with inverse_fabs and unknown_sign (#66390) 2023-09-14 19:46:53 +03:00
Shubham Sandeep Rastogi
0d0ab7600f Fix Dexter test broken with e6cc7b723f244f52663b6d67a5d94597109da1ef 2023-09-14 09:34:02 -07:00
Shubham Sandeep Rastogi
e6cc7b723f
[Dexter] Fix test failures on greendragon (#66299)
The issue with these test failures is that the dSYM was not being found
by lldb, which is why setting breakpoints was failing and lldb quit
without performing any steps. This change copies the dSYM to the same
temp directory that the executable is copied to.
2023-09-14 09:28:47 -07:00
Yinying Li
e2e429d994
[mlir][sparse] Migrate more tests to new syntax (#66309)
CSR:
`lvlTypes = [ "dense", "compressed" ]` to `map = (d0, d1) -> (d0 :
dense, d1 : compressed)`

CSC:
`lvlTypes = [ "dense", "compressed" ], dimToLvl = affine_map<(d0, d1) ->
(d1, d0)>` to `map = (d0, d1) -> (d1 : dense, d0 : compressed)`

This is an ongoing effort: #66146
2023-09-14 12:21:13 -04:00
Aaron Ballman
1db6b127a1 [C23] Remove N2713 from the list
This paper was obsoleted by the changes in N3138 and US-045
2023-09-14 12:08:51 -04:00
Nuno Lopes
5eabb022d2 [test][NFC] fix call in test with mismatch in call/decl types 2023-09-14 13:59:02 +01:00
Joel E. Denny
15a1d288ba
[docs] Add more details about Python formatting (#66141)
Describe the darker utility. Make it clear that black/darker's default
rules should be used. Add some examples.

See ongoing discussion at
<https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style/68257>.
2023-09-14 12:00:19 -04:00
David Green
74724902ba
[AArch64] Split Ampere1Write_Arith into rr/ri and rs/rx InstRWs. (#66384)
The ampere1 scheduling model uses IsCheapLSL predicates for ADDXri and
ADDWrr instructions, which only have 3 operands. In attempting to check
that the third is a shift, the predicate can attempt to access an out of
bounds operand, hitting an assert. This splits the rr/ri instructions
(which can never have shifts) from the rs/rx instructions to ensure they
both work correctly. Ampere1Write_1cyc_1AB was chosen for the rr/ir
instructions to match the cheap case.

This also sets CompleteModel = 0 for the ampere1 scheduling model, as at
runtime under debug it will attempt to check that as well as all
instructions having scheduling info, there is information for each
output operand.

DefIdx 1 exceeds machine model writes for
  renamable $w9, renamable $w8 = LDPWi renamable $x8, 0
(Try with MCSchedModel.CompleteModel set to false)incomplete machine
model
2023-09-14 16:29:30 +01:00
Ingo Müller
360c629024
[mlir][linalg][transform][python] Drop _get_op_result... from mix-ins. (#65726)
`_get_op_result_or_value` was used in mix-ins to unify the handling of
op results and values. However, that function is now called in the
generated constructors, such that doing so in the mix-ins is not
necessary anymore.
2023-09-14 17:24:16 +02:00
Matt Arsenault
ddc3346a6b
clang/AMDGPU: Fix accidental behavior change for __builtin_amdgcn_ldexph (#66340) 2023-09-14 18:15:44 +03:00
Duncan P. N. Exon Smith
7976bdb589
Resign as code owner of branch weights and block frequency
Somewhat overdue... it has been a few years since I stopped watching block frequency / branch weight patches actively, so I effectively stopped acting as code owner a while ago. Reflect the reality.

Still happy to help out; feel free to pull me in if you think I might have useful context!
2023-09-14 08:01:02 -07:00
Manos Anagnostakis
008f26b12e
[AArch64] New subtarget features to control ldp and stp formation (#66098)
On some AArch64 cores, including Ampere's ampere1 and ampere1a
architectures, load and store pair instructions are faster compared to
simple loads/stores only when the alignment of the pair is at least
twice that of the individual element being loaded.

Based on that, this patch introduces four new subtarget features, two
for controlling ldp and two for controlling stp, to cover the ampere1
and ampere1a alignment needs and to enable optional fine-grained control
over ldp and stp generation in general. The latter can be utilized by
another cpu, if there are possible benefits
with a different policy than the default provided by the compiler.

More specifically, for each of the ldp and stp respectively we have:

- disable-ldp/disable-stp: Do not emit ldp/stp.
- ldp-aligned-only/stp-aligned-only: Emit ldp/stp only if the source
pointer is aligned to at least double the alignment of the type.

Therefore, for -mcpu=ampere1 and -mcpu=ampere1a
ldp-aligned-only/stp-aligned-only become the defaults, because of the
benefit from the alignment, whereas for the rest of the cpus the default
behaviour of the compiler is maintained.
2023-09-14 16:58:39 +02:00
Slava Zakharin
b7d02d7e12
[flang] Select proper library APIs for derived type io. (#66327)
This patch syncs the logic inside `getInputFunc` that selects
the library API and the logic in `createIoRuntimeCallForItem`
that creates the input arguments for the library call.
There were cases where we selected `InputDerivedType` API
and passed only two arguments, and also we selected `InputDescriptor`
and passed three arguments.
It turns out we also were incorrectly selecting `OutputDescriptor`
in `getOutputFunc` (`test4` case in the new LIT test),
which caused runtime issues for output of a derived type
with descriptor components (due to the missing non-type-bound table).
2023-09-14 07:58:26 -07:00
Slava Zakharin
8730fe95d8
[flang] Do not finalize main program variables. (#66326) 2023-09-14 07:58:01 -07:00
Björn Pettersson
a0ce4384a6
[LICM] Simplify isLoadInvariantInLoop given opaque pointers (#65597)
Since we no longer support typed pointers in LLVM IR, the PtrASXTy
in isLoadInvariantInLoop was set to be equal to Addr->getType() (an
opaque ptr in the same address space). That made the loop looking
through bitcasts redundant.
2023-09-14 16:53:34 +02:00
Matthias Springer
aca9019be0
[mlir][transform] Check for invalidated iterators on payload IR mappings (#66369)
Add extra error checking (in debug mode) to detect cases where an
iterator on "direct" payload IR mappings is invalidated (due to elements
being removed). Such errors are hard to debug: they are often
non-deterministic; sometimes the program crashes, sometimes it produces
wrong results. Even when it crashes, the stack trace often points to
completely unrelated code locations.

Store a timestamp with each "direct" mapping. The timestamp is increased
whenever an operation is performed that invaldiates an iterator on that
mapping. A debug iterator is added that checks the timestamp as payload
IR is enumerated.
2023-09-14 16:34:32 +02:00
Martin Erhart
66aa9a2517
[mlir][bufferization] Implement BufferDeallocationopInterface for scf.forall.in_parallel (#66351)
The scf.forall.in_parallel terminator operation has a nested graph region with the NoTerminator trait. Such regions are not supported by the default implementations. Therefore, this commit adds a specialized implementation for
this operation which only covers the case where the nested region is empty.
This is because after bufferization, ops like tensor.parallel_insert_slice were already converted to memref operations residing int the scf.forall only and the nested region of scf.forall.in_parallel ends up empty.
2023-09-14 16:20:24 +02:00
Joel E. Denny
9e739fdb85
[lit] Fix some issues from --per-test-coverage (#65242)
D154280 (landed in 64d19542e78a in July, 2023) implements
`--per-test-coverage` (which can also be specified via 
`lit_config.per_test_coverage`).  However, it has a few issues, which
the current patch addresses:

1. D154280 implements `--per-test-coverage` only for the case that lit 
   is configured to use an external shell.  The current patch extends
   the implementation to lit's internal shell.

2. In the case that lit is configured to use an external shell,
   regardless of whether `--per-test-coverage` is actually specified,
   D154280 causes `%dbg(RUN: at line N)` to be expanded in RUN lines
   early and in a manner that is specific to sh-like shells.  As a
   result, later code in lit that expands it in a shell-specific
   manner is useless as there's nothing left to expand.  The current
   patch cleans up the implementation to avoid useless code.

3. Because of issue 2, D154280 corrupts support for windows `cmd` as
   an external shell (effectively comments out all RUN lines with
   `:`).  The current patch happens to fix that particular corruption
   by addressing issue 2.  However, D122569 (landed in 1041a9642ba0 in
   April, 2022) had already broken support for windows `cmd` as an
   external shell (discards RUN lines when expanding `%dbg(RUN: at
   line N)`).  The current patch does not attempt to fix that bug.
   For further details, see the PR discussion of the current patch.

The current patch addresses the above issues by implementing
`--per-test-coverage` before selecting the shell (internal or
external) and by leaving `%dbg(RUN: at line N)` unexpanded there.
Thus, it is expanded later in a shell-specific manner, as before
D154280.

This patch introduces `buildPdbgCommand` into lit's implementation to
encapsulate the process of building (or rebuilding in the case of the 
`--per-test-coverage` implementation) a full `%dbg(RUN: at line N)
cmd` line and asserting that the result matches `kPdbgRegex`.  It also
cleans up that and all other uses of `kPdbgRegex` to operate on the 
full line with `re.fullmatch` not `re.match`.  This change better
reflects the intention in every case, but it is expected to be NFC 
because `kPdbgRegex` ends in `.*` and thus avoids the difference
between `re.fullmatch` and `re.match`.  The only caveat is that `.*`
does not match newlines, but RUN lines cannot contain newlines
currently, so this caveat currently shouldn't matter in practice.

The original `--per-test-coverage` implementation avoided accumulating
`export LLVM_PROFILE_FILE={profile}` insertions across retries (due to
`ALLOW_RETRIES`) by skipping the insertion if `%dbg(RUN: at line N)` 
was not present and thus had already been expanded.  However, the 
current patch makes sure the insertions also happen for commands
without `%dbg(RUN: at line N)`, such as preamble commands or some
commands from other lit test formats.  Thus, the current patch
implements a different mechanism to avoid accumulating those
insertions (see code comments).
2023-09-14 10:08:20 -04:00