The header was split into memref and no-memref sections, but various
methods had slipped in at the bottom, in the wrong place.
Reviewed By: Peiming, yinying-lisa-li
Differential Revision: https://reviews.llvm.org/D159218
The GPU has the ability to sleep for very short periods of time. We can
map this to the existing `nanosleep` utility. This patch maps the
nanosleep utility to the existing hardware instructions as best as
possible.
Depends on D159118
Reviewed By: JonChesterfield, sivachandra
Differential Revision: https://reviews.llvm.org/D159225
https://reviews.llvm.org/D157280 enabled `-fprofile-update` for `-fprofile-generate`, but omitted adding `-latomic` to the linker command on AIX. This omission causes linking to fail due to an undefined symbol. This patch fixes the link error.
Reviewed By: w2yehia
Differential Revision: https://reviews.llvm.org/D159137
See issue #64166 for more information about the layering issue.
The PPCMCTargetDesc library was including CodeGen headers such as
PPCInstrInfo.h and calling inline functions in them. This doesn't work
in the Bazel build, and is error-prone. If the inline function moves to
a cpp file, it will result in linker errors.
To address the issue, I moved several inline functions to
PPCMCTargetDesc.cpp, and declared them in the PPC namespace in
PPCMCTargetDesc.h, which seemed like the most straightforward fix.
Differential Revision: https://reviews.llvm.org/D156488
Some older gcc toolchains don't define these on 32-bit platforms. This
is a problem for pigweed, which uses an older gcc toolchain and targets
32-bit.
Differential Revision: https://reviews.llvm.org/D157112
parseSourceString does not require a null-terminated string, so the
requirement on the memory buffer can be relaxed.
Differential Revision: https://reviews.llvm.org/D159214
A follow-on to https://reviews.llvm.org/D158237:
this test can print text to stdout when run
under address-sanitizer, and the test harness
does not expect any output, resulting in a test
failure on a sanitizer CI bot.
Summary:
We should check for the GPU architectures first, since `__linux__` can
be set potentially during these compilations. Also the test needs to be
a hermetic test.
With `%/t`, we can now compare `-###` output against the installation dir
without backslash escaping getting in the way. Having to check against
the doubled-up backslashes is ugly, but it's the pattern used in lots of
tests (including for things like debug info where an `-###` alternative
that didn't escape backslashes wouldn't help).
This patch implements the `clock()` function on the GPU. This function
is supposed to return a timestamp that can be converted into seconds
using the `CLOCKS_PER_SEC` macro. The GPU has a fixed frequency timer
that can be used for this purpose. However, there are some
considerations.
First is that AMDGPU does not have a statically known fixed frequency. I
know internally that the gfx10xx and gfx11xx series use a 100 MHz clock
which will probably remain for the future. Gfx9xx typically uses a 25
MHz clock except for the Vega 10 GPU. The only way to know for sure is
to look it up from the runtime. For this purpose, I elected to default
it to some known values and assign these to an externally visible symbol
that can be initialized if needed. If we do not have a good guess we
just return zero.
Second is that the `CLOCKS_PER_SEC` macro only gives about a microsecond
of resolution. POSIX demands that it's 1,000,000 so it's best that we
keep with this tradition as almost all targets seem to respect this. The
reason this is important is because on the GPU we will almost assuredly
be copying the host's macro value (see the wrapper header) so we should
go with the POSIX version that's most likely to be set. (We could
probably make a warning if the included header doesn't match the
expected value).
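As a rough illustration of the conversion described above, here is a minimal
sketch that scales raw ticks from a fixed-frequency timer to `CLOCKS_PER_SEC`
units. The names `kClocksPerSec` and `ticks_to_clocks` are hypothetical and
are not taken from the patch; the frequency is assumed to be supplied by the
caller, with zero meaning "unknown".

```cpp
#include <cassert>
#include <cstdint>

// POSIX requires CLOCKS_PER_SEC to be 1,000,000.
constexpr uint64_t kClocksPerSec = 1000000;

// Hypothetical helper: converts raw timer ticks to clock() units.
// Returns 0 when the timer frequency is unknown (freq_hz == 0).
inline uint64_t ticks_to_clocks(uint64_t ticks, uint64_t freq_hz) {
  if (freq_hz == 0)
    return 0;
  // The remainder term below must not overflow 64 bits.
  assert(freq_hz <= UINT64_MAX / kClocksPerSec);
  // Split into whole seconds and a remainder so that we never compute
  // ticks * kClocksPerSec directly.
  uint64_t whole_seconds = ticks / freq_hz;
  uint64_t remainder = ticks % freq_hz;
  return whole_seconds * kClocksPerSec + remainder * kClocksPerSec / freq_hz;
}
```

With a 100 MHz timer, 100,000,000 ticks map to 1,000,000 clocks (one
second); with a 25 MHz timer, 25 ticks map to a single clock, which shows the
microsecond resolution limit mentioned above.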
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D159118
This adds some more extensive test coverage for fdiv through global isel,
switching the opcodes to use the more complete ActionDefinitions to handle more
cases and moving it into the position of the existing code which is no longer
needed.
When scanning over TLS regions, we attempt to check if one of the regions is
one of the thread_local allocator caches which would be located in one of the
TLS blocks pointed to by the DTV. This is to prevent marking a pointer that was
allocated by the primary allocator (from a thread_local cache) as reachable. The
check is a simple bounds check to see if the allocator cache is within the
bounds of one of the TLS blocks we're iterating over, but it looks like the check
for the end of the cache is slightly incorrect.
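
For illustration only, a containment check of this kind can be sketched as
follows. The `Range` type and `contains` helper are hypothetical, half-open
ranges are an assumption, and the exact comparison used by the patch is not
shown in this message.

```cpp
#include <cassert>
#include <cstdint>

// Half-open address range [begin, end). Hypothetical type for this sketch.
struct Range {
  uintptr_t begin;
  uintptr_t end;
};

// True when `cache` lies entirely inside `block`. Because the ranges are
// half-open, the cache end may equal the block end.
inline bool contains(const Range &block, const Range &cache) {
  assert(block.begin <= block.end && cache.begin <= cache.end);
  return block.begin <= cache.begin && cache.end <= block.end;
}
```

The end comparison is where an off-by-one is easy to introduce: with
half-open ranges, a cache ending exactly at the block end is still inside the
block, while one ending a byte later is not.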
Differential Revision: https://reviews.llvm.org/D156015
Add a MemProfReader base class which can be used directly where
symbolization and processing a raw profile is unnecessary.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D159141
Replaces some 600 EXPECT_EQ() with verifyFormat() or verifyNoChange() in
FormatTest.cpp because the former neither checks stability of formatting
nor tests formatting C++ as Objective C.
Also marks dozens of unstable test cases with FIXME comments.
Differential Revision: https://reviews.llvm.org/D159051
Summary:
On AIX OS, clang may use llvm-nm to export the symbols from all input files (see 515c435e37/clang/lib/Driver/ToolChains/AIX.cpp (L236)). However, the clang command-line may include import files (identified by them starting with #!). llvm-nm previously reported "invalid object file" errors for import files, meaning that the clang driver would fail to link when import files are included this way.
In this patch, llvm-nm is changed to ignore import files when the --export-symbol option is used, meaning that clang will now succeed in this case.
For more information about AIX import files, see https://www.ibm.com/docs/en/aix/7.3?topic=l-ld-command
Reviewers: Hubert Tong, James Henderson, MaskRay, Stephen Peckham
Differential Revision: https://reviews.llvm.org/D158004
This re-implements the special casing we had in lowerScalarSplat as a DAG combine. As can be seen in the tests, this ends up triggering in a bunch more cases.
The semantically interesting bit of this change is the use of the implicit truncate semantics for when XLEN > SEW. We'd already been doing this for vmv.v.x, but this change extends e.g. the constant matching to make the same assumption about vmv.s.x. Per my reading of the specification, this should be fine, and if anything, is more obviously true of vmv.s.x than vmv.v.x.
Differential Revision: https://reviews.llvm.org/D158874
The RSS code is not very useful and can be replicated by using
ulimit. Remove it and remove the options associated with it.
Reviewed By: Chia-hungDuan
Differential Revision: https://reviews.llvm.org/D159155
We'd discussed this in the original set of patches months ago, but decided against it. I think we should reverse ourselves here as the code is significantly more readable, and we do pick up cases we'd missed by not calling the appropriate helper routine.
Differential Revision: https://reviews.llvm.org/D158854
stdarg.h and stddef.h have to be textual headers in their upcoming modules to support their `__needs_xxx` macros. That means that they won't get precompiled into their modules' pcm, and instead their declarations will go into every other pcm that uses them. For now that's ok since the type merger can handle the declarations in these headers, but it's suboptimal at best. Make separate headers for all of the pieces so that they can be properly modularized.
Reviewed By: aaron.ballman, ChuanqiXu
Differential Revision: https://reviews.llvm.org/D158709
- Allow the definition of synthetic formatters in C++ even when LLDB is built without python scripting support.
- Fix linking problems with the CXXSyntheticChildren
Differential Revision: https://reviews.llvm.org/D158010
There are really two rounding modes, so only return the standard
values if both modes are the same. Otherwise, return a bitmask
representing the two modes.
Annoyingly the register doesn't use the same values as FLT_ROUNDS. Use
a simple integer table we can shift into to convert.
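
A minimal sketch of the "integer table we can shift into" idea, for a single
2-bit rounding field. The hardware encoding used here is an assumption made
for illustration, and `kFltRoundsTable` and `hw_to_flt_rounds` are
hypothetical names; the bitmask returned when the two modes differ is not
sketched because its encoding is not given in this message.

```cpp
#include <cassert>
#include <cstdint>

// Standard FLT_ROUNDS values: 0 = toward zero, 1 = to nearest,
// 2 = toward +infinity, 3 = toward -infinity.
//
// ASSUMED hardware encoding of one 2-bit rounding field (illustrative):
// 0 = to nearest, 1 = toward +infinity, 2 = toward -infinity,
// 3 = toward zero.
//
// Each 4-bit nibble of the table holds the FLT_ROUNDS value for one
// hardware value, lowest nibble first: hw 0 -> 1, 1 -> 2, 2 -> 3, 3 -> 0.
constexpr uint32_t kFltRoundsTable = 0x0321;

// Hypothetical helper: converts one hardware rounding field to FLT_ROUNDS
// by shifting the packed table instead of indexing an array.
inline int hw_to_flt_rounds(unsigned hw_mode) {
  assert(hw_mode < 4 && "the field is two bits wide");
  return static_cast<int>((kFltRoundsTable >> (hw_mode * 4)) & 0xF);
}
```

Packing the table into one integer keeps the conversion to a shift and a
mask, with no memory lookup.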
https://reviews.llvm.org/D153158
The old links pointed to the old domain, which redirected to the correct
domain but to the wrong pages. Changed the links to the new domain and pages.
Fixes: #65064
In particular, high LMULs, constant offsets within high LMUL, and types which require splitting. Note that most of these are way off with current lowering.
This patch makes structural-hash-detailed.ll look at diffs rather than
specific values to make the test more robust against the underlying
hashing implementation while still testing that the hash value is
deterministic across multiple runs.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D159154
Only pass it +unaligned-scalar-mem/-unaligned-scalar-mem if the
user has passed one of the alignment options.
This allows us to add unaligned-scalar-mem as a feature on CPUs
that support it.
Reviewed By: wangpc
Differential Revision: https://reviews.llvm.org/D159145
For function multi-versioning using the target or target_clones
function attributes, currently we incorrectly set comdat for internal
linkage resolvers. This is problematic for ELF linkers
as GRP_COMDAT deduplication will kick in even with STB_LOCAL signature
(https://groups.google.com/g/generic-abi/c/2X6mR-s2zoc
"GRP_COMDAT group with STB_LOCAL signature").
In short, two `__attribute((target_clones(...))) static void foo()`
in two translation units will be deduplicated. Fix this.
Fix #65114
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D158963
The motivation for this patch is that many code bases use exception handling. As GPUs are not expected to support exception handling in the near future, we can experiment with compiling the code for GPU targets anyway. This will
allow us to run the code, as long as no exception is thrown.
The overall idea is very simple:
- If a throw expression is compiled to AMDGCN or NVPTX, it is replaced with a trap during code generation.
- If a try/catch statement is compiled to AMDGCN or NVPTX, we generate code for the try statement as if it were a basic block.
With this patch, the compilation of the following example
```
#include <iostream>

int gaussian_sum(int a, int b) {
  // Throws when the sum is even; on the GPU the throw becomes a trap.
  if ((a + b) % 2 == 0) { throw -1; }
  return (a + b) * ((a + b) / 2);
}

int main(void) {
  int gauss = 0;
  // First region: 1 + 100 is odd, so no exception is thrown.
#pragma omp target map(from:gauss)
  {
    try {
      gauss = gaussian_sum(1, 100);
    } catch (int e) {
      gauss = e;
    }
  }
  std::cout << "GaussianSum(1,100)=" << gauss << std::endl;
  // Second region: 1 + 101 is even, so the throw is reached.
#pragma omp target map(from:gauss)
  {
    try {
      gauss = gaussian_sum(1, 101);
    } catch (int e) {
      gauss = e;
    }
  }
  std::cout << "GaussianSum(1,101)=" << gauss << std::endl;
  return (gauss > 1) ? 0 : 1;
}
```
with offloading to `gfx906` results in
```
./bin/target_try_minimal_fail
GaussianSum(1,100)=5050
AMDGPU fatal error 1: Received error in queue 0x155555506000: HSA_STATUS_ERROR_EXCEPTION: An HSAIL operation resulted in a hardware exception.
zsh: abort (core dumped)
```
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D153924
An execute-only target disallows data access to code sections.
-fsanitize=function and -fsanitize=kcfi instrument indirect function
calls to load a type hash before the function label. This results in a
non-execute access to the code section and a runtime error.
To solve the issue, -fsanitize=function should not be included in any
check group (e.g. undefined) on an execute-only target. If a user passes
-fsanitize=undefined, there is no error and no warning. However, if the
user explicitly passes -fsanitize=function or -fsanitize=kcfi on an
execute-only target, an error will be emitted.
Fixes: https://github.com/llvm/llvm-project/issues/64931.
Reviewed By: MaskRay, probinson, simon_tatham
Differential Revision: https://reviews.llvm.org/D158614