Now that getMemoryOpCost() correctly handles all the vector variants,
we should no longer hand-roll our own version of it, but use it directly.
The AVX512 variant probably needs a similar change,
but there it is less obvious.
This was initially landed in 69ed93a435,
but was reverted in 6b95fd199d
because the patch it depends on was reverted.
Instead of handling power-of-two sized vector chunks,
try handling the large vector in a stream mode,
decreasing the operational vector size
once it no longer works for the elements left to process.
Notably, this improves costs for overaligned loads - loading padding is fine.
This more directly tracks when we need to insert/extract the YMM/XMM subvector,
some costs fluctuate because of that.
This was initially landed in c02476f315,
but reverted in 5fddc3312b,
because the code made some very optimistic assumptions about invariants
that didn't hold in practice.
Reviewed By: RKSimon, ABataev
Differential Revision: https://reviews.llvm.org/D100684
D29668 enabled to avoid a useless copy of the argument value into an alloca if the caller places it in memory (as it often happens on x86) by directly forwarding the pointer to it. This optimization is illegal if the type contains padding bytes: if a truncating store into the alloca is replaced the upper bits are filled with garbage and produce code misbehaving at runtime.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D102153
Replace use of host floating types with operations on APFloat when it is
possible. Use of APFloat makes analysis more convenient and facilitates
constant folding in the case of non-default FP environment.
Differential Revision: https://reviews.llvm.org/D102672
Avoid the warning
/polly/lib/Support/RegisterPasses.cpp:833:3: warning: default label in switch which covers all enumeration values [-Wcovered-switch-default]
default:
^
since all cases are now handled.
Thanks to Luke Benes for reporting.
True is a bad default: the useful symbol names and `@GOTPCREL` are scrubbed.
Change the default and add global variable tests to x86-basic.ll
(renamed from x86_function_name.ll since we now also test variables).
I updated some tests to show the differences.
Updated LCPI regex to include Darwin style `LCPI_[0-9]+_[0-9]+` (no
leading dot).
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D102588
Dummy arguments of ENTRY statements in execution parts were
not being created as objects, nor were they being implicitly
typed.
When the symbol corresponding to an alternate ENTRY point
already exists (by that name) due to having been referenced
in an earlier call, name resolution used to delete the extant
symbol. This isn't the right thing to do -- the extant
symbol will be pointed to by parser::Name nodes in the parse
tree while no longer being part of any Scope.
Differential Review: https://reviews.llvm.org/D102948
The implementation and intent behind freeing the triple string here is the same
as LLVMGetDefaultTargetTriple (and any other owned c string returned from the C
API), so we should use LLVMDisposeMessage for to free the string for
consistency.
Patch by Mats Larsen -- thanks Mats!
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D102957
D88631 added initial support for:
- -mstack-protector-guard=
- -mstack-protector-guard-reg=
- -mstack-protector-guard-offset=
flags, and D100919 extended these to AArch64. Unfortunately, these flags
aren't retained for LTO. Make them module attributes rather than
TargetOptions.
Link: https://github.com/ClangBuiltLinux/linux/issues/1378
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D102742
These pass documents belong on the main pass page, and not generated as
top level pages.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D102947
Prior to this change build with `-shared/-pie` and using TLS (but
without -shared-memory) would hit this assert:
"Currenly only a single data segment is supported in PIC mode"
This is because we were not including TLS data when merging data
segments. However, when we build without shared-memory (i.e. without
threads) we effectively lower away TLS into a normal active data
segment.. so we were ending up with two active data segments: the merged
data, and the lowered TLS data.
To fix this problem we can instead avoid combining data segments at
all when running in shared memory mode (because in this case all
segment initialization is passive). And then in non-shared memory
mode we know that TLS has been lowered and therefore we can can
and should combine all segments.
So with this new behavior we have two different modes:
1. With shared memory / mutli-threaded: Never combine data segments
since it is not necessary. (All data segments as passive already).
2. Wihout shared memory / single-threaded: Combine *all* data segments
since we treat TLS as normal data. (We end up with a single
active data segment).
Differential Revision: https://reviews.llvm.org/D102937
If multiple instances of the -arm-implicit-it option is passed to
the backend, it errors out.
Also fix cases where there are multiple -Wa,-mimplicit-it; the existing
tests indicate that the last one specified takes effect, while in
practice it passed double options, which didn't work as intended.
Differential Revision: https://reviews.llvm.org/D102812
This adds the {s,u,m}badaddr CSR aliases as well as the sptbr alias.
These are for compatibility with binutils. Furthermore, these are used
by the RISC-V Proxy Kernel and are required to enable building the Proxy
Kernel with the LLVM IAS.
The aliases here are deprecated. These are being introduced in order to
provide a compatibility story for the existing GNU toolchain, which
still supports the deprecated spelling in the assembler. However, in
order to encourage the migration of existing coding, we provide warnings
indicating that the aliased CSRs are deprecated and should be replaced.
Differential Revision: https://reviews.llvm.org/D101919
Reviewed By: Craig Topper
Warnings on deprecated api cannot be suppressed if the library is not initialized.
With this change it is possible to set KMP_WARNINGS=false to suppress the warnings.
Differential Revision: https://reviews.llvm.org/D102676
The COFF driver produces an ABSOLUTE relocation base for an ADDR32
relocation type and the system is 64 bits (machine=AMD64). The
relocation information won't be added in the output and could
produce an incorrect address access during run-time. This change
set checks if the relocation type is IMAGE_REL_AMD64_ADDR32 and
if so, adds the relocated symbol as IMAGE_REL_BASED_HIGHLOW base.
Differential Revision: https://reviews.llvm.org/D96619
These checks already exist as asserts when creating the corresponding
instruction. Anybody creating these instructions already need to take
care to not break these checks.
Move the checks for success/failure ordering in cmpxchg from the
verifier to the LLParser and BitcodeReader plus an assert.
Add some tests for cmpxchg ordering. The .bc files are created from the
.ll files with an llvm-as with these checks disabled.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D102803
This commit alphabetizes all includes in libcxx. This is a NFC.
This can also serve as a pseudo "announcement" for how we should order these headers going forward (note: double underscores go before other headers).
Differential Revision: https://reviews.llvm.org/D102941
This revision completes the "dimension ordering" feature
of sparse tensor types that enables the programmer to
define a preferred order on dimension access (other than
the default left-to-right order). This enables e.g. selection
of column-major over row-major storage for sparse matrices,
but generalized to any rank, as in:
dimOrdering = affine_map<(i,j,k,l,m,n,o,p) -> (p,o,j,k,i,l,m,n)>
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D102856
This is the second to last one! Based on D101396. Depends on D100255. Refs D101079 and D101193.
Differential Revision: https://reviews.llvm.org/D101476
The option was used during the initial bringup, but it does not add any
value at this point. Remove it.
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D102930
The check for the TO flag when processing firstprivates is missing. As a result,
sometimes the device copy of a firstprivate never gets initialized. Currectly we
try to force lambda structs to be allocated immediately by marking them as a
non-firstprivate, so that PrivateArgumentManagerTy::addArg allocates memory for
them immediately. However, calling addArg with IsFirstPrivate=false makes the
function skip initializing the device copy. Whether an argument is firstprivate
and whether we need to allocate memory immediately are not synonyms, so this
patch introduces one more control variable for immediate allocation and sets it
apart from initialization.
Differential Revision: https://reviews.llvm.org/D102890
Now that gtest has been updated to 1.10 which supports GTEST_SKIP, we can use
that over return;
Patch by Mats Larsen. Thanks Mats!
Reviewed By: lhames, ikudrin
Differential Revision: https://reviews.llvm.org/D102710
BTVER2 has a weaker f64 multiplier that other AVX1-era targets, so we need to bump the worst case cost slightly - llvm-mca reports the new vectorization in simplebb is beneficial on btver2, bdver2 and sandybridge AVX1 targets
Add link to documentation for "AMD Instinct MI100 Instruction Set
Architecture" to AMDGPUUsage.rst.
Reviewed By: kzhuravl, rampitec, dp
Differential Revision: https://reviews.llvm.org/D102859