There's an assert in LUI+SH*ADD+ADDI materialization that makes sure the
lower 12 bits aren't zero since that case should have been handled as
LUI+ADDI+SH*ADD. But nothing prevented the LUI+SH*ADD+ADDI checks from
running after the earlier code handled it.
The sequence would be the same length or longer so it wouldn't replace
the earlier sequence, but the assert happened before that was checked.
The vector holding the sequence also wasn't reset before the second
check so that guaranteed the sequence would never be found to be
shorter.
This patch fixes this by only trying the second expansion when the
earlier fails.
Fixes PR54812.
Reviewed By: benshi001
Differential Revision: https://reviews.llvm.org/D123406
This patch extends the scope of VPlan to also model the pre-header.
The pre-header can be used to place recipes that should be code-gen'd
outside the loop, like SCEV expansion.
Depends on D121623.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D121624
When we fold vselect(cond, pshufb(x), pshufb(y)) -> or(pshufb(x), pshufb(y)), ensure we convert all undef elements to zero elements - this should help us expose more known zero elements for deeper chains of these cases.
Noticed while triaging Issue #54819
In theory, constructors can take arguments when called via .init_array
where at least glibc passes in (argc, argv, envp). This isn't used in
the generated code and if it was, the first argument should be an
integer, not a pointer. For destructors registered via atexit, the
function should never take an argument.
Differential Revision: https://reviews.llvm.org/D123370
znver2 is a mainly a search+replace of the znver1 model, but for no reason some lines have been moved around - try to keep these in sync (no actual changes in the models).
Use a specialized "buffer" to count the number of insertions instead of
using a `string` as storage type.
Depends on D110497.
Reviewed By: ldionne, #libc
Differential Revision: https://reviews.llvm.org/D110498
Instead of writing every character directly into the container by using
a `back_insert_iterator` the data is buffered in an `array`. This buffer
is then inserted to the container by calling its `insert` member function.
Since there's no guarantee every container's `insert` behaves properly
containers need to opt-in to this behaviour. The appropriate standard
containers opt-in to this behaviour.
This change improves the performance of the format functions that use a
`back_insert_iterator`.
Depends on D110495
Reviewed By: ldionne, vitaut, #libc
Differential Revision: https://reviews.llvm.org/D110497
(With C++ exceptions, `clang++ --target=mips64{,el}-linux-gnu -fpie -pie
-fuse-ld=lld` has link errors (lld does not implement some strange R_MIPS_64
.eh_frame handling in GNU ld). However, sanitizer-x86_64-linux-qemu used this to
build ScudoUnitTests. Pined ScudoUnitTests to -no-pie.)
Default the option introduced in D113372 to ON to match all(?) major Linux
distros. This matches GCC and improves consistency with Android and linux-musl
which always default to PIE.
Note: CLANG_DEFAULT_PIE_ON_LINUX may be removed in the future.
Differential Revision: https://reviews.llvm.org/D120305
This keeps the test behavior unchanged when CLANG_DEFAULT_PIE_ON_LINUX switches
to ON by default.
Note: current clang --target=mips64el-linux-gnu -fpie -pie -fuse-ld=lld
does not link with C++ exceptions, using -pie would lead to
```
ld.lld: error: cannot preempt symbol: DW.ref.__gxx_personality_v0
...
ld.lld: error: relocation R_MIPS_64 cannot be used against local symbol; recompile with -fPIC
...
```
when linking `ScudoUnitTests`: https://lab.llvm.org/buildbot/#/builders/169/builds/7311/steps/18/logs/stdio
The information about OpenMP/OpenACC declarative directives in modules
should be carried in export mod files. This supports export OpenMP
Threadprivate directive and import OpenMP declarative directives.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D120396
The previous implementation of UnwindInfoSection materialized
all the compact unwind entries & applied their relocations, then parsed
the resulting data to generate the final unwind info. This design had
some unfortunate conseqeuences: since relocations can only be applied
after their referents have had addresses assigned, operations that need
to happen before address assignment must contort themselves. (See
{D113582} and observe how this diff greatly simplifies it.)
Moreover, it made synthesizing new compact unwind entries awkward.
Handling PR50956 will require us to do this synthesis, and is the main
motivation behind this diff.
Previously, instead of generating a new CompactUnwindEntry directly, we
would have had to generate a ConcatInputSection with a number of
`Reloc`s that would then get "flattened" into a CompactUnwindEntry.
This diff introduces an internal representation of `CompactUnwindEntry`
(the former `CompactUnwindEntry` has been renamed to
`CompactUnwindLayout`). The new CompactUnwindEntry stores references to
its personality symbol and LSDA section directly, without the use of
`Reloc` structs.
In addition to being easier to work with, this diff also allows us to
handle unwind info whose personality symbols are located in sections
placed after the `__unwind_info`.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D123276
This reverts commit 3f0587d0c668202bb89d29a25432aa290e551a31.
Not all tests pass after a few rounds of fixes.
I spot one failure that std::shuffle (potentially different results with
different STL implementations) was misused and replaced it with llvm::shuffle,
but there appears to be another failure in a Windows build.
The latest failure is reported on https://reviews.llvm.org/D121556#3440383
This reverts commit 2a2149c754f96376ddf8fed248102dd8e6092a22.
This reverts commit 8d7595be1dd41d7f7470ec90867936ca5e4e0d82.
This reverts commit e2e6899452998932b37f0fa9e66d104a02abe3e5.
If this doesn't work, I'll revert the whole thing.
D67148 has removed TTI::getNumberOfRegisters(bool Vector) and
started to call TTI::getNumberOfRegisters(unsigned ClassID) from
the LoopVectorize. This has resulted in an unrestricted vectorization
on AMDGPU blowing up register pressure.
Differential Revision: https://reviews.llvm.org/D122850
Clang is adding a feature to ObjC code generation, where instead of calling
objc_msgSend directly with an object & selector, it generates a stub that
gets passed only the object and the stub figures out the selector.
This patch adds support for following that dispatch method into the implementation
function.
AtomicExpandPass uses this variable to determine emitting libcalls or not. The default value is 1024 and if we don't specify it for PPC64 explicitly, AtomicExpandPass won't emit `__atomic_*` libcalls for those target unable to inline atomic ops and finally the backend emits `__sync_*` libcalls. Thanks @efriedma for pointing it out.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D122868
Putting __support/FPUtil/x86_64/FMA.h in `hdrs` will trigger a
compilation action for that header, and it will always `#error` out for
non-FMA targets. Move these platform-specific headers that are
conditionally included to `textual_hdrs` instead.
Make 16-byte atomic type aligned to 16-byte on PPC64, thus consistent with GCC. Also enable inlining 16-byte atomics on non-AIX targets on PPC64.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D122377
Simply returning will report the test as PASSED when it didn't
really do anything. SKIPPED is the correct result for these.
Found by the Rotten Green Tests project.
When a "keyword" token like __restrict was present in a macro condition,
modernize-macro-to-enum would assert in non-release builds. However,
even for a "keyword" token, calling getIdentifierInfo()->getName() would
retrieve the text of the token, which is what we want. Our intention is
to scan names that appear in conditional expressions in potential enum
clusters and invalidate those clusters if they contain the name.
Also, guard against "raw identifiers" appearing as potential enums.
This shouldn't happen, but it doesn't hurt to generalize the code.
Differential Revision: https://reviews.llvm.org/D123349Fixes#54775