i.e. they cannot be unreachable from the entry (which usually indicate usage errors).
This change allows the removal of some nullptr checks.
Reviewed By: kuhar
Differential Revision: https://reviews.llvm.org/D88758
MWAITX doesn't touch EFLAGS so no pseudos should def EFLAGS.
The SAVE_EBX/RBX pseudos only needs to def the EBX register that
the expansion overwrites. The EAX and ECX registers are only read.
The pseudo emitted during isel that is used by the custom inserter
shouldn't have any implicit defs or uses since everything is in
vregs.
This reverts commit e9b87f43bde8b5f0d8a79c5884fdce639b12e0ca.
There are issues with macros generating macros without an obvious simple fix
so I'm going to revert this and try something different.
Apparently querying dereferenceability of array allocations is
being intentionally penalized (https://reviews.llvm.org/D41398),
so avoid using them in tests.
We were taking multiple pointer arguments in the builtin.
gcc accepts a single void*.
The cast from void* to _m128i* caused the IR generation to assume
the pointer was aligned.
Instead make the builtin take a single void*, emit i8* GEPs to
adjust then cast to <2 x i64>* and perform a store with align of 1.
This adds a helper to convert a VPRecipeBase pointer to a VPUser, for
recipes that inherit from VPUser. Once VPRecipeBase directly inherits
from VPUser this helper can be removed.
Summary: This patch implements the builtins for xvtdivdp, xvtdivsp, xvtsqrtdp, xvtsqrtsp.
The instructions correspond to the following builtins:
int vec_test_swdiv(vector double v1, vector double v2);
int vec_test_swdivs(vector float v1, vector float v2);
int vec_test_swsqrt(vector double v1);
int vec_test_swsqrts(vector float v1);
This patch depends on D88274, which fixes the bug in copying from CRRC to GPRC/G8RC.
Reviewed By: steven.zhang, amyk
Differential Revision: https://reviews.llvm.org/D88278
In the motivating case from https://llvm.org/PR47517
we create a node that does not get constant folded
before getNegatedExpression is attempted from some
other node, and we crash.
By moving the fold into SelectionDAG::simplifyFPBinop(),
we get the constant fold sooner and avoid the problem.
The case of a destination read between call and memcpy was not
covered anywhere (but is handled correctly).
However, a potentially throwing call between the call and the
memcpy appears to be miscompiled.
Preliminary patch for the next stage of PR45974 - we don't want to be creating 'padded' vectors on-the-fly at all in combineX86ShufflesRecursively, and only pad the source inputs if we have a definite match inside combineX86ShuffleChain.
This means that the inputs to combineX86ShuffleChain might soon be smaller than the final root value type, so we should ensure that isTargetShuffleEquivalent only matches with the inputs if they are the correct size.
We should avoid emitting MachineSDNodes from lowering.
We can use the the implicit def handling in InstrEmitter to avoid
manually copying from each xmm result register. We only need to
manually emit the copies for the implicit uses.
New projects (particularly out of tree) have a tendency to hijack the existing
llvm configuration options and build targets (add_llvm_library,
add_llvm_tool). This can lead to some confusion.
1) When querying a configuration variable, do we care about how LLVM was
configured, or how these options were configured for the out of tree project?
2) LLVM has lots of defaults, which are easy to miss
(e.g. LLVM_BUILD_TOOLS=ON). These options all need to be duplicated in the
CMakeLists.txt for the project.
In addition, with LLVM Incubators coming online, we need better ways for these
incubators to do things the "LLVM way" without alot of futzing. Ideally, this
would happen in a way that eases importing into the LLVM monorepo when
projects mature.
This patch creates some generic infrastructure in llvm/cmake/modules and
refactors MLIR to use this infrastructure. This should expand to include
add_xxx_library, which is by far the most complicated bit of building a
project correctly, since it has to deal with lots of shared library
configuration bits. (MLIR currently hijacks the LLVM infrastructure for
building libMLIR.so, so this needs to get refactored anyway.)
Differential Revision: https://reviews.llvm.org/D85140
Instead of emitting MachineSDNodes during lowering, emit X86ISD
opcodes. These opcodes will either be selected by tablegen
patterns or custom selection code.
Emitting MachineSDNodes during lowering is uncommon so this makes
things more consistent. It also allows selectAddr to be called to
perform address matching during instruction selection.
I had trouble getting tablegen to accept XMM0-XMM7 as results in
an isel pattern for the WIDE instructions so I had to use custom
instruction selection.
This diff refactors writeUniversalBinary and adds writeUniversalBinaryToBuffer.
This is a preparation for adding support for universal binaries to llvm-objcopy.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D88372
The signature of the ctor expects a MCRegister, but currently any
unsigned value can be converted to a MCRegister.
This patch checks that indeed the provided value is a physical register
only. We want to eventually stop implicitly converting unsigned or
Register to MCRegister (which is incorrect). The next step after this
patch is changing uses of MCRegUnitIterator to explicitly cast Register
or unsigned values to MCRegister. To that end, this patch also
introduces 2 APIs that make that conversion checked and explicit.
Differential Revision: https://reviews.llvm.org/D88705
When updating operands of a VPUser, we also have to adjust the list of
users for the new and old VPValues. This is required once we start
transitioning recipes to become VPValues.
We could either try to make SROA more picky to the new type
and/or prevent InstCombine from creating the original problem (converting load-stores to operate on ints),
and/or make InstCombine recover the situation by cleaning up all that cruft.
This makes the prologue match the windows canonical layout, for
cases without a frame pointer.
This can potentially be a slower (a longer dependency chain of the
sp register, and potentially one arithmetic operation more on some
cores), but gives notable size improvements.
The previous two commits shrinks a 166 KB xdata section by 49 KB,
and if the change from this commit is enabled, it shrinks the xdata
section by another 25 KB.
In total, since the start of the recent arm64 unwind info cleanups
and optimizations (since before commit 37ef743cbf3), the xdata+pdata
sections of the same test DLL has shrunk from 407 KB in total
originally, to 163 KB now.
Differential Revision: https://reviews.llvm.org/D88701
This saves one instruction per prologue/epilogue for any function with
an odd number of callee-saved GPRs, but more importantly, allows such
functions to match the packed unwind format.
Differential Revision: https://reviews.llvm.org/D88699
On windows, the callee saved registers in a canonical prologue are
ordered starting from a lower register number at a lower stack
address (with the possible gap for aligning the stack at the top);
this is the opposite order that llvm normally produces.
To achieve this, reverse the order of the registers in the
assignCalleeSavedSpillSlots callback, to get the stack objects
laid out by PrologEpilogInserter in the right order, and adjust
computeCalleeSaveRegisterPairs to lay them out from the bottom up.
This allows generated prologs more often to match the format that
allows the unwind info to be written as packed info.
Differential Revision: https://reviews.llvm.org/D88677
Do not instrument user-defined ELF sections (whose names resemble valid
C identifiers). They may have special use semantics and modifying them
may break programs. This is e.g. the case with NetBSD __link_set API
that expects these sections to store consecutive array elements.
Differential Revision: https://reviews.llvm.org/D76665
We can't use Use.Calls after its std::move()'d to TmpCalls as it will be in an undefined state. Instead, swap with the known empty map in TmpCalls so we can then safely emplace_back into the now empty Use.Calls.
Fixes clang static analyzer warning.
We were not accounting for the pointer offset when splitting a store from
a VMOVDRR node, which could lead to incorrect aliasing info. In this
case it is the fneg via integer arithmetic that gives us a store->load
pair that we started getting wrong.
Differential Revision: https://reviews.llvm.org/D88653
Add basic vector handling to recognizeBSwapOrBitReverseIdiom/collectBitParts - this works at the element level, all vector element operations must match (splat constants etc.) and there is no cross-element support (insert/extract/shuffle etc.).
If we're bswap'ing some bytes and zero'ing the remainder we can perform this as a bswap+mask which helps us match 'partial' bswaps as a first step towards folding into a more complex bswap pattern.
Reapplied with early-out if recognizeBSwapOrBitReverseIdiom collects a source wider than the result type.
Differential Revision: https://reviews.llvm.org/D88578