This is the 32-bit mode version of LEAVE64. It should be at least somewhat similar to LEAVE64.
The Sandy Bridge version was missing a load port use.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329347 91177308-0d34-0410-b5e6-96231b3b80d8
We were forcing the latency of these instructions to 5 cycles, but every other scheduler model had them as 1 cycle. I'm sure I didn't get everything, but this gets a big portion.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329339 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds a way for users to create their own custom sections to
be added to wasm files. At the LLVM IR layer, they are defined through
the "wasm.custom_sections" named metadata. The expected use case for
this is bindings generators such as wasm-bindgen.
Patch by Dan Gohman
Differential Revision: https://reviews.llvm.org/D45297
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329315 91177308-0d34-0410-b5e6-96231b3b80d8
A noreturn nounwind function can be expected to never return in any way, and by
never returning it will also never have to restore any callee-saved registers
for its caller. This makes it possible to skip spills of those registers during
function entry, saving some stack space and time in the process. This is rather
useful for embedded targets with limited stack space.
Should fix PR9970.
Patch by myeisha (pmb).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329287 91177308-0d34-0410-b5e6-96231b3b80d8
It's failing on the bots and I'm not sure why.
This reverts:
[X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents.
[X86] Use WriteFShuffle256 for VEXTRACTF128 to be consistent with VEXTRACTI128 which uses WriteShuffle256.
[X86] Remove some InstRWs for plain store instructions on Sandy Bridge.
[X86] Auto-generate complete checks. NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329256 91177308-0d34-0410-b5e6-96231b3b80d8
We were forcing the latency of these instructions to 5 cycles, but every other scheduler model had them as 1 cycle. I'm sure I didn't get everything, but this gets a big portion.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329252 91177308-0d34-0410-b5e6-96231b3b80d8
The implementation of shadow call stack on aarch64 is quite different to
the implementation on x86_64. Instead of reserving a segment register for
the shadow call stack, we reserve the platform register, x18. Any function
that spills lr to sp also spills it to the shadow call stack, a pointer to
which is stored in x18.
Differential Revision: https://reviews.llvm.org/D45239
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329236 91177308-0d34-0410-b5e6-96231b3b80d8
The BSWAP64r version is 2 uops and BSWAP32r is only 1 uop. The regular expressions also looked for a non-existant BSWAP16r.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329211 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
These new image intrinsics contain the texture type as part of
their name and have each component of the address/coordinate as
individual parameters.
This is a preparatory step for implementing the A16 feature, where
coordinates are passed as half-floats or -ints, but the Z compare
value and texel offsets are still full dwords, making it difficult
or impossible to distinguish between A16 on or off in the old-style
intrinsics.
Additionally, these intrinsics pass the 'texfailpolicy' and
'cachectrl' as i32 bit fields to reduce operand clutter and allow
for future extensibility.
v2:
- gather4 supports 2darray images
- fix a bug with 1D images on SI
Change-Id: I099f309e0a394082a5901ea196c3967afb867f04
Reviewers: arsenm, rampitec, b-sumner
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D44939
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329166 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
When an i1-value is defined inside of a loop and used outside of it, we
cannot simply use the SGPR bitmask from the loop's last iteration.
There are also useful and correct cases of an i1-value being copied between
basic blocks, e.g. when a condition is computed outside of a loop and used
inside it. The concept of dominators is not sufficient to capture what is
going on, so I propose the notion of "lane-dominators".
Fixes a bug encountered in Nier: Automata.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103743
Change-Id: If37b969ddc71d823ab3004aeafb9ea050e45bd9a
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D40547
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329164 91177308-0d34-0410-b5e6-96231b3b80d8
Recommitting rL321259. Previosuly this caused an issue with PPCBE but
I didn't receieve a reproducer and didn't have the time to follow up.
If the issue appears again, please provide a reproducer so I can fix
it.
Original commit message:
If the SRL node is only used by an AND, we may be able to set the
ExtVT to the width of the mask, making the AND redundant. To support
this, another check has been added in isLegalNarrowLoad which queries
whether the load is valid.
Differential Revision: https://reviews.llvm.org/D41350
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329160 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The ShadowCallStack pass instruments functions marked with the
shadowcallstack attribute. The instrumented prolog saves the return
address to [gs:offset] where offset is stored and updated in [gs:0].
The instrumented epilog loads/updates the return address from [gs:0]
and checks that it matches the return address on the stack before
returning.
Reviewers: pcc, vitalybuka
Reviewed By: pcc
Subscribers: cryptoad, eugenis, craig.topper, mgorny, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D44802
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329139 91177308-0d34-0410-b5e6-96231b3b80d8
This commit is similar to r329120, but uses the existing getUsesRedZone() function
in X86MachineFunctionInfo. This teaches the outliner to look at whether or not a
function *truly* uses a redzone instead of just the noredzone attribute on a
function.
Thus, after this commit, it's possible to outline from x86 without using
-mno-red-zone and still get outlining results.
This also adds a new test for the new redzone behaviour.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329134 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: There are no packed instructions for min3 or max3. So, performMinMaxCombine should not optimize vectors of f16 to min3/max3.
Author: FarhanaAleen
Reviewed By: arsenm
Subscribers: llvm-commits, AMDGPU
Differential Revision: https://reviews.llvm.org/D45219
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329131 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds a hasRedZone() function to AArch64MachineFunctionInfo. It
returns true if the function is known to use a redzone, false if it is known
to not use a redzone, and no value otherwise.
This removes the requirement to pass -mno-red-zone when outlining for AArch64.
https://reviews.llvm.org/D45189
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329120 91177308-0d34-0410-b5e6-96231b3b80d8
The linkage type on outlined functions was private before. This meant that if
you set a breakpoint in an outlined function, the debugger wouldn't be able to
give a sane name to the outlined function.
This commit changes the linkage type to internal and updates any tests that
relied on the prefixes on the names of outlined functions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329116 91177308-0d34-0410-b5e6-96231b3b80d8
We don't constant fold any of these, but we could...but if we
do, we must produce the right answer.
Unlike the IR fptosi instruction or its DAG node counterpart
ISD::FP_TO_SINT, these are not undef for an out-of-range input.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329100 91177308-0d34-0410-b5e6-96231b3b80d8
fact use regular expression syntax to use regular expressions.
Should restore the bots. Sorry for the noise on this test.
Thanks to Philip for spotting the bug!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329057 91177308-0d34-0410-b5e6-96231b3b80d8
This adds the basic test cases from all the EFLAGS bugs in more direct
forms. It also switches to generated check lines, and includes both
32-bit and 64-bit variations.
No functionality changing here, just setting things up to have a nice
clean asm diff in my EFLAGS patch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329056 91177308-0d34-0410-b5e6-96231b3b80d8
do explicit scrubbing of the offsets of stack spills and reloads.
You can always turn this off in order to test specific stack slot usage.
We were already hiding most of this, but the new logic hides it more
generically. Notably, we should effectively hide stack slot churn in
functions that have a frame pointer now, and should also hide it when
changing a function from stack pointer to frame pointer. That transition
already changes enough to be clearly noticed in the test case diff,
showing *every* spill and reload is really noisy without benefit. See
the test case I ran this on as a classic example.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329055 91177308-0d34-0410-b5e6-96231b3b80d8
Commit 37962a331c ("bpf: Improve expanding logic in LowerSELECT_CC")
intended to improve code quality for certain jmp conditions. The
commit, however, has a couple of issues:
(1). In code, just swap is not enough, ConditionalCode CC
should also be swapped, otherwise incorrect code will
be generated.
(2). The ConditionalCode swap should be subject to
getHasJmpExt(). If getHasJmpExt() is False, certain
conditional codes will not be supported and swap
may generate incorrect code.
The original goal for this patch is to optimize jmp operations
which does not have JmpExt turned on. If JmpExt is on,
better code could be generated. For example, the test
select_ri.ll is introduced to demonstrate the optimization.
The same result can be achieved with -mcpu=v2 flag.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329043 91177308-0d34-0410-b5e6-96231b3b80d8
Just adds basic block labels and tidies up where comments go in the test
case and then generates fresh CHECK lines with the script. This way, the
check lines are much easier to maintain. They were already close to this
but not quite there.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329040 91177308-0d34-0410-b5e6-96231b3b80d8
This patch fixes PR36885: clang++ generates unaligned stub symbol
holding a pointer.
Patch by Rahul Chaudhry!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329030 91177308-0d34-0410-b5e6-96231b3b80d8
If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load, This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory.
A "store forward block" occurs in cases that a store cannot be forwarded to the load. The most typical case of store forward block on Intel Core microarchiticutre that a small store cannot be forwarded to a large load.
The estimated penalty for a store forward block is ~13 cycles.
This pass tries to recognize and handle cases where "store forward block" is created by the compiler when lowering memcpy calls to a sequence
of a load and a store.
The pass currently only handles cases where memcpy is lowered to XMM/YMM registers, it tries to break the memcpy into smaller copies.
breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM.
Differential revision: https://reviews.llvm.org/D41330
Change-Id: Ib48836ccdf6005989f7d4466fa2035b7b04415d9
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328973 91177308-0d34-0410-b5e6-96231b3b80d8
Fixes most of PR36898. Still need to fix the 512-bit instructions, but Agner's tables don't have those.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328960 91177308-0d34-0410-b5e6-96231b3b80d8