This patch adds unscaled loads and sign-extend loads to the TII
getMemOpBaseRegImmOfs API, which is used to control clustering in the MI
scheduler. This is done to create more opportunities for load pairing. I've
also added the scaled LDRSWui instruction, which was missing from the scaled
instructions. Finally, I've added support in shouldClusterLoads for clustering
adjacent sext and zext loads, which can also be paired by the load/store optimizer.
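For illustration, a hypothetical IR test case: two adjacent sign-extending
loads that the scheduler can now cluster, so the load/store optimizer can
pair them into a single ldpsw.

  define i64 @ldpsw_candidate(i32* %p) {
    %q = getelementptr i32, i32* %p, i64 1
    %a = load i32, i32* %p, align 4
    %b = load i32, i32* %q, align 4
    %sa = sext i32 %a to i64
    %sb = sext i32 %b to i64
    %r = add i64 %sa, %sb
    ret i64 %r
  }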
Differential Revision: http://reviews.llvm.org/D18048
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263819 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Allow the selection of BUFFER_LOAD_FORMAT_x and _XY. Do this now before
the frontend patches land in Mesa. Eventually, we may want to automatically
reduce the size of loads at the LLVM IR level, which requires such overloads,
and in some cases Mesa can generate them directly.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18255
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263792 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
These intrinsics expose the BUFFER_ATOMIC_* instructions and will be used
by Mesa to implement atomics with buffer semantics. The intrinsic interface
matches that of buffer.load.format and buffer.store.format, except that the
GLC bit is not exposed (it is automatically deduced based on whether the
return value is used).
The change of hasSideEffects is required for TableGen to accept the pattern
that matches the intrinsic.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, rivanvx, llvm-commits
Differential Revision: http://reviews.llvm.org/D18151
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263791 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
We cannot easily deduce that an offset is in an SGPR, but the Mesa frontend
cannot easily make use of an explicit soffset parameter either. Furthermore,
it is likely that in the future, LLVM will be in a better position than the
frontend to choose an SGPR offset if possible.
Since there aren't any frontend uses of these intrinsics in upstream
repositories yet, I would like to take this opportunity to change the
intrinsic signatures to a single offset parameter, which is then selected
to immediate offsets or voffsets using a ComplexPattern.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18218
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263790 91177308-0d34-0410-b5e6-96231b3b80d8
For fcmp, the major concern in the following six cases is a NaN result. The
comparison result consists of 4 bits, indicating lt, eq, gt and un (unordered),
only one of which will be set. The result is generated by the fcmpu
instruction. However, the bc instruction only inspects one of the first 3
bits, so when un is set, bc may jump to an undesired place.
More specifically, if we expect an unordered comparison and un is set, we
expect to always go to the true branch; in that case UEQ, UGT and ULT still
give false, which is undesired, but UNE, UGE and ULE happen to give true,
since they are tested by inspecting !eq, !lt and !gt, respectively.
Similarly, for an ordered comparison, when un is set, we always expect the
result to be false. In that case OGT, OLT and OEQ are good, since they
actually test gt, lt and eq respectively, which are false. But OGE, OLE and
ONE are tested through !lt, !gt and !eq, and these come out true, which is
wrong.
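As a concrete illustration (hypothetical IR), une is branched through the eq
bit (!eq): when fcmpu sets only un for a NaN operand, eq is clear and the
branch correctly lands on the true side, whereas ueq would inspect the same
eq bit and wrongly report false.

  define i32 @cmp_une(double %a, double %b) {
    %c = fcmp une double %a, %b
    %r = zext i1 %c to i32
    ret i32 %r
  }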
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263753 91177308-0d34-0410-b5e6-96231b3b80d8
This patch prevents the CTR loop optimization when soft-float operations are
used inside the loop body. Soft-float operations are lowered to function
calls, but function calls are not allowed inside CTR-optimized loops.
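A minimal sketch (hypothetical function): under soft float, the fadd below
becomes a call to a runtime routine such as __addsf3, so the loop must not
be converted into a CTR-based hardware loop.

  define float @sum(float* %p, i32 %n) {
  entry:
    br label %loop
  loop:
    %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
    %acc = phi float [ 0.0, %entry ], [ %acc.next, %loop ]
    %addr = getelementptr float, float* %p, i32 %i
    %x = load float, float* %addr, align 4
    %acc.next = fadd float %acc, %x
    %i.next = add i32 %i, 1
    %done = icmp eq i32 %i.next, %n
    br i1 %done, label %exit, label %loop
  exit:
    ret float %acc.next
  }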
Patch by Aleksandar Beserminji.
Differential Revision: http://reviews.llvm.org/D17600
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263727 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
MRI::eliminateFrameIndex can emit several instructions to do address
calculations; these can usually be stackified. Because instructions with
FI operands can have subsequent operands which may be expression trees,
find the top of the leftmost tree and insert the code before it, to keep
the LIFO property.
Also use stackified registers when writing back the SP value to memory in
the epilog; a dedicated register is unnecessary there because SP will not be
used after the epilog, and stackification results in better code.
Differential Revision: http://reviews.llvm.org/D18234
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263725 91177308-0d34-0410-b5e6-96231b3b80d8
We were being too aggressive in trying to combine a shuffle into a
blend-with-zero pattern, often resulting in an endless loop of competing
combines.
This patch stops the combine if we already have a blend in place (which means
we miss some domain corrections).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263717 91177308-0d34-0410-b5e6-96231b3b80d8
The two changes together weakened the test and caused a regression with
division handling in MSVC mode. They were applied to avoid an assertion being
triggered in the block frequency analysis. However, the underlying problem
was simply being masked rather than solved properly. Address the actual
underlying problem and revert the changes. Rather than analyzing the cause of
the assertion, the division failure had been assumed to be an overflow.
The underlying issue was a subtle bug in the BB construction in the emission of
the div-by-zero check (WIN__DBZCHK). We did not construct the proper successor
information in the basic blocks, nor did we update the PHIs associated with the
basic block when we split them. This would result in assertions being triggered
in the block frequency analysis pass.
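A minimal reproducer shape (hypothetical): any integer division in MSVC mode
emits an explicit div-by-zero check block (WIN__DBZCHK); the fix makes sure
that block's successors and PHIs are wired up correctly when the parent
block is split.

  define i32 @safe_div(i32 %a, i32 %b) {
    %r = sdiv i32 %a, %b
    ret i32 %r
  }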
Although the original tests are being removed, the tests themselves performed
very little in terms of validation but merely tested that we did not assert when
generating code. Update this with new tests that actually ensure that we do not
regress on the code generation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263714 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Uniform loops where the branch leaving the loop is predicated on VCCNZ
must be skipped if EXEC = 0, otherwise they will be infinite.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18137
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263658 91177308-0d34-0410-b5e6-96231b3b80d8
- Ensure we test both X86 and X64
- sitofp / uitofp require testing for SSE2 and SSE42 as well (part of the fix for PR26953)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263640 91177308-0d34-0410-b5e6-96231b3b80d8
We can currently only match zeroable vector elements of the same size as the
shuffle type. These tests demonstrate the problem; a solution will shortly be
added in an updated D14261.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263606 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The static LDS size is saved in MachineFunctionInfo::LDSSize. We define a
pseudo instruction with the usesCustomInserter bit set. Then, in
EmitInstrWithCustomInserter, we replace this pseudo instruction with a mov of
MachineFunctionInfo::LDSSize.
Reviewers: arsenm, tstellarAMD
Subscribers: llvm-commits, arsenm
Differential Revision: http://reviews.llvm.org/D18064
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263563 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds support for the MachO .alt_entry assembly directive, and uses
it for global aliases with non-zero GEP offsets. The alt_entry flag indicates
that a symbol should be laid out immediately after the preceding symbol.
Conceptually it introduces an alternate entry point for a function or data
structure. E.g.:
  safe_foo:
      // check preconditions for foo
  .alt_entry fast_foo
  fast_foo:
      // body of foo, can assume preconditions.
The .alt_entry flag is also implicitly set on assembly aliases of the form:
a = b + C
where C is a non-zero constant, since these have the same effect as an
alt_entry symbol: they introduce a label that cannot be moved relative to the
preceding one. Setting the alt_entry flag on aliases of this form fixes
http://llvm.org/PR25381.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263521 91177308-0d34-0410-b5e6-96231b3b80d8
pre-SSE41 hardware" as it seems to be causing crashes during code
generation in halide. PR forthcoming.
This reverts commit r263303.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263512 91177308-0d34-0410-b5e6-96231b3b80d8
When the SP is not changed because of realignment/VLAs etc., we restore the SP
using the previous value of SP and not the FP. Breaking the dependency helps
in cases where the epilog of a callee is close to the epilog of the caller,
since otherwise the "sub sp, fp, #" would depend on the load restoring the FP
in the epilog of the callee.
http://reviews.llvm.org/D18060
Patch by Aditya Kumar and Evandro Menezes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263458 91177308-0d34-0410-b5e6-96231b3b80d8
Converting masked vector loads to regular vector loads for x86 AVX should always be a win.
I raised the legality issue of reading the extra memory bytes on llvm-dev. I did not see any
objections.
1. x86 already does this kind of optimization for multiple scalar loads -> vector load.
2. If other targets have the same flexibility, we could move this transform up to CGP or DAGCombiner.
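A hedged sketch of the transform in IR (the mangled intrinsic name is an
assumption based on LLVM's usual overloading scheme): with a constant mask,
the masked load can become a full vector load plus a blend with the
pass-through value, at the cost of reading the masked-off bytes.

  define <4 x float> @masked(<4 x float>* %p, <4 x float> %pass) {
    %v = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* %p, i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 false>, <4 x float> %pass)
    ret <4 x float> %v
  }
  declare <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>*, i32, <4 x i1>, <4 x float>)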
Differential Revision: http://reviews.llvm.org/D18094
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263446 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
MIPSR6 introduces a class of branches called compact branches. Unlike the
traditional MIPS branches which have a delay slot, compact branches do not
have a delay slot. The instruction following the compact branch is only
executed if the branch is not taken and must not be a branch.
This patch works by generating compact branches for MIPS32R6 when the delay
slot filler cannot fill a delay slot, then inspecting the generated code for
forbidden slot hazards (a compact branch with an adjacent branch or other
CTI) and inserting NOPs to clear them.
Patch by Simon Dardis.
Reviewers: vkalintiris, dsanders
Subscribers: MatzeB, dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D16353
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263444 91177308-0d34-0410-b5e6-96231b3b80d8
On the z13, it turns out to be more efficient to access a full
floating-point register than just the upper half (as done e.g.
by the LE and LER instructions).
Current code already takes this into account when loading from
memory by using the LDE instruction in place of LE. However,
we still generate LER, which shows the same performance issues
as LE in certain circumstances.
This patch changes the back-end to emit LDR instead of LER to
implement FP32 register-to-register copies on z13.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263431 91177308-0d34-0410-b5e6-96231b3b80d8
For cases where we are truncating an integer vector arithmetic result, it may
be better to pre-truncate the input operands. There is no code to support
this yet (the scalar case is handled by SimplifyDemandedBits, but adding
vector support could be a lot of work); these tests represent the current
codegen status.
Example bugs: PR14666, PR22703
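One of the test patterns, for illustration (hypothetical test case): a
multiply performed in v8i32 and then truncated, where pre-truncating the
operands to v8i16 would be better on targets with cheap v8i16 multiplies.
Truncation of a multiply only needs the low bits, so the pre-truncated form
computes the same result.

  define <8 x i16> @trunc_mul(<8 x i32> %a, <8 x i32> %b) {
    %m = mul <8 x i32> %a, %b
    %t = trunc <8 x i32> %m to <8 x i16>
    ret <8 x i16> %t
  }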
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263384 91177308-0d34-0410-b5e6-96231b3b80d8
The SSE41 v8i16 shift lowering using (v)pblendvb is great for non-constant
shift amounts, but if the amount is constant then we can efficiently reduce
the VSELECT to shuffles with the pre-SSE41 lowering.
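An illustrative case (hypothetical test): every shift amount below is a
constant, so the blend-based sequence can be reduced to shuffles of the
shifted halves.

  define <8 x i16> @shl_const(<8 x i16> %a) {
    %r = shl <8 x i16> %a, <i16 0, i16 1, i16 2, i16 3, i16 4, i16 5, i16 6, i16 7>
    ret <8 x i16> %r
  }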
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263383 91177308-0d34-0410-b5e6-96231b3b80d8
This patch corresponds to review:
http://reviews.llvm.org/D17712
We were not clearing the TOC vector in PPCAsmPrinter when initializing it. This
caused duplicate definition asserts when the pass is reused on the module
(i.e. with -compile-twice or in JIT contexts).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263338 91177308-0d34-0410-b5e6-96231b3b80d8
cmpxchg[8|16]b uses RBX as one of its arguments. In other words, using this
instruction clobbers RBX, as it is defined to hold one of the inputs. When
the backend uses a dynamically allocated stack, RBX is used as a reserved
register for the base pointer.
Reserved registers have special semantics that only the target understands
and enforces. Because of that, the register allocator doesn't use them, but
it also doesn't try to make sure they are used properly (remember, it does
not know how they are supposed to be used).
Therefore, when RBX is used as a reserved register but defined by something
that is not compatible with that use, the register allocator will not fix the
surrounding code to make sure it gets saved and restored properly around the
broken code. It is the responsibility of the target to do the right thing
with its reserved register.
To fix that, when the base pointer needs to be preserved, we use a different
pseudo instruction for cmpxchg that saves RBX. That pseudo takes two more
arguments than the regular instruction:
- One is the value to be copied into RBX to set the proper value for the
comparison.
- The other is the virtual register holding the saved value of RBX as the
base pointer. This saving is done as part of isel (i.e., we emit a copy from
rbx).
  cmpxchg_save_rbx <regular cmpxchg args>, input_for_rbx_reg, save_of_rbx_as_bp
This gets expanded into:
  rbx = copy input_for_rbx_reg
  cmpxchg <regular cmpxchg args>
  rbx = save_of_rbx_as_bp
Note: the actual modeling of the pseudo is a bit more complicated, to make
sure the interferences that appear after the pseudo is expanded are properly
modeled before that expansion.
This fixes PR26883.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263325 91177308-0d34-0410-b5e6-96231b3b80d8
Improve vector extension of vectors on hardware without dedicated
VSEXT/VZEXT instructions.
We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG,
but we can further improve this by using the legalizer instead of prematurely
splitting into legal vectors in the combine, as that only properly helps
lowering to VSEXT/VZEXT.
This removes a lot of unnecessary any_extend + mask patterns (fix for
PR25718).
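An example of the affected pattern (hypothetical test): on targets without
pmovzx/pmovsx, this zext is now widened by the legalizer through
ZERO_EXTEND_VECTOR_INREG (e.g. punpck with a zero vector) instead of an
any_extend followed by a mask.

  define <16 x i16> @zext_v16i8(<16 x i8> %a) {
    %z = zext <16 x i8> %a to <16 x i16>
    ret <16 x i16> %z
  }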
Differential Revision: http://reviews.llvm.org/D17932
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263303 91177308-0d34-0410-b5e6-96231b3b80d8
Instead, extend f16 (like we do when lowering a standalone SETCC),
and let f128 be legalized to the RT calls.
Fixes PR26803.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263301 91177308-0d34-0410-b5e6-96231b3b80d8
It's not enough that we test for SSSE3 - that's only OK for 128-bit vectors -
we also need to test for AVX2 / AVX512BW for the 256/512-bit vector cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263239 91177308-0d34-0410-b5e6-96231b3b80d8
If a constant is the same as the reverse of an inline immediate,
this is 4 bytes smaller than having to embed a 32-bit literal.
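For example (hedged; sizes follow from the 4-byte VOP1 encoding versus an
extra 32-bit literal dword): 0x80000000 is the bit-reverse of 1, and 1 is an
inline immediate, so the backend can emit "v_bfrev_b32 v0, 1" (4 bytes)
instead of a mov with an embedded 32-bit literal (8 bytes).

  define i32 @sign_bit() {
    ret i32 -2147483648 ; 0x80000000 = bitreverse(1)
  }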
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263201 91177308-0d34-0410-b5e6-96231b3b80d8
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.
Reapplied with a fix for PR26870 (avoid premature use of TargetConstant in ZERO_EXTEND_VECTOR_INREG expansion).
Differential Revision: http://reviews.llvm.org/D17691
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263159 91177308-0d34-0410-b5e6-96231b3b80d8
This patch fixes a problem which occurs when loop-vectorize tries to use the
@llvm.masked.load/store intrinsics for a non-default-addrspace pointer. It
fails with a "Calling a function with a bad signature!" assertion in the
CallInst constructor, because it tries to pass a non-default-addrspace
pointer to the pointer argument, which has the default addrspace.
The fix is to add the pointer type as another overloaded type to the
@llvm.masked.load/store intrinsics.
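A sketch of the resulting overloading (the mangled name is an assumption
based on LLVM's usual scheme): the pointer type now participates in the
intrinsic name, so non-default address spaces are expressible.

  declare <8 x i32> @llvm.masked.load.v8i32.p1v8i32(<8 x i32> addrspace(1)*, i32, <8 x i1>, <8 x i32>)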
Reviewed By: reames
Differential Revision: http://reviews.llvm.org/D17270
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263158 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
They correspond to BUFFER_LOAD/STORE_FORMAT_XYZW and will be used by Mesa
to implement the GL_ARB_shader_image_load_store extension.
The intention is that for llvm.amdgcn.buffer.load.format, LLVM will decide
whether one of the _X/_XY/_XYZ opcodes can be used (similar to image sampling
and loads). However, this is not currently implemented.
For llvm.amdgcn.buffer.store, LLVM cannot decide to use one of the "smaller"
opcodes, and therefore the intrinsic is overloaded. Currently, only the v4f32
variant is actually implemented, since GLSL also only has a vec4 variant of
the store instructions, although it's conceivable that Mesa will want to be
smarter about this in the future.
BUFFER_LOAD_FORMAT_XYZW is already exposed via llvm.SI.vs.load.input, which
has a legacy name, pretends not to access memory, and does not capture the
full flexibility of the instruction.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17277
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263140 91177308-0d34-0410-b5e6-96231b3b80d8