Summary:
Uniform loops where the branch leaving the loop is predicated on VCCNZ
must be skipped if EXEC = 0, otherwise they will be infinite.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18137
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263658 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Static LDS size is saved in MachineFunctionInfo::LDSSize,
We define a pseudo instruction with usesCustomInserter bit set. Then, in EmitInstrWithCustomInserter,
we replace this pseudo instruction with a mov of MachineFunctionInfo::LDSSize.
Reviewers:
arsenm
tstellarAMD
Subscribers
llvm-commits, arsenm
Differential Revision:
http://reviews.llvm.org/D18064
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263563 91177308-0d34-0410-b5e6-96231b3b80d8
If a constant is the same as the reverse of an inline immediate,
this is 4 bytes smaller than having to embed a 32-bit literal.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263201 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
They correspond to BUFFER_LOAD/STORE_FORMAT_XYZW and will be used by Mesa
to implement the GL_ARB_shader_image_load_store extension.
The intention is that for llvm.amdgcn.buffer.load.format, LLVM will decide
whether one of the _X/_XY/_XYZ opcodes can be used (similar to image sampling
and loads). However, this is not currently implemented.
For llvm.amdgcn.buffer.store, LLVM cannot decide to use one of the "smaller"
opcodes and therefore the intrinsic is overloaded. Currently, only the v4f32
is actually implemented since GLSL also only has a vec4 variant of the store
instructions, although it's conceivable that Mesa will want to be smarter
about this in the future.
BUFFER_LOAD_FORMAT_XYZW is already exposed via llvm.SI.vs.load.input, which
has a legacy name, pretends not to access memory, and does not capture the
full flexibility of the instruction.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17277
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263140 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The code in SelectionDAG did not handle the case where the
register type and output types were different, but had the same size.
Reviewers: arsenm, echristo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17940
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263022 91177308-0d34-0410-b5e6-96231b3b80d8
Supprot DPP syntax as used in SP3 (except several operands syntax).
Added dpp-specific operands in td-files.
Added DPP flag to TSFlags to determine if instruction is dpp in InstPrinter.
Support for VOP2 DPP instructions in td-files.
Some tests for DPP instructions.
ToDo:
- VOP2bInst:
- vcc is considered as operand
- AsmMatcher doesn't apply mnemonic aliases when parsing operands
- v_mac_f32
- v_nop
- disable instructions with 64-bit operands
- change dpp_ctrl assembler representation to conform sp3
Review: http://reviews.llvm.org/D17804
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263008 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This is necessary for when we run out of VGPRs and can no
longer use v_{read,write}_lane for spilling SGPRs.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17592
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262732 91177308-0d34-0410-b5e6-96231b3b80d8
On AMDGPU where operations i64 operations are often bitcasted to v2i32
and back, this pattern shows up regularly where it breaks some
expected combines on i64, such as load width reducing.
This fixes some test failures in a future commit when i64 loads
are changed to promote.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262397 91177308-0d34-0410-b5e6-96231b3b80d8
This reduces the number of bitcast nodes and generally cleans up the
DAG when bitcasting between integers and vectors everywhere.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262358 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch impleemnts DS_PERMUTE/DS_BPERMUTE instruction definitions and intrinsics,
which are new since VI.
Reviewers: tstellarAMD, arsenm
Subscribers: llvm-commits, arsenm
Differential Revision: http://reviews.llvm.org/D17614
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262356 91177308-0d34-0410-b5e6-96231b3b80d8
This currently does not have the control over the bitwidth,
and there are missing optimizations to reduce the integer to
32-bit if it can be.
But in most situations we do want the sinking to occur.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262296 91177308-0d34-0410-b5e6-96231b3b80d8
The maximum private allocation for the whole GPU is 4G,
so the maximum possible index for a single workitem is the
maximum size divided by the smallest granularity for a dispatch.
This increases the number of known zero high bits, which
enables more offset folding. The maximum private size per
workitem with this is 128M but may be smaller still.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262153 91177308-0d34-0410-b5e6-96231b3b80d8
In the case where op = add, y = base_ptr, and x = offset, this
transform:
(op y, (op x, c1)) -> (op (op x, y), c1)
breaks the canonical form of add by putting the base pointer in the
second operand and the offset in the first.
This fix is important for the R600 target, because for some address
spaces the base pointer and the offset are stored in separate register
classes. The old pattern caused the ISel code for matching addressing
modes to put the base pointer and offset in the wrong register classes,
which required no-trivial code transformations to fix.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262148 91177308-0d34-0410-b5e6-96231b3b80d8
This matches the behavior of the HSAIL clock instruction.
s_realmemtime is used if the subtarget supports it, and falls
back to s_memtime if not.
Also introduces new intrinsics for each of s_memtime / s_memrealtime.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262119 91177308-0d34-0410-b5e6-96231b3b80d8
Add parsing and printing of image operands. Matches legacy sp3 assembler.
Change image instruction order to have data/image/sampler operands in the beginning. This is needed because optional operands in MC are always last.
Update SITargetLowering for new order.
Add basic MC test.
Update CodeGen tests.
Review: http://reviews.llvm.org/D17574
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261995 91177308-0d34-0410-b5e6-96231b3b80d8
I don't think this test was intending to test unaligned load/store.
Change it to use the natural alignment to avoid regressing.
Also adds missing SI checks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261571 91177308-0d34-0410-b5e6-96231b3b80d8
This avoids some test regressions in a future commit
when unaligned operations are expanded when they
have custom lowering.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261570 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Instead of trying to replace SMRD instructions with a VGPR base pointer
with an equivalent MUBUF instruction, we now copy the base pointer to
SGPRs using v_readfirstlane.
This is safe to do, because any load selected as an SMRD instruction
has been proven to have a uniform base pointer, so each thread in the
wave will have the same pointer value in VGPRs.
This will fix some errors on VI from trying to replace SMRD instructions
with addr64-enabled MUBUF instructions that don't exist.
Reviewers: arsenm, cfang, nhaehnle
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17305
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261385 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This was broken in r260694 which swapped the address and data operands
for flat store instructions. The code in SIInsertWaits assumes
that the data operand always comes before the address operand, so
we need to add a special case for flat.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17366
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261330 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
These correspond to IMAGE_LOAD/STORE[_MIP] and are going to be used by Mesa
for the GL_ARB_shader_image_load_store extension.
IMAGE_LOAD is already matched by llvm.SI.image.load. That intrinsic has
a legacy name and pretends not to read memory.
Differential Revision: http://reviews.llvm.org/D17276
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261224 91177308-0d34-0410-b5e6-96231b3b80d8
Tests for the new scalarize all private access options will be
included with a future commit.
The only functional change is to make the split/scalarize behavior
for private access of > 4 element vectors to be consistent
with the flat/global handling. This makes the spilling worse
in the two changed tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260804 91177308-0d34-0410-b5e6-96231b3b80d8
This intrinsic will be used to expose dpp functionality to higher-level
languages. It will map to the dpp version of v_mov_b32.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260792 91177308-0d34-0410-b5e6-96231b3b80d8
These provide direct access to the hardware instruction without
the unit version required like llvm.sin/llvm.cos lowering requires.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260782 91177308-0d34-0410-b5e6-96231b3b80d8
Historically, AMD internal sp3 assembler has flat_store* addr, data
format. To match existing code and to enable reuse, change LLVM
definitions to match. Also update MC and CodeGen tests.
Differential Revision: http://reviews.llvm.org/D16927
Patch by: Nikolay Haustov
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260694 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
It is possible that the loop condition can be a boolean constant (infinite loop,
for example). So we sould handle constant condition in annotating a loop. This
patch adds this functionality to support annotating constant condition.
Reviewers: tstellarAMD, arsenm
Subscribers: llvm-commits, arsenm
Differential Revision: http://reviews.llvm.org/D15093
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260692 91177308-0d34-0410-b5e6-96231b3b80d8
This was hardcoded to the static private size, but this
would be missing the offset and additional size for someday
when we have dynamic sizing.
Also stops always initializing flat_scratch even when unused.
In the future we should stop emitting this unless flat instructions
are used to access private memory. For example this will initialize
it almost always on VI because flat is used for global access.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260658 91177308-0d34-0410-b5e6-96231b3b80d8
Introduce a subtarget feature for this, and leave the default with
the current behavior which assumes up to 16-byte loads/stores can
be used. The field also seems to have the ability to be set to 2 bytes,
but I'm not sure what that would be used for.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260651 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
It's possible to have resource descriptors and samplers stored in
VGPRs, either by a VMEM instruction or in the case of samplers,
floating-point calculations. When this happens, we need to use
v_readfirstlane to copy these values back to sgprs.
Reviewers: mareko, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17102
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260599 91177308-0d34-0410-b5e6-96231b3b80d8
If the two operands to an instruction were both
subregisters of the same super register, it would incorrectly
think this counted as the same constant bus use.
This fixes the verifier error in fmin_legacy.ll which
was missing -verify-machineinstrs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260495 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This fixes a crash where subsequent spills would be unable to scavenge
a register. In particular, it fixes a crash in piglit's
spec@glsl-1.50@execution@geometry@max-input-components (the test still
has a shader that fails to compile because of too many SGPR spills, but
at least it doesn't crash any more).
This is a candidate for the release branch.
Reviewers: arsenm, tstellarAMD
Subscribers: qcolombet, arsenm
Differential Revision: http://reviews.llvm.org/D16558
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260427 91177308-0d34-0410-b5e6-96231b3b80d8
If a range has a lower bound of 0, add an AssertZext from the
nearest floor power of two.
This allows operations with some workitem intrinsics with known
maximum ranges to use fast 24-bit multiplies.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260109 91177308-0d34-0410-b5e6-96231b3b80d8