435 Commits

Author SHA1 Message Date
David Stuttard
d7200011f1 [AMDGPU] Stop mulhi from doing 24 bit mul for uniform values
Added a check for whether the architecture supports s_mulhi, which is used as
part of the decision whether or not to use a VALU 24-bit mul (if the mulhi gets
transformed to a VALU op anyway, we may as well use it).

This is an extension of the work in D97063

Differential Revision: https://reviews.llvm.org/D103321

Change-Id: I80b1323de640a52623d69ac005a97d06a5d42a14
2021-07-05 10:33:23 +01:00
Brendon Cahoon
16a6bb6581 [AMDGPU][GlobalISel] Legalize and select G_SBFX and G_UBFX
Adds legalizer, register bank select, and instruction
select support for G_SBFX and G_UBFX. These opcodes generate
scalar or vector ALU bitfield extract instructions for
AMDGPU. The instructions allow either constant or register
values for the offset and width operands.

The 32-bit scalar version is expanded to a sequence that
combines the offset and width into a single register.

There are no 64-bit vgpr bitfield extract instructions, so the
operations are expanded to a sequence of instructions that
implement the operation. If the width is a constant,
then the 32-bit bitfield extract instructions are used.

Moved the AArch64 specific code for creating G_SBFX to
CombinerHelper.cpp so that it can be used by other targets.
Only bitfield extracts with constant offset and width values
are handled currently.

Differential Revision: https://reviews.llvm.org/D100149
2021-06-28 09:06:44 -04:00
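
A minimal standalone sketch of the bitfield-extract semantics described in the commit above; ubfx32, sbfx32 and packOffsetWidth are illustrative names, and the packed offset/width layout is an assumption, not taken from the patch:

    #include <cassert>
    #include <cstdint>

    // Unsigned bitfield extract (G_UBFX semantics): return `width` bits of
    // `src` starting at bit `offset`, zero-extended.
    uint32_t ubfx32(uint32_t src, unsigned offset, unsigned width) {
      assert(width > 0 && offset + width <= 32);
      return (src >> offset) & (width == 32 ? ~0u : (1u << width) - 1);
    }

    // Signed bitfield extract (G_SBFX semantics): same field, sign-extended.
    int32_t sbfx32(uint32_t src, unsigned offset, unsigned width) {
      uint32_t field = ubfx32(src, offset, width);
      uint32_t sign = 1u << (width - 1);
      return (int32_t)((field ^ sign) - sign);
    }

    // The 32-bit scalar expansion combines offset and width into a single
    // register; a plausible packing (assumed here) is offset in the low bits
    // and width shifted to a higher bit position.
    uint32_t packOffsetWidth(unsigned offset, unsigned width) {
      return offset | (width << 16);
    }

    int main() {
      assert(ubfx32(0xABCD1234u, 8, 8) == 0x12);
      assert(sbfx32(0x0000F000u, 12, 4) == -1); // field 0b1111 sign-extends
      assert(packOffsetWidth(8, 8) == 0x00080008u);
      return 0;
    }
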
Carl Ritson
9a5c628361 [AMDGPU] Add 224-bit vector types and link 192-bit types to MVTs
Add SReg_224, VReg_224, AReg_224, etc.
Link 224-bit types with v7i32/v7f32.
Link existing 192-bit types to newly added v3i64/v3f64/v6i32/v6f32.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D104622
2021-06-24 12:41:22 +09:00
Matt Arsenault
50b757aa13 AMDGPU: Move zeroed FP high bits optimization to patterns 2021-06-22 12:47:56 -04:00
Michael Liao
be5f17eb4b [amdgpu] Improve the conversion from f32 to i64.
- Apply the same principle as the conversion from f64 to i64, with the extra
  pre- and post-processing that is necessary. This reduces the conversion
  sequence to about half the length of the legacy one.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D104427
2021-06-19 12:46:48 -04:00
Matt Arsenault
6c04830c65 AMDGPU: Fix infinite loop in DAG combine with fneg + fma
We were not reporting isFNegFree for v2f32, although it is effectively
free after legalization. The generic combine was pulling fneg out of
the fma source operands, and the AMDGPU combine was doing the
opposite.
2021-06-18 19:09:03 -04:00
Julien Pagès
7410730d5d [AMDGPU] Fix a crash when selecting a particular case of buffer_load_format_d16
In this particular example, we had a crash when compiling it
for several architectures. This patch extends the legalization
of extract_subvector to avoid this problem.

Differential Revision: https://reviews.llvm.org/D103344
2021-06-03 16:40:18 -04:00
Sebastian Neubauer
64ac3b3090 [AMDGPU] Fix function calls with flat scratch
When flat scratch is used, the stack pointer needs to be added when
writing arguments to the stack.
For buffer instructions, this is done in SelectMUBUFScratchOffen
and SelectMUBUFScratchOffset.

Move that to call argument lowering, like it is done in GlobalISel.

Differential Revision: https://reviews.llvm.org/D103166
2021-05-28 11:22:13 +02:00
Stanislav Mekhanoshin
cc39b4a525 [AMDGPU] Fix module LDS selection
Accesses to the global module LDS variable start from address null, but each
kernel also assumes its own variables start at address null. Fixed by not
using null as an address.

Differential Revision: https://reviews.llvm.org/D102882
2021-05-20 15:59:01 -07:00
Julien Pagès
38ac1317ba [AMDGPU] Select V_CVT_*16_F16 more often
Improve the code generation of fp_to_sint
and fp_to_uint for 16-bit integers.

Differential Revision: https://reviews.llvm.org/D101481

Patch by Julien Pagès!
2021-05-05 08:57:51 +01:00
Nicolai Hähnle
60121ef575 [AMDGPU][SelectionDAG] Don't combine uniform multiplies to MUL_[UI]24
Prefer to keep uniform (non-divergent) multiplies on the scalar ALU when
possible. This significantly improves some game cases by eliminating
v_readfirstlane instructions when the result feeds into a scalar
operation, like the address calculation for a scalar load or store.

Since isDivergent is only an approximation of whether a value is in
SGPRs, it can potentially regress some situations where a uniform value
ends up in a VGPR. These should be rare in real code, although the test
changes do contain a number of examples.

Most of the test changes are just using s_mul instead of v_mul/mad which
is generally better for both register pressure and latency (at least on
GFX10 where sgpr pressure doesn't affect occupancy and vector ALU
instructions have significantly longer latency than scalar ALU). Some
R600 tests now use MULLO_INT instead of MUL_UINT24.

GlobalISel appears to handle more scenarios in the desirable way,
although it can also be thrown off and fails to select the 24-bit
multiplies in some cases.

An alternative solution that was considered and rejected was to allow selecting
MUL_[UI]24 to S_MUL_I32. I rejected this because those SD operations are defined
with the most significant 8 bits of the operands as don't-care, and this fact is
used in some combines via SimplifyDemandedBits.

Based on a patch by Nicolai Hähnle.

Differential Revision: https://reviews.llvm.org/D97063
2021-02-23 15:39:19 +00:00
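
For reference, a standalone sketch of the MUL_U24/MULHI_U24 semantics that the rejection above relies on; mul_u24 and mulhi_u24 are illustrative names, and the 24-bit interpretation of the operands is the stated assumption:

    #include <cassert>
    #include <cstdint>

    // MUL_U24 reads only the low 24 bits of each operand, so the top 8 bits
    // of each source are don't-care -- the property SimplifyDemandedBits
    // exploits and the reason the node can't be selected to a full 32-bit
    // S_MUL_I32.
    uint32_t mul_u24(uint32_t a, uint32_t b) {
      uint64_t prod = (uint64_t)(a & 0xFFFFFF) * (b & 0xFFFFFF);
      return (uint32_t)prod;         // low 32 bits of the 48-bit product
    }

    uint32_t mulhi_u24(uint32_t a, uint32_t b) {
      uint64_t prod = (uint64_t)(a & 0xFFFFFF) * (b & 0xFFFFFF);
      return (uint32_t)(prod >> 32); // high bits of the 48-bit product
    }

    int main() {
      // Garbage in the top 8 bits does not change the result...
      assert(mul_u24(0xFF000123u, 0xAB000456u) == mul_u24(0x123u, 0x456u));
      assert(mulhi_u24(0xFF000123u, 0xAB000456u) == mulhi_u24(0x123u, 0x456u));
      // ...but a full-width 32-bit multiply does see those bits, so selecting
      // MUL_U24 to S_MUL_I32 would be incorrect in general.
      assert(mul_u24(0xFF000123u, 0xAB000456u) != 0xFF000123u * 0xAB000456u);
      return 0;
    }
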
Stanislav Mekhanoshin
f1c6dbc4d5 [AMDGPU] gfx90a support
Differential Revision: https://reviews.llvm.org/D96906
2021-02-17 16:01:32 -08:00
Jay Foad
9a5c672484 [AMDGPU] Rename simplifyI24 to simplifyMul24
Also simplify one of its call sites. NFC.
2021-02-17 11:33:49 +00:00
Craig Topper
b58d6a4b20 [TargetLowering] Use Align in allowsMisalignedMemoryAccesses.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96097
2021-02-04 19:22:06 -08:00
dfukalov
f3ae5b9b8c [NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets
... to reduce header dependencies.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D95036
2021-01-20 22:22:45 +03:00
dfukalov
d069b95364 [NFC][AMDGPU] Reduce include file dependencies.
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D93813
2021-01-07 22:22:05 +03:00
Sebastian Neubauer
de78d986b0 [AMDGPU] Mark amdgpu_gfx functions as module entry function
- Allows LDS allocations
- Writes resource usage into COMPUTE_PGM_RSRC1 registers in PAL metadata

Differential Revision: https://reviews.llvm.org/D92946
2020-12-14 10:43:39 +01:00
Sebastian Neubauer
7e4be9501b [AMDGPU] Add amdgpu_gfx calling convention
Add a calling convention called amdgpu_gfx for real function calls
within graphics shaders. For the moment, this uses the same calling
convention as other calls in amdgpu, with registers excluded for return
address, stack pointer and stack buffer descriptor.

Differential Revision: https://reviews.llvm.org/D88540
2020-11-09 16:51:44 +01:00
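
A rough sketch of how a frontend might use the new convention through the LLVM C++ API; emitGfxCall, gfx_caller and gfx_callee are made-up names, and only CallingConv::AMDGPU_Gfx comes from the patch:

    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/LLVMContext.h"
    #include "llvm/IR/Module.h"
    #include "llvm/Support/raw_ostream.h"

    using namespace llvm;

    // Create an amdgpu_gfx callee and a matching call site; the calling
    // convention must agree on both the function and the call.
    static void emitGfxCall(Module &M) {
      LLVMContext &Ctx = M.getContext();
      Type *FloatTy = Type::getFloatTy(Ctx);
      FunctionType *FT = FunctionType::get(FloatTy, {FloatTy}, /*isVarArg=*/false);

      Function *Callee =
          Function::Create(FT, Function::ExternalLinkage, "gfx_callee", M);
      Callee->setCallingConv(CallingConv::AMDGPU_Gfx);

      Function *Caller =
          Function::Create(FT, Function::ExternalLinkage, "gfx_caller", M);
      Caller->setCallingConv(CallingConv::AMDGPU_Gfx);

      IRBuilder<> B(BasicBlock::Create(Ctx, "entry", Caller));
      CallInst *CI = B.CreateCall(Callee, {Caller->getArg(0)});
      CI->setCallingConv(CallingConv::AMDGPU_Gfx); // call site must match
      B.CreateRet(CI);
    }

    int main() {
      LLVMContext Ctx;
      Module M("gfx_example", Ctx);
      emitGfxCall(M);
      M.print(outs(), nullptr); // prints "define amdgpu_gfx float @gfx_caller..."
      return 0;
    }
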
Christudasan Devadasan
f694eea8d3 [AMDGPU] Some refactoring after D90404. NFC. 2020-11-01 13:18:53 +05:30
Piotr Sobczak
e82875aa15 [AMDGPU] Fix expansion of i16 MULH
This commit marks i16 MULH as expand in AMDGPU backend,
which is necessary after the refactoring in D80485.

Differential Revision: https://reviews.llvm.org/D89965
2020-10-22 17:05:06 +02:00
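
A standalone sketch of what "expand" amounts to for i16 MULH: widen to 32 bits, multiply, take the upper half. mulhs16 and mulhu16 are illustrative names; the legalizer builds the equivalent DAG nodes rather than calling anything like this:

    #include <cassert>
    #include <cstdint>

    // The full 32-bit product of two 16-bit values is exact, so the "high
    // half" of an i16 multiply is just bits [31:16] of the widened product.
    int16_t mulhs16(int16_t a, int16_t b) {
      int32_t prod = (int32_t)a * (int32_t)b;
      return (int16_t)(prod >> 16);
    }

    uint16_t mulhu16(uint16_t a, uint16_t b) {
      uint32_t prod = (uint32_t)a * (uint32_t)b;
      return (uint16_t)(prod >> 16);
    }

    int main() {
      assert(mulhu16(0xFFFF, 0xFFFF) == 0xFFFE); // 65535 * 65535 = 0xFFFE0001
      assert(mulhs16(-32768, -32768) == 0x4000); // 2^30 >> 16
      return 0;
    }
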
Jay Foad
a9d0d8c70c [AMDGPU] Remove MUL_LOHI_U24/MUL_LOHI_I24
These were introduced in r279902 on the grounds that using separate
MUL_U24/MUL_I24 and MULHI_U24/MULHI_I24 nodes would introduce multiple
uses of the operands, which would prevent SimplifyDemandedBits from
simplifying the operands.

This has since been fixed by D24672 "AMDGPU/SI: Use new SimplifyDemandedBits helper for multi-use operations"

No functional change intended. At least it has no effect on lit tests.

Differential Revision: https://reviews.llvm.org/D89706
2020-10-19 19:15:34 +01:00
Jay Foad
850a1db38c [AMDGPU] Add new llvm.amdgcn.fma.legacy intrinsic
Differential Revision: https://reviews.llvm.org/D89558
2020-10-16 17:10:21 +01:00
David Sherwood
272d0be3ed [SVE][NFC] Replace some TypeSize comparisons in non-AArch64 Targets
In most of lib/Target we know that we are not dealing with scalable
types so it's perfectly fine to replace TypeSize comparison operators
with their fixed width equivalents, making use of getFixedSize()
and so on.

Differential Revision: https://reviews.llvm.org/D89101
2020-10-15 09:01:21 +01:00
Jay Foad
f4f70547af [AMDGPU] Reformat AMDGPUTargetLowering::isSDNodeAlwaysUniform. NFC. 2020-09-28 16:24:16 +01:00
Craig Topper
ad2bea0363 [SelectionDAG] Use Align/MaybeAlign in calls to getLoad/getStore/getExtLoad/getTruncStore.
The versions that take 'unsigned' will be removed in the future.

I tried to use getOriginalAlign instead of getAlign in some
places. getAlign factors in the minimum alignment implied by
the offset in the pointer info. Since we're also passing the
pointer info we can use the original alignment.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D87592
2020-09-14 13:54:50 -07:00
Ruiling Song
9b5efb57d0 [AMDGPU] Fix crash when dag-combining bitcast
The code after the 'break' processes 64-bit scalar and vector bitcasts, so I
think the break condition should be (cond1 || cond2). This means we only
execute the following code if (64-bit and dest-is-vector).

Also remove a previous fix which is not needed with this new fix.
(introduced in: 1349a04ef5f594dda705ec80474dda4837f26dba)

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D85804
2020-08-13 10:23:13 +08:00
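
The corrected guard above, spelled out with placeholder names (shouldProcess, is64Bit and destIsVector stand in for the commit's cond1/cond2); this is just the De Morgan step made explicit, not code from the patch:

    #include <cassert>
    #include <initializer_list>

    // Breaking when either condition fails is the same as running the
    // 64-bit-vector path only when both conditions hold.
    bool shouldProcess(bool is64Bit, bool destIsVector) {
      if (!is64Bit || !destIsVector) // the "(cond1 || cond2)" break condition
        return false;                // i.e. break
      return true;                   // fall through to the 64-bit vector path
    }

    int main() {
      for (bool a : {false, true})
        for (bool b : {false, true})
          assert(shouldProcess(a, b) == (a && b));
      return 0;
    }
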
Kerry McLaughlin
76e22108d4 [CodeGen] Refactor getMemBasePlusOffset & getObjectPtrOffset to accept a TypeSize
Changes the Offset arguments to both functions from int64_t to TypeSize
& updates all uses of the functions to create the offset using TypeSize::Fixed()

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85220
2020-08-11 12:17:10 +01:00
Matt Arsenault
5d9720d33a AMDGPU: Remove ATOMIC_PK_FADD
The f32 and v2f16 cases should be handled the same way.
2020-08-05 22:00:52 -04:00
Matt Arsenault
933fc1e514 AMDGPU: Eliminate BUFFER_ATOMIC_PK_ADD_F16 node
This is redundant with the other no return buffer atomic node, and we
don't really need a separate type profile for it.
2020-08-05 15:16:51 -04:00
Jay Foad
b212a3e9c6 [AMDGPU] Propagate fast math flags in frem lowering
Differential Revision: https://reviews.llvm.org/D84518
2020-08-05 09:09:38 +01:00
Jay Foad
f5c2a38331 [AMDGPU] Lower frem f16
Without this it would fail to select on subtargets that have 16-bit
instructions.

Differential Revision: https://reviews.llvm.org/D84517
2020-08-05 09:08:40 +01:00
Jay Foad
c5df629d48 [AMDGPU] Use fma for lowering frem
This gives shorter f64 code and perhaps better accuracy.

Differential Revision: https://reviews.llvm.org/D84516
2020-08-04 16:18:23 +01:00
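
In scalar terms the expansion being tuned here is frem(x, y) = x - trunc(x / y) * y; using fma folds the multiply and subtract into one rounding step. A standalone sketch of that identity (frem_mul and frem_fma are illustrative names, and the exact DAG sequence the backend emits is assumed, not quoted):

    #include <cmath>
    #include <cstdio>

    // Two roundings: the multiply, then the subtract.
    double frem_mul(double x, double y) {
      double q = std::trunc(x / y);
      return x - q * y;
    }

    // One rounding for the combined multiply-add -- the "perhaps better
    // accuracy" mentioned above.
    double frem_fma(double x, double y) {
      double q = std::trunc(x / y);
      return std::fma(-q, y, x);
    }

    int main() {
      double x = 1e16 + 3.0, y = 7.0;
      std::printf("mul+sub: %.17g\n", frem_mul(x, y));
      std::printf("fma:     %.17g\n", frem_fma(x, y));
      std::printf("fmod:    %.17g\n", std::fmod(x, y));
      return 0;
    }
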
Cameron McInally
60134c9850 [FPEnv] Don't transform FSUB(-0,X)->FNEG(X) in SelectionDAGBuilder.
This patch stops unconditionally transforming FSUB(-0,X) into an FNEG(X) while building the DAG. There is also one small change to handle the new FSUB(-0,X) similarly to FNEG(X) in the AMDGPU backend.

Differential Revision: https://reviews.llvm.org/D84056
2020-08-03 10:22:25 -05:00
Matt Arsenault
2a62cae903 AMDGPU/GlobalISel: Select llvm.amdgcn.global.atomic.csub
Remove the custom node boilerplate. Not sure why this tried to handle
the LDS atomic stuff.
2020-07-29 08:27:31 -04:00
Sebastian Neubauer
30060e6be7 [AMDGPU] Fix typo. NFC 2020-07-23 17:01:12 +02:00
Petar Avramovic
1bb753a062 AMDGPU: Simplify f16 to i64 custom lowering
The range that f16 can represent fits into i32.
Lower as f16->i32->i64 instead of f16->f32->i64,
since f32->i64 has a long expansion.

Differential Revision: https://reviews.llvm.org/D84166
2020-07-22 10:32:14 +02:00
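
The reasoning above in standalone form: every finite f16 value has magnitude at most 65504, so its integer conversion always fits in i32, and widening that i32 to i64 is exact. toI64ViaI32 is an illustrative name and float stands in for f16 here; the backend does this on DAG nodes:

    #include <cassert>
    #include <cstdint>

    // |h| <= 65504 < 2^31, so f16 -> i32 is never out of range, and the
    // final i32 -> i64 step is a plain sign extension.
    int64_t toI64ViaI32(float h /* assumed to hold an f16-representable value */) {
      int32_t narrow = (int32_t)h;
      return (int64_t)narrow;
    }

    int main() {
      const float kMaxF16 = 65504.0f;
      assert(toI64ViaI32(kMaxF16) == 65504);
      assert(toI64ViaI32(-kMaxF16) == -65504);
      assert(toI64ViaI32(-1.5f) == (int64_t)(-1.5f)); // matches direct f16->i64
      return 0;
    }
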
Matt Arsenault
26ff5d8529 AMDGPU: Start interpreting byref on kernel arguments
These are treated identically to value aggregates placed in the kernel
argument list. A %struct.foo or %struct.foo addrspace(4)*
byref(sizeof(%struct.foo)) align(alignof(%struct.foo)) argument should
produce the same offsets and argument metadata.

This handles all 3 kernel ABI implementations, and the two HSA
metadata emission paths.
2020-07-21 18:11:22 -04:00
Jay Foad
f4ddb9229d [AMDGPU] Fix typos in performCtlz_CttzCombine()
Fix two obvious errors in the code and also update the test check.
Also add one test to catch the failure.

Patch by Ruiling Song!

Differential Revision: https://reviews.llvm.org/D83280
2020-07-14 10:18:18 +01:00
Jay Foad
875c9d1f56 [AMDGPU] Fix and simplify AMDGPUTargetLowering::LowerUDIVREM
Use the algorithm from AMDGPUCodeGenPrepare::expandDivRem32.

Differential Revision: https://reviews.llvm.org/D83382
2020-07-08 19:14:49 +01:00
Guillaume Chatelet
077a3432f3 [Alignment][NFC] Migrate AMDGPU backend to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82743
2020-06-29 11:56:06 +00:00
Eli Friedman
9d315e1c2b Remove GlobalValue::getAlignment().
This function is deceptive at best: it doesn't return what you'd expect.
If you have an arbitrary GlobalValue and you want to determine the
alignment of that pointer, Value::getPointerAlignment() returns the
correct value.  If you want the actual declared alignment of a function
or variable, GlobalObject::getAlignment() returns that.

This patch switches all the users of GlobalValue::getAlignment to an
appropriate alternative.

Differential Revision: https://reviews.llvm.org/D80368
2020-06-23 19:13:42 -07:00
Eli Friedman
bd863e1c80 [AArch64][SVE] Add legalization support for i32/i64 vector srem/urem
Implement them on top of sdiv/udiv, similar to what we do for integer
types.

Potential future work: implementing i8/i16 srem/urem, optimizations for
constant divisors, optimizing the mul+sub to mls.

Differential Revision: https://reviews.llvm.org/D81511
2020-06-23 16:27:52 -07:00
Matt Arsenault
59adbf12f9 AMDGPU: Remove intermediate DAG node for trig_preop intrinsic
We weren't doing anything with this, and keeping it would just add
more boilerplate for GlobalISel.
2020-06-16 21:06:25 -04:00
Stanislav Mekhanoshin
9363e58d6d [AMDGPU] Add gfx1030 target
Differential Revision: https://reviews.llvm.org/D81886
2020-06-15 16:18:05 -07:00
Matt Arsenault
510fa460fb AMDGPU: Stop using getSelectCC in division lowering
This was promoting booleans to i32 to perform a comparison against
them to feed to a select condition. Just use the booleans
directly. This produces the same final code, since the combiner is
unable to undo the mess this creates. I untangled this logic when I
ported this code to GlobalISel, so port the cleanups back.
2020-06-10 13:56:53 -04:00
Jay Foad
774767acdf [AMDGPU] Better use of llvm::numbers
Tweak a few constant expressions involving numbers::pi etc to avoid
rounding errors. NFCI though it's possible some of these will now be
more accurate in the last bit.
2020-05-29 09:55:36 +01:00
Matt Arsenault
767eefb33b AMDGPU: Update store node checks for atomics
Prepare to switch to using StoreSDNode for atomic stores.
2020-05-26 15:20:03 -04:00
QingShan Zhang
b8ab9c6736 [DAGCombine] Remove the getNegatibleCost to avoid the out of sync with getNegatedExpression
We have getNegatibleCost/getNegatedExpression to evaluate the cost and negate the expression.
However, while negating the expression the cost might change as we are changing the DAG,
and we then hit the assertion if we negated the wrong expression, as the cost is no longer trustworthy.

This patch removes getNegatibleCost to avoid it getting out of sync with getNegatedExpression,
and instead checks the cost while negating the expression. It also reduces the code duplicated
between getNegatibleCost and getNegatedExpression, and fixes the crash for the test in D76638.

Reviewed By: RKSimon, spatel

Differential Revision: https://reviews.llvm.org/D77319
2020-05-20 02:12:16 +00:00
Stanislav Mekhanoshin
77e3cbb0fa [AMDGPU] Fixed selection error for 64 bit extract_subvector
Differential Revision: https://reviews.llvm.org/D80155
2020-05-18 14:17:59 -07:00
Stanislav Mekhanoshin
a8a78bdad8 [AMDGPU] Make v16f64/v16i64 legal
This allows indirect VGPR addressing to work.

Differential Revision: https://reviews.llvm.org/D79960
2020-05-14 14:46:55 -07:00