6103 Commits

Matt Arsenault
20d89b9242 GlobalISel: Use LLT in call lowering callbacks
This preserves the memory type so the lowerings can rely on it.
2021-07-01 12:15:54 -04:00
Stanislav Mekhanoshin
bb5a67dc29 [AMDGPU] Fix immediate sign during V_MOV_B64_PSEUDO expansion
Creating a V_MOV_B32 with a zero-extended immediate source
prevented conversion to V_BFREV_B32.

Differential Revision: https://reviews.llvm.org/D105235
2021-07-01 09:00:29 -07:00
Matt Arsenault
d665475981 GlobalISel: Use LLT in memory legality queries
This enables proper lowering of non-byte-sized loads. We still aren't
faithfully preserving memory types everywhere, so the legality checks
still only consider the size.
2021-06-30 17:44:13 -04:00
Jon Roelofs
b3511ee3cf [GISel] Support llvm.memcpy.inline
Differential revision: https://reviews.llvm.org/D105072
2021-06-30 12:39:05 -07:00
Stanislav Mekhanoshin
4b0ec23e84 [AMDGPU] Add S_MOV_B64_IMM_PSEUDO for wide constants
This allows 64-bit constant rematerialization. If a constant
is split into two separate moves initializing sub0 and sub1, as it is
now, RA cannot rematerialize a 64-bit register.

This gives a 10-20% uplift in a set of huge apps that heavily use
double-precision math.

Fixes: SWDEV-292645

Differential Revision: https://reviews.llvm.org/D104874
2021-06-30 11:45:38 -07:00
alex-t
a236991319 [AMDGPU] PHI node cost should not be counted for the size and latency.
Details: https://reviews.llvm.org/D96805 changed GCNTTIImpl::getCFInstrCost to return 1 for PHI nodes
  for TTI::TCK_CodeSize and TTI::TCK_SizeAndLatency. This is incorrect because the value moves that result
  from PHI lowering are inserted into the basic block's predecessors, not into the block itself.
  As a result of this change, LoopRotate and LoopUnroll were broken by the incorrect loop header and loop
  body size/cost estimation.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D105104
2021-06-30 16:11:17 +03:00
madhur13490
dc2f2451d3 [AMDGPU] Simplify getReservedNumSGPRs
This is a follow-up patch to D103636, where checking
amdgpu-calls and amdgpu-stack-objects seemed
unnecessary. Removing these checks didn't functionally
regress any tests.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D104513
2021-06-30 16:19:39 +05:30
Tony Tye
47ca9dba51 [AMDGPU] Update gfx90a memory model support
Update the AMDGPU gfx90a memory model to make coarse-grained memory allocations
consistent when fine-grained, system-scope atomic acquire and release operations
are performed.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D105137
2021-06-30 04:05:22 +00:00
Piotr Sobczak
8db71b419f [AMDGPU] Fix 224-bit spills
Related to D104622.

Differential Revision: https://reviews.llvm.org/D105109
2021-06-29 17:52:16 +02:00
Jay Foad
b73e1f6d9f [AMDGPU] Use opName instead of PseudoName in VOP2 multiclasses. NFC.
This is just for consistency with all other instruction multiclasses
that pass around pseudo names as arguments.
2021-06-28 16:46:35 +01:00
Brendon Cahoon
16a6bb6581 [AMDGPU][GlobalISel] Legalize and select G_SBFX and G_UBFX
Adds legalizer, register bank select, and instruction
select support for G_SBFX and G_UBFX. These opcodes generate
scalar or vector ALU bitfield extract instructions for
AMDGPU. The instructions accept either constant or register
values for the offset and width operands.

The 32-bit scalar version is expanded to a sequence that
combines the offset and width into a single register.
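
For illustration, a hedged sketch of that packing, assuming the scalar
bitfield-extract form takes its offset in the low bits of a single source
operand and its width starting at bit 16 (the helper below is hypothetical,
not LLVM code):

  #include <cstdint>

  // Pack offset/width into the single register operand assumed above.
  uint32_t packScalarBfeSrc1(uint32_t Offset, uint32_t Width) {
    return (Offset & 0x3f) | ((Width & 0x7f) << 16);
  }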

There are no 64-bit VGPR bitfield extract instructions, so the
operations are expanded to a sequence of instructions that
implement the operation. If the width is a constant,
then the 32-bit bitfield extract instructions are used.

Moved the AArch64-specific code for creating G_SBFX to
CombinerHelper.cpp so that it can be used by other targets.
Only bitfield extracts with constant offset and width values
are handled currently.

Differential Revision: https://reviews.llvm.org/D100149
2021-06-28 09:06:44 -04:00
Jon Chesterfield
7f9c53b162 Disable ReplaceLDS pass, patch up tests to match
Most tests passed with an extra argument to explicitly enable the pass.
One does not, deleted it as part of this change. I can't see why the codegen
would be different between default on and default off but switched on. It
can be retrieved from the project history.

This would be a revert, but git revert was not clean. Disabling the pass
and leaving it in tree is less likely to cause breakage elsewhere than
patching up the git revert conflicts on unfamiliar code. It'll be landed
without review, as @hsmhsm is believed unavailable at present.

Differential Revision: https://reviews.llvm.org/D104962
2021-06-26 01:36:42 +01:00
Jay Foad
636a02dd3d [AMDGPU] Removed unused Predicate HasOffset3fBug. NFC.
The predicate definition didn't make sense anyway because it was defined
as being the opposite of what the name suggests.
2021-06-25 16:58:44 +01:00
Sander de Smalen
5eab663b62 [GlobalISel] NFC: Change LLT::changeNumElements to LLT::changeElementCount.
Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D104453
2021-06-25 15:54:00 +01:00
Sander de Smalen
88c55d538f [GlobalISel] NFC: Change LLT::scalarOrVector to take ElementCount.
Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D104452
2021-06-25 11:26:16 +01:00
Martin Storsjö
9d14adb9f6 [llvm] Rename StringRef _lower() method calls to _insensitive()
This is a mechanical change. It also renames the
similarly named methods in the SmallString class; however, these
methods don't seem to be used outside of the llvm subproject, so
this doesn't break building the rest of the monorepo.
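
For illustration, a minimal sketch of what call sites look like after the
rename (the surrounding function is hypothetical):

  #include "llvm/ADT/StringRef.h"
  using namespace llvm;

  // equals_lower -> equals_insensitive, startswith_lower -> startswith_insensitive.
  static bool isHelpFlag(StringRef Arg) {
    return Arg.equals_insensitive("-help") || Arg.startswith_insensitive("--help");
  }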
2021-06-25 00:22:01 +03:00
Aakanksha Patil
d4359ff02a [AMDGPU] Add gfx1035 target
Differential Revision: https://reviews.llvm.org/D104804
2021-06-24 14:32:41 -04:00
Sander de Smalen
ac11cfc716 [GlobalISel] NFC: Change LLT::vector to take ElementCount.
This also adds new interfaces for the fixed- and scalable case:
* LLT::fixed_vector
* LLT::scalable_vector

The strategy for migrating to the new interfaces was as follows:
* If the new LLT is a (modified) clone of another LLT, taking the
  same number of elements, then use LLT::vector(OtherTy.getElementCount()),
  or, if the number of elements is halved/doubled, use .divideCoefficientBy(2)
  or operator*. That is because there is no reason to specifically restrict
  the types to 'fixed_vector'.
* If the algorithm works on the number of elements (as unsigned), then
  just use fixed_vector. This will need to be fixed up in the future when
  modifying the algorithm to also work for scalable vectors, and will then
  need additional tests to confirm the behaviour works the same for
  scalable vectors.
* If the test used the `/*Scalable=*/true` flag of LLT::vector, then
  this is replaced by LLT::scalable_vector.
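
As a quick illustration of the new interfaces (a sketch; the function and
values are made up):

  #include "llvm/Support/LowLevelTypeImpl.h"
  using namespace llvm;

  static void lltExamples() {
    LLT S32 = LLT::scalar(32);
    LLT V4S32 = LLT::fixed_vector(4, S32);       // <4 x s32>
    LLT NxV4S32 = LLT::scalable_vector(4, S32);  // <vscale x 4 x s32>
    // Cloning another type while keeping its element count, per the first
    // bullet above:
    LLT V4S16 = LLT::vector(V4S32.getElementCount(), LLT::scalar(16));
    (void)NxV4S32;
    (void)V4S16;
  }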

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D104451
2021-06-24 11:26:12 +01:00
Carl Ritson
9a5c628361 [AMDGPU] Add 224-bit vector types and link 192-bit types to MVTs
Add SReg_224, VReg_224, AReg_224, etc.
Link 224-bit types with v7i32/v7f32.
Link existing 192-bit types to newly added v3i64/v3f64/v6i32/v6f32.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D104622
2021-06-24 12:41:22 +09:00
Jon Chesterfield
358e2ca63a Revert "[AMDGPU] [IndirectCalls] Don't propagate attributes to address taken functions and their callees"
This reverts commit 6a3beb1f68d6791a4cd0190f68b48510f754a00a.
Test case that triggers an infinite loop before the revert is at
the review for D103138.
2021-06-24 02:33:50 +01:00
Stanislav Mekhanoshin
bbca3107f3 [AMDGPU] Check for pointer operand while refining LDS align
Also skips the propagation if alignment is 1.

Differential Revision: https://reviews.llvm.org/D104796
2021-06-23 12:27:55 -07:00
Jay Foad
362fb7c790 [AMDGPU] Remove unused multiclass MUBUF_Real_gfx10_with_name 2021-06-23 14:37:28 +01:00
Jay Foad
4509098899 [AMDGPU] Stop using LegacyLegalizerInfo. NFCI.
Differential Revision: https://reviews.llvm.org/D103684
2021-06-23 10:50:32 +01:00
Jay Foad
72abd6c7b3 [AMDGPU] Simplify collectReachableCallees. NFCI.
Don't use SCC iterators when we're only interested in reachability.
Use df_begin/df_end inline to find reachable nodes.
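
A minimal sketch of that kind of walk, assuming the standard CallGraph and
depth-first iterator machinery (the function itself is illustrative):

  #include "llvm/ADT/DepthFirstIterator.h"
  #include "llvm/Analysis/CallGraph.h"
  using namespace llvm;

  // Visit every call graph node reachable from Root without computing SCCs.
  static void visitReachable(CallGraphNode *Root) {
    for (CallGraphNode *N : depth_first(Root)) {
      // ... collect or process the reachable callee here ...
      (void)N;
    }
  }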

Differential Revision: https://reviews.llvm.org/D104704
2021-06-23 09:11:29 +01:00
Stanislav Mekhanoshin
5459f63eb4 [AMDGPU] Propagate LDS align into instructions
Differential Revision: https://reviews.llvm.org/D104316
2021-06-23 00:57:16 -07:00
Matt Arsenault
248de533d0 AMDGPU: Try to eliminate clearing of high bits of 16-bit instructions
These bits used to be consistently zeroed pre-gfx9, but gfx9 made the
situation complicated since now some instructions still zero them and some don't. This
also manages to pick up a few cases that the pattern fails to optimize
away.

We handle some cases with instruction patterns, but some get
through. In particular this improves the integer cases.
2021-06-22 13:42:49 -04:00
Matt Arsenault
daaa1d32d1 AMDGPU: Fix high 16-bit optimization on gfx9
We can do this optimization in the majority of cases, but we currently
don't have a way to do it: we do not track/model which instructions
have which behavior, the control bit that changes the high-bit behavior,
or the use of preserved bits at all. This is a bit fuzzy since we
don't know precisely how the source instruction will be lowered, but
that only really matters in one case (for fma_mixlo).

We do need to fix up some of these cases after selection, but the
pattern helps eliminate many of these zexts.
2021-06-22 13:16:45 -04:00
Stanislav Mekhanoshin
cd7b7932a8 [AMDGPU] Use performOptimizedStructLayout for LDS sort
This gives better packing.

Differential Revision: https://reviews.llvm.org/D104331
2021-06-22 09:58:10 -07:00
Matt Arsenault
50b757aa13 AMDGPU: Move zeroed FP high bits optimization to patterns 2021-06-22 12:47:56 -04:00
Eli Friedman
0c81356419 Rename MachineMemOperand::getOrdering -> getSuccessOrdering.
Since this method can apply to cmpxchg operations, make sure it's clear
what value we're actually retrieving.  This will help ensure we don't
accidentally ignore the failure ordering of cmpxchg in the future.

We could potentially introduce a getOrdering() method on AtomicSDNode
that asserts the operation isn't cmpxchg, but not sure that's
worthwhile.
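
A small sketch of the distinction this makes explicit (the helper is made up;
the accessors are the renamed MachineMemOperand ones):

  #include "llvm/CodeGen/MachineMemOperand.h"
  #include "llvm/Support/AtomicOrdering.h"
  using namespace llvm;

  // For a cmpxchg the success and failure orderings can differ, so the
  // caller now has to say which one it means.
  static bool needsAcquireSemantics(const MachineMemOperand &MMO) {
    return isAtLeastOrStrongerThan(MMO.getSuccessOrdering(),
                                   AtomicOrdering::Acquire) ||
           isAtLeastOrStrongerThan(MMO.getFailureOrdering(),
                                   AtomicOrdering::Acquire);
  }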

Differential Revision: https://reviews.llvm.org/D103338
2021-06-21 16:49:27 -07:00
Jordan Rupprecht
79bdbc37ef [NFC] Wrap entire assert-only block in LLVM_DEBUG 2021-06-21 04:01:27 -07:00
Sebastian Neubauer
95f90b846b [AMDGPU] Fix linking with shared libraries
AMDGPULDSUtils depends on llvm::CallGraph.
2021-06-21 11:11:13 +02:00
Ruiling Song
02847853b4 [AMDGPU] Add Optimize VGPR LiveRange Pass.
This pass aims to optimize VGPR live ranges in a typical divergent if-else
control flow. For example:

def(a)
if(cond)
  use(a)
  ... // A
else
  use(a)

As AMDGPU accesses VGPRs with respect to the active mask, we can mark `a` as
dead in region A. For details, please refer to the comments in the
implementation file.

The pass is enabled by default; the frontend can disable it through
"-amdgpu-opt-vgpr-liverange=false".

Differential Revision: https://reviews.llvm.org/D102212
2021-06-21 15:25:55 +08:00
hsmahesha
37c462f96a [AMDGPU] Replace non-kernel function uses of LDS globals by pointers.
The main motivation behind pointer replacement of LDS uses within non-kernel
functions is to *prevent* the subsequent LDS lowering pass from directly packing
LDS (assume a large LDS) into a struct type, which would otherwise allocate
huge memory for a struct instance within every kernel.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D103225
2021-06-21 11:51:49 +05:30
Fangrui Song
74f848b886 Fix -Wunused-variable and -Wunused-but-set-variable in -DLLVM_ENABLE_ASSERTIONS=off build. NFC 2021-06-20 11:09:07 -07:00
Michael Liao
be5f17eb4b [amdgpu] Improve the conversion from f32 to i64.
- Follow the same principle as the conversion from f64 to i64, with the extra
  necessary pre- and post-processing. This helps reduce the conversion
  sequence by half compared to the legacy one.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D104427
2021-06-19 12:46:48 -04:00
Matt Arsenault
6c04830c65 AMDGPU: Fix infinite loop in DAG combine with fneg + fma
We were not reporting isFNegFree for v2f32, although it is effectively
free after legalization. The generic combine was pulling fneg out of
the fma source operands, and the AMDGPU combine was doing the
opposite.
2021-06-18 19:09:03 -04:00
Matt Arsenault
bf6259aa0b AMDGPU: Fix assert on m0_lo16/m0_hi16
These get added (redundantly) to the bundle expanded for indirect
register accesses. We hit this path only when there is a call in the
function.
2021-06-18 18:48:53 -04:00
Anshil Gandhi
7f0aeb0ac8 [AMDGPU] [CodeGen] Fold negate llvm.amdgcn.class into test mask
Implemented the transformation of xor (llvm.amdgcn.class x, mask), -1 into
llvm.amdgcn.class(x, ~mask). Added LIT tests as well.
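
A minimal sketch of the fold at the IR-builder level (not the in-tree
implementation; the helper and insertion context are assumed):

  #include "llvm/IR/IRBuilder.h"
  #include "llvm/IR/IntrinsicsAMDGPU.h"
  using namespace llvm;

  // Rebuild xor(llvm.amdgcn.class(X, Mask), -1) as llvm.amdgcn.class(X, ~Mask).
  static Value *buildNegatedClass(IRBuilder<> &B, Value *X, Value *Mask) {
    Module *M = B.GetInsertBlock()->getModule();
    Function *ClassFn =
        Intrinsic::getDeclaration(M, Intrinsic::amdgcn_class, {X->getType()});
    return B.CreateCall(ClassFn, {X, B.CreateNot(Mask)});
  }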

Differential Revision: https://reviews.llvm.org/D104049
2021-06-18 13:04:12 -06:00
Carl Ritson
b219ce9cea [AMDGPU] Remove duplicate setOperationAction for v4i16/v4f16 (NFC) 2021-06-18 12:38:54 +09:00
Jay Foad
53635563ca [AMDGPU] Set VOP3P flag on Real instructions
This does not affect codegen but might benefit llvm-mca.
2021-06-16 15:00:45 +01:00
Jay Foad
7c0accc327 [AMDGPU] Set SALU, VALU and other instruction type flags on Real instructions
This does not affect codegen but might benefit llvm-mca.
2021-06-16 13:36:02 +01:00
Jay Foad
ccc05a7e3b [AMDGPU] Set IsAtomicRet and IsAtomicNoRet on Real instructions
This does not affect codegen but might benefit llvm-mca.
2021-06-16 12:23:29 +01:00
Jay Foad
cdf3a5cb61 [AMDGPU] Set mayLoad and mayStore on Real instructions
This does not affect codegen but might benefit llvm-mca.
2021-06-16 12:10:23 +01:00
Jay Foad
88474bed70 [AMDGPU] Set more flags on Real instructions
This does not affect codegen, which only tests these flags on Pseudo
instructions, but might help llvm-mca which has to work with Real
instructions. In particular setting LGKM_CNT on DS instructions helps
with the problem identified in D104149.

Differential Revision: https://reviews.llvm.org/D104293
2021-06-16 09:58:50 +01:00
Jay Foad
965d5fb99e [AMDGPU] Use defvar in SOPInstructions.td. NFC.
Factor out repeated !cast<SOP*_Pseudo>(NAME) into a new "defvar ps",
just to improve readability and maintainability.

Differential Revision: https://reviews.llvm.org/D104306
2021-06-16 09:16:45 +01:00
Piotr Sobczak
c63f0c139e [AMDGPU] Limit runs of fixLdsBranchVmemWARHazard
The code in fixLdsBranchVmemWARHazard looks for patterns of a vmem/lds
access followed by a branch, followed by an lds/vmem access.

Handling the hazard requires processing an arbitrary number of
instructions. In the worst case, where a function has a vmem access but no lds
accesses, all instructions are examined only to conclude that the hazard
cannot occur.

Add a pre-processing stage which detects whether both lds and vmem accesses
are present in the function and only then does the more costly search.
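
A hedged sketch of such a pre-scan (predicates and structure assumed; the
actual in-tree check is shouldRunLdsBranchVmemWARHazardFixup mentioned below):

  #include "llvm/CodeGen/MachineFunction.h"
  // Target-local header assumed available in the AMDGPU backend.
  #include "SIInstrInfo.h"
  using namespace llvm;

  // The expensive hazard walk is only worthwhile when the function contains
  // both LDS (DS) and VMEM accesses.
  static bool mayHaveLdsBranchVmemWARHazard(const MachineFunction &MF) {
    bool HasLds = false, HasVmem = false;
    for (const MachineBasicBlock &MBB : MF)
      for (const MachineInstr &MI : MBB) {
        HasLds |= SIInstrInfo::isDS(MI);
        HasVmem |= SIInstrInfo::isVMEM(MI);
        if (HasLds && HasVmem)
          return true;
      }
    return false;
  }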

This patch significantly improves compilation time in the cases where the
hazard cannot happen. In one pathological case I looked at, IsHazardInst is
needlessly called 88.6 million times.

The numbers could also be improved by introducing a map around the
inner calls to ::getWaitStatesSince in fixLdsBranchVmemWARHazard, but
nothing will beat not running fixLdsBranchVmemWARHazard at all in the cases
detected by shouldRunLdsBranchVmemWARHazardFixup().

Differential Revision: https://reviews.llvm.org/D104219
2021-06-14 22:30:23 +02:00
madhur13490
1d0f3b309b [AMDGPU][IndirectCalls] Fix register usage propagation for indirect/external calls
This patch computes the maximum SGPRs and VGPRs used by a module
in the presence of indirect calls and makes that
the register requirement for functions/kernels
which make indirect calls.

This patch also refactors code in AMDGPUSubTarget.cpp,
adding "base" variants of getMaxNumSGPRs which
are used by the MachineFunction version and the new Function version.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D103636
2021-06-12 11:59:34 +05:30
Matt Arsenault
ddaa8ec259 AMDGPU/GlobalISel: Remove leftover hack for argument memory sizes
Since the call lowering code now tries to respect the tablegen
reported argument types, this is no longer necessary.
2021-06-11 13:45:25 -04:00
Matt Arsenault
8ecbf4dea3 AMDGPU/GlobalISel: Fix indentation 2021-06-11 13:45:25 -04:00