Summary:
The lowering of PHI nodes used to detect if all inputs originated
from IMPLICIT_DEF's. If so the PHI node was replaced by an
IMPLICIT_DEF. Now we also consider undef uses when checking the
inputs. So if all inputs are implicitly defined or undef we
lower the PHI to an IMPLICIT_DEF. This makes
PHIElimination::LowerPHINode more consistent as it checks
both implicit and undef properties at later stages.
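The check described above can be sketched as a toy model in Python. This is not the code in PHIElimination::LowerPHINode; the `PhiInput` class and both helper functions are hypothetical names used only for illustration, and the fallback opcode `COPY` is an assumption standing in for the regular lowering.

```python
from dataclasses import dataclass


@dataclass
class PhiInput:
    """One incoming value of a PHI node in this toy model."""
    def_opcode: str          # opcode of the instruction defining the source register
    is_undef: bool = False   # True if the use operand carries the undef flag


def all_inputs_undefined(inputs):
    """Return True if every input is IMPLICIT_DEF-defined or an undef use."""
    return all(i.def_opcode == "IMPLICIT_DEF" or i.is_undef for i in inputs)


def lower_phi(inputs):
    """Pick the opcode the PHI is lowered to in this simplified model."""
    return "IMPLICIT_DEF" if all_inputs_undefined(inputs) else "COPY"
```

With only the old check, a PHI whose inputs mix IMPLICIT_DEF sources and undef uses would not have been recognized; in the sketch both kinds count as undefined.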
Reviewers: MatzeB, tstellar
Reviewed By: MatzeB
Subscribers: jvesely, nhaehnle, llvm-commits
Differential Revision: https://reviews.llvm.org/D52558
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343417 91177308-0d34-0410-b5e6-96231b3b80d8
For the AMDGPU target, if an MBB contains an exec mask restore preamble, SplitEditor may get into a state where it cannot insert a spill instruction.
E.g. for the MIR
bb.100:
%1 = S_OR_SAVEEXEC_B64 %2, implicit-def $exec, implicit-def $scc, implicit $exec
If regalloc then tries to allocate a virtreg to the physreg already assigned to virtreg %1, it has to insert a spill instruction before the S_OR_SAVEEXEC_B64 instruction.
That is not possible, since it can generate code that is incorrect with respect to the exec mask.
This change makes regalloc ignore such physreg candidates.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D52052
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343004 91177308-0d34-0410-b5e6-96231b3b80d8
[AMDGPU] lower-switch in preISel as a workaround for legacy DA
Summary:
The default target of the switch instruction may sometimes be an
"unreachable" block, when it is guaranteed that one of the cases is
always taken. The dominator tree concludes that such a switch
instruction does not have an immediate post dominator. This confuses
divergence analysis, which is unable to propagate sync dependence to
the targets of the switch instruction.
As a workaround, the AMDGPU target now invokes lower-switch as a
preISel pass. LowerSwitch is designed to handle the unreachable
default target correctly, allowing the divergence analysis to locate
the correct immediate post dominator of the now-lowered switch.
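The graph property behind this can be illustrated with a small Python sketch. It is a toy post-dominator computation, not the LLVM dominator tree or LowerSwitch; the block names, the virtual `EXIT` node, and both function names are assumptions made for the example, as is the hand-written "lowered" CFG that no longer branches to the unreachable default.

```python
def post_dominators(succs):
    """Iteratively compute post-dominator sets; 'EXIT' is a virtual exit node."""
    nodes = set(succs) | {"EXIT"}
    # Blocks without successors (returns and unreachable) flow to the virtual exit.
    edges = {n: (list(s) if s else ["EXIT"]) for n, s in succs.items()}
    pdom = {n: set(nodes) for n in nodes}
    pdom["EXIT"] = {"EXIT"}
    changed = True
    while changed:
        changed = False
        for n in nodes - {"EXIT"}:
            new = {n} | set.intersection(*(pdom[s] for s in edges[n]))
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def immediate_post_dominator(succs, block):
    """Return the nearest strict post-dominator, or None if it is the virtual exit."""
    pdom = post_dominators(succs)
    strict = pdom[block] - {block}
    for cand in strict:
        # The nearest one is post-dominated by all other strict post-dominators.
        if pdom[cand] == strict:
            return None if cand == "EXIT" else cand
    return None


# Switch whose default target is an unreachable block.
with_unreachable_default = {
    "switch": ["case_a", "case_b", "default"],
    "case_a": ["merge"], "case_b": ["merge"],
    "default": [],   # unreachable: no successors
    "merge": [],     # returns
}
# Hand-written stand-in for the lowered form: no edge to the default block.
lowered = {
    "switch": ["case_a", "case_b"],
    "case_a": ["merge"], "case_b": ["merge"],
    "merge": [],
}
```

In the first graph the only block that post-dominates `switch` is the virtual exit, so no real block can serve as the join point; in the second, `merge` is found.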
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342956 91177308-0d34-0410-b5e6-96231b3b80d8
The check for assignment of zero is practically useless
while the assignment moves around with different scheduling.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342935 91177308-0d34-0410-b5e6-96231b3b80d8
If the alignment is at least 4, this should report true.
Something still seems off with how < 4-byte types are
handled here though.
Fixing this seems to change how some combines get
to where they get, but somehow isn't changing the net
result.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342879 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The default target of the switch instruction may sometimes be an
"unreachable" block, when it is guaranteed that one of the cases is
always taken. The dominator tree concludes that such a switch
instruction does not have an immediate post dominator. This confuses
divergence analysis, which is unable to propagate sync dependence to
the targets of the switch instruction.
As a workaround, the AMDGPU target now invokes lower-switch as a
preISel pass. LowerSwitch is designed to handle the unreachable
default target correctly, allowing the divergence analysis to locate
the correct immediate post dominator of the now-lowered switch.
Reviewers: arsenm, nhaehnle
Reviewed By: nhaehnle
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits, simoll
Differential Revision: https://reviews.llvm.org/D52221
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342722 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: This change is the first part of the AMDGPU target description
change. Its aim is to split the vector and scalar flows effectively at
the selection stage. Selection uses predicate functions based on the
framework implemented earlier - https://reviews.llvm.org/D35267
Differential revision: https://reviews.llvm.org/D52019
Reviewers: rampitec
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342719 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This is required for GPUs with 16 bit instructions where f16 is a
legal register type and hence int_to_fp i1 to f16 is not lowered
by legalizing.
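For reference, the value semantics of converting an i1 to floating point can be sketched in Python. This models only the arithmetic result, assuming the conversion behaves like a select between two constants; it is not the selection pattern added by the change, and the function names are hypothetical.

```python
import struct


def uint_to_fp_i1(bit):
    """Unsigned i1: true is 1, so the result is 1.0 or 0.0."""
    return 1.0 if bit else 0.0


def sint_to_fp_i1(bit):
    """Signed i1: true is -1, so the result is -1.0 or 0.0."""
    return -1.0 if bit else 0.0


def f16_bits(value):
    """Return the IEEE half-precision bit pattern of a Python float."""
    return struct.unpack("<H", struct.pack("<e", value))[0]
```

The half-precision patterns involved are 0x3C00 for 1.0, 0xBC00 for -1.0 and 0x0000 for 0.0.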
Reviewers: arsenm, nhaehnle
Reviewed By: nhaehnle
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D52018
Change-Id: Ie4c0fd6ced7cf10ad612023c6879724d9ded5851
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342558 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
GFX9 and above support sin/cos instructions with a greater range and thus don't
require a fract instruction prior to invocation.
Added a subtarget feature to reflect this and added code to take advantage of
expanded range on GFX9+.
Also updated the tests to check correct behaviour.
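A rough numeric sketch of the two lowerings, in Python. It assumes the hardware sine takes its input pre-scaled by 1/(2*pi), which is modelled here by `hw_sin`; the `has_extended_range` flag is a hypothetical stand-in for the new subtarget feature, and none of the names come from the backend.

```python
import math

ONE_OVER_2PI = 1.0 / (2.0 * math.pi)


def fract(x):
    """Fractional part, always in [0, 1)."""
    return x - math.floor(x)


def hw_sin(revolutions):
    """Toy stand-in for a hardware sine whose input is measured in revolutions."""
    return math.sin(2.0 * math.pi * revolutions)


def lower_sin(x, has_extended_range):
    """Model of the lowering: scale, optionally range-reduce with fract, then hw sine."""
    scaled = x * ONE_OVER_2PI
    if not has_extended_range:
        # Reduced-range hardware: wrap the argument into one period first.
        scaled = fract(scaled)
    return hw_sin(scaled)
```

Both paths give the same mathematical result; the extended-range path simply saves the fract instruction.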
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D51933
Change-Id: I1c1f1d3726a5ae32116646ca5cfa1ab4ef69e5b0
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342222 91177308-0d34-0410-b5e6-96231b3b80d8
If an argument was passed on the stack, this
was using the default alignment.
I'm not sure there's an observable change from this. It
used to be observable due to bugs in expansion of unaligned
loads and stores, but since that is fixed I don't think
this matters much.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342133 91177308-0d34-0410-b5e6-96231b3b80d8
This was trying to scalarize a scalar FP type,
resulting in an assert.
Fixes unaligned f64 stack stores for AMDGPU.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342132 91177308-0d34-0410-b5e6-96231b3b80d8
Move isa version determination into TargetParser.
Also switch away from target features to CPU string when
determining isa version. This fixes an issue where we
output the wrong isa version in the object code when features
of a particular CPU are altered (e.g. gfx902 w/o xnack
used to result in gfx900).
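The difference between the two approaches can be sketched in Python. This is a toy model, not the TargetParser API: the table holds only the two CPUs named above, both function names are hypothetical, the feature-based function is a guess at the shape of the old behaviour, and the (0, 0, 0) fallback for unknown CPUs is an assumption.

```python
# Toy table with only the two CPUs mentioned in this message.
ISA_VERSION_BY_CPU = {
    "gfx900": (9, 0, 0),
    "gfx902": (9, 0, 2),
}


def isa_version_from_cpu(cpu):
    """New approach: the CPU string alone decides the version."""
    return ISA_VERSION_BY_CPU.get(cpu, (0, 0, 0))


def isa_version_from_features(features):
    """Old-style guess for this model: infer the stepping from enabled features."""
    return (9, 0, 2) if "xnack" in features else (9, 0, 0)
```

With xnack turned off, the feature-based guess reports 9.0.0 even for gfx902, while the CPU-based lookup still reports 9.0.2.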
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342069 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r341982.
The change introduced a layering violation. Reverting to unbreak
our integrate.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342023 91177308-0d34-0410-b5e6-96231b3b80d8
Move isa version determination into TargetParser.
Also switch away from target features to CPU string when
determining isa version. This fixes an issue where we
output the wrong isa version in the object code when features
of a particular CPU are altered (e.g. gfx902 w/o xnack
used to result in gfx900).
Differential Revision: https://reviews.llvm.org/D51890
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341982 91177308-0d34-0410-b5e6-96231b3b80d8
We should never abort on valid IR. The most reasonable
interpretation of an arbitrary address space pointer is
probably some kind of special subset of global memory.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341894 91177308-0d34-0410-b5e6-96231b3b80d8
This will require something to cast. Before this would eliminate
the cast, which would result in copies of $noreg.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341803 91177308-0d34-0410-b5e6-96231b3b80d8
This already worked if only one register piece was used,
but didn't if a type was split into multiple, unequal
sized pieces.
Fixes not splitting v3i16/v3f16 into two registers for
AMDGPU.
This will also allow fixing the ABI for 16-bit vectors
in a future commit so that it's the same for all subtargets.
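As a size-only illustration of unequal pieces, here is a short Python sketch. It assumes 32-bit registers and a greedy split; it does not reproduce the code touched by this change, and the function name is hypothetical.

```python
def split_into_register_pieces(total_bits, register_bits=32):
    """Greedily split a value into register-sized pieces; the last may be smaller."""
    pieces = []
    remaining = total_bits
    while remaining > 0:
        piece = min(register_bits, remaining)
        pieces.append(piece)
        remaining -= piece
    return pieces
```

A 48-bit v3i16 or v3f16 value yields a 32-bit piece followed by a 16-bit piece, i.e. two registers of unequal payload.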
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341801 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This fixes a bug where a large number of implicit def instructions can fill the GCNHazardRecognizer lookahead buffer causing required NOPs to not be inserted.
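The failure mode can be sketched with a bounded history buffer in Python. This is a toy model and not GCNHazardRecognizer: the opcode names other than IMPLICIT_DEF, the `skip_implicit_def` switch, and the idea that the fix amounts to not recording such instructions are all assumptions for illustration.

```python
from collections import deque


def recorded_history(stream, max_lookahead, skip_implicit_def):
    """Return the instructions still visible in a bounded lookahead window."""
    history = deque(maxlen=max_lookahead)
    for opcode in stream:
        if skip_implicit_def and opcode == "IMPLICIT_DEF":
            # Implicit defs emit no machine code, so they should not use a slot.
            continue
        # Newest entry goes to the front; the oldest falls off the back when full.
        history.appendleft(opcode)
    return list(history)


# A hazardous instruction followed by a run of implicit defs.
stream = ["HAZARD_DEF"] + ["IMPLICIT_DEF"] * 5
```

When implicit defs occupy slots, `HAZARD_DEF` is pushed out of a four-entry window and a later check would no longer see it; when they are skipped, it stays visible.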
Reviewers: nhaehnle, arsenm
Reviewed By: arsenm
Subscribers: sheredom, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D51726
Change-Id: Ie75338f94de704ee5816b05afd0c922c6748a95b
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341798 91177308-0d34-0410-b5e6-96231b3b80d8
The intention is to enable the extract_vector_elt load combine,
and doing this for other operations interferes with more
useful optimizations on vectors.
Handle any type of load since in principle we should do the
same combine for the various load intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341219 91177308-0d34-0410-b5e6-96231b3b80d8
In computeRegisterLiveness, the max instructions to search
was counting dbg_value instructions, which could potentially
cause an observable codegen change from the presence of debug
info.
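A minimal Python sketch of why the counting matters. It is not computeRegisterLiveness; instructions are modelled as hypothetical `(opcode, registers)` tuples, and the function name and the `skip_debug` switch are assumptions.

```python
def is_reg_touched_within(instrs, reg, limit, skip_debug=True):
    """Scan up to `limit` instructions for `reg`; return None if the limit is hit."""
    examined = 0
    for opcode, regs in instrs:
        if skip_debug and opcode == "DBG_VALUE":
            # Debug instructions must not consume the search budget.
            continue
        if examined == limit:
            return None  # gave up: result unknown
        examined += 1
        if reg in regs:
            return True
    return False


without_debug = [("ADD", ("r1",))]
with_debug = [("DBG_VALUE", ()), ("DBG_VALUE", ()), ("ADD", ("r1",))]
```

If debug instructions are counted, the scan over `with_debug` gives up before reaching the ADD, so the answer depends on whether debug info is present.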
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341028 91177308-0d34-0410-b5e6-96231b3b80d8
If there is an unused def, this would previously
report that the register was live. Check for uses
first so that it is reported as dead if never used.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341027 91177308-0d34-0410-b5e6-96231b3b80d8
If the end of the block is reached during the scan, check
the live ins of the successors. This was already done in the
other direction if the block entry was reached.
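The forward direction can be sketched in Python as follows. This is a simplified model rather than the actual scan: instructions are hypothetical `(reads, writes)` pairs, the string results stand in for the real query result values, and the function name is an assumption.

```python
def liveness_after(instrs_after, reg, successor_live_ins):
    """Decide whether `reg` is live by scanning forward to the end of the block."""
    for reads, writes in instrs_after:
        if reg in reads:
            return "live"   # read before any redefinition
        if reg in writes:
            return "dead"   # overwritten before being read
    # End of block reached: the register is live if any successor needs it.
    if any(reg in live_ins for live_ins in successor_live_ins):
        return "live"
    return "dead"
```

For example, a register that is untouched in the rest of the block is reported live only when it appears in the live-in set of at least one successor.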
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341026 91177308-0d34-0410-b5e6-96231b3b80d8