Commit Graph

1198 Commits

Author SHA1 Message Date
Matt Arsenault
fadb61df65 AMDGPU: Recompute scc liveness
The various scalar bit operations set SCC,
so one is erased or moved it needs to be recomputed.
Not sure why the existing tests don't fail on this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312819 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-08 18:51:26 +00:00
Matt Arsenault
0bb6355f63 AMDGPU: Start selecting v_mad_mix_f32
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312732 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-07 18:05:07 +00:00
Konstantin Zhuravlyov
3964b8bfc8 AMDGPU: Handle non-temporal loads and stores
Differential Revision: https://reviews.llvm.org/D36862


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312729 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-07 17:14:54 +00:00
Matt Arsenault
ca22b05483 AMDGPU: Don't legalize i16 extloads to i32 with legal i16
Keeping non-i16 extloads makes it easier to match some new
gfx9 load instructions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312699 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-07 05:37:34 +00:00
Stanislav Mekhanoshin
6148c30603 [AMDGPU] Use v_pk_max_f16 for fcanonicalize
Differential Revision: https://reviews.llvm.org/D37325

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312676 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-06 22:27:29 +00:00
Stanislav Mekhanoshin
953b70393a [AMDGPU] Fixed encoding of v_pk_mul_f16 in fcanonicalize
Differential Revision: https://reviews.llvm.org/D37522

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312660 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-06 18:29:51 +00:00
Stanislav Mekhanoshin
651c4efd77 [AMDGPU] Fix shouldClusterMemOps to process flat loads
Flat loads do not have vdata operand but have vdst instead.

Differential Revision: https://reviews.llvm.org/D37502

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312640 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-06 15:31:30 +00:00
Nicolai Haehnle
adf1cb63f2 AMDGPU: Make worst-case assumption about the wait states in inline assembly
Summary:
Mesa still uses a hack where empty inline assembly is used as a kind of
optimization barrier. This exposed a problem where not enough wait states
were inserted, because the hazard recognizer implicitly assumed that each
inline assembly "instruction" has at least one wait state.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D37205

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312635 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-06 13:50:13 +00:00
Yaxun Liu
1e1d0b01c1 [AMDGPU] Transform __read_pipe_* and __write_pipe_*
When packet size equals packet align and is power of 2, transform
__read_pipe* and __write_pipe* to specialized library function.

Differential Revision: https://reviews.llvm.org/D36831


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312598 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-06 00:30:27 +00:00
Matt Arsenault
4e0c4fb9c1 AMDGPU: Fix not accounting for tail call resource usage
If the only call in a function is a tail call, the
function isn't considered to have a call since it's a
type of return.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312561 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-05 18:36:36 +00:00
Simon Pilgrim
b474446ca8 [AMDGPU] Added extra test checks to make D19325 diff clearer
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312537 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-05 14:32:06 +00:00
Sam McCall
c7c869be7e Revert "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding""
This crashes on boringSSL on PPC (will send reduced testcase)

This reverts commit r312328.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312490 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-04 15:47:00 +00:00
Stanislav Mekhanoshin
99106502b7 [AMDGPU] Testcase for computeKnownBits recursion. NFC.
Testcase for rL312364:
[AMDGPU] Prevent infinite recursion in DAG.computeKnownBits()

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312388 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-01 22:25:22 +00:00
Nicolai Haehnle
96b6414540 AMDGPU: IMPLICIT_DEFs and DBG_VALUEs do not contribute to wait states
Summary:
This fixes a bug that was exposed on gfx9 in various
GL45-CTS.shaders.loops.*_iterations.select_iteration_count_fragment tests,
e.g. GL45-CTS.shaders.loops.do_while_uniform_iterations.select_iteration_count_fragment

Reviewers: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D36193

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312337 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-01 16:56:32 +00:00
Geoff Berry
d168a77ec3 Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"
Issues addressed since original review:
- Moved removal of dead instructions found by
  LiveIntervals::shrinkToUses() outside of loop iterating over
  instructions to avoid instructions being deleted while pointed to by
  iterator.
- Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
- The pass no longer forwards COPYs to physical register uses, since
  doing so can break code that implicitly relies on the physical
  register number of the use.
- The pass no longer forwards COPYs to undef uses, since doing so
  can break the machine verifier by creating LiveRanges that don't
  end on a use (since the undef operand is not considered a use).

  [MachineCopyPropagation] Extend pass to do COPY source forwarding

  This change extends MachineCopyPropagation to do COPY source forwarding.

  This change also extends the MachineCopyPropagation pass to be able to
  be run during register allocation, after physical registers have been
  assigned, but before the virtual registers have been re-written, which
  allows it to remove virtual register COPY LiveIntervals that become dead
  through the forwarding of all of their uses.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312328 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-01 14:27:20 +00:00
Matt Arsenault
fcd77e8a04 AMDGPU: Fold clamp modifier for packed instructions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312297 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-31 23:53:50 +00:00
Hans Wennborg
92b6b153a4 Revert r312154 "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding""
It caused PR34387: Assertion failed: (RegNo < NumRegs && "Attempting to access record for invalid register number!")

> Issues identified by buildbots addressed since original review:
> - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
> - The pass no longer forwards COPYs to physical register uses, since
>   doing so can break code that implicitly relies on the physical
>   register number of the use.
> - The pass no longer forwards COPYs to undef uses, since doing so
>   can break the machine verifier by creating LiveRanges that don't
>   end on a use (since the undef operand is not considered a use).
>
>   [MachineCopyPropagation] Extend pass to do COPY source forwarding
>
>   This change extends MachineCopyPropagation to do COPY source forwarding.
>
>   This change also extends the MachineCopyPropagation pass to be able to
>   be run during register allocation, after physical registers have been
>   assigned, but before the virtual registers have been re-written, which
>   allows it to remove virtual register COPY LiveIntervals that become dead
>   through the forwarding of all of their uses.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312178 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-30 22:11:37 +00:00
Geoff Berry
62c7c252f8 Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"
Issues identified by buildbots addressed since original review:
- Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
- The pass no longer forwards COPYs to physical register uses, since
  doing so can break code that implicitly relies on the physical
  register number of the use.
- The pass no longer forwards COPYs to undef uses, since doing so
  can break the machine verifier by creating LiveRanges that don't
  end on a use (since the undef operand is not considered a use).

  [MachineCopyPropagation] Extend pass to do COPY source forwarding

  This change extends MachineCopyPropagation to do COPY source forwarding.

  This change also extends the MachineCopyPropagation pass to be able to
  be run during register allocation, after physical registers have been
  assigned, but before the virtual registers have been re-written, which
  allows it to remove virtual register COPY LiveIntervals that become dead
  through the forwarding of all of their uses.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312154 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-30 18:41:07 +00:00
Balaram Makam
b3e25cea9c Re-land MachineInstr: Reason locally about some memory objects before going to AA.
Summary:
Reverts r311008 to reinstate r310825 with a fix.

Refine alias checking for pseudo vs value to be conservative.
This fixes the original failure in builtbot unittest SingleSource/UnitTests/2003-07-09-SignedArgs.

Reviewers: hfinkel, nemanjai, efriedma

Reviewed By: efriedma

Subscribers: bjope, mcrosier, nhaehnle, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D36900

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312126 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-30 14:57:12 +00:00
Stanislav Mekhanoshin
0a4c4f2a1d [AMDGPU] Use v_max_f* for fcanonicalize
If denorms are not flushed we can use max instead of multiplication
by 1. For double that is simply faster, while for float and half
it is shorter, because mul uses constant bus and VOP3.

Differential Revision: https://reviews.llvm.org/D36856

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312095 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-30 03:03:38 +00:00
Matt Arsenault
90ea18a51f AMDGPU: Select clamp pattern with v2f16
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312087 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-30 01:20:17 +00:00
Stanislav Mekhanoshin
9324a77aa4 [AMDGPU] Fix regression in AMDGPULibCalls allowing native for doubles
Under -cl-fast-relaxed-math we could use native_sqrt, but f64 was
allowed to produce HSAIL's nsqrt instruction. HSAIL is not here
and we stick with non-existing native_sqrt(double) as a result.

Add check for f64 to not return native functions and also remove
handling of f64 case for fold_sqrt.

Differential Revision: https://reviews.llvm.org/D37223

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311900 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-28 18:00:08 +00:00
Stanislav Mekhanoshin
f4dd1bdd9a [AMDGPU] computeKnownBitsForTargetNode for 24 bit mul
Differential Revision: https://reviews.llvm.org/D37168

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311896 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-28 16:35:37 +00:00
Matt Arsenault
1a6aed20fa IPRA: Don't assume called function is first call operand
Fixes not finding the called global for AMDGPU
call pseudoinstructions, which prevented IPRA
from doing much.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311637 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-24 07:55:15 +00:00
Geoff Berry
6c9f36933c Revert "[MachineCopyPropagation] Extend pass to do COPY source forwarding" round 2
This reverts commit r311135.

sanitizer-x86_64-linux-android buildbot is timing out with just this
patch applied.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311142 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-18 01:43:11 +00:00
Geoff Berry
d93db263e5 Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"
Two issues identified by buildbots were addressed:
    - The pass no longer forwards COPYs to physical register uses, since
      doing so can break code that implicitly relies on the physical
      register number of the use.
    - The pass no longer forwards COPYs to undef uses, since doing so
      can break the machine verifier by creating LiveRanges that don't
      end on a use (since the undef operand is not considered a use).

    [MachineCopyPropagation] Extend pass to do COPY source forwarding

    This change extends MachineCopyPropagation to do COPY source forwarding.

    This change also extends the MachineCopyPropagation pass to be able to
    be run during register allocation, after physical registers have been
    assigned, but before the virtual registers have been re-written, which
    allows it to remove virtual register COPY LiveIntervals that become dead
    through the forwarding of all of their uses.

    Reviewers: qcolombet, javed.absar, MatzeB, jonpa

    Subscribers: jyknight, nemanjai, llvm-commits, nhaehnle, mcrosier, mgorny

    Differential Revision: https://reviews.llvm.org/D30751

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311135 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-17 23:06:55 +00:00
Geoff Berry
a6a5be21df Revert "[MachineCopyPropagation] Extend pass to do COPY source forwarding"
This reverts commit r311038.

Several buildbots are breaking, and at least one appears to be due to
the forwarding of physical regs enabled by this change.  Reverting while
I investigate further.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311062 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-17 04:04:11 +00:00
Geoff Berry
31db6f3bd2 [MachineCopyPropagation] Extend pass to do COPY source forwarding
This change extends MachineCopyPropagation to do COPY source forwarding.

This change also extends the MachineCopyPropagation pass to be able to
be run during register allocation, after physical registers have been
assigned, but before the virtual registers have been re-written, which
allows it to remove virtual register COPY LiveIntervals that become dead
through the forwarding of all of their uses.

Reviewers: qcolombet, javed.absar, MatzeB, jonpa

Subscribers: jyknight, nemanjai, llvm-commits, nhaehnle, mcrosier, mgorny

Differential Revision: https://reviews.llvm.org/D30751

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311038 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-16 20:50:01 +00:00
Balaram Makam
edd00a7e54 Revert "MachineInstr: Reason locally about some memory objects before going to AA."
r310825 caused the clang-ppc64le-linux-lnt bot to go red
(http://lab.llvm.org:8011/builders/clang-ppc64le-linux-lnt/builds/5712)
because of a test-suite failure of
SingleSource/UnitTests/2003-07-09-SignedArgs

This reverts commit 0028f6a872.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311008 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-16 14:17:43 +00:00
Stanislav Mekhanoshin
2df3fafbea [AMDGPU] Eliminate no effect instructions before s_endpgm
Differential Revision: https://reviews.llvm.org/D36585

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310987 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-16 04:43:49 +00:00
Jakub Kuderski
c0f00a9516 [Dominators] Include infinite loops in PostDominatorTree
Summary:
This patch teaches PostDominatorTree about infinite loops. It is built on top of D29705 by @dberlin which includes a very detailed motivation for this change.

What's new is that the patch also teaches the incremental updater how to deal with reverse-unreachable regions and how to properly maintain and verify tree roots. Before that, the incremental algorithm sometimes ended up preserving reverse-unreachable regions after updates that wouldn't appear in the tree if it was constructed from scratch on the same CFG.

This patch makes the following assumptions:
- A sequence of updates should produce the same tree as a recalculating it.
- Any sequence of the same updates should lead to the same tree.
- Siblings and roots are unordered.

The last two properties are essential to efficiently perform batch updates in the future.
When it comes to the first one, we can decide later that the consistency between freshly built tree and an updated one doesn't matter match, as there are many correct ways to pick roots in infinite loops, and to relax this assumption. That should enable us to recalculate postdominators less frequently.

This patch is pretty conservative when it comes to incremental updates on reverse-unreachable regions and ends up recalculating the whole tree in many cases. It should be possible to improve the performance in many cases, if we decide that it's important enough.
That being said, my experiments showed that reverse-unreachable are very rare in the IR emitted by clang when bootstrapping  clang. Here are the statistics I collected by analyzing IR between passes and after each removePredecessor call:

```
# functions:  52283
# samples:  337609
# reverse unreachable BBs:  216022
# BBs:  247840796
Percent reverse-unreachable:  0.08716159869015269 %
Max(PercRevUnreachable) in a function:  87.58620689655172 %
# > 25 % samples:  471 ( 0.1395104988314885 % samples )
... in 145 ( 0.27733680163724345 % functions )
```

Most of the reverse-unreachable regions come from invalid IR where it wouldn't be possible to construct a PostDomTree anyway.

I would like to commit this patch in the next week in order to be able to complete the work that depends on it before the end of my internship, so please don't wait long to voice your concerns :).

Reviewers: dberlin, sanjoy, grosser, brzycki, davide, chandlerc, hfinkel

Reviewed By: dberlin

Subscribers: nhaehnle, javed.absar, kparzysz, uabelho, jlebar, hiraditya, llvm-commits, dberlin, david2050

Differential Revision: https://reviews.llvm.org/D35851

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310940 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-15 18:14:57 +00:00
Balaram Makam
0028f6a872 MachineInstr: Reason locally about some memory objects before going to AA.
This addresses a FIXME in MachineInstr::mayAlias.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310825 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-14 09:41:40 +00:00
Matt Arsenault
45424dbebb AMDGPU: Start adding tail call support
Handle the sibling call cases.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310753 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-11 20:42:08 +00:00
Stanislav Mekhanoshin
9643e6bd78 [AMDGPU] Ported and adopted AMDLibCalls pass
The pass does simplifications of well known AMD library calls.
If given -amdgpu-prelink option it works in a pre-link mode which
allows to reference new library functions which will be linked in
later.

In addition it also used to process traditional AMD option
-fuse-native which allows to replace some of the functions with
their fast native implementations from the library.

The necessary glue to pass the prelink option and translate
-fuse-native is to be added to the driver.

Differential Revision: https://reviews.llvm.org/D36436

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310731 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-11 16:42:09 +00:00
Matt Arsenault
e695a23276 AMDGPU: Fix assert on n inline asm constraint
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310515 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-09 20:09:35 +00:00
Connor Abbott
8e92b2817e [AMDGPU] Add llvm.amdgpu.update.dpp intrinsic
Summary:
Now that we've made all the necessary backend changes, we can add a new
intrinsic which exposes the new capabilities to IR producers. Since
llvm.amdgpu.update.dpp is a strict superset of llvm.amdgpu.mov.dpp, we
should deprecate the former. We also add tests for all the functionality
that was added in previous changes, now that we can access it via an IR
construct.

Reviewers: tstellar, arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D34718

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310399 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-08 18:52:22 +00:00
Connor Abbott
4026c85e90 [AMDGPU] Add pseudo "old" source to all DPP instructions
Summary:
All instructions with the DPP modifier may not write to certain lanes of
the output if bound_ctrl=1 is set or any bits in bank_mask or row_mask
aren't set, so the destination register may be both defined and modified.
The right way to handle this is to add a constraint that the destination
register is the same as one of the inputs. We could tie the destination
to the first source, but that would be too restrictive for some use-cases
where we want the destination to be some other value before the
instruction executes. Instead, add a fake "old" source and tie it to the
destination. Effectively, the "old" source defines what value unwritten
lanes will get. We'll expose this functionality to users with a new
intrinsic later.

Also, we want to use DPP instructions for computing derivatives, which
means we need to set WQM for them. We also need to enable the entire
wavefront when using DPP intrinsics to implement nonuniform subgroup
reductions, since otherwise we'll get incorrect results in some cases.
To accomodate this, add a new operand to all DPP instructions which will
be interpreted by the SI WQM pass. This will be exposed with a new
intrinsic later. We'll also add support for Whole Wavefront Mode later.

I also fixed llvm.amdgcn.mov.dpp to overwrite the source and fixed up
the test. However, I could also keep the old behavior (where lanes that
aren't written are undefined) if people want it.

Reviewers: tstellar, arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D34716

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310283 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-07 19:10:56 +00:00
Matt Arsenault
e525b474ef AMDGPU: Remove -mcpu=SI
Leftover from before amdgcn/r600 split.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310277 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-07 18:30:35 +00:00
Matt Arsenault
2e48864110 AMDGPU: Cleanup subtarget features
Try to avoid mutually exclusive features. Don't use
a real default GPU, and use a fake "generic". The goal
is to make it easier to see which set of features are
incompatible between feature strings.

Most of the test changes are due to random scheduling changes
from not having a default fullspeed model.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310258 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-07 14:58:04 +00:00
Matt Arsenault
e1f2d6cc65 IPRA: Don't crash on null getCallPreservedMask
Kernels aren't callable, so they don't have a call preserved mask.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310172 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-05 07:50:18 +00:00
Connor Abbott
c300b1a6d3 [AMDGPU] Implement llvm.amdgcn.set.inactive intrinsic
Summary:
This intrinsic lets us set inactive lanes to an identity value when
implementing wavefront reductions. In combination with Whole Wavefront
Mode, it lets inactive lanes be skipped over as required by GLSL/Vulkan.
Lowering the intrinsic needs to happen post-RA so that RA knows that the
destination isn't completely overwritten due to the EXEC shenanigans, so
we need another pseudo-instruction to represent the un-lowered
intrinsic.

Reviewers: tstellar, arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D34719

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310088 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-04 18:36:54 +00:00
Connor Abbott
ecf573917a [AMDGPU] Add support for Whole Wavefront Mode
Summary:
Whole Wavefront Wode (WWM) is similar to WQM, except that all of the
lanes are always enabled, regardless of control flow. This is required
for implementing wavefront reductions in non-uniform control flow, where
we need to use the inactive lanes to propagate intermediate results, so
they need to be enabled. We need to propagate WWM to uses (unless
they're explicitly marked as exact) so that they also propagate
intermediate results correctly. We do the analysis and exec mask munging
during the WQM pass, since there are interactions with WQM for things
that require both WQM and WWM. For simplicity, WWM is entirely
block-local -- blocks are never WWM on entry or exit of a block, and WWM
is not propagated to the block level.  This means that computations
involving WWM cannot involve control flow, but we only ever plan to use
WWM for a few limited purposes (none of which involve control flow)
anyways.

Shaders can ask for WWM using the @llvm.amdgcn.wwm intrinsic. There
isn't yet a way to turn WWM off -- that will be added in a future
change.

Finally, it turns out that turning on inactive lanes causes a number of
problems with register allocation. While the best long-term solution
seems like teaching LLVM's register allocator about predication, for now
we need to add some hacks to prevent ourselves from getting into trouble
due to constraints that aren't currently expressed in LLVM. For the gory
details, see the comments at the top of SIFixWWMLiveness.cpp.

Reviewers: arsenm, nhaehnle, tpr

Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D35524

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310087 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-04 18:36:52 +00:00
Connor Abbott
7af89e579d [AMDGPU] Add an llvm.amdgcn.wqm intrinsic for WQM
Summary:
Previously, we assumed that certain types of instructions needed WQM in
pixel shaders, particularly DS instructions and image sampling
instructions. This was ok because with OpenGL, the assumption was
correct. But we want to start using DPP instructions for derivatives as
well as other things, so the assumption that we can infer whether to use
WQM based on the instruction won't continue to hold. This intrinsic lets
frontends like Mesa indicate what things need WQM based on their
knowledge of the API, rather than second-guessing them in the backend.
We need to keep around the old method of enabling WQM, but eventually we
should remove it once Mesa catches up. For now, this will let us use DPP
instructions for computing derivatives correctly.

Reviewers: arsenm, tpr, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D35167

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310085 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-04 18:36:49 +00:00
Stanislav Mekhanoshin
6069fa8531 [AMDGPU] Preserve inverted bit in SI_IF in presence of SI_KILL
In case if SI_KILL is in between of the SI_IF and SI_END_CF we need
to preserve the bits actually flipped by if rather then restoring
the original mask.

Differential Revision: https://reviews.llvm.org/D36299

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310031 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-04 06:58:42 +00:00
Connor Abbott
1a05d247fa [AMDGPU] Add missing hazard for DPP-after-EXEC-write
Summary:
Following the docs, we need at least 5 wait states between an EXEC write
and an instruction that uses DPP.

Reviewers: tstellar, arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D34849

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310013 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-04 01:09:43 +00:00
Matt Arsenault
0856e7acd5 AMDGPU: Don't use report_fatal_error for unsupported call types
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310004 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-03 23:32:41 +00:00
Matt Arsenault
dfb2cfedfc AMDGPU: Remove error on calls for amdgcn
Repurpose the -amdgpu-function-calls flag. Rather
than require it to emit a call, only use it to
run the always inline path or not.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310003 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-03 23:24:05 +00:00
Matt Arsenault
688929ea0f AMDGPU: Fix implicitarg.ptr handling special inputs
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310002 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-03 23:12:44 +00:00
Matt Arsenault
c60159767d AMDGPU: Pass special input registers to functions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309998 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-03 23:00:29 +00:00
Connor Abbott
6615aeaf28 test commit
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309981 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-03 20:22:30 +00:00