Commit Graph

608 Commits

Author SHA1 Message Date
Konstantin Zhuravlyov
2d50d3f3c9 [AMDGPU] Promote uniform (i1, i16] operations to i32
Differential Revision: https://reviews.llvm.org/D25302


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283555 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-07 14:22:58 +00:00
Nicolai Haehnle
f3907ede55 AMDGPU: Fix use-after-free in SIOptimizeExecMasking
Summary:
There was a bug with sequences like

   s_mov_b64 s[0:1], exec
   s_and_b64 s[2:3]<def>, s[0:1], s[2:3]<kill>
   ...
   s_mov_b64_term exec, s[2:3]

because s[2:3] was defined and used in the same instruction, ending up with
SaveExecInst inside OtherUseInsts.

Note that the test case also exposes an unrelated bug.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98028

Reviewers: tstellarAMD, arsenm

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25306

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283528 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-07 08:40:14 +00:00
Matt Arsenault
0b0321b9a7 AMDGPU: Change check prefix in test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283521 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-07 03:55:04 +00:00
Matt Arsenault
ecc6c2b633 BranchRelaxation: Support expanding unconditional branches
AMDGPU needs to expand unconditional branches in a new
block with an indirect branch.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283464 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-06 16:20:41 +00:00
Konstantin Zhuravlyov
bb3823e630 [AMDGPU] Promote uniform i16 bitreverse intrinsic to i32
Differential Revision: https://reviews.llvm.org/D25121


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283415 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-06 02:20:46 +00:00
Bjorn Pettersson
17676e0feb [DAG] Teach computeKnownBits and ComputeNumSignBits in SelectionDAG to look through EXTRACT_VECTOR_ELT.
Summary: Both computeKnownBits and ComputeNumSignBits can now do a simple
look-through of EXTRACT_VECTOR_ELT. It will compute the result based
on the known bits (or known sign bits) for the vector that the element
is extracted from.

Reviewers: bogner, tstellarAMD, mkuper

Subscribers: wdng, RKSimon, jyknight, llvm-commits, nhaehnle

Differential Revision: https://reviews.llvm.org/D25007

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283347 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-05 17:40:27 +00:00
Matthias Braun
6b11127921 Set some tests to an unknown vendor and OS
This avoids llc using the hosts OS/vendor as defaults and triggering
unwanted behaviour in the tests. This should deal with the buildbot
breakages on windows after r283140.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283149 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-03 21:58:20 +00:00
Konstantin Zhuravlyov
1e8f5fd9f8 [AMDGPU] Sign extend AShr when promoting (instead of zero extending)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283130 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-03 18:29:01 +00:00
Matt Arsenault
82f124302e AMDGPU: Fix missing -verify-machineinstrs in test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283107 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-03 12:58:59 +00:00
Mehdi Amini
3e821f8cd8 Revert "AMDGPU: Don't use offen if it is 0"
This reverts commit r282999.
Tests are not passing: http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/20038

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283003 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-01 02:35:24 +00:00
Matt Arsenault
494146de48 AMDGPU: Don't use offen if it is 0
This removes many re-initializations of a base register to 0.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282999 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-01 01:37:15 +00:00
Matt Arsenault
7eba65d30c AMDGPU: Use unsigned compare for eq/ne
For some reason there are both of these available, except
for scalar 64-bit compares which only has u64. I'm not sure
why there are both (I'm guessing it's for the one bit inputs we
don't use), but for consistency always using the
unsigned one.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282832 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-30 01:50:20 +00:00
Matt Arsenault
0461ece2ce AMDGPU: Partially fix control flow at -O0
Fixes to allow spilling all registers at the end of the block
work with exec modifications. Don't emit s_and_saveexec_b64 for
if lowering, and instead emit copies. Mark control flow mask
instructions as terminators to get correct spill code placement
with fast regalloc, and then have a separate optimization pass
form the saveexec.

This should work if SGPRs are spilled to VGPRs, but
will likely fail in the case that an SGPR spills to memory
and no workitem takes a divergent branch.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282667 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-29 01:44:16 +00:00
Konstantin Zhuravlyov
f9bcd7b189 [AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit instructions
Differential Revision: https://reviews.llvm.org/D24125



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282624 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-28 20:05:39 +00:00
Nirav Dave
bb15ebf5c7 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r282600 due to test failues with MCJIT

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282604 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-28 16:37:50 +00:00
Nirav Dave
a6d3e00dff In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Simplify Consecutive Merge Store Candidate Search

  Now that address aliasing is much less conservative, push through
  simplified store merging search which only checks for parallel stores
  through the chain subgraph. This is cleaner as the separation of
  non-interfering loads/stores from the store-merging logic.

  Whem merging stores, search up the chain through a single load, and
  finds all possible stores by looking down from through a load and a
  TokenFactor to all stores visited. This improves the quality of the
  output SelectionDAG and generally the output CodeGen (with some
  exceptions).

  Additional Minor Changes:

    1. Finishes removing unused AliasLoad code
    2. Unifies the the chain aggregation in the merged stores across
       code paths
    3. Re-add the Store node to the worklist after calling
       SimplifyDemandedBits.
    4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
       arbitrary, but seemed sufficient to not cause regressions in
       tests.

  This finishes the change Matt Arsenault started in r246307 and
  jyknight's original patch.

  Many tests required some changes as memory operations are now
  reorderable. Some tests relying on the order were changed to use
  volatile memory operations

  Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -
      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

    CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
      This test appears to work but no longer exhibits the spill
      behavior.

Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282600 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-28 15:50:43 +00:00
Michael Kuperstein
db4e01f9a1 [DAG] Remove isVectorClearMaskLegal() check from vector_build dagcombine
This check currently doesn't seem to do anything useful on any in-tree target:
On non-x86, it always evaluates to false, so we never hit the code path that
creates the shuffle with zero.
On x86, it just forwards to isShuffleMaskLegal(), which is a reasonable thing to
query in general, but doesn't make sense if only restricted to zero blends.

Differential Revision: https://reviews.llvm.org/D24625


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282567 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-28 06:13:58 +00:00
Tom Stellard
ccb1190aeb AMDGPU/SI: Don't crash on anonymous GlobalValues
Summary:
We need to call AsmPrinter::getNameWithPrefix() in order to handle
anonymous GlobalValues (e.g. @0, @1).

Reviewers: arsenm, b-sumner

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D24865

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282420 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-26 17:29:25 +00:00
Tom Stellard
bf101a6e08 AMDGPU/SI: Include implicit arguments in kernarg_segment_byte_size
Reviewers: arsenm

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D24835

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282223 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-23 01:33:26 +00:00
Nirav Dave
489cfe73c2 [DAG] Fix incorrect alignment of ext load.
Correctly use alignment size from loaded size not output value size.

Reviewers: jyknight, tstellarAMD, arsenm

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23356

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282177 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-22 17:28:43 +00:00
Matt Arsenault
de82da5521 AMDGPU: Fix broken FrameIndex handling
We were trying to avoid using a FrameIndex operand in non-pointer
operands in a convoluted way, and would break because of
using TargetFrameIndex. The TargetFrameIndex should only be used
in the case where it makes sense to fold it as part of the addressing
mode, otherwise it requires materialization like a normal constant.
This wasn't working reliably and failed in the added testcase, hitting
the assert when processing the frame index.

The TargetFrameIndex was coming from trying to produce an AssertZext
limiting the maximum stack size. I'm not sure this was correct to begin
with, because it is apparently possible to have a single workitem
dispatch that requires all 4G of private memory.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281824 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-17 16:09:55 +00:00
Matt Arsenault
077ab85e5a AMDGPU: Push bitcasts through build_vector
This reduces the number of copies and reg_sequences
when using fp constant vectors. This significantly
reduces the code size in local-stack-alloc-bug.ll

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281822 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-17 15:44:16 +00:00
Matt Arsenault
982faf27a3 AMDGPU: Use i64 scalar compare instructions
VI added eq/ne for i64, so use them.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281800 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-17 02:02:19 +00:00
Tom Stellard
019f4de043 AMDGPU/SI: Fix kernel argument ABI for HSA
Summary: i8, i16, and f16 values are not extended to 32-bit in the HSA kernel ABI.

Reviewers: arsenm

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D24621

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281789 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-16 22:20:24 +00:00
Matt Arsenault
8f824feb12 AMDGPU: Allow some control flow intrinsics to be CSEd
These clean up some unnecessary or instructions in
cases with complex loops.

In the original testcase I noticed this, the same
or with exec was repeated 5 or 6 times in a row. With
this only one is emitted or sometimes a copy.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281786 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-16 22:11:18 +00:00
Tom Stellard
77361ae206 AMDGPU: Refactor kernel argument lowering
Summary:
The main challenge in lowering kernel arguments for AMDGPU is determing the
memory type of the argument.  The generic calling convention code assumes
that only legal register types can be stored in memory, but this is not the
case for AMDGPU.

This consolidates all the logic AMDGPU uses for deducing memory types into a single
function.  This will make it much easier to support different ABIs in the future.

Reviewers: arsenm

Subscribers: arsenm, wdng, nhaehnle, llvm-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D24614

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281781 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-16 21:53:00 +00:00
Matt Arsenault
3a74bac021 AMDGPU: Use SOPK compare instructions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281780 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-16 21:41:16 +00:00
Tom Stellard
1961591989 AMDGPU/SI: Add support for triples with the mesa3d operating system
Summary:
mesa3d will use the same kernel calling convention as amdhsa, but it will
handle everything else like the default 'unknown' OS type.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D22783

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281779 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-16 21:34:26 +00:00
Matt Arsenault
aceeb33943 Revert "AMDGPU: Use SOPK compare instructions"
Accidentally committed

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281514 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-14 18:04:42 +00:00
Matt Arsenault
fff5113a50 AMDGPU: Use SOPK compare instructions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281513 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-14 18:03:53 +00:00
Matt Arsenault
0f7125844e AMDGPU: Support folding FrameIndex operands
This avoids test regressions in a future commit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281491 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-14 15:51:33 +00:00
Matt Arsenault
8bc95d0a47 AMDGPU: Improve splitting 64-bit bit ops by constants
This addresses a TODO to handle operations besides and. This
also starts eliminating no-op operations with a constant that
can emerge later.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281488 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-14 15:19:03 +00:00
Matt Arsenault
ec4f2a0c81 AMDGPU: Support commuting a FrameIndex operand
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281369 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-13 19:03:12 +00:00
Nicolai Haehnle
01a133c760 AMDGPU: Do not clobber SCC in SIWholeQuadMode
Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D22198

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281230 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-12 16:25:20 +00:00
NAKAMURA Takumi
56e56fbc57 llvm/test/CodeGen/AMDGPU/infinite-loop-evergreen.ll REQUIRES +Asserts.
This might not *crash* with -Asserts. I saw it caused infinite loop in the codegen.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281190 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-12 04:27:28 +00:00
Matt Arsenault
3843a1382e AMDGPU: Fix immediate folding logic when shrinking instructions
If the literal is being folded into src0, it doesn't matter
if it's an SGPR because it's being replaced with the literal.

Also fixes initially selecting 32-bit versions of some instructions
which also confused commuting.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281117 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-09 23:32:53 +00:00
Matt Arsenault
d5a5e9043a AMDGPU: Run LoadStoreVectorizer pass by default
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281112 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-09 22:29:28 +00:00
Wei Ding
8dae05acf4 AMDGPU : Fix mqsad_u32_u8 instruction incorrect data type.
Differential Revision: http://reviews.llvm.org/D23700

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281081 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-09 19:31:51 +00:00
Tom Stellard
6ecd5004b4 AMDGPU/SI: Make sure llvm.amdgcn.implicitarg.ptr() is 8-byte aligned for HSA
Reviewers: arsenm

Subscribers: arsenm, wdng, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D24405

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281080 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-09 19:28:00 +00:00
Tim Northover
59282d3fd2 GlobalISel: move type information to MachineRegisterInfo.
We want each register to have a canonical type, which means the best place to
store this is in MachineRegisterInfo rather than on every MachineInstr that
happens to use or define that register.

Most changes following from this are pretty simple (you need an MRI anyway if
you're going to be doing any transformations, so just check the type there).
But legalization doesn't really want to check redundant operands (when, for
example, a G_ADD only ever has one type) so I've made use of MCInstrDesc's
operand type field to encode these constraints and limit legalization's work.

As an added bonus, more validation is possible, both in MachineVerifier and
MachineIRBuilder (coming soon).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281035 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-09 11:46:34 +00:00
Sam Kolton
442deaa85b [AMDGPU] Assembler: rename amd_kernel_code_t asm names according to spec
Summary:
Also removed duplicate code from AMDGPUTargetAsmStreamer.
This change only change how amd_kernel_code_t is parsed and printed. No variable names are changed.

Reviewers: vpykhtin, tstellarAMD

Subscribers: arsenm, wdng, nhaehnle

Differential Revision: https://reviews.llvm.org/D24296

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281028 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-09 10:08:02 +00:00
Matt Arsenault
c0196eb442 AMDGPU: Try to commute when selecting s_addk_i32/s_mulk_i32
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280972 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-08 17:35:41 +00:00
Matt Arsenault
d764af3c4e AMDGPU: Support commuting with immediate in src0
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280970 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-08 17:19:29 +00:00
Simon Pilgrim
d88990b028 [SelectionDAG] Add BUILD_VECTOR support to computeKnownBits and SimplifyDemandedBits
Add the ability to computeKnownBits and SimplifyDemandedBits to extract the known zero/one bits from BUILD_VECTOR, returning the known bits that are shared by every vector element.

This is an initial step towards determining the sign bits of a vector (PR29079).

Differential Revision: https://reviews.llvm.org/D24253

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280927 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-08 12:57:51 +00:00
Yaxun Liu
6874fa846b AMDGPU: Add hidden kernel arguments to runtime metadata
OpenCL kernels have hidden kernel arguments for global offset and printf buffer. For consistency, these hidden argument should be included in the runtime metadata. Also updated kernel argument kind metadata.

Differential Revision: https://reviews.llvm.org/D23424

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280829 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-07 17:44:00 +00:00
Konstantin Zhuravlyov
c87867f700 [AMDGPU] Wave and register controls
- Add missing test



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280749 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-06 20:29:10 +00:00
Konstantin Zhuravlyov
1f99c41083 [AMDGPU] Wave and register controls
- Implemented amdgpu-flat-work-group-size attribute
- Implemented amdgpu-num-active-waves-per-eu attribute
- Implemented amdgpu-num-sgpr attribute
- Implemented amdgpu-num-vgpr attribute
- Dynamic LDS constraints are in a separate patch

Patch by Tom Stellard and Konstantin Zhuravlyov

Differential Revision: https://reviews.llvm.org/D21562



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280747 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-06 20:22:28 +00:00
Wei Ding
9493fa986d AMDGPU : Add XNACK feature to GPUs that support it.
Differential Revision: http://reviews.llvm.org/D24276

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280742 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-06 19:55:17 +00:00
Nicolai Haehnle
f9f3b70e90 AMDGPU: Reduce the duration of whole-quad-mode
Summary:
This contains two changes that reduce the time spent in WQM, with the
intention of reducing bandwidth required by VMEM loads:

1. Sampling instructions by themselves don't need to run in WQM, only their
   coordinate inputs need it (unless of course there is a dependent sampling
   instruction). The initial scanInstructions step is modified accordingly.

2. When switching back from WQM to Exact, switch back as soon as possible.
   This affects the logic in processBlock.

This should always be a win or at best neutral.

There are also some cleanups (e.g. remove unused ExecExports) and some new
debugging output.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D22092

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280590 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-03 12:26:38 +00:00
Nicolai Haehnle
c280d74837 AMDGPU: Fix an interaction between WQM and polygon stippling
Summary:
This fixes a rare bug in polygon stippling with non-monolithic pixel shaders.

The underlying problem is as follows: the prolog part contains the polygon
stippling sequence, i.e. a kill. The main part then enables WQM based on the
_reduced_ exec mask, effectively undoing most of the polygon stippling.

Since we cannot know whether polygon stippling will be used, the main part
of a non-monolithic shader must always return to exact mode to fix this
problem.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23131

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280589 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-03 12:26:32 +00:00