archived-llvm

mirror of https://github.com/RPCS3/llvm.git synced 2026-01-31 01:25:19 +01:00

Author	SHA1	Message	Date
Konstantin Zhuravlyov	2d50d3f3c9	[AMDGPU] Promote uniform (i1, i16] operations to i32 Differential Revision: https://reviews.llvm.org/D25302 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283555 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-07 14:22:58 +00:00
Nicolai Haehnle	f3907ede55	AMDGPU: Fix use-after-free in SIOptimizeExecMasking Summary: There was a bug with sequences like s_mov_b64 s[0:1], exec s_and_b64 s[2:3]<def>, s[0:1], s[2:3]<kill> ... s_mov_b64_term exec, s[2:3] because s[2:3] was defined and used in the same instruction, ending up with SaveExecInst inside OtherUseInsts. Note that the test case also exposes an unrelated bug. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98028 Reviewers: tstellarAMD, arsenm Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25306 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283528 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-07 08:40:14 +00:00
Matt Arsenault	0b0321b9a7	AMDGPU: Change check prefix in test git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283521 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-07 03:55:04 +00:00
Matt Arsenault	ecc6c2b633	BranchRelaxation: Support expanding unconditional branches AMDGPU needs to expand unconditional branches in a new block with an indirect branch. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283464 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-06 16:20:41 +00:00
Konstantin Zhuravlyov	bb3823e630	[AMDGPU] Promote uniform i16 bitreverse intrinsic to i32 Differential Revision: https://reviews.llvm.org/D25121 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283415 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-06 02:20:46 +00:00
Bjorn Pettersson	17676e0feb	[DAG] Teach computeKnownBits and ComputeNumSignBits in SelectionDAG to look through EXTRACT_VECTOR_ELT. Summary: Both computeKnownBits and ComputeNumSignBits can now do a simple look-through of EXTRACT_VECTOR_ELT. It will compute the result based on the known bits (or known sign bits) for the vector that the element is extracted from. Reviewers: bogner, tstellarAMD, mkuper Subscribers: wdng, RKSimon, jyknight, llvm-commits, nhaehnle Differential Revision: https://reviews.llvm.org/D25007 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283347 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-05 17:40:27 +00:00
Matthias Braun	6b11127921	Set some tests to an unknown vendor and OS This avoids llc using the hosts OS/vendor as defaults and triggering unwanted behaviour in the tests. This should deal with the buildbot breakages on windows after r283140. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283149 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-03 21:58:20 +00:00
Konstantin Zhuravlyov	1e8f5fd9f8	[AMDGPU] Sign extend AShr when promoting (instead of zero extending) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283130 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-03 18:29:01 +00:00
Matt Arsenault	82f124302e	AMDGPU: Fix missing -verify-machineinstrs in test git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283107 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-03 12:58:59 +00:00
Mehdi Amini	3e821f8cd8	Revert "AMDGPU: Don't use offen if it is 0" This reverts commit r282999. Tests are not passing: http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/20038 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283003 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-01 02:35:24 +00:00
Matt Arsenault	494146de48	AMDGPU: Don't use offen if it is 0 This removes many re-initializations of a base register to 0. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282999 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-01 01:37:15 +00:00
Matt Arsenault	7eba65d30c	AMDGPU: Use unsigned compare for eq/ne For some reason there are both of these available, except for scalar 64-bit compares which only has u64. I'm not sure why there are both (I'm guessing it's for the one bit inputs we don't use), but for consistency always using the unsigned one. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282832 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-30 01:50:20 +00:00
Matt Arsenault	0461ece2ce	AMDGPU: Partially fix control flow at -O0 Fixes to allow spilling all registers at the end of the block work with exec modifications. Don't emit s_and_saveexec_b64 for if lowering, and instead emit copies. Mark control flow mask instructions as terminators to get correct spill code placement with fast regalloc, and then have a separate optimization pass form the saveexec. This should work if SGPRs are spilled to VGPRs, but will likely fail in the case that an SGPR spills to memory and no workitem takes a divergent branch. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282667 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-29 01:44:16 +00:00
Konstantin Zhuravlyov	f9bcd7b189	[AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit instructions Differential Revision: https://reviews.llvm.org/D24125 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282624 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-28 20:05:39 +00:00
Nirav Dave	bb15ebf5c7	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r282600 due to test failures with MCJIT git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282604 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-28 16:37:50 +00:00
Nirav Dave	a6d3e00dff	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions). Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations Noteworthy tests: CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right. CodeGen/AArch64/arm64-memset-inline.lli - CodeGen/AArch64/arm64-stur.ll - CodeGen/ARM/memset-inline.ll - The backend now generates worse code due to store merging succeeding, as we do do a 16-byte constant-zero store efficiently. CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself? CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding.
CodeGen/X86/dag-merge-fast-accesses.ll - CodeGen/X86/MergeConsecutiveStores.ll - CodeGen/X86/stores-merging.ll - CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores CAN be moved past volatile loads, and now are. CodeGen/X86/vector-idiv.ll - CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But, looks like the code got better due to the memory operations being recognized as non-aliasing. CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged. CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll - This test appears to work but no longer exhibits the spill behavior. Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel Differential Revision: https://reviews.llvm.org/D14834 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282600 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-28 15:50:43 +00:00
Michael Kuperstein	db4e01f9a1	[DAG] Remove isVectorClearMaskLegal() check from vector_build dagcombine This check currently doesn't seem to do anything useful on any in-tree target: On non-x86, it always evaluates to false, so we never hit the code path that creates the shuffle with zero. On x86, it just forwards to isShuffleMaskLegal(), which is a reasonable thing to query in general, but doesn't make sense if only restricted to zero blends. Differential Revision: https://reviews.llvm.org/D24625 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282567 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-28 06:13:58 +00:00
Tom Stellard	ccb1190aeb	AMDGPU/SI: Don't crash on anonymous GlobalValues Summary: We need to call AsmPrinter::getNameWithPrefix() in order to handle anonymous GlobalValues (e.g. @0, @1). Reviewers: arsenm, b-sumner Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D24865 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282420 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-26 17:29:25 +00:00
Tom Stellard	bf101a6e08	AMDGPU/SI: Include implicit arguments in kernarg_segment_byte_size Reviewers: arsenm Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D24835 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282223 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-23 01:33:26 +00:00
Nirav Dave	489cfe73c2	[DAG] Fix incorrect alignment of ext load. Correctly use alignment size from loaded size not output value size. Reviewers: jyknight, tstellarAMD, arsenm Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23356 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282177 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-22 17:28:43 +00:00
Matt Arsenault	de82da5521	AMDGPU: Fix broken FrameIndex handling We were trying to avoid using a FrameIndex operand in non-pointer operands in a convoluted way, and would break because of using TargetFrameIndex. The TargetFrameIndex should only be used in the case where it makes sense to fold it as part of the addressing mode, otherwise it requires materialization like a normal constant. This wasn't working reliably and failed in the added testcase, hitting the assert when processing the frame index. The TargetFrameIndex was coming from trying to produce an AssertZext limiting the maximum stack size. I'm not sure this was correct to begin with, because it is apparently possible to have a single workitem dispatch that requires all 4G of private memory. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281824 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-17 16:09:55 +00:00
Matt Arsenault	077ab85e5a	AMDGPU: Push bitcasts through build_vector This reduces the number of copies and reg_sequences when using fp constant vectors. This significantly reduces the code size in local-stack-alloc-bug.ll git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281822 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-17 15:44:16 +00:00
Matt Arsenault	982faf27a3	AMDGPU: Use i64 scalar compare instructions VI added eq/ne for i64, so use them. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281800 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-17 02:02:19 +00:00
Tom Stellard	019f4de043	AMDGPU/SI: Fix kernel argument ABI for HSA Summary: i8, i16, and f16 values are not extended to 32-bit in the HSA kernel ABI. Reviewers: arsenm Subscribers: arsenm, kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl Differential Revision: https://reviews.llvm.org/D24621 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281789 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-16 22:20:24 +00:00
Matt Arsenault	8f824feb12	AMDGPU: Allow some control flow intrinsics to be CSEd These clean up some unnecessary or instructions in cases with complex loops. In the original testcase I noticed this, the same or with exec was repeated 5 or 6 times in a row. With this only one is emitted or sometimes a copy. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281786 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-16 22:11:18 +00:00
Tom Stellard	77361ae206	AMDGPU: Refactor kernel argument lowering Summary: The main challenge in lowering kernel arguments for AMDGPU is determining the memory type of the argument. The generic calling convention code assumes that only legal register types can be stored in memory, but this is not the case for AMDGPU. This consolidates all the logic AMDGPU uses for deducing memory types into a single function. This will make it much easier to support different ABIs in the future. Reviewers: arsenm Subscribers: arsenm, wdng, nhaehnle, llvm-commits, yaxunl Differential Revision: https://reviews.llvm.org/D24614 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281781 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-16 21:53:00 +00:00
Matt Arsenault	3a74bac021	AMDGPU: Use SOPK compare instructions git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281780 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-16 21:41:16 +00:00
Tom Stellard	1961591989	AMDGPU/SI: Add support for triples with the mesa3d operating system Summary: mesa3d will use the same kernel calling convention as amdhsa, but it will handle everything else like the default 'unknown' OS type. Reviewers: arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D22783 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281779 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-16 21:34:26 +00:00
Matt Arsenault	aceeb33943	Revert "AMDGPU: Use SOPK compare instructions" Accidentally committed git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281514 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-14 18:04:42 +00:00
Matt Arsenault	fff5113a50	AMDGPU: Use SOPK compare instructions git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281513 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-14 18:03:53 +00:00
Matt Arsenault	0f7125844e	AMDGPU: Support folding FrameIndex operands This avoids test regressions in a future commit. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281491 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-14 15:51:33 +00:00
Matt Arsenault	8bc95d0a47	AMDGPU: Improve splitting 64-bit bit ops by constants This addresses a TODO to handle operations besides and. This also starts eliminating no-op operations with a constant that can emerge later. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281488 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-14 15:19:03 +00:00
Matt Arsenault	ec4f2a0c81	AMDGPU: Support commuting a FrameIndex operand git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281369 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-13 19:03:12 +00:00
Nicolai Haehnle	01a133c760	AMDGPU: Do not clobber SCC in SIWholeQuadMode Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D22198 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281230 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-12 16:25:20 +00:00
NAKAMURA Takumi	56e56fbc57	llvm/test/CodeGen/AMDGPU/infinite-loop-evergreen.ll REQUIRES +Asserts. This might not crash with -Asserts. I saw it caused infinite loop in the codegen. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281190 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-12 04:27:28 +00:00
Matt Arsenault	3843a1382e	AMDGPU: Fix immediate folding logic when shrinking instructions If the literal is being folded into src0, it doesn't matter if it's an SGPR because it's being replaced with the literal. Also fixes initially selecting 32-bit versions of some instructions which also confused commuting. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281117 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-09 23:32:53 +00:00
Matt Arsenault	d5a5e9043a	AMDGPU: Run LoadStoreVectorizer pass by default git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281112 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-09 22:29:28 +00:00
Wei Ding	8dae05acf4	AMDGPU : Fix mqsad_u32_u8 instruction incorrect data type. Differential Revision: http://reviews.llvm.org/D23700 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281081 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-09 19:31:51 +00:00
Tom Stellard	6ecd5004b4	AMDGPU/SI: Make sure llvm.amdgcn.implicitarg.ptr() is 8-byte aligned for HSA Reviewers: arsenm Subscribers: arsenm, wdng, nhaehnle, llvm-commits Differential Revision: https://reviews.llvm.org/D24405 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281080 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-09 19:28:00 +00:00
Tim Northover	59282d3fd2	GlobalISel: move type information to MachineRegisterInfo. We want each register to have a canonical type, which means the best place to store this is in MachineRegisterInfo rather than on every MachineInstr that happens to use or define that register. Most changes following from this are pretty simple (you need an MRI anyway if you're going to be doing any transformations, so just check the type there). But legalization doesn't really want to check redundant operands (when, for example, a G_ADD only ever has one type) so I've made use of MCInstrDesc's operand type field to encode these constraints and limit legalization's work. As an added bonus, more validation is possible, both in MachineVerifier and MachineIRBuilder (coming soon). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281035 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-09 11:46:34 +00:00
Sam Kolton	442deaa85b	[AMDGPU] Assembler: rename amd_kernel_code_t asm names according to spec Summary: Also removed duplicate code from AMDGPUTargetAsmStreamer. This change only changes how amd_kernel_code_t is parsed and printed. No variable names are changed. Reviewers: vpykhtin, tstellarAMD Subscribers: arsenm, wdng, nhaehnle Differential Revision: https://reviews.llvm.org/D24296 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281028 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-09 10:08:02 +00:00
Matt Arsenault	c0196eb442	AMDGPU: Try to commute when selecting s_addk_i32/s_mulk_i32 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280972 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-08 17:35:41 +00:00
Matt Arsenault	d764af3c4e	AMDGPU: Support commuting with immediate in src0 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280970 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-08 17:19:29 +00:00
Simon Pilgrim	d88990b028	[SelectionDAG] Add BUILD_VECTOR support to computeKnownBits and SimplifyDemandedBits Add the ability to computeKnownBits and SimplifyDemandedBits to extract the known zero/one bits from BUILD_VECTOR, returning the known bits that are shared by every vector element. This is an initial step towards determining the sign bits of a vector (PR29079). Differential Revision: https://reviews.llvm.org/D24253 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280927 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-08 12:57:51 +00:00
Yaxun Liu	6874fa846b	AMDGPU: Add hidden kernel arguments to runtime metadata OpenCL kernels have hidden kernel arguments for global offset and printf buffer. For consistency, these hidden arguments should be included in the runtime metadata. Also updated kernel argument kind metadata. Differential Revision: https://reviews.llvm.org/D23424 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280829 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-07 17:44:00 +00:00
Konstantin Zhuravlyov	c87867f700	[AMDGPU] Wave and register controls - Add missing test git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280749 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-06 20:29:10 +00:00
Konstantin Zhuravlyov	1f99c41083	[AMDGPU] Wave and register controls - Implemented amdgpu-flat-work-group-size attribute - Implemented amdgpu-num-active-waves-per-eu attribute - Implemented amdgpu-num-sgpr attribute - Implemented amdgpu-num-vgpr attribute - Dynamic LDS constraints are in a separate patch Patch by Tom Stellard and Konstantin Zhuravlyov Differential Revision: https://reviews.llvm.org/D21562 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280747 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-06 20:22:28 +00:00
Wei Ding	9493fa986d	AMDGPU : Add XNACK feature to GPUs that support it. Differential Revision: http://reviews.llvm.org/D24276 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280742 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-06 19:55:17 +00:00
Nicolai Haehnle	f9f3b70e90	AMDGPU: Reduce the duration of whole-quad-mode Summary: This contains two changes that reduce the time spent in WQM, with the intention of reducing bandwidth required by VMEM loads: 1. Sampling instructions by themselves don't need to run in WQM, only their coordinate inputs need it (unless of course there is a dependent sampling instruction). The initial scanInstructions step is modified accordingly. 2. When switching back from WQM to Exact, switch back as soon as possible. This affects the logic in processBlock. This should always be a win or at best neutral. There are also some cleanups (e.g. remove unused ExecExports) and some new debugging output. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D22092 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280590 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-03 12:26:38 +00:00
Nicolai Haehnle	c280d74837	AMDGPU: Fix an interaction between WQM and polygon stippling Summary: This fixes a rare bug in polygon stippling with non-monolithic pixel shaders. The underlying problem is as follows: the prolog part contains the polygon stippling sequence, i.e. a kill. The main part then enables WQM based on the _reduced_ exec mask, effectively undoing most of the polygon stippling. Since we cannot know whether polygon stippling will be used, the main part of a non-monolithic shader must always return to exact mode to fix this problem. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23131 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280589 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-03 12:26:32 +00:00


608 Commits