archived-llvm

mirror of https://github.com/RPCS3/llvm.git synced 2026-01-31 01:25:19 +01:00

Author	SHA1	Message	Date
Matt Arsenault	5ad067fad4	AMDGPU: Don't use spir_kernel in a test Also use verify-machineinstrs. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336374 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-05 17:01:29 +00:00
Matt Arsenault	e5d3d15134	AMDGPU/GlobalISel: Implement custom kernel arg lowering Avoid using allocateKernArg / AssignFn. We do not want any of the type splitting properties of normal calling convention lowering. For now at least this exists alongside the IR argument lowering pass. This is necessary to handle struct padding correctly while some arguments are still skipped by the IR argument lowering pass. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336373 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-05 17:01:20 +00:00
Ryan Taylor	e941c76442	[AMDGPU] Add VALU to V_INTERP Instructions Wait states are not properly being inserted after buffer_store for v_interp instructions. Add VALU to V_INTERP instructions so that the GCNHazardRecognizer can check and insert the appropriate wait states when needed. Differential Revision: https://reviews.llvm.org/D48772 Change-Id: Id540c9b074fc69b5c1de6b182276aa089c74aa64 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336339 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-05 12:02:07 +00:00
Tom Stellard	8eb696d509	AMDGPU/GlobalISel: Make IMPLICIT_DEF of all sizes < 512 legal. Summary: We could split sizes that are not power of two into smaller sized G_IMPLICIT_DEF instructions, but this ends up generating G_MERGE_VALUES instructions which we then have to handle in the instruction selector. Since G_IMPLICIT_DEF is really a no-op it's easier just to keep everything that can fit into a register legal. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48777 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336041 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-30 04:09:44 +00:00
Matt Arsenault	eac8acfa94	AMDGPU: Don't use struct type for argument layout This was introducing unnecessary padding after the explicit arguments, depending on the alignment of the total struct type. Also has the side effect of avoiding creating an extra GEP for the offset from the base kernel argument to the explicit kernel argument offset. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335999 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-29 17:31:42 +00:00
Stanislav Mekhanoshin	0e1a98e255	[AMDGPU] Enable LICM in the BE pipeline This allows to hoist code portion to compute reciprocal of loop invariant denominator in integer division after codegen prepare expansion. Differential Revision: https://reviews.llvm.org/D48604 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335988 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-29 16:26:53 +00:00
Stanislav Mekhanoshin	0e7f42b639	[AMDGPU] Early expansion of 32 bit udiv/urem This allows hoisting of a common code, for instance if denominator is loop invariant. Current change is expansion only, adding licm to the target pass list going to be a separate patch. Given this patch changes to codegen are minor as the expansion is similar to that on DAG. DAG expansion still must remain for R600. Differential Revision: https://reviews.llvm.org/D48586 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335868 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-28 15:59:18 +00:00
Stanislav Mekhanoshin	8746f255cd	[AMDGPU] Overload llvm.amdgcn.fmad.ftz to support f16 Differential Revision: https://reviews.llvm.org/D48677 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335866 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-28 15:24:46 +00:00
Matt Arsenault	fe32d1437e	AMDGPU: Error on calls from graphics shaders In principle nothing should stop these from working, but work is necessary to create an ABI for dealing with the stack related registers. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335829 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-28 10:18:36 +00:00
Matt Arsenault	5f4316253c	AMDGPU: Fix assert on aggregate type kernel arguments Just fix the crash for now by not doing the optimization since figuring out how to properly convert the bits for an arbitrary struct is a pain. Also fix a crash when there is only an empty struct argument. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335827 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-28 10:18:11 +00:00
Stanislav Mekhanoshin	bc547571e7	[AMDGPU] Convert rcp to rcp_iflag If a source of rcp instruction is a result of any conversion from an integer convert it into rcp_iflag instruction. No FP exception can ever happen except division by zero if a single precision rcp argument is a representation of an integral number. Differential Revision: https://reviews.llvm.org/D48569 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335742 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-27 15:33:33 +00:00
Stanislav Mekhanoshin	aac117a9ba	[AMDGPU] Add llvm.amdgcn.fmad.ftz intrinsic This intrinsic selects v_mad_f32 regardless of fp32 denorm support. Differential Revision: https://reviews.llvm.org/D48573 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335654 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-26 20:04:19 +00:00
Matt Arsenault	a2ba13d731	AMDGPU: Add pass to lower kernel arguments to loads This replaces most argument uses with loads, but for now not all. The code in SelectionDAG for calling convention lowering is actively harmful for amdgpu_kernel. It attempts to split the argument types into register legal types, which results in low quality code for arbitary types. Since all kernel arguments are passed in memory, we just want the raw types. I've tried a couple of methods of mitigating this in SelectionDAG, but it's easier to just bypass this problem alltogether. It's possible to hack around the problem in the initial lowering, but the real problem is the DAG then expects to be able to use CopyToReg/CopyFromReg for uses of the arguments outside the block. Exposing the argument loads in the IR also has the advantage that the LoadStoreVectorizer can merge them. I'm not sure the best approach to dealing with the IR argument list is. The patch as-is just leaves the IR arguments in place, so all the existing code will still compute the same kernarg size and pointlessly lowers the arguments. Arguably the frontend should emit kernels with an empty argument list in the first place. Alternatively a dummy array could be inserted as a single argument just to reserve space. This does have some disadvantages. Local pointer kernel arguments can no longer have AssertZext placed on them as the equivalent !range metadata is not valid on pointer typed loads. This is mostly bad for SI which needs to know about the known bits in order to use the DS instruction offset, so in this case this is not done. More importantly, this skips noalias arguments since this pass does not yet convert this to the equivalent !alias.scope and !noalias metadata. Producing this metadata correctly seems to be tricky, although this logically is the same as inlining into a function which doesn't exist. Additionally, exposing these loads to the vectorizer may result in degraded aliasing information if a pointer load is merged with another argument load. I'm also not entirely sure this is preserving the current clover ABI, although I would greatly prefer if it would stop widening arguments and match the HSA ABI. As-is I think it is extending < 4-byte arguments to 4-bytes but doesn't align them to 4-bytes. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335650 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-26 19:10:00 +00:00
Krzysztof Parzyszek	2c45bcb399	Account for undef values from predecessors in extendSegmentsToUses It is legal for a PHI node not to have a live value in a predecessor as long as the end of the predecessor is jointly dominated by an undef value. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335607 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-26 14:37:16 +00:00
Matt Arsenault	529b26551c	AMDGPU/GlobalISel: Add support for llvm.amdgcn.kernarg.segment.ptr Note a normal select test is not currently possible because this relies on input registers tracked in SIMachineFunctionInfo which are not currently serializable in MIR, but this does work end-to-end from the IR. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335490 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-25 16:17:48 +00:00
Matt Arsenault	a26c784064	StackSlotColoring: Decide colors per stack ID I thought I fixed this in r308673, but that fix was very broken. The assumption that any frame index can be used in place of another was more widespread than I realized. Even when stack slot sharing was disabled, this was still replacing frame index uses with a different ID with a different stack slot. Really fix this by doing the coloring per-stack ID, so all of the coloring logically done in a separate namespace. This is a lot simpler than trying to figure out how to change the color if the stack ID is different. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335488 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-25 16:05:55 +00:00
Matt Arsenault	c4c340a047	AMDGPU/GlobalISel: Fix G_IMPLICIT_DEF for pointers git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335485 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-25 15:42:12 +00:00
Matt Arsenault	d3d2228f8c	AMDGPU: Respect align argument parameter This should avoid relying on the pointee type to get the alignment, particularly since pointee types are supposed to be removed at some point. Also fixes not getting the alignment for unsized types. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335478 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-25 14:29:04 +00:00
Krzysztof Parzyszek	ce1c8d90ff	Improve handling of COPY instructions with identical value numbers Testcases provided by Tim Renouf. Differential Revision: https://reviews.llvm.org/D48102 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335472 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-25 13:46:41 +00:00
Matt Arsenault	ae175dfe4a	AMDGPU: Add patterns for i32/i64 local atomic load/store Not sure why the 32/64 split is needed in the atomic_load store hierarchies. The regular PatFrags do this, but we don't do it for the existing handling for global. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335325 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-22 08:39:52 +00:00
Tom Stellard	3e9055c7ee	AMDGPU/GlobalISel: legalize and select 32-bit G_ASHR Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D48196 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335318 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-22 02:54:57 +00:00
Tom Stellard	20f413f83a	AMDGPU/GlobalISel: legalize and select 32-bit G_SITOFP Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48195 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335316 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-22 02:34:29 +00:00
Tom Stellard	b93460fa62	AMDGPU/GlobalISel: Implement select() for COPY Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46151 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335315 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-22 00:44:29 +00:00
Tom Stellard	0225aa982f	AMDGPU/GlobalISel: Implement select() for G_IMPLICIT_DEF Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46150 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335307 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-21 23:38:20 +00:00
Konstantin Zhuravlyov	13b063fc36	AMDGPU: Remove ability to reserve VGPRs for debugger Differential Revision: https://reviews.llvm.org/D48234 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335288 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-21 20:28:19 +00:00
Scott Linder	43cbf8d92e	[AMDGPU] Update assembler for HSA Code Object v3 Update AMDGPU assembler syntax behind the code-object-v3 feature: * Replace/rename most AMDGPU assembler directives/symbols and document them. * Provide more diagnostics (e.g. values out of range, missing values, repeated values). * Provide path for backwards compatibility, even with underlying descriptor changes. Differential Revision: https://reviews.llvm.org/D47736 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335281 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-21 19:38:56 +00:00
Stanislav Mekhanoshin	8f072ac5b4	DAG combine "and\|or (select c, -1, 0), x" -> "select c, x, 0\|-1" Allowed folding for "and/or" binops with non-constant operand if arguments of select are 0/-1 values. Normally this code with "and" opcode does not get to a DAG combiner and simplified yet in the InstCombine. However AMDGPU produces it during lowering and InstCombine has no chance to optimize it out. In turn the same pattern with "or" opcode can reach DAG. Differential Revision: https://reviews.llvm.org/D48301 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335250 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-21 16:02:05 +00:00
Nicolai Haehnle	4c3fa871b5	AMDGPU: Remove old-style image intrinsics Summary: This also removes the need for atomic pseudo instructions, since we select the correct encoding directly in SITargetLowering::lowerImage for dimension-aware image intrinsics. Mesa uses dimension-aware image intrinsics since commit a9a7993441. Change-Id: I7473d20009476a4ed6d919cae4e6dca9ff42e77a Reviewers: arsenm, rampitec, mareko, tpr, b-sumner Subscribers: kzhuravl, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48167 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335231 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-21 13:37:45 +00:00
Nicolai Haehnle	330c65751e	AMDGPU: Convert test cases to the dimension-aware intrinsics Summary: Also explicitly port over some tests in llvm.amdgcn.image.* that were missing. Some tests are removed because they no longer apply (i.e. explicitly testing building an address vector via insertelement). This is in preparation for the eventual removal of the old-style intrinsics. Some additional notes: - constant-address-space-32bit.ll: change some GCN-NEXT to GCN because the instruction schedule was subtly altered - insert_vector_elt.ll: the old test didn't actually test anything, because %tmp1 was not used; remove the load, because it doesn't work (Because of the amdgpu_ps calling convention? In any case, it's orthogonal to what the test claims to be testing.) Change-Id: Idfa99b6512ad139e755e82b8b89548ab08f0afcf Reviewers: arsenm, rampitec Subscribers: MatzeB, qcolombet, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D48018 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335229 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-21 13:37:19 +00:00
Nicolai Haehnle	d42a61b0d4	AMDGPU: Select MIMG instructions manually in SITargetLowering Summary: Having TableGen patterns for image intrinsics is hitting limitations: for D16 we already have to manually pre-lower the packing of data values, and we will have to do the same for A16 eventually. Since there is already some custom C++ code anyway, it is arguably easier to just do everything in C++, now that we can use the beefed-up generic tables backend of TableGen to provide all the required metadata and map intrinsics to corresponding opcodes. With this approach, all image intrinsic lowering happens in SITargetLowering::lowerImage. That code is dense due to all the cases that it handles, but it should still be easier to follow than what we had before, by virtue of it all being done in a single location, and by virtue of not relying on the TableGen pattern magic that very few people really understand. This means that we will have MachineSDNodes with MIMG instructions during DAG combining, but that seems alright: previously we had intrinsic nodes instead, but those are similarly opaque to the generic CodeGen infrastructure, and the final pattern matching just did a 1:1 translation to machine instructions anyway. If anything, the fact that we now merge the address words into a vector before DAG combine should be an advantage. Change-Id: I417f26bd88f54ce9781c1668acc01f3f99774de6 Reviewers: arsenm, rampitec, rtaylor, tstellar Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48017 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335228 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-21 13:36:57 +00:00
Nicolai Haehnle	6694c8fd84	AMDGPU: Add implicit def of SCC to kill and indirect pseudos Summary: Kill instructions sometimes do use SCC in unusual circumstances, when v_cmpx cannot be used due to the operands that are involved. Additionally, even if SCC was never defined by the expansion, kill pseudos could previously occur between an s_cmp and an s_cbranch_scc, which breaks the SCC liveness tracking when the pseudo is expanded to split the basic block. While it would be possible to explicitly mark the SCC as live-in for the successor basic block, it's simpler to just mark the pseudo as using SCC, so that such a sequence is never emitted by instruction selection in the first place. A similar issue affects indirect source/dest pseudos in principle, although I haven't been able to come up with a test case where it actually matters (this affects instruction selection, so a MIR test can't be used). Fixes: dEQP-GLES3.functional.shaders.discard.dynamic_loop_always Change-Id: Ica8d82ecff1a763b892a1112cf1b06c948863a4f Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47761 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335223 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-21 13:36:08 +00:00
Nicolai Haehnle	3bd8feb970	AMDGPU: Turn D16 for MIMG instructions into a regular operand Summary: This allows us to reduce the number of different machine instruction opcodes, which reduces the table sizes and helps flatten the TableGen multiclass hierarchies. We can do this because for each hardware MIMG opcode, we have a full set of IMAGE_xxx_Vn_Vm machine instructions for all required sizes of vdata and vaddr registers. Instead of having separate D16 machine instructions, a packed D16 instructions loading e.g. 4 components can simply use the same V2 opcode variant that non-D16 instructions use. We still require a TSFlag for D16 buffer instructions, because the D16-ness of buffer instructions is part of the opcode. Renaming the flag should help avoid future confusion. The one non-obvious code change is that for gather4 instructions, the disassembler can no longer automatically decide whether to use a V2 or a V4 variant. The existing logic which choose the correct variant for other MIMG instruction is extended to cover gather4 as well. As a bonus, some of the assembler error messages are now more helpful (e.g., complaining about a wrong data size instead of a non-existing instruction). While we're at it, delete a whole bunch of dead legacy TableGen code. Change-Id: I89b02c2841c06f95e662541433e597f5d4553978 Reviewers: arsenm, rampitec, kzhuravl, artem.tamazov, dp, rtaylor Subscribers: wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47434 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335222 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-21 13:36:01 +00:00
Alina Sbirlea	8d9ac7ff94	Generalize MergeBlockIntoPredecessor. Replace uses of MergeBasicBlockIntoOnlyPred. Summary: Two utils methods have essentially the same functionality. This is an attempt to merge them into one. 1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred 2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor Prior to the patch: 1. MergeBasicBlockIntoOnlyPred Updates either DomTree or DeferredDominance Moves all instructions from Pred to BB, deletes Pred Asserts BB has single predecessor If address was taken, replace the block address with constant 1 (?) 2. MergeBlockIntoPredecessor Updates DomTree, LoopInfo and MemoryDependenceResults Moves all instruction from BB to Pred, deletes BB Returns if doesn't have a single predecessor Returns if BB's address was taken After the patch: Method 2. MergeBlockIntoPredecessor is attempting to become the new default: Updates DomTree or DeferredDominance, and LoopInfo and MemoryDependenceResults Moves all instruction from BB to Pred, deletes BB Returns if doesn't have a single predecessor Returns if BB's address was taken Uses of MergeBasicBlockIntoOnlyPred that need to be replaced: 1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp Updated in this patch. No challenges. 2. lib/CodeGen/CodeGenPrepare.cpp Updated in this patch. i. eliminateFallThrough is straightforward, but I added using a temporary array to avoid the iterator invalidation. ii. eliminateMostlyEmptyBlock(s) methods also now use a temporary array for blocks Some interesting aspects: - Since Pred is not deleted (BB is), the entry block does not need updating. - The entry block was being updated with the deleted block in eliminateMostlyEmptyBlock. Added assert to make obvious that BB=SinglePred. - isMergingEmptyBlockProfitable assumes BB is the one to be deleted. - eliminateMostlyEmptyBlock(BB) does not delete BB on one path, it deletes its unique predecessor instead. - adding some test owner as subscribers for the interesting tests modified: test/CodeGen/X86/avx-cmp.ll test/CodeGen/AMDGPU/nested-loop-conditions.ll test/CodeGen/AMDGPU/si-annotate-cf.ll test/CodeGen/X86/hoist-spill.ll test/CodeGen/X86/2006-11-17-IllegalMove.ll 3. lib/Transforms/Scalar/JumpThreading.cpp Not covered in this patch. It is the only use case using the DeferredDominance. I would defer to Brian Rzycki to make this replacement. Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr, kbarton, RKSimon, wmi, arsenm, llvm-commits Differential Revision: https://reviews.llvm.org/D48202 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335183 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-20 22:01:04 +00:00
Stanislav Mekhanoshin	3e703da257	Allow binop C1, (select cc, CF, CT) -> select folding Previously this folding was done only if select is a first operand. However, for non-commutative operations constant may go before select. Differential Revision: https://reviews.llvm.org/D48223 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335167 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-20 20:24:20 +00:00
Matt Arsenault	eb6f9a11db	AMDGPU: Fix scalar_to_vector for v4i16/v4f16 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335161 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-20 19:45:48 +00:00
Michael Berg	56057ccc17	Utilize new SDNode flag functionality to expand current support for fadd Summary: This patch originated from D46562 and is a proper subset, with some issues addressed. Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar Reviewed By: spatel Subscribers: wdng, nhaehnle Differential Revision: https://reviews.llvm.org/D47909 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334996 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-18 23:44:59 +00:00
Krzysztof Parzyszek	db53e01a53	Shrink interval after moving copy in removePartialRedundancy git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334963 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-18 17:16:39 +00:00
Stanislav Mekhanoshin	ec4b3c5670	[AMDGPU] setcc (select cc, CT, CF), CF, eq \| ne -> xor cc, -1 \| cc This is the common case in the BE when we serialize condition and then rematerialize it. Use either original or inverted condition. Differential Revision: https://reviews.llvm.org/D48246 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334882 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-16 03:46:59 +00:00
Michael Berg	97de3c8816	Utilize new SDNode flag functionality to expand current support for fdiv Summary: This patch originated from D46562 and is a proper subset, with some issues addressed. Reviewers: spatel, hfinkel, wristow, arsenm Reviewed By: spatel Subscribers: wdng, nhaehnle Differential Revision: https://reviews.llvm.org/D47954 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334862 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-15 20:44:55 +00:00
Matt Arsenault	bd0b6b0e98	AMDGPU: Add combine for short vector extract_vector_elts Try to access pieces 4 bytes at a time. This helps various hasOneUse extract_vector_elt combines, such as load width reductions. Avoids test regressions in a future commit. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334836 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-15 15:31:36 +00:00
Matt Arsenault	9e41f5314e	AMDGPU: Make v4i16/v4f16 legal Some image loads return these, and it's awkward working around them not being legal. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334835 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-15 15:15:46 +00:00
Matt Arsenault	bd512dd0f9	DAG: Fix creating concat_vectors with illegal type Test passes as is, but fails with future patch to make v4i16/v4f16 legal. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334823 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-15 12:09:15 +00:00
Roman Lebedev	f2c20b5ace	[AMDGPU] Recognize x & ~(-1 << y) pattern. Summary: The same pattern as D48010, but this one is IR-canonical as of D47428. Reviewers: nhaehnle, bogner, tstellar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48012 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334817 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-15 09:56:45 +00:00
Roman Lebedev	fc84800456	[AMDGPU] Recognize x & ((1 << y) - 1) pattern. Summary: As a followup for D48007. Since we already handle `x << (bitwidth - y) >> (bitwidth - y)` pattern, which does not have ub for both the edge cases (`y == 0`, `y == bitwidth`), i think also handling a pattern that is ub for `y == bitwidth` should be fine. Reviewers: nhaehnle, bogner, tstellar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48010 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334816 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-15 09:56:39 +00:00
Roman Lebedev	1d9a02a498	[AMDGPU] Recognize x & (-1 >> (32 - y)) pattern. Summary: D47980 will canonicalize the `x << (32 - y) >> (32 - y)`, which is the pattern the AMDGPU expects to `x & (-1 >> (32 - y))`, which is not recognized by AMDGPU. Thus, it needs to be recognized, too. Reviewers: nhaehnle, bogner, tstellar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48007 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334815 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-15 09:56:31 +00:00
Tom Stellard	b4220f48ed	AMDGPU/GlobalISel: Implement select() for @llvm.amdgcn.cvt.pkrtz Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45907 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334757 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-14 19:26:37 +00:00
Tom Stellard	2cf1b47d89	AMDGPU/GlobalISel: Implement select() for 32-bit G_FADD and G_FMUL Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46171 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334665 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-13 22:30:47 +00:00
Stanislav Mekhanoshin	822ea1bfe8	[AMDGPU] Corrected computeKnownBits for V_PERM_B32 Differential Revision: https://reviews.llvm.org/D48133 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334640 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-13 18:52:54 +00:00
Yaxun Liu	350359838b	[AMDGPU] Change enqueue kernel handle type Currently the handle type is a global pointer which holds 8 bytes. We need a larger type which hold 16 bytes, therefore change it to [i64 x 2]. Differential Revision: https://reviews.llvm.org/D48094 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334625 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-13 17:31:51 +00:00
Krzysztof Parzyszek	be100d0a8d	Revert "Improve handling of COPY instructions with identical value numbers" This reverts r334594, it breaks buildbots and fails with expensive checks. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334598 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-13 13:49:06 +00:00

1 2 3 4 5 ...

1671 Commits