archived-llvm

mirror of https://github.com/RPCS3/llvm.git synced 2026-01-31 01:25:19 +01:00

Author	SHA1	Message	Date
Michael Berg	56057ccc17	Utilize new SDNode flag functionality to expand current support for fadd Summary: This patch originated from D46562 and is a proper subset, with some issues addressed. Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar Reviewed By: spatel Subscribers: wdng, nhaehnle Differential Revision: https://reviews.llvm.org/D47909 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334996 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-18 23:44:59 +00:00
Krzysztof Parzyszek	db53e01a53	Shrink interval after moving copy in removePartialRedundancy git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334963 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-18 17:16:39 +00:00
Stanislav Mekhanoshin	ec4b3c5670	[AMDGPU] setcc (select cc, CT, CF), CF, eq \| ne -> xor cc, -1 \| cc This is the common case in the BE when we serialize condition and then rematerialize it. Use either original or inverted condition. Differential Revision: https://reviews.llvm.org/D48246 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334882 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-16 03:46:59 +00:00
Michael Berg	97de3c8816	Utilize new SDNode flag functionality to expand current support for fdiv Summary: This patch originated from D46562 and is a proper subset, with some issues addressed. Reviewers: spatel, hfinkel, wristow, arsenm Reviewed By: spatel Subscribers: wdng, nhaehnle Differential Revision: https://reviews.llvm.org/D47954 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334862 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-15 20:44:55 +00:00
Matt Arsenault	bd0b6b0e98	AMDGPU: Add combine for short vector extract_vector_elts Try to access pieces 4 bytes at a time. This helps various hasOneUse extract_vector_elt combines, such as load width reductions. Avoids test regressions in a future commit. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334836 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-15 15:31:36 +00:00
Matt Arsenault	9e41f5314e	AMDGPU: Make v4i16/v4f16 legal Some image loads return these, and it's awkward working around them not being legal. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334835 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-15 15:15:46 +00:00
Matt Arsenault	bd512dd0f9	DAG: Fix creating concat_vectors with illegal type Test passes as is, but fails with future patch to make v4i16/v4f16 legal. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334823 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-15 12:09:15 +00:00
Roman Lebedev	f2c20b5ace	[AMDGPU] Recognize x & ~(-1 << y) pattern. Summary: The same pattern as D48010, but this one is IR-canonical as of D47428. Reviewers: nhaehnle, bogner, tstellar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48012 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334817 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-15 09:56:45 +00:00
Roman Lebedev	fc84800456	[AMDGPU] Recognize x & ((1 << y) - 1) pattern. Summary: As a followup for D48007. Since we already handle `x << (bitwidth - y) >> (bitwidth - y)` pattern, which does not have ub for both the edge cases (`y == 0`, `y == bitwidth`), i think also handling a pattern that is ub for `y == bitwidth` should be fine. Reviewers: nhaehnle, bogner, tstellar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48010 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334816 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-15 09:56:39 +00:00
Roman Lebedev	1d9a02a498	[AMDGPU] Recognize x & (-1 >> (32 - y)) pattern. Summary: D47980 will canonicalize the `x << (32 - y) >> (32 - y)`, which is the pattern the AMDGPU expects to `x & (-1 >> (32 - y))`, which is not recognized by AMDGPU. Thus, it needs to be recognized, too. Reviewers: nhaehnle, bogner, tstellar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48007 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334815 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-15 09:56:31 +00:00
Tom Stellard	b4220f48ed	AMDGPU/GlobalISel: Implement select() for @llvm.amdgcn.cvt.pkrtz Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45907 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334757 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-14 19:26:37 +00:00
Tom Stellard	2cf1b47d89	AMDGPU/GlobalISel: Implement select() for 32-bit G_FADD and G_FMUL Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46171 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334665 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-13 22:30:47 +00:00
Stanislav Mekhanoshin	822ea1bfe8	[AMDGPU] Corrected computeKnownBits for V_PERM_B32 Differential Revision: https://reviews.llvm.org/D48133 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334640 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-13 18:52:54 +00:00
Yaxun Liu	350359838b	[AMDGPU] Change enqueue kernel handle type Currently the handle type is a global pointer which holds 8 bytes. We need a larger type which hold 16 bytes, therefore change it to [i64 x 2]. Differential Revision: https://reviews.llvm.org/D48094 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334625 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-13 17:31:51 +00:00
Krzysztof Parzyszek	be100d0a8d	Revert "Improve handling of COPY instructions with identical value numbers" This reverts r334594, it breaks buildbots and fails with expensive checks. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334598 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-13 13:49:06 +00:00
Krzysztof Parzyszek	eee75ac731	Improve handling of COPY instructions with identical value numbers Differential Revision: https://reviews.llvm.org/D48102 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334594 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-13 12:47:17 +00:00
Stanislav Mekhanoshin	6c5eb4370b	[AMDGPU] DAG combine to produce V_PERM_B32 Differential Revision: https://reviews.llvm.org/D48099 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334559 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-12 23:50:37 +00:00
Konstantin Zhuravlyov	0397fe5863	AMDHSA/NFC: Code object v3 updates (additional): - Move section selection and alignment to AMDGPUAsmPrinter git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334521 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-12 18:33:51 +00:00
Konstantin Zhuravlyov	299cf5ff6a	AMDHSA: Code object v3 updates - Do not emit following assembler directives: - .hsa_code_object_version - .hsa_code_object_isa - .amd_amdgpu_isa - .amd_amdgpu_hsa_metadata - .amd_amdgpu_pal_metadata - Do not emit .note entries - Cleanup and bring in sync kernel descriptor header file - Emit kernel descriptor into .rodata with appropriate relocations and alignments git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334519 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-12 18:02:46 +00:00
Mark Searles	8f93b43810	[AMDGPU] prevent hitting Assertion `isReg() && "Wrong MachineOperand accessor"' The use iterator, used within findMaskOperands(), can return anything which is not a def. isUse() requires a register, so check isReg() before calling isUse(). Differential Revision: https://reviews.llvm.org/D48047 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334459 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-12 00:41:26 +00:00
Stanislav Mekhanoshin	7a3f751cbc	[AMDGPU] Do not consider indirect acces through phi for wave limiter Rational: if there is indirect access that is usually an issue because load is not ready by the use. However, if use is inside a loop and load is outside that is potentially an issue for a first iteration only. Differential Revision: https://reviews.llvm.org/D47740 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334420 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-11 16:50:49 +00:00
Roman Lebedev	8a8a729242	[NFC][AMDGPU] Add tests for all the various IR patterns equivalent to extracting low bits. Summary: The idiom recognition seems rather poor. Only the `@bzhi32_d0` produces `v_bfe_u32`. But they all should. This needs to be fixed before D47980 can be re-landed. Reviewers: mareko, bogner, rampitec, arsenm, tstellar, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48005 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334398 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-11 10:21:10 +00:00
Daniil Fukalov	8fbd41bc43	[AMDGPU] Inline asm - added i16, half and i128 types support AMDGPU inline assembler support i16, half and i128 typed variables in constraints, but they were reported as error. Needed to fix https://github.com/RadeonOpenCompute/ROCm/issues/341, e.g. to be able to load with global_load_dwordx4 to a 128bit integer variable Differential Revision: https://reviews.llvm.org/D44920 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334301 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-08 16:29:04 +00:00
Matt Arsenault	0aedafefd1	AMDGPU: Error on LDS global address in functions These won't work as expected now, so error on them to avoid wasting time debugging this in the future. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334269 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-08 08:05:54 +00:00
Tony Tye	1310a7556e	[AMDGPU] Simplify memory legalizer - Make code easier to maintain. - Avoid generating waitcnts for VMEM if the address sppace does not involve VMEM. - Add support to generate waitcnts for LDS and GDS memory. Differential Revision: https://reviews.llvm.org/D47504 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334241 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-07 22:28:32 +00:00
Matt Arsenault	4dea9c2811	AMDGPU: Fix not including v2f64 in SReg_128 Fixes assertion with calls returning v2f64. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334189 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-07 12:16:31 +00:00
Matt Arsenault	86e569e7c6	AMDGPU: Use scalar operations for f16 fabs/fneg patterns Fixes unnecessary differences between subtargets. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334184 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-07 10:15:20 +00:00
Matt Arsenault	75c4f68f53	AMDGPU: Try a lot harder to emit scalar loads This has two main components. First, widen widen short constant loads in DAG when they have the correct alignment. This is already done a bit in AMDGPUCodeGenPrepare, since that has access to DivergenceAnalysis. This can't help kernarg loads created in the DAG. Start to use DAG divergence analysis to help this case. The second part is to avoid kernel argument lowering breaking the alignment of short vector elements because calling convention lowering wants to split everything into legal register types. When loading a split type, load the nearest 4-byte aligned segment and shift to get the desired bits. This extra load of the earlier argument piece ends up merging, and the bit extract hopefully folds out. There are a number of improvements and regressions with this, but I think as-is this is a better compromise between several of the worst parts of SelectionDAG. Particularly when i16 is legal, this produces worse code for i8 and i16 element vector kernel arguments. This is partially due to the very weak load merging the DAG does. It only looks for fairly specific combines between pairs of loads which no longer appear. In particular this causes v4i16 loads to be split into 2 components when previously the two halves were merged. Worse, because of the newly introduced shifts, there is a lot more unnecessary vector packing and unpacking code emitted. At least some of this is due to reporting false for isTypeDesirableForOp for i16 as a workaround for the lack of divergence information in the DAG. The cases where this happens it doesn't actually matter, but the relevant code in SimplifyDemandedBits doens't have the context to know to ignore this. The use of the scalar cache is probably more important than the mess of mostly scalar instructions doing this packing and unpacking. Future work can fix this, possibly by making better use of the new DAG divergence information for controlling promotion decisions, or adding another version of shift + trunc + shift combines that doesn't only know about the used types. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334180 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-07 09:54:49 +00:00
Stanislav Mekhanoshin	ded60aa7f8	[AMDGPU] Improve reciprocal handling When denormals are supported we are producing a full division for 1.0f / x. That still can be replaced by the faster version: bool c = fabs(x) > 0x1.0p+96f; float s = c ? 0x1.0p-32f : 1.0f; x = s; return s v_rcp_f32(x) in case if requested accuracy is 2.5ulp or less. The same version is used if denormals are not supported for non 1.0 numerators, where just v_rcp_f32 is then used for 1.0 numerator. The optimization of 1/x is extended to the case -1/x, which is the same except for the resulting sign bit. OpenCL conformance passed with both enabled and disabled denorms. Differential Revision: https://reviews.llvm.org/D47805 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334142 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-06 22:22:32 +00:00
Matt Arsenault	2d78ae2ab1	AMDGPU: Custom lower v2f16 fneg/fabs with illegal f16 Fixes terrible code on targets without f16 support. The legalization creates a mess that is difficult to recover from. Also should avoid randomly breaking these tests multiple times in sequence in future commits. Some regressions in cases where it happens to be better to pull the source modifier after the conversion. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334132 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-06 21:28:11 +00:00
Matt Arsenault	19dfb4b388	AMDGPU: Preserve metadata when widening loads Preserves the low bound of the !range. I don't think it's legal to do anything with the top half since it's theoretically reading garbage. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334045 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-05 19:52:56 +00:00
Matt Arsenault	0c5ab47e3b	AMDGPU: Use more custom insert/extract_vector_elt lowering Apply to i8 vectors. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334044 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-05 19:52:46 +00:00
Matt Arsenault	e5fc674090	DAG: Stop dropping invariant/dereferencable When legalizing illegal FP load results, this was for some reason dropping the invariant and dereferencable memory flags. There doesn't seem to be any reason for this, and the equivalent isn't done for integer loads. Fixes an issue in a future AMDGPU commit where some identical loads fail to merge because one of the loads ends up dropping the flags. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334020 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-05 14:52:24 +00:00
Scott Linder	b9cccf44aa	[CodeGen] Always update divergence in SelectionDAG::UpdateNodeOperands Some overloads failed to update divergence. Differential Revision: https://reviews.llvm.org/D47148 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333947 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-04 20:19:45 +00:00
Mark Searles	8177aafa74	[AMDGPU][Waitcnt] Fix handling of flat instrs On GFX9 and earlier, flat memory ops may decrement VMCNT out-of-order as well as LGKMCNT out-of-order. Differential Revision: https://reviews.llvm.org/D46616 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333926 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-04 16:51:59 +00:00
Matt Arsenault	0176ae243c	AMDGPU: Switch some half using-tests to use amdhsa The default clover ABI weirdly promotes half to float, which should probably be fixed. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333730 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-01 07:06:03 +00:00
Stanislav Mekhanoshin	e5240df77a	[AMDGPU] Construct memory clauses before RA Memory clauses are formed into bundles in presence of xnack. Their source operands are marked as early-clobber. This allows to allocate distinct source and destination registers within a clause and prevent breaking the clause with s_nop in the hazard recognizer. Clauses are undone before post-RA scheduler to allow some rescheduling, which will not break the clause since artificial edges are created in the dag to keep memory operations together. Yet this allows a better ILP in some cases. Differential Revision: https://reviews.llvm.org/D47511 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333691 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-31 20:13:51 +00:00
Stanislav Mekhanoshin	93b037daa0	[AMDGPU] Fixed incorrect -mcpu=gfx800 in xnor.ll test. NFC. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333687 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-31 19:39:54 +00:00
Jan Vesely	655741c807	AMDGPU/R600: Make sure functions are cacheline aligned v2: use "ensureAlignment" make functions cache line aligned Fixes GPU hangs since r333219: "AMDGPU: Split R600 AsmPrinter code into its own class" Differential Revision: https://reviews.llvm.org/D47516 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333622 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-31 04:08:08 +00:00
Matt Arsenault	664e38cb13	AMDGPU: Use better alignment for kernarg lowering This was just emitting loads with the ABI alignment for the raw type. The true alignment is often better, especially when an illegal vector type was scalarized. The better alignment allows using a scalar load more often. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333558 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-30 16:17:51 +00:00
Mark Searles	b81ef9cebe	[AMDGPU][Waitcnt] Fix handling of loops with many bottom blocks In terms of waitcnt insertion/if necessary, the waitcnt pass forces convergence for a loop. Previously, that kicked if greater than 2 passes over a loop, which doesn't account for loop with many bottom blocks. So, increase the threshold to (n+1), where n is the number of bottom blocks. This gives the pass an opportunity to consider the contribution of each bottom block, to the overall loop, before the forced convergence potentially kicks in. Differential Revision: https://reviews.llvm.org/D47488 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333556 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-30 15:47:45 +00:00
Matt Arsenault	7f19a98b4c	AMDGPU: Fix broken check lines git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333458 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-29 19:35:53 +00:00
Matt Arsenault	1fa5b55214	AMDGPU: Round up kernel argument allocation size AFAIK the driver's allocation will actually have to round this up anyway. It is useful to track the rounded up size, so that the end of the kernel segment is known to be dereferencable so a wider s_load_dword can be used for a short argument at the end of the segment. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333456 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-29 19:35:00 +00:00
Konstantin Zhuravlyov	840f423383	AMDGPU: Always set COMPUTE_PGM_RSRC2.ENABLE_TRAP_HANDLER to zero for AMDHSA as it is set by CP Differential Revision: https://reviews.llvm.org/D47392 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333451 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-29 19:09:13 +00:00
Tim Renouf	822fd80cbe	[AMDGPU] Fixed WWM bug in block otherwise entirely in WQM Summary: For a block with WQM on entry and exit and containing no exact mode code, but containing some WWM code, the WQM pass forgot to process the block at all and so did not insert code to enter and leave WWM. This commit fixes that. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47027 Change-Id: I044792eead1293bed4203fb26ce75f47878afeb6 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333362 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-27 17:26:11 +00:00
Mark Searles	d900d78107	[AMDGPU][Waitcnt] Remove obsolete waitcnt option With the removal of the old waitcnt pass, the '-enable-si-insert-waitcnts' option is obsolete. Remove it. Differential Revision: https://reviews.llvm.org/D47378 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333303 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-25 20:24:08 +00:00
Stanislav Mekhanoshin	277225f527	[AMDGPU] Add perf hints to functions This is adoption of HSAIL perfhint pass. Two types of hints are produced: 1. Function is memory bound. 2. Kernel can use wave limiter. Currently these hints are used in the scheduler. If a function is suspected to be memory bound we allow occupancy to decrease to 4 waves in the course of scheduling. Differential Revision: https://reviews.llvm.org/D46992 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333289 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-25 17:25:12 +00:00
Tim Renouf	dde1312300	[AMDGPU] Fixed incorrect break from loop Summary: Lower control flow did not correctly handle the case that a loop break in if/else was on a condition that was not guaranteed to be masked by exec. The first test kernel shows an example of this going wrong; after exiting the loop, exec is all ones, even if it was not before the loop. The fix is for lowering of if-break and else-break to insert an S_AND_B64 to mask the break condition with exec. This commit also includes the optimization of not inserting that S_AND_B64 if it is obviously not needed because the break condition is the result of a V_CMP in the same basic block. V2: Addressed some review comments. V3: Test fixes. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D44046 Change-Id: I0fc56a01209a9e99d1d5c9b0ffd16f111caf200c git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333258 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-25 07:55:04 +00:00
Changpeng Fang	310e78bb8f	StructurizeCFG: Adjust the loop depth for a subregion to order the nodes correctly Summary: StructurizeCFG::orderNodes basically uses a reverse post-order (RPO) traversal of the region list to get the order. The only problem with it is that sometimes backedges for outer loops will be visited before backedges for inner loops. To solve this problem, a loop depth based approach has been used to make sure all blocks in this loop has been visited before moving on to outer loop. However, we found a problem for a SubRegion which is a loop itself: --> BB1 --> BB2 --> BB3 --> In this case, BB2 is a SubRegion (loop), and thus its loopdepth is different than that of BB1 and BB3. This fact will lead BB2 to be placed in the wrong order. In this work, we treat the SubRegion as a special case and use its exit block to determine the loop and its depth to guard the sorting. Reviewers: arsenm, jlebar Differential Revision: https://reviews.llvm.org/D46912 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333111 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-23 18:34:48 +00:00
Matt Arsenault	46d48331d6	AMDGPU: Fix missing test coverage for some 16-bit and packed ops git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333024 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-22 20:42:00 +00:00

1 2 3 4 5 ...

1636 Commits