archived-llvm

mirror of https://github.com/RPCS3/llvm.git synced 2026-01-31 01:25:19 +01:00

Author	SHA1	Message	Date
Stanislav Mekhanoshin	ec4b3c5670	[AMDGPU] setcc (select cc, CT, CF), CF, eq | ne -> xor cc, -1 | cc This is the common case in the BE when we serialize condition and then rematerialize it. Use either original or inverted condition. Differential Revision: https://reviews.llvm.org/D48246 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334882 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-16 03:46:59 +00:00
Matt Arsenault	bd0b6b0e98	AMDGPU: Add combine for short vector extract_vector_elts Try to access pieces 4 bytes at a time. This helps various hasOneUse extract_vector_elt combines, such as load width reductions. Avoids test regressions in a future commit. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334836 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-15 15:31:36 +00:00
Matt Arsenault	9e41f5314e	AMDGPU: Make v4i16/v4f16 legal Some image loads return these, and it's awkward working around them not being legal. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334835 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-15 15:15:46 +00:00
Roman Lebedev	f2c20b5ace	[AMDGPU] Recognize x & ~(-1 << y) pattern. Summary: The same pattern as D48010, but this one is IR-canonical as of D47428. Reviewers: nhaehnle, bogner, tstellar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48012 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334817 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-15 09:56:45 +00:00
Roman Lebedev	fc84800456	[AMDGPU] Recognize x & ((1 << y) - 1) pattern. Summary: As a followup for D48007. Since we already handle `x << (bitwidth - y) >> (bitwidth - y)` pattern, which does not have ub for both the edge cases (`y == 0`, `y == bitwidth`), i think also handling a pattern that is ub for `y == bitwidth` should be fine. Reviewers: nhaehnle, bogner, tstellar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48010 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334816 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-15 09:56:39 +00:00
Roman Lebedev	1d9a02a498	[AMDGPU] Recognize x & (-1 >> (32 - y)) pattern. Summary: D47980 will canonicalize the `x << (32 - y) >> (32 - y)`, which is the pattern the AMDGPU expects to `x & (-1 >> (32 - y))`, which is not recognized by AMDGPU. Thus, it needs to be recognized, too. Reviewers: nhaehnle, bogner, tstellar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48007 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334815 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-15 09:56:31 +00:00
Tom Stellard	b4220f48ed	AMDGPU/GlobalISel: Implement select() for @llvm.amdgcn.cvt.pkrtz Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45907 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334757 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-14 19:26:37 +00:00
Tom Stellard	2cf1b47d89	AMDGPU/GlobalISel: Implement select() for 32-bit G_FADD and G_FMUL Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46171 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334665 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-13 22:30:47 +00:00
Stanislav Mekhanoshin	822ea1bfe8	[AMDGPU] Corrected computeKnownBits for V_PERM_B32 Differential Revision: https://reviews.llvm.org/D48133 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334640 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-13 18:52:54 +00:00
Yaxun Liu	350359838b	[AMDGPU] Change enqueue kernel handle type Currently the handle type is a global pointer which holds 8 bytes. We need a larger type which hold 16 bytes, therefore change it to [i64 x 2]. Differential Revision: https://reviews.llvm.org/D48094 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334625 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-13 17:31:51 +00:00
Dmitry Preobrazhensky	3ab6afc6d2	[AMDGPU][MC] Enabled parsing of relocations on VALU instructions See bug 37566: https://bugs.llvm.org/show_bug.cgi?id=37566 Reviewers: artem.tamazov, arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D47884 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334622 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-13 17:02:03 +00:00
Dmitry Preobrazhensky	d77bb599aa	[AMDGPU][MC][GFX8][GFX9] Allow LDS direct reads for BUFFER_LOAD_DWORDX2/X3/X4 See bug 37653: https://bugs.llvm.org/show_bug.cgi?id=37653 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D47885 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334609 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-13 15:32:46 +00:00
Tom Stellard	98e05fe529	AMDGPU: Move isSDNodeSourceOfDivergence() implementation to SITargetLowering Summary: The code that handles ISD:Register and ISD::CopyFromReg assumes the target is amdgcn, so this is broken on r600. We don't need this analysis on r600 anyway so we can safely move it to SITargetLowering. Reviewers: alex-t, arsenm, nhaehnle Reviewed By: arsenm Subscribers: msearles, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46298 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334607 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-13 15:06:37 +00:00
Stanislav Mekhanoshin	6c5eb4370b	[AMDGPU] DAG combine to produce V_PERM_B32 Differential Revision: https://reviews.llvm.org/D48099 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334559 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-12 23:50:37 +00:00
Konstantin Zhuravlyov	0397fe5863	AMDHSA/NFC: Code object v3 updates (additional): - Move section selection and alignment to AMDGPUAsmPrinter git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334521 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-12 18:33:51 +00:00
Konstantin Zhuravlyov	299cf5ff6a	AMDHSA: Code object v3 updates - Do not emit following assembler directives: - .hsa_code_object_version - .hsa_code_object_isa - .amd_amdgpu_isa - .amd_amdgpu_hsa_metadata - .amd_amdgpu_pal_metadata - Do not emit .note entries - Cleanup and bring in sync kernel descriptor header file - Emit kernel descriptor into .rodata with appropriate relocations and alignments git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334519 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-12 18:02:46 +00:00
Mark Searles	8f93b43810	[AMDGPU] prevent hitting Assertion `isReg() && "Wrong MachineOperand accessor"' The use iterator, used within findMaskOperands(), can return anything which is not a def. isUse() requires a register, so check isReg() before calling isUse(). Differential Revision: https://reviews.llvm.org/D48047 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334459 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-12 00:41:26 +00:00
George Burgess IV	23bfaae29f	Simplify; NFC Not shown in the diff: AQ is a `vector<SUnit *>`, and SU is a `SUnit *` git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334451 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-11 22:58:32 +00:00
Konstantin Zhuravlyov	db963fc642	AMDGPU: Add 64-bit relative variant kind Differential Revision: https://reviews.llvm.org/D47601 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334443 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-11 21:37:57 +00:00
Stanislav Mekhanoshin	7a3f751cbc	[AMDGPU] Do not consider indirect access through phi for wave limiter Rationale: if there is indirect access that is usually an issue because load is not ready by the use. However, if use is inside a loop and load is outside that is potentially an issue for a first iteration only. Differential Revision: https://reviews.llvm.org/D47740 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334420 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-11 16:50:49 +00:00
Daniil Fukalov	8fbd41bc43	[AMDGPU] Inline asm - added i16, half and i128 types support AMDGPU inline assembler support i16, half and i128 typed variables in constraints, but they were reported as error. Needed to fix https://github.com/RadeonOpenCompute/ROCm/issues/341, e.g. to be able to load with global_load_dwordx4 to a 128bit integer variable Differential Revision: https://reviews.llvm.org/D44920 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334301 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-08 16:29:04 +00:00
Matt Arsenault	0aedafefd1	AMDGPU: Error on LDS global address in functions These won't work as expected now, so error on them to avoid wasting time debugging this in the future. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334269 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-08 08:05:54 +00:00
Tony Tye	469bc504d0	[AMDGPU] Simplify memory legalizer (add missing virtual destructor) Differential Revision: https://reviews.llvm.org/D47504 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334257 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-08 01:00:11 +00:00
Tony Tye	1310a7556e	[AMDGPU] Simplify memory legalizer - Make code easier to maintain. - Avoid generating waitcnts for VMEM if the address space does not involve VMEM. - Add support to generate waitcnts for LDS and GDS memory. Differential Revision: https://reviews.llvm.org/D47504 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334241 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-07 22:28:32 +00:00
Matt Arsenault	4dea9c2811	AMDGPU: Fix not including v2f64 in SReg_128 Fixes assertion with calls returning v2f64. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334189 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-07 12:16:31 +00:00
Matt Arsenault	86e569e7c6	AMDGPU: Use scalar operations for f16 fabs/fneg patterns Fixes unnecessary differences between subtargets. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334184 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-07 10:15:20 +00:00
Matt Arsenault	75c4f68f53	AMDGPU: Try a lot harder to emit scalar loads This has two main components. First, widen short constant loads in DAG when they have the correct alignment. This is already done a bit in AMDGPUCodeGenPrepare, since that has access to DivergenceAnalysis. This can't help kernarg loads created in the DAG. Start to use DAG divergence analysis to help this case. The second part is to avoid kernel argument lowering breaking the alignment of short vector elements because calling convention lowering wants to split everything into legal register types. When loading a split type, load the nearest 4-byte aligned segment and shift to get the desired bits. This extra load of the earlier argument piece ends up merging, and the bit extract hopefully folds out. There are a number of improvements and regressions with this, but I think as-is this is a better compromise between several of the worst parts of SelectionDAG. Particularly when i16 is legal, this produces worse code for i8 and i16 element vector kernel arguments. This is partially due to the very weak load merging the DAG does. It only looks for fairly specific combines between pairs of loads which no longer appear. In particular this causes v4i16 loads to be split into 2 components when previously the two halves were merged. Worse, because of the newly introduced shifts, there is a lot more unnecessary vector packing and unpacking code emitted. At least some of this is due to reporting false for isTypeDesirableForOp for i16 as a workaround for the lack of divergence information in the DAG. The cases where this happens it doesn't actually matter, but the relevant code in SimplifyDemandedBits doesn't have the context to know to ignore this. The use of the scalar cache is probably more important than the mess of mostly scalar instructions doing this packing and unpacking. 
Future work can fix this, possibly by making better use of the new DAG divergence information for controlling promotion decisions, or adding another version of shift + trunc + shift combines that doesn't only know about the used types. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334180 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-07 09:54:49 +00:00
Stanislav Mekhanoshin	ded60aa7f8	[AMDGPU] Improve reciprocal handling When denormals are supported we are producing a full division for 1.0f / x. That still can be replaced by the faster version: bool c = fabs(x) > 0x1.0p+96f; float s = c ? 0x1.0p-32f : 1.0f; x *= s; return s * v_rcp_f32(x) in case if requested accuracy is 2.5ulp or less. The same version is used if denormals are not supported for non 1.0 numerators, where just v_rcp_f32 is then used for 1.0 numerator. The optimization of 1/x is extended to the case -1/x, which is the same except for the resulting sign bit. OpenCL conformance passed with both enabled and disabled denorms. Differential Revision: https://reviews.llvm.org/D47805 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334142 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-06 22:22:32 +00:00
Matt Arsenault	2d78ae2ab1	AMDGPU: Custom lower v2f16 fneg/fabs with illegal f16 Fixes terrible code on targets without f16 support. The legalization creates a mess that is difficult to recover from. Also should avoid randomly breaking these tests multiple times in sequence in future commits. Some regressions in cases where it happens to be better to pull the source modifier after the conversion. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334132 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-06 21:28:11 +00:00
Peter Smith	e2b2a91087	[MC] Pass MCSubtargetInfo to fixupNeedsRelaxation and applyFixup On targets like Arm some relaxations may only be performed when certain architectural features are available. As functions can be compiled with differing levels of architectural support we must make a judgement on whether we can relax based on the MCSubtargetInfo for the function. This change passes through the MCSubtargetInfo for the function to fixupNeedsRelaxation so that the decision on whether to relax can be made per function. In this patch, only the ARM backend makes use of this information. We must also pass the MCSubtargetInfo to applyFixup because some fixups skip error checking on the assumption that relaxation has occurred, to prevent code-generation errors applyFixup must see the same MCSubtargetInfo as fixupNeedsRelaxation. Differential Revision: https://reviews.llvm.org/D44928 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334078 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-06 09:40:06 +00:00
Matt Arsenault	19dfb4b388	AMDGPU: Preserve metadata when widening loads Preserves the low bound of the !range. I don't think it's legal to do anything with the top half since it's theoretically reading garbage. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334045 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-05 19:52:56 +00:00
Matt Arsenault	0c5ab47e3b	AMDGPU: Use more custom insert/extract_vector_elt lowering Apply to i8 vectors. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334044 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-05 19:52:46 +00:00
David Blaikie	8325fb20d4	Move Analysis/Utils/Local.h back to Transforms Review feedback from r328165. Split out just the one function from the file that's used by Analysis. (As chandlerc pointed out, the original change only moved the header and not the implementation anyway - which was fine for the one function that was used (since it's a template/inlined in the header) but not in general) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333954 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-04 21:23:21 +00:00
Stanislav Mekhanoshin	9136eac979	[AMDGPU] Small refactoring in the scheduler After last changes some code can be simplified. Differential Revision: https://reviews.llvm.org/D47661 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333934 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-04 17:57:40 +00:00
Stanislav Mekhanoshin	222c71a39d	[AMDGPU] Factored out common part of GCNRPTracker::reset() Differential Revision: https://reviews.llvm.org/D47664 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333931 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-04 17:21:54 +00:00
Mark Searles	8177aafa74	[AMDGPU][Waitcnt] Fix handling of flat instrs On GFX9 and earlier, flat memory ops may decrement VMCNT out-of-order as well as LGKMCNT out-of-order. Differential Revision: https://reviews.llvm.org/D46616 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333926 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-04 16:51:59 +00:00
Nicolai Haehnle	c5e2005321	AMDGPU: Make various NamedOperands upper case Summary: Avoid name clashes with the corresponding bit fields in the instruction encoding. Change-Id: Id1644e703e976e78f7af93788d9f44cb48c3251f Reviewers: arsenm, rampitec, kzhuravl Subscribers: wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47433 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333905 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-04 14:45:20 +00:00
Nicolai Haehnle	26db53e38e	TableGen: Streamline the semantics of NAME Summary: The new rules are straightforward. The main rules to keep in mind are: 1. NAME is an implicit template argument of class and multiclass, and will be substituted by the name of the instantiating def/defm. 2. The name of a def/defm in a multiclass must contain a reference to NAME. If such a reference is not present, it is automatically prepended. And for some additional subtleties, consider these: 3. defm with no name generates a unique name but has no special behavior otherwise. 4. def with no name generates an anonymous record, whose name is unique but undefined. In particular, the name won't contain a reference to NAME. Keeping rules 1&2 in mind should allow a predictable behavior of name resolution that is simple to follow. The old "rules" were rather surprising: sometimes (but not always), NAME would correspond to the name of the toplevel defm. They were also plain bonkers when you pushed them to their limits, as the old version of the TableGen test case shows. Having NAME correspond to the name of the toplevel defm introduces "spooky action at a distance" and breaks composability: refactoring the upper layers of a hierarchy of nested multiclass instantiations can cause unexpected breakage by changing the value of NAME at a lower level of the hierarchy. The new rules don't suffer from this problem. Some existing .td files have to be adjusted because they ended up depending on the details of the old implementation. Change-Id: I694095231565b30f563e6fd0417b41ee01a12589 Reviewers: tra, simon_tatham, craig.topper, MartinO, arsenm, javed.absar Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D47430 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333900 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-04 14:26:05 +00:00
Amaury Sechet	876db10e96	Set ADDE/ADDC/SUBE/SUBC to expand by default Summary: They've been deprecated in favor of UADDO/ADDCARRY or USUBO/SUBCARRY for a while. Target that uses these opcodes are changed in order to ensure their behavior doesn't change. Reviewers: efriedma, craig.topper, dblaikie, bkramer Subscribers: jholewinski, arsenm, jyknight, sdardis, nemanjai, nhaehnle, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D47422 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333748 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-01 13:21:33 +00:00
Tom Stellard	c54b4ce0c3	AMDGPU/R600: Move intrinsics to IntrinsicsAMDGPU.td Reviewers: arsenm, nhaehnle, jvesely Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D47487 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333720 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-01 02:19:46 +00:00
Stanislav Mekhanoshin	e5240df77a	[AMDGPU] Construct memory clauses before RA Memory clauses are formed into bundles in presence of xnack. Their source operands are marked as early-clobber. This allows to allocate distinct source and destination registers within a clause and prevent breaking the clause with s_nop in the hazard recognizer. Clauses are undone before post-RA scheduler to allow some rescheduling, which will not break the clause since artificial edges are created in the dag to keep memory operations together. Yet this allows a better ILP in some cases. Differential Revision: https://reviews.llvm.org/D47511 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333691 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-31 20:13:51 +00:00
Roman Tereshin	99f1f92235	[GlobalISel][AMDGPU] LegalizerInfo verifier: Adding LegalizerInfo::verify(...) call for AMDGPU Reviewers: aemerson, qcolombet Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D46339 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333664 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-31 16:16:48 +00:00
Stanislav Mekhanoshin	10bf29495f	[AMDGPU] Track occupancy in MFI Keep track of achieved occupancy in SIMachineFunctionInfo. At the moment we have a lot of duplicated or even missed code to query and maintain occupancy info. Record it in the MFI and query in a single call. Interfaces: - getOccupancy() - returns current recorded achieved occupancy. - getMinAllowedOccupancy() - returns lesser of the achieved occupancy and the lowest occupancy we are ready to tolerate. For example if a kernel is memory bound we are ready to tolerate 4 waves. - limitOccupancy() - record occupancy level if we have to lower it. - increaseOccupancy() - record occupancy if scheduler managed to increase the occupancy. MFI takes care of integrating different checks affecting occupancy, including LDS use and waves-per-eu attribute. Note that scheduler starts with not yet known register pressure, so has to record either limit or increase in occupancy after it is done. Later passes can just query a resulting value. New interface is used in the active scheduler and NFC wrt its work. Changes are also made to experimental schedulers to use it and record an occupancy after they are done. Before the change waves-per-eu was ignored by experimental schedulers and tolerance window for memory bound kernels was not used. Differential Revision: https://reviews.llvm.org/D47509 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333629 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-31 05:36:04 +00:00
Jan Vesely	655741c807	AMDGPU/R600: Make sure functions are cacheline aligned v2: use "ensureAlignment" make functions cache line aligned Fixes GPU hangs since r333219: "AMDGPU: Split R600 AsmPrinter code into its own class" Differential Revision: https://reviews.llvm.org/D47516 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333622 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-31 04:08:08 +00:00
Tom Stellard	e0c801c31a	AMDGPU: Split AMDGPUTTI into GCNTTI and R600TTI Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D47359 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333605 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-30 22:55:35 +00:00
Mark Searles	b7ba560cf9	[AMDGPU][Waitcnt] Fix build error: unused variable 'SWaitInst' https://reviews.llvm.org/rL333556 caused a buildbot failure. See http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/21876/steps/build_Lld/logs/stdio /Users/buildslave/as-bldslv9/lld-x86_64-darwin13/llvm.src/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:2007:10: error: unused variable 'SWaitInst' [-Werror,-Wunused-variable] auto SWaitInst = BuildMI(EntryBB, EntryBB.getFirstNonPHI(), The unused variable was for debugging purposes; removing that piece of code to fix the build. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333559 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-30 16:27:57 +00:00
Matt Arsenault	664e38cb13	AMDGPU: Use better alignment for kernarg lowering This was just emitting loads with the ABI alignment for the raw type. The true alignment is often better, especially when an illegal vector type was scalarized. The better alignment allows using a scalar load more often. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333558 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-30 16:17:51 +00:00
Mark Searles	b81ef9cebe	[AMDGPU][Waitcnt] Fix handling of loops with many bottom blocks In terms of waitcnt insertion/if necessary, the waitcnt pass forces convergence for a loop. Previously, that kicked if greater than 2 passes over a loop, which doesn't account for loop with many bottom blocks. So, increase the threshold to (n+1), where n is the number of bottom blocks. This gives the pass an opportunity to consider the contribution of each bottom block, to the overall loop, before the forced convergence potentially kicks in. Differential Revision: https://reviews.llvm.org/D47488 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333556 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-30 15:47:45 +00:00
Matt Arsenault	c40f49e881	AMDGPU: Fix typo in option description git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333457 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-29 19:35:46 +00:00
Matt Arsenault	1fa5b55214	AMDGPU: Round up kernel argument allocation size AFAIK the driver's allocation will actually have to round this up anyway. It is useful to track the rounded up size, so that the end of the kernel segment is known to be dereferencable so a wider s_load_dword can be used for a short argument at the end of the segment. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333456 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-29 19:35:00 +00:00
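
The three "Recognize ... pattern" commits above (D48007, D48010, D48012) all describe different spellings of the same operation: keeping the low `y` bits of a 32-bit value. The sketch below is not taken from the LLVM sources. It is a small Python model, assuming 32-bit unsigned arithmetic emulated with an explicit mask, that checks the four spellings agree for `1 <= y <= 31`; the helper names are hypothetical. The edge cases `y == 0` and `y == 32` are left out on purpose, since the commit messages note that some of the patterns are undefined there.

```python
MASK32 = 0xFFFFFFFF  # emulate 32-bit unsigned integers (assumption of this sketch)


def shl32(v, n):
    """32-bit left shift; bits shifted past bit 31 are discarded."""
    return (v << n) & MASK32


def lshr32(v, n):
    """32-bit logical (zero-filling) right shift."""
    return (v & MASK32) >> n


def low_bits_not_shl(x, y):
    """x & ~(-1 << y), the form named in D48012."""
    return x & (~shl32(MASK32, y) & MASK32)


def low_bits_shl_sub(x, y):
    """x & ((1 << y) - 1), the form named in D48010."""
    return x & ((shl32(1, y) - 1) & MASK32)


def low_bits_lshr(x, y):
    """x & (-1 >> (32 - y)), the form named in D48007."""
    return x & lshr32(MASK32, 32 - y)


def low_bits_shift_pair(x, y):
    """x << (32 - y) >> (32 - y), the form the commits say was already handled."""
    return lshr32(shl32(x, 32 - y), 32 - y)


if __name__ == "__main__":
    print(hex(low_bits_shl_sub(0xDEADBEEF, 8)))  # 0xef
```

All four helpers return `x` modulo `2**y` over the tested range, which is why a backend can treat them as one bitfield extract.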


2659 Commits