RPCS3/llvm - llvm - Gitea: Git with a cup of tea

RPCS3/llvm

mirror of https://github.com/RPCS3/llvm.git synced 2025-05-23 05:46:05 +00:00

Author	SHA1	Message	Date
Stanislav Mekhanoshin	abcd91992d	[AMDGPU] Unroll more to eliminate phis and conditions Increase threshold to unroll a loop which contains an "if" statement whose condition defined by a PHI belonging to the loop. This may help to eliminate if region and potentially even PHI itself, saving on both divergence and registers used for the PHI. Add a small bonus for each of such "if" statements. Differential Revision: https://reviews.llvm.org/D31693 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299779 91177308-0d34-0410-b5e6-96231b3b80d8	2017-04-07 16:26:28 +00:00
Stanislav Mekhanoshin	e42e44c7e3	[AMDGPU] Boost unroll threshold for loops reading local memory This is less important than increase threshold for private memory, but still brings performance improvements in a wide range of tests. Unrolling more for local memory serves three purposes: it allows to combine ds operations if offset becomes static, saves registers used for offsets in case of static offsets, and allows better lds latency hiding. Differential Revision: https://reviews.llvm.org/D31412 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298948 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-28 22:13:51 +00:00
Yaxun Liu	ab3be33d40	[AMDGPU] Get address space mapping by target triple environment As we introduced target triple environment amdgiz and amdgizcl, the address space values are no longer enums. We have to decide the value by target triple. The basic idea is to use struct AMDGPUAS to represent address space values. For address space values which are not depend on target triple, use static const members, so that they don't occupy extra memory space and is equivalent to a compile time constant. Since the struct is lightweight and cheap, it can be created on the fly at the point of usage. Or it can be added as member to a pass and created at the beginning of the run* function. Differential Revision: https://reviews.llvm.org/D31284 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298846 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-27 14:04:01 +00:00
Changpeng Fang	1970df176f	AMDGPU/SI: Disable unrolling in the loop vectorizer if the loop is not vectorized. Reviewers: arsenm Differential Revision: http://reviews.llvm.org/D30719 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297328 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-09 00:07:00 +00:00
Matt Arsenault	210095c5c9	LoadStoreVectorizer: Split even sized illegal chains properly Implement isLegalToVectorizeLoadChain for AMDGPU to avoid producing private address spaces accesses that will need to be split up later. This was doing the wrong thing in the case where the queried chain was an even number of elements. A possible <4 x i32> store was being split into store <2 x i32> store i32 store i32 rather than store <2 x i32> store <2 x i32> when legal. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295933 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-23 03:58:53 +00:00
Matt Arsenault	a0240d6d1a	AMDGPU: Remove SI_fs_constant and SI_fs_interp intrinsics Update test uses with expansion in terms of new intrinsics. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295269 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-16 02:01:13 +00:00
Stanislav Mekhanoshin	8773751a3d	[AMDGPU] Bump -amdgpu-unroll-threshold-private to 2000 This has quite positive performance impact according to measurements. Before previous fixes to limit the optimization that was too high and blowed compile time and scratch usage, but now this is gone and we can bump the threshold. Differential Revision: https://reviews.llvm.org/D29505 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294032 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-03 20:08:29 +00:00
Matt Arsenault	94b8d93358	AMDGPU: Don't unroll for private with dynamic allocas This won't be elimnated, so this will just bloat code if/when these are ever used/supported. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294030 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-03 19:36:00 +00:00
Stanislav Mekhanoshin	95787fcd27	[AMDGPU] Unroll preferences improvements Exit loop analysis early if suitable private access found. Do not account for GEPs which are invariant to loop induction variable. Do not account for Allocas which are too big to fit into register file anyway. Add option for tuning: -amdgpu-unroll-threshold-private. Differential Revision: https://reviews.llvm.org/D29473 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293991 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-03 02:20:05 +00:00
Matt Arsenault	6f6c6c9128	AMDGPU: Fix atomic_inc/atomic_dec + ds_swizzle not being divergent git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293504 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-30 17:09:47 +00:00
Mohammed Agabaria	9c6b24cc3a	[X86] updating TTI costs for arithmetic instructions on X86\SLM arch. updated instructions: pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd. special optimization case which replaces pmulld with pmullw\pmulhw\pshuf seq. In case if the real operands bitwidth <= 16. Differential Revision: https://reviews.llvm.org/D28104 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291657 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-11 08:23:37 +00:00
Nicolai Haehnle	ddba2a5670	AMDGPU: llvm.amdgcn.interp.mov is a source of divergence Summary: While the result is constant across a single primitive, each pixel shader wave can have pixels from multiple primitives. Reviewers: tstellarAMD, arsenm Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D27572 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289447 91177308-0d34-0410-b5e6-96231b3b80d8	2016-12-12 16:52:19 +00:00
Volkan Keles	9920e54e99	Add new target hooks for LoadStoreVectorizer Summary: Added 6 new target hooks for the vectorizer in order to filter types, handle size constraints and decide how to split chains. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, mzolotukhin, wdng, llvm-commits, nhaehnle Differential Revision: https://reviews.llvm.org/D24727 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283099 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-03 10:31:34 +00:00
Matt Arsenault	4dc595b198	AMDGPU: Implement getLoadStoreVecRegBitWidth git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274312 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-01 00:56:27 +00:00
Matt Arsenault	310a3752c0	AMDGPU: Remove llvm.SI.tid intrinsic Mesa doesn't emit this for llvm >= 3.8 anymore. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273050 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-17 21:18:41 +00:00
Nicolai Haehnle	10ea516563	AMDGPU: llvm.SI.fs.constant is a source of divergence Summary: This intrinsic is used to get flat-shaded fragment shader inputs. Those are uniform across a primitive, but a fragment shader wave may process pixels from multiple primitives (as indicated by the prim_mask), and so that's where divergence can arise. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19747 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@268259 91177308-0d34-0410-b5e6-96231b3b80d8	2016-05-02 17:37:01 +00:00
Nicolai Haehnle	3441786e27	AMDGPU/SI: add llvm.amdgcn.ps.live intrinsic Summary: This intrinsic returns true if the current thread belongs to a live pixel and false if it belongs to a pixel that we are executing only for derivative computation. It will be used by Mesa to implement gl_HelperInvocation. Note that for pixels that are killed during the shader, this implementation also returns true, but it doesn't matter because those pixels are always disabled in the EXEC mask. This unearthed a corner case in the instruction verifier, which complained about a v_cndmask 0, 1, exec, exec<imp-use> instruction. That's stupid but correct code, so make the verifier accept it as such. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19191 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267102 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-22 04:04:08 +00:00
Nicolai Haehnle	ea7a0c0467	AMDGPU: Add a shader calling convention This makes it possible to distinguish between mesa shaders and other kernels even in the presence of compute shaders. Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Differential Revision: http://reviews.llvm.org/D18559 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265589 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-06 19:40:20 +00:00
Matt Arsenault	bb366db643	AMDGPU: Cost model for basic integer operations This resolves bug 21148 by preventing promotion to i64 induction variables. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264376 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-25 01:16:40 +00:00
Matt Arsenault	359a7d918e	AMDGPU: Partially implement getArithmeticInstrCost for FP ops git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264374 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-25 01:00:32 +00:00
Matt Arsenault	42792ff39d	AMDGPU: TTI: Make insertelement free. We don't want to have a cost to scalarizing operations. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264364 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-25 00:14:11 +00:00
Nicolai Haehnle	e9f50a4929	AMDGPU/SI: Add llvm.amdgcn.buffer.atomic.* intrinsics Summary: These intrinsics expose the BUFFER_ATOMIC_* instructions and will be used by Mesa to implement atomics with buffer semantics. The intrinsic interface matches that of buffer.load.format and buffer.store.format, except that the GLC bit is not exposed (it is automatically deduced based on whether the return value is used). The change of hasSideEffects is required for TableGen to accept the pattern that matches the intrinsic. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, rivanvx, llvm-commits Differential Revision: http://reviews.llvm.org/D18151 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263791 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-18 16:24:31 +00:00
Nicolai Haehnle	d3939c80f8	AMDGPU: mark atomic instructions as sources of divergence Summary: As explained by the comment, threads will typically see different values returned by atomic instructions even if the arguments are equal. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18156 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263719 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-17 16:21:59 +00:00
Nicolai Haehnle	555e806a6f	AMDGPU: mark llvm.amdgcn.image.atomic.* as a source of divergence Summary: When multiple threads perform an atomic op with the same arguments, they will usually see different return values. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18101 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263440 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-14 15:37:18 +00:00
Tom Stellard	98ef447825	AMDGPU/SI: Detect uniform branches and emit s_cbranch instructions Reviewers: arsenm Subscribers: mareko, MatzeB, qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16603 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260765 91177308-0d34-0410-b5e6-96231b3b80d8	2016-02-12 23:45:29 +00:00
Matt Arsenault	5911f21abb	AMDGPU: Fix not handling new workitem intrinsics in DivergenceAnalysis git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260491 91177308-0d34-0410-b5e6-96231b3b80d8	2016-02-11 05:32:51 +00:00
Matt Arsenault	1e79dfc2ee	AMDGPU: Fix getRegisterBitWidth for vectors git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@256362 91177308-0d34-0410-b5e6-96231b3b80d8	2015-12-24 05:14:55 +00:00
Tom Stellard	688dd45c47	AMDGPU/SI: Fix implemenation of isSourceOfDivergence() for graphics shaders Summary: The analysis of shader inputs was completely wrong. We were passing the wrong index to AttributeSet::hasAttribute() and the logic for which inputs where in SGPRs was wrong too. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15608 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@256082 91177308-0d34-0410-b5e6-96231b3b80d8	2015-12-19 02:54:15 +00:00
Matt Arsenault	b11dd50509	AMDGPU: Override getCFInstrCost The default cost was 0 with the assumption that it is predictable. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255796 91177308-0d34-0410-b5e6-96231b3b80d8	2015-12-16 18:37:19 +00:00
Tom Stellard	6e2ee6c7ea	AMDGPU/SI: Implement AMDGPUTargetTransformInfo::isSourceOfDivergence() Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15476 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255661 91177308-0d34-0410-b5e6-96231b3b80d8	2015-12-15 18:04:38 +00:00
Matt Arsenault	2320de6adb	AMDGPU: Report extractelement as free in cost model The cost for scalarized operations is computed as N * (scalar operation cost + 1 extractelement + 1 insertelement). This partially fixes inflating the cost of scalarized operations since every operation is scalarized and free. I don't think we want any cost asociated with scalarization, but for now insertelement is still counted. I'm not sure if we should pretend that insertelement is also free, or add a way to compute a custom scalarization cost. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254438 91177308-0d34-0410-b5e6-96231b3b80d8	2015-12-01 19:08:39 +00:00
Tom Stellard	953c681473	R600 -> AMDGPU rename git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239657 91177308-0d34-0410-b5e6-96231b3b80d8	2015-06-13 03:28:10 +00:00

32 Commits