archived-llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2026-01-31 01:35:20 +01:00

Author	SHA1	Message	Date
Jan Sjodin	43e732c065	Fix warnings in r313297. llvm-svn: 313302	2017-09-14 21:49:52 +00:00
Matt Arsenault	a8560d8853	AMDGPU: Fix violating constant bus restriction You can't use madak/madmk if it already uses an SGPR input. llvm-svn: 313298	2017-09-14 20:54:29 +00:00
Jan Sjodin	242b2dcd0b	Add AddressSpace to PseudoSourceValue. Differential Revision: https://reviews.llvm.org/D35089 llvm-svn: 313297	2017-09-14 20:53:51 +00:00
Matt Arsenault	f18ea9e4aa	AMDGPU: Fix assert on alloca of array of struct llvm-svn: 313282	2017-09-14 18:02:29 +00:00
Matt Arsenault	0a3745b2a6	AMDGPU: Stop modifying SP in call sequences Because the stack growth direction and addressing is done in the same direction, modifying SP at the beginning of the call sequence was incorrect. If we had a stack passed argument, we would end up skipping that number of bytes before pushing arguments, leaving unused/inconsistent space. The callee creates fixed stack objects in its frame, so the space necessary for these is already logically allocated in the callee, so we just let the callee increment SP if it really requires it. llvm-svn: 313279	2017-09-14 17:37:40 +00:00
Matt Arsenault	332360a091	AMDGPU: Make frame register caller preserved Using SplitCSR for the frame register was very broken. Often the copies in the prolog and epilog were optimized out, in addition to them being inserted after the true prolog where the FP was clobbered. I have a hacky solution which works that continues to use split CSR, but for now this is simpler and will get to working programs. llvm-svn: 313274	2017-09-14 17:14:57 +00:00
Matt Arsenault	668655a056	AMDGPU: Don't spill SP reg like a normal CSR llvm-svn: 313217	2017-09-13 23:47:01 +00:00
Stanislav Mekhanoshin	fbfa163a41	Allow target to decide when to cluster loads/stores in misched MachineScheduler when clustering loads or stores checks if base pointers point to the same memory. This check is done through comparison of base registers of two memory instructions. This works fine when instructions have separate offset operand. If they require a full calculated pointer such instructions can never be clustered according to such logic. Changed shouldClusterMemOps to accept base registers as well and let it decide what to do about it. Differential Revision: https://reviews.llvm.org/D37698 llvm-svn: 313208	2017-09-13 22:20:47 +00:00
Matt Arsenault	dd4680bcec	AMDGPU: Handle coldcc in more places Missed in r312936 llvm-svn: 313205	2017-09-13 21:55:52 +00:00
Matt Arsenault	c6fab4ecd2	AMDGPU: Allow coldcc calls llvm-svn: 312936	2017-09-11 18:54:20 +00:00
Stanislav Mekhanoshin	ac399c356c	[AMDGPU] Produce madak and madmk from the two-address pass These two instructions are normally selected, but when the two address pass converts mac into mad we end up with the mad where we could have one of these. Differential Revision: https://reviews.llvm.org/D37389 llvm-svn: 312928	2017-09-11 17:13:57 +00:00
Tim Renouf	0ce7d42fef	[AMDGPU] exp should not be in WQM mode A mrt exp with vm=1 must be in exact (non-WQM) mode, as it also exports the exec mask as the valid mask to determine which pixels to render. This commit marks any exp as needing to be in exact mode. Actually, if there are multiple mrt exps, only one needs to have vm=1, and only that one needs to be in exact mode. But that is an optimization for another day. Differential Revision: https://reviews.llvm.org/D36305 llvm-svn: 312915	2017-09-11 13:55:39 +00:00
Tim Renouf	a63255f185	AMDGPU: trivial comment change ... to check commit access for new committer. llvm-svn: 312900	2017-09-11 08:31:32 +00:00
Davide Italiano	eb697977d3	[AMDGPU] Remove unused function. NFCI. llvm-svn: 312836	2017-09-08 23:54:11 +00:00
Matt Arsenault	26a8dd2b88	AMDGPU: Start using !con operator We have a lot of operand definition work essentially producing every valid permutation of operands to work around building operand lists based on the instruction features. Apparently tablegen already has a mostly undocumented operator to concat dags which simplifies this. Convert one simple place to use this. The BUF instruction definitions have much more complicated logic that can be totally rewritten now. llvm-svn: 312822	2017-09-08 19:09:13 +00:00
Matt Arsenault	ce67d359ea	AMDGPU: Recompute scc liveness The various scalar bit operations set SCC, so if one is erased or moved, it needs to be recomputed. Not sure why the existing tests don't fail on this. llvm-svn: 312819	2017-09-08 18:51:26 +00:00
Matt Arsenault	a0c03c6e92	AMDGPU: Start selecting v_mad_mix_f32 llvm-svn: 312732	2017-09-07 18:05:07 +00:00
Konstantin Zhuravlyov	d45f9c8f96	AMDGPU: Handle non-temporal loads and stores Differential Revision: https://reviews.llvm.org/D36862 llvm-svn: 312729	2017-09-07 17:14:54 +00:00
Konstantin Zhuravlyov	693d4420f7	AMDGPU: Handle more than one memory operand in SIMemoryLegalizer Differential Revision: https://reviews.llvm.org/D37397 llvm-svn: 312725	2017-09-07 16:14:21 +00:00
Matt Arsenault	40dcb3af13	AMDGPU: Don't legalize i16 extloads to i32 with legal i16 Keeping non-i16 extloads makes it easier to match some new gfx9 load instructions. llvm-svn: 312699	2017-09-07 05:37:34 +00:00
Stanislav Mekhanoshin	5a54368061	[AMDGPU] Use v_pk_max_f16 for fcanonicalize Differential Revision: https://reviews.llvm.org/D37325 llvm-svn: 312676	2017-09-06 22:27:29 +00:00
Stanislav Mekhanoshin	0361615e65	[AMDGPU] Fixed encoding of v_pk_mul_f16 in fcanonicalize Differential Revision: https://reviews.llvm.org/D37522 llvm-svn: 312660	2017-09-06 18:29:51 +00:00
Stanislav Mekhanoshin	c3fee5e8af	[AMDGPU] Fix shouldClusterMemOps to process flat loads Flat loads do not have vdata operand but have vdst instead. Differential Revision: https://reviews.llvm.org/D37502 llvm-svn: 312640	2017-09-06 15:31:30 +00:00
Nicolai Haehnle	12b9057d7f	AMDGPU: Make worst-case assumption about the wait states in inline assembly Summary: Mesa still uses a hack where empty inline assembly is used as a kind of optimization barrier. This exposed a problem where not enough wait states were inserted, because the hazard recognizer implicitly assumed that each inline assembly "instruction" has at least one wait state. Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D37205 llvm-svn: 312635	2017-09-06 13:50:13 +00:00
Yaxun Liu	97cfe8f552	[AMDGPU] Transform __read_pipe_* and __write_pipe_* When packet size equals packet align and is power of 2, transform __read_pipe* and __write_pipe* to specialized library function. Differential Revision: https://reviews.llvm.org/D36831 llvm-svn: 312598	2017-09-06 00:30:27 +00:00
Konstantin Zhuravlyov	696ac55220	AMDGPU: Cleanup/refactor SIMemoryLegalizer [3]: - Refactor SIMemOpInfo's constructors - Allow construction of NotAtomic SIMemOpInfo Differential Revision: https://reviews.llvm.org/D37396 llvm-svn: 312563	2017-09-05 19:01:10 +00:00
Matt Arsenault	cc1c22b6ad	AMDGPU: Fix not accounting for tail call resource usage If the only call in a function is a tail call, the function isn't considered to have a call since it's a type of return. llvm-svn: 312561	2017-09-05 18:36:36 +00:00
Konstantin Zhuravlyov	f8f8f79ae7	AMDGPU/NFC: Cleanup/refactor SIMemoryLegalizer [2]: - Make SIMemOpInfo a class - Add accessor methods to SIMemOpInfo - Move get*Info methods to SIMemOpInfo Differential Revision: https://reviews.llvm.org/D37395 llvm-svn: 312541	2017-09-05 16:41:25 +00:00
Konstantin Zhuravlyov	2b3d09d8f1	AMDGPU/NFC: Cleanup/refactor SIMemoryLegalizer [1]: - Rename MemOpInfo -> SIMemOpInfo - Move SIMemOpInfo class out of SIMemoryLegalizer class Differential Revision: https://reviews.llvm.org/D37394 llvm-svn: 312540	2017-09-05 16:18:05 +00:00
Stanislav Mekhanoshin	f0a6c9995f	[AMDGPU] Prevent infinite recursion in DAG.computeKnownBits() Differential Revision: https://reviews.llvm.org/D37392 llvm-svn: 312364	2017-09-01 20:43:20 +00:00
Matt Arsenault	960a469e7e	AMDGPU: Add ds_{read|write}_addtid_b32 definitions llvm-svn: 312349	2017-09-01 18:38:02 +00:00
Matt Arsenault	d48237c09b	AMDGPU: Add most d16 load/store instruction definitions Doesn't include the tied operand necessary for the loads, but is enough for the assembler to work. llvm-svn: 312347	2017-09-01 18:36:06 +00:00
Nicolai Haehnle	eaeeaec273	AMDGPU: IMPLICIT_DEFs and DBG_VALUEs do not contribute to wait states Summary: This fixes a bug that was exposed on gfx9 in various GL45-CTS.shaders.loops.*_iterations.select_iteration_count_fragment tests, e.g. GL45-CTS.shaders.loops.do_while_uniform_iterations.select_iteration_count_fragment Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D36193 llvm-svn: 312337	2017-09-01 16:56:32 +00:00
Matt Arsenault	7a5feae52b	AMDGPU: Fold clamp modifier for packed instructions llvm-svn: 312297	2017-08-31 23:53:50 +00:00
Eugene Zelenko	f25fa567b0	[Analysis] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes. Also affected in files (NFC). llvm-svn: 312289	2017-08-31 21:56:16 +00:00
Matt Arsenault	82b42ca714	AMDGPU: Turn int pack pattern into build_vector build_vector is a more useful canonical form when pattern matching packed operations, so turn shift into high element into a build_vector. Should show no change for now. llvm-svn: 312282	2017-08-31 21:17:22 +00:00
Matt Arsenault	11d7542186	AMDGPU: Don't assert in TTI with fp32 denorms enabled Also refine for f16 and rcp cases. llvm-svn: 312213	2017-08-31 05:47:00 +00:00
Matt Arsenault	1edfe9115e	AMDGPU: Use set for tracked registers The majority of the time spent in the pass was checking for register reads. Rather than searching all of the defined registers for uses in each instruction, use a set of defined registers and check the operands of the instruction. This process is still algorithmically not great, but with the additional trick of skipping the analysis for addresses with one use, this brings one slow testcase into a reasonable range. llvm-svn: 312206	2017-08-31 01:53:09 +00:00
Matt Arsenault	de478ba30e	AMDGPU: Correct operand types for v_mad_mix* These aren't really packed instructions, so the default op_sel_hi should be 0 since this indicates a conversion. The operand types are scalar values that behave similar to an f16 scalar that may be converted to f32. Doesn't change the default printing for op_sel_hi, just the parsing. llvm-svn: 312179	2017-08-30 22:18:40 +00:00
Matt Arsenault	c0412ae0a3	AMDGPU: Don't look for DS merge candidates with one use address The merge is only possible if the base address register is the same for the two instructions. If there is only the one use, there's no point in doing an expensive forward scan checking for memory interference looking for a merge candidate. This gives a significant improvement in one extreme testcase. The code to do the scan is still algorithmically terrible, so this is still the slowest pass in that example. llvm-svn: 312096	2017-08-30 03:26:18 +00:00
Stanislav Mekhanoshin	36ec90d063	[AMDGPU] Use v_max_f* for fcanonicalize If denorms are not flushed we can use max instead of multiplication by 1. For double that is simply faster, while for float and half it is shorter, because mul uses constant bus and VOP3. Differential Revision: https://reviews.llvm.org/D36856 llvm-svn: 312095	2017-08-30 03:03:38 +00:00
Matt Arsenault	70c0f608ef	AMDGPU: Select clamp pattern with v2f16 llvm-svn: 312087	2017-08-30 01:20:17 +00:00
Matt Arsenault	2cedecb223	AMDGPU: Fix typo llvm-svn: 312040	2017-08-29 21:25:51 +00:00
Stanislav Mekhanoshin	22de6c878a	[AMDGPU] Fix regression in AMDGPULibCalls allowing native for doubles Under -cl-fast-relaxed-math we could use native_sqrt, but f64 was allowed to produce HSAIL's nsqrt instruction. HSAIL is not here and we stick with non-existing native_sqrt(double) as a result. Add check for f64 to not return native functions and also remove handling of f64 case for fold_sqrt. Differential Revision: https://reviews.llvm.org/D37223 llvm-svn: 311900	2017-08-28 18:00:08 +00:00
Stanislav Mekhanoshin	5f48b3a89c	[AMDGPU] computeKnownBitsForTargetNode for 24 bit mul Differential Revision: https://reviews.llvm.org/D37168 llvm-svn: 311896	2017-08-28 16:35:37 +00:00
Konstantin Zhuravlyov	7d9fe6e6ce	AMDGPU: Fix gfx801 features gfx801 has 1/2 rate F64, Fast F32 FMA Differential Revision: https://reviews.llvm.org/D36981 llvm-svn: 311694	2017-08-24 20:03:07 +00:00
Benjamin Kramer	b795ef1cb5	Move helper classes into anonymous namespaces. No functionality change intended. llvm-svn: 311288	2017-08-20 13:03:48 +00:00
Konstantin Zhuravlyov	c624568e15	AMDGPU/NFC: Reorder functions in SIMemoryLegalizer: - Move load functions before atomic functions - Move store functions before atomic functions llvm-svn: 311256	2017-08-19 18:44:27 +00:00
Konstantin Zhuravlyov	eec800fc3a	AMDGPU/NFC: Rename few things in SIMemoryLegalizer: - AtomicInfo -> MemOpInfo - getAtomicLoadInfo -> getLoadInfo - getAtomicStoreInfo -> getStoreInfo - expandAtomicLoad -> expandLoad - expandAtomicStore -> expandStore Differential Revision: https://reviews.llvm.org/D36861 llvm-svn: 311179	2017-08-18 17:30:02 +00:00
Tom Stellard	a72d243500	AMDGPU: Add R600InstPrinter class Summary: This is a step towards separating the GCN and R600 tablegen'd code. This is a little awkward for now, because the R600 functions won't have the MCSubtargetInfo parameter, so we need to have AMDGPUInstPrinter delegate to R600InstPrinter, but once the tablegen'd code is split, we will be able to drop the delegation and use R600InstPrinter directly. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D36444 llvm-svn: 311128	2017-08-17 22:20:04 +00:00

2093 Commits