RPCSX/llvm - llvm - Gitea: Git with a cup of tea

RPCSX/llvm

mirror of https://github.com/RPCSX/llvm.git synced 2025-01-27 07:12:06 +00:00

Author	SHA1	Message	Date
Konstantin Zhuravlyov	e1d66f4ce3	[AMDGPU] Emit linkonce and linkonce_odr symbols Differential Revision: http://reviews.llvm.org/D18726 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265408 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-05 16:00:58 +00:00
Tom Stellard	059753cf8e	AMDGPU: Implement {BUFFER,FLAT}_ATOMIC_CMPSWAP{,_X2} Summary: Implement BUFFER_ATOMIC_CMPSWAP{,_X2} instructions on all GCN targets, and FLAT_ATOMIC_CMPSWAP{,_X2} on CI+. 32-bit instruction variants tested manually on Kabini and Bonaire. Tests and parts of code provided by Jan Veselý. Patch by: Vedran Miletić Reviewers: arsenm, tstellarAMD, nhaehnle Subscribers: jvesely, scchan, kanarayan, arsenm Differential Revision: http://reviews.llvm.org/D17280 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265170 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-01 18:27:37 +00:00
Adrian Prantl	7876f64bc3	testcase gardening: update the emissionKind enum to the new syntax. (NFC) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265081 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-01 00:16:49 +00:00
Matt Arsenault	e8cd894c28	AMDGPU: Add frexp_exp intrinsic git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264944 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-30 22:28:52 +00:00
Tom Stellard	7725fd8c02	AMDGPU/SI: Improve MachineSchedModel definition This patch contains a few improvements to the model, including: - Using a single resource with a defined buffers size for each memory unit. - Setting the IssueWidth correctly. - Fixing latency values for memory instructions. shader-db stats: 16429 shaders in 3231 tests Totals: SGPRS: 318232 -> 312328 (-1.86 %) VGPRS: 208996 -> 209346 (0.17 %) Code Size: 7147044 -> 7166440 (0.27 %) bytes LDS: 83 -> 83 (0.00 %) blocks Scratch: 1862656 -> 1459200 (-21.66 %) bytes per wave Max Waves: 49182 -> 49243 (0.12 %) Wait states: 0 -> 0 (0.00 %)A Differential Revision: http://reviews.llvm.org/D18453 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264877 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-30 16:35:13 +00:00
Tom Stellard	d3adac51fc	AMDGPU/SI: Enable lanemask tracking in misched Summary: This results in higher register usage, but should make it easier for the compiler to hide latency. This pass is a prerequisite for some more scheduler improvements, and I think the increase register usage with this patch is acceptable, because when combined with the scheduler improvements, the total register usage will decrease. shader-db stats: 2382 shaders in 478 tests Totals: SGPRS: 48672 -> 49088 (0.85 %) VGPRS: 34148 -> 34847 (2.05 %) Code Size: 1285816 -> 1289128 (0.26 %) bytes LDS: 28 -> 28 (0.00 %) blocks Scratch: 492544 -> 573440 (16.42 %) bytes per wave Max Waves: 6856 -> 6846 (-0.15 %) Wait states: 0 -> 0 (0.00 %) Depends on D18451 Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18452 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264876 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-30 16:35:09 +00:00
Sanjay Patel	206cd3a64e	fix checks: _DAG -> -DAG git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264676 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-28 22:11:06 +00:00
Matthias Braun	1598569f70	CodeGen: Correct specification of PHI nodes They do have a def machine operand. Fixing the definition is necessary for an upcoming patch. Differential Revision: http://reviews.llvm.org/D18384 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264607 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-28 18:18:41 +00:00
Tom Stellard	4a79dec8e2	AMDGPU/SI: Limit load clustering to 16 bytes instead of 4 instructions Summary: This helps prevent load clustering from drastically increasing register pressure by trying to cluster 4 SMRDx8 loads together. The limit of 16 bytes was chosen, because it seems like that was the original intent of setting the limit to 4 instructions, but more analysis could show that a different limit is better. This fixes yields small decreases in register usage with shader-db, but also helps avoid a large increase in register usage when lane mask tracking is enabled in the machine scheduler, because lane mask tracking enables more opportunities for load clustering. shader-db stats: 2379 shaders in 477 tests Totals: SGPRS: 49744 -> 48600 (-2.30 %) VGPRS: 34120 -> 34076 (-0.13 %) Code Size: 1282888 -> 1283184 (0.02 %) bytes LDS: 28 -> 28 (0.00 %) blocks Scratch: 495616 -> 492544 (-0.62 %) bytes per wave Max Waves: 6843 -> 6853 (0.15 %) Wait states: 0 -> 0 (0.00 %) Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18451 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264589 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-28 16:10:13 +00:00
Matthias Braun	afb111acf7	LiveInterval: Fix Distribute() failing on liveranges with unused VNInfos This fixes http://llvm.org/PR26991 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264345 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-24 21:41:38 +00:00
Matt Arsenault	54f7128a27	AMDGPU: Remove atomic inc/dec patterns There is no benefit to these since materializing the constant 1 requires the same number of instructions as materializing uint_max git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264215 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-23 23:23:38 +00:00
Matt Arsenault	fc80e900d8	AMDGPU: Promote alloca should skip volatiles git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264214 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-23 23:17:29 +00:00
Matt Arsenault	222f9e97f6	AMDGPU: Insert moves of frame index to value operands Strengthen tests of storing frame indices. Right now this just creates irrelevant scheduling changes. We don't want to have multiple frame index operands on an instruction. There seem to be various assumptions that at least the same frame index will not appear twice in the LocalStackSlotAllocation pass. There's no reason to have this happen, and it just makes it easy to introduce bugs where the immediate offset is appplied to the storing instruction when it should really be applied to the value being stored as a separate add. This might not be sufficient. It might still be problematic to have an add fi, fi situation, but that's even less unlikely to happen in real code. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264200 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-23 21:49:25 +00:00
Nicolai Haehnle	f0b7f107b9	AMDGPU: Add SIWholeQuadMode pass Summary: Whole quad mode is already enabled for pixel shaders that compute derivatives, but it must be suspended for instructions that cause a shader to have side effects (i.e. stores and atomics). This pass addresses the issue by storing the real (initial) live mask in a register, masking EXEC before instructions that require exact execution and (re-)enabling WQM where required. This pass is run before register coalescing so that we can use machine SSA for analysis. The changes in this patch expose a problem with the second machine scheduling pass: target independent instructions like COPY implicitly use EXEC when they operate on VGPRs, but this fact is not encoded in the MIR. This can lead to miscompilation because instructions are moved past changes to EXEC. This patch fixes the problem by adding use-implicit operands to target independent instructions. Some general codegen passes are relaxed to work with such implicit use operands. Reviewers: arsenm, tstellarAMD, mareko Subscribers: MatzeB, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18162 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263982 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-21 20:28:33 +00:00
Tom Stellard	1f213b9b37	AMDGPU/SI: Fix threshold calculation for branching when exec is zero Summary: When control flow is implemented using the exec mask, the compiler will insert branch instructions to skip over the masked section when exec is zero if the section contains more than a certain number of instructions. The previous code would only count instructions in successor blocks, and this patch modifies the code to start counting instructions in all blocks between the start and end of the branch. Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18282 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263969 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-21 18:56:58 +00:00
Matt Arsenault	ea35ca49f2	AMDGPU: Remove SignBitIsZero for mubuf scratch offsets These instructions do not have the same negative base address problem that DS instructions do on SI. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263964 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-21 18:02:18 +00:00
Matt Arsenault	c8d042bec9	AMDGPU: Add frexp_mant intrinsic git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263948 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-21 16:11:05 +00:00
Nicolai Haehnle	a99f51df81	AMDGPU: Overload return type of llvm.amdgcn.buffer.load.format Summary: Allow the selection of BUFFER_LOAD_FORMAT_x and _XY. Do this now before the frontend patches land in Mesa. Eventually, we may want to automatically reduce the size of loads at the LLVM IR level, which requires such overloads, and in some cases Mesa can generate them directly. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18255 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263792 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-18 16:24:40 +00:00
Nicolai Haehnle	e9f50a4929	AMDGPU/SI: Add llvm.amdgcn.buffer.atomic.* intrinsics Summary: These intrinsics expose the BUFFER_ATOMIC_* instructions and will be used by Mesa to implement atomics with buffer semantics. The intrinsic interface matches that of buffer.load.format and buffer.store.format, except that the GLC bit is not exposed (it is automatically deduced based on whether the return value is used). The change of hasSideEffects is required for TableGen to accept the pattern that matches the intrinsic. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, rivanvx, llvm-commits Differential Revision: http://reviews.llvm.org/D18151 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263791 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-18 16:24:31 +00:00
Nicolai Haehnle	71acdb08db	AMDGPU: use ComplexPattern for offsets in llvm.amdgcn.buffer.load/store.format Summary: We cannot easily deduce that an offset is in an SGPR, but the Mesa frontend cannot easily make use of an explicit soffset parameter either. Furthermore, it is likely that in the future, LLVM will be in a better position than the frontend to choose an SGPR offset if possible. Since there aren't any frontend uses of these intrinsics in upstream repositories yet, I would like to take this opportunity to change the intrinsic signatures to a single offset parameter, which is then selected to immediate offsets or voffsets using a ComplexPattern. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18218 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263790 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-18 16:24:20 +00:00
Sam Kolton	5460c31240	[AMDGPU] Assembler: Change dpp_ctrl syntax to match sp3 Review: http://reviews.llvm.org/D18267 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263789 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-18 15:35:51 +00:00
Changpeng Fang	6405fe8e88	AMDGPU/SI: Do not generate s_waitcnt after ds_permute/ds_bpermute Symmary: ds_permute/ds_bpermute do not read memory so s_waitcnt is not needed. Reviewers arsenm, tstellarAMD Subscribers llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D18197 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263720 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-17 16:43:50 +00:00
Nicolai Haehnle	d22ce50fea	AMDGPU: Prevent uniform loops from becoming infinite Summary: Uniform loops where the branch leaving the loop is predicated on VCCNZ must be skipped if EXEC = 0, otherwise they will be infinite. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18137 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263658 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-16 20:14:33 +00:00
Changpeng Fang	53914149bf	AMDGPU/SI: Implement GroupStaticSize Intrinsic for Dynamic LDS Summary: Static LDS size is saved in MachineFunctionInfo::LDSSize, We define a pseudo instruction with usesCustomInserter bit set. Then, in EmitInstrWithCustomInserter, we replace this pseudo instruction with a mov of MachineFunctionInfo::LDSSize. Reviewers: arsenm tstellarAMD Subscribers llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D18064 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263563 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-15 17:28:44 +00:00
Tom Stellard	f53246799f	AMDGPU/SI: Handle wait states required for DPP instructions Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17543 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263447 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-14 17:05:56 +00:00
Marek Olsak	01d3696081	AMDGPU/SI: Incomplete shader binaries need to finish execution at the end Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D18058 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263441 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-14 15:57:14 +00:00
Chad Rosier	9c9879a621	Update test case to appease bots after 263255. I'll follow up with Matt to confirm this is the correct fix. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263268 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-11 17:33:36 +00:00
Nikolay Haustov	63cffd62c1	[AMDGPU] Assembler: change v_madmk operands to have same order as mad. The constant is now at source operand 1 (previously at 2). This is also how it is in legacy AMD sp3 assembler. Update tests. Differential Revision: http://reviews.llvm.org/D17984 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263212 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-11 09:27:25 +00:00
Matt Arsenault	e4e707f153	AMDGPU: Materialize sign bits with bfrev If a constant is the same as the reverse of an inline immediate, this is 4 bytes smaller than having to embed a 32-bit literal. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263201 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-11 07:42:49 +00:00
Nicolai Haehnle	f0eb7094d4	AMDGPU/SI: add llvm.amdgcn.buffer.load/store.format intrinsics Summary: They correspond to BUFFER_LOAD/STORE_FORMAT_XYZW and will be used by Mesa to implement the GL_ARB_shader_image_load_store extension. The intention is that for llvm.amdgcn.buffer.load.format, LLVM will decide whether one of the _X/_XY/_XYZ opcodes can be used (similar to image sampling and loads). However, this is not currently implemented. For llvm.amdgcn.buffer.store, LLVM cannot decide to use one of the "smaller" opcodes and therefore the intrinsic is overloaded. Currently, only the v4f32 is actually implemented since GLSL also only has a vec4 variant of the store instructions, although it's conceivable that Mesa will want to be smarter about this in the future. BUFFER_LOAD_FORMAT_XYZW is already exposed via llvm.SI.vs.load.input, which has a legacy name, pretends not to access memory, and does not capture the full flexibility of the instruction. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17277 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263140 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-10 18:43:50 +00:00
Changpeng Fang	de01cf1028	AMDGPU/SI: Define S_GETREG Intrinsic Summary: Define s_getreg intrinsic to generate s_getreg instruction to read hardware registers. Reviewers: tstellarAMD, arsenm Subscribers: llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D17892 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263124 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-10 16:47:15 +00:00
Tom Stellard	026295317d	SelectionDAG: Fix a crash on inline asm when output register supports multiple types Summary: The code in SelectionDAG did not handle the case where the register type and output types were different, but had the same size. Reviewers: arsenm, echristo Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D17940 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263022 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-09 16:02:52 +00:00
Sam Kolton	6e4c55e686	[AMDGPU] Assembler: Support DPP instructions. Supprot DPP syntax as used in SP3 (except several operands syntax). Added dpp-specific operands in td-files. Added DPP flag to TSFlags to determine if instruction is dpp in InstPrinter. Support for VOP2 DPP instructions in td-files. Some tests for DPP instructions. ToDo: - VOP2bInst: - vcc is considered as operand - AsmMatcher doesn't apply mnemonic aliases when parsing operands - v_mac_f32 - v_nop - disable instructions with 64-bit operands - change dpp_ctrl assembler representation to conform sp3 Review: http://reviews.llvm.org/D17804 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263008 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-09 12:29:31 +00:00
Matt Arsenault	eae62a846f	AMDGPU: Match more med3 integer patterns git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262864 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-07 21:54:48 +00:00
Matthias Braun	35ea41f3e7	RegisterCoalescer: Remap subregister lanemasks before exchanging operands Rematerializing and merging into a bigger register class at the same time, requires the subregister range lanemasks getting remapped to the new register class. This fixes http://llvm.org/PR26805 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262768 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-05 04:36:13 +00:00
Tom Stellard	6bf8b0e0f7	AMDGPU/SI: Add support for spiling SGPRs to scratch buffer Summary: This is necessary for when we run out of VGPRs and can no longer use v_{read,write}_lane for spilling SGPRs. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17592 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262732 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-04 18:31:18 +00:00
Nikolay Haustov	03489d3461	AMDGPU/SI: add llvm.amdgcn.image.atomic.* intrinsics These correspond to IMAGE_ATOMIC_* and are going to be used by Mesa for the GL_ARB_shader_image_load_store extension. Initial change by Nicolai H.hnle Differential Revision: http://reviews.llvm.org/D17401 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262701 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-04 10:39:50 +00:00
Matt Arsenault	543afc9d41	DAGCombiner: Make sure an integer is being truncated git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262446 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-02 01:36:51 +00:00
Matt Arsenault	d06f393d79	DAGCombiner: Turn truncate of a bitcasted vector to an extract On AMDGPU where operations i64 operations are often bitcasted to v2i32 and back, this pattern shows up regularly where it breaks some expected combines on i64, such as load width reducing. This fixes some test failures in a future commit when i64 loads are changed to promote. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262397 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-01 21:31:53 +00:00
Matt Arsenault	8ba165a405	DAGCombiner: Turn extract of bitcasted integer into truncate This reduces the number of bitcast nodes and generally cleans up the DAG when bitcasting between integers and vectors everywhere. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262358 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-01 18:01:37 +00:00
Changpeng Fang	5bbcde0b92	AMDGPU/SI: Implement DS_PERMUTE/DS_BPERMUTE Instruction Definitions and Intrinsics Summary: This patch impleemnts DS_PERMUTE/DS_BPERMUTE instruction definitions and intrinsics, which are new since VI. Reviewers: tstellarAMD, arsenm Subscribers: llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D17614 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262356 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-01 17:51:23 +00:00
Matt Arsenault	12cb9f057f	AMDGPU: Set HasExtractBitInsn This currently does not have the control over the bitwidth, and there are missing optimizations to reduce the integer to 32-bit if it can be. But in most situations we do want the sinking to occur. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262296 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-01 04:58:17 +00:00
Matt Arsenault	26419a11ad	AMDGPU: More bits of frame index are known to be zero The maximum private allocation for the whole GPU is 4G, so the maximum possible index for a single workitem is the maximum size divided by the smallest granularity for a dispatch. This increases the number of known zero high bits, which enables more offset folding. The maximum private size per workitem with this is 128M but may be smaller still. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262153 91177308-0d34-0410-b5e6-96231b3b80d8	2016-02-27 20:26:57 +00:00
Matt Arsenault	2bc40a1dbd	DAGCombiner: Don't unnecessarily swap operands in ReassociateOps In the case where op = add, y = base_ptr, and x = offset, this transform: (op y, (op x, c1)) -> (op (op x, y), c1) breaks the canonical form of add by putting the base pointer in the second operand and the offset in the first. This fix is important for the R600 target, because for some address spaces the base pointer and the offset are stored in separate register classes. The old pattern caused the ISel code for matching addressing modes to put the base pointer and offset in the wrong register classes, which required no-trivial code transformations to fix. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262148 91177308-0d34-0410-b5e6-96231b3b80d8	2016-02-27 19:57:45 +00:00
Matt Arsenault	f51a2196a5	DAGCombiner: Relax sqrt NaN folding check This is OK for +0 since compares to +/-0 give the same result. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262125 91177308-0d34-0410-b5e6-96231b3b80d8	2016-02-27 09:38:05 +00:00
Matt Arsenault	788be52946	AMDGPU: Add s_sleep intrinsic git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262120 91177308-0d34-0410-b5e6-96231b3b80d8	2016-02-27 08:53:52 +00:00
Matt Arsenault	a164276e20	AMDGPU: Implement readcyclecounter This matches the behavior of the HSAIL clock instruction. s_realmemtime is used if the subtarget supports it, and falls back to s_memtime if not. Also introduces new intrinsics for each of s_memtime / s_memrealtime. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262119 91177308-0d34-0410-b5e6-96231b3b80d8	2016-02-27 08:53:46 +00:00
Nikolay Haustov	1c038cf2fa	[AMDGPU] Assembler: Basic support for MIMG Add parsing and printing of image operands. Matches legacy sp3 assembler. Change image instruction order to have data/image/sampler operands in the beginning. This is needed because optional operands in MC are always last. Update SITargetLowering for new order. Add basic MC test. Update CodeGen tests. Review: http://reviews.llvm.org/D17574 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261995 91177308-0d34-0410-b5e6-96231b3b80d8	2016-02-26 09:51:05 +00:00
Matthias Braun	e0761c4899	MachineCopyPropagation: Catch copies of the form A<-B;A<-B Differential Revision: http://reviews.llvm.org/D17475 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261966 91177308-0d34-0410-b5e6-96231b3b80d8	2016-02-26 03:18:55 +00:00
Matt Arsenault	4a5938727a	AMDGPU: Add failing testcase for register coalescer git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261592 91177308-0d34-0410-b5e6-96231b3b80d8	2016-02-22 23:45:42 +00:00

1 2 3 4 5 ...

322 Commits