This patch adds unscaled loads and sign-extend loads to the TII
getMemOpBaseRegImmOfs API, which is used to control clustering in the MI
scheduler. This is done to create more opportunities for load pairing. I've
also added the scaled LDRSWui instruction, which was missing from the scaled
instructions. Finally, I've added support in shouldClusterLoads for clustering
adjacent sext and zext loads, which can also be paired by the load/store optimizer.
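For illustration, a hypothetical IR test case: two adjacent sign-extending
loads that the scheduler can now cluster, so the load/store optimizer can
pair them into a single ldpsw.

  define i64 @ldpsw_candidate(i32* %p) {
    %q = getelementptr i32, i32* %p, i64 1
    %a = load i32, i32* %p, align 4
    %b = load i32, i32* %q, align 4
    %sa = sext i32 %a to i64
    %sb = sext i32 %b to i64
    %r = add i64 %sa, %sb
    ret i64 %r
  }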
Differential Revision: http://reviews.llvm.org/D18048
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263819 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Allow the selection of BUFFER_LOAD_FORMAT_x and _XY. Do this now before
the frontend patches land in Mesa. Eventually, we may want to automatically
reduce the size of loads at the LLVM IR level, which requires such overloads,
and in some cases Mesa can generate them directly.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18255
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263792 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
These intrinsics expose the BUFFER_ATOMIC_* instructions and will be used
by Mesa to implement atomics with buffer semantics. The intrinsic interface
matches that of buffer.load.format and buffer.store.format, except that the
GLC bit is not exposed (it is automatically deduced based on whether the
return value is used).
The change of hasSideEffects is required for TableGen to accept the pattern
that matches the intrinsic.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, rivanvx, llvm-commits
Differential Revision: http://reviews.llvm.org/D18151
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263791 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
We cannot easily deduce that an offset is in an SGPR, but the Mesa frontend
cannot easily make use of an explicit soffset parameter either. Furthermore,
it is likely that in the future, LLVM will be in a better position than the
frontend to choose an SGPR offset if possible.
Since there aren't any frontend uses of these intrinsics in upstream
repositories yet, I would like to take this opportunity to change the
intrinsic signatures to a single offset parameter, which is then selected
to immediate offsets or voffsets using a ComplexPattern.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18218
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263790 91177308-0d34-0410-b5e6-96231b3b80d8
For fcmp, the major concern in the following six cases is a NaN result. The
comparison result consists of 4 bits, indicating lt, eq, gt and un (unordered),
only one of which will be set. The result is generated by the fcmpu
instruction. However, the bc instruction only inspects one of the first 3
bits, so when un is set, bc may jump to an undesired place.
More specifically, if we expect an unordered comparison and un is set, we
expect to always go to the true branch; in that case UEQ, UGT and ULT still
give false, which is undesired, but UNE, UGE and ULE happen to give true,
since they are tested by inspecting !eq, !lt and !gt, respectively.
Similarly, for an ordered comparison, when un is set, we always expect the
result to be false. In that case OGT, OLT and OEQ are good, since they
actually test gt, lt and eq respectively, which are false. But OGE, OLE and
ONE are tested through !lt, !gt and !eq, and these come out true, which is
wrong.
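As a concrete illustration (hypothetical IR), une is branched through the eq
bit (!eq): when fcmpu sets only un for a NaN operand, eq is clear and the
branch correctly lands on the true side, whereas ueq would inspect the same
eq bit and wrongly report false.

  define i32 @cmp_une(double %a, double %b) {
    %c = fcmp une double %a, %b
    %r = zext i1 %c to i32
    ret i32 %r
  }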
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263753 91177308-0d34-0410-b5e6-96231b3b80d8
This patch prevents the CTR loop optimization when soft-float operations are
used inside the loop body. Soft-float operations are lowered to function
calls, but function calls are not allowed inside CTR-optimized loops.
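A minimal sketch (hypothetical function): under soft float, the fadd below
becomes a call to a runtime routine such as __addsf3, so the loop must not
be converted into a CTR-based hardware loop.

  define float @sum(float* %p, i32 %n) {
  entry:
    br label %loop
  loop:
    %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
    %acc = phi float [ 0.0, %entry ], [ %acc.next, %loop ]
    %addr = getelementptr float, float* %p, i32 %i
    %x = load float, float* %addr, align 4
    %acc.next = fadd float %acc, %x
    %i.next = add i32 %i, 1
    %done = icmp eq i32 %i.next, %n
    br i1 %done, label %exit, label %loop
  exit:
    ret float %acc.next
  }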
Patch by Aleksandar Beserminji.
Differential Revision: http://reviews.llvm.org/D17600
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263727 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
MRI::eliminateFrameIndex can emit several instructions to do address
calculations; these can usually be stackified. Because instructions with
FI operands can have subsequent operands which may be expression trees,
find the top of the leftmost tree and insert the code before it, to keep
the LIFO property.
Also use stackified registers when writing back the SP value to memory in
the epilog; a dedicated register is unnecessary there because SP will not be
used after the epilog, and stackification results in better code.
Differential Revision: http://reviews.llvm.org/D18234
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263725 91177308-0d34-0410-b5e6-96231b3b80d8
We were being too aggressive in trying to combine a shuffle into a
blend-with-zero pattern, often resulting in an endless loop of competing
combines.
This patch stops the combine if we already have a blend in place (which means
we miss some domain corrections).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263717 91177308-0d34-0410-b5e6-96231b3b80d8
The two changes together weakened the test and caused a regression with
division handling in MSVC mode. They were applied to avoid an assertion being
triggered in the block frequency analysis. However, the underlying problem
was simply being masked rather than solved properly. Address the actual
underlying problem and revert the changes. Rather than analyzing the cause of
the assertion, the division failure had been assumed to be an overflow.
The underlying issue was a subtle bug in the BB construction in the emission of
the div-by-zero check (WIN__DBZCHK). We did not construct the proper successor
information in the basic blocks, nor did we update the PHIs associated with the
basic block when we split them. This would result in assertions being triggered
in the block frequency analysis pass.
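A minimal reproducer shape (hypothetical): any integer division in MSVC mode
emits an explicit div-by-zero check block (WIN__DBZCHK); the fix makes sure
that block's successors and PHIs are wired up correctly when the parent
block is split.

  define i32 @safe_div(i32 %a, i32 %b) {
    %r = sdiv i32 %a, %b
    ret i32 %r
  }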
Although the original tests are being removed, the tests themselves performed
very little in terms of validation but merely tested that we did not assert when
generating code. Update this with new tests that actually ensure that we do not
regress on the code generation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263714 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Uniform loops where the branch leaving the loop is predicated on VCCNZ
must be skipped if EXEC = 0, otherwise they will be infinite.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18137
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263658 91177308-0d34-0410-b5e6-96231b3b80d8
- Ensure we test both X86 and X64
- sitofp / uitofp require testing for SSE2 and SSE42 as well (part of the fix for PR26953)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263640 91177308-0d34-0410-b5e6-96231b3b80d8
We can currently only match zeroable vector elements of the same size as the
shuffle type. These tests demonstrate the problem; a solution will shortly be
added in an updated D14261.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263606 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The static LDS size is saved in MachineFunctionInfo::LDSSize. We define a
pseudo instruction with the usesCustomInserter bit set. Then, in
EmitInstrWithCustomInserter, we replace this pseudo instruction with a mov of
MachineFunctionInfo::LDSSize.
Reviewers: arsenm, tstellarAMD
Subscribers: llvm-commits, arsenm
Differential Revision: http://reviews.llvm.org/D18064
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263563 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds support for the MachO .alt_entry assembly directive, and uses
it for global aliases with non-zero GEP offsets. The alt_entry flag indicates
that a symbol should be laid out immediately after the preceding symbol.
Conceptually it introduces an alternate entry point for a function or data
structure. E.g.:
  safe_foo:
      // check preconditions for foo
  .alt_entry fast_foo
  fast_foo:
      // body of foo, can assume preconditions.
The .alt_entry flag is also implicitly set on assembly aliases of the form:
a = b + C
where C is a non-zero constant, since these have the same effect as an
alt_entry symbol: they introduce a label that cannot be moved relative to the
preceding one. Setting the alt_entry flag on aliases of this form fixes
http://llvm.org/PR25381.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263521 91177308-0d34-0410-b5e6-96231b3b80d8
pre-SSE41 hardware" as it seems to be causing crashes during code
generation in halide. PR forthcoming.
This reverts commit r263303.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263512 91177308-0d34-0410-b5e6-96231b3b80d8
When the SP is not changed because of realignment/VLAs etc., we restore the SP
using the previous value of SP and not the FP. Breaking the dependency helps
in cases where the epilog of a callee is close to the epilog of the caller,
since otherwise the "sub sp, fp, #" would depend on the load restoring the FP
in the epilog of the callee.
http://reviews.llvm.org/D18060
Patch by Aditya Kumar and Evandro Menezes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263458 91177308-0d34-0410-b5e6-96231b3b80d8
Converting masked vector loads to regular vector loads for x86 AVX should always be a win.
I raised the legality issue of reading the extra memory bytes on llvm-dev. I did not see any
objections.
1. x86 already does this kind of optimization for multiple scalar loads -> vector load.
2. If other targets have the same flexibility, we could move this transform up to CGP or DAGCombiner.
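A hedged sketch of the transform in IR (the mangled intrinsic name is an
assumption based on LLVM's usual overloading scheme): with a constant mask,
the masked load can become a full vector load plus a blend with the
pass-through value, at the cost of reading the masked-off bytes.

  define <4 x float> @masked(<4 x float>* %p, <4 x float> %pass) {
    %v = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* %p, i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 false>, <4 x float> %pass)
    ret <4 x float> %v
  }
  declare <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>*, i32, <4 x i1>, <4 x float>)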
Differential Revision: http://reviews.llvm.org/D18094
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263446 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
MIPSR6 introduces a class of branches called compact branches. Unlike the
traditional MIPS branches which have a delay slot, compact branches do not
have a delay slot. The instruction following the compact branch is only
executed if the branch is not taken and must not be a branch.
This patch works by generating compact branches for MIPS32R6 when the delay
slot filler cannot fill a delay slot, then inspecting the generated code for
forbidden slot hazards (a compact branch with an adjacent branch or other
CTI) and inserting NOPs to clear them.
Patch by Simon Dardis.
Reviewers: vkalintiris, dsanders
Subscribers: MatzeB, dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D16353
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263444 91177308-0d34-0410-b5e6-96231b3b80d8
On the z13, it turns out to be more efficient to access a full
floating-point register than just the upper half (as done e.g.
by the LE and LER instructions).
Current code already takes this into account when loading from
memory by using the LDE instruction in place of LE. However,
we still generate LER, which shows the same performance issues
as LE in certain circumstances.
This patch changes the back-end to emit LDR instead of LER to
implement FP32 register-to-register copies on z13.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263431 91177308-0d34-0410-b5e6-96231b3b80d8
For cases where we are truncating an integer vector arithmetic result, it may
be better to pre-truncate the input operands. There is no code to support
this yet (the scalar case is handled by SimplifyDemandedBits, but adding
vector support could be a lot of work); these tests represent the current
codegen status.
Example bugs: PR14666, PR22703
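One of the test patterns, for illustration (hypothetical test case): a
multiply performed in v8i32 and then truncated, where pre-truncating the
operands to v8i16 would be better on targets with cheap v8i16 multiplies.
Truncation of a multiply only needs the low bits, so the pre-truncated form
computes the same result.

  define <8 x i16> @trunc_mul(<8 x i32> %a, <8 x i32> %b) {
    %m = mul <8 x i32> %a, %b
    %t = trunc <8 x i32> %m to <8 x i16>
    ret <8 x i16> %t
  }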
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263384 91177308-0d34-0410-b5e6-96231b3b80d8
The SSE41 v8i16 shift lowering using (v)pblendvb is great for non-constant
shift amounts, but if the amount is constant then we can efficiently reduce
the VSELECT to shuffles with the pre-SSE41 lowering.
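An illustrative case (hypothetical test): every shift amount below is a
constant, so the blend-based sequence can be reduced to shuffles of the
shifted halves.

  define <8 x i16> @shl_const(<8 x i16> %a) {
    %r = shl <8 x i16> %a, <i16 0, i16 1, i16 2, i16 3, i16 4, i16 5, i16 6, i16 7>
    ret <8 x i16> %r
  }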
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263383 91177308-0d34-0410-b5e6-96231b3b80d8
This patch corresponds to review:
http://reviews.llvm.org/D17712
We were not clearing the TOC vector in PPCAsmPrinter when initializing it. This
caused duplicate definition asserts when the pass is reused on the module
(i.e. with -compile-twice or in JIT contexts).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263338 91177308-0d34-0410-b5e6-96231b3b80d8
cmpxchg[8|16]b uses RBX as one of its arguments. In other words, using this
instruction clobbers RBX, as it is defined to hold one of the inputs. When
the backend uses a dynamically allocated stack, RBX is used as a reserved
register for the base pointer.
Reserved registers have special semantics that only the target understands
and enforces. Because of that, the register allocator doesn't use them, but
it also doesn't try to make sure they are used properly (remember, it does
not know how they are supposed to be used).
Therefore, when RBX is used as a reserved register but defined by something
that is not compatible with that use, the register allocator will not fix the
surrounding code to make sure it gets saved and restored properly around the
broken code. It is the responsibility of the target to do the right thing
with its reserved register.
To fix that, when the base pointer needs to be preserved, we use a different
pseudo instruction for cmpxchg that saves RBX. That pseudo takes two more
arguments than the regular instruction:
- One is the value to be copied into RBX to set the proper value for the
comparison.
- The other is the virtual register holding the saved value of RBX as the
base pointer. This saving is done as part of isel (i.e., we emit a copy from
rbx).
  cmpxchg_save_rbx <regular cmpxchg args>, input_for_rbx_reg, save_of_rbx_as_bp
This gets expanded into:
  rbx = copy input_for_rbx_reg
  cmpxchg <regular cmpxchg args>
  rbx = save_of_rbx_as_bp
Note: the actual modeling of the pseudo is a bit more complicated, to make
sure the interferences that appear after the pseudo is expanded are properly
modeled before that expansion.
This fixes PR26883.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263325 91177308-0d34-0410-b5e6-96231b3b80d8
Improve vector extension of vectors on hardware without dedicated
VSEXT/VZEXT instructions.
We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG,
but we can further improve this by using the legalizer instead of prematurely
splitting into legal vectors in the combine, as that only properly helps
lowering to VSEXT/VZEXT.
This removes a lot of unnecessary any_extend + mask patterns (fix for
PR25718).
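An example of the affected pattern (hypothetical test): on targets without
pmovzx/pmovsx, this zext is now widened by the legalizer through
ZERO_EXTEND_VECTOR_INREG (e.g. punpck with a zero vector) instead of an
any_extend followed by a mask.

  define <16 x i16> @zext_v16i8(<16 x i8> %a) {
    %z = zext <16 x i8> %a to <16 x i16>
    ret <16 x i16> %z
  }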
Differential Revision: http://reviews.llvm.org/D17932
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263303 91177308-0d34-0410-b5e6-96231b3b80d8
Instead, extend f16 (like we do when lowering a standalone SETCC),
and let f128 be legalized to the RT calls.
Fixes PR26803.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263301 91177308-0d34-0410-b5e6-96231b3b80d8
It's not enough that we test for SSSE3 - that's only OK for 128-bit vectors -
we also need to test for AVX2 / AVX512BW for the 256/512-bit vector cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263239 91177308-0d34-0410-b5e6-96231b3b80d8
If a constant is the same as the reverse of an inline immediate,
this is 4 bytes smaller than having to embed a 32-bit literal.
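For example (hedged; sizes follow from the 4-byte VOP1 encoding versus an
extra 32-bit literal dword): 0x80000000 is the bit-reverse of 1, and 1 is an
inline immediate, so the backend can emit "v_bfrev_b32 v0, 1" (4 bytes)
instead of a mov with an embedded 32-bit literal (8 bytes).

  define i32 @sign_bit() {
    ret i32 -2147483648 ; 0x80000000 = bitreverse(1)
  }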
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263201 91177308-0d34-0410-b5e6-96231b3b80d8
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.
Reapplied with a fix for PR26870 (avoid premature use of TargetConstant in ZERO_EXTEND_VECTOR_INREG expansion).
Differential Revision: http://reviews.llvm.org/D17691
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263159 91177308-0d34-0410-b5e6-96231b3b80d8
This patch fixes a problem which occurs when loop-vectorize tries to use the
@llvm.masked.load/store intrinsics for a non-default-addrspace pointer. It
fails with a "Calling a function with a bad signature!" assertion in the
CallInst constructor, because it tries to pass a non-default-addrspace
pointer to the pointer argument, which has the default addrspace.
The fix is to add the pointer type as another overloaded type to the
@llvm.masked.load/store intrinsics.
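A sketch of the resulting overloading (the mangled name is an assumption
based on LLVM's usual scheme): the pointer type now participates in the
intrinsic name, so non-default address spaces are expressible.

  declare <8 x i32> @llvm.masked.load.v8i32.p1v8i32(<8 x i32> addrspace(1)*, i32, <8 x i1>, <8 x i32>)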
Reviewed By: reames
Differential Revision: http://reviews.llvm.org/D17270
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263158 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
They correspond to BUFFER_LOAD/STORE_FORMAT_XYZW and will be used by Mesa
to implement the GL_ARB_shader_image_load_store extension.
The intention is that for llvm.amdgcn.buffer.load.format, LLVM will decide
whether one of the _X/_XY/_XYZ opcodes can be used (similar to image sampling
and loads). However, this is not currently implemented.
For llvm.amdgcn.buffer.store, LLVM cannot decide to use one of the "smaller"
opcodes, and therefore the intrinsic is overloaded. Currently, only the v4f32
variant is actually implemented, since GLSL also only has a vec4 variant of
the store instructions, although it's conceivable that Mesa will want to be
smarter about this in the future.
BUFFER_LOAD_FORMAT_XYZW is already exposed via llvm.SI.vs.load.input, which
has a legacy name, pretends not to access memory, and does not capture the
full flexibility of the instruction.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17277
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263140 91177308-0d34-0410-b5e6-96231b3b80d8