
RPCSX/llvm

mirror of https://github.com/RPCSX/llvm.git synced 2025-01-08 13:00:43 +00:00

Author	SHA1	Message	Date
Eric Christopher	7da58e6313	Add aliases for mfvrsave/mtvrsave. Update a test as we're now going to emit it for easier reading of generated assembly as well. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272339 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-09 23:27:48 +00:00
Simon Pilgrim	8f579ce1a6	[X86][AVX512] Added avx512 VPSLLDQ/VPSRLDQ instruction comments git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272319 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-09 22:03:15 +00:00
Simon Pilgrim	3ddec70a78	[X86][AVX512] Dropped avx512 VPSLLDQ/VPSRLDQ intrinsics Auto-upgrade to generic shuffles like sse/avx2 implementations now that we can lower to VPSLLDQ/VPSRLDQ git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272308 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-09 21:09:03 +00:00
Simon Pilgrim	f921bac68f	[X86][AVX512] Fixed issue with v16i32 shuffles lowering to VPALIGNR git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272307 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-09 20:53:12 +00:00
Simon Pilgrim	9ceba992f6	[X86][AVX512] Added support for lowering 512-bit vector shuffles to bit/byte shifts 512-bit VPSLLDQ/VPSRLDQ can only be used for avx512bw targets so lowerVectorShuffleAsShift had to be adjusted to include the subtarget git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272300 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-09 20:13:58 +00:00
Justin Lebar	dd9e8b3bcc	[NVPTX] Add intrinsics for shfl instructions. Summary: Currently clang emits these instructions via inline (volatile) asm in the CUDA headers. Switching to intrinsics will let the optimizer reason across calls to these intrinsics. Reviewers: tra Subscribers: llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D21160 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272298 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-09 20:04:08 +00:00
Wei Ding	39ce7152a2	AMDGPU/SI: Fix 32-bit fdiv lowering We were using the fast fdiv lowering for all division, implementation of IEEE754 fdiv is added. http://reviews.llvm.org/D20557 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272292 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-09 19:17:15 +00:00
Davide Italiano	a72ade5c07	Also fix a typo. Need more coffee today. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272278 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-09 17:06:01 +00:00
Davide Italiano	c4c43eaa95	Improve r272262, check that __stack_chk_guard is used. Thanks to Rafael for the suggestion. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272277 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-09 17:04:38 +00:00
Jan Vesely	406c47ff89	SelectionDAG: Implement expansion of {S,U}MIN/MAX in integer legalization Fixes {u,}long_{min,max,clamp} opencl piglit regressions on EG. Reviewers: arsenm Differential Revision: http://reviews.llvm.org/D17898 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272272 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-09 16:04:00 +00:00
Haicheng Wu	c4f2258852	Reapply "[MBP] Reduce code size by running tail merging in MBP."" This reapplies commit r271930, r271915, r271923. They hit a bug in Thumb which is fixed in r272258 now. The original message: The code layout that TailMerging (inside BranchFolding) works on is not the final layout optimized based on the branch probability. Generally, after BlockPlacement, many new merging opportunities emerge. This patch calls Tail Merging after MBP and calls MBP again if Tail Merging merges anything. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272267 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-09 15:24:29 +00:00
Ulrich Weigand	09f4ea27b7	[SystemZ] Enable long displacement constraints for inline ASM operands This enables use of the 'S' constraint for inline ASM operands on SystemZ, which allows for a memory reference with a signed 20-bit immediate displacement. This patch includes corresponding documentation and test case updates. I've changed the 'T' constraint to match the new behavior for 'S', as 'T' also uses a long displacement (though index constraints are still not implemented). I also changed 'm' to match the behavior for 'S' as this will allow for a wider range of displacements for 'm', though correct me if that's not the right decision. Author: colpell Differential Revision: http://reviews.llvm.org/D21097 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272266 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-09 15:19:16 +00:00
Davide Italiano	cbf7512550	Move stackguard test to X86/ directory as it's not generic. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272264 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-09 15:16:58 +00:00
Davide Italiano	cb6cf5b6ec	[CodeGen] Change getSDagStackGuard to get an internal sym. Fixes a crash in the backend during an LTO build of rtld(1) in FreeBSD. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272262 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-09 14:23:38 +00:00
Igor Breger	de21197e48	[AVX512] Remove masked_move/blendm intrinsic from back-end. This is complement patch to D21060. Differential Revision: http://reviews.llvm.org/D21174 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272257 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-09 11:46:55 +00:00
Zlatko Buljan	2edd549258	[mips][microMIPS] Add CodeGen support for SEL., SELEQZ, SELNEZ, SELEQZ., SELNEZ.* and CMP.condn.fmt instructions Differential Revision: http://reviews.llvm.org/D20862 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272256 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-09 11:15:53 +00:00
Diana Picus	1063f931d5	[llc] Remove exit-on-error flag from MIR tests (PR27770) This is made possible by removing an assert in llc that assumed MIRParser::parseLLVMModule would exit on error. MIRParser's documentation states that it returns null if a parsing error occurs, so there's no reason to assert. We can instead just fall through to where the check for a module is performed and exit if it is null. This commit is part of the clean-up after r269655. Fixes PR27770 Differential Revision: http://reviews.llvm.org/D20371 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272254 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-09 10:31:05 +00:00
Craig Topper	b867bf3ea9	[AVX512] Fix shuffle decode printing for several instructions with write masks. There are still more bugs here with UNPCK and PALIGN for sure. But these were the easiest ones to fix. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272252 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-09 07:49:08 +00:00
James Molloy	95709cad3b	[Thumb] Select a BIC instead of AND if the immediate can be encoded more optimally negated If an immediate is only used in an AND node, it is possible that the immediate can be more optimally materialized when negated. If this is the case, we can negate the immediate and use a BIC instead; int i(int a) { return a & 0xfffffeec; } Used to produce: ldr r1, [CONSTPOOL] ands r0, r1 CONSTPOOL: 0xfffffeec And now produces: movs r1, #255 adds r1, #20 ; Less costly immediate generation bics r0, r1 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272251 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-09 07:39:08 +00:00
Craig Topper	cadff981d8	[X86] Fix a test I failed to re-generate in r272249. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272250 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-09 07:10:34 +00:00
Craig Topper	1b683873d6	[X86] Bring consistent naming to the SSE/AVX and AVX512 PALIGNR instructions. Then add shuffle decode printing for the EVEX forms which is made easier by having the naming structure more similar to other instructions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272249 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-09 07:06:38 +00:00
Quentin Colombet	c4f21c06ed	[MIR] Check that generic virtual registers get a size. Without that check it was possible to write test cases where the size was not specified and we ended up with weird asserts down the road, because the default value (1) would not make sense. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272226 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-08 23:27:46 +00:00
Dehao Chen	8153f63460	Revive http://reviews.llvm.org/D12778 to handle forward-hot-prob and backward-hot-prob consistently. Summary: Consider the following diamond CFG: A / \ B C \/ D Suppose A->B and A->C have probabilities 81% and 19%. In block-placement, A->B is called a hot edge and the final placement should be ABDC. However, the current implementation outputs ABCD. This is because when choosing the next block of B, it checks if Freq(C->D) > Freq(B->D) * 20%, which is true (if Freq(A) = 100, then Freq(B->D) = 81, Freq(C->D) = 19, and 19 > 81 * 20% = 16.2). Actually, we should use 25% instead of 20% as the probability here, so that we have 19 < 81 * 25% = 20.25, and the desired ABDC layout will be generated. Reviewers: djasper, davidxl Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D20989 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272203 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-08 21:30:12 +00:00
Quentin Colombet	792b56f6a7	[AArch64][RegisterBankInfo] G_OR are fine on either GPR or FPR. Teach AArch64RegisterBankInfo that G_OR can be mapped on either GPR or FPR for 64-bit or 32-bit values. Add test cases demonstrating how this information is used to coalesce a computation on a single register bank. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272170 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-08 16:53:32 +00:00
Oliver Stannard	25429add0f	[ARM] MSR instructions implicitly set CPSR The MSR instructions can write to the CPSR, but we did not model this fact, so we could emit them in the middle of IT blocks, changing the condition flags for later instructions in the block. The tests use two calls to llvm.write_register.i32 because it is valid to use these instructions at the end of an IT block, which if conversion does do in some cases. With two calls, the first clobbers the flags, so a branch has to be used to make the second one conditional. Differential Revision: http://reviews.llvm.org/D21139 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272154 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-08 15:26:34 +00:00
Matthias Braun	0f19dc2756	MIR: Fix parsing of stack object references in MachineMemOperands The MachineMemOperand parser lacked the code to handle %stack.X references (%fixed-stack.X was working). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272082 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-08 00:47:07 +00:00
Nicolai Haehnle	2ac1fa00c9	AMDGPU: Add amdgpu-ps-wqm-outputs function attributes Summary: The presence of this attribute indicates that VGPR outputs should be computed in whole quad mode. This will be used by Mesa for prolog pixel shaders, so that derivatives can be taken of shader inputs computed by the prolog, fixing a bug. The generated code could certainly be improved: if a prolog pixel shader is used (which isn't common in modern OpenGL - they're used for gl_Color, polygon stipples, and forcing per-sample interpolation), Mesa will use this attribute unconditionally, because it has to be conservative. So WQM may be used in the prolog when it isn't really needed, and furthermore a silly back-and-forth switch is likely to happen at the boundary between prolog and main shader parts. Fixing this is a bit involved: we'd first have to add a mechanism by which LLVM writes the WQM-related input requirements to the main shader part binary, and then Mesa specializes the prolog part accordingly. At that point, we may as well just compile a monolithic shader... Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95130 Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D20839 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272063 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-07 21:37:17 +00:00
Simon Pilgrim	3aa4772f49	[X86][SSE4A] Regenerated SSE4A intrinsics tests There are no VEX encoded versions of SSE4A instructions, make sure that AVX targets give the same output git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272060 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-07 21:15:45 +00:00
Eric Christopher	c2a7f10882	Revert "Differential Revision: http://reviews.llvm.org/D20557 " Author: Wei Ding <wei.ding2@amd.com> Date: Tue Jun 7 19:04:44 2016 +0000 Differential Revision: http://reviews.llvm.org/D20557 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272044 91177308-0d34-0410-b5e6-96231b3b80d8 as it was breaking the bots. This reverts commit r272044. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272056 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-07 20:27:12 +00:00
Etienne Bergeron	70cf01c276	[stack-protection] Add support for MSVC buffer security check Summary: This patch is adding support for the MSVC buffer security check implementation The buffer security check is turned on with the '/GS' compiler switch. * https://msdn.microsoft.com/en-us/library/8dbf701c.aspx * To be added to clang here: http://reviews.llvm.org/D20347 Some overview of buffer security check feature and implementation: * https://msdn.microsoft.com/en-us/library/aa290051(VS.71).aspx * http://www.ksyash.com/2011/01/buffer-overflow-protection-3/ * http://blog.osom.info/2012/02/understanding-vs-c-compilers-buffer.html For the following example: ``` int example(int offset, int index) { char buffer[10]; memset(buffer, 0xCC, index); return buffer[index]; } ``` The MSVC compiler is adding these instructions to perform stack integrity check: ``` push ebp mov ebp,esp sub esp,50h [1] mov eax,dword ptr [__security_cookie (01068024h)] [2] xor eax,ebp [3] mov dword ptr [ebp-4],eax push ebx push esi push edi mov eax,dword ptr [index] push eax push 0CCh lea ecx,[buffer] push ecx call _memset (010610B9h) add esp,0Ch mov eax,dword ptr [index] movsx eax,byte ptr buffer[eax] pop edi pop esi pop ebx [4] mov ecx,dword ptr [ebp-4] [5] xor ecx,ebp [6] call @__security_check_cookie@4 (01061276h) mov esp,ebp pop ebp ret ``` The instrumentation above is: * [1] is loading the global security canary, * [3] is storing the local computed ([2]) canary to the guard slot, * [4] is loading the guard slot and ([5]) re-compute the global canary, * [6] is validating the resulting canary with the '__security_check_cookie' and performs error handling. Overview of the current stack-protection implementation: * lib/CodeGen/StackProtector.cpp * There is a default stack-protection implementation applied on intermediate representation. * The target can overload 'getIRStackGuard' method if it has a standard location for the stack protector cookie. 
* An intrinsic 'Intrinsic::stackprotector' is added to the prologue. It will be expanded by the instruction selection pass (DAG or Fast). * Basic Blocks are added to every instrumented function to receive the code for handling stack guard validation and errors handling. * Guard manipulation and comparison are added directly to the intermediate representation. * lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp * lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp * There is an implementation that adds instrumentation during instruction selection (for better handling of sibling calls). * see long comment above 'class StackProtectorDescriptor' declaration. * The target needs to override 'getSDagStackGuard' to activate SDAG stack protection generation. (note: getIRStackGuard MUST be nullptr). * 'getSDagStackGuard' returns the appropriate stack guard (security cookie) * The code is generated by 'SelectionDAGBuilder.cpp' and 'SelectionDAGISel.cpp'. * include/llvm/Target/TargetLowering.h * Contains function to retrieve the default Guard 'Value'; should be overridden by each target to select which implementation is used and provide Guard 'Value'. * lib/Target/X86/X86ISelLowering.cpp * Contains the x86 specialisation; Guard 'Value' used by the SelectionDAG algorithm. Function-based Instrumentation: * The MSVC doesn't inline the stack guard comparison in every function. Instead, a call to '__security_check_cookie' is added to the epilogue before every return instructions. * To support function-based instrumentation, this patch is * adding a function to get the function-based check (llvm 'Value', see include/llvm/Target/TargetLowering.h), * If provided, the stack protection instrumentation won't be inlined and a call to that function will be added to the prologue. 
* modifying (SelectionDAGISel.cpp) to avoid producing basic blocks used for inline instrumentation, * generating the function-based instrumentation during the ISEL pass (SelectionDAGBuilder.cpp), * if FastISEL (not SelectionDAG), using the fallback which rely on the same function-based implemented over intermediate representation (StackProtector.cpp). Modifications * adding support for MSVC (lib/Target/X86/X86ISelLowering.cpp) * adding support function-based instrumentation (lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp, .h) Results * IR generated instrumentation: ``` clang-cl /GS test.cc /Od /c -mllvm -print-isel-input ``` ``` * Final LLVM Code input to ISel * ; Function Attrs: nounwind sspstrong define i32 @"\01?example@@YAHHH@Z"(i32 %offset, i32 %index) #0 { entry: %StackGuardSlot = alloca i8* <<<-- Allocated guard slot %0 = call i8* @llvm.stackguard() <<<-- Loading Stack Guard value call void @llvm.stackprotector(i8* %0, i8** %StackGuardSlot) <<<-- Prologue intrinsic call (store to Guard slot) %index.addr = alloca i32, align 4 %offset.addr = alloca i32, align 4 %buffer = alloca [10 x i8], align 1 store i32 %index, i32* %index.addr, align 4 store i32 %offset, i32* %offset.addr, align 4 %arraydecay = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 0 %1 = load i32, i32* %index.addr, align 4 call void @llvm.memset.p0i8.i32(i8* %arraydecay, i8 -52, i32 %1, i32 1, i1 false) %2 = load i32, i32* %index.addr, align 4 %arrayidx = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 %2 %3 = load i8, i8* %arrayidx, align 1 %conv = sext i8 %3 to i32 %4 = load volatile i8, i8* %StackGuardSlot <<<-- Loading Guard slot call void @__security_check_cookie(i8* %4) <<<-- Epilogue function-based check ret i32 %conv } ``` * SelectionDAG generated instrumentation: ``` clang-cl /GS test.cc /O1 /c /FA ``` ``` "?example@@YAHHH@Z": # @"\01?example@@YAHHH@Z" # BB#0: # %entry pushl %esi subl $16, %esp movl ___security_cookie, %eax <<<-- Loading Stack Guard 
value movl 28(%esp), %esi movl %eax, 12(%esp) <<<-- Store to Guard slot leal 2(%esp), %eax pushl %esi pushl $204 pushl %eax calll _memset addl $12, %esp movsbl 2(%esp,%esi), %esi movl 12(%esp), %ecx <<<-- Loading Guard slot calll @__security_check_cookie@4 <<<-- Epilogue function-based check movl %esi, %eax addl $16, %esp popl %esi retl ``` Reviewers: kcc, pcc, eugenis, rnk Subscribers: majnemer, llvm-commits, hans, thakis, rnk Differential Revision: http://reviews.llvm.org/D20346 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272053 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-07 20:15:35 +00:00
Wei Ding	e2d1122183	Differential Revision: http://reviews.llvm.org/D20557 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272044 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-07 19:04:44 +00:00
Geoff Berry	f323692d97	Reapply [AArch64] Fix isLegalAddImmediate() to return true for valid negative values. Originally reviewed here: http://reviews.llvm.org/D17463 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272023 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-07 16:48:43 +00:00
Haicheng Wu	63ca44cb85	Revert "[MBP] Reduce code size by running tail merging in MBP." This reverts commit r271930, r271915, r271923. They break a thumb selfhosting bot. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272017 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-07 15:17:21 +00:00
Simon Pilgrim	c9a83046c3	[X86][AVX512] Added 512-bit integer vector non-temporal load tests git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272016 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-07 15:12:47 +00:00
Simon Pilgrim	6ad3e358e9	[X86][SSE] Add general lowering of nontemporal vector loads Currently the only way to use the (V)MOVNTDQA nontemporal vector loads instructions is through the int_x86_sse41_movntdqa style builtins. This patch adds support for lowering nontemporal loads from general IR, allowing us to remove the movntdqa builtins in a future patch. We currently still fold nontemporal loads into suitable instructions, we should probably look at removing this (and nontemporal stores as well) or at least make the target's folding implementation aware that it's dealing with a nontemporal memory transaction. There is also an issue that VMOVNTDQA only acts on 128-bit vectors on pre-AVX2 hardware - so currently a normal ymm load is still used on AVX1 targets. Differential Review: http://reviews.llvm.org/D20965 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272010 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-07 13:34:24 +00:00
James Molloy	6e988c5976	[Thumb-1] Add optimized constant materialization for integers [256..512) We can materialize these integers using a MOV; ADDi8 pair. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272007 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-07 13:10:14 +00:00
Igor Breger	7e0019d8f7	[AVX512] Fix load opcode for fast isel. Differential Revision: http://reviews.llvm.org/D21067 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272006 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-07 13:08:45 +00:00
Ulrich Weigand	d7ad443387	[PowerPC] Support multiple return values with fast isel Using an LLVM IR aggregate return value type containing three or more integer values causes an abort in the fast isel pass. This patch adds two more registers to RetCC_PPC64_ELF_FIS to allow returning up to four integers with fast isel, just the same as is currently supported with regular isel (RetCC_PPC). This is needed for Swift and (possibly) other non-clang frontends. Fixes PR26190. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272005 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-07 12:48:22 +00:00
Simon Pilgrim	c1e27cc453	[X86][SSE] Improved blend+zero target shuffle combining to use combined shuffle mask directly We currently only combine to blend+zero if the target value type has 8 elements or less, but this was missing a lot of cases where the combined mask had been widened. This change makes it so we use the combined mask to determine the blend value type, allowing us to catch more widened cases. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272003 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-07 12:20:14 +00:00
James Molloy	87f50aafbc	[ARM] Shrink post-indexed LDR and STR to LDM/STM A Thumb-2 post-indexed LDR instruction such as: ldr.w r0, [r1], #4 Can be rewritten as: ldm.n r1!, {r0} LDMs can be more expensive than LDRs on some cores, so this has been enabled only in minsize mode. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272002 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-07 12:13:34 +00:00
James Molloy	d5127f4273	[ARM] Transform LDMs into writeback form to save code size If we have an LDM that uses only low registers and doesn't write to its base register: ldm.w r0, {r1, r2, r3} And that base register is dead after the LDM, then we can convert it to writeback form and use a narrow encoding: ldm.n r0!, {r1, r2, r3} Obviously, this introduces a new register write and so can cause WAW hazards, so I've enabled it only in minsize mode. This is a code size trick that ARM Compiler 5 ("armcc") does that we don't. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272000 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-07 11:47:24 +00:00
Saleem Abdulrasool	48ff9d62da	ARM: correct TLS access on WoA TLS access requires an offset from the TLS index. The index itself is the section-relative distance of the symbol. For ARM, the relevant relocation (IMAGE_REL_ARM_SECREL) is applied as a constant. This means that the value may not be an immediate and must be lowered into a constant pool. This offset will not be base relocated. We were previously emitting the actual address of the symbol which would be base relocated and would therefore be the value offset by the ImageBase + TLS Offset. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271974 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-07 03:15:07 +00:00
Matt Arsenault	bc1b8d5b49	AMDGPU: Fix constantexpr addrspacecasts If we had a constant group address space cast the queue pointer wasn't enabled for the function, resulting in a crash on noreg later. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271935 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-06 20:03:31 +00:00
Haicheng Wu	a6caf31dea	Fix a test case. NFC. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271930 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-06 19:11:53 +00:00
Haicheng Wu	84755987d0	[MBP] Reduce code size by running tail merging in MBP. The code layout that TailMerging (inside BranchFolding) works on is not the final layout optimized based on the branch probability. Generally, after BlockPlacement, many new merging opportunities emerge. This patch calls Tail Merging after MBP and calls MBP again if Tail Merging merges anything. Differential Revision: http://reviews.llvm.org/D20276 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271925 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-06 18:36:07 +00:00
Artem Tamazov	7049ac906c	[AMDGPU][llvm-mc] v_cndmask_b32: src2 is mandatory; do not enforce VOP2 when src2 == VCC. Another step for unification llvm assembler/disassembler with sp3. Besides, CodeGen output is a bit improved, thus changes in CodeGen tests. Assembler/Disassembler tests updated/added. Differential Revision: http://reviews.llvm.org/D20796 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271900 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-06 15:23:43 +00:00
Igor Breger	5238dbe213	[KNL] Fix UMULO lowering. Differential Revision: http://reviews.llvm.org/D21013 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271891 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-06 12:24:52 +00:00
Craig Topper	6dbfac925f	[AVX512] Remove masked palignr intrinsics and auto-upgrade them to native IR of vector shuffle and select. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271872 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-06 06:12:54 +00:00
Craig Topper	856b53e006	[AVX512] Add PALIGNR shuffle lowering for v32i16 and v16i32. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271870 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-06 05:39:10 +00:00
Craig Topper	5cc4ee2898	[AVX512] Update tests to show shuffle decoding for vpshuflw/vpshufhw. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271869 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-06 05:39:07 +00:00
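
The threshold arithmetic described in the block-placement entry (8153f63460, D20989) can be checked with a small sketch. This is not LLVM's implementation: the function names `blocked_by_other_pred` and `layout` are hypothetical, and the model assumes only the single comparison quoted in the commit message, Freq(C->D) > Freq(B->D) * threshold, applied to the diamond CFG with Freq(A) = 100.

```python
def blocked_by_other_pred(freq_hot: int, freq_other: int, threshold_pct: int) -> bool:
    """Hypothetical helper: True when the competing edge into D is frequent
    enough that D is not placed directly after the hot predecessor.

    Uses integer arithmetic so that freq_other > freq_hot * threshold_pct / 100
    is evaluated without floating-point rounding.
    """
    return freq_other * 100 > freq_hot * threshold_pct


def layout(threshold_pct: int) -> str:
    """Return the block order for the diamond A -> {B, C} -> D under the
    assumed model, with A->B at 81% and A->C at 19%."""
    freq_a = 100
    freq_b_to_d = freq_a * 81 // 100  # 81
    freq_c_to_d = freq_a * 19 // 100  # 19
    if blocked_by_other_pred(freq_b_to_d, freq_c_to_d, threshold_pct):
        return "ABCD"  # D is not placed right after B
    return "ABDC"      # D follows B along the hot path


if __name__ == "__main__":
    print(layout(20))  # 19 > 16.2, so the old threshold gives ABCD
    print(layout(25))  # 19 < 20.25, so the new threshold gives ABDC
```

With the 20% threshold the comparison holds and the sketch yields ABCD; with 25% it fails and the sketch yields the ABDC layout the commit message asks for.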


16989 Commits
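
The Thumb BIC entry above (95709cad3b) relies on the identity that AND with an immediate equals BIC with the bitwise negation of that immediate. A minimal sketch of that identity on 32-bit values follows; the helper names `and_imm` and `bic_imm` are hypothetical and model only the bitwise result, not the instructions' flag behavior or encodings.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit registers


def and_imm(a: int, imm: int) -> int:
    """Bitwise result of AND with an immediate, truncated to 32 bits."""
    return a & imm & MASK32


def bic_imm(a: int, imm: int) -> int:
    """Bitwise result of BIC (bit clear): a AND NOT imm, truncated to 32 bits."""
    return a & ~imm & MASK32


# The immediate from the commit message and its 32-bit negation.
ORIGINAL = 0xFFFFFEEC
NEGATED = ~ORIGINAL & MASK32  # 0x113

# The commit's sequence builds the negated value as movs #255 then adds #20.
MATERIALIZED = 255 + 20

if __name__ == "__main__":
    print(hex(NEGATED), MATERIALIZED)
    for a in (0, 1, 0x113, 0x12345678, MASK32):
        print(hex(a), and_imm(a, ORIGINAL) == bic_imm(a, MATERIALIZED))
```

Because the negated immediate is 275, it fits the two-instruction `movs`/`adds` sequence shown in the commit message, which avoids the constant-pool load needed for 0xfffffeec.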