llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-12-17 08:36:52 +00:00

Author	SHA1	Message	Date
Adam Nemet	78db8293e5	[AVX512] Fix copy-and-paste bugs in vpermil 1) i512mem -> f512mem (this is the packed FP input being permuted) 2) element size is 64 bits in EVEX_CD8 for PD. (A good illustration why X86VectorVTInfo is useful) llvm-svn: 220734	2014-10-27 23:08:31 +00:00
Pete Cooper	01b2132972	Fix a stackmap bug introduced in r220710. For a call to not return in to the stackmap shadow, the shadow must end with the call. To do this, we must insert any required nops before the call, and not after it. llvm-svn: 220728	2014-10-27 22:38:45 +00:00
Juergen Ributzka	b3117b0b86	[FastISel][AArch64] Emit immediate version of icmp (subs) for null pointer check. This is a minor change to use the immediate version when the operand is a null value. This should get rid of an unnecessary 'mov' instruction in debug builds and align the code more with the one generated by SelectionDAG. This fixes rdar://problem/18785125. llvm-svn: 220713	2014-10-27 19:58:36 +00:00
Juergen Ributzka	76c57b0570	[FastISel][AArch64] Optimize compare-and-branch for i1 to use 'tbz'. Minor enhancement to use 'tbz' for i1 compare-and-branch to get rid of an 'and' instruction. This fixes rdar://problem/18784953. llvm-svn: 220712	2014-10-27 19:46:23 +00:00
Pete Cooper	87efb91a50	Stackmap shadows should consider call returns a branch target. To avoid emitting too many nops, a stackmap shadow can include emitted instructions in the shadow, but these must not include branch targets. A return from a call should count as a branch target as patching over the instructions after the call would lead to incorrect behaviour for threads currently making that call, when they return. llvm-svn: 220710	2014-10-27 19:40:35 +00:00
Juergen Ributzka	3423c5ccda	[FastISel][AArch64] Use 'cbz' also for null values (pointers). The pattern matching for a 'ConstantInt' value was too restrictive. Checking for a 'Constant' with a bull value is sufficient for using an 'cbz/cbnz' instruction. This fixes rdar://problem/18784732. llvm-svn: 220709	2014-10-27 19:38:05 +00:00
Juergen Ributzka	64c2c99226	[FastISel][AArch64] Don't fold the 'and' instruction into the 'tbz/tbnz' instruction if it is in a different basic block. This fixes a bug where the input register was not defined for the 'tbz/tbnz' instruction. This happened, because we folded the 'and' instruction from a different basic block. This fixes rdar://problem/18784013. llvm-svn: 220704	2014-10-27 19:16:48 +00:00
Juergen Ributzka	57783726dd	[FastISel][AArch64] Fix load/store with frame indices. At higher optimization levels the LLVM IR may contain more complex patterns for loads/stores from/to frame indices. The 'computeAddress' function wasn't able to handle this and triggered an assertion. This fix extends the possible addressing modes for frame indices. This fixes rdar://problem/18783298. llvm-svn: 220700	2014-10-27 18:21:58 +00:00
Lang Hames	77d387a954	[PBQP] Unique allowed-sets for nodes in the PBQP graph and use pairs of these sets as keys into a cache of interference matrice values in the Interference constraint adder. Creating interference matrices was one of the large remaining time-sinks in PBQP. Caching them reduces the total compile time (when using PBQP) on the nightly test suite by ~10%. llvm-svn: 220688	2014-10-27 17:44:25 +00:00
NAKAMURA Takumi	a1ef3346aa	Prune CRLF. llvm-svn: 220678	2014-10-27 12:37:26 +00:00
Oliver Stannard	9a595b2769	[ARM] Select VMAXNM and VMINNM regardless of operand order Currently, the ARM backend will select the VMAXNM and VMINNM for these C expressions: (a < b) ? a : b (a > b) ? a : b but not these expressions: (a > b) ? b : a (a < b) ? b : a This patch allows all of these expressions to be matched. llvm-svn: 220671	2014-10-27 09:23:02 +00:00
Yuri Gorshenin	0701551505	[asan-asm-instrumentation] Added comment describing how asm instrumentation works. Summary: [asan-asm-instrumentation] Added comment describing how asm instrumentation works. Reviewers: eugenis Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5970 llvm-svn: 220670	2014-10-27 08:38:54 +00:00
Elena Demikhovsky	467839f2f5	AVX-512: Fixed encoding of VPBROADCASTM and added SKX forms of this instruction llvm-svn: 220638	2014-10-26 09:52:24 +00:00
Simon Pilgrim	c20b3d1fdd	[X86][SSE] Vector integer/float conversion memory folding Tidied up some entries in the folding tables so that they are under the correct comment section (they were categorised as AVX2 instructions when they're AVX1). Minor patch agreed with qcolombet. llvm-svn: 220613	2014-10-25 08:11:20 +00:00
Jingyue Wu	376aaf44c4	[NVPTX] aligned byte-buffers for vector return types Summary: Fixes PR21100 which is caused by inconsistency between the declared return type and the expected return type at the call site. The new behavior is consistent with nvcc and the NVPTXTargetLowering::getPrototype function. Test Plan: test/Codegen/NVPTX/vector-return.ll Reviewers: jholewinski Reviewed By: jholewinski Subscribers: llvm-commits, meheff, eliben, jholewinski Differential Revision: http://reviews.llvm.org/D5612 llvm-svn: 220607	2014-10-25 03:46:16 +00:00
Kevin Enderby	9e33867d25	Fix a Mach-O assembler segfault for a subtraction expression with an undefined symbol. In a Mach-O object file a relocatable expression of the form SymbolA - SymbolB + constant is allowed when both symbols are defined in a section. But when either symbol is undefined it is an error. The code was crashing when it had an undefined symbol in this case. And should have printed a error message using the location information in the relocation entry. rdar://18678402 llvm-svn: 220599	2014-10-24 22:39:40 +00:00
Simon Pilgrim	29b3d266a4	[X86][SSE] Bitcast assertion in XFormVExtractWithShuffleIntoLoad Minor patch to fix an issue in XFormVExtractWithShuffleIntoLoad where a load is unary shuffled, then bitcast (to a type with the same number of elements) before extracting an element. An undef was created for the second shuffle operand using the original (post-bitcasted) vector type instead of the pre-bitcasted type like the rest of the shuffle node - this was then causing an assertion on the different types later on inside SelectionDAG::getVectorShuffle. Differential Revision: http://reviews.llvm.org/D5917 llvm-svn: 220592	2014-10-24 21:04:41 +00:00
Colin LeMahieu	a5b4b06680	[Hexagon] Resubmission of 220427 Modified library structure to deal with circular dependency between HexagonInstPrinter and HexagonMCInst. Adding encoding bits for add opcode. Adding llvm-mc tests. Removing unit tests. http://reviews.llvm.org/D5624 llvm-svn: 220584	2014-10-24 19:00:32 +00:00
Sanjay Patel	4e42d925c1	Allow AVX vrsqrtps generation. This is a follow-on to r220570 that allows a 256-bit (v8f32) version of vrsqrtps to be generated. llvm-svn: 220579	2014-10-24 17:59:18 +00:00
Sanjay Patel	d9b7837012	Use rsqrt (X86) to speed up reciprocal square root calcs This is a first step for generating SSE rsqrt instructions for reciprocal square root calcs when fast-math is allowed. For now, be conservative and only enable this for AMD btver2 where performance improves significantly - for example, 29% on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c (if we convert the data type to single-precision float). This patch adds a two constant version of the Newton-Raphson refinement algorithm to DAGCombiner that can be selected by any target via a parameter returned by getRsqrtEstimate().. See PR20900 for more details: http://llvm.org/bugs/show_bug.cgi?id=20900 Differential Revision: http://reviews.llvm.org/D5658 llvm-svn: 220570	2014-10-24 17:02:16 +00:00
Daniel Sanders	3747982b4e	[mips] Replace MipsABIEnum with a MipsABIInfo class. Summary: No functional change yet, it's just an object replacement for an enum. It will allow us to gather ABI information in a single place so that we can start testing for properties of the ABI's instead of the ABI itself. For example we will eventually be able to use: ABI.MinStackAlignmentInBytes() instead of: (isABI_N32() \|\| isABI_N64()) ? 16 : 8 which is clearer and more maintainable. Reviewers: matheusalmeida Reviewed By: matheusalmeida Differential Revision: http://reviews.llvm.org/D3341 llvm-svn: 220568	2014-10-24 16:15:27 +00:00
Daniel Sanders	656a7ed742	[mips] Fix >80-column line llvm-svn: 220564	2014-10-24 14:46:00 +00:00
Daniel Sanders	5f05e5e050	[mips] Remove redundant code in RetCC_MipsN. NFC. Summary: i32 is always promoted to i64 so it no longer makes sense to assign i32 to registers. Reviewers: vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5964 llvm-svn: 220561	2014-10-24 13:49:54 +00:00
Daniel Sanders	d767694dc4	[mips] For N32/N64, structs must be passed in the upper bits of a register. Summary: Most structs were fixed by r218451 but those of between >32-bits and <64-bits remained broken since they were not marked with [ASZ]ExtUpper. This patch fixes the remaining cases by using CCPromoteToUpperBitsInType<i64> on i64's in addition to i32 and smaller. Reviewers: vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5963 llvm-svn: 220556	2014-10-24 13:09:19 +00:00
Oliver Stannard	4116b199b2	[AArch64] Fix fast-isel of cbz of i1, i8, i16 This fixes a miscompilation in the AArch64 fast-isel which was triggered when a branch is based on an icmp with condition eq or ne, and type i1, i8 or i16. The cbz instruction compares the whole 32-bit register, so values with the bottom 1, 8 or 16 bits clear would cause the wrong branch to be taken. llvm-svn: 220553	2014-10-24 09:54:41 +00:00
Adam Nemet	01508c4733	[AVX512] FMA support for the 231 variants This is asm/diasm-only support, similar to AVX. For ISeling the register variant, they are no different from 213 other than whether the multiplication or the addition operand is destructed. For ISeling the memory variant, i.e. to fold a load, they are no different than the 132 variant. The addition operand (op3) in both cases can come from memory. Again the ony difference is which operand is destructed. There could be a post-RA pass that would convert a 213 or 132 into a 231. Part of <rdar://problem/17082571> llvm-svn: 220540	2014-10-24 00:03:00 +00:00
Adam Nemet	3beded2b5d	[AVX512] Introduce fma3p_forms from AVX This multiclass generates the different forms: 213, 231, 132 in AVX. 132 in AVX512 is a separate class but I am planning to use this same multiclass to generate 231 relying on the nice the null_frag trick from AVX to disable codegen pattern for 231. No functionality change, no change in X86.td.expanded except for the different instruction definition names. llvm-svn: 220539	2014-10-24 00:02:55 +00:00
Ahmed Bougacha	af95bf4558	[X86] Improve mul w/ overflow codegen, to MUL8+SETO. Currently, @llvm.smul.with.overflow.i8 expands to 9 instructions, where 3 are really needed. This adds X86ISD::UMUL8/SMUL8 SD nodes, and custom lowers them to MUL8/IMUL8 + SETO. i8 is a special case because there is no two/three operand variants of (I)MUL8, so the first operand and return value need to go in AL/AX. Also, we can't write patterns for these instructions: TableGen refuses patterns where output operands don't match SDNode results. In this case, instructions where the output operand is an implicitly defined register. A related special case (and FIXME) exists for MUL8 (X86InstrArith.td): // FIXME: Used for 8-bit mul, ignore result upper 8 bits. // This probably ought to be moved to a def : Pat<> if the // syntax can be accepted. [(set AL, (mul AL, GR8:$src)), (implicit EFLAGS)] Ideally, these go away with UMUL8, but we still need to improve TableGen support of implicit operands in patterns. Before this change: movsbl %sil, %eax movsbl %dil, %ecx imull %eax, %ecx movb %cl, %al sarb $7, %al movzbl %al, %eax movzbl %ch, %esi cmpl %eax, %esi setne %al After: movb %dil, %al imulb %sil seto %al Also, remove a made-redundant testcase for PR19858, and enable more FastISel ALU-overflow tests for SelectionDAG too. Differential Revision: http://reviews.llvm.org/D5809 llvm-svn: 220516	2014-10-23 21:55:31 +00:00
Renato Golin	cd30ecdc65	Do not emit intermediate register for zero FP immediate This updates check for double precision zero floating point constant to allow use of instruction with immediate value rather than temporary register. Currently "a == 0.0", where "a" is of "double" type generates: vmov.i32 d16, #0x0 vcmpe.f64 d0, d16 With this change it becomes: vcmpe.f64 d0, #0 Patch by Sergey Dmitrouk. llvm-svn: 220486	2014-10-23 15:31:50 +00:00
NAKAMURA Takumi	b1cc8f92d6	Hexagon/Disassembler/LLVMBuild.txt: Update libdeps. llvm-svn: 220482	2014-10-23 11:32:16 +00:00
NAKAMURA Takumi	985e79f39f	Hexagon/LLVMBuild.txt: Prune CRLF. llvm-svn: 220481	2014-10-23 11:32:03 +00:00
NAKAMURA Takumi	0cdee77f2d	[CMake] Prune CRLF in CMakeLists.txt(s). llvm-svn: 220480	2014-10-23 11:31:50 +00:00
NAKAMURA Takumi	d8b493b106	Revert r220427, "[Hexagon] Adding encoding bits for add opcode." It brought cyclic dependecy between HexagonAsmPrinter and HexagonDesc. llvm-svn: 220478	2014-10-23 11:31:22 +00:00
Zoran Jovanovic	417b0378da	[mips][microMIPS] Implement ADDIUR1SP instruction Differential Revision: http://reviews.llvm.org/D5153 llvm-svn: 220477	2014-10-23 11:13:59 +00:00
Zoran Jovanovic	7d4ae91afe	ps][microMIPS] Implement ADDIUR2 instruction Differential Revision: http://reviews.llvm.org/D5151 llvm-svn: 220476	2014-10-23 11:06:34 +00:00
Zoran Jovanovic	c644ce2e43	ps][microMIPS] Implement LI16 instruction Differential Revision: http://reviews.llvm.org/D5149 llvm-svn: 220475	2014-10-23 10:59:24 +00:00
Zoran Jovanovic	0172553d1e	[mips][microMIPS] Implement CodeGen support for SLL16 and SRL16 instructions Differential Revision: http://reviews.llvm.org/D5774 llvm-svn: 220474	2014-10-23 10:42:01 +00:00
Oliver Stannard	b9f2d13ec6	[Thumb2] Improve disassembly of memory hints Currently, the ARM disassembler will disassemble the Thumb2 memory hint instructions (PLD, PLDW and PLI), even for targets which do not have these instructions. This patch adds the required checks to the disassmebler. llvm-svn: 220472	2014-10-23 08:52:58 +00:00
Akira Hatanaka	f1f0b1a1fd	[ARM, stack protector] If supported, use armv7 instructions. This commit enables using movt/movw to load the stack guard address: movw r0, :lower16:(L_g3$non_lazy_ptr-(LPC0_0+8)) movt r0, :upper16:(L_g3$non_lazy_ptr-(LPC0_0+8)) ldr r0, [pc, r0] Previously a pc-relative load was emitted: ldr r0, LCPI0_0 ldr r0, [pc, r0] rdar://problem/18740489 llvm-svn: 220470	2014-10-23 04:17:05 +00:00
Colin LeMahieu	88461f5b7a	[Hexagon] Adding encoding bits for add opcode. Adding llvm-mc tests. Removing unit tests. http://reviews.llvm.org/D5624 llvm-svn: 220427	2014-10-22 20:58:35 +00:00
Chad Rosier	19ef65a831	[AArch64] Add support for the .inst directive. This has been implement using the MCTargetStreamer interface as is done in the ARM, Mips and PPC backends. Phabricator: http://reviews.llvm.org/D5891 PR20964 llvm-svn: 220422	2014-10-22 20:35:57 +00:00
Hans Wennborg	dbfab2bba5	Fix VS2012 build; C++11 type aliases are not supported. llvm-svn: 220399	2014-10-22 17:47:49 +00:00
Colin LeMahieu	f359444940	Ammending 220393 - Removing unused decoding tables. llvm-svn: 220397	2014-10-22 17:23:01 +00:00
Colin LeMahieu	05568816b4	Ammending 220393 - Removing unused functions. llvm-svn: 220396	2014-10-22 17:03:19 +00:00
Bill Schmidt	1033a05374	[PATCH] Support select-cc for VSFRC when VSX is enabled A previous patch enabled SELECT_VSRC and SELECT_CC_VSRC for VSX to handle <2 x double> cases. This patch adds SELECT_VSFRC and SELECT_CC_VSFRC to allow use of all 64 vector-scalar registers for the f64 type when VSX is enabled. The changes are analogous to those in the previous patch. I've added a new variant to vsx.ll to test the code generation. (I also cleaned up a little formatting in PPCInstrVSX.td from the previous patch.) llvm-svn: 220395	2014-10-22 16:58:20 +00:00
Colin LeMahieu	f6f8bdf091	[Hexagon] Adding basic disassembler. Marking all instructions as CodeGenOnly since encoding bits are not set yet. http://reviews.llvm.org/D5829?vs=on&id=15023&whitespace=ignore-all#toc llvm-svn: 220393	2014-10-22 16:49:14 +00:00
Bill Schmidt	884633f291	[PowerPC] Support select-cc for VSX The tests test/CodeGen/Generic/select-cc.ll and test/CodeGen/PowerPC/select-cc.ll both fail with VSX enabled. The problem is that the lowering logic for the SELECT and SELECT_CC operations doesn't currently support the VSX registers. This patch fixes that. In lib/Target/PowerPC/PPCInstrInfo.td, we have pseudos to handle this for other register classes. Similar pseudos are added in PPCInstrVSX.td (they must be there, because the "vsrc" register class definition appears there) for the VSRC register class. The SELECT_VSRC pseudo is then used in pattern matching for SELECT_CC. The rest of the patch just adds logic for SELECT_VSRC wherever similar logic appears for SELECT_VRRC. There are no new test cases because the existing tests above test this, along with a variant in test/CodeGen/PowerPC/vsx.ll. After discussion with Hal, a future patch will add similar _VSFRC variants to override f64 type handling (currently using F8RC). llvm-svn: 220385	2014-10-22 13:13:40 +00:00
Arnaud A. de Grandmaison	e8d641238b	[AArch64] Cleanup A57PBQPConstraints And add a long awaited testcase. llvm-svn: 220381	2014-10-22 12:40:20 +00:00
Jyoti Allur	555583e57e	[Thumb/Thumb2] Implement restrictions on SP in register list on LDM, STM variants in thumb mode llvm-svn: 220379	2014-10-22 10:41:14 +00:00
Matt Arsenault	2257f6b589	Add minnum / maxnum codegen llvm-svn: 220342	2014-10-21 23:01:01 +00:00

1 2 3 4 5 ...

30522 Commits