RPCS3/llvm

mirror of https://github.com/RPCS3/llvm.git synced 2024-12-27 14:45:50 +00:00

Author	SHA1	Message	Date
Yaron Keren	c63035aa56	Add and update reset() and doInitialization() methods to MC* and passes. This enables reusing a PassManager instead of re-constructing it every time. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217948 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-17 09:25:36 +00:00
Pavel Chupin	780f7e2168	[x32] Fix function indirect calls Summary: Zero-extend register to 64-bit for callq/jmpq. Test Plan: 3 tests added Reviewers: nadav, dschuff Subscribers: llvm-commits, zinovy.nis Differential Revision: http://reviews.llvm.org/D5355 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217942 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-17 07:09:23 +00:00
Robin Morisset	5c16c4e45a	[X86] Use the generic AtomicExpandPass instead of X86AtomicExpandPass This required a new hook called hasLoadLinkedStoreConditional to know whether to expand atomics to LL/SC (ARM, AArch64, in a future patch Power) or to CmpXchg (X86). Apart from that, the new code in AtomicExpandPass is mostly moved from X86AtomicExpandPass. The main result of this patch is to get rid of that pass, which had lots of code duplicated with AtomicExpandPass. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217928 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-17 00:06:58 +00:00
Adam Nemet	7cb345ea87	[X86] Improve comment git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217885 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-16 17:14:10 +00:00
Elena Demikhovsky	0218e1e1da	AVX-512: added cost for some AVX-512 instructions git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217863 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-16 07:57:37 +00:00
Chandler Carruth	07b445aff7	[x86] Remove a FIXME that doesn't make any sense. Only the lanes feeding the blend that is matched by this are "used" in any sense, and so any build_vector or other nodes feeding these will already drop other lanes. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217855 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-16 02:16:42 +00:00
Chandler Carruth	2f21b7ec5c	[x86] Cleanup an unused variable by actually using it in the non-asserts place where it was needed. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217854 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-16 02:14:51 +00:00
Chandler Carruth	2e363ece75	[x86] Remove the last vestiges of the BLENDI-based ADDSUB pattern matching. This design just fundamentally didn't work because ADDSUB is available prior to any legal lowerings of BLENDI nodes. Instead, we have a dedicated ADDSUB synthetic ISD node which is pattern matched trivially into the instructions. These nodes are then recognized by both the existing and a trivial new lowering combine in the backend. Removing these patterns required adding 2 missing shuffle masks to the DAG combine, without which tests would have failed. Added the masks and a helpful assert as well to catch if anything ever goes wrong here. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217851 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-16 00:39:08 +00:00
Chandler Carruth	bad2c13aae	[x86] As a follow-up to r217819, don't check for VSELECT legality now that we don't use VSELECT and directly emit an addsub synthetic node. Also remove a stale comment referencing VSELECT. The test case is updated to use 'core2' which only has SSE3, not SSE4.1, and it still passes. Previously it would not because we lacked sufficient blend support to legalize the VSELECT. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217849 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-16 00:24:42 +00:00
Chandler Carruth	cba9d1273a	[x86] Add the beginnings of a proper DAG combine to match ADDSUBPS and ADDSUBPD nodes out of blends of adds and subs. This allows us to actually form these instructions with SSE3 rather than only forming them when we had both SSE3 for the ADDSUB instructions and SSE4.1 for the blend instructions. ;] Kind-of important. I've adjusted the CPU requirements on one of the tests to demonstrate this kicking in nicely for an SSE3 cpu configuration. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217848 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-16 00:15:20 +00:00
Juergen Ributzka	1ee1e8bdc2	[FastISel] Move optimizeCmpPredicate to FastISel base class. NFC. Make the optimizeCmpPredicate function available to all targets. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217822 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-15 20:47:13 +00:00
Chandler Carruth	fa6cf7e73c	[x86] Start fixing our emission of ADDSUBPS and ADDSUBPD instructions by introducing a synthetic X86 ISD node representing this generic operation. The relevant patterns for mapping these nodes into the concrete instructions are also added, and a gnarly bit of C++ code in the target-specific DAG combiner is replaced with simple code emitting this primitive. The next step is to generically combine blends of adds and subs into this node so that we can drop the reliance on an SSE4.1 ISD node (BLENDI) when matching an SSE3 feature (ADDSUB). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217819 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-15 20:09:47 +00:00
Akira Hatanaka	348e9e7b6d	[X86] Fix a bug in X86's peephole optimization. Peephole optimization was folding MOVSDrm, which is a zero-extending double precision floating point load, into ADDPDrr, which is a SIMD add of two packed double precision floating point values. (before) %vreg21<def> = MOVSDrm <fi#0>, 1, %noreg, 0, %noreg; mem:LD8[%7](align=16)(tbaa=<badref>) VR128:%vreg21 %vreg23<def,tied1> = ADDPDrr %vreg20<tied0>, %vreg21; VR128:%vreg23,%vreg20,%vreg21 (after) %vreg23<def,tied1> = ADDPDrm %vreg20<tied0>, <fi#0>, 1, %noreg, 0, %noreg; mem:LD8[%7](align=16)(tbaa=<badref>) VR128:%vreg23,%vreg20 X86InstrInfo::foldMemoryOperandImpl already had the logic that prevented this from happening. However the check wasn't being conducted for loads from stack objects. This commit factors out the logic into a new function and uses it for checking loads from stack slots are not zero-extending loads. rdar://problem/18236850 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217799 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-15 18:23:52 +00:00
Chandler Carruth	c5371836a5	[x86] Begin emitting PBLENDW instructions for integer blend operations when SSE4.1 is available. This removes a ton of domain crossing from blend code paths that were ending up in the floating point code path. This is just the tip of the iceberg though. The real switch is for integer blend lowering to more actively rely on this instruction being available so we don't hit shufps at all any longer. =] That will come in a follow-up patch. Another place where we need better support is for using PBLENDVB when doing so avoids the need to have two complementary PSHUFB masks. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217767 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-15 12:40:54 +00:00
Chandler Carruth	2fdec16fbe	[x86] Teach the x86 DAG combiner to form UNPCKLPS and UNPCKHPS instructions from the relevant shuffle patterns. This is the last tweak I'm aware of to generate essentially perfect v4f32 and v2f64 shuffles with the new vector shuffle lowering up through SSE4.1. I'm sure I've missed some and it'd be nice to check since v4f32 is amenable to exhaustive exploration, but this is all of the tricks I'm aware of. With AVX there is a new trick to use the VPERMILPS instruction, that's coming up in a subsequent patch. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217761 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-15 11:26:25 +00:00
Chandler Carruth	08780d4c1d	[x86] Teach the x86 DAG combiner to form MOVSLDUP and MOVSHDUP instructions when it finds an appropriate pattern. These are lovely instructions, and it's a shame to not use them. =] They are fast, and can have loads folded into their operands, etc. I've also plumbed the comment shuffle decoding through the various layers so that the test cases are printed nicely. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217758 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-15 11:15:23 +00:00
Chandler Carruth	04402a6c13	[x86] Undo a flawed transform I added to form UNPCK instructions when AVX is available, and generally tidy up things surrounding UNPCK formation. Originally, I was thinking that the only advantage of PSHUFD over UNPCK instruction variants was its free copy, and otherwise we should use the shorter encoding UNPCK instructions. This isn't right though, there is a larger advantage of being able to fold a load into the operand of a PSHUFD. For UNPCK, the operand must be in a register so it can be the second input. This removes the UNPCK formation in the target-specific DAG combine for v4i32 shuffles. It also lifts the v8 and v16 cases out of the AVX-specific check as they are potentially replacing multiple instructions with a single instruction and so should always be valuable. The floating point checks are simplified accordingly. This also adjusts the formation of PSHUFD instructions to attempt to match the shuffle mask to one which would fit an UNPCK instruction variant. This was originally motivated to allow it to match the UNPCK instructions in the combiner, but clearly won't now. Eventually, we should add a MachineCombiner pass that can form UNPCK instructions post-RA when the operand is known to be in a register and thus there is no loss. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217755 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-15 10:35:41 +00:00
Chandler Carruth	a6cc351c5b	[x86] Teach the new vector shuffle lowering to use 'punpcklwd' and 'punpckhwd' instructions when suitable rather than falling back to the generic algorithm. While we could canonicalize to these patterns late in the process, that wouldn't help when the freedom to use them is only visible during initial lowering when undef lanes are well understood. This, it turns out, is very important for matching the shuffle patterns that are used to lower sign extension. Fixes a small but relevant regression in gcc-loops with the new lowering. When I changed this I noticed that several 'pshufd' lowerings became unpck variants. This is bad because it removes the ability to freely copy in the same instruction. I've adjusted the widening test to handle undef lanes correctly and now those will correctly continue to use 'pshufd' to lower. However, this caused a bunch of churn in the test cases. No functional change, just churn. Both of these changes are part of addressing a general weakness in the new lowering -- it doesn't sufficiently leverage undef lanes. I've at least a couple of patches that will help there at least in an academic sense. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217752 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-15 09:02:37 +00:00
Chandler Carruth	e610c324e1	[x86] Teach the new vector shuffle lowering to use BLENDPS and BLENDPD. These are super simple. They even take precedence over crazy instructions like INSERTPS because they have very high throughput on modern x86 chips. I still have to teach the integer shuffle variants about this to avoid so many domain crossings. However, due to the particular instructions available, that's a touch more complex and so a separate patch. Also, the backend doesn't seem to realize it can commute blend instructions by negating the mask. That would help remove a number of copies here. Suggestions on how to do this welcome, it's an area I'm less familiar with. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217744 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-14 23:43:33 +00:00
Chandler Carruth	33957173a7	[x86] Teach the vector combiner that picks a canonical shuffle from to support transforming the forms from the new vector shuffle lowering to use 'movddup' when appropriate. A bunch of the cases where we actually form 'movddup' don't actually show up in the test results because something even later than DAG legalization maps them back to 'unpcklpd'. If this shows back up as a performance problem, I'll probably chase it down, but it is at least an encoded size loss. =/ To make this work, also always do this canonicalizing step for floating point vectors where the baseline shuffle instructions don't provide any free copies of their inputs. This also causes us to canonicalize unpck[hl]pd into mov{hl,lh}ps (resp.) which is a nice encoding space win. There is one test which is "regressed" by this: extractelement-load. There, the test case where the optimization it is testing fails, the exact instruction pattern which results is slightly different. This should probably be fixed by having the appropriate extract formed earlier in the DAG, but that would defeat the purpose of the test.... If this test case is critically important for anyone, please let me know and I'll try to work on it. The prior behavior was actually contrary to the comment in the test case and seems likely to have been an accident. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217738 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-14 22:41:37 +00:00
Yaron Keren	0f39f35425	The MCAssembler.h include isn't used. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217705 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-12 20:29:17 +00:00
Adam Nemet	49f31255be	[AVX512] Fix miscompile for unpack r189189 implemented AVX512 unpack by essentially performing a 256-bit unpack between the low and the high 256 bits of src1 into the low part of the destination and another unpack of the low and high 256 bits of src2 into the high part of the destination. I don't think that's how unpack works. AVX512 unpack simply has more 128-bit lanes but other than that it works the same way as AVX. So in each 128-bit lane, we're always interleaving certain parts of both operands rather than different parts of one of the operands. E.g. for this: __v16sf a = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 }; __v16sf b = { 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 }; __v16sf c = __builtin_shufflevector(a, b, 0, 8, 1, 9, 4, 12, 5, 13, 16, 24, 17, 25, 20, 28, 21, 29); we generated punpcklps (notice how the elements of a and b are not interleaved in the shuffle). In turn, c was set to this: 0 16 1 17 4 20 5 21 8 24 9 25 12 28 13 29 Obviously this should have just returned the mask vector of the shuffle vector. I mostly reverted this change and made sure the original AVX code worked for 512-bit vectors as well. Also updated the tests because they matched the logic from the code. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217602 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-11 16:51:10 +00:00
Benjamin Kramer	db414b01a3	Move constant-sized bitvector to the stack. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217600 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-11 15:58:39 +00:00
Sanjay Patel	87c977a52b	Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable. "Unroll" is not the appropriate name for this variable. Clang already uses the term "interleave" in pragmas and metadata for this. Differential Revision: http://reviews.llvm.org/D5066 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217528 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-10 17:58:16 +00:00
Yuri Gorshenin	ca31084292	[asan-assembly-instrumentation] Added CFI directives to the generated instrumentation code. Summary: [asan-assembly-instrumentation] Added CFI directives to the generated instrumentation code. Reviewers: eugenis Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5189 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217482 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-10 09:45:49 +00:00
Sanjay Patel	a9d7398280	Add a scheduling model for AMD 16H Jaguar (btver2). This is a first pass at a scheduling model for Jaguar. It's structured largely on the existing SandyBridge and SLM sched models. Using this model, in addition to turning on the PostRA scheduler, results in some perf wins on internal and 3rd party benchmarks. There's not much difference in LLVM's test-suite benchmarking subset of tests. Differential Revision: http://reviews.llvm.org/D5229 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217457 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-09 20:07:07 +00:00
Pavel Chupin	586994a74e	[x32] Emit callq for CALLpcrel32 Summary: In AT&T annotation for both x86_64 and x32 calls should be printed as callq in assembly. It's only a matter of correct mnemonic, object output is ok. Test Plan: trivial test added Reviewers: nadav, dschuff, craig.topper Subscribers: llvm-commits, zinovy.nis Differential Revision: http://reviews.llvm.org/D5213 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217435 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-09 11:54:12 +00:00
Bob Wilson	086832979b	Set trunc store action to Expand for all X86 targets. When compiling without SSE2, isTruncStoreLegal(F64, F32) would return Legal, whereas with SSE2 it would return Expand. And since the Target doesn't seem to actually handle a truncstore for double -> float, it would just output a store of a full double in the space for a float hence overwriting other bits on the stack. Patch by Luqman Aden! git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217410 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-09 01:13:36 +00:00
Chandler Carruth	8ceea90956	[x86] Revert my over-eager commit in r217332. I hadn't actually run all the tests yet and these combines have somewhat surprisingly far reaching effects. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217333 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-07 12:37:11 +00:00
Chandler Carruth	e328c5ea83	[x86] Tweak the rules surrounding 0,0 and 1,1 v2f64 shuffles and add support for MOVDDUP which is really important for matrix multiply style operations that do lots of non-vector-aligned load and splats. The original motivation was to add support for MOVDDUP as the lack of it regresses matmul_f64_4x4 by 5% or so. However, all of the rules here were somewhat suspicious. First, we should always be using the floating point domain shuffles, regardless of how many copies we have to make as a movapd is crazy faster than the domain switching cost on some chips. (Mostly because movapd is crazy cheap.) Because SHUFPD can't do the copy-for-free trick of the PSHUF instructions, there is no need to avoid canonicalizing on UNPCK variants, so do that canonicalizing. This also ensures we have the chance to form MOVDDUP. =] Second, we assume SSE2 support when doing any vector lowering, and given that we should just use UNPCKLPD and UNPCKHPD as they can operate on registers or memory. If vectors get spilled or come from memory at all this is going to allow the load to be folded into the operation. If we want to optimize for encoding size (the only difference, and only a 2 byte difference) it should be done much later, likely after RA. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217332 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-07 12:02:14 +00:00
Chandler Carruth	7cd7154421	[x86] Fix a pretty horrible bug and inconsistency in the x86 asm parsing (and latent bug in the instruction definitions). This is effectively a revert of r136287 which tried to address a specific and narrow case of immediate operands failing to be accepted by x86 instructions with a pretty heavy hammer: it introduced a new kind of operand that behaved differently. All of that is removed with this commit, but the test cases are both preserved and enhanced. The core problem that r136287 and this commit are trying to handle is that gas accepts both of the following instructions: insertps $192, %xmm0, %xmm1 insertps $-64, %xmm0, %xmm1 These will encode to the same byte sequence, with the immediate occupying an 8-bit entry. The first form was fixed by r136287 but that broke the prior handling of the second form! =[ Ironically, we would still emit the second form in some cases and then be unable to re-assemble the output. The reason why the first instruction failed to be handled is because prior to r136287 the operands were marked 'i32i8imm' which forces them to be sign-extendable. Clearly, that won't work for 192 in a single byte. However, making them zero-extended or "unsigned" doesn't really address the core issue either because it breaks negative immediates. The correct fix is to make these operands 'i8imm' reflecting that they can be either signed or unsigned but must be 8-bit immediates. This patch backs out r136287 and then changes those places as well as some others to use 'i8imm' rather than one of the extended variants. Naturally, this broke something else. The custom DAG nodes had to be updated to have a much more accurate type constraint of an i8 node, and a bunch of Pat immediates needed to be specified as i8 values. The fallout didn't end there though. We also then ceased to be able to match the instruction-specific intrinsics to the instructions so modified. Digging, this is because they too used i32 rather than i8 in their signature. So I've also switched those intrinsics to i8 arguments in line with the instructions. In order to make the intrinsic adjustments of course, I also had to add auto upgrading for the intrinsics. I suspect that the intrinsic argument types may have led everything down this rabbit hole. Pretty happy with the result. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217310 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-06 10:00:01 +00:00
Chandler Carruth	469c73bc27	[x86] Fix an embarrassing bug in the INSERTPS formation code. The mask computation was totally wrong, but somehow it didn't really show up with llc. I've added an assert that triggers on multiple existing test cases and updated one of them to show the correct value. There appear to still be more bugs lurking around insertps's mask. =/ However, note that this only really impacts the new vector shuffle lowering. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217289 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-05 23:19:45 +00:00
Chandler Carruth	c1c5dcf069	[x86] Factor out the zero vector insertion logic in the new vector shuffle lowering for integer vectors and share it from v4i32, v8i16, and v16i8 code paths. Ironically, the SSE2 v16i8 code for this is now better than the SSSE3! =] Will have to fix the SSSE3 code next to just using a single pshufb. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217240 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-05 10:36:31 +00:00
Reid Kleckner	f2cdc0b1e9	X86: cpuid and xgetbv write to 32-bit registers, not 64-bit This fixes an issue where MS inline assembly containing xgetbv wouldn't be marked as clobbering EAX:EDX. Test for that forthcoming on the Clang side. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217173 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-04 16:58:25 +00:00
Chandler Carruth	ae98867126	[x86] Teach the new v4i32 shuffle lowering some more tricks to recognize vzext patterns and insert-element patterns that for SSE4 have dedicated instructions. With this we can enable the experimental mode in a regression test that happens to cover some of the past set of issues. You can see that the new logic does significantly better here on the floating point cases. A follow-up to this change and the previous ones will hoist the logic into helpers so it can be shared across element type sizes as in this particular case it generalizes cleanly. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217136 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-04 09:26:30 +00:00
Elena Demikhovsky	a91600713d	Fixed compilation problem on Windows (initialization of non-aggregate type). After commit 217131. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217134 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-04 07:20:39 +00:00
Elena Demikhovsky	df1bc5a200	X86 Intrinsics table - changed to a static table sorted by intrinsic id. Used binary search over the tables. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217131 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-04 06:34:34 +00:00
Chandler Carruth	fa2dfaedf2	[x86] Teach the new vector shuffle lowering about the zero masking abilities of INSERTPS which are really powerful and come up in very important contexts such as forming diagonal matrices, etc. With this I ended up being able to remove the somewhat weird helper I added for INSERTPS because we can collapse the entire state to a no-op mask. Added a bunch of tests for inserting into a zero-ish vector. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217117 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-04 01:13:48 +00:00
Chandler Carruth	699fd1909e	[x86] Teach the new vector shuffle lowering about the simplest of 'insertps' patterns. This replaces two shuffles with a single insertps in very common cases. My next patch will extend this to leverage the zeroing capabilities of insertps which will allow it to be used in a much wider set of cases. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217100 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-03 22:48:34 +00:00
Chandler Carruth	5f209637c4	[x86] Teach the asm comment printing to only print the clarification of an immediate operand when we don't have instruction-specific comments. This ensures that instruction-specific comments are attached to the same line as the instruction which is important for using them to write readable and maintainable tests. My next commit will add just such a test. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217099 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-03 22:46:44 +00:00
Juergen Ributzka	ecadea992a	[FastISel][tblgen] Rename tblgen generated FastISel functions. NFC. This is the final round of renaming. This changes tblgen to emit lower-case function names for FastEmitInst_* and FastEmit_*, and updates all its uses in the source code. Reviewed by Eric git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217075 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-03 20:56:59 +00:00
Juergen Ributzka	6042034603	[FastISel] Rename public visible FastISel functions. NFC. This commit renames the following public FastISel functions: LowerArguments -> lowerArguments SelectInstruction -> selectInstruction TargetSelectInstruction -> fastSelectInstruction FastLowerArguments -> fastLowerArguments FastLowerCall -> fastLowerCall FastLowerIntrinsicCall -> fastLowerIntrinsicCall FastEmitZExtFromI1 -> fastEmitZExtFromI1 FastEmitBranch -> fastEmitBranch UpdateValueMap -> updateValueMap TargetMaterializeConstant -> fastMaterializeConstant TargetMaterializeAlloca -> fastMaterializeAlloca TargetMaterializeFloatZero -> fastMaterializeFloatZero LowerCallTo -> lowerCallTo Reviewed by Eric git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217074 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-03 20:56:52 +00:00
Eric Christopher	5b7ae59f6d	Remove resetSubtargetFeatures as it is unused. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217071 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-03 20:36:31 +00:00
Eric Christopher	c24df453b0	Remove unnecessary getTarget call now that the subtarget is cached on the machine function. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217070 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-03 20:36:26 +00:00
Alexander Potapenko	42ebff8c99	Follow-up for r217020: actually commit the fix for PR20800, revert the accidentally committed changes to LLVMSymbolize.cpp git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217021 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-03 07:37:20 +00:00
Eric Christopher	d5dd8ce2a5	Reinstate "Nuke the old JIT." Approved by Jim Grosbach, Lang Hames, Rafael Espindola. This reinstates commits r215111, 215115, 215116, 215117, 215136. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216982 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-02 22:28:02 +00:00
Robin Morisset	76b55cc4b1	[X86] Allow atomic operations using immediates to avoid using a register The only valid lowering of atomic stores in the X86 backend was mov from register to memory. As a result, storing an immediate required a useless copy of the immediate in a register. Now these can be compiled as a simple mov. Similarly, adding/and-ing/or-ing/xor-ing an immediate to an atomic location (but through an atomic_store/atomic_load, not a fetch_whatever intrinsic) can now make use of an 'add $imm, x(%rip)' instead of using a register. And the same applies to inc/dec. This second point matches the first issue identified in http://llvm.org/bugs/show_bug.cgi?id=17281 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216980 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-02 22:16:29 +00:00
Sanjay Patel	96b466c066	Refactor LowerFABS and LowerFNEG into one function (x86) (NFC) We duplicate ~30 lines of code to lower FABS and FNEG for x86, so this patch combines them into one function. No functional change intended, so no additional test cases. Test-suite behavior is unchanged. Differential Revision: http://reviews.llvm.org/D5064 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216942 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-02 20:24:47 +00:00
Reid Kleckner	f93099eb1c	CodeGen: Handle va_start in the entry block Also fix a small copy-paste bug in X86ISelLowering where Chain should have been used in place of DAG.getEntryToken(). Fixes PR20828. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216929 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-02 18:42:44 +00:00
Saleem Abdulrasool	901a3419d1	CodeGen: indicate Windows unwind data format The structures for Windows unwinding are shared across multiple platforms. Indicate the encoding to be used for the particular target. Use this to switch the unwind emitter instantiated by the AsmPrinter. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216895 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-01 23:48:39 +00:00
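
The 8-bit immediate issue described in commit 7cd7154421 (r217310) can be illustrated with a small standalone sketch. It only models the arithmetic: `$192` and `$-64` truncate to the same byte, and an operand check that accepts only sign-extendable values rejects 192. The helper names `encodeImm8`, `fitsImm8`, and `fitsSignedImm8` are hypothetical and are not LLVM APIs; the actual operand classes live in the X86 TableGen definitions.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): the byte an assembler would emit
// for an 8-bit immediate, given the value as written in the assembly source.
constexpr std::uint8_t encodeImm8(int value) {
    // Conversion to an unsigned type is modular, so -64 becomes 192 (0xC0).
    return static_cast<std::uint8_t>(value);
}

// Hypothetical check in the spirit of an 'i8imm' operand: accept any value
// representable in 8 bits, whether read as signed or unsigned.
constexpr bool fitsImm8(int value) {
    return value >= -128 && value <= 255;
}

// Hypothetical check in the spirit of a sign-extended-only operand:
// accept only values in the signed 8-bit range.
constexpr bool fitsSignedImm8(int value) {
    return value >= -128 && value <= 127;
}

// Both spellings from the commit message produce the same encoded byte.
static_assert(encodeImm8(192) == 0xC0, "192 encodes as 0xC0");
static_assert(encodeImm8(-64) == 0xC0, "-64 encodes as 0xC0");
```

Under these assumptions, `fitsSignedImm8(192)` is false while `fitsImm8(192)` and `fitsImm8(-64)` are both true, which is the behaviour the commit message argues for.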

10673 Commits