RPCS3/llvm

mirror of https://github.com/RPCS3/llvm.git synced 2025-01-20 08:54:08 +00:00

Author	SHA1	Message	Date
Andrea Di Biagio	a1e1f01699	[X86] Improved target specific combine on VSELECT dag nodes. This patch teaches function 'transformVSELECTtoBlendVECTOR_SHUFFLE' how to convert VSELECT dag nodes to shuffles on targets that do not have SSE4.1. On pre-SSE4.1 targets, we can still perform blend operations using movss/movsd. Also, removed a target specific combine that performed a premature lowering of VSELECT nodes to target specific MOVSS/MOVSD nodes. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222647 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-24 12:23:15 +00:00
Michael Kuperstein	d539147834	[X86] Fixes bug in build_vector v4x32 lowering r222375 made some improvements to build_vector lowering of v4x32 and v4xf32 into an insertps, but it missed a case where: 1. A single extracted element is used twice. 2. The lower of the two non-zero indexes should be preserved, and the higher should be used for the dest mask. This caused a crash, since the source value for the insertps ends-up uninitialized. Differential Revision: http://reviews.llvm.org/D6377 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222635 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-23 13:09:06 +00:00
Elena Demikhovsky	ae1ae2c3a1	Masked Vector Load and Store Intrinsics. Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores. Added SDNodes for masked operations and lowering patterns for X86 code generator. Examples: <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask) declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask) Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch. http://reviews.llvm.org/D6191 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222632 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-23 08:07:43 +00:00
Chandler Carruth	06a07dadb9	[x86] Add some tests for a common unpack pattern of vector shuffle that has a remarkably unique and efficient lowering. While we get this some of the time already, we miss a few cases and there wasn't a principled reason we got it. We should at least test this. v8 already has tests for this pattern. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222607 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-22 05:44:43 +00:00
Sanjay Patel	28660d4b2f	Add a feature flag for slow 32-byte unaligned memory accesses [x86]. This patch adds a feature flag to avoid unaligned 32-byte load/store AVX codegen for Sandy Bridge and Ivy Bridge. There is no functionality change intended for those chips. Previously, the absence of AVX2 was being used as a proxy to detect this feature. But that hindered codegen for AVX-enabled AMD chips such as btver2 that do not have the 32-byte unaligned access slowdown. Performance measurements are included in PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ). Differential Revision: http://reviews.llvm.org/D6355 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222544 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-21 17:40:04 +00:00
Chandler Carruth	46c5a97adc	[x86] Restructure the checking patterns for v16 and v32 avx2 vector shuffle lowering to allow much better blend matching. Specifically, with the new structure the code seems clearer to me and we correctly can hit the cases where merging two 128-bit lanes is a clear win and can be shuffled cheaply afterward. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222539 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-21 14:53:03 +00:00
Chandler Carruth	0889d65fd5	[x86] Make the previous logic significantly less conservative and get a bunch more improvements. Non-lane-crossing is fine, the key is that lane merging only makes sense for single-input shuffles. Not sure why I got so turned around here. The code all works, I was just using the wrong model for it. This only updates v4 and v8 lowering. The v16 and v32 lowering requires restructuring the entire check sequence. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222537 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-21 14:33:24 +00:00
Andrea Di Biagio	607099b697	[DAG] Teach how to turn a build_vector into a shuffle if some of the operands are zero. Before this patch, the DAGCombiner only tried to convert build_vector dag nodes into shuffles if all operands were either extract_vector_elt or undef. This patch improves that logic and teaches the DAGCombiner how to deal with build_vector dag nodes where one or more operands are zero. A build_vector dag node with some zero operands is turned into a shuffle only if the resulting shuffle mask is legal for the target. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222536 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-21 14:32:06 +00:00
Chandler Carruth	bd357588a1	[x86] Teach the x86 vector shuffle lowering to detect mergable 128-bit lanes. By special casing these we can often either reduce the total number of shuffles significantly or reduce the number of (high latency on Haswell) AVX2 shuffles that potentially cross 128-bit lanes. Even when these don't actually cross lanes, they have much higher latency to support that. Doing two of them and a blend is worse than doing a single insert across the 128-bit lanes to blend and then doing a single interleaved shuffle. While this seems like a narrow case, it kept cropping up on me and the difference is huge as you can see in many of the test cases. I first hit this trying to perfectly fix the interleaving shuffle patterns used by Halide for AVX2. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222533 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-21 13:56:05 +00:00
Chandler Carruth	a5f4576510	[x86] Remove more windows line endings that slipped into this file... git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222528 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-21 12:33:46 +00:00
Chandler Carruth	d8d3a957d8	[x86] Add a bunch of test cases to 256-bit shuffles that exercise merging 128-bit subvectors and also shuffling all the elements of those subvectors. Currently we generate pretty bad code for many of these, but I'm testing a patch that should dramatically improve this in addition to making the shuffle lowering robust to other changes. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222525 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-21 12:17:50 +00:00
Alexey Volkov	d0d0424368	[X86] For Silvermont CPU use 16-bit division instead of 64-bit for small positive numbers Differential Revision: http://reviews.llvm.org/D5938 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222521 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-21 11:19:34 +00:00
Quentin Colombet	c91f34ae54	[X86] Do not custom lower UINT_TO_FP when the target type does not match the custom lowering. <rdar://problem/19026326> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222489 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-21 00:47:19 +00:00
Saleem Abdulrasool	e6c1fc9a44	X86: use the correct alloca symbol for Windows Itanium Windows itanium targets the MSVCRT, and the stack probe symbol is provided by MSVCRT. This corrects the emission of stack probes on i686-windows-itanium. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222439 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-20 18:01:26 +00:00
Andrea Di Biagio	53daaff125	[X86] Improved lowering of v4x32 build_vector dag nodes. This patch improves the lowering of v4f32 and v4i32 build_vector dag nodes that are known to have at least two non-zero elements. With this patch, a build_vector that performs a blend with zero is converted into a shuffle. This is done to let the shuffle legalizer expand the dag node in an optimal way. For example, if we know that a build_vector performs a blend with zero, we can try to lower it as a movq/blend instead of always selecting an insertps. This patch also improves the logic that lowers a build_vector into an insertps with zero masking. See for example the extra test cases added to test sse41.ll. Differential Revision: http://reviews.llvm.org/D6311 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222375 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-19 19:34:29 +00:00
Simon Pilgrim	a6943fff90	[X86][SSE] pslldq/psrldq byte shifts/rotation for SSE2 This patch builds on http://reviews.llvm.org/D5598 to perform byte rotation shuffles (lowerVectorShuffleAsByteRotate) on pre-SSSE3 (palignr) targets - pre-SSSE3 is only enabled on i8 and i16 vector targets where it is a more definite performance gain. I've also added a separate byte shift shuffle (lowerVectorShuffleAsByteShift) that makes use of the ability of the SLLDQ/SRLDQ instructions to implicitly shift in zero bytes to avoid the need to create a zero register if we had used palignr. Differential Revision: http://reviews.llvm.org/D5699 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222340 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-19 10:06:49 +00:00
Simon Pilgrim	e6d1a2625f	[X86][AVX] 256-bit vector stack unaligned load/stores identification Under many circumstances the stack is not 32-byte aligned, resulting in the use of the vmovups/vmovupd/vmovdqu instructions when inserting ymm reloads/spills. This minor patch adds these instructions to the isFrameLoadOpcode/isFrameStoreOpcode helpers so that they can be correctly identified and not be treated as folded reloads/spills. This has also been noticed by http://llvm.org/bugs/show_bug.cgi?id=18846 where it was causing redundant spills - I've added a reduced test case at test/CodeGen/X86/pr18846.ll Differential Revision: http://reviews.llvm.org/D6252 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222281 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-18 23:38:19 +00:00
Alexey Volkov	19e8fe05dc	[X86] Use ADD/SUB instead of INC/DEC for Haswell and Broadwell CPUs Differential Revision: http://reviews.llvm.org/D5934 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222141 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-17 16:17:51 +00:00
Bob Wilson	17e95ead36	Fix CR/LF line endings in test case. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222120 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-17 08:00:45 +00:00
Andrea Di Biagio	37f645cb34	[DAG] Improved target independent vector shuffle folding logic. This patch teaches the DAGCombiner how to combine shuffles according to rules: shuffle(shuffle(A, Undef, M0), B, M1) -> shuffle(B, A, M2) shuffle(shuffle(A, B, M0), B, M1) -> shuffle(B, A, M2) shuffle(shuffle(A, B, M0), A, M1) -> shuffle(B, A, M2) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222090 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-15 22:56:25 +00:00
Simon Pilgrim	01e39346f3	[X86][SSE] Improve legal SHUFP and PSHUFD shuffle matching Updated X86TargetLowering::isShuffleMaskLegal to match SHUFP masks with commuted inputs and PSHUFD masks that reference the second input. As part of this I've refactored isPSHUFDMask to work in a more general manner and allow it to match against either the first or second input vector. Differential Revision: http://reviews.llvm.org/D6287 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222087 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-15 21:13:05 +00:00
Cameron McInally	b3625eb445	[AVX512] Add 512b masked integer shift by immediate patterns. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222002 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-14 15:43:00 +00:00
Tim Northover	4a7bbf4c29	X86: use getConstant rather than getTargetConstant behind BUILD_VECTOR. getTargetConstant should only be used when you can guarantee the instruction selected will be able to cope with the raw value. BUILD_VECTOR is rather too generic for this so we should use getConstant instead. In that case, an instruction can still consume the constant, but if it doesn't it'll be materialised through its own round of ISel. Should fix PR21352. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221961 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-14 01:30:14 +00:00
Reid Kleckner	98c86d76df	Allow the use of functions as typeinfo in landingpad clauses This is one step towards supporting SEH filter functions in LLVM. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221954 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-14 00:35:50 +00:00
Chandler Carruth	a5408b9c7c	[x86] Add some tests for specific patterns of lane-flips combined with in-lane shuffles that aren't always handled well by the current vector shuffle lowering. No functionality change yet, that will follow in a subsequent commit. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221938 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-13 22:49:44 +00:00
Elena Demikhovsky	18e1185ddf	AVX-512: SINT_TO_FP cost model and some bugfixes Checked some corner cases, for example translation of <8 x i1> to <8 x double> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221883 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-13 11:46:16 +00:00
Chandler Carruth	4ea3097d08	[x86] Teach the vector shuffle lowering to make a more nuanced decision between splitting a vector into 128-bit lanes and recombining them vs. decomposing things into single-input shuffles and a final blend. This handles a large number of cases in AVX1 where the cross-lane shuffles would be much more expensive to represent even though we end up with a fast blend at the root. Instead, we can do a better job of shuffling in a single lane and then inserting it into the other lanes. This fixes the remaining bits of Halide's regression captured in PR21281 for AVX1. However, the bug persists in AVX2 because I've made this change reasonably conservative. The cases where it makes sense in AVX2 to split into 128-bit lanes are much more rare because we can often do full permutations across all elements of the 256-bit vector. However, the particular test case in PR21281 is an example of one of the rare cases where it is always better to work in a single 128-bit lane. I'm going to try to teach the logic to detect and form the good code even in AVX2 next, but it will need to use a separate heuristic. Finally, there is one pesky regression here where we previously would craftily use vpermilps in AVX1 to shuffle both high and low halves at the same time. We no longer pull that off, and not for any really good reason. Ultimately, I think this is just another missing nuance to the selection heuristic that I'll try to add in afterward, but this change already seems strictly worth doing considering the magnitude of the improvements in common matrix math shuffle patterns. As always, please let me know if this causes a surprising regression for you. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221861 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-13 04:06:10 +00:00
Chandler Carruth	927a5f45e0	[x86] Don't form overly fragmented blends when splitting and re-combining shuffles because nothing was available in the wider vector type. The key observation (which I've put in the comments for future maintainers) is that at this point, no further combining is really possible. And so even though these shuffles trivially could be combined, we need to actually do that as we produce them when producing them this late in the lowering. This fixes another (huge) part of the Halide vector shuffle regressions. As it happens, this was already well covered by the tests, but I hadn't noticed how bad some of these got. The specific patterns that turn directly into unpckl/h patterns were occurring many times in common vector processing code. There are still more problems here sadly, but trying to incrementally tease them apart and it looks like this is the core of the problem in the splitting logic. There is some chance of regression here, you can see it in the test changes. Specifically, where we stop forming pshufb in some cases, it is possible that pshufb was in fact faster. Intel "says" that pshufb is slower than the instruction sequences replacing it. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221852 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-13 02:42:08 +00:00
Quentin Colombet	e8a8deab8c	[CodeGenPrepare] Handle zero extensions in the TypePromotionHelper. Prior to this patch the TypePromotionHelper was promoting only sign extensions. Supporting zero extensions changes: - How constants are extended. - How sign extensions, zero extensions, and truncate are composed together. - How the type of the extended operation is recorded. Now we need to know the kind of the extension as well as its type. Each change is fairly small, unlike the diff. Most of the diff are comments/variable renaming to say "extension" instead of "sign extension". The performance improvements on the test suite are within the noise. Related to <rdar://problem/18310086>. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221851 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-13 01:44:51 +00:00
Sanjay Patel	dab91bcc3a	Expose the number of Newton-Raphson iterations applied to the hardware's reciprocal estimate as a parameter (x86). This is a follow-on to r221706 and r221731 and discussed in more detail in PR21385. This patch also loosens the testcase checking for btver2. We know that the "1.0" will be loaded, but we can't tell exactly when, so replace the CHECK-NEXT specifiers with plain CHECKs. The CHECK-NEXT sequence relied on a quirk of post-RA-scheduling that may change independently of anything in these tests. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221819 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-12 21:39:01 +00:00
Cameron McInally	be30336912	[AVX512] Add integer shift by immediate intrinsics. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221811 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-12 19:58:54 +00:00
Chandler Carruth	556578ec0c	[x86] Start improving the matching of unpck instructions based on test cases from Halide folks. This initial step was extracted from a prototype change by Clay Wood to try and address regressions found with Halide and the new vector shuffle lowering. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221779 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-12 10:05:18 +00:00
Chandler Carruth	3baea18935	[x86] Clean up a bunch of vector shuffle tests with my script. Notably, removes windows line endings and other noise. This is in prelude to making substantive changes to these tests. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221776 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-12 09:17:15 +00:00
Elena Demikhovsky	5f9c438577	AVX-512: Intrinsics for ERI 3 instructions: vrcp28, vrsqrt28, vexp2, only vector forms. Intrinsics include SAE (Suppress All Exceptions) parameter. http://reviews.llvm.org/D6214 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221774 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-12 07:31:03 +00:00
Tom Roeder	63dea2c952	Add Forward Control-Flow Integrity. This commit adds a new pass that can inject checks before indirect calls to make sure that these calls target known locations. It supports three types of checks and, at compile time, it can take the name of a custom function to call when an indirect call check fails. The default failure function ignores the error and continues. This pass incidentally moves the function JumpInstrTables::transformType from private to public and makes it static (with a new argument that specifies the table type to use); this is so that the CFI code can transform function types at call sites to determine which jump-instruction table to use for the check at that site. Also, this removes support for jumptables in ARM, pending further performance analysis and discussion. Review: http://reviews.llvm.org/D4167 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221708 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-11 21:08:02 +00:00
Sanjay Patel	e7c966f067	Use rcpss/rcpps (X86) to speed up reciprocal calcs (PR21385). This is a first step for generating SSE rcp instructions for reciprocal calcs when fast-math allows it. This is very similar to the rsqrt optimization enabled in D5658 ( http://reviews.llvm.org/rL220570 ). For now, be conservative and only enable this for AMD btver2 where performance improves significantly both in terms of latency and throughput. We may never enable this codegen for Intel Core* chips because the divider circuits are just too fast. On SandyBridge, divss can be as fast as 10 cycles versus the 21 cycle critical path for the rcp + mul + sub + mul + add estimate. Follow-on patches may allow configuration of the number of Newton-Raphson refinement steps, add AVX512 support, and enable the optimization for more chips. More background here: http://llvm.org/bugs/show_bug.cgi?id=21385 Differential Revision: http://reviews.llvm.org/D6175 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221706 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-11 20:51:00 +00:00
Rafael Espindola	612f7d7e00	Simplify testcase. NFC. Thanks to Filipe Cabecinhas for the tip. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221705 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-11 20:49:16 +00:00
Rafael Espindola	71c70733b7	Use an 8 bit immediate when possible. This fixes pr21529. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221700 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-11 19:46:36 +00:00
Dario Domizioli	949d328bee	[X86][ELF] Fix PR20243 - leaf frame pointer bug with TLS access The ISel lowering for global TLS access in PIC mode was creating a pseudo instruction that is later expanded to a call, but the code was not setting the hasCalls flag in the MachineFrameInfo alongside the adjustsStack flag. This caused some functions to be mistakenly recognized as leaf functions, and this in turn affected the decision to eliminate the frame pointer. With the fix, hasCalls is properly set and the leaf frame pointer is correctly preserved. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221695 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-11 18:44:49 +00:00
Andrea Di Biagio	d6548ad013	[X86] Add missing check for 'isINSERTPSMask' in method 'isShuffleMaskLegal'. This helps the DAGCombiner to identify more opportunities to fold shuffles. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221684 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-11 11:20:31 +00:00
Michael Kuperstein	f2fe3b72a9	[X86] Fix pattern match for 32-to-64-bit zext in the presence of AssertSext This fixes an issue with matching trunc -> assertsext -> zext on x86-64, which would not zero the high 32-bits. See PR20494 for details. Recommitting - This time, with a hopefully working test. Differential Revision: http://reviews.llvm.org/D6128 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221672 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-11 07:07:40 +00:00
Quentin Colombet	8201185d61	[X86] Custom lower UINT_TO_FP from v4f32 to v4i32, and for v8f32 to v8i32 if AVX2 is available. According to IACA, the new lowering has a throughput of 8 cycles instead of 13 with the previous one. Although this lowering kicks in some SPECs benchmarks, the performance improvement was within the noise. Correctness testing has been done for the whole range of uint32_t with the following program: uint4 v = (uint4) {0,1,2,3}; uint32_t i; //Check correctness over entire range for uint4 -> float4 conversion for( i = 0; i < 1U << (32-2); i++ ) { float4 t = test(v); float4 c = correct(v); if( 0xf != _mm_movemask_ps( t == c )) { printf( "Error @ %vx: %vf vs. %vf\n", v, c, t); return -1; } v += 4; } Where "correct" is the old lowering and "test" the new one. The patch adds a test case for the two custom lowering instruction. It also modifies the vector cost model, which is why cast.ll and uitofp.ll are modified. 2009-02-26-MachineLICMBug.ll is also modified because we now hoist 7 instructions instead of 4 (3 more constant loads). <rdar://problem/18153096> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221657 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-11 02:23:47 +00:00
Michael Kuperstein	dee48e7ad4	Reverting r221626 due to a too-strict test. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221629 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-10 21:07:41 +00:00
Michael Kuperstein	1a66dc7468	[X86] Fix pattern match for 32-to-64-bit zext in the presence of AssertSext This fixes an issue with matching trunc -> assertsext -> zext on x86-64, which would not zero the high 32-bits. See PR20494 for details. Differential Revision: http://reviews.llvm.org/D6128 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221626 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-10 20:40:21 +00:00
Simon Pilgrim	de3d50643c	[X86][SSE] Vector integer/float conversion memory folding (cvttps2dq / cvttpd2dq) Fixed an issue with the (v)cvttps2dq and (v)cvttpd2dq instructions being incorrectly put in the 2 source operand folding tables instead of the 1 source operand and added the missing SSE/AVX versions. Also added missing (v)cvtps2dq and (v)cvtpd2dq instructions to the folding tables. Differential Revision: http://reviews.llvm.org/D6001 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221489 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-06 22:15:41 +00:00
Ahmed Bougacha	112aabeeeb	[X86] Add VFMADDSUB cases for the 213->231 custom inserter. Also add tests for vfmadd/vfmsub. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221488 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-06 22:04:15 +00:00
Ahmed Bougacha	f44d4cd925	[X86] Add missing FMA3 VFMADDSUB in the emitter. Also reuse the fma4 intrinsic test to cover fma3 instructions too. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221487 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-06 21:58:11 +00:00
Ahmed Bougacha	8b6319bfed	[X86] Split FMA4 RM tests into a separate file. NFC. While there, remove useless comments. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221484 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-06 21:46:23 +00:00
Rafael Espindola	eed959015b	Compute the correct jump table entries on 32 bit windows. On 32 bit windows we use label differences and .set does not suppress relocations, a combination that was not used before r220256. This fixes PR21497. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221456 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-06 14:39:49 +00:00
Andrea Di Biagio	f0f66a254d	[X86] When commuting SSE immediate blend, make sure that the new blend mask is a valid imm8. Example: define <4 x i32> @test(<4 x i32> %a, <4 x i32> %b) { %shuffle = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 4, i32 5, i32 6, i32 3> ret <4 x i32> %shuffle } Before llc (-mattr=+sse4.1), produced the following assembly instruction: pblendw $4294967103, %xmm1, %xmm0 After pblendw $63, %xmm1, %xmm0 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221455 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-06 14:36:45 +00:00
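
The two immediates quoted in commit f0f66a254d can be checked with a little arithmetic. The sketch below is not the LLVM code; it only illustrates why a commuted blend mask must be truncated to the lane count. The function name `commute_blend_mask` and the starting immediate `0xC0` are assumptions made for illustration (`0xC0` is inferred from the two constants in the message, not stated there).

```python
def commute_blend_mask(imm: int, num_lanes: int = 8) -> int:
    """Return the blend immediate after swapping the two inputs.

    Swapping the inputs flips every lane-select bit, so the result must be
    masked down to num_lanes bits to remain a valid imm8 (hypothetical helper).
    """
    lane_mask = (1 << num_lanes) - 1
    return ~imm & lane_mask


# Assumed pre-commute immediate: words 6 and 7 taken from the other input.
original = 0xC0

# Inverting in a 32-bit value without truncating to 8 bits gives the
# out-of-range constant quoted in the "Before" output.
untruncated = ~original & 0xFFFFFFFF
print(untruncated)                    # 4294967103

# Truncating to the 8 word lanes of pblendw gives the "After" constant.
print(commute_blend_mask(original))   # 63
```

Under that assumption, the "Before" value is the 32-bit bitwise inverse of the mask, and the "After" value is the same inverse kept to its low 8 bits.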

5585 Commits