archived-llvm

mirror of https://github.com/RPCS3/llvm.git synced 2026-01-31 01:25:19 +01:00

Author	SHA1	Message	Date
Simon Pilgrim	f9d4e1794d	[X86] Add v2i4 store test case (PR20012) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312874 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-09 20:28:50 +00:00
Simon Pilgrim	5037d51d6d	[X86] Add v2i2 test case (PR20011) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312873 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-09 20:22:35 +00:00
Simon Pilgrim	8b80450d25	[X86][FMA] Regenerate FMA tests git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312871 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-09 19:25:59 +00:00
Simon Pilgrim	a22f9f2405	[X86][SSE] i32 vector multiplications test cases from PR6399 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312868 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-09 18:18:17 +00:00
Simon Pilgrim	ce6571eae6	[X86][MOVBE] Fix typo in MOVBE scheduling test names Copy+paste is not your friend git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312867 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-09 17:52:44 +00:00
Craig Topper	9080ea8806	[X86] Don't disable slow INC/DEC if optimizing for size Summary: Just because INC/DEC is a little slow on some processors doesn't mean we shouldn't prefer it when optimizing for size. This appears to match gcc behavior. Reviewers: chandlerc, zvi, RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37177 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312866 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-09 17:11:59 +00:00
Kyle Butt	6ece35c79b	PPC: Don't select lxv/stxv for insufficiently aligned stack slots. The lxv/stxv instructions require an offset that is 0 % 16. Previously we were selecting lxv/stxv for loads and stores to the stack where the offset from the slot was a multiple of 16, but the stack slot was not 16 or more byte aligned. When the frame gets lowered these transform to r(1|31) + slot + offset. If slot is not aligned, slot + offset may not be 0 % 16. Now we require 16 byte or more alignment for select lxv/stxv to stack slots. Includes a testcase that shows both sufficiently and insufficiently aligned stack slots. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312843 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-09 00:37:56 +00:00
Yonghong Song	982a89e06c	bpf: fix test failures due to previous bpf change of assembly code syntax Signed-off-by: Yonghong Song <yhs@fb.com> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312840 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-09 00:11:13 +00:00
Matt Arsenault	fadb61df65	AMDGPU: Recompute scc liveness The various scalar bit operations set SCC, so one is erased or moved it needs to be recomputed. Not sure why the existing tests don't fail on this. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312819 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-08 18:51:26 +00:00
Craig Topper	403bab200a	[X86] Simplify the slow-incdec test and add test cases with optsize. I think we want to consider using inc/dec with optsize. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312804 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-08 17:33:54 +00:00
Wei Mi	9930880cbf	Fix a bug for rL312641. rL312641 Allowed llvm.memcpy/memset/memmove to be tail calls when parent function return the intrinsics's first argument. However on arm-none-eabi platform, llvm.memcpy will be expanded to __aeabi_memcpy which doesn't have return value. The fix is to check the libcall name after expansion to match "memcpy/memset/memmove" before allowing those intrinsic to be tail calls. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312799 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-08 16:44:52 +00:00
Krzysztof Parzyszek	145740999d	Preserve existing regs when adding pristines to LivePhysRegs/LiveRegUnits Differential Revision: https://reviews.llvm.org/D37600 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312797 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-08 16:29:50 +00:00
Simon Pilgrim	977c908e78	[X86] Added PR31045 test case Reduced version of 'addr-calc-crash.ll' that was included in D27044, that had been fixed already by D31286/rL298633 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312786 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-08 10:49:11 +00:00
Jatin Bhateja	8277ad7473	[X86] Adding a test point for PR34149 'Suboptimal codegen for "fast" minnum and maxnum' Differential Revision: https://reviews.llvm.org/D37614 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312778 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-08 09:15:36 +00:00
Dean Michael Berris	f7f70f7f6c	[XRay][CodeGen][PowerPC] Fix tail exit codegen for XRay in PPC Summary: This fixes code-gen for XRay in PPC. The regression wasn't caught by codegen tests which we add in this change. What happened was the following: - For tail exits, we used to unconditionally prepend the returns/exits with a pseudo-instruction that gets lowered to the instrumentation sled (and leave the actual return/exit instruction as-is). - Changes to the XRay instrumentation pass caused the tail exits to suddenly also emit the tail exit pseudo-instruction, since the check for whether a return instruction was also a call instruction meant it was a tail exit instruction. - None of the tests caught the regression either due to non-existent tests, or the tests being disabled/removed for continuous breakage. This change re-introduces some of the basic tests and verifies that we're back to a state that allows the back-end to generate appropriate XRay instrumented binaries for PPC in the presence of tail exits. Reviewers: echristo, timshen Subscribers: nemanjai, kbarton, llvm-commits Differential Revision: https://reviews.llvm.org/D37570 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312772 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-08 01:47:56 +00:00
Chandler Carruth	575526c082	[x86] Flesh out the custom ISel for RMW aritmetic ops with used flags to cover the bitwise operators. Nothing really exciting here, this just stamps out the rest of the core operations that can RMW memory and set flags. Still not implemented here: ADC, SBB. Those will require more interesting logic to channel the flags in, and I'm not currently planning to try to tackle that. It might be interesting for someone who wants to improve our code generation for bignum implementations. Differential Revision: https://reviews.llvm.org/D37141 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312768 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-08 00:17:12 +00:00
Chandler Carruth	803b2c7e69	[x86] Extend the manual ISel of `add` and `sub` with both RMW memory operands and used flags to support matching immediate operands. This is a bit trickier than register operands, and we still want to fall back on a register operands even for things that appear to be "immediates" when they won't actually select into the operation's immediate operand. This also requires us to handle things like selecting `sub` vs. `add` to minimize the number of bits needed to represent the immediate, and picking the shortest immediate encoding. In order to that, we in turn need to scan to make sure that CF isn't used as it will get inverted. The end result seems very nice though, and we're now generating optimal instruction sequences for these patterns IMO. A follow-up patch will further expand this to other operations with RMW memory operands. But handing `add` and `sub` are useful starting points to flesh out the machinery and make sure interesting and complex cases can be handled. Thanks to Craig Topper who provided a few fixes and improvements to this patch in addition to the review! Differential Revision: https://reviews.llvm.org/D37139 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312764 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-07 23:54:24 +00:00
Paul Robinson	06296a65fd	[DWARF] Line 0 should not have a discriminator. It's meaningless and takes up extra space in the line table. Differential Revision: https://reviews.llvm.org/D37364 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312751 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-07 22:15:44 +00:00
Artem Belevich	4059d374ce	[CUDA] Added rudimentary support for CUDA-9 and sm_70. For now CUDA-9 is not included in the list of CUDA versions clang searches for, so the path to CUDA-9 must be explicitly passed via --cuda-path=. On LLVM side NVPTX added sm_70 GPU type which bumps required PTX version to 6.0, but otherwise is equivalent to sm_62 at the moment. Differential Revision: https://reviews.llvm.org/D37576 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312734 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-07 18:14:32 +00:00
Matt Arsenault	0bb6355f63	AMDGPU: Start selecting v_mad_mix_f32 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312732 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-07 18:05:07 +00:00
Konstantin Zhuravlyov	3964b8bfc8	AMDGPU: Handle non-temporal loads and stores Differential Revision: https://reviews.llvm.org/D36862 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312729 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-07 17:14:54 +00:00
Konstantin Zhuravlyov	b6f64be453	AMDGPU: Handle more than one memory operand in SIMemoryLegalizer Differential Revision: https://reviews.llvm.org/D37397 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312725 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-07 16:14:21 +00:00
Benjamin Kramer	55915d8771	[ARM] Remove redundant vcvt patterns. These don't add any value as they're just compositions of existing patterns. However, they can confuse the cost logic in ISel, leading to duplicated vcvt instructions like in PR33199. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312724 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-07 14:52:26 +00:00
Michael Zuckerman	563f2fdd92	[X86][LLVM]Expanding Supports lowerInterleavedLoad() in X86InterleavedAccess (VF{8|16|32} stride 3). This patch expands the support of lowerInterleavedload to {8|16|32}x8i stride 3. LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=3 VF={8|16|32}) and we plan to include the store (deinterleved side). The patch goal is to optimize the following sequence: a0 b0 c0 a1 b1 c1 a2 b2 c2 a3 b3 c3 a4 b4 c4 a5 b5 c5 a6 b6 c6 a7 b7 c7 into a0 a1 a2 a3 a4 a5 a6 a7 b0 b1 b2 b3 b4 b5 b6 b7 c0 c1 c2 c3 c4 c5 c6 c7 Reviewers 1. zvi 2. igor 3. guyblank 4. dorit 5. Ayal git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312722 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-07 14:02:13 +00:00
Florian Hahn	651af02437	[MachineCombiner] Update instruction depths incrementally for large BBs. Summary: For large basic blocks with lots of combinable instructions, the MachineTraceMetrics computations in MachineCombiner can dominate the compile time, as computing the trace information is quadratic in the number of instructions in a BB and it's relevant successors/predecessors. In most cases, knowing the instruction depth should be enough to make combination decisions. As we already iterate over all instructions in a basic block, the instruction depth can be computed incrementally. This reduces the cost of machine-combine drastically in cases where lots of instructions are combined. The major drawback is that AFAIK, computing the critical path length cannot be done incrementally. Therefore we only compute instruction depths incrementally, for basic blocks with more instructions than inc_threshold. The -machine-combiner-inc-threshold option can be used to set the threshold and allows for easier experimenting and checking if using incremental updates for all basic blocks has any impact on the performance. Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn Reviewed By: fhahn Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits Differential Revision: https://reviews.llvm.org/D36619 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312719 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-07 12:49:39 +00:00
Alexander Ivchenko	8f5188c5d8	[x86] Update to cmov promotion tests for D36711; NFC Adding i8 -> [i16, i32, i64] and i32 -> i64 cases. This way we can see what the current codegen looks like. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312707 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-07 08:59:05 +00:00
Zvi Rackover	8a2fcfe5be	X86: Improve AVX512 fptoui lowering Summary: Add patterns for fptoui <16 x float> to <16 x i8> fptoui <16 x float> to <16 x i16> Reviewers: igorb, delena, craig.topper Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37505 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312704 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-07 07:40:34 +00:00
Matt Arsenault	ca22b05483	AMDGPU: Don't legalize i16 extloads to i32 with legal i16 Keeping non-i16 extloads makes it easier to match some new gfx9 load instructions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312699 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-07 05:37:34 +00:00
Saleem Abdulrasool	1211c5d71e	ARM: track globals promoted to coalesced const pool entries Globals that are promoted to an ARM constant pool may alias with another existing constant pool entry. We need to keep a reference to all globals that were promoted to each constant pool value so that we can emit a distinct label for each promoted global. These labels are necessary so that debug info can refer to the promoted global without an undefined reference during linking. Patch by Stephen Crane! git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312692 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-07 04:00:13 +00:00
Stanislav Mekhanoshin	6148c30603	[AMDGPU] Use v_pk_max_f16 for fcanonicalize Differential Revision: https://reviews.llvm.org/D37325 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312676 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-06 22:27:29 +00:00
Matthias Braun	109c6e02f7	Insert IMPLICIT_DEFS for undef uses in tail merging Tail merging can convert an undef use into a normal one when creating a common tail. Doing so can make the register live out from a block which previously contained the undef use. To keep the liveness up-to-date, insert IMPLICIT_DEFs in such blocks when necessary. To enable this patch the computeLiveIns() function which used to compute live-ins for a block and set them immediately is split into new functions: - computeLiveIns() just computes the live-ins in a LivePhysRegs set. - addLiveIns() applies the live-ins to a block live-in list. - computeAndAddLiveIns() is a convenience function combining the other two functions and behaving like computeLiveIns() before this patch. Based on a patch by Krzysztof Parzyszek <kparzysz@codeaurora.org> Differential Revision: https://reviews.llvm.org/D37034 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312668 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-06 20:45:24 +00:00
Sanjay Patel	64aa32b606	[x86] fix triple and regenerate checks for psubus; NFC Patch by Yulia Koval! Differential Revision: https://reviews.llvm.org/D37523 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312662 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-06 19:05:20 +00:00
Stanislav Mekhanoshin	953b70393a	[AMDGPU] Fixed encoding of v_pk_mul_f16 in fcanonicalize Differential Revision: https://reviews.llvm.org/D37522 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312660 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-06 18:29:51 +00:00
Krzysztof Parzyszek	7e5553d4f2	[IfConversion] Remove kill flags from common instructions as well When if-converting a diamond, two separate blocks will be placed back to back to form a straight line code. To ensure correctness of the liveness information, any registers that are live in the second block should not be killed in the first block, even if they were in the original code. Additionally, when the two blocks share common instructions at the beginning, these instructions will not be duplicated, but only placed once, before both of the blocks. Since the function "isIdenticalTo" (as used here) ignores kill flags, the common initial code in one block may have a kill flag for a register that is live in the other block. Because the code that removes kill flags only runs for the non-common parts of the predicated blocks, a kill flag mismatch in the common code could still lead to a live register being killed prematurely. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312654 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-06 17:57:13 +00:00
Krzysztof Parzyszek	0c3d5af968	[Hexagon] Add option to generate calls to "abort" for "unreachable" git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312644 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-06 16:22:55 +00:00
Wei Mi	78696b31cd	[TailCall] Allow llvm.memcpy/memset/memmove to be tail calls when parent function return the intrinsics's first argument. llvm.memcpy/memset/memmove return void but they will return the first argument after they are expanded as libcalls. Now if the parent function has any return value, llvm.memcpy cannot be turned into tail call after expansion. The patch is to handle that case in SelectionDAGBuilder so when caller function return the same value as the first argument of llvm.memcpy, tail call is allowed. Differential Revision: https://reviews.llvm.org/D37406 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312641 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-06 16:05:17 +00:00
Stanislav Mekhanoshin	651c4efd77	[AMDGPU] Fix shouldClusterMemOps to process flat loads Flat loads do not have vdata operand but have vdst instead. Differential Revision: https://reviews.llvm.org/D37502 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312640 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-06 15:31:30 +00:00
Nicolai Haehnle	adf1cb63f2	AMDGPU: Make worst-case assumption about the wait states in inline assembly Summary: Mesa still uses a hack where empty inline assembly is used as a kind of optimization barrier. This exposed a problem where not enough wait states were inserted, because the hazard recognizer implicitly assumed that each inline assembly "instruction" has at least one wait state. Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D37205 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312635 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-06 13:50:13 +00:00
Chandler Carruth	1467a089bc	[x86] Fix PR34377 by disabling cmov conversion when we relied on it performing a zext of a register. On the PR there is discussion of how to more effectively handle this, but this patch prevents us from miscompiling code. Differential Revision: https://reviews.llvm.org/D37504 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312620 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-06 06:28:08 +00:00
Zvi Rackover	922eae4d2e	X86 Tests: Tidy up AVX512 conversion tests. NFC. Rename functions to a consistent format to make it easier to track coverage. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312619 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-06 05:33:04 +00:00
Jatin Bhateja	2411ad4316	Updating a test reference for rL312608. Differential Revision: https://reviews.llvm.org/D37501 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312614 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-06 03:58:14 +00:00
Hal Finkel	a481ab548d	[PowerPC] Don't use xscvdpspn on the P7 xscvdpspn was not introduced until the P8, so don't use it on the P7. Fixes a regression introduced in r288152. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312612 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-06 03:08:26 +00:00
Jatin Bhateja	f3b9c95869	[X86] Allow cross-lane permutations for sub targets supporting AVX2. Summary: Most instructions in AVX work “in-lane”, that is, each source element is applied only to other elements of the same lane, thus a cross lane permutation is costly and needs more than one instrution. AVX2 includes instructions to perform any-to-any permutation of words over a 256-bit register and vectorized table lookup. This should also Fix PR34369 Differential Revision: https://reviews.llvm.org/D37388 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312608 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-06 02:58:47 +00:00
Yaxun Liu	1e1d0b01c1	[AMDGPU] Transform __read_pipe_* and __write_pipe_* When packet size equals packet align and is power of 2, transform __read_pipe* and __write_pipe* to specialized library function. Differential Revision: https://reviews.llvm.org/D36831 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312598 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-06 00:30:27 +00:00
Eli Friedman	83b0e44429	[ARM] Make ARMExpandPseudo add implicit uses for predicated instructions Missing these could potentially screw up post-ra scheduling. Issue found by inspection, so I don't have a real testcase. Included test just verifies the expected operands after expansion. Differential Revision: https://reviews.llvm.org/D35156 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312589 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-05 22:54:06 +00:00
Reid Kleckner	c86178ea37	Add llvm.codeview.annotation to implement MSVC __annotation Summary: This intrinsic represents a label with a list of associated metadata strings. It is modelled as reading and writing inaccessible memory so that it won't be removed as dead code. I think the intention is that the annotation strings should appear at most once in the debug info, so I marked it noduplicate. We are allowed to inline code with annotations as long as we strip the annotation, but that can be done later. Reviewers: majnemer Subscribers: eraman, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D36904 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312569 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-05 20:14:58 +00:00
Craig Topper	8c5b337a87	[X86] Remove unnecessary (v4f32 (X86vzmovl (v4f32 (scalar_to_vector FR32X)))) patterns We had already disabled the pattern for SSE4.1 and SSE4.2. But it got re-enabled for AVX and AVX512. With SSE41 we rely on a separate (v4f32 (X86vzmovl VR128)) pattern to select blendps with a xorps to create zeroess. And a separate (v4f32 (scalar_to_vector FR32X)) to select a COPY_TO_REG_CLASS to move FR32 to VR128 The same thing can happen for AVX with vblendps and those separate patterns already exist. For AVX512, (v4f32 (X86vzmov VR128)) will select a VMOVSS instruction instead of VBLENDPS due to their not being a EVEX VBLENDPS. This is what we were getting out of the larger pattern anyway. So the larger pattern is unneeded for AVX512 too. For SSE1-SSSE3 we can rely on (v4f32 (X86vzmov VR128)) selecting a MOVSS similar to AVX512. Again this is what the larger pattern did too. So the only real change here is that AVX1/2 now properly outputs a VBLENDPS during isel instead of a VMOVSS to match SSE41. Most tests didn't notice because the two address instruction pass knows how to turn VMOVSS into VBLENDPS to get an independent destination register. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312564 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-05 19:09:02 +00:00
Matt Arsenault	4e0c4fb9c1	AMDGPU: Fix not accounting for tail call resource usage If the only call in a function is a tail call, the function isn't considered to have a call since it's a type of return. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312561 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-05 18:36:36 +00:00
Zvi Rackover	9c369c6f9c	X86 Tests: Adding missing AVX512 fptoui coverage tests. NFC. Some of the cases show missing pattern i intend to fix shortly. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312560 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-05 18:24:39 +00:00
Craig Topper	035520018a	[AVX512] Remove patterns for (v8f32 (X86vzmovl (insert_subvector undef, (v4f32 (scalar_to_vector FR32X:)), (iPTR 0)))) and the same for v4f64. We don't have this same pattern for AVX2 so I don't believe we should have it for AVX512. We also didn't have it for v16f32. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312543 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-05 17:33:58 +00:00

22250 Commits