archived-llvm

mirror of https://github.com/RPCS3/llvm.git synced 2026-01-31 01:25:19 +01:00

Author	SHA1	Message	Date
Jinsong Ji	e09833c6e9	[UpdateChecks] Add support for armv7-apple-darwin armv7-apple-darwin was not supported well, the script can't generate checks. https://reviews.llvm.org/D60601/new/#inline-568671 Differential Revision: https://reviews.llvm.org/D63939 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364668 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-28 18:07:19 +00:00
Simon Pilgrim	08a4b34e50	[X86] CombineShuffleWithExtract - recurse through EXTRACT_SUBVECTOR chain git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364667 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-28 17:57:32 +00:00
Roman Lebedev	ae7898988a	[NFC][Codegen] Revisit test coverage for X % C == 0 fold once more (add tests with '1' divisor) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364661 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-28 17:26:28 +00:00
Sam Tebbs	a71ac42554	[ARM] Add support for the MVE long shift instructions MVE adds the lsll, lsrl and asrl instructions, which perform a shift on a 64 bit value separated into two 32 bit registers. The Expand64BitShift function is modified to accept ISD::SHL, ISD::SRL and ISD::SRA and convert it into the appropriate opcode in ARMISD. An SHL is converted into an lsll, an SRL is converted into an lsrl for the immediate form and a negation and lsll for the register form, and SRA is converted into an asrl. test/CodeGen/ARM/shift_parts.ll is added to test the logic of emitting these instructions. Differential Revision: https://reviews.llvm.org/D63430 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364654 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-28 15:43:31 +00:00
Dmitry Preobrazhensky	3c357a49a3	[AMDGPU][MC] Enabled constant expressions as operands of sendmsg See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D62735 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364645 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-28 14:14:02 +00:00
Simon Pilgrim	ef0e051c6d	[X86] CombineShuffleWithExtract - only require 1 source to be EXTRACT_SUBVECTOR We were requiring that both shuffle operands were EXTRACT_SUBVECTORs, but we can relax this to only require one of them to be. Also, we shouldn't bother attempting this if both operands are from the lowest subvector (or not EXTRACT_SUBVECTOR at all). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364644 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-28 12:24:49 +00:00
David Green	5f0953343c	[ARM] Add MVE mul patterns This simply adds integer and floating point VMUL patterns for MVE, same as we have add and sub. Differential Revision: https://reviews.llvm.org/D63866 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364643 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-28 11:44:03 +00:00
Roman Lebedev	bae9441dbd	[NFC][Codegen] Revisit test coverage for X % C == 0 fold git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364642 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-28 11:36:34 +00:00
David Green	2dd6a9d6de	[ARM] Mark math routines as non-legal for MVE This adds handling and tests for a number of floating point math routines, which have no MVE instructions. Differential Revision: https://reviews.llvm.org/D63725 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364641 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-28 11:17:38 +00:00
David Green	4cbe79049c	[ARM] MVE patterns for VABS and VNEG This simply adds the required patterns for fp neg and abs. Differential Revision: https://reviews.llvm.org/D63861 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364640 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-28 10:25:35 +00:00
David Green	0272917d5a	[ARM] Widening loads and narrowing stores MVE has instructions to widen as it loads, and narrow as it stores. This adds the required patterns and legalisation to make them work including specifying that they are legal, patterns to select them and test changes. Patch by David Sherwood. Differential Revision: https://reviews.llvm.org/D63839 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364636 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-28 09:47:55 +00:00
David Green	c2c9f7413e	[ARM] MVE loads and stores This fills in the gaps for basic MVE loads and stores, allowing unaligned access and adding far too many tests. These will become important as narrowing/expanding and pre/post inc are added. Big endian might still not be handled very well, because we have not yet added bitcasts (and I'm not sure how we want it to work yet). I've included the alignment code anyway which maps with our current patterns. We plan to return to that later. Code written by Simon Tatham, with additional tests from Me and Mikhail Maltsev. Differential Revision: https://reviews.llvm.org/D63838 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364633 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-28 08:41:40 +00:00
David Green	67091fc7e1	[ARM] Mark div and rem as expand for MVE We don't have vector operations for these, so they need to be expanded for both integer and float. Differential Revision: https://reviews.llvm.org/D63595 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364631 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-28 08:18:55 +00:00
David Green	03b8387d14	[ARM] Select MVE fp add and sub The same as integer arithmetic, we can add simple floating point MVE addition and subtraction patterns. Initial code by David Sherwood Differential Revision: https://reviews.llvm.org/D63257 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364629 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-28 07:41:09 +00:00
David Green	93744b32bb	[ARM] Select MVE add and sub This adds the first few patterns for MVE code generation, adding simple integer add and sub patterns. Initial code by David Sherwood Differential Revision: https://reviews.llvm.org/D63255 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364627 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-28 07:21:11 +00:00
David Green	3b9e91d904	[ARM] MVE vector shuffles This patch adds necessary shuffle vector and buildvector support for ARM MVE. It essentially adds support for VDUP, VREVs and some VMOVs, which are often required by other code (like upcoming patches). This mostly uses the same code from Neon that already generated NEONvdup/NEONvduplane/NEONvrev's. These have been renamed to ARMvdup/etc and moved to ARMInstrInfo as they are common to both architectures. Most of the selection code seems to be applicable to both, but NEON does have some more instructions making some parts specific. Most code originally by David Sherwood. Differential Revision: https://reviews.llvm.org/D63567 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364626 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-28 07:08:42 +00:00
Stanislav Mekhanoshin	41e9ec03c7	[AMDGPU] Packed thread ids in function call ABI Differential Revision: https://reviews.llvm.org/D63851 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364619 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-28 01:52:13 +00:00
Amara Emerson	b9f4ed6752	[GlobalISel][IRTranslator] Fix some PHI bugs related to jump tables when optimizations are used. The new switch lowering code that tries to generate jump tables and range checks were tested at -O0 on arm64, but on -O3 the generic switch lowering code goes to town on trying to generate optimized lowerings, e.g. multiple jump tables, range checks etc. This exposed bugs in the way PHI nodes are handled because the CFG looks even stranger after all of this is done. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364613 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-27 23:56:34 +00:00
Roman Lebedev	65afd3a0fd	[CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 3) Summary: I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222. There is no such action in "Add Action" menu... This implements an optimization described in Hacker's Delight 10-17: when `C` is constant, the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder. The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479. This is a recommit, the original commit rL364563 was reverted in rL364568 because test-suite detected miscompile - the new comparison constant 'Q' was being computed incorrectly (we divided by `D0` instead of `D`). Original patch D50222 by @hermord (Dmytro Shynkevych) Notes: - In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla. This seems to require an extra branch on overflow, so I refrained from implementing this for now. - An explicit check for when the `REM` can be reduced to just its LHS is included: the `X % C` == 0 optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise. I hadn't managed to find a better way to not generate worse output in this case. - The `test/CodeGen/X86/jump_sign.ll` regresses, and is being fixed by a followup patch D63390. Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00 Reviewed By: RKSimon, xbolva00 Subscribers: dexonsmith, kristina, xbolva00, javed.absar, llvm-commits, hermord Tags: #llvm Differential Revision: https://reviews.llvm.org/D63391 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364600 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-27 21:52:10 +00:00
Sanjay Patel	f5c10ec47c	[x86] remove whitespace; NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364588 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-27 20:37:12 +00:00
Sanjay Patel	df3cc01106	[x86] prevent crashing from select narrowing with AVX512 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364585 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-27 20:16:58 +00:00
Roman Lebedev	93f902be53	[NFC][CodeGen] Add negative test for X u% C == 0 fold (D63391) The fold (D63391) uses multiplicativeInverse(), but it is not guaranteed to always succeed, and '100' appears to be one of the problematic values. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364578 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-27 19:09:51 +00:00
Roman Lebedev	1df452bb5a	Revert "[CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2)" Appears to break test-suite on http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/23790 FAIL: burg.execution_time FAIL: spiff.execution_time FAIL: employ.execution_time FAIL: llu.execution_time FAIL: gramschmidt.execution_time FAIL: fdtd-apml.execution_time This reverts commit r364563. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364568 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-27 17:22:31 +00:00
Nicolai Haehnle	c71a1c3902	AMDGPU: Make fixing i1 copies robust against re-ordering Summary: The new test case led to incorrect code. Change-Id: Ief48b227e97aa662dd3535c9bafb27d4a184efca Reviewers: arsenm, david-salinas Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63871 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364566 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-27 16:56:44 +00:00
Roman Lebedev	84139109d4	[CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2) Summary: I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222. There is no such action in "Add Action" menu... Original patch D50222 by @hermord (Dmytro Shynkevych) This implements an optimization described in Hacker's Delight 10-17: when `C` is constant, the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder. The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479. Original patch author: @hermord (Dmytro Shynkevych)! Notes: - In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla. This seems to require an extra branch on overflow, so I refrained from implementing this for now. - An explicit check for when the `REM` can be reduced to just its LHS is included: the `X % C` == 0 optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise. I hadn't managed to find a better way to not generate worse output in this case. - The `test/CodeGen/X86/jump_sign.ll` regresses, and is being fixed by a followup patch D63390. Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00 Reviewed By: RKSimon, xbolva00 Subscribers: xbolva00, javed.absar, llvm-commits, hermord Tags: #llvm Differential Revision: https://reviews.llvm.org/D63391 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364563 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-27 16:45:42 +00:00
Paul Robinson	2a7bae43b0	[debug-info] Make a couple of tests more robust. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364556 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-27 15:53:07 +00:00
Simon Pilgrim	39f0e3cf18	[TargetLowering] SimplifyDemandedVectorElts - add shift/rotate support. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364548 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-27 14:25:54 +00:00
Jinsong Ji	e946fcb412	[PowerPC][HTM] Fix disassembling buffer overflow for tabortdc and others This was reported in https://bugs.llvm.org/show_bug.cgi?id=41751 llvm-mc aborted when disassembling tabortdc. This patch try to clean up TM related DAGs. * Fixes the problem by remove explicit output of cr0, and put it as implicit def. * Update int_ppc_tbegin pattern to accommodate the implicit def of cr0. * Update the TCHECK operand and int_ppc_tcheck accordingly. * Add some builtin test and disassembly tests. * Remove unused CRRC0/crrc0 Differential Revision: https://reviews.llvm.org/D61935 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364544 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-27 14:11:31 +00:00
Simon Pilgrim	c2a0046960	[TargetLowering] SimplifyDemandedBits - use DemandedElts to better identify partial splat shift amounts git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364541 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-27 13:48:43 +00:00
Simon Pilgrim	59fa40e264	[X86][SSE] Regenerate v48 shuffle test on a variety of targets git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364520 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-27 11:22:23 +00:00
Simon Pilgrim	d56a9b8674	[X86][AVX] SimplifyDemandedVectorElts - combine PERMPD(x) -> EXTRACTF128(X) If we only use the bottom lane, see if we can simplify this to extract_subvector - which is always at least as quick as PERMPD/PERMQ. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364518 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-27 11:16:03 +00:00
Djordje Todorovic	e02e5c6f21	[ISEL][X86] Tracking of registers that forward call arguments While lowering calls, collect info about registers that forward arguments into following function frame. We store such info into the MachineFunction of the call. This is used very late when dumping DWARF info about call site parameters. ([9/13] Introduce the debug entry values.) Co-authored-by: Ananth Sowda <asowda@cisco.com> Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com> Co-authored-by: Ivan Baev <ibaev@cisco.com> Differential Revision: https://reviews.llvm.org/D60715 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364516 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-27 10:51:15 +00:00
Diana Picus	65c682330e	[GlobalISel] Accept multiple vregs for lowerCall's args Change the interface of CallLowering::lowerCall to accept several virtual registers for each argument, instead of just one. This is a follow-up to D46018. CallLowering::lowerReturn was similarly refactored in D49660 and lowerFormalArguments in D63549. With this change, we no longer pack the virtual registers generated for aggregates into one big lump before delegating to the target. Therefore, the target can decide itself whether it wants to handle them as separate pieces or use one big register. ARM and AArch64 have been updated to use the passed in virtual registers directly, which means we no longer need to generate so many merge/extract instructions. NFCI for AMDGPU, Mips and X86. Differential Revision: https://reviews.llvm.org/D63551 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364512 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-27 09:18:03 +00:00
Diana Picus	d3b26382a9	[GlobalISel] Accept multiple vregs for lowerCall's result Change the interface of CallLowering::lowerCall to accept several virtual registers for the call result, instead of just one. This is a follow-up to D46018. CallLowering::lowerReturn was similarly refactored in D49660 and lowerFormalArguments in D63549. With this change, we no longer pack the virtual registers generated for aggregates into one big lump before delegating to the target. Therefore, the target can decide itself whether it wants to handle them as separate pieces or use one big register. ARM and AArch64 have been updated to use the passed in virtual registers directly, which means we no longer need to generate so many merge/extract instructions. NFCI for AMDGPU, Mips and X86. Differential Revision: https://reviews.llvm.org/D63550 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364511 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-27 09:15:53 +00:00
Diana Picus	4776c1ff97	[GlobalISel] Accept multiple vregs in lowerFormalArgs Change the interface of CallLowering::lowerFormalArguments to accept several virtual registers for each formal argument, instead of just one. This is a follow-up to D46018. CallLowering::lowerReturn was similarly refactored in D49660. lowerCall will be refactored in the same way in follow-up patches. With this change, we forward the virtual registers generated for aggregates to CallLowering. Therefore, the target can decide itself whether it wants to handle them as separate pieces or use one big register. We also copy the pack/unpackRegs helpers to CallLowering to facilitate this. ARM and AArch64 have been updated to use the passed in virtual registers directly, which means we no longer need to generate so many merge/extract instructions. AArch64 seems to have had a bug when lowering e.g. [1 x i8*], which was put into a s64 instead of a p0. Added a test-case which illustrates the problem more clearly (it crashes without this patch) and fixed the existing test-case to expect p0. AMDGPU has been updated to unpack into the virtual registers for kernels. I think the other code paths fall back for aggregates, so this should be NFC. Mips doesn't support aggregates yet, so it's also NFC. x86 seems to have code for dealing with aggregates, but I couldn't find the tests for it, so I just added a fallback to DAGISel if we get more than one virtual register for an argument. Differential Revision: https://reviews.llvm.org/D63549 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364510 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-27 08:54:17 +00:00
Jay Foad	135f7bd084	[AMDGPU] Fix +DumpCode to print an entry label for the first function Summary: The +DumpCode attribute is a horrible hack in AMDGPU to embed the disassembly of the generated code into the elf file. It is used by LLPC to implement an extension that allows the application to read back the disassembly of the code. It tries to print an entry label at the start of every function, but that didn't work for the first function in the module because DumpCodeInstEmitter wasn't initialised until EmitFunctionBodyStart which is too late. Change-Id: I790d73ddf4f51fd02ab32529380c7cb7c607c4ee Reviewers: arsenm, tpr, kzhuravl Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63712 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364508 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-27 08:19:28 +00:00
Djordje Todorovic	3aa859711c	[MachineFunction] Base support for call site info tracking Add an attribute into the MachineFunction that tracks call site info. ([8/13] Introduce the debug entry values.) Co-authored-by: Ananth Sowda <asowda@cisco.com> Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com> Co-authored-by: Ivan Baev <ibaev@cisco.com> Differential Revision: https://reviews.llvm.org/D61061 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364506 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-27 07:48:06 +00:00
Craig Topper	1fadf01eab	[X86] Teach selectScalarSSELoad to not narrow volatile loads. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364498 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-27 05:51:56 +00:00
Eli Friedman	f5ce44c687	[ARM] Don't reserve R12 on Thumb1 as an emergency spill slot. The current implementation of ThumbRegisterInfo::saveScavengerRegister is bad for two reasons: one, it's buggy, and two, it blocks using R12 for other optimizations. So this patch gets rid of it, and adds the necessary support for using an ordinary emergency spill slot on Thumb1. (Specifically, I think saveScavengerRegister was broken by r305625, and nobody noticed for two years because the codepath is almost never used. The new code will also probably not be used much, but it now has better tests, and if we fail to emit a necessary emergency spill slot we get a reasonable error message instead of a miscompile.) A rough outline of the changes in the patch: 1. Gets rid of ThumbRegisterInfo::saveScavengerRegister. 2. Modifies ARMFrameLowering::determineCalleeSaves to allocate an emergency spill slot for Thumb1. 3. Implements useFPForScavengingIndex, so the emergency spill slot isn't placed at a negative offset from FP on Thumb1. 4. Modifies the heuristics for allocating an emergency spill slot to support Thumb1. This includes fixing ExtraCSSpill so we don't try to use "lr" as a substitute for allocating an emergency spill slot. 5. Allocates a base pointer in more cases, so the emergency spill slot is always accessible. 6. Modifies ARMFrameLowering::ResolveFrameIndexReference to compute the right offset in the new cases where we're forcing a base pointer. 7. Ensures we never generate a load or store with an offset outside of its frame object. This makes the heuristics more straightforward. 8. Changes Thumb1 prologue and epilogue emission so it never uses register scavenging. Some of the changes to the emergency spill slot heuristics in determineCalleeSaves affect ARM/Thumb2; hopefully, they should allow the compiler to avoid allocating an emergency spill slot in cases where it isn't necessary. The rest of the changes should only affect Thumb1. 
Differential Revision: https://reviews.llvm.org/D63677 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364490 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-26 23:46:51 +00:00
Matt Arsenault	220e7b197a	[AMDGPU] Fix Livereg computation during epilogue insertion The LivePhysRegs calculated in order to find a scratch register in the epilogue code wrongly uses 'LiveIns'. Instead, it should use the 'Liveout' sets. For the liveness, also considering the operands of the terminator (return) instruction which is the insertion point for the scratch-exec-copy instruction. Patch by Christudasan Devadasan git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364470 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-26 20:35:18 +00:00
Craig Topper	355b9a12ae	[X86] Rework the logic in LowerBuildVectorv16i8 to make better use of any_extend and break false dependencies. Other improvements This patch rewrites the loop iteration to only visit every other element starting with element 0. And we work on the "even" element and "next" element at the same time. The "First" logic has been moved to the bottom of the loop and doesn't run on every element. I believe it could create dangling nodes previously since we didn't check if we were going to use SCALAR_TO_VECTOR for the first insertion. I got rid of the "First" variable and just do a null check on V which should be equivalent. We also no longer use undef as the starting V for vectors with no zeroes to avoid false dependencies. This matches v8i16. I've changed all the extends and OR operations to use MVT::i32 since that's what they'll be promoted to anyway. I've tried to use zero_extend only when necessary and use any_extend otherwise. This resulted in some improvements in tests where we are now able to promote aligned (i32 (extload i8)) to a 32-bit load. Differential Revision: https://reviews.llvm.org/D63702 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364469 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-26 20:16:19 +00:00
Simon Pilgrim	6a67d10b3b	[X86][SSE] getFauxShuffleMask - handle OR(x,y) where x and y have no overlapping bits Create a per-byte shuffle mask based on the computeKnownBits from each operand - if for each byte we have a known zero (or both) then it can be safely blended. Fixes PR41545 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364458 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-26 18:21:26 +00:00
Simon Pilgrim	cf436bba72	[X86][AVX] Add reduced test case for PR41545 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364454 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-26 17:56:53 +00:00
Ulrich Weigand	3f58e45931	Allow matching extend-from-memory with strict FP nodes This implements a small enhancement to https://reviews.llvm.org/D55506 Specifically, while we were able to match strict FP nodes for floating-point extend operations with a register as source, this did not work for operations with memory as source. That is because from regular operations, this is represented as a combined "extload" node (which is a variant of a load SD node); but there is no equivalent using a strict FP operation. However, it turns out that even in the absence of an extload node, we can still just match the operations explicitly, e.g. (strict_fpextend (f32 (load node:$ptr)) This patch implements that method to match the LDEB/LXEB/LXDB SystemZ instructions even when the extend uses a strict-FP node. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364450 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-26 17:19:12 +00:00
Thomas Lively	65adefc141	[WebAssembly] Omit wrap on i64x2.{shl,shr*} ISel when possible Summary: Since the WebAssembly SIMD shift instructions take i32 operands, we truncate the i64 operand to <2 x i64> shifts during ISel. When the i64 operand is sign extended from i32, this CL makes it so the sign extension is dropped instead of a wrap instruction added. Reviewers: dschuff, aheejin Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63615 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364446 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-26 16:19:59 +00:00
Thomas Lively	78cbc90959	[WebAssembly] Implement tail calls and unify tablegen call classes Summary: Implements direct and indirect tail calls enabled by the 'tail-call' feature in both DAG ISel and FastISel. Updates existing call tests and adds new tests including a binary encoding test. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62877 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364445 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-26 16:17:15 +00:00
Evandro Menezes	6016bbfa46	[CodeGen] Improve formatting of jump tables (NFC) Split jump tables into individual lines and fix spacing. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364436 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-26 15:11:31 +00:00
Simon Pilgrim	d889d38a51	[X86][SSE] X86TargetLowering::isCommutativeBinOp - add PCMPEQ Allows narrowInsertExtractVectorBinOp to reduce vector size git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364432 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-26 14:40:49 +00:00
Simon Pilgrim	95db6ffd34	[X86][SSE] X86TargetLowering::isBinOp - add PCMPGT Allows narrowInsertExtractVectorBinOp to reduce vector size git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364431 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-26 14:34:41 +00:00
Roman Lebedev	1c83fed7e7	[X86] X86DAGToDAGISel::matchBitExtract(): pattern c: truncation awareness Summary: The one thing of note here is that the 'bitwidth' constant (32/64) was previously pessimistic. Given `x & (-1 >> (C - z))`, we were taking `C` to be `bitwidth(x)`, but in reality we want `(-1 >> (C - z))` pattern to mean "low z bits must be all-ones". And for that, `C` should be `bitwidth(-1 >> (C - z))`, i.e. of the shift operation itself. Last pattern D does not seem to exhibit any of these truncation issues. Although it has the opposite problem - if we extract low bits (no shift) from i64, and then truncate to i32, then we fail to shrink this 64-bit extraction into 32-bit extraction. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62806 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364419 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-26 12:19:47 +00:00
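
Note on 65afd3a0fd (X % C == 0, UREM case): the commit message cites Hacker's Delight 10-17 and mentions that the comparison constant 'Q' must be computed by dividing by `D`, not `D0`. The following is a minimal standalone sketch of that divisibility test for 32-bit unsigned values. It is an illustration only, not the SelectionDAG code from the patch; the function names `inverseMod2_32`, `rotr32` and `isDivisibleBy` are hypothetical, and the caller is assumed to pass a nonzero divisor.

```cpp
#include <cstdint>

// Multiplicative inverse of an odd value modulo 2^32, via Newton iteration.
// For odd d0, d0 * d0 == 1 (mod 8), so d0 itself is correct to 3 bits;
// each step doubles the number of correct bits (3 -> 6 -> 12 -> 24 -> 48).
inline uint32_t inverseMod2_32(uint32_t d0) {
  uint32_t inv = d0;
  for (int i = 0; i < 4; ++i)
    inv *= 2u - d0 * inv;
  return inv;
}

// Rotate right by k bits, 0 <= k < 32 (k == 0 handled to avoid a shift by 32).
inline uint32_t rotr32(uint32_t v, unsigned k) {
  return k == 0 ? v : (v >> k) | (v << (32 - k));
}

// Returns true iff x % d == 0, without computing the remainder.
// Precondition (assumed, not checked): d != 0.
inline bool isDivisibleBy(uint32_t x, uint32_t d) {
  // Split d into an odd part d0 and a power of two: d = d0 * 2^k.
  unsigned k = 0;
  uint32_t d0 = d;
  while ((d0 & 1u) == 0) {
    d0 >>= 1;
    ++k;
  }
  uint32_t p = inverseMod2_32(d0); // inverse of the odd part only
  uint32_t q = UINT32_MAX / d;     // divide by d, not by d0
  // Multiples of d map to 0..q after the multiply and rotate;
  // every other value maps to something larger than q.
  return rotr32(x * p, k) <= q;
}
```

For example, with d = 6 the odd part is 3 and k = 1, so `isDivisibleBy(12, 6)` multiplies by the inverse of 3, rotates right by one bit, and compares against `UINT32_MAX / 6`. Stripping the power of two first is what lets the sketch handle even divisors such as 100, for which no inverse modulo 2^32 exists.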

30284 Commits