Commit Graph

23689 Commits

Tom Stellard
a6a3cde0aa Revert "Merging r328039:"
This reverts commit r332001.

I forgot to run make check before committing and the test cases with
this patch fail.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@332008 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-10 17:55:07 +00:00
Tom Stellard
ff8f891adb Merging r328039:
------------------------------------------------------------------------
r328039 | mstorsjo | 2018-03-20 13:37:51 -0700 (Tue, 20 Mar 2018) | 8 lines

[X86] Don't use the MSVC stack protector names on mingw

Mingw uses the same stack protector functions that GCC provides
on other platforms.

Patch by Valentin Churavy!

Differential Revision: https://reviews.llvm.org/D27296
------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@332001 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-10 17:27:58 +00:00
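As a rough sketch of the intent of the commit above (illustrative only, not LLVM's actual code; the helper and the triple check are made up): mingw targets should get the GCC-style stack protector symbols rather than the MSVC CRT ones.

```
#include <string.h>

/* Illustrative only: choose GCC-style stack protector symbols for mingw
 * (*-windows-gnu) triples and the MSVC CRT symbols otherwise. */
struct stack_protector_syms { const char *guard; const char *fail; };

static struct stack_protector_syms syms_for_triple(const char *triple) {
  struct stack_protector_syms s;
  if (strstr(triple, "windows-gnu")) {            /* mingw environment */
    s.guard = "__stack_chk_guard";
    s.fail  = "__stack_chk_fail";
  } else {                                        /* MSVC environment */
    s.guard = "__security_cookie";
    s.fail  = "__security_check_cookie";
  }
  return s;
}
```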
Tom Stellard
19fb6975c5 Merging r327540:
------------------------------------------------------------------------
r327540 | ctopper | 2018-03-14 10:57:19 -0700 (Wed, 14 Mar 2018) | 7 lines

[X86] Add back fast-isel code for handling i8 shifts.

I removed this in r316797 because the coverage report showed no coverage and I thought it should have been handled by the auto generated table. I now see that there is code that bypasses the table if the shift amount is out of bounds.

This adds back the code. We'll codegen out-of-bounds i8 shifts as effectively (amount & 0x1f). The 0x1f is a quirk of x86: shift amounts are always masked to 5 bits (except for 64-bit shifts). So if the masked value is still out of bounds, the result will be 0.

Fixes PR36731.
------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@331815 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-08 22:21:28 +00:00
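To make the masking in the commit above concrete, here is a small model of the behaviour (an illustrative sketch, not compiler code):

```
#include <stdint.h>

/* Emulate what x86 does for an 8-bit shift: the shift amount is masked to
 * 5 bits (0x1f), so an "out of bounds" amount of 8..31 still shifts and
 * yields 0 for an 8-bit value, while an amount of 33 behaves like 1. */
static uint8_t shl_i8_x86(uint8_t value, uint8_t amount) {
  unsigned masked = amount & 0x1fu;           /* non-64-bit shifts mask to 5 bits */
  unsigned wide = (unsigned)value << masked;  /* shift in a wider type to avoid UB */
  return (uint8_t)wide;                       /* truncate back to 8 bits */
}
```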
Tom Stellard
56c31e605a Merging r330186:
------------------------------------------------------------------------
r330186 | nemanjai | 2018-04-17 06:07:01 -0700 (Tue, 17 Apr 2018) | 11 lines

[PowerPC] Mark the BDNZ intrinsic as NoDuplicate

Duplicating this intrinsic is not generally valid because it has the side-effect
of decrementing the CTR. Any passes that duplicate it would need to be taught to
keep the regions formed completely disjoint.
This patch should be NFC for typical uses as CTRLoops runs after the remaining
loop passes. It only affects situations where the loop passes are scheduled on
the IR after the codegen passes (as is the case with some JIT pipelines).

Fixes https://bugs.llvm.org/show_bug.cgi?id=37050

------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@331716 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-08 02:41:23 +00:00
Tom Stellard
f1b37feef3 Merging r329761:
------------------------------------------------------------------------
r329761 | gberry | 2018-04-10 14:43:03 -0700 (Tue, 10 Apr 2018) | 13 lines

[AArch64][Falkor] Fix bug in Falkor HWPF collision avoidance pass.

Summary:
When inserting MOVs to avoid Falkor HWPF collisions, the non-base
register operand of load instructions (e.g. a register offset) was not
being considered live, so it could potentially have been used as a
scratch register, clobbering the actual offset value.

Reviewers: mcrosier

Subscribers: rengolin, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45502
------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@330209 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-17 20:17:43 +00:00
Tom Stellard
0b57b47378 Merging r322373:
------------------------------------------------------------------------
r322373 | d0k | 2018-01-12 07:03:24 -0800 (Fri, 12 Jan 2018) | 4 lines

[PowerPC] Don't miscompile rotate+mask into an ANDIo if it can't recreate the immediate

I'm not even sure if this transform is ever worth it, but this at least
stops the bleeding.
------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@330082 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-14 02:06:40 +00:00
Tom Stellard
4c9ba56670 Merging r329852:
------------------------------------------------------------------------
r329852 | nemanjai | 2018-04-11 14:25:44 -0700 (Wed, 11 Apr 2018) | 8 lines

[PowerPC] Fix condition for 64-bit rotate when replacing r+r instr with r+i

This patch fixes https://bugs.llvm.org/show_bug.cgi?id=37039
The condition only covers one of the two 64-bit rotate instructions. This just
adds the second (RLDICLo).

Patch by Josh Stone.

------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@330076 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-13 23:26:20 +00:00
Simon Dardis
84bc444011 Merging r325653 with test fixups:
------------------------------------------------------------------------
r325653 | sdardis | 2018-02-21 00:06:53 +0000 (Wed, 21 Feb 2018) | 31 lines

[mips] Spectre variant two mitigation for MIPSR2

This patch provides mitigation for CVE-2017-5715, Spectre variant two,
which affects the P5600 and P6600. It implements the LLVM part of
-mindirect-jump=hazard. It is _not_ enabled by default for the P5600.

The mitigation strategy suggested by MIPS for these processors is to use
hazard barrier instructions. 'jalr.hb' and 'jr.hb' are hazard
barrier variants of the 'jalr' and 'jr' instructions respectively.

These instructions impede the execution of the instruction stream until
architecturally defined hazards (changes to the instruction stream or to
privileged registers which may affect execution) are cleared. In MIPS'
designs these instructions are not speculated past.

These instructions are used with the attribute +use-indirect-jump-hazard
when branching indirectly and for indirect function calls.

These instructions are defined by the MIPS32R2 ISA, so this mitigation
method is not compatible with processors which implement an earlier
revision of the MIPS ISA.

Performance benchmarking of this option with -fpic and lld using
-z hazardplt shows an overall time increase of roughly 10%
for the LLVM test suite. Certain benchmarks such as methcall show a
substantially larger increase in time due to their nature.

Reviewers: atanasyan, zoran.jovanovic

Differential Revision: https://reviews.llvm.org/D43486

------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@329798 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-11 12:55:10 +00:00
Tom Stellard
9fa366d3c1 Merging r327651:
------------------------------------------------------------------------
r327651 | carrot | 2018-03-15 10:49:12 -0700 (Thu, 15 Mar 2018) | 9 lines

[PPC] Avoid non-simple MVT in STBRX optimization

PR35402 triggered this case. It bswaps and stores a 48-bit value; the current STBRX optimization transforms it into an STBRX. Unfortunately, 48-bit is not a simple MVT, there is no PPC instruction to support it, and it can't be automatically expanded by llvm, so this caused a crash.

This patch detects the non-simple MVT and returns early.

Differential Revision: https://reviews.llvm.org/D44500

------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@329641 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-09 23:19:44 +00:00
Tom Stellard
48e90723ea Merging r322319:
------------------------------------------------------------------------
r322319 | matze | 2018-01-11 14:30:43 -0800 (Thu, 11 Jan 2018) | 7 lines

PeepholeOptimizer: Fix for vregs without defs

The PeepholeOptimizer would fail for vregs without a definition. If this
is caused by an undef operand, abort to keep the code simple (so we
don't need to add logic everywhere to replicate the undef flag).

Differential Revision: https://reviews.llvm.org/D40763
------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@329619 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-09 20:45:48 +00:00
Tom Stellard
a7769cbdb1 Merging r326535:
------------------------------------------------------------------------
r326535 | jvesely | 2018-03-01 18:50:22 -0800 (Thu, 01 Mar 2018) | 6 lines

AMDGPU/GCN: Promote i16 ctpop

i16-capable ASICs do not support i16 operands for this instruction.
Add a tablegen pattern to merge chained i16 additions.

Differential Revision: https://reviews.llvm.org/D43985
------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@329589 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-09 16:38:02 +00:00
Tom Stellard
aa0c91ae81 Merging r328341:
------------------------------------------------------------------------
r328341 | apazos | 2018-03-23 10:53:27 -0700 (Fri, 23 Mar 2018) | 16 lines

[ARM] Fix "Constant pool entry out of range!" in Thumb1 mode

This patch fixes PR36658, "Constant pool entry out of range!" in Thumb1 mode.

In ARMConstantIslands::optimizeThumb2JumpTables() in Thumb1 mode,
adjustBBOffsetsAfter() is not calculating postOffset correctly because it
does not properly account for the padding that is required for the constant
pool that immediately follows the jump table branch instruction.

Reviewers: t.p.northover, eli.friedman

Reviewed By: t.p.northover

Subscribers: chrib, tstellar, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D44709
------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@329487 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-07 05:52:39 +00:00
Hans Wennborg
1a427644e6 Merging r326393:
------------------------------------------------------------------------
r326393 | ctopper | 2018-03-01 01:08:38 +0100 (Thu, 01 Mar 2018) | 5 lines

[X86] Make sure we don't combine (fneg (fma X, Y, Z)) to a target specific node when there are no FMA instructions.

This would cause a 'cannot select' error at isel when we should have emitted a lib call and an xor.

Fixes PR36553.
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@326423 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-01 09:05:01 +00:00
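As a hedged illustration of the pattern the commit above is about: with no FMA instructions available, the negated fused multiply-add has to stay a libcall plus a sign-bit xor rather than become a target-specific FMA node.

```
#include <math.h>

/* On a target without hardware FMA this should lower to a call to fma()
 * followed by a sign flip (an xor of the sign bit), not to a target FMA
 * node. Link with -lm where required. */
double neg_fma(double x, double y, double z) {
  return -fma(x, y, z);
}
```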
Hans Wennborg
a1f4098730 Merging r325739:
------------------------------------------------------------------------
r325739 | nemanjai | 2018-02-22 04:02:41 +0100 (Thu, 22 Feb 2018) | 9 lines

[PowerPC] Do not produce invalid CTR loop with an FRem

An FRem instruction inside a loop should prevent the loop from being converted
into a CTR loop since this is not an operation that is legal on any PPC
subtarget. This will always be a call to a library function which means the
loop will be invalid if this instruction is in the body.

Fixes PR36292.

------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325767 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-22 11:29:35 +00:00
Hans Wennborg
edd46837c9 Merging r325654:
------------------------------------------------------------------------
r325654 | ctopper | 2018-02-21 01:15:48 +0100 (Wed, 21 Feb 2018) | 10 lines

[X86] Disable CLWB for Cannon Lake

Cannon Lake does not support CLWB; therefore, it
no longer includes all features listed under SKX.

Instead, enumerate all SKX features with the exception of CLWB.

Patch by Gabor Buella

Differential Revision: https://reviews.llvm.org/D43380
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325671 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-21 11:11:33 +00:00
Hans Wennborg
28b22410cd [AArch64][GlobalISel] Support G_INSERT/G_EXTRACT of types < s32 bits.
These are needed for operations on fp16 types in a later patch.

This also reinstates the test/CodeGen/AArch64/GlobalISel/fp16-copy-gpr.mir
test, which had been deleted because it depended on this patch.

(See PR36345.)


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325669 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-21 10:25:22 +00:00
Hans Wennborg
22a047b7bd Merging r325550:
I couldn't get fp16-copy-gpr.mir to pass after merging so I removed it until
aemerson; the other test I re-generated and it seems to work.

------------------------------------------------------------------------
r325550 | aemerson | 2018-02-20 06:11:57 +0100 (Tue, 20 Feb 2018) | 7 lines

[AArch64][GlobalISel] When copying from a gpr32 to an fpr16 reg, convert to fpr32 first.

This is a follow-on commit to r[x] where we fix the other direction of the copy.
For this case, after converting the source from gpr32 -> fpr32, we use a
subregister copy, which is essentially what EXTRACT_SUBREG does in SDAG land.

https://reviews.llvm.org/D43444
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325591 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-20 16:18:57 +00:00
Hans Wennborg
b99df1f397 Merging r325463:
(I had to re-generate the test and manually update it to handle the r323922 MIR physical register sigil.)

------------------------------------------------------------------------
r325463 | aemerson | 2018-02-18 18:10:49 +0100 (Sun, 18 Feb 2018) | 8 lines

[AArch64][GlobalISel] Fix an assert fail/miscompile when fp16 types are copied
to gpr register banks.

PR36345.

rdar://36478867

Differential Revision: https://reviews.llvm.org/D43310
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325586 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-20 15:49:15 +00:00
Hans Wennborg
75d86fdcda Merging r324353:
------------------------------------------------------------------------
r324353 | mareko | 2018-02-06 16:17:55 +0100 (Tue, 06 Feb 2018) | 5 lines

AMDGPU: Fix S_BUFFER_LOAD_DWORD_SGPR moveToVALU

Author: Bas Nieuwenhuizen

https://reviews.llvm.org/D42881
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325497 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-19 13:55:23 +00:00
Hans Wennborg
a1b3df34e9 Revert r319778 (and r319911) due to PR36357
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325112 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-14 10:51:00 +00:00
Hans Wennborg
2955450dc1 Merging r324576:
------------------------------------------------------------------------
r324576 | ctopper | 2018-02-08 08:45:55 +0100 (Thu, 08 Feb 2018) | 20 lines

[X86] Don't emit KTEST instructions unless only the Z flag is being used

Summary:
KTEST has weird flag behavior. The Z flag is set when all bits in the AND of the k-registers are 0, and the C flag is set when all bits are 1. All other flags are cleared.

We currently emit this instruction in EmitTEST and don't check the condition code. This can lead to strange things like using the S flag after a KTEST for a signed compare.

The domain reassignment pass can also transform TEST instructions into KTEST and is not protected against the flag usage either. For now I've disabled this part of the domain reassignment pass. I tried to comment out the checks in the mir test so that we could recover them later, but I couldn't figure out how to get that to work.

This patch moves the KTEST handling into LowerSETCC and now creates a ktest+x86setcc. I've chosen this approach because I'd like to add support for the C flag for all ones in a followup patch. To do that requires that I can rewrite the condition code going in the x86setcc to be different than the original SETCC condition code.

This fixes PR36182. I'll file a PR to fix domain reassignment once this goes in. Should this be merged to 6.0?

Reviewers: spatel, guyblank, RKSimon, zvi

Reviewed By: guyblank

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42770
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325106 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-14 09:44:17 +00:00
Hans Wennborg
0bd30eea1a Merging r324497:
------------------------------------------------------------------------
r324497 | ctopper | 2018-02-07 19:32:15 +0100 (Wed, 07 Feb 2018) | 1 line

[X86] Regenerate test using update_mir_test_checks.py. NFC
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325105 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-14 09:41:04 +00:00
Reid Kleckner
1d4c94e1d4 Merging r325049:
------------------------------------------------------------------------
r325049 | rnk | 2018-02-13 12:47:49 -0800 (Tue, 13 Feb 2018) | 17 lines

[X86] Use EDI for retpoline when no scratch regs are left

Summary:
Instead of solving the hard problem of how to pass the callee to the indirect
jump thunk without a register, just use a CSR. At a call boundary, there's
nothing stopping us from using a CSR to hold the callee as long as we save and
restore it in the prologue.

Also, add tests for this mregparm=3 case. I wrote execution tests for
__llvm_retpoline_push, but they never got committed as lit tests, either
because I never rewrote them or because they got lost in merge conflicts.

Reviewers: chandlerc, dwmw2

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D43214
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325084 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-14 00:22:20 +00:00
Reid Kleckner
ca89cbee88 Merging r324645:
------------------------------------------------------------------------
r324645 | dwmw2 | 2018-02-08 12:06:05 -0800 (Thu, 08 Feb 2018) | 5 lines

[X86] Support 'V' register operand modifier

This allows the register name to be printed without the leading '%'.
This can be used for emitting calls to the retpoline thunks from inline
asm.
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325083 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-14 00:19:26 +00:00
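A hypothetical use of the modifier from inline asm (the thunk symbol and the r11 pinning are assumptions for illustration, not taken from the commit): `%V0` prints the operand's register name without the leading '%', so it can be spliced into a symbol name.

```
/* Hypothetical: pin the callee into r11 and call a retpoline thunk whose
 * name embeds the bare register name. The thunk symbol is assumed to be
 * provided elsewhere (e.g. by the kernel or an external-thunk build). */
void call_via_thunk(void (*fn)(void)) {
  register void (*target)(void) __asm__("r11") = fn; /* force the operand into r11 */
  __asm__ volatile("call __llvm_retpoline_%V0"       /* %V0 expands to "r11" */
                   : : "r"(target) : "memory");
}
```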
Reid Kleckner
10dd7c84dc Merging r324449:
------------------------------------------------------------------------
r324449 | chandlerc | 2018-02-06 22:16:24 -0800 (Tue, 06 Feb 2018) | 15 lines

[x86/retpoline] Make the external thunk names exactly match the names
that happened to end up in GCC.

This is really unfortunate, as the names don't have much rhyme or reason
to them. Originally in the discussions it seemed fine to rely on aliases
to map different names to whatever external thunk code developers wished
to use, but it turns out there are practical problems with that in the
kernel. And since we're discovering these practical problems late, and since
GCC has already shipped a release with one set of names, we are forced,
yet again, to blindly match what is there.

Somewhat rushing this patch out for the Linux kernel folks to test and
so we can get it patched into our releases.

Differential Revision: https://reviews.llvm.org/D42998
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325082 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-14 00:18:17 +00:00
Hans Wennborg
5d4f6a8d9b Merging r324422:
------------------------------------------------------------------------
r324422 | efriedma | 2018-02-07 00:00:17 +0100 (Wed, 07 Feb 2018) | 16 lines

[LivePhysRegs] Fix handling of return instructions.

See D42509 for the original version of this.

Basically, there are two significant changes to behavior here:

- addLiveOuts always adds all pristine registers (even if a block has
no successors).
- addLiveOuts and addLiveOutsNoPristines always add all callee-saved
registers for return blocks (including conditional return blocks).

I cleaned up the functions a bit to make it clear these properties hold.

Differential Revision: https://reviews.llvm.org/D42655


------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@324466 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-07 10:01:03 +00:00
Hans Wennborg
dd47b68215 Merging r324039: (test case modified to work around r323886 et al.)
------------------------------------------------------------------------
r324039 | matze | 2018-02-02 01:08:19 +0100 (Fri, 02 Feb 2018) | 33 lines

SplitKit: Fix liveness recomputation in some remat cases.

Example situation:
```
BB0:
  %0 = ...
  use %0
  ; ...
  condjump BB1
  jmp BB2

BB1:
  %0 = ...   ; rematerialized def from above (from earlier split step)
  jmp BB2

BB2:
  ; ...
  use %0
```

%0 will have a live interval with 3 value numbers (for the BB0, BB1 and
BB2 parts). Now SplitKit tries and succeeds in rematerializing the value
number in BB2 (This only works because it is a secondary split so
SplitKit is can trace this back to a single original def).

We need to recompute all live ranges affected by a value number that we
rematerialize. The case that we missed before is that when the value
that is rematerialized is at a join (Phi VNI) then we also have to
recompute liveness for the predecessor VNIs.

rdar://35699130

Differential Revision: https://reviews.llvm.org/D42667
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@324218 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-05 09:55:40 +00:00
Hans Wennborg
24c873b42c Merging r324002:
------------------------------------------------------------------------
r324002 | ctopper | 2018-02-01 21:48:50 +0100 (Thu, 01 Feb 2018) | 7 lines

[DAGCombiner] When folding (insert_subvector undef, (bitcast (extract_subvector N1, Idx)), Idx) -> (bitcast N1) make sure that N1 has the same total size as the original output

We were only checking the element count, not the total width. This could cause illegal bitcasts to be created if, for example, the output was 512 bits but N1 was 256 bits and the extraction size was 128 bits.

Fixes PR36199

Differential Revision: https://reviews.llvm.org/D42809
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@324216 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-05 09:32:05 +00:00
Hans Wennborg
4b07ed628f Merging r323908:
------------------------------------------------------------------------
r323908 | mareko | 2018-01-31 21:18:04 +0100 (Wed, 31 Jan 2018) | 7 lines

AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D41663
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@324103 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-02 16:24:08 +00:00
Hans Wennborg
816adbd1b4 Merging r323643:
------------------------------------------------------------------------
r323643 | jdevlieghere | 2018-01-29 13:10:32 +0100 (Mon, 29 Jan 2018) | 16 lines

[Sparc] Account for bias in stack readjustment

Summary: This was broken long ago in D12208, which failed to account for
the fact that 64-bit SPARC uses a stack bias of 2047, and it is the
*unbiased* value which should be aligned, not the biased one. This was
seen to be an issue with Rust.

Patch by: jrtc27 (James Clarke)

Reviewers: jyknight, venkatra

Reviewed By: jyknight

Subscribers: jacob_hansen, JDevlieghere, fhahn, fedor.sergeev, llvm-commits

Differential Revision: https://reviews.llvm.org/D39425
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@324090 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-02 13:56:46 +00:00
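A small numeric sketch of the point in the commit above (illustrative; the helper name is made up): the register holds the real stack pointer minus a bias of 2047, so alignment must be applied to the unbiased value and the bias re-applied afterwards.

```
#include <stdint.h>

#define SPARC64_STACK_BIAS 2047

/* Align the *unbiased* stack pointer down to 'align' (a power of two), then
 * re-apply the bias. Aligning the biased value directly leaves the real
 * stack pointer misaligned, which is the bug described above. */
static int64_t realign_biased_sp(int64_t biased_sp, int64_t align) {
  int64_t unbiased = biased_sp + SPARC64_STACK_BIAS; /* recover the real SP */
  unbiased &= -align;                                /* align the true value down */
  return unbiased - SPARC64_STACK_BIAS;              /* restore the bias */
}
```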
Hans Wennborg
abf249d90b Merging r323909:
------------------------------------------------------------------------
r323909 | mareko | 2018-01-31 21:18:11 +0100 (Wed, 31 Jan 2018) | 13 lines

AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9

Summary:
This enables load merging into x2, x4, which is driven by inline offsets.

6500 shaders are affected:
Code Size in affected shaders: -15.14 %

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D42078
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@324089 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-02 13:54:44 +00:00
Hans Wennborg
52d11d163e Merging r323536:
------------------------------------------------------------------------
r323536 | arichardson | 2018-01-26 16:56:14 +0100 (Fri, 26 Jan 2018) | 11 lines

[MIPS] Don't crash on unsized extern types with -mgpopt

Summary: This fixes an assertion when building the FreeBSD MIPS64 kernel.

Reviewers: atanasyan, sdardis, emaste

Reviewed By: sdardis

Subscribers: krytarowski, llvm-commits

Differential Revision: https://reviews.llvm.org/D42571
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@324087 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-02 13:47:07 +00:00
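For illustration (a guess at the shape of code involved, not the actual FreeBSD source): an external declaration with no known size, which -mgpopt previously tried to reason about when deciding on small-data placement.

```
/* An unsized extern object: the compiler cannot know whether it would fit in
 * the GP-relative small-data section, so -mgpopt must not assert on it. */
extern char unsized_extern[];

char *address_of_extern(void) {
  return unsized_extern;
}
```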
Hans Wennborg
240f1f3d60 Merging r323781:
------------------------------------------------------------------------
r323781 | sdardis | 2018-01-30 17:24:10 +0100 (Tue, 30 Jan 2018) | 15 lines

[mips] Fix incorrect sign extension for fpowi libcall

PR36061 showed that during the expansion of ISD::FPOWI there
was an incorrect zero extension of the integer argument, which for
MIPS64 would then give incorrect results. Address this with the
existing mechanism for correcting sign extensions.

This resolves PR36061.

Thanks to James Cowgill for reporting the issue!

Reviewers: atanasyan, hfinkel

Differential Revision: https://reviews.llvm.org/D42537

------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@324085 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-02 13:41:04 +00:00
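A minimal example of the affected pattern (hedged: __builtin_powi is the usual way to reach the FPOWI node from C): the 32-bit exponent must be sign-extended for the MIPS64 libcall, otherwise a negative exponent such as -2 arrives as a huge positive value.

```
/* __builtin_powi takes an 'int' exponent; on MIPS64 that 32-bit argument
 * must be sign-extended, not zero-extended, when passed to the powi libcall. */
double inverse_square(double x) {
  return __builtin_powi(x, -2);
}
```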
Hans Wennborg
dcecdaaa04 Merging r323857:
------------------------------------------------------------------------
r323857 | rogfer01 | 2018-01-31 10:23:43 +0100 (Wed, 31 Jan 2018) | 19 lines

[ARM] Allow the scheduler to clone a node with glue to avoid a copy CPSR ↔ GPR.

In Thumb 1, with the new ADDCARRY / SUBCARRY the scheduler may need to do
CPSR ↔ GPR copies, but not all Thumb1 targets implement them.

The scheduler can attempt, before attempting a copy, to clone the instructions,
but it does not currently do that for nodes with input glue. In this patch we
introduce a target hook to let the target decide if a glued machine node is
still eligible for copying. In this case these are ARM::tADCS and ARM::tSBCS.

As a follow-up to this change we should actually implement the copies for the
Thumb1 targets that do implement them and restrict the hook to the targets that
can't really do such a copy, as these clones are not ideal.

This change fixes PR35836.

Differential Revision: https://reviews.llvm.org/D42051


------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@324082 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-02 13:35:26 +00:00
Hans Wennborg
8f86cd9213 Merging r323915:
------------------------------------------------------------------------
r323915 | chandlerc | 2018-01-31 21:56:37 +0100 (Wed, 31 Jan 2018) | 17 lines

[x86] Make the retpoline thunk insertion a machine function pass.

Summary:
This removes the need for a machine module pass using some deeply
questionable hacks. This should address PR36123 which is a case where in
full LTO the memory usage of a machine module pass actually ended up
being significant.

We should revert this on trunk as soon as we understand and fix the
memory usage issue, but we should include this in any backports of
retpolines themselves.

Reviewers: echristo, MatzeB

Subscribers: sanjoy, mcrosier, mehdi_amini, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D42726
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@324071 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-02 10:55:15 +00:00
Hans Wennborg
e307072026 Merging r323155:
------------------------------------------------------------------------
r323155 | chandlerc | 2018-01-22 23:05:25 +0100 (Mon, 22 Jan 2018) | 133 lines

Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", which is one of the two halves of Spectre.

Summary:
First, we need to explain the core of the vulnerability. Note that this
is a very incomplete description, please see the Project Zero blog post
for details:
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html

The basis for branch target injection is to direct speculative execution
of the processor to some "gadget" of executable code by poisoning the
prediction of indirect branches with the address of that gadget. The
gadget in turn contains an operation that provides a side channel for
reading data. Most commonly, this will look like a load of secret data
followed by a branch on the loaded value and then a load of some
predictable cache line. The attacker then uses timing of the processor's
cache to determine which direction the branch took *in the speculative
execution*, and in turn what one bit of the loaded value was. Due to the
nature of these timing side channels and the branch predictor on Intel
processors, this allows an attacker to leak data only accessible to
a privileged domain (like the kernel) back into an unprivileged domain.

The goal is simple: avoid generating code which contains an indirect
branch that could have its prediction poisoned by an attacker. In many
cases, the compiler can simply use directed conditional branches and
a small search tree. LLVM already has support for lowering switches in
this way and the first step of this patch is to disable jump-table
lowering of switches and introduce a pass to rewrite explicit indirectbr
sequences into a switch over integers.

However, there is no fully general alternative to indirect calls. We
introduce a new construct we call a "retpoline" to implement indirect
calls in a non-speculatable way. It can be thought of loosely as
a trampoline for indirect calls which uses the RET instruction on x86.
Further, we arrange for a specific call->ret sequence which ensures the
processor predicts the return to go to a controlled, known location. The
retpoline then "smashes" the return address pushed onto the stack by the
call with the desired target of the original indirect call. The result
is a predicted return to the next instruction after a call (which can be
used to trap speculative execution within an infinite loop) and an
actual indirect branch to an arbitrary address.

On 64-bit x86 ABIs, this is especially easy to do in the compiler by
using a guaranteed scratch register to pass the target into this device.
For 32-bit ABIs there isn't a guaranteed scratch register and so several
different retpoline variants are introduced to use a scratch register if
one is available in the calling convention and to otherwise use direct
stack push/pop sequences to pass the target address.

This "retpoline" mitigation is fully described in the following blog
post: https://support.google.com/faqs/answer/7625886

We also support a target feature that disables emission of the retpoline
thunk by the compiler to allow for custom thunks if users want them.
These are particularly useful in environments like kernels that
routinely do hot-patching on boot and want to hot-patch their thunk to
different code sequences. They can write this custom thunk and use
`-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this
case, on x86-64 the thunk names must be:
```
  __llvm_external_retpoline_r11
```
or on 32-bit:
```
  __llvm_external_retpoline_eax
  __llvm_external_retpoline_ecx
  __llvm_external_retpoline_edx
  __llvm_external_retpoline_push
```
And the target of the retpoline is passed in the named register, or in
the case of the `push` suffix on the top of the stack via a `pushl`
instruction.

There is one other important source of indirect branches in x86 ELF
binaries: the PLT. These patches also include support for LLD to
generate PLT entries that perform a retpoline-style indirection.

The only other indirect branches remaining that we are aware of are from
precompiled runtimes (such as crt0.o and similar). The ones we have
found are not really attackable, and so we have not focused on them
here, but eventually these runtimes should also be replicated for
retpoline-ed configurations for completeness.

For kernels or other freestanding or fully static executables, the
compiler switch `-mretpoline` is sufficient to fully mitigate this
particular attack. For dynamic executables, you must compile *all*
libraries with `-mretpoline` and additionally link the dynamic
executable and all shared libraries with LLD and pass `-z retpolineplt`
(or use similar functionality from some other linker). We strongly
recommend also using `-z now` as non-lazy binding allows the
retpoline-mitigated PLT to be substantially smaller.

When manually applying transformations similar to `-mretpoline` to the
Linux kernel, we observed very small performance hits to applications
running typical workloads, and relatively minor hits (approximately 2%)
even for extremely syscall-heavy applications. This is largely due to
the small number of indirect branches that occur in performance
sensitive paths of the kernel.

When using these patches on statically linked applications, especially
C++ applications, you should expect to see a much more dramatic
performance hit. For microbenchmarks that are switch-, indirect-, or
virtual-call heavy we have seen overheads ranging from 10% to 50%.

However, real-world workloads exhibit substantially lower performance
impact. Notably, techniques such as PGO and ThinLTO dramatically reduce
the impact of hot indirect calls (by speculatively promoting them to
direct calls) and allow optimized search trees to be used to lower
switches. If you need to deploy these techniques in C++ applications, we
*strongly* recommend that you ensure all hot call targets are statically
linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well
tuned servers using all of these techniques saw 5% - 10% overhead from
the use of retpoline.

We will add detailed documentation covering these components in
subsequent patches, but wanted to make the core functionality available
as soon as possible. Happy for more code review, but we'd really like to
get these patches landed and backported ASAP for obvious reasons. We're
planning to backport this to both 6.0 and 5.0 release streams and get
a 5.0 release with just this cherry picked ASAP for distros and vendors.

This patch is the work of a number of people over the past month: Eric, Reid,
Rui, and myself. I'm mailing it out as a single commit due to the time
sensitive nature of landing this and the need to backport it. Huge thanks to
everyone who helped out here, and everyone at Intel who helped out in
discussions about how to craft this. Also, credit goes to Paul Turner (at
Google, but not an LLVM contributor) for much of the underlying retpoline
design.

Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer

Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D41723
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@324067 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-02 10:49:53 +00:00
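For orientation, the kind of source construct the mitigation above rewrites (a trivial illustration, not taken from the patch): an indirect call through a function pointer, which `-mretpoline` lowers to a call through the retpoline thunk instead of a predictable indirect branch.

```
/* With -mretpoline this indirect call is emitted as a call to the retpoline
 * thunk, which then performs the non-speculatable transfer to 'handler',
 * instead of a plain indirect `call *%reg`. */
typedef void (*handler_fn)(int);

void dispatch(handler_fn handler, int value) {
  handler(value);
}
```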
Hans Wennborg
a593839b62 Merging r323811:
------------------------------------------------------------------------
r323811 | mstorsjo | 2018-01-30 20:50:58 +0100 (Tue, 30 Jan 2018) | 3 lines

[GlobalISel] Bail out on calls to dllimported functions

Differential Revision: https://reviews.llvm.org/D42568
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@323853 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-31 09:00:28 +00:00
Hans Wennborg
52eee2488b Merging r323810:
------------------------------------------------------------------------
r323810 | mstorsjo | 2018-01-30 20:50:51 +0100 (Tue, 30 Jan 2018) | 3 lines

[AArch64] Properly handle dllimport of variables when using fast-isel

Differential Revision: https://reviews.llvm.org/D42567
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@323852 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-31 08:57:32 +00:00
Hans Wennborg
7d30102ce0 Merging r323706:
------------------------------------------------------------------------
r323706 | mareko | 2018-01-30 00:19:10 +0100 (Tue, 30 Jan 2018) | 15 lines

AMDGPU: Allow a SGPR for the conditional KILL operand

Patch by: Bas Nieuwenhuizen

Just use the _e64 variant if needed. This should be possible as per

def : Pat <
  (int_amdgcn_kill (i1 (setcc f32:$src, InlineFPImm<f32>:$imm, cond:$cond))),
  (SI_KILL_F32_COND_IMM_PSEUDO $src, (bitcast_fpimm_to_i32 $imm), (cond_as_i32imm $cond))
> ;

I don't think we can get an immediate for the other operand for which we
need the second 32-bit word.

https://reviews.llvm.org/D42302
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@323772 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-30 15:29:20 +00:00
Hans Wennborg
f48ff4c85e Merging r323355:
------------------------------------------------------------------------
r323355 | nha | 2018-01-24 19:02:05 +0100 (Wed, 24 Jan 2018) | 9 lines

Revert r321751, "StructurizeCFG: Fix broken backedge detection"

It causes regressions in various OpenGL test suites.

Keep the test cases introduced by r321751 as XFAIL, and add a test case
for the regression.

Change-Id: I90b4cc354f68cebe5fcef1f2422dc8fe1c6d3514
Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=36015
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@323749 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-30 11:17:13 +00:00
Hans Wennborg
03a6999cf4 Merging r323710:
------------------------------------------------------------------------
r323710 | qcolombet | 2018-01-30 00:42:37 +0100 (Tue, 30 Jan 2018) | 13 lines

[RAFast] Don't dereference MBB::end

When RAFast sees liveins on a basic block, it uses that information
to initialize the availability of the registers. The called
method uses an instruction as one of its arguments, and in the liveins
case RAFast was dereferencing MBB::begin, which can be MBB::end for an
empty basic block.

Change the API of definePhysReg to use MachineBasicBlock::iterator
instead of MachineInstr so that we don't dereference an
invalid iterator while making the call.

rdar://problem/36952401
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@323746 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-30 10:53:45 +00:00
Hans Wennborg
f8f8b9b531 Merging r323672: (test-case re-generated)
------------------------------------------------------------------------
r323672 | ctopper | 2018-01-29 18:56:57 +0100 (Mon, 29 Jan 2018) | 5 lines

[X86] Don't create SHRUNKBLEND when the condition is used by the true or false operand of the vselect.

Fixes PR34592.

Differential Revision: https://reviews.llvm.org/D42628
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@323743 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-30 10:30:33 +00:00
Hans Wennborg
e57fcaa435 Merging r323582:
------------------------------------------------------------------------
r323582 | aemerson | 2018-01-27 08:07:20 +0100 (Sat, 27 Jan 2018) | 6 lines

[GlobalISel][Legalizer] Convert the FP constants to the right APFloat type for G_FCONSTANT.

We weren't converting the immediate ConstantFP during legalization, which caused
the wrong bit patterns to be emitted for half type FP constants.

Fixes PR36106.
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@323742 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-30 10:19:03 +00:00
Hans Wennborg
f1286127b7 Merging r323384:
------------------------------------------------------------------------
r323384 | aemerson | 2018-01-24 23:40:25 +0100 (Wed, 24 Jan 2018) | 1 line

[GlobalISel] Add a requires: asserts to a test.
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@323523 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-26 12:03:01 +00:00
Hans Wennborg
57ea45a708 Merging r323369 and r323371:
------------------------------------------------------------------------
r323369 | aemerson | 2018-01-24 20:59:29 +0100 (Wed, 24 Jan 2018) | 4 lines

[GlobalISel] Don't fall back to FastISel.

Apparently checking the pass structure isn't enough to ensure that we don't fall
back to FastISel, as it's set up as part of the SelectionDAGISel.
------------------------------------------------------------------------

------------------------------------------------------------------------
r323371 | aemerson | 2018-01-24 21:35:37 +0100 (Wed, 24 Jan 2018) | 12 lines

[AArch64][GlobalISel] Fall back during AArch64 isel if we have a volatile load.

The tablegen-imported patterns for sext(load(a)) don't check for single uses
of the load or delete the original after matching. As a result, two loads are
left in the generated code. This particular issue will be fixed by adding
support for a G_SEXTLOAD opcode in the future.

There are however other potential issues around this that wouldn't be fixed by
a G_SEXTLOAD, so until we have a proper solution we don't try to handle volatile
loads at all in the AArch64 selector.

Fixes/works around PR36018.
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@323434 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-25 15:28:01 +00:00
Hans Wennborg
50fb516bb5 Merging r322900 and r323307:
------------------------------------------------------------------------
r322900 | mstorsjo | 2018-01-18 22:21:48 +0100 (Thu, 18 Jan 2018) | 6 lines

[test] Actually check the common parts in CodeGen/ARM/global-merge-external.ll. NFC.

Previously, these parts weren't ever checked. The label patterns
need to be extended to match successfully on macho.

Differential Revision: https://reviews.llvm.org/D42126
------------------------------------------------------------------------

------------------------------------------------------------------------
r323307 | mstorsjo | 2018-01-24 07:40:04 +0100 (Wed, 24 Jan 2018) | 6 lines

[GlobalMerge] Don't merge dllexport globals

Merging such globals loses the dllexport attribute. Add a test
to check that normal globals still are merged.

Differential Revision: https://reviews.llvm.org/D42127
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@323337 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-24 15:53:46 +00:00
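A short illustration of the distinction the new test checks (variable names made up): a dllexport global must keep its own symbol, while ordinary globals remain candidates for merging.

```
/* Merging this global into a combined symbol would drop the export
 * (dllexport is meaningful on Windows targets such as mingw). */
__attribute__((dllexport)) int exported_counter = 0;

/* Ordinary globals like these are still eligible for GlobalMerge. */
int merge_me_a = 1;
int merge_me_b = 2;
```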
Hans Wennborg
093b1726dc Merging r323190:
------------------------------------------------------------------------
r323190 | rksimon | 2018-01-23 12:39:06 +0100 (Tue, 23 Jan 2018) | 5 lines

[X86][SSE] LowerBUILD_VECTORAsVariablePermute - fix PSHUFB source/index operand ordering

As detailed in rL317463, PSHUFB (like most variable shuffle instructions) uses Op[0] for the source vector and Op[1] for the shuffle index vector; VPERMV works in reverse, which is probably where the confusion comes from.

Differential Revision: https://reviews.llvm.org/D42380
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@323335 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-24 15:38:38 +00:00
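To make the operand order concrete, a reference model of 128-bit PSHUFB (a sketch based on the instruction's documented semantics, not LLVM code): the first operand is the data source and the second is the per-byte index, with the index's high bit zeroing the lane.

```
#include <stdint.h>

/* Reference model of 128-bit PSHUFB: out[i] comes from src (operand 0)
 * indexed by the low 4 bits of idx[i] (operand 1); if the high bit of
 * idx[i] is set, the lane is zeroed. */
static void pshufb128(const uint8_t src[16], const uint8_t idx[16],
                      uint8_t out[16]) {
  for (int i = 0; i < 16; ++i)
    out[i] = (idx[i] & 0x80) ? 0 : src[idx[i] & 0x0f];
}
```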
Hans Wennborg
d8c108c8d0 Merging r322372 and r322767:
------------------------------------------------------------------------
r322372 | nemanjai | 2018-01-12 15:58:41 +0100 (Fri, 12 Jan 2018) | 10 lines

[PowerPC] Zero-extend the compare operand for ATOMIC_CMP_SWAP

Part of the fix for https://bugs.llvm.org/show_bug.cgi?id=35812.
This patch ensures that the compare operand for the atomic compare and swap
is properly zero-extended to 32 bits if applicable.
A follow-up commit will fix the extension for the SETCC node generated when
expanding an ATOMIC_CMP_SWAP_WITH_SUCCESS. That will complete the bug fix.

Differential Revision: https://reviews.llvm.org/D41856

------------------------------------------------------------------------

------------------------------------------------------------------------
r322767 | efriedma | 2018-01-17 23:04:36 +0100 (Wed, 17 Jan 2018) | 12 lines

[LegalizeDAG] Fix ATOMIC_CMP_SWAP_WITH_SUCCESS legalization.

The code wasn't zero-extending correctly, so the comparison could
spuriously fail.

Adds some AArch64 tests to cover this case.

Inspired by D41791.

Differential Revision: https://reviews.llvm.org/D41798


------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@323334 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-24 15:33:33 +00:00
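A minimal example of the affected pattern (hedged: whether a particular build hits the bug depends on the target and legalization path): a sub-word compare-and-swap whose expected value must be zero-extended before the widened comparison the backend actually performs.

```
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* An 8-bit CAS: the backend may widen the comparison to a full register, so
 * the 'expected' byte has to be zero-extended consistently or the comparison
 * can spuriously fail. */
bool cas_u8(_Atomic uint8_t *obj, uint8_t expected, uint8_t desired) {
  return atomic_compare_exchange_strong(obj, &expected, desired);
}
```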
Hans Wennborg
0780628097 Merging r322878:
------------------------------------------------------------------------
r322878 | aemerson | 2018-01-18 20:21:27 +0100 (Thu, 18 Jan 2018) | 5 lines

[AArch64][GlobalISel] Add isel support for global values in the large code model.

Fixes PR35958.

Differential Revision: https://reviews.llvm.org/D42175
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@323103 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-22 11:56:34 +00:00
Hans Wennborg
aa06dbe86a Merging r322644:
------------------------------------------------------------------------
r322644 | d0k | 2018-01-17 05:01:06 -0800 (Wed, 17 Jan 2018) | 7 lines

[X86] Don't mutate shuffle arguments after early-out for AVX512

The match* functions have the annoying behavior of modifying their inputs.
Save and restore the inputs, just in case the early out for AVX512 is
hit. This is still not great, and it's only a matter of time before this kind
of bug happens again, but I couldn't come up with a better pattern without
rewriting significant chunks of this code. Fixes PR35977.
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@322840 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-18 11:37:05 +00:00