archived-llvm

mirror of https://github.com/RPCS3/llvm.git synced 2026-01-31 01:25:19 +01:00

Author	SHA1	Message	Date
Jatin Bhateja	be808009ac	[WebAssembly] Fix stack offsets of return values from call lowering. Summary: Fixes PR35220 Reviewers: vadimcn, alexcrichton Reviewed By: alexcrichton Subscribers: pepyakin, alexcrichton, jfb, dschuff, sbc100, jgravelle-google, llvm-commits, aheejin Differential Revision: https://reviews.llvm.org/D39866 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317895 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-10 16:26:04 +00:00
Simon Pilgrim	3343c3a754	[X86] Add scheduling tests for DAA/DAS git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317892 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-10 15:49:41 +00:00
Simon Pilgrim	41ecf7e107	[X86] Test non-i64 shld/shll tests on x86_64 targets as well as i686 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317888 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-10 13:43:04 +00:00
Simon Pilgrim	7702921068	[X86] Add scheduling tests - CBW etc sign extensions - CLC/CLD/CMC flag modifiers - CPUID git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317885 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-10 12:32:34 +00:00
Alexander Timofeev	23476c4f80	[AMDGPU] Prevent Machine Copy Propagation from replacing live copy with the dead one Differential revision: https://reviews.llvm.org/D38754 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317884 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-10 12:21:10 +00:00
Simon Pilgrim	d84a31be44	[X86] Added TODO list for missing generic x86 instruction scheduling tests. Not sure if we want to add the more exotic system instructions (IRET etc.) as well? git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317882 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-10 12:04:39 +00:00
Jonas Paulsson	a9fba7123d	[RegAlloc, SystemZ] Increase number of LOCRs by passing "hard" regalloc hints. * The method getRegAllocationHints() is now of bool type instead of void. If true is returned, regalloc (AllocationOrder) will only try to allocate the hints, as opposed to merely trying them before non-hinted registers. * TargetRegisterInfo::getRegAllocationHints() is implemented for SystemZ with an increase in number of LOCRs. In this case, it is desired to force the hints even though there is a slight increase in spilling, because if a non-hinted register would be allocated, the LOCRMux pseudo would have to be expanded with a jump sequence. The LOCR (Load On Condition) SystemZ instruction must have both operands in either the low or high part of the 64 bit register. Reviewers: Quentin Colombet and Ulrich Weigand https://reviews.llvm.org/D36795 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317879 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-10 08:46:26 +00:00
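The difference between "soft" and "hard" hints described in the commit above can be sketched roughly as follows. This is a toy model, not LLVM's AllocationOrder code: the function name `allocation_order`, its parameters, and the register names are all hypothetical and chosen only for illustration.

```python
def allocation_order(hints, default_order, hard):
    """Return the registers an allocator would try, in order.

    hints         -- registers suggested by the target (hypothetical names)
    default_order -- the normal allocation order for the register class
    hard          -- True models the target hook returning true: only the
                     hinted registers are tried
    """
    order = []
    for reg in hints:
        if reg not in order:
            order.append(reg)
    if hard:
        # Hard hints: never fall back to non-hinted registers.
        return order
    # Soft hints: hinted registers first, then everything else.
    for reg in default_order:
        if reg not in order:
            order.append(reg)
    return order


if __name__ == "__main__":
    regs = ["r0", "r1", "r2", "r3"]
    print(allocation_order(["r2"], regs, hard=False))
    print(allocation_order(["r2"], regs, hard=True))
```

With hard hints the candidate list is shorter, which is consistent with the slight increase in spilling the commit message mentions.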
Craig Topper	cad7a316e0	[X86] Add support for combining FMADDSUB(A, B, FNEG(C))->FMSUBADD(A, B, C) Support the opposite direction as well. Also add a TODO for not being able to combine FMSUB/FNMADD/FNMSUB with FNEG. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317878 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-10 08:22:37 +00:00
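The identity behind this combine can be checked lane by lane. The sketch below assumes the usual x86 lane convention (FMADDSUB subtracts in even lanes and adds in odd lanes; FMSUBADD does the opposite) and models the operations with plain Python arithmetic, so the single rounding of a real fused multiply-add is ignored. The helper names are illustrative only.

```python
def fmaddsub(a, b, c):
    # Even lanes: a*b - c, odd lanes: a*b + c.
    return [x * y - z if i % 2 == 0 else x * y + z
            for i, (x, y, z) in enumerate(zip(a, b, c))]


def fmsubadd(a, b, c):
    # Even lanes: a*b + c, odd lanes: a*b - c.
    return [x * y + z if i % 2 == 0 else x * y - z
            for i, (x, y, z) in enumerate(zip(a, b, c))]


def fneg(v):
    # Lane-wise negation.
    return [-x for x in v]


if __name__ == "__main__":
    a, b, c = [1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [0.5, 1.5, 2.5, 3.5]
    # Negating C swaps the add and subtract lanes, in both directions.
    print(fmaddsub(a, b, fneg(c)) == fmsubadd(a, b, c))
    print(fmsubadd(a, b, fneg(c)) == fmaddsub(a, b, c))
```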
Yaxun Liu	bd2cfd038d	[AMDGPU] Fix pointer info for lowering load/store for r600 for amdgiz environment r600 uses dummy pointer info for lowering load/store. Since dummy pointer info assumes address space 0, this causes isel failure when temporary load/store SDNodes are generated for amdgiz environment. Since the offset is not constant, FixedStack pseudo source value cannot be used to create the pointer info. This patch creates pointer info using llvm undef value. At least this provides correct address space so that isel can be done correctly. Differential Revision: https://reviews.llvm.org/D39698 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317862 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-10 02:03:28 +00:00
Yaxun Liu	d3bd6cf5cc	[AMDGPU] Fix pointer info for pseudo source for r600 The pointer info for pseudo source for r600 is not correct when alloca addr space is not 0, which causes invalid SDNode for r600---amdgiz. This patch fixes that. Differential Revision: https://reviews.llvm.org/D39670 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317861 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-10 01:53:24 +00:00
Ulrich Weigand	b9cdeecf25	[SystemZ] Add support for the "o" inline asm constraint We don't really need any special handling of "offsettable" memory addresses, but since some existing code uses inline asm statements with the "o" constraint, add support for this constraint for compatibility purposes. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317807 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-09 16:31:57 +00:00
Simon Dardis	5be95c6164	[mips] Correct microMIPS jump and add unconditional branch pseudo Correct the definition of 'j' as being unavailable for microMIPS32R6 and provide the 'b' assembly idiom for codegen purposes for microMIPS32r3. Provide the necessary 'br' pattern for microMIPS32R6 as it no longer incorrectly uses the 'j' instruction. Reviewers: atanasyan Differential Revision: https://reviews.llvm.org/D39741 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317801 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-09 16:02:18 +00:00
Alex Bradbury	2fe0076f87	[RISCV] Re-generate test/CodeGen/RISCV/alu32.ll using update_llc_test_checks.py No real change, but makes it marginally easier to merge the remainder of the out-of-tree patches. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317796 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-09 15:45:42 +00:00
Andrew V. Tischenko	a4b0c616f6	Sched model improving on btver2: JFPU01 resource, vtestp* for xmm. Differential Revision: https://reviews.llvm.org/D39802 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317785 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-09 14:19:59 +00:00
Andrew V. Tischenko	a5b99ed522	Add -print-schedule scheduling comments to inline asm. Differential Revision: https://reviews.llvm.org/D39728 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317782 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-09 12:45:40 +00:00
Craig Topper	7595e4c402	[X86] Make X86ISD::FMADDS3 isel patterns commutable. This was missed when FMADDS3 was split from X86ISD::FMADDS3_RND. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317769 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-09 06:17:05 +00:00
Marek Olsak	9960086c11	AMDGPU: Merge BUFFER_STORE_DWORD_OFFEN/OFFSET into x2, x4 Summary: Only 56 shaders (out of 48486) are affected. Totals from affected shaders (changed stats only): SGPRS: 2420 -> 2460 (1.65 %) Spilled VGPRs: 94 -> 112 (19.15 %) Scratch size: 524 -> 528 (0.76 %) dwords per thread Code Size: 187400 -> 184992 (-1.28 %) bytes One DiRT Showdown shader spills 6 more VGPRs. One Grid Autosport shader spills 12 more VGPRs. The other 54 shaders only have a decrease in code size. (I'm ignoring the SGPR noise) Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D39012 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317755 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-09 01:52:55 +00:00
Marek Olsak	aa75d4aeb0	AMDGPU: Lower buffer store and atomic intrinsics manually Summary: Without this, SIMemoryLegalizer inserts s_waitcnt vmcnt(0) before every buffer store and atomic instruction. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D39060 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317754 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-09 01:52:48 +00:00
Marek Olsak	e79e4fb9f1	AMDGPU: Merge BUFFER_LOAD_DWORD_OFFSET into x2, x4 Summary: Only 3 (out of 48486) shaders are affected. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D38951 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317753 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-09 01:52:36 +00:00
Marek Olsak	79b91c25d7	AMDGPU: Merge BUFFER_LOAD_DWORD_OFFEN into x2, x4 Summary: -9.9% code size decrease in affected shaders. Totals (changed stats only): SGPRS: 2151462 -> 2170646 (0.89 %) VGPRS: 1634612 -> 1640288 (0.35 %) Spilled SGPRs: 8942 -> 8940 (-0.02 %) Code Size: 52940672 -> 51727288 (-2.29 %) bytes Max Waves: 373066 -> 371718 (-0.36 %) Totals from affected shaders: SGPRS: 283520 -> 302704 (6.77 %) VGPRS: 227632 -> 233308 (2.49 %) Spilled SGPRs: 3966 -> 3964 (-0.05 %) Code Size: 12203080 -> 10989696 (-9.94 %) bytes Max Waves: 44070 -> 42722 (-3.06 %) Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D38950 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317752 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-09 01:52:30 +00:00
Marek Olsak	1a1019d019	AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4 Summary: Only constant offsets (*_IMM opcodes) are merged. It reuses code for LDS load/store merging. It relies on the scheduler to group loads. The results are mixed, I think they are mostly positive. Most shaders are affected, so here are total stats only: SGPRS: 2072198 -> 2151462 (3.83 %) VGPRS: 1628024 -> 1634612 (0.40 %) Spilled SGPRs: 7883 -> 8942 (13.43 %) Spilled VGPRs: 97 -> 101 (4.12 %) Scratch size: 1488 -> 1492 (0.27 %) dwords per thread Code Size: 60222620 -> 52940672 (-12.09 %) bytes Max Waves: 374337 -> 373066 (-0.34 %) There is 13.4% increase in SGPR spilling, DiRT Showdown spills a few more VGPRs (now 37), but 12% decrease in code size. These are the new stats for SGPR spilling. We already spill a lot SGPRs, so it's uncertain whether more spilling will make any difference since SGPRs are always spilled to VGPRs: SGPR SPILLING APPS Shaders SpillSGPR AvgPerSh alien_isolation 2938 100 0.0 batman_arkham_origins 589 6 0.0 bioshock-infinite 1769 4 0.0 borderlands2 3968 22 0.0 counter_strike_glob.. 1142 60 0.1 deus_ex_mankind_div.. 
1410 79 0.1 dirt-showdown 533 4 0.0 dirt_rally 364 1163 3.2 divinity 1052 2 0.0 dota2 1747 7 0.0 f1-2015 776 1515 2.0 grid_autosport 1767 1505 0.9 hitman 1413 273 0.2 left_4_dead_2 1762 4 0.0 life_is_strange 1296 26 0.0 mad_max 358 96 0.3 metro_2033_redux 2670 60 0.0 payday2 1362 22 0.0 portal 474 3 0.0 saints_row_iv 1704 8 0.0 serious_sam_3_bfe 392 1348 3.4 shadow_of_mordor 1418 12 0.0 shadow_warrior 3956 239 0.1 talos_principle 324 1735 5.4 thea 172 17 0.1 tomb_raider 1449 215 0.1 total_war_warhammer 242 56 0.2 ue4_effects_cave 295 55 0.2 ue4_elemental 572 12 0.0 unigine_tropics 210 56 0.3 unigine_valley 278 152 0.5 victor_vran 1262 84 0.1 yofrankie 82 2 0.0 Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D38949 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317751 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-09 01:52:23 +00:00
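As a rough illustration of what "merging into x2, x4" means in the AMDGPU commits in this series, the sketch below greedily groups dword loads at consecutive constant byte offsets. It is an assumption-laden toy: the real pass works on machine instructions, relies on the scheduler to place loads next to each other, and may have alignment or other constraints that are not modeled here. The function name `merge_dword_loads` is hypothetical.

```python
def merge_dword_loads(offsets):
    """Group dword loads at constant byte offsets into x1/x2/x4 loads.

    offsets -- byte offsets of 4-byte loads from the same buffer
    Returns a list of (start_offset, width_in_dwords) tuples.
    """
    pending = sorted(set(offsets))
    merged = []
    i = 0
    while i < len(pending):
        # Count how many loads follow at consecutive dword offsets (max 4).
        run = 1
        while (run < 4 and i + run < len(pending)
               and pending[i + run] == pending[i] + 4 * run):
            run += 1
        # Only x1, x2 and x4 widths are modeled; a run of 3 becomes x2 + x1.
        width = 4 if run == 4 else 2 if run >= 2 else 1
        merged.append((pending[i], width))
        i += width
    return merged


if __name__ == "__main__":
    print(merge_dword_loads([0, 4, 8, 12, 16]))
    print(merge_dword_loads([0, 4, 8]))
```

Fewer, wider loads mean fewer instructions, which matches the code size decreases reported in the commit statistics. Each wide load needs its destination registers to be contiguous, which is one plausible reason for the register-pressure increases also reported there.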
Marek Olsak	c94ae749c0	AMDGPU: Fold immediate offset into BUFFER_LOAD_DWORD lowered from SMEM Summary: -5.3% code size in affected shaders. Changed stats only: 48486 shaders in 30489 tests Totals: SGPRS: 2086406 -> 2072430 (-0.67 %) VGPRS: 1626872 -> 1627960 (0.07 %) Spilled SGPRs: 7865 -> 7912 (0.60 %) Code Size: 60978060 -> 60188764 (-1.29 %) bytes Max Waves: 374530 -> 374342 (-0.05 %) Totals from affected shaders: SGPRS: 299664 -> 285688 (-4.66 %) VGPRS: 233844 -> 234932 (0.47 %) Spilled SGPRs: 3959 -> 4006 (1.19 %) Code Size: 14905272 -> 14115976 (-5.30 %) bytes Max Waves: 46202 -> 46014 (-0.41 %) Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D38915 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317750 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-09 01:52:17 +00:00
Craig Topper	c8d83c1c39	[X86] Make sure we don't read too many operands from X86ISD::FMADDS1/FMADDS3 nodes when doing FNEG combine. r317453 added new ISD nodes without rounding modes that were added to an existing if/else chain. But all the previous nodes handled there included a rounding mode. The final code after this if/else chain expected an extra operand that isn't present for the new nodes. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317748 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-09 01:06:47 +00:00
Craig Topper	606231c1dd	[X86] Preserve memory refs when folding loads into divides. This is similar to what we already do for multiplies. Without this we can't unfold and hoist an invariant load. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317732 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-08 22:26:39 +00:00
Dan Gohman	b5e0bec282	Add an @llvm.sideeffect intrinsic This patch implements Chandler's idea [0] for supporting languages that require support for infinite loops with side effects, such as Rust, providing part of a solution to bug 965 [1]. Specifically, it adds an `llvm.sideeffect()` intrinsic, which has no actual effect, but which appears to optimization passes to have obscure side effects, such that they don't optimize away loops containing it. It also teaches several optimization passes to ignore this intrinsic, so that it doesn't significantly impact optimization in most cases. As discussed on llvm-dev [2], this patch is the first of two major parts. The second part, to change LLVM's semantics to have defined behavior on infinite loops by default, with a function attribute for opting into potential-undefined-behavior, will be implemented and posted for review in a separate patch. [0] http://lists.llvm.org/pipermail/llvm-dev/2015-July/088103.html [1] https://bugs.llvm.org/show_bug.cgi?id=965 [2] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118632.html Differential Revision: https://reviews.llvm.org/D38336 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317729 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-08 21:59:51 +00:00
Reid Kleckner	bfc1e953bc	Revert "Correct dwarf unwind information in function epilogue for X86" This reverts r317579, originally committed as r317100. There is a design issue with marking CFI instructions duplicatable. Not all targets support the CFIInstrInserter pass, and targets like Darwin can't cope with duplicated prologue setup CFI instructions. The compact unwind info emission fails. When the following code is compiled for arm64 on Mac at -O3, the CFI instructions end up getting tail duplicated, which causes compact unwind info emission to fail: int a, c, d, e, f, g, h, i, j, k, l, m; void n(int o, int b) { if (g) f = 0; for (; f < o; f++) { m = a; if (l > j k > i) j = i = k = d; h = b[c] - e; } } We get assembly that looks like this: ; BB#1: ; %if.then Lloh3: adrp x9, _f@GOTPAGE Lloh4: ldr x9, [x9, _f@GOTPAGEOFF] mov w8, wzr Lloh5: str wzr, [x9] stp x20, x19, [sp, #-16]! ; 8-byte Folded Spill .cfi_def_cfa_offset 16 .cfi_offset w19, -8 .cfi_offset w20, -16 cmp w8, w0 b.lt LBB0_3 b LBB0_7 LBB0_2: ; %entry.if.end_crit_edge Lloh6: adrp x8, _f@GOTPAGE Lloh7: ldr x8, [x8, _f@GOTPAGEOFF] Lloh8: ldr w8, [x8] stp x20, x19, [sp, #-16]! ; 8-byte Folded Spill .cfi_def_cfa_offset 16 .cfi_offset w19, -8 .cfi_offset w20, -16 cmp w8, w0 b.ge LBB0_7 LBB0_3: ; %for.body.lr.ph Note the multiple .cfi_def* directives. Compact unwind info emission can't handle that. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317726 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-08 21:31:14 +00:00
Dan Gohman	94964bb756	[WebAssembly] Add a test for inline-asm "m" constraints. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317711 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-08 19:37:24 +00:00
Dan Gohman	40486d7dcf	[WebAssembly] Call signExtend to get sign extended register Patch by Jatin Bhateja! Differential Revision: https://reviews.llvm.org/D39529 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317710 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-08 19:24:21 +00:00
Dan Gohman	8e0d3d4cd7	[WebAssembly] Revise the strategy for inline asm. Previously, an "r" constraint would mean the compiler provides a value on WebAssembly's operand stack. This was tricky to use properly, particularly since it isn't possible to declare a new local from within an inline asm string. With this patch, "r" provides the value in a WebAssembly local, and the local index is provided to the inline asm string. This requires inline asm to use get_local and set_local to read the register. This does potentially result in larger code size, however inline asm should hopefully be quite rare in WebAssembly. This also means that the "m" constraint can no longer be supported, as WebAssembly has nothing like a "memory operand" that includes an implicit get_local. This fixes PR34599 for the wasm32-unknown-unknown-wasm target (though not for the ELF target). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317707 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-08 19:18:08 +00:00
Simon Pilgrim	15443bdc4b	[X86] Add some initial scheduling tests for generic x86 instructions These will be using inline asm to ensure we have coverage that we're unlikely to get from lowering of basic ir. Currently waiting for D39728 to land to add support for scheduler comments for inline asm. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317698 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-08 16:35:42 +00:00
Alex Bradbury	da781c7295	[RISCV] Initial support for function calls Note that this is just enough for simple function call examples to generate working code. Support for varargs etc follows in future patches. Differential Revision: https://reviews.llvm.org/D29936 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317691 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-08 13:41:21 +00:00
Alex Bradbury	eacca308e4	[RISCV] Codegen for conditional branches A good portion of this patch is the extra functions that needed to be implemented to support the test case. e.g. storeRegToStackSlot, loadRegFromStackSlot, eliminateFrameIndex. Setting ISD::BR_CC to Expand may appear non-obvious on an architecture with branch+cmp instructions. However, I found it much easier to deal with matching the expanded form. I had to change simm13_lsb0 and simm21_lsb0 to inherit from the Operand<OtherVT> class rather than Operand<i32> in order to keep tablegen happy. This isn't a big deal, but it does seem a shame to lose the uniformity across immediate types when there's not an obvious benefit (I'm hoping a tablegen expert will educate me on what I'm missing here!). Differential Revision: https://reviews.llvm.org/D29935 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317690 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-08 13:31:40 +00:00
Alex Bradbury	6c9938cf11	[RISCV] Codegen support for memory operations on global addresses Differential Revision: https://reviews.llvm.org/D39103 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317688 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-08 13:24:21 +00:00
Alex Bradbury	21ae2e7a56	[RISCV] Codegen support for memory operations This required the implementation of RISCVTargetInstrInfo::copyPhysReg. Support for lowering global addresses follow in the next patch. Differential Revision: https://reviews.llvm.org/D29934 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317685 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-08 12:20:01 +00:00
Alex Bradbury	c5abad3f59	[RISCV] Codegen support for materializing constants Differential Revision: https://reviews.llvm.org/D39101 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317684 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-08 12:02:22 +00:00
Simon Dardis	d036c9c439	[mips] Guard indirect and tailcall pseudo instructions correctly. Previously these pseudo instructions were not guarded by ISA, so their selection was dependent on the ordering of the entries in the DAG matcher. Reviewers: atanasyan Differential Revision: https://reviews.llvm.org/D39723 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317681 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-08 11:13:44 +00:00
Craig Topper	d550a31777	[X86] Add patterns to fold EVEX store with EVEX encoded vcvtps2ph instructions. Remove bad pattern that had v4f32 vcvtps2ph storing 128-bits. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317662 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-08 04:00:31 +00:00
Craig Topper	c30df5f3d3	[X86] Allow legacy vcvtps2ph intrinsics to select EVEX encoded instructions. Rely on EVEX->VEX to convert back. Missed store folding opportunities will be fixed in a subsequent commit. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317661 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-08 04:00:30 +00:00
Matt Arsenault	a932c3f118	AMDGPU: Set correct sched model on v_mad_u64_u32 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317645 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-08 00:48:25 +00:00
Sriraman Tallam	ea30756f68	Attribute nonlazybind should not affect calls to functions with hidden visibility. Differential Revision: https://reviews.llvm.org/D39625 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317639 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-08 00:01:05 +00:00
Justin Lebar	d8660fa5dc	[NVPTX] Implement __nvvm_atom_add_gen_d builtin. Summary: This just seems to have been an oversight. We already supported the f64 atomic add with an explicit scope (e.g. "cta"), but not the scopeless version. Reviewers: tra Subscribers: jholewinski, sanjoy, cfe-commits, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D39638 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317623 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-07 22:10:54 +00:00
Graham Yiu	5363e7a31e	Use new vector insert half-word and byte instructions when we see insertelement on '8 x i16' and '16 x i8' types. Also extended existing lit testcase to cover these cases. Differential Revision: https://reviews.llvm.org/D34630 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317613 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-07 20:55:43 +00:00
Petar Jovanovic	8cec6c4916	Reland "Correct dwarf unwind information in function epilogue for X86" Reland r317100 with minor fix regarding ComputeCommonTailLength function in BranchFolding.cpp. Skipping top CFI instructions block needs to be executed at several more return points in ComputeCommonTailLength(). Original r317100 message: "Correct dwarf unwind information in function epilogue for X86" This patch aims to provide correct dwarf unwind information in function epilogue for X86. It consists of two parts. The first part inserts CFI instructions that set appropriate cfa offset and cfa register in emitEpilogue() in X86FrameLowering. This part is X86 specific. The second part is platform independent and ensures that: - CFI instructions do not affect code generation - Unwind information remains correct when a function is modified by different passes. This is done in a late pass by analyzing information about cfa offset and cfa register in BBs and inserting additional CFI directives where necessary. Changed CFI instructions so that they: - are duplicable - are not counted as instructions when tail duplicating or tail merging - can be compared as equal Added CFIInstrInserter pass: - analyzes each basic block to determine cfa offset and register valid at its entry and exit - verifies that outgoing cfa offset and register of predecessor blocks match incoming values of their successors - inserts additional CFI directives at basic block beginning to correct the rule for calculating CFA Having CFI instructions in function epilogue can cause incorrect CFA calculation rule for some basic blocks. This can happen if, due to basic block reordering, or the existence of multiple epilogue blocks, some of the blocks have wrong cfa offset and register values set by the epilogue block above them. CFIInstrInserter is currently run only on X86, but can be used by any target that implements support for adding CFI instructions in epilogue. Patch by Violeta Vukobrat.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317579 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-07 14:40:27 +00:00
Simon Pilgrim	1ed4428b8f	[X86] Regenerate select tests git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317571 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-07 13:21:02 +00:00
Kristof Beyls	b79469ca2f	[GlobalISel] Enable legalizing non-power-of-2 sized types. This changes the interface of how targets describe how to legalize, see the below description. 1. Interface for targets to describe how to legalize. In GlobalISel, the API in the LegalizerInfo class is the main interface for targets to specify which types are legal for which operations, and what to do to turn illegal type/operation combinations into legal ones. For each operation the type sizes that can be legalized without having to change the size of the type are specified with a call to setAction. This isn't different to how GlobalISel worked before. For example, for a target that supports 32 and 64 bit adds natively: for (auto Ty : {s32, s64}) setAction({G_ADD, 0, Ty}, Legal); or for a target that needs a library call for a 32 bit division: setAction({G_SDIV, s32}, Libcall); The main conceptual change to the LegalizerInfo API is in specifying how to legalize the type sizes for which a change of size is needed. For example, in the above example, how to specify how all types from i1 to i8388607 (apart from s32 and s64 which are legal) need to be legalized and expressed in terms of operations on the available legal sizes (again, i32 and i64 in this case). Before, the implementation only allowed specifying power-of-2-sized types (e.g. setAction({G_ADD, 0, s128}, NarrowScalar)). A worse limitation was that if you wanted to specify how to legalize all the sized types as allowed by the LLVM-IR LangRef, i1 to i8388607, you'd have to call setAction 8388607-3 times and probably would need a lot of memory to store all of these specifications. Instead, the legalization actions that need to change the size of the type are specified now using a "SizeChangeStrategy".
For example: setLegalizeScalarToDifferentSizeStrategy( G_ADD, 0, widenToLargerAndNarrowToLargest); This example indicates that for type sizes for which there is a larger size that can be legalized towards, do it by Widening the size. For example, G_ADD on s17 will be legalized by first doing WidenScalar to make it s32, after which it's legal. The "NarrowToLargest" indicates what to do if there is no larger size that can be legalized towards. E.g. G_ADD on s92 will be legalized by doing NarrowScalar to s64. Another example, taken from the ARM backend is: for (unsigned Op : {G_SDIV, G_UDIV}) { setLegalizeScalarToDifferentSizeStrategy(Op, 0, widenToLargerTypesUnsupportedOtherwise); if (ST.hasDivideInARMMode()) setAction({Op, s32}, Legal); else setAction({Op, s32}, Libcall); } For this example, G_SDIV on s8, on a target without a divide instruction, would be legalized by first doing action (WidenScalar, s32), followed by (Libcall, s32). The same principle is also followed for when the number of vector lanes on vector data types need to be changed, e.g.: setAction({G_ADD, LLT::vector(8, 8)}, LegalizerInfo::Legal); setAction({G_ADD, LLT::vector(16, 8)}, LegalizerInfo::Legal); setAction({G_ADD, LLT::vector(4, 16)}, LegalizerInfo::Legal); setAction({G_ADD, LLT::vector(8, 16)}, LegalizerInfo::Legal); setAction({G_ADD, LLT::vector(2, 32)}, LegalizerInfo::Legal); setAction({G_ADD, LLT::vector(4, 32)}, LegalizerInfo::Legal); setLegalizeVectorElementToDifferentSizeStrategy( G_ADD, 0, widenToLargerTypesUnsupportedOtherwise); As currently implemented here, vector types are legalized by first making the vector element size legal, followed by then making the number of lanes legal. The strategy to follow in the first step is set by a call to setLegalizeVectorElementToDifferentSizeStrategy, see example above. 
The strategy followed in the second step is "moreToWiderTypesAndLessToWidest" (see code for its definition), indicating that vectors are widened to more elements so they map to natively supported vector widths, or when there isn't a legal wider vector, split the vector to map it to the widest vector supported. Therefore, for the above specification, some example legalizations are: * getAction({G_ADD, LLT::vector(3, 3)}) returns {WidenScalar, LLT::vector(3, 8)} * getAction({G_ADD, LLT::vector(3, 8)}) then returns {MoreElements, LLT::vector(8, 8)} * getAction({G_ADD, LLT::vector(20, 8)}) returns {FewerElements, LLT::vector(16, 8)} 2. Key implementation aspects. How to legalize a specific (operation, type index, size) tuple is represented by mapping intervals of integers representing a range of size types to an action to take, e.g.: setScalarAction({G_ADD, LLT::scalar(1)}, {{1, WidenScalar}, // bit sizes [ 1, 32[ {32, Legal}, // bit sizes [32, 33[ {33, WidenScalar}, // bit sizes [33, 64[ {64, Legal}, // bit sizes [64, 65[ {65, NarrowScalar} // bit sizes [65, +inf[ }); Please note that most of the code to do the actual lowering of non-power-of-2 sized types is currently missing, this is just trying to make it possible for targets to specify what is legal, and how non-legal types should be legalized. Probably quite a bit of further work is needed in the actual legalizing and the other passes in GlobalISel to support non-power-of-2 sized types. I hope the documentation in LegalizerInfo.h and the examples provided in the various {Target}LegalizerInfo.cpp and LegalizerInfoTest.cpp explain well enough how this is meant to be used. This drops the need for LLT::{half,double}...Size(). Differential Revision: https://reviews.llvm.org/D30529 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317560 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-07 10:34:34 +00:00
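The interval representation described under "Key implementation aspects" can be sketched as a sorted list of (first bit size, action) pairs searched with a binary search. This is a minimal Python model of the idea, not the LegalizerInfo implementation. The table mirrors the G_ADD example from the commit message, and the function names `get_action` and `legalize_size` are hypothetical.

```python
from bisect import bisect_right

# Each entry starts an interval that runs up to the next entry's start.
G_ADD_SCALAR_ACTIONS = [
    (1, "WidenScalar"),    # bit sizes [1, 32[
    (32, "Legal"),         # bit sizes [32, 33[
    (33, "WidenScalar"),   # bit sizes [33, 64[
    (64, "Legal"),         # bit sizes [64, 65[
    (65, "NarrowScalar"),  # bit sizes [65, +inf[
]


def get_action(intervals, size_in_bits):
    """Look up the action for a bit size in a sorted interval table."""
    starts = [start for start, _ in intervals]
    idx = bisect_right(starts, size_in_bits) - 1
    if idx < 0:
        raise ValueError("size below the first interval")
    return intervals[idx][1]


def legalize_size(intervals, size_in_bits):
    """Return (action, resulting size), modeling widen-to-larger and
    narrow-to-largest as in the s17 and s92 examples."""
    action = get_action(intervals, size_in_bits)
    legal = [start for start, act in intervals if act == "Legal"]
    if action == "WidenScalar":
        return action, min(s for s in legal if s > size_in_bits)
    if action == "NarrowScalar":
        return action, max(legal)
    return action, size_in_bits


if __name__ == "__main__":
    for size in (17, 32, 33, 92):
        print(size, legalize_size(G_ADD_SCALAR_ACTIONS, size))
```

Storing only interval boundaries keeps the table small no matter how many bit widths it covers, which is the point the commit makes about avoiding one setAction call per size.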
Bjorn Steinbrink	c1c411e7a8	[X86] Don't clobber reserved registers with stack adjustments Summary: Calls using invoke in funclet based functions are assumed to clobber all registers, which causes the stack adjustment using pops to consider all registers not defined by the call to be undefined, which can unfortunately include the base pointer, if one is needed. To prevent this (and possibly other hazards), skip reserved registers when looking for candidate registers. This fixes issue #45034 in the Rust compiler. Reviewers: mkuper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39636 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317551 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-07 08:50:21 +00:00
Craig Topper	c305f3d45a	[X86] Add patterns to fold a 64-bit load into the EVEX vcvtph2ps instructions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317548 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-07 07:13:07 +00:00
Craig Topper	7f4581842b	[X86] Add patterns for folding a v16i8 with the VEX vcvtph2ps intrinsics. Disable the peephole pass to prove that the pattern is working. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317547 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-07 07:13:06 +00:00
Craig Topper	2765e2df21	[X86] Add a test for a 128-bit vector load feeding a cvtph2ps intrinsic. The instruction only loads 64-bits, but we should be able to fold a wider load and let it be narrowed. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317546 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-07 07:13:05 +00:00
Craig Topper	bfc0134619	[X86] Remove alignment from a load in the f16c intrinsic test. The alignment shouldn't be required for load folding. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317545 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-07 07:13:04 +00:00


22970 Commits