Commit Graph

22970 Commits

Author SHA1 Message Date
Jatin Bhateja
be808009ac [WebAssembly] Fix stack offsets of return values from call lowering.
Summary: Fixes PR35220

Reviewers: vadimcn, alexcrichton

Reviewed By: alexcrichton

Subscribers: pepyakin, alexcrichton, jfb, dschuff, sbc100, jgravelle-google, llvm-commits, aheejin

Differential Revision: https://reviews.llvm.org/D39866

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317895 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-10 16:26:04 +00:00
Simon Pilgrim
3343c3a754 [X86] Add scheduling tests for DAA/DAS
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317892 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-10 15:49:41 +00:00
Simon Pilgrim
41ecf7e107 [X86] Test non-i64 shld/shll tests on x86_64 targets as well as i686
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317888 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-10 13:43:04 +00:00
Simon Pilgrim
7702921068 [X86] Add scheduling tests
- CBW etc sign extensions
 - CLC/CLD/CMC flag modifiers
 - CPUID

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317885 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-10 12:32:34 +00:00
Alexander Timofeev
23476c4f80 [AMDGPU] Prevent Machine Copy Propagation from replacing live copy with the dead one
Differential revision: https://reviews.llvm.org/D38754

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317884 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-10 12:21:10 +00:00
Simon Pilgrim
d84a31be44 [X86] Added TODO list for missing generic x86 instruction scheduling tests.
Not sure if we want to add the more exotic system instructions (IRET etc.) as well?

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317882 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-10 12:04:39 +00:00
Jonas Paulsson
a9fba7123d [RegAlloc, SystemZ] Increase number of LOCRs by passing "hard" regalloc hints.
* The method getRegAllocationHints() is now of bool type instead of void. If
true is returned, regalloc (AllocationOrder) will *only* try to allocate the
hints, as opposed to merely trying them before non-hinted registers.

* TargetRegisterInfo::getRegAllocationHints() is implemented for SystemZ with
an increase in number of LOCRs.

In this case, it is desired to force the hints even though there is a slight
increase in spilling, because if a non-hinted register would be allocated,
the LOCRMux pseudo would have to be expanded with a jump sequence. The LOCR
(Load On Condition) SystemZ instruction must have both operands in either the
low or high part of the 64 bit register.

Reviewers: Quentin Colombet and Ulrich Weigand
https://reviews.llvm.org/D36795

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317879 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-10 08:46:26 +00:00
Craig Topper
cad7a316e0 [X86] Add support for combining FMADDSUB(A, B, FNEG(C))->FMSUBADD(A, B, C)
Support the opposite direction as well. Also add a TODO for not being able to combine FMSUB/FNMADD/FNMSUB with FNEG.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317878 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-10 08:22:37 +00:00
Yaxun Liu
bd2cfd038d [AMDGPU] Fix pointer info for lowering load/store for r600 for amdgiz environment
r600 uses dummy pointer info for lowering load/store. Since dummy pointer info
assumes address space 0, this causes isel failure when temporary load/store SDNodes
are generated for amdgiz environment.

Since the offest is not constant, FixedStack pseudo source value cannot be used
to create the pointer info. This patch creates pointer info using llvm undef value.
At least this provides correct address space so that isel can be done correctly.

Differential Revision: https://reviews.llvm.org/D39698


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317862 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-10 02:03:28 +00:00
Yaxun Liu
d3bd6cf5cc [AMDGPU] Fix pointer info for pseudo source for r600
The pointer info for pseudo source for r600 is not correct when
alloca addr space is not 0, which causes invalid SDNode for r600---amdgiz.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D39670


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317861 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-10 01:53:24 +00:00
Ulrich Weigand
b9cdeecf25 [SystemZ] Add support for the "o" inline asm constraint
We don't really need any special handling of "offsettable"
memory addresses, but since some existing code uses inline
asm statements with the "o" constraint, add support for this
constraint for compatibility purposes.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317807 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-09 16:31:57 +00:00
Simon Dardis
5be95c6164 [mips] Correct microMIP's jump and add unconditional branch pseudo
Correct the definition of 'j' as being unavailable for microMIPS32R6 and
provide the 'b' assembly idiom for codegen purposes for microMIPS32r3.

Provide the necessary 'br' pattern for microMIPS32R6 as it now longer
incorrectly uses the 'j' instruction.

Reviewers: atanasyan

Differential Revision: https://reviews.llvm.org/D39741


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317801 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-09 16:02:18 +00:00
Alex Bradbury
2fe0076f87 [RISCV] Re-generate test/CodeGen/RISCV/alu32.ll using update_llc_test_checks.py
No real change, but makes it marginally easier to merge the remainder of the
out-of-tree patches.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317796 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-09 15:45:42 +00:00
Andrew V. Tischenko
a4b0c616f6 Sched model improving on btver2: JFPU01 resource, vtestp* for xmm.
Differential Revision: https://reviews.llvm.org/D39802


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317785 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-09 14:19:59 +00:00
Andrew V. Tischenko
a5b99ed522 Add -print-schedule scheduling comments to inline asm.
Differential Revision: https://reviews.llvm.org/D39728


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317782 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-09 12:45:40 +00:00
Craig Topper
7595e4c402 [X86] Make X86ISD::FMADDS3 isel patterns commutable.
This was missed when FMADDS3 was split from X86ISD::FMADDS3_RND.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317769 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-09 06:17:05 +00:00
Marek Olsak
9960086c11 AMDGPU: Merge BUFFER_STORE_DWORD_OFFEN/OFFSET into x2, x4
Summary:
Only 56 shaders (out of 48486) are affected.

Totals from affected shaders (changed stats only):
SGPRS: 2420 -> 2460 (1.65 %)
Spilled VGPRs: 94 -> 112 (19.15 %)
Scratch size: 524 -> 528 (0.76 %) dwords per thread
Code Size: 187400 -> 184992 (-1.28 %) bytes

One DiRT Showdown shader spills 6 more VGPRs.
One Grid Autosport shader spills 12 more VGPRs.

The other 54 shaders only have a decrease in code size.
(I'm ignoring the SGPR noise)

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D39012

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317755 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-09 01:52:55 +00:00
Marek Olsak
aa75d4aeb0 AMDGPU: Lower buffer store and atomic intrinsics manually
Summary:
Without this, SIMemoryLegalizer inserts s_waitcnt vmcnt(0) before every
buffer store and atomic instruction.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D39060

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317754 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-09 01:52:48 +00:00
Marek Olsak
e79e4fb9f1 AMDGPU: Merge BUFFER_LOAD_DWORD_OFFSET into x2, x4
Summary: Only 3 (out of 48486) shaders are affected.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D38951

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317753 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-09 01:52:36 +00:00
Marek Olsak
79b91c25d7 AMDGPU: Merge BUFFER_LOAD_DWORD_OFFEN into x2, x4
Summary:
-9.9% code size decrease in affected shaders.

Totals (changed stats only):
SGPRS: 2151462 -> 2170646 (0.89 %)
VGPRS: 1634612 -> 1640288 (0.35 %)
Spilled SGPRs: 8942 -> 8940 (-0.02 %)
Code Size: 52940672 -> 51727288 (-2.29 %) bytes
Max Waves: 373066 -> 371718 (-0.36 %)

Totals from affected shaders:
SGPRS: 283520 -> 302704 (6.77 %)
VGPRS: 227632 -> 233308 (2.49 %)
Spilled SGPRs: 3966 -> 3964 (-0.05 %)
Code Size: 12203080 -> 10989696 (-9.94 %) bytes
Max Waves: 44070 -> 42722 (-3.06 %)

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D38950

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317752 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-09 01:52:30 +00:00
Marek Olsak
1a1019d019 AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4
Summary:
Only constant offsets (*_IMM opcodes) are merged.
It reuses code for LDS load/store merging.
It relies on the scheduler to group loads.

The results are mixed, I think they are mostly positive. Most shaders are
affected, so here are total stats only:

 SGPRS: 2072198 -> 2151462 (3.83 %)
 VGPRS: 1628024 -> 1634612 (0.40 %)
 Spilled SGPRs: 7883 -> 8942 (13.43 %)
 Spilled VGPRs: 97 -> 101 (4.12 %)
 Scratch size: 1488 -> 1492 (0.27 %) dwords per thread
 Code Size: 60222620 -> 52940672 (-12.09 %) bytes
 Max Waves: 374337 -> 373066 (-0.34 %)

There is 13.4% increase in SGPR spilling, DiRT Showdown spills a few more
VGPRs (now 37), but 12% decrease in code size.

These are the new stats for SGPR spilling. We already spill a lot SGPRs,
so it's uncertain whether more spilling will make any difference since
SGPRs are always spilled to VGPRs:

 SGPR SPILLING APPS   Shaders SpillSGPR AvgPerSh
 alien_isolation         2938       100      0.0
 batman_arkham_origins    589         6      0.0
 bioshock-infinite       1769         4      0.0
 borderlands2            3968        22      0.0
 counter_strike_glob..   1142        60      0.1
 deus_ex_mankind_div..   1410        79      0.1
 dirt-showdown            533         4      0.0
 dirt_rally               364      1163      3.2
 divinity                1052         2      0.0
 dota2                   1747         7      0.0
 f1-2015                  776      1515      2.0
 grid_autosport          1767      1505      0.9
 hitman                  1413       273      0.2
 left_4_dead_2           1762         4      0.0
 life_is_strange         1296        26      0.0
 mad_max                  358        96      0.3
 metro_2033_redux        2670        60      0.0
 payday2                 1362        22      0.0
 portal                   474         3      0.0
 saints_row_iv           1704         8      0.0
 serious_sam_3_bfe        392      1348      3.4
 shadow_of_mordor        1418        12      0.0
 shadow_warrior          3956       239      0.1
 talos_principle          324      1735      5.4
 thea                     172        17      0.1
 tomb_raider             1449       215      0.1
 total_war_warhammer      242        56      0.2
 ue4_effects_cave         295        55      0.2
 ue4_elemental            572        12      0.0
 unigine_tropics          210        56      0.3
 unigine_valley           278       152      0.5
 victor_vran             1262        84      0.1
 yofrankie                 82         2      0.0

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D38949

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317751 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-09 01:52:23 +00:00
Marek Olsak
c94ae749c0 AMDGPU: Fold immediate offset into BUFFER_LOAD_DWORD lowered from SMEM
Summary:
-5.3% code size in affected shaders.

Changed stats only:

48486 shaders in 30489 tests
Totals:
SGPRS: 2086406 -> 2072430 (-0.67 %)
VGPRS: 1626872 -> 1627960 (0.07 %)
Spilled SGPRs: 7865 -> 7912 (0.60 %)
Code Size: 60978060 -> 60188764 (-1.29 %) bytes
Max Waves: 374530 -> 374342 (-0.05 %)

Totals from affected shaders:
SGPRS: 299664 -> 285688 (-4.66 %)
VGPRS: 233844 -> 234932 (0.47 %)
Spilled SGPRs: 3959 -> 4006 (1.19 %)
Code Size: 14905272 -> 14115976 (-5.30 %) bytes
Max Waves: 46202 -> 46014 (-0.41 %)

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D38915

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317750 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-09 01:52:17 +00:00
Craig Topper
c8d83c1c39 [X86] Make sure we don't read too many operands from X86ISD::FMADDS1/FMADDS3 nodes when doing FNEG combine.
r317453 added new ISD nodes without rounding modes that were added to an existing if/else chain. But all the previous nodes handled there included a rounding mode. The final code after this if/else chain expected an extra operand that isn't present for the new nodes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317748 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-09 01:06:47 +00:00
Craig Topper
606231c1dd [X86] Preserve memory refs when folding loads into divides.
This is similar to what we already do for multiplies. Without this we can't unfold and hoist an invariant load.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317732 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-08 22:26:39 +00:00
Dan Gohman
b5e0bec282 Add an @llvm.sideeffect intrinsic
This patch implements Chandler's idea [0] for supporting languages that
require support for infinite loops with side effects, such as Rust, providing
part of a solution to bug 965 [1].

Specifically, it adds an `llvm.sideeffect()` intrinsic, which has no actual
effect, but which appears to optimization passes to have obscure side effects,
such that they don't optimize away loops containing it. It also teaches
several optimization passes to ignore this intrinsic, so that it doesn't
significantly impact optimization in most cases.

As discussed on llvm-dev [2], this patch is the first of two major parts.
The second part, to change LLVM's semantics to have defined behavior
on infinite loops by default, with a function attribute for opting into
potential-undefined-behavior, will be implemented and posted for review in
a separate patch.

[0] http://lists.llvm.org/pipermail/llvm-dev/2015-July/088103.html
[1] https://bugs.llvm.org/show_bug.cgi?id=965
[2] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118632.html

Differential Revision: https://reviews.llvm.org/D38336


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317729 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-08 21:59:51 +00:00
Reid Kleckner
bfc1e953bc Revert "Correct dwarf unwind information in function epilogue for X86"
This reverts r317579, originally committed as r317100.

There is a design issue with marking CFI instructions duplicatable. Not
all targets support the CFIInstrInserter pass, and targets like Darwin
can't cope with duplicated prologue setup CFI instructions. The compact
unwind info emission fails.

When the following code is compiled for arm64 on Mac at -O3, the CFI
instructions end up getting tail duplicated, which causes compact unwind
info emission to fail:
  int a, c, d, e, f, g, h, i, j, k, l, m;
  void n(int o, int *b) {
    if (g)
      f = 0;
    for (; f < o; f++) {
      m = a;
      if (l > j * k > i)
        j = i = k = d;
      h = b[c] - e;
    }
  }

We get assembly that looks like this:
; BB#1:                                 ; %if.then
Lloh3:
	adrp	x9, _f@GOTPAGE
Lloh4:
	ldr	x9, [x9, _f@GOTPAGEOFF]
	mov	 w8, wzr
Lloh5:
	str		wzr, [x9]
	stp	x20, x19, [sp, #-16]!   ; 8-byte Folded Spill
	.cfi_def_cfa_offset 16
	.cfi_offset w19, -8
	.cfi_offset w20, -16
	cmp		w8, w0
	b.lt	LBB0_3
	b	LBB0_7
LBB0_2:                                 ; %entry.if.end_crit_edge
Lloh6:
	adrp	x8, _f@GOTPAGE
Lloh7:
	ldr	x8, [x8, _f@GOTPAGEOFF]
Lloh8:
	ldr		w8, [x8]
	stp	x20, x19, [sp, #-16]!   ; 8-byte Folded Spill
	.cfi_def_cfa_offset 16
	.cfi_offset w19, -8
	.cfi_offset w20, -16
	cmp		w8, w0
	b.ge	LBB0_7
LBB0_3:                                 ; %for.body.lr.ph

Note the multiple .cfi_def* directives. Compact unwind info emission
can't handle that.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317726 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-08 21:31:14 +00:00
Dan Gohman
94964bb756 [WebAssembly] Add a test for inline-asm "m" constraints.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317711 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-08 19:37:24 +00:00
Dan Gohman
40486d7dcf [WebAssembly] Call signExtend to get sign extended register
Patch by Jatin Bhateja!

Differential Revision: https://reviews.llvm.org/D39529


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317710 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-08 19:24:21 +00:00
Dan Gohman
8e0d3d4cd7 [WebAssembly] Revise the strategy for inline asm.
Previously, an "r" constraint would mean the compiler provides a value
on WebAssembly's operand stack. This was tricky to use properly,
particularly since it isn't possible to declare a new local from within
an inline asm string.

With this patch, "r" provides the value in a WebAssembly local, and the
local index is provided to the inline asm string. This requires inline
asm to use get_local and set_local to read the register. This does
potentially result in larger code size, however inline asm should
hopefully be quite rare in WebAssembly.

This also means that the "m" constraint can no longer be supported, as
WebAssembly has nothing like a "memory operand" that includes an
implicit get_local.

This fixes PR34599 for the wasm32-unknown-unknown-wasm target (though
not for the ELF target).


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317707 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-08 19:18:08 +00:00
Simon Pilgrim
15443bdc4b [X86] Add some initial scheduling tests for generic x86 instructions
These will be using inline asm to ensure we have coverage that we're unlikely to get from lowering of basic ir.

Currently waiting for D39728 to land to add support for scheduler comments for inline asm.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317698 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-08 16:35:42 +00:00
Alex Bradbury
da781c7295 [RISCV] Initial support for function calls
Note that this is just enough for simple function call examples to generate 
working code. Support for varargs etc follows in future patches.

Differential Revision: https://reviews.llvm.org/D29936


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317691 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-08 13:41:21 +00:00
Alex Bradbury
eacca308e4 [RISCV] Codegen for conditional branches
A good portion of this patch is the extra functions that needed to be 
implemented to support the test case. e.g. storeRegToStackSlot, 
loadRegFromStackSlot, eliminateFrameIndex.

Setting ISD::BR_CC to Expand may appear non-obvious on an architecture with 
branch+cmp instructions. However, I found it much easier to deal with matching 
the expanded form.

I had to change simm13_lsb0 and simm21_lsb0 to inherit from the 
Operand<OtherVT> class rather than Operand<i32> in order to keep tablegen 
happy. This isn't a big deal, but it does seem a shame to lose the uniformity 
across immediate types when there's not an obvious benefit (I'm hoping a 
tablegen expert will educate me on what I'm missing here!).

Differential Revision: https://reviews.llvm.org/D29935


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317690 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-08 13:31:40 +00:00
Alex Bradbury
6c9938cf11 [RISCV] Codegen support for memory operations on global addresses
Differential Revision: https://reviews.llvm.org/D39103


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317688 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-08 13:24:21 +00:00
Alex Bradbury
21ae2e7a56 [RISCV] Codegen support for memory operations
This required the implementation of RISCVTargetInstrInfo::copyPhysReg. Support
for lowering global addresses follow in the next patch.

Differential Revision: https://reviews.llvm.org/D29934


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317685 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-08 12:20:01 +00:00
Alex Bradbury
c5abad3f59 [RISCV] Codegen support for materializing constants
Differential Revision: https://reviews.llvm.org/D39101


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317684 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-08 12:02:22 +00:00
Simon Dardis
d036c9c439 [mips] Guard indirect and tailcall pseudo instructions correctly.
Previously these pseudo instructions were not guarded by ISA, so their
select was dependant on the ordering of the entries in the DAG matcher.

Reviewers: atanasyan

Differential Revision: https://reviews.llvm.org/D39723


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317681 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-08 11:13:44 +00:00
Craig Topper
d550a31777 [X86] Add patterns to fold EVEX store with EVEX encoded vcvtps2ph instructions. Remove bad pattern that had vf432 vcvtps2ph storing 128-bits.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317662 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-08 04:00:31 +00:00
Craig Topper
c30df5f3d3 [X86] Allow legacy vcvtps2ph intrinsics to select EVEX encoded instructions. Rely on EVEX->VEX to convert back.
Missed store folding opportunities will be fixed in a subsequent commit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317661 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-08 04:00:30 +00:00
Matt Arsenault
a932c3f118 AMDGPU: Set correct sched model on v_mad_u64_u32
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317645 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-08 00:48:25 +00:00
Sriraman Tallam
ea30756f68 Attribute nonlazybind should not affect calls to functions with hidden visibility.
Differential Revision: https://reviews.llvm.org/D39625

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317639 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-08 00:01:05 +00:00
Justin Lebar
d8660fa5dc [NVPTX] Implement __nvvm_atom_add_gen_d builtin.
Summary:
This just seems to have been an oversight.  We already supported the f64
atomic add with an explicit scope (e.g. "cta"), but not the scopeless
version.

Reviewers: tra

Subscribers: jholewinski, sanjoy, cfe-commits, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D39638

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317623 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-07 22:10:54 +00:00
Graham Yiu
5363e7a31e Use new vector insert half-word and byte instructions when we see insertelement on '8 x i16' and '16 x i8' types. Also extended existing lit testcase to cover these cases.
Differential Revision: https://reviews.llvm.org/D34630

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317613 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-07 20:55:43 +00:00
Petar Jovanovic
8cec6c4916 Reland "Correct dwarf unwind information in function epilogue for X86"
Reland r317100 with minor fix regarding ComputeCommonTailLength function in
BranchFolding.cpp. Skipping top CFI instructions block needs to executed on
several more return points in ComputeCommonTailLength().

Original r317100 message:

"Correct dwarf unwind information in function epilogue for X86"

This patch aims to provide correct dwarf unwind information in function
epilogue for X86.

It consists of two parts. The first part inserts CFI instructions that set
appropriate cfa offset and cfa register in emitEpilogue() in
X86FrameLowering. This part is X86 specific.

The second part is platform independent and ensures that:

- CFI instructions do not affect code generation
- Unwind information remains correct when a function is modified by
  different passes. This is done in a late pass by analyzing information
  about cfa offset and cfa register in BBs and inserting additional CFI
  directives where necessary.

Changed CFI instructions so that they:

- are duplicable
- are not counted as instructions when tail duplicating or tail merging
- can be compared as equal

Added CFIInstrInserter pass:

- analyzes each basic block to determine cfa offset and register valid at
  its entry and exit
- verifies that outgoing cfa offset and register of predecessor blocks match
  incoming values of their successors
- inserts additional CFI directives at basic block beginning to correct the
  rule for calculating CFA

Having CFI instructions in function epilogue can cause incorrect CFA
calculation rule for some basic blocks. This can happen if, due to basic
block reordering, or the existence of multiple epilogue blocks, some of the
blocks have wrong cfa offset and register values set by the epilogue block
above them.

CFIInstrInserter is currently run only on X86, but can be used by any target
that implements support for adding CFI instructions in epilogue.

Patch by Violeta Vukobrat.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317579 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-07 14:40:27 +00:00
Simon Pilgrim
1ed4428b8f [X86] Regenerate select tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317571 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-07 13:21:02 +00:00
Kristof Beyls
b79469ca2f [GlobalISel] Enable legalizing non-power-of-2 sized types.
This changes the interface of how targets describe how to legalize, see
the below description.

1. Interface for targets to describe how to legalize.

In GlobalISel, the API in the LegalizerInfo class is the main interface
for targets to specify which types are legal for which operations, and
what to do to turn illegal type/operation combinations into legal ones.

For each operation the type sizes that can be legalized without having
to change the size of the type are specified with a call to setAction.
This isn't different to how GlobalISel worked before. For example, for a
target that supports 32 and 64 bit adds natively:

  for (auto Ty : {s32, s64})
    setAction({G_ADD, 0, s32}, Legal);

or for a target that needs a library call for a 32 bit division:

  setAction({G_SDIV, s32}, Libcall);

The main conceptual change to the LegalizerInfo API, is in specifying
how to legalize the type sizes for which a change of size is needed. For
example, in the above example, how to specify how all types from i1 to
i8388607 (apart from s32 and s64 which are legal) need to be legalized
and expressed in terms of operations on the available legal sizes
(again, i32 and i64 in this case). Before, the implementation only
allowed specifying power-of-2-sized types (e.g. setAction({G_ADD, 0,
s128}, NarrowScalar).  A worse limitation was that if you'd wanted to
specify how to legalize all the sized types as allowed by the LLVM-IR
LangRef, i1 to i8388607, you'd have to call setAction 8388607-3 times
and probably would need a lot of memory to store all of these
specifications.

Instead, the legalization actions that need to change the size of the
type are specified now using a "SizeChangeStrategy".  For example:

   setLegalizeScalarToDifferentSizeStrategy(
       G_ADD, 0, widenToLargerAndNarrowToLargest);

This example indicates that for type sizes for which there is a larger
size that can be legalized towards, do it by Widening the size.
For example, G_ADD on s17 will be legalized by first doing WidenScalar
to make it s32, after which it's legal.
The "NarrowToLargest" indicates what to do if there is no larger size
that can be legalized towards. E.g. G_ADD on s92 will be legalized by
doing NarrowScalar to s64.

Another example, taken from the ARM backend is:
   for (unsigned Op : {G_SDIV, G_UDIV}) {
     setLegalizeScalarToDifferentSizeStrategy(Op, 0,
         widenToLargerTypesUnsupportedOtherwise);
     if (ST.hasDivideInARMMode())
       setAction({Op, s32}, Legal);
     else
       setAction({Op, s32}, Libcall);
   }

For this example, G_SDIV on s8, on a target without a divide
instruction, would be legalized by first doing action (WidenScalar,
s32), followed by (Libcall, s32).

The same principle is also followed for when the number of vector lanes
on vector data types need to be changed, e.g.:

   setAction({G_ADD, LLT::vector(8, 8)}, LegalizerInfo::Legal);
   setAction({G_ADD, LLT::vector(16, 8)}, LegalizerInfo::Legal);
   setAction({G_ADD, LLT::vector(4, 16)}, LegalizerInfo::Legal);
   setAction({G_ADD, LLT::vector(8, 16)}, LegalizerInfo::Legal);
   setAction({G_ADD, LLT::vector(2, 32)}, LegalizerInfo::Legal);
   setAction({G_ADD, LLT::vector(4, 32)}, LegalizerInfo::Legal);
   setLegalizeVectorElementToDifferentSizeStrategy(
       G_ADD, 0, widenToLargerTypesUnsupportedOtherwise);

As currently implemented here, vector types are legalized by first
making the vector element size legal, followed by then making the number
of lanes legal. The strategy to follow in the first step is set by a
call to setLegalizeVectorElementToDifferentSizeStrategy, see example
above.  The strategy followed in the second step
"moreToWiderTypesAndLessToWidest" (see code for its definition),
indicating that vectors are widened to more elements so they map to
natively supported vector widths, or when there isn't a legal wider
vector, split the vector to map it to the widest vector supported.

Therefore, for the above specification, some example legalizations are:
  * getAction({G_ADD, LLT::vector(3, 3)})
    returns {WidenScalar, LLT::vector(3, 8)}
  * getAction({G_ADD, LLT::vector(3, 8)})
    then returns {MoreElements, LLT::vector(8, 8)}
  * getAction({G_ADD, LLT::vector(20, 8)})
    returns {FewerElements, LLT::vector(16, 8)}


2. Key implementation aspects.

How to legalize a specific (operation, type index, size) tuple is
represented by mapping intervals of integers representing a range of
size types to an action to take, e.g.:

       setScalarAction({G_ADD, LLT:scalar(1)},
                       {{1, WidenScalar},  // bit sizes [ 1, 31[
                        {32, Legal},       // bit sizes [32, 33[
                        {33, WidenScalar}, // bit sizes [33, 64[
                        {64, Legal},       // bit sizes [64, 65[
                        {65, NarrowScalar} // bit sizes [65, +inf[
                       });

Please note that most of the code to do the actual lowering of
non-power-of-2 sized types is currently missing, this is just trying to
make it possible for targets to specify what is legal, and how non-legal
types should be legalized.  Probably quite a bit of further work is
needed in the actual legalizing and the other passes in GlobalISel to
support non-power-of-2 sized types.

I hope the documentation in LegalizerInfo.h and the examples provided in the
various {Target}LegalizerInfo.cpp and LegalizerInfoTest.cpp explains well
enough how this is meant to be used.

This drops the need for LLT::{half,double}...Size().


Differential Revision: https://reviews.llvm.org/D30529



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317560 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-07 10:34:34 +00:00
Bjorn Steinbrink
c1c411e7a8 [X86] Don't clobber reserved registers with stack adjustments
Summary:
Calls using invoke in funclet based functions are assumed to clobber
all registers, which causes the stack adjustment using pops to consider
all registers not defined by the call to be undefined, which can
unfortunately include the base pointer, if one is needed.

To prevent this (and possibly other hazards), skip reserved registers
when looking for candidate registers.

This fixes issue #45034 in the Rust compiler.

Reviewers: mkuper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39636

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317551 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-07 08:50:21 +00:00
Craig Topper
c305f3d45a [X86] Add patterns to fold a 64-bit load into the EVEX vcvtph2ps instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317548 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-07 07:13:07 +00:00
Craig Topper
7f4581842b [X86] Add patterns for folding a v16i8 with the VEX vcvtph2ps intrinsics.
Disable the peephole pass to prove that the pattern is working.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317547 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-07 07:13:06 +00:00
Craig Topper
2765e2df21 [X86] Add a test for a 128-bit vector load feeding a cvtph2ps intrinsic.
The instruction only loads 64-bits, but we should be able to fold a wider load and let it be narrowed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317546 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-07 07:13:05 +00:00
Craig Topper
bfc0134619 [X86] Remove alignment from a load in the f16c intrinsic test. The alignment shouldn't be required for load folding.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317545 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-07 07:13:04 +00:00