47445 Commits

Author SHA1 Message Date
Sam Clegg
a01ce37a7f [WebAssembly] Initial Disassembler.
This implements a new table-gen emitter to create tables for
a wasm disassembler, and a dissassembler to use them.

Comes with 2 tests, that tests a few instructions manually. Is also able to
disassemble large .wasm files with objdump reasonably.

Not working so well, to be addressed in followups:
- objdump appears to be passing an incorrect starting point.
- since the disassembler works an instruction at a time, and it is
  disassembling stack instruction, it has no idea of pseudo register assignments.
  These registers are required for the instruction printing code that follows.
  For now, all such registers appear in the output as $0.

Patch by Wouter van Oortmerssen

Differential Revision: https://reviews.llvm.org/D45848

llvm-svn: 332052
2018-05-10 22:16:44 +00:00
Craig Topper
3e055ced70 [X86] Add new patterns for masked scalar load/store to match clang's codegen from r331958.
Clang's codegen now uses 128-bit masked load/store intrinsics in IR. The backend will widen to 512-bits on AVX512F targets.

So this patch adds patterns to detect codegen's widening and patterns for AVX512VL that don't get widened.

We may be able to drop some of the old patterns, but I leave that for a future patch.

llvm-svn: 332049
2018-05-10 21:49:16 +00:00
Tom Stellard
50cdefd704 AMDGPU/GlobalISel: Implement select() for G_BITCAST s32 <--> <2 x s16>
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45881

llvm-svn: 332042
2018-05-10 21:20:10 +00:00
Tom Stellard
f3e91252b5 AMDGPU/GlobalISel: Enable TableGen'd instruction selector
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, mgorny, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45994

llvm-svn: 332039
2018-05-10 20:53:06 +00:00
Gabor Buella
96af1f4837 [X86] Initialize HasPTWRITE member of X86Subtarget
This was missing from r331961.
Caught by sanitizer bots.

llvm-svn: 332024
2018-05-10 19:15:10 +00:00
Simon Pilgrim
4d5b5dfc53 [X86] Convert/Merge more instregex patterns to reduce InstrRW compile time.
Use instrs lists or merge multiple instregex patterns.

llvm-svn: 332022
2018-05-10 19:08:06 +00:00
Haicheng Wu
cc71215e80 [CGP] Split large data structres to sink more GEPs
Accessing the members of a large data structures needs a lot of GEPs which
usually have large offsets due to the size of the underlying data structure. If
the offsets are too large to fit into the r+i addressing mode, these GEPs cannot
be sunk to their users' blocks and many extra registers are needed then to carry
the values of these GEPs.

This patch tries to split a large data struct starting from %base like the
following.

Before:
BB0:
  %base     =

BB1:
  %gep0     = gep %base, off0
  %gep1     = gep %base, off1
  %gep2     = gep %base, off2

BB2:
  %load1    = load %gep0
  %load2    = load %gep1
  %load3    = load %gep2

After:
BB0:
  %base     =
  %new_base = gep %base, off0

BB1:
  %new_gep0 = %new_base
  %new_gep1 = gep %new_base, off1 - off0
  %new_gep2 = gep %new_base, off2 - off0

BB2:
  %load1    = load i32, i32* %new_gep0
  %load2    = load i32, i32* %new_gep1
  %load3    = load i32, i32* %new_gep2

In the above example, the struct is split into two parts. The first part still
starts from %base and the second part starts from %new_base. After the
splitting, %new_gep1 and %new_gep2 have smaller offsets and then can be sunk to
BB2 and folded into their users.

The algorithm to split data structure is simple and very similar to the work of
merging SExts. First, it collects GEPs that have large offsets when iterating
the blocks. Second, it splits the underlying data structures and updates the
collected GEPs to use smaller offsets.

Differential Revision: https://reviews.llvm.org/D42759

llvm-svn: 332015
2018-05-10 18:27:36 +00:00
Simon Pilgrim
b88252fa40 [X86][Znver1] Remove unnecessary SchedWritePMULLD InstRW overrides.
llvm-svn: 332006
2018-05-10 17:42:26 +00:00
Simon Pilgrim
f555f6990b [X86][SNB] Fix typo in PEXTRDmr instregex, was missing VPEXTRDmr.
llvm-svn: 332002
2018-05-10 17:30:49 +00:00
Simon Pilgrim
ea0328e7cf [X86] Split WriteVecALU/WriteVecLogic/WriteShuffle/WriteVarShuffle/WritePSADBW/WritePHAdd scheduler classes
Split off XMM classes from the default (MMX) classes.

llvm-svn: 331999
2018-05-10 17:06:09 +00:00
Simon Atanasyan
301b65d0f6 [mips] Accept 32-bit offsets for ld/sd/lld commands
This is a follow up to the rL330983. The patch teaches ld, sd, and lld
commands accept 32-bit memory offsets by replacing `mem_simm16` operand
to `mem_simmptr`. In fact, these commands should accept 64-bit offsets,
but so large offsets require another command expanding and will be
supported by a separate patch.

Differential Revision: https://reviews.llvm.org/D46629

llvm-svn: 331997
2018-05-10 16:01:36 +00:00
Simon Atanasyan
18e71301e7 [mips] Accept 32-bit offsets for lh and lhu commands
This is a follow up to the rL330983. The patch teaches lh and lhu
commands accepts 32-bit memory offsets by replacing `mem_simm16` operand
to `mem_simmptr`.

Differential Revision: https://reviews.llvm.org/D46513

llvm-svn: 331996
2018-05-10 16:01:18 +00:00
Sanjay Patel
e77f8316e9 [x86] fix fmaxnum/fminnum with nnan
With nnan, there's no need for the masked merge / blend
sequence (that probably costs much more than the min/max
instruction).

Somewhere between clang 5.0 and 6.0, we started producing
these intrinsics for fmax()/fmin() in C source instead of
libcalls or fcmp/select. The backend wasn't prepared for
that, so we regressed perf in those cases.

Note: it's possible that other targets have similar problems
as seen here. 

Noticed while investigating PR37403 and related bugs:
https://bugs.llvm.org/show_bug.cgi?id=37403

The IR FMF propagation cases still don't work. There's
a proposal that might fix those cases in D46563.

llvm-svn: 331992
2018-05-10 15:40:49 +00:00
Simon Dardis
580ce729dc [mips] Correct the predicates of cvt.fmt.fmt instructions
Reviewers: atanasyan, smaksimovic, abeserminji

Differential Revision: https://reviews.llvm.org/D46390

llvm-svn: 331969
2018-05-10 10:42:30 +00:00
Gabor Buella
c9bf50d007 [X86] ptwrite intrinsic
Reviewers: craig.topper, RKSimon

Reviewed By: craig.topper, RKSimon

Differential Revision: https://reviews.llvm.org/D46539

llvm-svn: 331961
2018-05-10 07:26:05 +00:00
Artem Belevich
e3372a717a [NVPTX] Added a feature to use short pointers for const/local/shared AS.
Const/local/shared address spaces are all < 4GB and we can always use
32-bit pointers to access them. This has substantial performance impact
on kernels that uses shared memory for intermediary results.

The feature is disabled by default.

Differential Revision: https://reviews.llvm.org/D46147

llvm-svn: 331941
2018-05-09 23:46:19 +00:00
Amaury Sechet
5b50c746d8 [ARM] Add support for SETCCCARRY instead of SETCCE
Summary: As per title. SETCCE is deprecated and will eventually be removed.

Reviewers: rogfer01, efriedma, rengolin, javed.absar

Subscribers: kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D46512

llvm-svn: 331929
2018-05-09 22:15:51 +00:00
Farhana Aleen
de1cb88328 [AMDGPU] Support horizontal vectorization of min/max.
Author: FarhanaAleen

Reviewed By: rampitec

Subscribers: AMDGPU

Differential Revision: https://reviews.llvm.org/D46604

llvm-svn: 331920
2018-05-09 21:18:34 +00:00
Matt Arsenault
71f2a2852f AMDGPU: Ignore any_extend in mul24 combine
If a multiply is truncated, SimplifyDemandedBits
sometimes turns a zero_extend of the inputs into an
any_extend, which makes the known bits computation unhelpful.
Ignore these and compute known bits for the underlying value,
since we insert the correct extend type after.

llvm-svn: 331919
2018-05-09 21:11:35 +00:00
Krzysztof Parzyszek
fa4a514b8e [Hexagon] Add patterns for vector shift-and-accumulate
llvm-svn: 331918
2018-05-09 21:10:41 +00:00
Matt Arsenault
c923946f57 AMDGPU: Handle partial shift reduction for variable shifts
If the variable shift amount has known bits, we can still reduce
the shift.

llvm-svn: 331917
2018-05-09 20:52:54 +00:00
Matt Arsenault
52c2acf8b4 AMDGPU: Partially shrink 64-bit shifts if reduced to 16-bit
This is an extension of an existing combine to reduce wider
shls if the result fits in the final result type. This
introduces the same combine, but reduces the shift to a middle
sized type to avoid the slow 64-bit shift.

llvm-svn: 331916
2018-05-09 20:52:43 +00:00
Simon Pilgrim
22828c06d8 [X86] Fix Broadwell's Shuffle256 schedule classes load latency values.
Allows us to remove some unnecessary InstRW overrides.

llvm-svn: 331913
2018-05-09 19:27:48 +00:00
Simon Pilgrim
3cbabb6e3c [X86] Merge instregex patterns to reduce InstrRW compile time.
llvm-svn: 331911
2018-05-09 19:04:15 +00:00
Matt Arsenault
b1686d7a9c AMDGPU: Add combine for trunc of bitcast from build_vector
If the truncate is only accessing the first element of the vector,
we can use the original source value.

This helps with some combine ordering issues after operations are
lowered to integer operations between bitcasts of build_vector.
In particular it stops unnecessarily materializing the unused
top half of a vector in some cases.

llvm-svn: 331909
2018-05-09 18:37:39 +00:00
Krzysztof Parzyszek
966370604d [Hexagon] Check the end of the correct container (fix typo)
llvm-svn: 331907
2018-05-09 18:33:59 +00:00
Matt Arsenault
065137cd64 AMDGPU: Stop special casing constant indexes of extract_vector_elt
The same result folds out of the dynamic expansion logic if the
index is constant.

llvm-svn: 331906
2018-05-09 18:29:26 +00:00
Krzysztof Parzyszek
8c84a11ef0 [Hexagon] Fix sanitizer error about using -1u in variable of enum type
llvm-svn: 331887
2018-05-09 15:44:40 +00:00
Krzysztof Parzyszek
eb411966ec [Hexagon] Simplify MCCodeEmitter, move data to tables
llvm-svn: 331883
2018-05-09 15:02:04 +00:00
Adhemerval Zanella
2e1b0642c7 [AArch64] Improve cost of vector division by constant
With custom lowering for vector MULLH{S,U}, it is now profitable to
vectorize a divide by constant loop for the custom types (v16i8, v8i16,
and v4i32).  The cost if based on TargetLowering::Build{S,U}DIV which
uses a multiply by constant plus adjustment to express a divide by
constant.

Both {u,s}mull{2} are expressed as Instruction::Mul and shifts by
Instruction::AShr.

llvm-svn: 331873
2018-05-09 12:48:22 +00:00
Simon Pilgrim
80bdcf7e1e [X86] Cleanup WriteFStore/WriteVecStore schedules
MOVNTPD/MOVNTPS should be WriteFStore

Standardized BDW/HSW/SKL/SKX WriteFStore/WriteVecStore - fixes some missed instregex patterns. (V)MASKMOVDQU was already using the default, its costs gets increased but is still nowhere near the real cost of that nasty instruction....

llvm-svn: 331864
2018-05-09 11:01:16 +00:00
Simon Dardis
da9dab65c5 [mips] Move conditional moves out of isCodeGenOnly
Reviewers: atanasyan, smaksimovic, abeserminji

Differential Revision: https://reviews.llvm.org/D46389

llvm-svn: 331863
2018-05-09 10:33:21 +00:00
Craig Topper
a93485c9d1 [X86] Combine (vXi1 (bitcast (-1)))) and (vXi1 (bitcast (0))) to all ones or all zeros vXi1 vector.
llvm-svn: 331847
2018-05-09 06:07:20 +00:00
Daniel Sanders
7932a6b01b Revert r331816 and r331820 - [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64
Reverting this to see if the clang-cmake-aarch64-global-isel and
clang-cmake-aarch64-quick bots are failing because of this commit.
We know it wasn't r331819.

llvm-svn: 331846
2018-05-09 05:00:17 +00:00
Shiva Chen
208c23a5a2 [DebugInfo] Examine all uses of isDebugValue() for debug instructions.
Because we create a new kind of debug instruction, DBG_LABEL, we need to
check all passes which use isDebugValue() to check MachineInstr is debug
instruction or not. When expelling debug instructions, we should expel
both DBG_VALUE and DBG_LABEL. So, I create a new function,
isDebugInstr(), in MachineInstr to check whether the MachineInstr is
debug instruction or not.

This patch has no new test case. I have run regression test and there is
no difference in regression test.

Differential Revision: https://reviews.llvm.org/D45342

Patch by Hsiangkai Wang.

llvm-svn: 331844
2018-05-09 02:42:00 +00:00
Daniel Sanders
7e100ec90e [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64
Summary: Depends on D45541

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar, aemerson

Reviewed By: aemerson

Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45543

llvm-svn: 331816
2018-05-08 22:26:39 +00:00
Jessica Paquette
8f92abcca0 Revert "[X86][CET] Shadow stack fix for setjmp/longjmp"
This reverts commit 30962eca38ef02666ebcdded72a94f2cd0292d68.

This commit has been causing test asan failures on a build bot.

http://green.lab.llvm.org/green/job/clang-stage1-configure-RA/45108/

Original commit: https://reviews.llvm.org/D46181

llvm-svn: 331813
2018-05-08 22:00:57 +00:00
Tim Renouf
3e6a2d3aa6 [AMDGPU] Provide machine -> name mapping
Summary:
AMDGPU stores a numerical code for the particular GPU variant in EFlags
in the ELF file. This commit provides a mapping from that number into
the machine name for use by objdump-type tools.

Change-Id: Id37fc0bebad443bd89c0080985ce298c4e7e9319

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46587

llvm-svn: 331798
2018-05-08 18:53:04 +00:00
Lei Huang
e528c0723c [Power9]Legalize and emit code for truncate and convert QP to HW and Byte
Legalize and emit code for truncate and convert float128 to (un)signed short
and (un)signed char.

Differential Revision: https://reviews.llvm.org/D46194

llvm-svn: 331797
2018-05-08 18:52:06 +00:00
Matt Arsenault
93d97de1e9 AMDGPU: Fix broken dynamic vector indexing for packed types
The intention of this was to multiply by 16, not shift by 16.

llvm-svn: 331793
2018-05-08 18:43:25 +00:00
Lei Huang
03f6278cac [Power9]Legalize and emit code for truncate and convert Quad-Precision to Word
Legalize and emit code for:

  * xscvqpswz : VSX Scalar truncate & Convert Quad-Precision to Signed Word
  * xscvqpuwz : VSX Scalar truncate & Convert Quad-Precision to Unsigned Word

Differential Revision: https://reviews.llvm.org/D45635

llvm-svn: 331790
2018-05-08 18:34:00 +00:00
Changpeng Fang
8f891d10dd AMDGPU: Use eraseFromParent to delete am instruction when it is no longer needed.
Reviewer: Nicolai

Differential Revision:
  https://reviews.llvm.org/D46438

llvm-svn: 331788
2018-05-08 18:32:35 +00:00
Lei Huang
33184ea07a [Power9]Legalize and emit code for truncate and convert QP to DW
Legalize and emit code for:

  * xscvqpsdz : VSX Scalar truncate & Convert Quad-Precision to Signed Dword
  * xscvqpudz : VSX Scalar truncate & Convert Quad-Precision to Unsigned Dword

Differential Revision: https://reviews.llvm.org/D45553

llvm-svn: 331787
2018-05-08 18:23:31 +00:00
Lei Huang
429d2eb1e1 [PowerPC] Unify handling for conversion of FP_TO_INT feeding a store
Existing DAG combine only handles conversions for FP_TO_SINT:
"{f32, f64} x { i32, i16 }"

This patch simplifies the code to handle:
"{ FP_TO_SINT, FP_TO_UINT } x { f64, f32 } x { i64, i32, i16, i8 }"

Differential Revision: https://reviews.llvm.org/D46102

llvm-svn: 331778
2018-05-08 17:36:40 +00:00
Stanislav Mekhanoshin
8390d2fe82 [AMDGPU] Added checks for dpp_ctrl value
- Report error for invalid dpp_ctrl values.
- Changed the way it is reported, now the error will be emitted into
  asm and will work with release build as well.
- Added dpp_ctrl value verifier for codegen.
- Added symbolic constants for dpp_ctrl.

Differential Revision: https://reviews.llvm.org/D46565

llvm-svn: 331775
2018-05-08 16:53:02 +00:00
Simon Pilgrim
f7067fbe1d [X86] Tag PCONFIG instruction with WriteSystem scheduler class
llvm-svn: 331773
2018-05-08 15:55:14 +00:00
Stefan Maksimovic
3b34aa259e [mips][msa] Pattern match the splat.d instruction
Introduced a new pattern for matching splat.d explicitly.

Both splat.d and splati.d can now be generated from the @llvm.mips.splat.d
intrinsic depending on whether an immediate value has been passed.

Differential Revision: https://reviews.llvm.org/D45683

llvm-svn: 331771
2018-05-08 15:12:29 +00:00
Simon Pilgrim
c2d26a18d6 [X86] Split off WriteIMul64 from WriteIMul schedule class (PR36931)
This fixes a couple of BtVer2 missing instructions that weren't been handled in the override.

NOTE: There are still a lot of overrides that still need cleaning up!
llvm-svn: 331770
2018-05-08 14:55:16 +00:00
Simon Pilgrim
d4d2fb4f3d [X86] Split WriteIDiv into div/idiv 8/16/32/64 implementations (PR36930)
I've created the necessary classes but there are still a lot of overrides that need cleaning up.

NOTE: The Znver1 model was missing some div/idiv variants in the instregex patterns and wasn't setting the resource cycles at all in the overrides.
llvm-svn: 331767
2018-05-08 13:51:45 +00:00
Simon Pilgrim
949fff4d97 [X86] Add vector masked load/store scheduler classes (PR32857)
Split off from existing vector load/store classes to remove InstRW overrides.

llvm-svn: 331760
2018-05-08 12:17:55 +00:00