Commit Graph

19595 Commits

Author SHA1 Message Date
Craig Topper
2129bbc974 [AVX-512] Fix a couple test cases to not pass an undef mask to gather intrinsic. This could break if any future optimizations taken advantage of the undef.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292585 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-20 07:12:30 +00:00
Greg Parker
41b2c9b1f5 [test] Remove a unwanted match for XFAIL:.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292567 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-20 02:01:04 +00:00
Ahmed Bougacha
51348febc6 [AArch64][GlobalISel] Widen scalar int->fp conversions.
It's incorrect to ignore the higher bits of the integer source.
Teach the legalizer how to widen it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292563 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-20 01:37:24 +00:00
Stanislav Mekhanoshin
f304f044ed [AMDGPU] Prevent spills before exec mask is restored
Inline spiller can decide to move a spill as early as possible in the basic block.
It will skip phis and label, but we also need to make sure it skips instructions
in the basic block prologue which restore exec mask.

Added isPositionLike callback in TargetInstrInfo to detect instructions which
shall be skipped in addition to common phis, labels etc.

Differential Revision: https://reviews.llvm.org/D27997

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292554 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-20 00:44:31 +00:00
Ahmed Bougacha
c4512366a3 [AArch64][GlobalISel] Split FP conversion legalizer tests. NFC.
Big functions with large vreg # are quite unwieldy to update.

Change it to have one function per test (it does increase boilerplate,
but makes the core hopefully more readable and maintanable).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292552 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-20 00:30:09 +00:00
Ahmed Bougacha
6d3baec9d6 [AArch64][GlobalISel] Split legalizer combine tests. NFC.
Big functions with large vreg # are quite unwieldy to update.  This test
also relied on legal s8 operations which we're considering removing.

Change it to have one function per test (it does increase boilerplate,
but makes the core hopefully more readable and maintanable), and use
100% legal operations throughout.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292551 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-20 00:30:06 +00:00
Ahmed Bougacha
cbd2ff78c0 [MIRParser] Allow generic register specification on operand.
This completes r292321 by adding support for generic registers, e.g.:

  %2:_(s32) = G_ADD %0, %1

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292550 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-20 00:29:59 +00:00
Tim Northover
dfbb55fc0c AArch64: fall back to DAG ISel for inline assembly.
We can't currently handle "calls" to inlineasm strings so it's better to let
the DAG handle it than generate rubbish.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292540 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-19 23:59:35 +00:00
Simon Pilgrim
64f91a5757 [SelectionDAG] Improve knownbits handling of UMIN/UMAX (PR31293)
This patch improves the knownbits logic for unsigned integer min/max opcodes.

For UMIN we know that the result will have the maximum of the inputs' known leading zero bits in the result, similarly for UMAX the maximum of the inputs' leading one bits.

This is particularly useful for simplifying clamping patterns,. e.g. as SSE doesn't have a uitofp instruction we want to use sitofp instead where possible and for that we need to confirm that the top bit is not set.

Differential Revision: https://reviews.llvm.org/D28853

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292528 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-19 22:41:22 +00:00
Serge Rogatch
9a63b871bd [XRay][Arm] Repair XRay table emission on Arm32 and add tests to identify such problem earlier
Summary:
Emission of XRay table was occasionally disabled for Arm32, but this bug was not then detected because earlier (also by mistake) testing of XRay was occasionally disabled on 32-bit Arm targets. This patch should fix that problem and detect such problems in the future.
This patch is one of a series, see also
- https://reviews.llvm.org/D28623

Reviewers: rengolin, dberris

Reviewed By: dberris

Subscribers: llvm-commits, aemerson, rengolin, dberris, iid_iunknown

Differential Revision: https://reviews.llvm.org/D28624

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292516 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-19 20:24:23 +00:00
Simon Pilgrim
2d628eed7f [X86][SSE] Attempt to pre-truncate arithmetic operations that have already been extended
As discussed on D28219 - it is profitable to combine trunc(binop (s/zext(x), s/zext(y)) to binop(trunc(s/zext(x)), trunc(s/zext(y))) assuming the trunc(ext()) will simplify further

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292493 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-19 16:25:02 +00:00
Simon Pilgrim
b3f7c2ec02 [X86][SSE] Added tests for pre-truncating arithmetic operations that have already been extended
As discussed on D28219 - it is profitable to combine trunc(binop (s/zext(x), s/zext(y)) to binop(trunc(s/zext(x)), trunc(s/zext(y))) assuming the trunc(ext()) will simplify further

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292487 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-19 15:03:00 +00:00
Mikael Holmen
7f6f824722 [DAG] Don't increase SDNodeOrder for dbg.value/declare.
Summary:
The SDNodeOrder is saved in the IROrder field in the SDNode, and this
field may affects scheduling. Thus, letting dbg.value/declare increase
the order numbers may in turn affect scheduling.

Because of this change we also need to update the code deciding when
dbg values should be output, in ScheduleDAGSDNodes.cpp/ProcessSDDbgValues.

Dbg values now have the same order as the SDNode they are connected to,
not the following orders.

Test cases provided by Florian Hahn.

Reviewers: bogner, aprantl, sunfish, atrick

Reviewed By: atrick

Subscribers: fhahn, probinson, andreadb, llvm-commits, MatzeB

Differential Revision: https://reviews.llvm.org/D25318

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292485 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-19 13:55:55 +00:00
Kristof Beyls
56c4b1ef06 [GlobalISel] Pointers are legal operands for G_SELECT on AArch64
Differential Revision: https://reviews.llvm.org/D28805



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292481 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-19 13:32:14 +00:00
Elena Demikhovsky
f7484a051a Recommiting unsigned saturation with a bugfix.
A test case that crached is added to avx512-trunc.ll.
(PR31589)



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292479 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-19 12:08:21 +00:00
Justin Bogner
e3ad0db135 GlobalISel: Implement widening for shifts
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292476 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-19 07:51:17 +00:00
Craig Topper
9720a787a1 [AVX-512] Add test cases that show where we are using two subvector inserts to broadcast a 128-bit subvector into a 512-bit vector. We'd be better off using something like SHUFF32X4.
If the subvector comes from a load, we convert to SUBV_BROADCAST and use a broadcast instruction. But if there is no load we keep the inserts. I think we should create the SUBV_BROADCAST even without the load and let isel use the fallback patterns that are used if the load can't be folded. This will use the SHUFF32X4 or similar instruction for the 128-bit into 512-bit case and a single insert for 128 into 256 or 256 into 512.

This should be fixed so subvector broadcast intrinsics can be replaced with native IR since some of those currently lower directly to SHUFF32X4.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292475 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-19 07:37:45 +00:00
Craig Topper
f1fe387ada [AVX-512] Support ADD/SUB/MUL of mask vectors
Summary:
Currently we expand and scalarize these operations, but I think we should be able to implement ADD/SUB with KXOR and MUL with KAND.

We already do this for scalar i1 operations so I just extended it to vectors of i1.

Reviewers: zvi, delena

Reviewed By: delena

Subscribers: guyblank, llvm-commits

Differential Revision: https://reviews.llvm.org/D28888

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292474 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-19 07:12:35 +00:00
Matt Arsenault
261f60f486 AMDGPU: Disable some fneg combines unless nsz
For -(x + y) -> (-x) + (-y), if x == -y, this would
change the result from -0.0 to 0.0. Since the fma/fmad
combine is an extension of this problem it also
applies there.

fmul should be fine, and I don't think any of the unary
operators or conversions should be a problem either.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292473 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-19 06:35:27 +00:00
Matt Arsenault
cfe56d7c95 AMDGPU: Remove modifiers from v_div_scale_*
They seem to produce nonsense results when used.

This should be applied to the release branch.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292472 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-19 06:04:12 +00:00
Craig Topper
2a42c3b9a1 [AVX-512] Use VSHUF instructions instead of two inserts as fallback for subvector broadcasts that can't fold the load.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292466 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-19 02:34:29 +00:00
Craig Topper
3b137a67b9 [AVX-512] Add additional test cases for broadcast intrinsics that demonstates that we don't fold the loads to use a broadcast instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292465 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-19 02:34:25 +00:00
Justin Bogner
3552215d7c GlobalISel: Implement narrowing for G_LOAD
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292461 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-19 01:05:48 +00:00
Matthias Braun
cc10c1ec95 Use an actual valid register in test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292459 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-19 01:04:08 +00:00
Artem Belevich
df5ffd6153 [NVPTX] Fix lowering of fp16 ISD::FNEG.
There's no neg.f16 instruction, so negation has to
be done via subtraction from zero.

Differential Revision: https://reviews.llvm.org/D28876

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292452 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-19 00:14:45 +00:00
Krzysztof Parzyszek
a8b917bef3 Treat segment [B, E) as not overlapping block with boundaries [A, B)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292446 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-18 23:12:19 +00:00
Krzysztof Parzyszek
621d1f4149 [Hexagon] Remove dead defs from the live set when expanding wstores
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292445 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-18 23:11:40 +00:00
Michael Kuperstein
57f0668781 Revert r291670 because it introduces a crash.
r291670 doesn't crash on the original testcase from PR31589,
but it crashes on a slightly more complex one.

PR31589 has the new reproducer.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292444 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-18 23:05:58 +00:00
Evandro Menezes
f5697dce74 [AArch64] Generate literals by the little end
ARM seems to prefer that long literals be formed from their little end in
order to promote the fusion of the instrs pairs MOV/MOVK and MOVK/MOVK on
Cortex A57 and others (v.  "Cortex A57 Software Optimisation Guide", section
4.14).

Differential revision: https://reviews.llvm.org/D28697

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292422 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-18 18:57:08 +00:00
Stanislav Mekhanoshin
d78f00a4d1 [AMDGPU] Do not allow register coalescer to create big superregs
Limit register coalescer by not allowing it to artificially increase
size of registers beyond dword. Such super-registers are in fact
register sequences and not distinct HW registers.

With more super-regs we would need to allocate adjacent registers
and constraint regalloc more than needed. Moreover, our super
registers are overlapping. For instance we have VGPR0_VGPR1_VGPR2,
VGPR1_VGPR2_VGPR3, VGPR2_VGPR3_VGPR4 etc, which complicates registers
allocation even more, resulting in excessive spilling.

Differential Revision: https://reviews.llvm.org/D28782

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292413 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-18 17:30:05 +00:00
Justin Bogner
4a6dec6408 GlobalISel: Implement narrowing for G_STORE
Legalize stores of types that are too wide by breaking them up into
sequences of smaller stores.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292412 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-18 17:29:54 +00:00
Teresa Johnson
6f03dce4dd Don't create a comdat group for a dropped def with initializer
Non-prevailing weak/linkonce odr symbols will be dropped by ThinLTO to
available_externally when possible. If they had an initializer in the
global_ctors list, a comdat group was being created. This code
already had logic to skip available_externally defs, but now the
EliminateAvailableExternally pass will drop these symbols to
declarations earlier. Change the check to skip all declarations for
linker (which includes available_externally along with declarations).

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28737

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292408 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-18 16:58:43 +00:00
Simon Pilgrim
e3d484c1a0 Fixed parser error on windows shell evaluation of RUN script line
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292363 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-18 11:40:28 +00:00
Simon Pilgrim
00b2f8ccbf [X86][SSE] Simplify umax knownbits test
combineSRA doesn't detect sign bits splats that it does itself so just use -1 as the demanded input so that its already splatted

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292361 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-18 11:20:31 +00:00
Michael Zuckerman
476a6c5119 [X86] Improve mul combine for negative multiplayer (2^c - 1)
This patch improves the mul instruction combine function (combineMul) 
by adding new layer of logic. 
In this patch, we are adding the ability to fold (mul x, -((1 << c) -1)) 
or (mul x, -((1 << c) +1)) into (neg(X << c) -x) or (neg((x << c) + x) respective.

Differential Revision: https://reviews.llvm.org/D28232


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292358 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-18 09:31:13 +00:00
Renato Golin
6994e2312e Revert "[XRay][Arm] Repair XRay table emission on Arm32 and add tests to identify such problem earlier"
This reverts commit r292210, as it broke the Thumb buldbot with:

clang-5.0: error: the clang compiler does not support '-fxray-instrument
on thumbv7-unknown-linux-gnueabihf'.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292357 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-18 09:08:43 +00:00
Jonas Paulsson
ce4eec587e [SystemZ] Proper handling of undef flag while expanding pseudo.
During post-RA pseudo expansion, an 'undef' flag of the source operand should
be propagated by emitGRX32Move().

Review: Ulrich Weigand

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292353 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-18 08:32:54 +00:00
Matt Arsenault
4cddac93ec DAG: Consider nnan in isKnownNeverNaN
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292328 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-18 02:10:08 +00:00
Wei Mi
f2f70f10c2 Revert rL292292 since it causes a SEGV on sanitizer-x86_64-linux-fuzzer build bot.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292327 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-18 01:53:53 +00:00
Dan Gohman
97c35d16c6 [WebAssembly] Update grow_memory's return type.
The grow_memory instruction now returns the previous memory size. Add the
return type to the LLVM intrinsic.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292322 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-18 01:02:45 +00:00
Matthias Braun
c1fa0731c3 MIRParser: Allow regclass specification on operand
You can now define the register class of a virtual register on the
operand itself avoiding the need to use a "registers:" block.

Example: "%0:gr64 = COPY %rax"

Differential Revision: https://reviews.llvm.org/D22398

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292321 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-18 00:59:19 +00:00
Justin Lebar
dc30ded6fb [NVPTX] Support global variables of integer type larger than i64.
Reviewers: tra, majnemer

Subscribers: llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D28825

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292316 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-18 00:29:53 +00:00
Justin Lebar
db8ecafe57 [NVPTX] Implement min/max in tablegen, rather than with custom DAGComine logic.
Summary:
This change also lets us use max.{s,u}16.  There's a vague warning in a
test about this maybe being less efficient, but I could not come up with
a case where the resulting SASS (sm_35 or sm_60) was different with or
without max.{s,u}16.  It's true that nvcc seems to emit only
max.{s,u}32, but even ptxas 7.0 seems to have no problem generating
efficient SASS from max.{s,u}16 (the casts up to i32 and back down to
i16 seem to be implicit and nops, happening via register aliasing).

In the absence of evidence, better to have fewer special cases, emit
more straightforward code, etc.  In particular, if a new GPU has 16-bit
min/max instructions, we want to be able to use them.

Reviewers: tra

Subscribers: jholewinski, llvm-commits

Differential Revision: https://reviews.llvm.org/D28732

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292304 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-18 00:09:01 +00:00
Justin Lebar
f78819e87f [NVPTX] Lower integer absolute value idiom to abs instruction.
Summary: Previously we lowered it literally, to shifts and xors.

Reviewers: tra

Subscribers: jholewinski, llvm-commits

Differential Revision: https://reviews.llvm.org/D28722

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292303 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-18 00:08:44 +00:00
Justin Lebar
38c1089801 [NVPTX] Improve lowering of llvm.ctpop.
Summary:
Avoid an unnecessary conversion operation when using the result of
ctpop.i32 or ctpop.i16 as an i32, as in both cases the ptx instruction
we run returns an i32.

(Previously if we used the value as an i32, we'd do an unnecessary
zext+trunc.)

Reviewers: tra

Subscribers: jholewinski, llvm-commits

Differential Revision: https://reviews.llvm.org/D28721

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292302 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-18 00:08:27 +00:00
Justin Lebar
7470b2cf62 [NVPTX] Add lowering for llvm.bitreverse.
Reviewers: tra

Subscribers: llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D28720

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292301 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-18 00:08:10 +00:00
Justin Lebar
9ff5ce49a6 [NVPTX] Fix function names in ctlz.ll test. Test-only change.
Looks like a copy/paste mistake, all the functions in ctlz.ll were named
"ctpop".

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292300 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-18 00:07:52 +00:00
Justin Lebar
fd46a6b819 [NVPTX] Improve lowering of llvm.ctlz.
Summary:
* Disable "ctlz speculation", which inserts a branch on every ctlz(x) which
  has defined behavior on x == 0 to check whether x is, in fact zero.

* Add DAG patterns that avoid re-truncating or re-expanding the result
  of the 16- and 64-bit ctz instructions.

Reviewers: tra

Subscribers: llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D28719

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292299 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-18 00:07:35 +00:00
Wei Mi
caab16e99a [RegisterCoalescing] Remove partial redundent copy.
The patch is to solve the performance problem described in PR27827.
Register coalescing sometimes cannot remove a copy because of interference.
But if we can find a reverse copy in one of the predecessor block of the copy,
the copy is partially redundent and we may remove the copy partially by moving
it to the predecessor block without the reverse copy.

Differential Revision: https://reviews.llvm.org/D28585


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292292 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-17 23:39:07 +00:00
Tim Northover
f4d04d46f8 GlobalISel: fix comparison order for G_FCMP
As with G_ICMP we'd written the CSET instructions backwards.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292285 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-17 23:04:01 +00:00