Commit Graph

298 Commits

Author SHA1 Message Date
Sanjay Patel
2f6b67546d [DAG] fold FP binops with undef operands to NaN
This is the FP sibling of D43141 with the corresponding IR change in rL327212.

We can't propagate undef here because if a variable operand is a NaN, these 
binops must propagate NaN. Neither global nor node-level fast-math makes a 
difference. If we have 'nnan', I think later folds can turn the NaN into undef.
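
Illustratively, the IR-level sibling fold (rL327212) means that a sketch like
this:

  define float @fadd_undef(float %x) {
    %r = fadd float %x, undef
    ret float %r
  }

may now reduce to returning a NaN constant rather than undef.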

The tests in X86/fp-undef.ll are meant to be the definitive verification for 
these folds - everything reduces identically now.

The other test changes are collateral damage. They may need to be altered to
preserve their intent.

Differential Revision: https://reviews.llvm.org/D47026



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332920 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-21 23:54:19 +00:00
Artem Belevich
2f825676aa [NVPTX] Added a feature to use short pointers for const/local/shared AS.
Const/local/shared address spaces are all < 4GB and we can always use
32-bit pointers to access them. This has a substantial performance impact
on kernels that use shared memory for intermediate results.

The feature is disabled by default.

Differential Revision: https://reviews.llvm.org/D46147

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331941 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-09 23:46:19 +00:00
Shiva Chen
a8a13bc662 [DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label.
In order to set breakpoints on labels and list source code around
labels, we need to collect debug information for labels, i.e., the
label name, the function the label belongs to, the line number in
the file, and the address where the label is located. In order to
keep this information in LLVM IR and to allow the backend to
generate debug information correctly, we create a new kind of
metadata for labels, DILabel. The format of DILabel is

!DILabel(scope: !1, name: "foo", file: !2, line: 3)

We hope to keep as much debug information as possible even when the
code is optimized, so we create a new kind of intrinsic for label
metadata to avoid the metadata being eliminated along with its basic
block. The intrinsic remains as long as it is not optimized away.
The format of the intrinsic is

llvm.dbg.label(metadata !1)

It has only one argument, which is the DILabel metadata. The
intrinsic immediately follows the label, and the backend can
retrieve the label metadata through the intrinsic's parameter.

We also add DIBuilder APIs for labels, to be used by the frontend.
The frontend can use createLabel() to allocate DILabel objects and
insertLabel() to insert the llvm.dbg.label intrinsic in LLVM IR.
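
As a rough, hypothetical sketch (the metadata numbering is illustrative only),
the pieces fit together like this:

  top:                                               ; source-level label "top"
    call void @llvm.dbg.label(metadata !10), !dbg !11
    ...

  declare void @llvm.dbg.label(metadata)

  !10 = !DILabel(scope: !4, name: "top", file: !2, line: 3)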

Differential Revision: https://reviews.llvm.org/D45024

Patch by Hsiangkai Wang.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331841 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-09 02:40:45 +00:00
Benjamin Kramer
91250e2b2c [NVPTX] Make the legalizer expand shufflevector of <2 x half>
There's no direct instruction for this, but it's trivially implemented
with two movs. Without this the code generator just dies when
encountering a shufflevector.
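
For example, a shuffle such as the following sketch previously crashed the
code generator and can now be expanded into two moves:

  define <2 x half> @swap(<2 x half> %a) {
    %r = shufflevector <2 x half> %a, <2 x half> undef, <2 x i32> <i32 1, i32 0>
    ret <2 x half> %r
  }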

Differential Revision: https://reviews.llvm.org/D46116

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330948 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-26 15:26:29 +00:00
Artem Belevich
cbae883886 [NVPTX, CUDA] Added support for m8n32k16 and m32n8k16 variants of wmma instructions.
The new instructions were added for sm_70+ GPUs in CUDA-9.1.

Differential Revision: https://reviews.llvm.org/D45068

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330296 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-18 21:51:48 +00:00
Artem Belevich
08846e25c5 [NVPTX] add support for initializing fp16 arrays.
Previously HalfTy was not handled, which would either trigger an assertion
or result in an array initialized with garbage.
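
For instance, an initializer along these lines (illustrative only) is now
emitted correctly:

  @arr = global [2 x half] [half 0xH3C00, half 0xH4000]   ; {1.0, 2.0}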

Differential Revision: https://reviews.llvm.org/D45391

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329463 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-06 22:25:08 +00:00
Artem Belevich
05471bb621 [NVPTX] Fixed vectorized LDG for f16.
v2f16 is a special case in NVPTX. A v4f16 may be loaded as a pair of v2f16,
and that was not previously handled correctly by tryLDGLDU().
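
A minimal sketch of the affected pattern, an invariant v4f16 load from the
global address space:

  define <4 x half> @ld(<4 x half> addrspace(1)* %p) {
    %v = load <4 x half>, <4 x half> addrspace(1)* %p, !invariant.load !0
    ret <4 x half> %v
  }
  !0 = !{}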

Differential Revision: https://reviews.llvm.org/D45339

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329456 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-06 21:10:24 +00:00
Artem Belevich
bc1c5eddf2 [NVPTX] Make tensor shape part of WMMA intrinsic's name.
This is needed for the upcoming implementation of the
new 8x32x16 and 32x8x16 variants of WMMA instructions
introduced in CUDA 9.1.

Differential Revision: https://reviews.llvm.org/D44719

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328158 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-21 21:55:02 +00:00
Artem Belevich
9a1bbfc77b [NVPTX] Make tensor load/store intrinsics overloaded.
This way we can support address-space-specific variants without explicitly
encoding the space in the name of the intrinsic. Fewer intrinsics to deal
with -> less boilerplate.

Added a bit of tablegen magic to match/replace an intrinsic with a pointer
argument in a particular address space with the space-specific instruction
variant.

Updated tests to use non-default address spaces.

Differential Revision: https://reviews.llvm.org/D43268

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328006 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-20 17:18:59 +00:00
Craig Topper
5af2fadb04 [DAGCombiner] When combining zero_extend of a truncate, only mask before extending for vectors.
Masking first prevents the extend from being combined with loads. It's also interfering with some vXi1 extraction code.

Differential Revision: https://reviews.llvm.org/D42679

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@326500 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-01 22:32:25 +00:00
Justin Lebar
5ee5a7c21e [NVPTX] Lower loads from global constants using ld.global.nc (aka LDG).
Summary:
After D43914, loads from global variables in addrspace(1) happen with
ld.global.  But since they're constants, even better would be to use
ld.global.nc, aka ldg.
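
An illustrative sketch of the kind of load affected:

  @g = addrspace(1) constant i32 42

  define i32 @f() {
    %v = load i32, i32 addrspace(1)* @g    ; may now be emitted as ld.global.nc
    ret i32 %v
  }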

Reviewers: tra

Subscribers: jholewinski, sanjoy, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D43915

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@326390 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-28 23:58:05 +00:00
Justin Lebar
1d2683f5a0 [NVPTX] Use addrspacecast instead of target-specific intrinsics in NVPTXGenericToNVVM.
Summary:
NVPTXGenericToNVVM was using target-specific intrinsics to do address
space casts.  Using the addrspacecast instruction is (a lot) simpler.
But it also has the advantage of being understandable to other passes.
In particular, InferAddrSpaces is able to understand these address space
casts and remove them in most cases.
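
For illustration (a sketch, not the pass's literal output), the cast is now a
plain instruction:

  @g = addrspace(1) global i32 0

  define i32* @to_generic() {
    %p = addrspacecast i32 addrspace(1)* @g to i32*
    ret i32* %p
  }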

Reviewers: tra

Subscribers: jholewinski, sanjoy, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D43914

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@326389 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-28 23:57:48 +00:00
Craig Topper
297d461b01 [DAGCombiner] Call ExtendUsesToFormExtLoad in (zext (and (load)))->(and (zextload)) even when the and does not have multiple uses
Same for the sign extend case.

Currently we check for multiple uses on the binop. Then we call ExtendUsesToFormExtLoad to capture SetCCs that use the load. So we only end up finding any setccs when the and has additional uses and the load is used by a setcc. I don't think the and having multiple uses is relevant here. I think we should only be checking for the load having multiple uses.

This changes an NVPTX test because we now find that the load has a second use by a truncate, but ExtendUsesToFormExtLoad only looks at setccs it can extend. All other operations just check isTruncateFree. Maybe we should allow widening of an existing truncate even if it's not free?

Differential Revision: https://reviews.llvm.org/D43063

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@325289 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-15 20:20:32 +00:00
Daniel Neilson
afa2e7e6a6 Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)
Summary:
 This is a resurrection of work first proposed and discussed in Aug 2015:
   http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
and initially landed (but then backed out) in Nov 2015:
   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

 The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument
which is required to be a constant integer. It represents the alignment of the
dest (and source), and so must be the minimum of the actual alignment of the
two.

 This change is the first in a series that allows source and dest to each
have their own alignments by using the alignment attribute on their arguments.

 In this change we:
1) Remove the alignment argument.
2) Add alignment attributes to the source & dest arguments. We, temporarily,
   require that the alignments for source & dest be equal.

 For example, code which used to read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false)
will now read
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false)

 Downstream users may have to update their lit tests that check for
@llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script
may help with updating the majority of your tests, but it does not catch all possible
patterns so some manual checking and updating will be required.

s~declare void @llvm\.mem(set|cpy|move)\.p([^(]*)\((.*), i32, i1\)~declare void @llvm.mem\1.p\2(\3, i1)~g
s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* align \6 \3, i8 \4, i8 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* align \6 \3, i8 \4, i16 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* align \6 \3, i8 \4, i32 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* align \6 \3, i8 \4, i64 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* align \6 \3, i8 \4, i128 \5, i1 \7)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* \4, i8\5* \6, i8 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* \4, i8\5* \6, i16 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* \4, i8\5* \6, i32 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* \4, i8\5* \6, i64 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* \4, i8\5* \6, i128 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g

 The remaining changes in the series will:
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
   source and dest alignments.
Step 3) Update Clang to use the new IRBuilder API.
Step 4) Update Polly to use the new IRBuilder API.
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
        and those that use MemIntrinsicInst::[get|set]Alignment() to use
        getDestAlignment() and getSourceAlignment() instead.
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
        MemIntrinsicInst::[get|set]Alignment() methods.

Reviewers: pete, hfinkel, lhames, reames, bollu

Reviewed By: reames

Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits

Differential Revision: https://reviews.llvm.org/D41675

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322965 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-19 17:13:12 +00:00
Alexey Bataev
6072d5d178 [DEBUG] Add initial tests for debug info for NVPTX target, NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321822 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-04 21:07:07 +00:00
Sean Fertile
b71c6a9b39 [Memcpy Loop Lowering] Remove the fixed int8 lowering.
Switch over to the lowering that uses target-supplied operand types.

Differential Revision: https://reviews.llvm.org/D41201

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320989 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-18 15:31:14 +00:00
Sean Fertile
a9c4e7f664 [Memcpy Loop Lowering] Only calculate residual size/bytes copied when needed.
If the loop operand type is int8 then there will be no residual loop for the
unknown-size expansion. Don't create the residual-size and bytes-copied values
when they are not needed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320929 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-16 22:41:39 +00:00
Sean Fertile
7edcd376d3 [Memcpy Loop Lowering] Insert loop BB in between the split BBs.
The original memcpy expansion inserted the loop basic block in between
the two new basic blocks created by splitting the original block the memcpy
call was in. This commit makes the new memcpy expansion do the same to keep
the IR layout matching between the old and new implementations.

Differential Revision: https://reviews.llvm.org/D41197

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320848 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-15 19:29:12 +00:00
Artem Belevich
47a46192bb [NVPTX,CUDA] Added llvm.nvvm.fns intrinsic and matching __nvvm_fns builtin in clang.
Differential Revision: https://reviews.llvm.org/D40872

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@319909 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-06 17:50:05 +00:00
Jonas Hahnfeld
ba19c689aa [NVPTX] Assign valid global names
PTX requires that identifiers consist only of [a-zA-Z0-9_$]. The
existing pass already ensured this for globals and this patch adds
the cleanup for functions with local linkage.

However, there was a different problem in the case of collisions
of the adjusted name: The ValueSymbolTable then automatically
appended ".N" with increasing Ns to get a unique name while helping
the ABI demangling. Special case this behavior to omit the dots and
append N directly. This will always give us legal names according
to the PTX requirements.

Differential Revision: https://reviews.llvm.org/D40573

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@319657 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-04 14:19:33 +00:00
Justin Lebar
d8660fa5dc [NVPTX] Implement __nvvm_atom_add_gen_d builtin.
Summary:
This just seems to have been an oversight.  We already supported the f64
atomic add with an explicit scope (e.g. "cta"), but not the scopeless
version.

Reviewers: tra

Subscribers: jholewinski, sanjoy, cfe-commits, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D39638

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317623 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-07 22:10:54 +00:00
Vedant Kumar
b57c6f4150 [Verifier] Remove the -verify-debug-info cl::opt
This cl::opt has been dead for a while. It's no longer possible to run
the verifier without also verifying debug info.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317288 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-02 23:44:20 +00:00
Artur Gainullin
359c2dcf35 Improve clamp recognition in ValueTracking.
Summary:
ValueTracking was not recognizing all variations of clamp. Swapping of the
true value and false value of the select was added to fix this problem. The
first patch was reverted because it caused a miscompile in the NVPTX target.
Added corresponding test cases.
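
For reference, a clamp here is a select pattern along these lines (an
illustrative sketch; the patch handles variants with the select arms swapped):

  define i32 @clamp_to_zero(i32 %x) {
    %cmp = icmp slt i32 %x, 0
    %res = select i1 %cmp, i32 0, i32 %x     ; smax(x, 0)
    ret i32 %res
  }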

Reviewers: spatel, majnemer, efriedma, reames

Subscribers: llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D39240



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316795 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-27 20:53:41 +00:00
Artem Belevich
c79e8ba6d6 [NVPTX] allow address space inference for volatile loads/stores.
If a particular target supports volatile memory access operations, we can
avoid AS casting to the generic AS. Currently it's only enabled in NVPTX for
loads and stores that access the global & shared AS.

Differential Revision: https://reviews.llvm.org/D39026

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316495 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-24 20:31:44 +00:00
Artem Belevich
a187b4878e [NVPTX] Implemented wmma intrinsics and instructions.
WMMA = "Warp Level Matrix Multiply-Accumulate".
These are the new instructions introduced in PTX6.0 and available
on sm_70 GPUs.

Differential Revision: https://reviews.llvm.org/D38645

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315601 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-12 18:27:55 +00:00
Artem Belevich
b636450ff9 [NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins.
Differential Revision: https://reviews.llvm.org/D38191

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314223 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-26 17:07:23 +00:00
Justin Lebar
ba3e4c801a Revert "[NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins.", rL314135.
Causing assertion failures on macos:

> Assertion failed: (Num < NumOperands && "Invalid child # of SDNode!"),
> function getOperand, file
> /Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm/include/llvm/CodeGen/SelectionDAGNodes.h,
> line 835.

http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/42739/testReport/LLVM/CodeGen_NVPTX/surf_read_cuda_ll/

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314142 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-25 19:41:56 +00:00
Artem Belevich
737f60180a [NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins.
Differential Revision: https://reviews.llvm.org/D38191

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314135 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-25 18:53:57 +00:00
Artem Belevich
c02a4f5a57 [NVPTX] Implemented bar.warp.sync, barrier.sync, and vote{.sync} instructions/intrinsics/builtins.
Differential Revision: https://reviews.llvm.org/D38148

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@313898 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-21 18:44:49 +00:00
Artem Belevich
34fb94caca [NVPTX] Implemented shfl.sync instruction and supporting intrinsics/builtins.
Differential Revision: https://reviews.llvm.org/D38090

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@313820 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-20 21:23:07 +00:00
Artem Belevich
4059d374ce [CUDA] Added rudimentary support for CUDA-9 and sm_70.
For now CUDA-9 is not included in the list of CUDA versions clang
searches for, so the path to CUDA-9 must be explicitly passed
via --cuda-path=.

On the LLVM side, NVPTX added the sm_70 GPU type, which bumps the required
PTX version to 6.0 but is otherwise equivalent to sm_62 at the moment.

Differential Revision: https://reviews.llvm.org/D37576

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312734 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-07 18:14:32 +00:00
Adrian Prantl
69e607f200 Canonicalize the representation of an empty expression in DIGlobalVariableExpression
This change simplifies code that has to deal with
DIGlobalVariableExpression and mirrors how we treat DIExpressions in
debug info intrinsics. Before this change there were two ways of
representing an empty expression on a global: a nullptr and an empty
!DIExpression().
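
That is, a global's entry with no expression is now always spelled with an
explicit empty expression, e.g. (sketch):

  !1 = !DIGlobalVariableExpression(var: !2, expr: !DIExpression())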

If someone needs to upgrade out-of-tree testcases:
  perl -pi -e 's/(!DIGlobalVariableExpression\(var: ![0-9]*)\)/\1, expr: !DIExpression())/g' <MYTEST.ll>
will catch 95%.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312144 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-30 18:06:51 +00:00
Artem Belevich
1ef1996909 [NVPTX] Add lowering of i128 params.
The patch adds support for lowering i128 params. The changes are quite trivial,
supporting i128 as a "special case" of integer type. With this patch, we lower i128
params the same way as aggregates of size 16 bytes: .param .b8 _ [16].

Currently, NVPTX can't deal with 128-bit integers:
* in some cases because of failed assertions like
  ValVTs.size() == OutVals.size() && "Bad return value decomposition"
* in other cases emitting PTX with .i128 or .u128 types (which are not valid [1])
  [1] http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#fundamental-types
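
A minimal sketch of a function signature this enables:

  define i128 @add128(i128 %a, i128 %b) {
    %sum = add i128 %a, %b
    ret i128 %sum
  }

Each i128 parameter is lowered like a 16-byte aggregate, i.e. .param .b8 _ [16].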

Differential Revision: https://reviews.llvm.org/D34555
Patch by: Denys Zariaiev (denys.zariaiev@gmail.com)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308675 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-20 21:16:03 +00:00
Sean Fertile
471398ffea Extend memcpy expansion in Transform/Utils to handle wider operand types.
Adds loop expansions for known-size and unknown-size memcpy calls, allowing the
target to provide the operand types through TTI callbacks. The default values
for the TTI callbacks use int8 operand types and match the existing behaviour
if they aren't overridden by the target.

Differential revision: https://reviews.llvm.org/D32536

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307346 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-07 02:00:06 +00:00
Michael Kuperstein
77b223ff61 Reverting r307326 because it breaks clang tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307334 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-06 23:24:39 +00:00
Michael Kuperstein
1803a9f234 [NVPTX] Add lowering of i128 params.
The patch adds support for lowering i128 params. The changes are quite trivial,
supporting i128 as a "special case" of integer type. With this patch, we lower i128
params the same way as aggregates of size 16 bytes: .param .b8 _ [16].

Currently, NVPTX can't deal with 128-bit integers:
* in some cases because of failed assertions like
ValVTs.size() == OutVals.size() && "Bad return value decomposition"
* in other cases emitting PTX with .i128 or .u128 types (which are not valid [1])
[1] http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#fundamental-types

Differential Revision: https://reviews.llvm.org/D34555
Patch by: Denys Zariaiev (denys.zariaiev@gmail.com)



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307326 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-06 22:18:54 +00:00
Teresa Johnson
4da9193a65 Recommit "r306541 - Add zero-length check to memcpy/memset load store loop expansion"
With fix for use-after-free errors. We can't add the new branch and
remove the old one until we are done with the Builder constructed for
the block.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306937 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-01 03:24:10 +00:00
Daniel Jasper
ab8d3d1763 Revert "r306541 - Add zero-length check to memcpy/memset load store loop expansion"
Segfaults in non-optimized builds. I'll get a stack trace and a
reproducer to Teresa.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306793 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-30 06:37:33 +00:00
Teresa Johnson
457765feeb Add zero-length check to memcpy/memset load store loop expansion
Summary:
I was testing this expansion logic in other cases besides NVPTX and
found some runtime failures due to the lack of a check for a zero-length
memcpy/memset before the loop. There is already such a check in the
memmove expansion code.

Reviewers: hfinkel

Subscribers: jholewinski, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D34707

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306541 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-28 13:07:37 +00:00
Hans Wennborg
aade6b806c Revert r302938 "Add LiveRangeShrink pass to shrink live range within BB."
This also reverts follow-ups r303292 and r303298.

It broke some Chromium tests under MSan, and apparently also internal
tests at Google.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303369 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-18 18:50:05 +00:00
Dehao Chen
a44d688d96 Only enable LiveRangeShrink for x86.
Summary: Moving LiveRangeShrink to x86, as this pass is mostly useful for architectures with high register pressure.

Reviewers: MatzeB, qcolombet

Reviewed By: qcolombet

Subscribers: jholewinski, jyknight, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33294

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303292 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-17 20:18:13 +00:00
Simon Pilgrim
2223371da5 [NVPTX] Don't flag StoreParam/LoadParam memory chain operands as ReadMem/WriteMem (PR32146)
Follow up to D33147

NVPTXTargetLowering::LowerCall was trusting the default argument values.

Fixes another 17 of the NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146.

Differential Revision: https://reviews.llvm.org/D33189

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303082 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-15 17:17:44 +00:00
Simon Pilgrim
4633bbb4c7 [NVPTX] Don't flag StoreRetVal memory chain operands as ReadMem (PR32146)
This fixes 47 of the 75 NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146.

Differential Revision: https://reviews.llvm.org/D33147

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302942 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-12 19:56:43 +00:00
Dehao Chen
0faf9ed31e Add LiveRangeShrink pass to shrink live range within BB.
Summary: The LiveRangeShrink pass moves an instruction to right after its definition within the same BB if the instruction and its operands all have more than one use. This pass is inexpensive and guarantees optimal live ranges within a BB.

Reviewers: davidxl, wmi, hfinkel, MatzeB, andreadb

Reviewed By: MatzeB, andreadb

Subscribers: hiraditya, jyknight, sanjoy, skatkov, gberry, jholewinski, qcolombet, javed.absar, krytarowski, atrick, spatel, RKSimon, andreadb, MatzeB, mehdi_amini, mgorny, efriedma, davide, dberlin, llvm-commits

Differential Revision: https://reviews.llvm.org/D32563

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302938 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-12 19:29:27 +00:00
Simon Pilgrim
ed79276e6b [SelectionDAG] Improve support for promotion of <1 x fX> floating point argument types (PR31088)
PR31088 demonstrated that we were assuming that only integers require promotion from <1 x iX> types, when in fact float types may require it as well - in this case half floats.

This patch adds support for extension/truncation for both integer and float types.
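
For example, an argument of a type like the following (illustrative sketch) is
now promoted correctly:

  define half @first_elt(<1 x half> %v) {
    %e = extractelement <1 x half> %v, i32 0
    ret half %e
  }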

Differential Revision: https://reviews.llvm.org/D32391

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301910 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-02 10:33:08 +00:00
Matt Arsenault
bdbe8280f2 Add address space mangling to lifetime intrinsics
In preparation for allowing allocas to have non-0 addrspace.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299876 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-10 20:18:21 +00:00
Jonas Paulsson
1b6f5a39a9 [SelectionDAG] Optimize VSELECT->SETCC of incompatible or illegal types.
Don't scalarize VSELECT->SETCC when operands/results need to be widened,
or when the type of the SETCC operands is different from that of the VSELECT.

(VSELECT SETCC) and (VSELECT (AND/OR/XOR (SETCC,SETCC))) are handled.

The previous splitting of VSELECT->SETCC in DAGCombiner::visitVSELECT() is
no longer needed and has been removed.
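
An illustrative sketch of a case where the SETCC operand type differs from the
VSELECT's operand/result type:

  define <4 x float> @sel(<4 x i32> %a, <4 x i32> %b, <4 x float> %x, <4 x float> %y) {
    %c = icmp slt <4 x i32> %a, %b
    %r = select <4 x i1> %c, <4 x float> %x, <4 x float> %y
    ret <4 x float> %r
  }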

Updated tests:

test/CodeGen/ARM/vuzp.ll
test/CodeGen/NVPTX/f16x2-instructions.ll
test/CodeGen/X86/2011-10-19-widen_vselect.ll
test/CodeGen/X86/2011-10-21-widen-cmp.ll
test/CodeGen/X86/psubus.ll
test/CodeGen/X86/vselect-pcmp.ll

Review: Eli Friedman, Simon Pilgrim
https://reviews.llvm.org/D29489

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297930 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-16 07:17:12 +00:00
Artem Belevich
85cbde2b32 [NVPTX] Fixed lowering of unaligned loads/stores of f16 scalars and vectors.
Differential Revision: https://reviews.llvm.org/D30672

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297198 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-07 20:33:38 +00:00
Taewook Oh
15497c13fd [DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y)
Summary:
Currently, when 't1: i1 = setcc t2, t3, cc' followed by 't4: i1 = xor t1, Constant:i1<-1>' is folded into 't5: i1 = setcc t2, t3 !cc', the SDLoc of the newly created SDValue 't5' follows the SDLoc of 't4', not 't1'. However, as the opcode of the newly created SDValue is 'setcc', it makes more sense to take the DebugLoc from 't1' than from 't4'. For the code below

```
extern int bar();
extern int baz();

int foo(int x, int y) {
  if (x != y)
    return bar();
  else
    return baz();
}
```

, following is the bitcode representation of 'foo' at the end of llvm-ir level optimization:

```
define i32 @foo(i32 %x, i32 %y) !dbg !4 {
entry:
  tail call void @llvm.dbg.value(metadata i32 %x, i64 0, metadata !9, metadata !11), !dbg !12
  tail call void @llvm.dbg.value(metadata i32 %y, i64 0, metadata !10, metadata !11), !dbg !13
  %cmp = icmp ne i32 %x, %y, !dbg !14
  br i1 %cmp, label %if.then, label %if.else, !dbg !16

if.then:                                          ; preds = %entry
  %call = tail call i32 (...) @bar() #3, !dbg !17
  br label %return, !dbg !18

if.else:                                          ; preds = %entry
  %call1 = tail call i32 (...) @baz() #3, !dbg !19
  br label %return, !dbg !20

return:                                           ; preds = %if.else, %if.then
  %retval.0 = phi i32 [ %call, %if.then ], [ %call1, %if.else ]
  ret i32 %retval.0, !dbg !21
}

!14 = !DILocation(line: 5, column: 9, scope: !15)
!16 = !DILocation(line: 5, column: 7, scope: !4)

```

As you can see, in the 'entry' block, the 'icmp' and 'br' instructions have different debug locations. However, with the current implementation, there's no distinction between the debug locations of these two when they are lowered to asm instructions. This is because 'icmp' and 'br' become 'setcc', 'xor', and 'brcond' in SelectionDAG, where the SDLoc of 'setcc' follows the debug location of 'icmp' but the SDLocs of 'xor' and 'brcond' follow the debug location of the 'br' instruction, and the SDLoc of 'xor' overwrites the SDLoc of 'setcc' when they are folded. This patch addresses this issue.

Reviewers: atrick, bogner, andreadb, craig.topper, aprantl

Reviewed By: andreadb

Subscribers: jlebar, mkuper, jholewinski, andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D29813

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296825 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-02 21:58:35 +00:00
Sanjay Patel
9a9478ccb0 [DAGCombiner] add missing folds for scalar select of {-1,0,1}
The motivation for filling out these select-of-constants cases goes back to D24480, 
where we discussed removing an IR fold from add(zext) --> select. And that goes back to:
https://reviews.llvm.org/rL75531
https://reviews.llvm.org/rL159230

The idea is that we should always canonicalize patterns like this to a select-of-constants 
in IR because that's the smallest IR and the best for value tracking. Note that we currently 
do the opposite in some cases (like the cases in *this* patch). Ie, the proposed folds in 
this patch already exist in InstCombine today:
https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineSelect.cpp#L1151

As this patch shows, most targets generate better machine code for simple ext/add/not ops 
rather than a select of constants. So the follow-up steps to make this less of a patchwork 
of special-case folds and missing IR canonicalization:

1. Have DAGCombiner convert any select of constants into ext/add/not ops.
2. Have InstCombine canonicalize in the other direction (create more selects).
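
For example (a sketch of one such case), a select of -1/0 is equivalent to a
sign extension of the i1 condition, which most targets lower more cheaply:

  define i32 @sel_m1_0(i1 %c) {
    %r = select i1 %c, i32 -1, i32 0      ; same as: sext i1 %c to i32
    ret i32 %r
  }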

Differential Revision: https://reviews.llvm.org/D30180


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296137 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-24 17:17:33 +00:00