archived-llvm

mirror of https://github.com/RPCS3/llvm.git synced 2026-01-31 01:25:19 +01:00

Author	SHA1	Message	Date
Sanjay Patel	2f6b67546d	[DAG] fold FP binops with undef operands to NaN This is the FP sibling of D43141 with the corresponding IR change in rL327212. We can't propagate undef here because if a variable operand is a NaN, these binops must propagate NaN. Neither global nor node-level fast-math makes a difference. If we have 'nnan', I think later folds can turn the NaN into undef. The tests in X86/fp-undef.ll are meant to be the definitive verification for these folds - everything reduces identically now. The other test changes are collateral damage. They may need to be altered to preserve their intent. Differential Revision: https://reviews.llvm.org/D47026 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332920 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-21 23:54:19 +00:00
Artem Belevich	2f825676aa	[NVPTX] Added a feature to use short pointers for const/local/shared AS. Const/local/shared address spaces are all < 4GB and we can always use 32-bit pointers to access them. This has substantial performance impact on kernels that uses shared memory for intermediary results. The feature is disabled by default. Differential Revision: https://reviews.llvm.org/D46147 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331941 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-09 23:46:19 +00:00
Shiva Chen	a8a13bc662	[DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label. In order to set breakpoints on labels and list source code around labels, we need collect debug information for labels, i.e., label name, the function label belong, line number in the file, and the address label located. In order to keep these information in LLVM IR and to allow backend to generate debug information correctly. We create a new kind of metadata for labels, DILabel. The format of DILabel is !DILabel(scope: !1, name: "foo", file: !2, line: 3) We hope to keep debug information as much as possible even the code is optimized. So, we create a new kind of intrinsic for label metadata to avoid the metadata is eliminated with basic block. The intrinsic will keep existing if we keep it from optimized out. The format of the intrinsic is llvm.dbg.label(metadata !1) It has only one argument, that is the DILabel metadata. The intrinsic will follow the label immediately. Backend could get the label metadata through the intrinsic's parameter. We also create DIBuilder API for labels to be used by Frontend. Frontend could use createLabel() to allocate DILabel objects, and use insertLabel() to insert llvm.dbg.label intrinsic in LLVM IR. Differential Revision: https://reviews.llvm.org/D45024 Patch by Hsiangkai Wang. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331841 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-09 02:40:45 +00:00
Benjamin Kramer	91250e2b2c	[NVPTX] Make the legalizer expand shufflevector of <2 x half> There's no direct instruction for this, but it's trivially implemented with two movs. Without this the code generator just dies when encountering a shufflevector. Differential Revision: https://reviews.llvm.org/D46116 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330948 91177308-0d34-0410-b5e6-96231b3b80d8	2018-04-26 15:26:29 +00:00
Artem Belevich	cbae883886	[NVPTX, CUDA] Added support for m8n32k16 and m32n8k16 variants of wmma instructions. The new instructions were added added for sm_70+ GPUs in CUDA-9.1. Differential Revision: https://reviews.llvm.org/D45068 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330296 91177308-0d34-0410-b5e6-96231b3b80d8	2018-04-18 21:51:48 +00:00
Artem Belevich	08846e25c5	[NVPTX] add support for initializing fp16 arrays. Previously HalfTy was not handled which would either trigger an assertion, or result in array initialized with garbage. Differential Revision: https://reviews.llvm.org/D45391 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329463 91177308-0d34-0410-b5e6-96231b3b80d8	2018-04-06 22:25:08 +00:00
Artem Belevich	05471bb621	[NVPTX] Fixed vectorized LDG for f16. v2f16 is a special case in NVPTX. v4f16 may be loaded as a pair of v2f16 and that was not previously handled correctly by tryLDGLDU() Differential Revision: https://reviews.llvm.org/D45339 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329456 91177308-0d34-0410-b5e6-96231b3b80d8	2018-04-06 21:10:24 +00:00
Artem Belevich	bc1c5eddf2	[NVPTX] Make tensor shape part of WMMA intrinsic's name. This is needed for the upcoming implementation of the new 8x32x16 and 32x8x16 variants of WMMA instructions introduced in CUDA 9.1. Differential Revision: https://reviews.llvm.org/D44719 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328158 91177308-0d34-0410-b5e6-96231b3b80d8	2018-03-21 21:55:02 +00:00
Artem Belevich	9a1bbfc77b	[NVPTX] Make tensor load/store intrinsics overloaded. This way we can support address-space specific variants without explicitly encoding the space in the name of the intrinsic. Less intrinsics to deal with -> less boilerplate. Added a bit of tablegen magic to match/replace an intrinsics with a pointer argument in particular address space with the space-specific instruction variant. Updated tests to use non-default address spaces. Differential Revision: https://reviews.llvm.org/D43268 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328006 91177308-0d34-0410-b5e6-96231b3b80d8	2018-03-20 17:18:59 +00:00
Craig Topper	5af2fadb04	[DAGCombiner] When combining zero_extend of a truncate, only mask before extending for vectors. Masking first, prevents the extend from being combine with loads. Its also interfering with some vXi1 extraction code. Differential Revision: https://reviews.llvm.org/D42679 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@326500 91177308-0d34-0410-b5e6-96231b3b80d8	2018-03-01 22:32:25 +00:00
Justin Lebar	5ee5a7c21e	[NVPTX] Lower loads from global constants using ld.global.nc (aka LDG). Summary: After D43914, loads from global variables in addrspace(1) happen with ld.global. But since they're constants, even better would be to use ld.global.nc, aka ldg. Reviewers: tra Subscribers: jholewinski, sanjoy, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D43915 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@326390 91177308-0d34-0410-b5e6-96231b3b80d8	2018-02-28 23:58:05 +00:00
Justin Lebar	1d2683f5a0	[NVPTX] Use addrspacecast instead of target-specific intrinsics in NVPTXGenericToNVVM. Summary: NVPTXGenericToNVVM was using target-specific intrinsics to do address space casts. Using the addrspacecast instruction is (a lot) simpler. But it also has the advantage of being understandable to other passes. In particular, InferAddrSpaces is able to understand these address space casts and remove them in most cases. Reviewers: tra Subscribers: jholewinski, sanjoy, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D43914 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@326389 91177308-0d34-0410-b5e6-96231b3b80d8	2018-02-28 23:57:48 +00:00
Craig Topper	297d461b01	[DAGCombiner] Call ExtendUsesToFormExtLoad in (zext (and (load)))->(and (zextload)) even when the and does not have multiple uses Same for the sign extend case. Currently we check for multiple uses on the binop. Then we call ExtendUsesToFormExtLoad to capture SetCCs that use the load. So we only end up finding any setccs when the and has additional uses and the load is used by a setcc. I don't think the and having multiple uses is relevant here. I think we should only be checking for the load having multiple uses. This changes an NVPTX test because we now find that the load has a second use by a truncate, but ExtendUsesToFormExtLoad only looks at setccs it can extend. All other operations just check isTruncateFree. Maybe we should allow widening of an existing truncate even if its not free? Differential Revision: https://reviews.llvm.org/D43063 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@325289 91177308-0d34-0410-b5e6-96231b3b80d8	2018-02-15 20:20:32 +00:00
Daniel Neilson	afa2e7e6a6	Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1) Summary: This is a resurrection of work first proposed and discussed in Aug 2015: http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html and initially landed (but then backed out) in Nov 2015: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument which is required to be a constant integer. It represents the alignment of the dest (and source), and so must be the minimum of the actual alignment of the two. This change is the first in a series that allows source and dest to each have their own alignments by using the alignment attribute on their arguments. In this change we: 1) Remove the alignment argument. 2) Add alignment attributes to the source & dest arguments. We, temporarily, require that the alignments for source & dest be equal. For example, code which used to read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false) will now read call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false) Downstream users may have to update their lit tests that check for @llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script may help with updating the majority of your tests, but it does not catch all possible patterns so some manual checking and updating will be required. s~declare void @llvm\.mem(set\|cpy\|move)\.p([^(])$(.), i32, i1$~declare void @llvm.mem\1.p\2(\3, i1)~g s~call void @llvm\.memset\.p([^(])i8$i8([^])\ (.), i8 (.), i8 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i16$i8([^])\ (.), i8 (.), i16 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i32$i8([^])\ (.), i8 (.), i32 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i64$i8([^])\ (.), i8 (.), i64 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i128$i8([^])\ (.), i8 (.), i128 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i8$i8([^])\ (.), i8 (.), i8 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i8(i8\2 align \6 \3, i8 \4, i8 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i16$i8([^])\ (.), i8 (.), i16 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i16(i8\2 align \6 \3, i8 \4, i16 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i32$i8([^])\ (.), i8 (.), i32 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i32(i8\2 align \6 \3, i8 \4, i32 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i64$i8([^])\ (.), i8 (.), i64 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i64(i8\2 align \6 \3, i8 \4, i64 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i128$i8([^])\ (.), i8 (.), i128 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i128(i8\2 align \6 \3, i8 \4, i128 \5, i1 \7)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i8$i8([^])\ (.), i8([^])\ (.), i8 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i8(i8\3 \4, i8\5* \6, i8 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i16$i8([^])\ (.), i8([^])\ (.), i16 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i16(i8\3 \4, i8\5* \6, i16 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i32$i8([^])\ (.), i8([^])\ (.), i32 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i32(i8\3 \4, i8\5* \6, i32 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i64$i8([^])\ (.), i8([^])\ (.), i64 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i64(i8\3 \4, i8\5* \6, i64 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i128$i8([^])\ (.), i8([^])\ (.), i128 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i128(i8\3 \4, i8\5* \6, i128 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i8$i8([^])\ (.), i8([^])\ (.), i8 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i16$i8([^])\ (.), i8([^])\ (.), i16 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i32$i8([^])\ (.), i8([^])\ (.), i32 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i64$i8([^])\ (.), i8([^])\ (.), i64 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i128$i8([^])\ (.), i8([^])\ (.), i128 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g The remaining changes in the series will: Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. Step 3) Update Clang to use the new IRBuilder API. Step 4) Update Polly to use the new IRBuilder API. Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get\|set]Alignment() to use getDestAlignment() and getSourceAlignment() instead. Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get\|set]Alignment() methods. Reviewers: pete, hfinkel, lhames, reames, bollu Reviewed By: reames Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits Differential Revision: https://reviews.llvm.org/D41675 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322965 91177308-0d34-0410-b5e6-96231b3b80d8	2018-01-19 17:13:12 +00:00
Alexey Bataev	6072d5d178	[DEBUG] Add initial tests for debug info for NVPTX target, NFC. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321822 91177308-0d34-0410-b5e6-96231b3b80d8	2018-01-04 21:07:07 +00:00
Sean Fertile	b71c6a9b39	[Memcpy Loop Lowering] Remove the fixed int8 lowering. Switch over to the lowering that uses target supplied operand types. Differential Revision: https://reviews.llvm.org/D41201 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320989 91177308-0d34-0410-b5e6-96231b3b80d8	2017-12-18 15:31:14 +00:00
Sean Fertile	a9c4e7f664	[Memcpy Loop Lowering] Only calculate residual size/bytes copied when needed. If the loop operand type is int8 then there will be no residual loop for the unknown size expansion. Dont create the residual-size and bytes-copied values when they are not needed. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320929 91177308-0d34-0410-b5e6-96231b3b80d8	2017-12-16 22:41:39 +00:00
Sean Fertile	7edcd376d3	[Memcpy Loop Lowering] Insert loop BB inbetween the split BB. The original memcpy expansion inserted the loop basic block inbetween the 2 new basic blocks created by splitting the original block the memcpy call was in. This commit makes the new memcpy expansion do the same to keep the layout of the IR matching between the old and new implementations. Differential Review: https://reviews.llvm.org/D41197 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320848 91177308-0d34-0410-b5e6-96231b3b80d8	2017-12-15 19:29:12 +00:00
Artem Belevich	47a46192bb	[NVPTX,CUDA] Added llvm.nvvm.fns intrinsic and matching __nvvm_fns builtin in clang. Differential Revision: https://reviews.llvm.org/D40872 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@319909 91177308-0d34-0410-b5e6-96231b3b80d8	2017-12-06 17:50:05 +00:00
Jonas Hahnfeld	ba19c689aa	[NVPTX] Assign valid global names PTX requires that identifiers consist only of [a-zA-Z0-9_$]. The existing pass already ensured this for globals and this patch adds the cleanup for functions with local linkage. However, there was a different problem in the case of collisions of the adjusted name: The ValueSymbolTable then automatically appended ".N" with increasing Ns to get a unique name while helping the ABI demangling. Special case this behavior to omit the dots and append N directly. This will always give us legal names according to the PTX requirements. Differential Revision: https://reviews.llvm.org/D40573 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@319657 91177308-0d34-0410-b5e6-96231b3b80d8	2017-12-04 14:19:33 +00:00
Justin Lebar	d8660fa5dc	[NVPTX] Implement __nvvm_atom_add_gen_d builtin. Summary: This just seems to have been an oversight. We already supported the f64 atomic add with an explicit scope (e.g. "cta"), but not the scopeless version. Reviewers: tra Subscribers: jholewinski, sanjoy, cfe-commits, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D39638 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317623 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-07 22:10:54 +00:00
Vedant Kumar	b57c6f4150	[Verifier] Remove the -verify-debug-info cl::opt This cl::opt has been dead for a while. It's no longer possible to run the verifier without also verifying debug info. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317288 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-02 23:44:20 +00:00
Artur Gainullin	359c2dcf35	Improve clamp recognition in ValueTracking. Summary: ValueTracking was recognizing not all variations of clamp. Swapping of true value and false value of select was added to fix this problem. The first patch was reverted because it caused miscompile in NVPTX target. Added corresponding test cases. Reviewers: spatel, majnemer, efriedma, reames Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D39240 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316795 91177308-0d34-0410-b5e6-96231b3b80d8	2017-10-27 20:53:41 +00:00
Artem Belevich	c79e8ba6d6	[NVPTX] allow address space inference for volatile loads/stores. If particular target supports volatile memory access operations, we can avoid AS casting to generic AS. Currently it's only enabled in NVPTX for loads and stores that access global & shared AS. Differential Revision: https://reviews.llvm.org/D39026 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316495 91177308-0d34-0410-b5e6-96231b3b80d8	2017-10-24 20:31:44 +00:00
Artem Belevich	a187b4878e	[NVPTX] Implemented wmma intrinsics and instructions. WMMA = "Warp Level Matrix Multiply-Accumulate". These are the new instructions introduced in PTX6.0 and available on sm_70 GPUs. Differential Revision: https://reviews.llvm.org/D38645 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315601 91177308-0d34-0410-b5e6-96231b3b80d8	2017-10-12 18:27:55 +00:00
Artem Belevich	b636450ff9	[NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins. Differential Revision: https://reviews.llvm.org/D38191 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314223 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-26 17:07:23 +00:00
Justin Lebar	ba3e4c801a	Revert "[NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins.", rL314135. Causing assertion failures on macos: > Assertion failed: (Num < NumOperands && "Invalid child # of SDNode!"), > function getOperand, file > /Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm/include/llvm/CodeGen/SelectionDAGNodes.h, > line 835. http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/42739/testReport/LLVM/CodeGen_NVPTX/surf_read_cuda_ll/ git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314142 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-25 19:41:56 +00:00
Artem Belevich	737f60180a	[NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins. Differential Revision: https://reviews.llvm.org/D38191 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314135 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-25 18:53:57 +00:00
Artem Belevich	c02a4f5a57	[NVPTX] Implemented bar.warp.sync, barrier.sync, and vote{.sync} instructions/intrinsics/builtins. Differential Revision: https://reviews.llvm.org/D38148 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@313898 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-21 18:44:49 +00:00
Artem Belevich	34fb94caca	[NVPTX] Implemented shfl.sync instruction and supporting intrinsics/builtins. Differential Revision: https://reviews.llvm.org/D38090 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@313820 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-20 21:23:07 +00:00
Artem Belevich	4059d374ce	[CUDA] Added rudimentary support for CUDA-9 and sm_70. For now CUDA-9 is not included in the list of CUDA versions clang searches for, so the path to CUDA-9 must be explicitly passed via --cuda-path=. On LLVM side NVPTX added sm_70 GPU type which bumps required PTX version to 6.0, but otherwise is equivalent to sm_62 at the moment. Differential Revision: https://reviews.llvm.org/D37576 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312734 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-07 18:14:32 +00:00
Adrian Prantl	69e607f200	Canonicalize the representation of empty an expression in DIGlobalVariableExpression This change simplifies code that has to deal with DIGlobalVariableExpression and mirrors how we treat DIExpressions in debug info intrinsics. Before this change there were two ways of representing empty expressions on globals, a nullptr and an empty !DIExpression(). If someone needs to upgrade out-of-tree testcases: perl -pi -e 's/(!DIGlobalVariableExpression$var: ![0-9]*)$/\1, expr: !DIExpression())/g' <MYTEST.ll> will catch 95%. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312144 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-30 18:06:51 +00:00
Artem Belevich	1ef1996909	[NVPTX] Add lowering of i128 params. The patch adds support of i128 params lowering. The changes are quite trivial to support i128 as a "special case" of integer type. With this patch, we lower i128 params the same way as aggregates of size 16 bytes: .param .b8 _ [16]. Currently, NVPTX can't deal with the 128 bit integers: * in some cases because of failed assertions like ValVTs.size() == OutVals.size() && "Bad return value decomposition" * in other cases emitting PTX with .i128 or .u128 types (which are not valid [1]) [1] http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#fundamental-types Differential Revision: https://reviews.llvm.org/D34555 Patch by: Denys Zariaiev (denys.zariaiev@gmail.com) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308675 91177308-0d34-0410-b5e6-96231b3b80d8	2017-07-20 21:16:03 +00:00
Sean Fertile	471398ffea	Extend memcpy expansion in Transform/Utils to handle wider operand types. Adds loop expansions for known-size and unknown-sized memcpy calls, allowing the target to provide the operand types through TTI callbacks. The default values for the TTI callbacks use int8 operand types and matches the existing behaviour if they aren't overridden by the target. Differential revision: https://reviews.llvm.org/D32536 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307346 91177308-0d34-0410-b5e6-96231b3b80d8	2017-07-07 02:00:06 +00:00
Michael Kuperstein	77b223ff61	Reverting r307326 because it breaks clang tests. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307334 91177308-0d34-0410-b5e6-96231b3b80d8	2017-07-06 23:24:39 +00:00
Michael Kuperstein	1803a9f234	[NVPTX] Add lowering of i128 params. The patch adds support of i128 params lowering. The changes are quite trivial to support i128 as a "special case" of integer type. With this patch, we lower i128 params the same way as aggregates of size 16 bytes: .param .b8 _ [16]. Currently, NVPTX can't deal with the 128 bit integers: * in some cases because of failed assertions like ValVTs.size() == OutVals.size() && "Bad return value decomposition" * in other cases emitting PTX with .i128 or .u128 types (which are not valid [1]) [1] http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#fundamental-types Differential Revision: https://reviews.llvm.org/D34555 Patch by: Denys Zariaiev (denys.zariaiev@gmail.com) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307326 91177308-0d34-0410-b5e6-96231b3b80d8	2017-07-06 22:18:54 +00:00
Teresa Johnson	4da9193a65	Recommit "r306541 - Add zero-length check to memcpy/memset load store loop expansion"" With fix for use-after-free errors. We can't add the new branch and remove the old one until we are done with the Builder constructed for the block. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306937 91177308-0d34-0410-b5e6-96231b3b80d8	2017-07-01 03:24:10 +00:00
Daniel Jasper	ab8d3d1763	Revert "r306541 - Add zero-length check to memcpy/memset load store loop expansion" Segfaults in non-optimized builds. I'll get a stack trace and a reproducer to Teresa. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306793 91177308-0d34-0410-b5e6-96231b3b80d8	2017-06-30 06:37:33 +00:00
Teresa Johnson	457765feeb	Add zero-length check to memcpy/memset load store loop expansion Summary: I was testing using this expansion logic in other cases besides NVPTX, and found some runtime failures due to the lack of a check for a zero length memcpy/memset before the loop. There is already such a check in the memmove expansion code though. Reviewers: hfinkel Subscribers: jholewinski, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D34707 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306541 91177308-0d34-0410-b5e6-96231b3b80d8	2017-06-28 13:07:37 +00:00
Hans Wennborg	aade6b806c	Revert r302938 "Add LiveRangeShrink pass to shrink live range within BB." This also reverts follow-ups r303292 and r303298. It broke some Chromium tests under MSan, and apparently also internal tests at Google. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303369 91177308-0d34-0410-b5e6-96231b3b80d8	2017-05-18 18:50:05 +00:00
Dehao Chen	a44d688d96	Only enable LiveRangeShrink for x86. Summary: Moving LiveRangeShrink to x86 as this pass is mostly useful for archtectures with great register pressure. Reviewers: MatzeB, qcolombet Reviewed By: qcolombet Subscribers: jholewinski, jyknight, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33294 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303292 91177308-0d34-0410-b5e6-96231b3b80d8	2017-05-17 20:18:13 +00:00
Simon Pilgrim	2223371da5	[NVPTX] Don't flag StoreParam/LoadParam memory chain operands as ReadMem/WriteMem (PR32146) Follow up to D33147 NVPTXTargetLowering::LowerCall was trusting the default argument values. Fixes another 17 of the NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146. Differential Revision: https://reviews.llvm.org/D33189 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303082 91177308-0d34-0410-b5e6-96231b3b80d8	2017-05-15 17:17:44 +00:00
Simon Pilgrim	4633bbb4c7	[NVPTX] Don't flag StoreRetVal memory chain operands as ReadMem (PR32146) This fixes 47 of the 75 NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146. Differential Revision: https://reviews.llvm.org/D33147 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302942 91177308-0d34-0410-b5e6-96231b3b80d8	2017-05-12 19:56:43 +00:00
Dehao Chen	0faf9ed31e	Add LiveRangeShrink pass to shrink live range within BB. Summary: LiveRangeShrink pass moves instruction right after the definition with the same BB if the instruction and its operands all have more than one use. This pass is inexpensive and guarantees optimal live-range within BB. Reviewers: davidxl, wmi, hfinkel, MatzeB, andreadb Reviewed By: MatzeB, andreadb Subscribers: hiraditya, jyknight, sanjoy, skatkov, gberry, jholewinski, qcolombet, javed.absar, krytarowski, atrick, spatel, RKSimon, andreadb, MatzeB, mehdi_amini, mgorny, efriedma, davide, dberlin, llvm-commits Differential Revision: https://reviews.llvm.org/D32563 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302938 91177308-0d34-0410-b5e6-96231b3b80d8	2017-05-12 19:29:27 +00:00
Simon Pilgrim	ed79276e6b	[SelectionDAG] Improve support for promotion of <1 x fX> floating point argument types (PR31088) PR31088 demonstrated that we were assuming that only integers require promotion from <1 x iX> types, when in fact float types may require it as well - in this case half floats. This patch adds support for extension/truncation for both integer and float types. Differential Revision: https://reviews.llvm.org/D32391 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301910 91177308-0d34-0410-b5e6-96231b3b80d8	2017-05-02 10:33:08 +00:00
Matt Arsenault	bdbe8280f2	Add address space mangling to lifetime intrinsics In preparation for allowing allocas to have non-0 addrspace. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299876 91177308-0d34-0410-b5e6-96231b3b80d8	2017-04-10 20:18:21 +00:00
Jonas Paulsson	1b6f5a39a9	[SelectionDAG] Optimize VSELECT->SETCC of incompatible or illegal types. Don't scalarize VSELECT->SETCC when operands/results needs to be widened, or when the type of the SETCC operands are different from those of the VSELECT. (VSELECT SETCC) and (VSELECT (AND/OR/XOR (SETCC,SETCC))) are handled. The previous splitting of VSELECT->SETCC in DAGCombiner::visitVSELECT() is no longer needed and has been removed. Updated tests: test/CodeGen/ARM/vuzp.ll test/CodeGen/NVPTX/f16x2-instructions.ll test/CodeGen/X86/2011-10-19-widen_vselect.ll test/CodeGen/X86/2011-10-21-widen-cmp.ll test/CodeGen/X86/psubus.ll test/CodeGen/X86/vselect-pcmp.ll Review: Eli Friedman, Simon Pilgrim https://reviews.llvm.org/D29489 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297930 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-16 07:17:12 +00:00
Artem Belevich	85cbde2b32	[NVPTX] Fixed lowering of unaligned loads/stores of f16 scalars and vectors. Differential Revision: https://reviews.llvm.org/D30672 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297198 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-07 20:33:38 +00:00
Taewook Oh	15497c13fd	[DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y) Summary: Currently, when 't1: i1 = setcc t2, t3, cc' followed by 't4: i1 = xor t1, Constant:i1<-1>' is folded into 't5: i1 = setcc t2, t3 !cc', SDLoc of newly created SDValue 't5' follows SDLoc of 't4', not 't1'. However, as the opcode of newly created SDValue is 'setcc', it make more sense to take DebugLoc from 't1' than 't4'. For the code below ``` extern int bar(); extern int baz(); int foo(int x, int y) { if (x != y) return bar(); else return baz(); } ``` , following is the bitcode representation of 'foo' at the end of llvm-ir level optimization: ``` define i32 @foo(i32 %x, i32 %y) !dbg !4 { entry: tail call void @llvm.dbg.value(metadata i32 %x, i64 0, metadata !9, metadata !11), !dbg !12 tail call void @llvm.dbg.value(metadata i32 %y, i64 0, metadata !10, metadata !11), !dbg !13 %cmp = icmp ne i32 %x, %y, !dbg !14 br i1 %cmp, label %if.then, label %if.else, !dbg !16 if.then: ; preds = %entry %call = tail call i32 (...) @bar() #3, !dbg !17 br label %return, !dbg !18 if.else: ; preds = %entry %call1 = tail call i32 (...) @baz() #3, !dbg !19 br label %return, !dbg !20 return: ; preds = %if.else, %if.then %retval.0 = phi i32 [ %call, %if.then ], [ %call1, %if.else ] ret i32 %retval.0, !dbg !21 } !14 = !DILocation(line: 5, column: 9, scope: !15) !16 = !DILocation(line: 5, column: 7, scope: !4) ``` As you can see, in 'entry' block, 'icmp' instruction and 'br' instruction have different debug locations. However, with current implementation, there's no distinction between debug locations of these two when they are lowered to asm instructions. This is because 'icmp' and 'br' become 'setcc' 'xor' and 'brcond' in SelectionDAG, where SDLoc of 'setcc' follows the debug location of 'icmp' but SDLOC of 'xor' and 'brcond' follows the debug location of 'br' instruction, and SDLoc of 'xor' overwrites SDLoc of 'setcc' when they are folded. This patch addresses this issue. Reviewers: atrick, bogner, andreadb, craig.topper, aprantl Reviewed By: andreadb Subscribers: jlebar, mkuper, jholewinski, andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D29813 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296825 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-02 21:58:35 +00:00
Sanjay Patel	9a9478ccb0	[DAGCombiner] add missing folds for scalar select of {-1,0,1} The motivation for filling out these select-of-constants cases goes back to D24480, where we discussed removing an IR fold from add(zext) --> select. And that goes back to: https://reviews.llvm.org/rL75531 https://reviews.llvm.org/rL159230 The idea is that we should always canonicalize patterns like this to a select-of-constants in IR because that's the smallest IR and the best for value tracking. Note that we currently do the opposite in some cases (like the cases in this patch). Ie, the proposed folds in this patch already exist in InstCombine today: https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineSelect.cpp#L1151 As this patch shows, most targets generate better machine code for simple ext/add/not ops rather than a select of constants. So the follow-up steps to make this less of a patchwork of special-case folds and missing IR canonicalization: 1. Have DAGCombiner convert any select of constants into ext/add/not ops. 2 Have InstCombine canonicalize in the other direction (create more selects). Differential Revision: https://reviews.llvm.org/D30180 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296137 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-24 17:17:33 +00:00

1 2 3 4 5 ...

298 Commits