Commit Graph

247 Commits

Author SHA1 Message Date
Artem Belevich
b1a24afa7a [NVPTX] Unify vectorization of load/stores of aggregate arguments and return values.
Original code only used vector loads/stores for explicit vector arguments.
It could also do more loads/stores than necessary (e.g v5f32 would
touch 8 f32 values). Aggregate types were loaded one element at a time,
even the vectors contained within.

This change attempts to generalize (and simplify) parameter space
loads/stores so that vector loads/stores can be used more broadly.
Functionality of the patch has been verified by compiling thrust
test suite and manually checking the differences between PTX
generated by llvm with and without the patch.

General algorithm:
* ComputePTXValueVTs() flattens input/output argument into a flat list
  of scalars to load/store and returns their types and offsets.
* VectorizePTXValueVTs() uses that data to create vectorization plan
  which returns an array of flags marking boundaries of vectorized
  load/stores. Scalars are represented as 1-element vectors.
* Code that generates loads/stores implements a simple state machine
  that constructs a vector according to the plan.

Differential Revision: https://reviews.llvm.org/D30011

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295784 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-21 22:56:05 +00:00
Justin Lebar
ff921bc617 [NVPTX] Add tests that invariant vector loads get lowered to ld.global.nc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294082 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-04 01:54:56 +00:00
Justin Lebar
77ddc3f5dd [NVPTX] Enable combineRepeatedFPDivisors for NVPTX.
Reviewers: tra

Subscribers: jholewinski, llvm-commits

Differential Revision: https://reviews.llvm.org/D29477

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294011 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-03 15:13:50 +00:00
Matt Arsenault
8c93ddba10 NVPTX: Fix not preserving volatile when expanding memset
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293851 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-02 01:20:34 +00:00
Justin Lebar
6f09ea3f57 [NVPTX] Compute approx sqrt as 1/rsqrt(x) rather than x*rsqrt(x).
x*rsqrt(x) returns NaN for x == 0, whereas 1/rsqrt(x) returns 0, as
desired.

Verified that the particular nvptx approximate instructions here do in
fact return 0 for x = 0.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293713 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-31 23:08:57 +00:00
Nicolai Haehnle
ab43652716 [DAGCombine] require UnsafeFPMath for re-association of addition
Summary:
The affected transforms all implicitly use associativity of addition,
for which we usually require unsafe math to be enabled.

The "Aggressive" flag is only meant to convey information about the
performance of the fused ops relative to a fmul+fadd sequence.

Fixes Bug 31626.

Reviewers: spatel, hfinkel, mehdi_amini, arsenm, tstellarAMD

Subscribers: jholewinski, nemanjai, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D28675

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293635 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-31 14:35:37 +00:00
Justin Lebar
6731e5394b [NVPTX] Implement NVPTXTargetLowering::getSqrtEstimate.
Summary:

This lets us lower to sqrt.approx and rsqrt.approx under more
circumstances.

* Now we emit sqrt.approx and rsqrt.approx for calls to @llvm.sqrt.f32,
  when fast-math is enabled.  Previously, we only would emit it for
  calls to @llvm.nvvm.sqrt.f.  (With this patch we no longer emit
  sqrt.approx for calls to @llvm.nvvm.sqrt.f; we rely on intcombine to
  simplify llvm.nvvm.sqrt.f into llvm.sqrt.f32.)

* Now we emit the ftz version of rsqrt.approx when ftz is enabled.
  Previously, we only emitted rsqrt.approx when ftz was disabled.

Reviewers: hfinkel

Subscribers: llvm-commits, tra, jholewinski

Differential Revision: https://reviews.llvm.org/D28508

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293605 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-31 05:58:22 +00:00
Matt Arsenault
9be098398c NVPTX: Move InferAddressSpaces to generic code
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293579 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-31 01:10:58 +00:00
Matt Arsenault
4aff9df5c8 NVPTX: Refactor NVPTXInferAddressSpaces to check TTI
Add a new TTI hook for getting the generic address space value.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293563 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-30 23:02:12 +00:00
Arpith Chacko Jacob
5692e8ecc6 [NVPTX] Add intrinsics to support named barriers.
Support for barrier synchronization between a subset of threads
in a CTA through one of sixteen explicitly specified barriers.
These intrinsics are not directly exposed in CUDA but are
critical for forthcoming support of OpenMP on NVPTX GPUs.

The intrinsics allow the synchronization of an arbitrary
(multiple of 32) number of threads in a CTA at one of 16
distinct barriers. The two intrinsics added are as follows:

call void @llvm.nvvm.barrier.n(i32 10)
waits for all threads in a CTA to arrive at named barrier #10.

call void @llvm.nvvm.barrier(i32 15, i32 992)
waits for 992 threads in a CTA to arrive at barrier #15.

Detailed description of these intrinsics are available in the PTX manual.
http://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions

Reviewers: hfinkel, jlebar
Differential Revision: https://reviews.llvm.org/D17657


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293384 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-28 16:38:15 +00:00
Benjamin Kramer
84d682db0f Fix some broken CHECK lines.
The colon is important.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292761 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-22 20:28:56 +00:00
Justin Lebar
ef5ba4ebc1 [NVPTX] Add explicit check for llvm.sqrt.f32 to intrinsics.ll.
Test-only change.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292690 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-21 00:59:23 +00:00
Artem Belevich
df5ffd6153 [NVPTX] Fix lowering of fp16 ISD::FNEG.
There's no neg.f16 instruction, so negation has to
be done via subtraction from zero.

Differential Revision: https://reviews.llvm.org/D28876

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292452 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-19 00:14:45 +00:00
Justin Lebar
dc30ded6fb [NVPTX] Support global variables of integer type larger than i64.
Reviewers: tra, majnemer

Subscribers: llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D28825

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292316 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-18 00:29:53 +00:00
Justin Lebar
db8ecafe57 [NVPTX] Implement min/max in tablegen, rather than with custom DAGComine logic.
Summary:
This change also lets us use max.{s,u}16.  There's a vague warning in a
test about this maybe being less efficient, but I could not come up with
a case where the resulting SASS (sm_35 or sm_60) was different with or
without max.{s,u}16.  It's true that nvcc seems to emit only
max.{s,u}32, but even ptxas 7.0 seems to have no problem generating
efficient SASS from max.{s,u}16 (the casts up to i32 and back down to
i16 seem to be implicit and nops, happening via register aliasing).

In the absence of evidence, better to have fewer special cases, emit
more straightforward code, etc.  In particular, if a new GPU has 16-bit
min/max instructions, we want to be able to use them.

Reviewers: tra

Subscribers: jholewinski, llvm-commits

Differential Revision: https://reviews.llvm.org/D28732

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292304 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-18 00:09:01 +00:00
Justin Lebar
f78819e87f [NVPTX] Lower integer absolute value idiom to abs instruction.
Summary: Previously we lowered it literally, to shifts and xors.

Reviewers: tra

Subscribers: jholewinski, llvm-commits

Differential Revision: https://reviews.llvm.org/D28722

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292303 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-18 00:08:44 +00:00
Justin Lebar
38c1089801 [NVPTX] Improve lowering of llvm.ctpop.
Summary:
Avoid an unnecessary conversion operation when using the result of
ctpop.i32 or ctpop.i16 as an i32, as in both cases the ptx instruction
we run returns an i32.

(Previously if we used the value as an i32, we'd do an unnecessary
zext+trunc.)

Reviewers: tra

Subscribers: jholewinski, llvm-commits

Differential Revision: https://reviews.llvm.org/D28721

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292302 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-18 00:08:27 +00:00
Justin Lebar
7470b2cf62 [NVPTX] Add lowering for llvm.bitreverse.
Reviewers: tra

Subscribers: llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D28720

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292301 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-18 00:08:10 +00:00
Justin Lebar
9ff5ce49a6 [NVPTX] Fix function names in ctlz.ll test. Test-only change.
Looks like a copy/paste mistake, all the functions in ctlz.ll were named
"ctpop".

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292300 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-18 00:07:52 +00:00
Justin Lebar
fd46a6b819 [NVPTX] Improve lowering of llvm.ctlz.
Summary:
* Disable "ctlz speculation", which inserts a branch on every ctlz(x) which
  has defined behavior on x == 0 to check whether x is, in fact zero.

* Add DAG patterns that avoid re-truncating or re-expanding the result
  of the 16- and 64-bit ctz instructions.

Reviewers: tra

Subscribers: llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D28719

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292299 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-18 00:07:35 +00:00
Justin Lebar
f4556e7ece [NVPTX] Add fptosi tests to convert-fp.ll.
These seem to have been left off by accident.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292071 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-15 16:55:54 +00:00
Justin Lebar
164f3ff43b [NVPTX] Add codegen tests for llvm.fma.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292070 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-15 16:55:37 +00:00
Justin Lebar
be5157f530 [NVPTX] Modernize intrinics.ll test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292069 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-15 16:54:57 +00:00
Justin Lebar
352f7fdadc [NVPTX] Let there be One True Way to set NVVMReflect params.
Summary:
Previously there were three ways to inform the NVVMReflect pass whether
you wanted to flush denormals to zero:

  * An LLVM command-line option
  * Parameters to the NVVMReflect constructor
  * Metadata on the module itself.

This change removes the first two, leaving only the third.

The motivation for this change, aside from simplifying things, is that
we want LLVM to be aware of whether it's operating in FTZ mode, so other
passes can use this information.  Ideally we'd have a target-generic
piece of metadata on the module.  This change moves us in that
direction.

Reviewers: tra

Subscribers: jholewinski, llvm-commits

Differential Revision: https://reviews.llvm.org/D28700

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292068 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-15 16:54:35 +00:00
Artem Belevich
f53524b4f6 [NVPTX] Added support for half-precision floating point.
Only scalar half-precision operations are supported at the moment.

- Adds general support for 'half' type in NVPTX.
- fp16 math operations are supported on sm_53+ GPUs only
  (can be disabled with --nvptx-no-f16-math).
- Type conversions to/from fp16 are supported on all GPU variants.
- On GPU variants that do not have full fp16 support (or if it's disabled),
  fp16 operations are promoted to fp32 and results are converted back
  to fp16 for storage.

Differential Revision: https://reviews.llvm.org/D28540

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291956 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-13 20:56:17 +00:00
Artem Belevich
e41bb16926 [NVPTX] Only lower sin/cos to approximate instructions if unsafe math is allowed.
Previously we'd always lower @llvm.{sin,cos}.f32 to {sin.cos}.approx.f32
instruction even when unsafe FP math was not allowed.

Clang-generated IR is not affected by this as it uses precise sin/cos
from CUDA's libdevice when unsafe math is disabled.

Differential Revision: https://reviews.llvm.org/D28619

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291936 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-13 18:48:13 +00:00
Justin Lebar
e9bf848e46 [TM] Restore default TargetOptions in TargetMachine::resetTargetOptions.
Summary:
Previously if you had

 * a function with the fast-math-enabled attr, followed by
 * a function without the fast-math attr,

the second function would inherit the first function's fast-math-ness.

This means that mixing fast-math and non-fast-math functions in a module
was completely broken unless you explicitly annotated every
non-fast-math function with "unsafe-fp-math"="false".  This appears to
have been broken since r176986 (March 2013), when the resetTargetOptions
function was introduced.

This patch tests the correct behavior as best we can.  I don't think I
can test FPDenormalMode and NoTrappingFPMath, because they aren't used
in any backends during function lowering.  Surprisingly, I also can't
find any uses at all of LessPreciseFPMAD affecting generated code.

The NVPTX/fast-math.ll test changes are an expected result of fixing
this bug.  When FMA is disabled, we emit add as "add.rn.f32", which
prevents fma combining.  Before this patch, fast-math was enabled in all
functions following the one which explicitly enabled it on itself, so we
were emitting plain "add.f32" where we should have generated
"add.rn.f32".

Reviewers: mkuper

Subscribers: hfinkel, majnemer, jholewinski, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D28507

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291618 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-10 23:43:04 +00:00
Justin Lebar
9975cecf1a [NVPTX] Add CHECK-LABEL where appropriate to fast-math.ll test.
Also fix up whitespace.

Test-only change.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291617 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-10 23:42:46 +00:00
David Majnemer
5b3f19dc7f [SelectionDAG] Correctly transform range metadata to AssertZExt
We used the logBase2 of the high instead of the ceilLogBase2 resulting
in the wrong result for certain values.  For example, it resulted in an
i1 AssertZExt when the exclusive portion of the range was 3.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291196 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-06 00:11:46 +00:00
David Majnemer
f35020be62 [NVVMIntrRange] Only set range metadata if none is already present
The range metadata inserted by NVVMIntrRange is pessimistic, range
metadata already present could be more precise.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@290294 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-22 00:51:59 +00:00
Adrian Prantl
7b500b4bdf [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGLobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

This reapplies r289902 with additional testcase upgrades and a change
to the Bitcode record for DIGlobalVariable, that makes upgrading the
old format unambiguous also for variables without DIExpressions.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@290153 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-20 02:09:43 +00:00
Adrian Prantl
096faa974a Revert "[IR] Remove the DIExpression field from DIGlobalVariable."
This reverts commit 289920 (again).
I forgot to implement a Bitcode upgrade for the case where a DIGlobalVariable
has not DIExpression. Unfortunately it is not possible to safely upgrade
these variables without adding a flag to the bitcode record indicating which
version they are.
My plan of record is to roll the planned follow-up patch that adds a
unit: field to DIGlobalVariable into this patch before recomitting.
This way we only need one Bitcode upgrade for both changes (with a
version flag in the bitcode record to safely distinguish the record
formats).

Sorry for the churn!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289982 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-16 19:39:01 +00:00
Adrian Prantl
eb38a2a075 [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGLobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

This reapplies r289902 with additional testcase upgrades.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289920 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-16 04:25:54 +00:00
Adrian Prantl
7766e56d48 Revert "[IR] Remove the DIExpression field from DIGlobalVariable."
This reverts commit 289902 while investigating bot berakage.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289906 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-16 01:00:30 +00:00
Adrian Prantl
1b11b0778e [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGLobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289902 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-16 00:36:43 +00:00
Justin Lebar
1df9c48f11 Whitespace cleanup in test/CodeGen/NVPTX/annotations.ll.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289730 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-14 22:32:55 +00:00
Justin Lebar
2eea31ff42 [NVPTX] Support .maxnreg annotation.
Reviewers: tra

Subscribers: llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D27638

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289729 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-14 22:32:50 +00:00
Justin Lebar
8ddde8c45f [NVPTX] Remove NVPTXFavorNonGenericAddrSpaces pass.
Summary:
This has been replaced by the NVPTXInferAddressSpaces pass.  We've had
the new one as the default with the old one accessible via a flag for
some months now, and we've had no problems.

Reviewers: tra

Subscribers: llvm-commits, jholewinski, jingyue, mgorny

Differential Revision: https://reviews.llvm.org/D26165

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285642 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-31 21:51:42 +00:00
Justin Lebar
30c499dfda [NVPTX] Compute 'rem' using the result of 'div', if possible.
Summary:
In isel, transform

  Num % Den

into

  Num - (Num / Den) * Den

if the result of Num / Den is already available.

Reviewers: tra

Subscribers: hfinkel, llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D26090

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285461 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-28 21:44:00 +00:00
Artem Belevich
27140649cc [NVPTX] Added intrinsics for atom.gen.{sys|cta}.* instructions.
These are only available on sm_60+ GPUs.

Differential Revision: https://reviews.llvm.org/D24943

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282607 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-28 17:25:38 +00:00
NAKAMURA Takumi
bba417d327 llvm/test/CodeGen/NVPTX/zero-cs.ll: Relax an expression to match in -Asserts.
LLVM ERROR: Cannot select: 0x3607bf0: i32 = ExternalSymbol'__powidf2'

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282053 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-21 04:43:11 +00:00
Jacques Pienaar
1d75557110 [NVPTX] Check if callsite is defined when computing argument allignment
Summary: In getArgumentAlignment check if the ImmutableCallSite pointer CS is non-null before dereferencing. If CS is 0x0 fall back to the ABI type alignment else compute the alignment as before.

Reviewers: eliben, jpienaar

Subscribers: jlebar, vchuravy, cfe-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D9168

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282045 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-21 01:57:57 +00:00
Peter Collingbourne
5420de3f15 DebugInfo: New metadata representation for global variables.
This patch reverses the edge from DIGlobalVariable to GlobalVariable.
This will allow us to more easily preserve debug info metadata when
manipulating global variables.

Fixes PR30362. A program for upgrading test cases is attached to that
bug.

Differential Revision: http://reviews.llvm.org/D20147

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281284 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-13 01:12:59 +00:00
Justin Lebar
877859e49f [NVPTX] Use ldg for explicitly invariant loads.
Summary:
With this change (plus some changes to prevent !invariant from being
clobbered within llvm), clang will be able to model the __ldg CUDA
builtin as an invariant load, rather than as a target-specific llvm
intrinsic.  This will let the optimizer play with these loads --
specifically, we should be able to vectorize them in the load-store
vectorizer.

Reviewers: tra

Subscribers: jholewinski, hfinkel, llvm-commits, chandlerc

Differential Revision: https://reviews.llvm.org/D23477

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281152 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-11 01:39:04 +00:00
Justin Lebar
8458ab1e9c [NVPTX] Implement llvm.fabs.f32, llvm.max.f32, etc.
Summary:
Previously these only worked via NVPTX-specific intrinsics.

This change will allow us to convert these target-specific intrinsics
into the general LLVM versions, allowing existing LLVM passes to reason
about their behavior.

It also gets us some minor codegen improvements as-is, from situations
where we canonicalize code into one of these llvm intrinsics.

Reviewers: majnemer

Subscribers: llvm-commits, jholewinski, tra

Differential Revision: https://reviews.llvm.org/D24300

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281092 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-09 21:07:26 +00:00
Justin Lebar
65cdc468ea [NVPTX] Switch nvptx-use-infer-addrspace to true.
Summary:
This switches us to use a different, more powerful algorithm for address
space inference.  I've tested this locally and it seems to work great.
Once we're more confident in it, we can remove the old pass altogether.

Reviewers: jingyue

Subscribers: llvm-commits, tra, jholewinski

Differential Revision: https://reviews.llvm.org/D23694

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@279317 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-19 20:46:45 +00:00
Artem Belevich
e10646be3b [NVPTX] Use untyped (.b) integer registers in PTX.
This bring LLVM-generated PTX closer to what nvcc generates and avoids
triggering issues in ptxas.

For instance, ptxas does not accept .s16 (or .u16) registers as operands
for .fp16 instructions.

Differential Revision: https://reviews.llvm.org/D23460

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@278568 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-12 22:02:19 +00:00
Artem Belevich
23d7717d3d [NVPTX] remove unnecessary named metadata update that happens to break debug info.
Also added test case to verify IR changes done by NVPTXGenericToNVVM pass.

Differential Revision: https://reviews.llvm.org/D22837

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277520 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-02 20:58:24 +00:00
Justin Lebar
2b8ecef8b7 Fix NVPTX/call-with-alloca-buffer.ll after r276777.
r276777 makes InstSimplify stronger, letting it see through some
unnecessary addrspace casts.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276786 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-26 18:28:33 +00:00
Justin Lebar
e1e21627c0 [NVPTX] Enable the load-store vectorizer on nvptx.
Reviewers: tra

Subscribers: jholewinski, arsenm, asbirlea

Differential Revision: https://reviews.llvm.org/D22592

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276196 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-20 22:11:36 +00:00