llvm/test/CodeGen/NVPTX
Artem Belevich b1a24afa7a [NVPTX] Unify vectorization of load/stores of aggregate arguments and return values.
Original code only used vector loads/stores for explicit vector arguments.
It could also do more loads/stores than necessary (e.g v5f32 would
touch 8 f32 values). Aggregate types were loaded one element at a time,
even the vectors contained within.

This change attempts to generalize (and simplify) parameter space
loads/stores so that vector loads/stores can be used more broadly.
Functionality of the patch has been verified by compiling thrust
test suite and manually checking the differences between PTX
generated by llvm with and without the patch.

General algorithm:
* ComputePTXValueVTs() flattens input/output argument into a flat list
  of scalars to load/store and returns their types and offsets.
* VectorizePTXValueVTs() uses that data to create vectorization plan
  which returns an array of flags marking boundaries of vectorized
  load/stores. Scalars are represented as 1-element vectors.
* Code that generates loads/stores implements a simple state machine
  that constructs a vector according to the plan.

Differential Revision: https://reviews.llvm.org/D30011

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295784 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-21 22:56:05 +00:00
..
access-non-generic.ll NVPTX: Move InferAddressSpaces to generic code 2017-01-31 01:10:58 +00:00
add-128bit.ll Revert revisions r234755, r234759, r234760 2015-04-13 17:47:15 +00:00
addrspacecast-gvar.ll [NVPTX] Handle addrspacecast constant expressions in aggregate initializers 2015-04-28 17:18:30 +00:00
addrspacecast.ll [NVPTX] Remove NVPTXFavorNonGenericAddrSpaces pass. 2016-10-31 21:51:42 +00:00
aggr-param.ll
aggregate-return.ll [NVPTX] Unify vectorization of load/stores of aggregate arguments and return values. 2017-02-21 22:56:05 +00:00
alias.ll [CUDA] Die gracefully when trying to output an LLVM alias. 2016-01-23 21:12:20 +00:00
annotations.ll Whitespace cleanup in test/CodeGen/NVPTX/annotations.ll. 2016-12-14 22:32:55 +00:00
arg-lowering.ll
arithmetic-fp-sm20.ll
arithmetic-int.ll [NVPTX] expand mul_lohi to mul_lo and mul_hi 2016-01-22 19:47:26 +00:00
atomics-with-scope.ll [NVPTX] Added intrinsics for atom.gen.{sys|cta}.* instructions. 2016-09-28 17:25:38 +00:00
atomics.ll
bfe.ll
branch-fold.ll Roll forward r242871 2015-07-29 18:59:09 +00:00
bug17709.ll
bug21465.ll [NVPTX] Renamed NVPTXLowerKernelArgs -> NVPTXLowerArgs. NFC. 2016-07-20 21:44:07 +00:00
bug22246.ll [NVPTX] Generate a more optimal sequence for select of i1 2015-01-26 19:52:20 +00:00
bug22322.ll [NVPTX] Implement llvm.fabs.f32, llvm.max.f32, etc. 2016-09-09 21:07:26 +00:00
bug26185-2.ll [NVPTX] Fix sign/zero-extending ldg/ldu instruction selection 2016-05-02 18:12:02 +00:00
bug26185.ll [NVPTX] Handle ldg created from sign-/zero-extended load 2016-04-05 12:38:01 +00:00
bypass-div.ll Use 32-bit divides instead of 64-bit divides where possible. 2015-08-11 22:16:34 +00:00
call-with-alloca-buffer.ll Fix NVPTX/call-with-alloca-buffer.ll after r276777. 2016-07-26 18:28:33 +00:00
callchain.ll
calling-conv.ll
combine-min-max.ll [NVPTX] Implement min/max in tablegen, rather than with custom DAGComine logic. 2017-01-18 00:09:01 +00:00
compare-int.ll
constant-vectors.ll
convergent-mir-call.ll [NVPTX] Use different, convergent MIs for convergent calls. 2016-03-01 19:24:03 +00:00
convert-fp.ll [NVPTX] Add fptosi tests to convert-fp.ll. 2017-01-15 16:55:54 +00:00
convert-int-sm20.ll
ctlz.ll [NVPTX] Fix function names in ctlz.ll test. Test-only change. 2017-01-18 00:07:52 +00:00
ctpop.ll
cttz.ll
debug-file-loc.ll [PR27284] Reverse the ownership between DICompileUnit and DISubprogram. 2016-04-15 15:57:41 +00:00
disable-opt.ll [NVPTX] Disable performance optimizations when OptLevel==None 2016-02-04 04:15:36 +00:00
div-ri.ll
divrem-combine.ll [NVPTX] Compute 'rem' using the result of 'div', if possible. 2016-10-28 21:44:00 +00:00
envreg.ll
extloadv.ll [NVPTX] expand extload/truncstore for vectors of floats 2015-07-01 21:32:42 +00:00
f16-instructions.ll [NVPTX] Unify vectorization of load/stores of aggregate arguments and return values. 2017-02-21 22:56:05 +00:00
fast-math.ll [NVPTX] Enable combineRepeatedFPDivisors for NVPTX. 2017-02-03 15:13:50 +00:00
fcos-no-fast-math.ll [NVPTX] Only lower sin/cos to approximate instructions if unsafe math is allowed. 2017-01-13 18:48:13 +00:00
fma-assoc.ll [DAGCombine] require UnsafeFPMath for re-association of addition 2017-01-31 14:35:37 +00:00
fma-disable.ll
fma.ll Check that the TLI callback enableAggressiveFMAFusion has the desired effect on FMA folding. 2015-01-14 15:36:28 +00:00
fp16.ll [opaque pointer type] Add textual IR support for explicit type parameter to load instruction 2015-02-27 21:17:42 +00:00
fp-contract.ll
fp-literals.ll
fsin-no-fast-math.ll [NVPTX] Only lower sin/cos to approximate instructions if unsafe math is allowed. 2017-01-13 18:48:13 +00:00
function-align.ll [NVPTXAsmPrinter] do not print .align on function headers 2015-03-12 01:50:30 +00:00
generic-to-nvvm-ir.ll [IR] Remove the DIExpression field from DIGlobalVariable. 2016-12-20 02:09:43 +00:00
generic-to-nvvm.ll [opaque pointer type] Add textual IR support for explicit type parameter to load instruction 2015-02-27 21:17:42 +00:00
global-addrspace.ll [NVPTX] Allow undef value as global initializer 2015-08-22 05:40:26 +00:00
global-ctor-empty.ll [CUDA] Die if we ask the NVPTX backend to emit a global ctor/dtor. 2016-01-30 01:07:38 +00:00
global-ctor.ll [CUDA] Die if we ask the NVPTX backend to emit a global ctor/dtor. 2016-01-30 01:07:38 +00:00
global-dtor.ll [CUDA] Die if we ask the NVPTX backend to emit a global ctor/dtor. 2016-01-30 01:07:38 +00:00
global-ordering.ll
global-variable-big.ll [NVPTX] Support global variables of integer type larger than i64. 2017-01-18 00:29:53 +00:00
global-visibility.ll [NVPTX] Do not emit .hidden or .protected directives as they are not allowed by PTX. 2016-01-15 23:57:53 +00:00
globals_init.ll The constant initialization for globals in NVPTX is generated as an 2015-06-09 16:29:34 +00:00
globals_lowering.ll Force relocation mode to be default, regardless of what is passed to the backend. 2015-06-30 17:18:00 +00:00
gvar-init.ll
half.ll [NVPTX] Added support for half-precision floating point. 2017-01-13 20:56:17 +00:00
i1-global.ll [opaque pointer type] Add textual IR support for explicit type parameter to load instruction 2015-02-27 21:17:42 +00:00
i1-int-to-fp.ll
i1-param.ll
i8-param.ll [opaque pointer type] Add textual IR support for explicit type parameter to load instruction 2015-02-27 21:17:42 +00:00
idioms.ll [NVPTX] Lower integer absolute value idiom to abs instruction. 2017-01-18 00:08:44 +00:00
imad.ll
implicit-def.ll
inline-asm.ll
intrin-nocapture.ll Reapply 239795 - [InstCombine] Propagate non-null facts to call parameters 2015-06-16 20:24:25 +00:00
intrinsic-old.ll [NVVMIntrRange] Only set range metadata if none is already present 2016-12-22 00:51:59 +00:00
intrinsics.ll Fix some broken CHECK lines. 2017-01-22 20:28:56 +00:00
isspacep.ll
ld-addrspace.ll [opaque pointer type] Add textual IR support for explicit type parameter to load instruction 2015-02-27 21:17:42 +00:00
ld-generic.ll [opaque pointer type] Add textual IR support for explicit type parameter to load instruction 2015-02-27 21:17:42 +00:00
ldg-invariant.ll [NVPTX] Add tests that invariant vector loads get lowered to ld.global.nc. 2017-02-04 01:54:56 +00:00
ldparam-v4.ll [NVPTX] Unify vectorization of load/stores of aggregate arguments and return values. 2017-02-21 22:56:05 +00:00
ldu-i8.ll
ldu-ldg.ll
ldu-reg-plus-offset.ll [opaque pointer type] Add textual IR support for explicit type parameter to getelementptr instruction 2015-02-27 19:29:02 +00:00
lit.local.cfg
load-sext-i1.ll [opaque pointer type] Add textual IR support for explicit type parameter to load instruction 2015-02-27 21:17:42 +00:00
load-with-non-coherent-cache.ll [NVPTX] Use LDG for pointer induction variables. 2015-08-05 23:11:57 +00:00
LoadStoreVectorizer.ll [NVPTX] Enable the load-store vectorizer on nvptx. 2016-07-20 22:11:36 +00:00
local-stack-frame.ll [NVPTX] Move NVPTXPeephole after NVPTXPrologEpilogPass 2015-07-01 20:08:06 +00:00
loop-vectorize.ll [NVPTX] declare no vector registers 2015-07-10 04:31:56 +00:00
lower-aggr-copies.ll [NVPTX] Unify vectorization of load/stores of aggregate arguments and return values. 2017-02-21 22:56:05 +00:00
lower-alloca.ll NVPTX: Move InferAddressSpaces to generic code 2017-01-31 01:10:58 +00:00
lower-kernel-ptr-arg.ll [NVPTX] Improve lowering of byval args of device functions. 2016-07-20 18:39:47 +00:00
machine-sink.ll [opaque pointer type] Add textual IR support for explicit type parameter to load instruction 2015-02-27 21:17:42 +00:00
MachineSink-call.ll [NVPTX] Annotate call machine instructions as calls. 2016-02-17 17:46:50 +00:00
MachineSink-convergent.ll NVPTX: Replace uses of cuda.syncthreads with nvvm.barrier0 2016-07-06 20:02:45 +00:00
managed.ll
math-intrins.ll [NVPTX] Add codegen tests for llvm.fma. 2017-01-15 16:55:37 +00:00
misaligned-vector-ldst.ll [opaque pointer type] Add textual IR support for explicit type parameter to load instruction 2015-02-27 21:17:42 +00:00
module-inline-asm.ll
mulwide.ll
named-barriers.ll [NVPTX] Add intrinsics to support named barriers. 2017-01-28 16:38:15 +00:00
noduplicate-syncthreads.ll NVPTX: Replace uses of cuda.syncthreads with nvvm.barrier0 2016-07-06 20:02:45 +00:00
nounroll.ll [opaque pointer type] Add textual IR support for explicit type parameter to load instruction 2015-02-27 21:17:42 +00:00
nvcl-param-align.ll
nvvm-reflect-module-flag.ll [NVPTX] Read __CUDA_FTZ from module flags in NVVMReflect. 2016-04-01 01:09:07 +00:00
nvvm-reflect.ll [NVPTX] Let there be One True Way to set NVVMReflect params. 2017-01-15 16:54:35 +00:00
param-align.ll [NVPTX] Make sure we adjust alignment at all call sites 2016-07-18 21:58:48 +00:00
param-load-store.ll [NVPTX] Unify vectorization of load/stores of aggregate arguments and return values. 2017-02-21 22:56:05 +00:00
pr13291-i1-store.ll [NVPTX] roll forward r239082 2015-06-04 21:28:26 +00:00
pr16278.ll [opaque pointer type] Add textual IR support for explicit type parameter to load instruction 2015-02-27 21:17:42 +00:00
pr17529.ll [opaque pointer type] Add textual IR support for explicit type parameter to getelementptr instruction 2015-02-27 19:29:02 +00:00
refl1.ll [opaque pointer type] Add textual IR support for explicit type parameter to load instruction 2015-02-27 21:17:42 +00:00
reg-copy.ll [NVPTX] allow register copy between float and int 2015-08-01 18:02:12 +00:00
reg-types.ll [NVPTX] Use untyped (.b) integer registers in PTX. 2016-08-12 22:02:19 +00:00
rotate.ll
sched1.ll [opaque pointer type] Add textual IR support for explicit type parameter to load instruction 2015-02-27 21:17:42 +00:00
sched2.ll [opaque pointer type] Add textual IR support for explicit type parameter to load instruction 2015-02-27 21:17:42 +00:00
sext-in-reg.ll
sext-params.ll
shfl.ll [NVPTX] Remove NVPTXFavorNonGenericAddrSpaces pass. 2016-10-31 21:51:42 +00:00
shift-parts.ll [opaque pointer type] Add textual IR support for explicit type parameter to load instruction 2015-02-27 21:17:42 +00:00
simple-call.ll [opaque pointer type] Add textual IR support for explicit type parameter to load instruction 2015-02-27 21:17:42 +00:00
sm-version-20.ll
sm-version-21.ll
sm-version-30.ll [NVPTX] Associate a minimum PTX version for each SM architecture 2015-03-30 19:30:55 +00:00
sm-version-32.ll [NVPTX] Associate a minimum PTX version for each SM architecture 2015-03-30 19:30:55 +00:00
sm-version-35.ll [NVPTX] Associate a minimum PTX version for each SM architecture 2015-03-30 19:30:55 +00:00
sm-version-37.ll [NVPTX] Associate a minimum PTX version for each SM architecture 2015-03-30 19:30:55 +00:00
sm-version-50.ll [NVPTX] Associate a minimum PTX version for each SM architecture 2015-03-30 19:30:55 +00:00
sm-version-52.ll [NVPTX] Associate a minimum PTX version for each SM architecture 2015-03-30 19:30:55 +00:00
sm-version-53.ll [NVPTX] Associate a minimum PTX version for each SM architecture 2015-03-30 19:30:55 +00:00
sm-version-60.ll [NVPTX] Add sm_60, sm_61, sm_62 targets to LLVM. 2016-07-06 21:06:10 +00:00
sm-version-61.ll [NVPTX] Add sm_60, sm_61, sm_62 targets to LLVM. 2016-07-06 21:06:10 +00:00
sm-version-62.ll [NVPTX] Add sm_60, sm_61, sm_62 targets to LLVM. 2016-07-06 21:06:10 +00:00
speculative-execution-divergent-target.ll Move divergent-target test into CodeGen/NVPTX because it requires an NVPTX target. 2016-04-15 01:20:52 +00:00
sqrt-approx.ll [NVPTX] Compute approx sqrt as 1/rsqrt(x) rather than x*rsqrt(x). 2017-01-31 23:08:57 +00:00
st-addrspace.ll
st-generic.ll
surf-read-cuda.ll [NVPTX] roll forward r239082 2015-06-04 21:28:26 +00:00
surf-read.ll
surf-write-cuda.ll
surf-write.ll
symbol-naming.ll Have a single way for creating unique value names. 2015-11-22 00:16:24 +00:00
TailDuplication-convergent.ll NVPTX: Replace uses of cuda.syncthreads with nvvm.barrier0 2016-07-06 20:02:45 +00:00
tex-read-cuda.ll [NVPTX] roll forward r239082 2015-06-04 21:28:26 +00:00
tex-read.ll
texsurf-queries.ll
tid-range.ll [SelectionDAG] Correctly transform range metadata to AssertZExt 2017-01-06 00:11:46 +00:00
tuple-literal.ll
vec8.ll [NVPTX] Unify vectorization of load/stores of aggregate arguments and return values. 2017-02-21 22:56:05 +00:00
vec-param-load.ll [NVPTX] Unify vectorization of load/stores of aggregate arguments and return values. 2017-02-21 22:56:05 +00:00
vector-args.ll
vector-call.ll [NVPTX] Unify vectorization of load/stores of aggregate arguments and return values. 2017-02-21 22:56:05 +00:00
vector-compare.ll [opaque pointer type] Add textual IR support for explicit type parameter to load instruction 2015-02-27 21:17:42 +00:00
vector-global.ll
vector-loads.ll [opaque pointer type] Add textual IR support for explicit type parameter to load instruction 2015-02-27 21:17:42 +00:00
vector-select.ll [opaque pointer type] Add textual IR support for explicit type parameter to load instruction 2015-02-27 21:17:42 +00:00
vector-stores.ll
weak-global.ll [opaque pointer type] Add textual IR support for explicit type parameter to load instruction 2015-02-27 21:17:42 +00:00
weak-linkage.ll
zero-cs.ll llvm/test/CodeGen/NVPTX/zero-cs.ll: Relax an expression to match in -Asserts. 2016-09-21 04:43:11 +00:00
zeroext-32bit.ll Only emit extension for zeroext/signext arguments if type is < 32 bits 2016-06-27 20:22:22 +00:00