llvm/NVPTX at a6cf40b58411785ed1586f1e727fb71482940fb5 - llvm

mirror of https://github.com/RPCSX/llvm.git synced 2025-01-05 19:29:01 +00:00

Artem Belevich b1a24afa7a	2017-02-21 22:56:05 +00:00

[NVPTX] Unify vectorization of load/stores of aggregate arguments and return values.

Original code only used vector loads/stores for explicit vector arguments. It could also do more loads/stores than necessary (e.g v5f32 would touch 8 f32 values). Aggregate types were loaded one element at a time, even the vectors contained within.

This change attempts to generalize (and simplify) parameter space loads/stores so that vector loads/stores can be used more broadly. Functionality of the patch has been verified by compiling thrust test suite and manually checking the differences between PTX generated by llvm with and without the patch.

General algorithm:
* ComputePTXValueVTs() flattens input/output argument into a flat list of scalars to load/store and returns their types and offsets.
* VectorizePTXValueVTs() uses that data to create vectorization plan which returns an array of flags marking boundaries of vectorized load/stores. Scalars are represented as 1-element vectors.
* Code that generates loads/stores implements a simple state machine that constructs a vector according to the plan.

Differential Revision: https://reviews.llvm.org/D30011

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295784 91177308-0d34-0410-b5e6-96231b3b80d8
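
The general algorithm described in the commit message can be illustrated with a small standalone sketch. This is not the code from the patch: `Elem`, `Flag`, `canMerge`, `makePlan`, and `groupSizes` are hypothetical names, and the choice of 16-byte and 8-byte access sizes and of 2- or 4-element vectors is an assumption made for illustration. The input is the flattened list of scalars (size and byte offset), the plan is an array of boundary flags, and the final loop plays the role of the state machine that closes a vector whenever a boundary flag is reached.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <vector>

// Hypothetical stand-in for one flattened scalar of an argument.
struct Elem {
  unsigned size;         // size in bytes
  std::uint64_t offset;  // byte offset within the parameter
};

// Boundary flags of the plan; a Scalar is a 1-element vector.
enum class Flag { Scalar, First, Inner, Last };

// Returns how many elements starting at idx can be covered by a single
// access of accessSize bytes, or 0 if no such vector can be formed.
static unsigned canMerge(const std::vector<Elem>& e, std::size_t idx,
                         unsigned accessSize, unsigned paramAlign) {
  const unsigned sz = e[idx].size;
  assert(sz != 0 && "elements must have a non-zero size");
  if (accessSize > paramAlign) return 0;          // parameter not aligned enough
  if (e[idx].offset % accessSize != 0) return 0;  // first element misaligned
  if (sz >= accessSize || accessSize % sz != 0) return 0;
  const unsigned n = accessSize / sz;
  if (n != 2 && n != 4) return 0;                 // assumed: only v2 and v4
  if (idx + n > e.size()) return 0;               // would run past the end
  for (unsigned j = 1; j < n; ++j) {
    // Every element must have the same size and be contiguous.
    if (e[idx + j].size != sz) return 0;
    if (e[idx + j].offset != e[idx + j - 1].offset + sz) return 0;
  }
  return n;
}

// Builds the vectorization plan: one flag per flattened scalar.
std::vector<Flag> makePlan(const std::vector<Elem>& e, unsigned paramAlign) {
  std::vector<Flag> plan(e.size(), Flag::Scalar);
  std::size_t i = 0;
  while (i < e.size()) {
    unsigned n = 0;
    for (unsigned accessSize : {16u, 8u}) {  // prefer the widest access
      n = canMerge(e, i, accessSize, paramAlign);
      if (n != 0) break;
    }
    if (n == 0) {  // stays a scalar
      ++i;
      continue;
    }
    plan[i] = Flag::First;
    for (unsigned j = 1; j + 1 < n; ++j) plan[i + j] = Flag::Inner;
    plan[i + n - 1] = Flag::Last;
    i += n;
  }
  return plan;
}

// Walks the plan like a simple state machine: accumulate elements and
// "emit" one access (recorded here as its element count) at each boundary.
std::vector<unsigned> groupSizes(const std::vector<Flag>& plan) {
  std::vector<unsigned> out;
  unsigned cur = 0;
  for (Flag f : plan) {
    ++cur;
    if (f == Flag::Scalar || f == Flag::Last) {
      out.push_back(cur);
      cur = 0;
    }
  }
  return out;
}
```

Under these assumptions, five contiguous 4-byte elements in a 16-byte-aligned parameter (the v5f32 case mentioned in the commit message) are planned as one 4-element access followed by one scalar access, so only five values are touched.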
access-non-generic.ll	NVPTX: Move InferAddressSpaces to generic code	2017-01-31 01:10:58 +00:00
add-128bit.ll	Revert revisions r234755, r234759, r234760	2015-04-13 17:47:15 +00:00
addrspacecast-gvar.ll	[NVPTX] Handle addrspacecast constant expressions in aggregate initializers	2015-04-28 17:18:30 +00:00
addrspacecast.ll	[NVPTX] Remove NVPTXFavorNonGenericAddrSpaces pass.	2016-10-31 21:51:42 +00:00
aggr-param.ll
aggregate-return.ll	[NVPTX] Unify vectorization of load/stores of aggregate arguments and return values.	2017-02-21 22:56:05 +00:00
alias.ll	[CUDA] Die gracefully when trying to output an LLVM alias.	2016-01-23 21:12:20 +00:00
annotations.ll	Whitespace cleanup in test/CodeGen/NVPTX/annotations.ll.	2016-12-14 22:32:55 +00:00
arg-lowering.ll
arithmetic-fp-sm20.ll
arithmetic-int.ll	[NVPTX] expand mul_lohi to mul_lo and mul_hi	2016-01-22 19:47:26 +00:00
atomics-with-scope.ll	[NVPTX] Added intrinsics for atom.gen.{sys|cta}.* instructions.	2016-09-28 17:25:38 +00:00
atomics.ll
bfe.ll
branch-fold.ll	Roll forward r242871	2015-07-29 18:59:09 +00:00
bug17709.ll
bug21465.ll	[NVPTX] Renamed NVPTXLowerKernelArgs -> NVPTXLowerArgs. NFC.	2016-07-20 21:44:07 +00:00
bug22246.ll	[NVPTX] Generate a more optimal sequence for select of i1	2015-01-26 19:52:20 +00:00
bug22322.ll	[NVPTX] Implement llvm.fabs.f32, llvm.max.f32, etc.	2016-09-09 21:07:26 +00:00
bug26185-2.ll	[NVPTX] Fix sign/zero-extending ldg/ldu instruction selection	2016-05-02 18:12:02 +00:00
bug26185.ll	[NVPTX] Handle ldg created from sign-/zero-extended load	2016-04-05 12:38:01 +00:00
bypass-div.ll	Use 32-bit divides instead of 64-bit divides where possible.	2015-08-11 22:16:34 +00:00
call-with-alloca-buffer.ll	Fix NVPTX/call-with-alloca-buffer.ll after r276777.	2016-07-26 18:28:33 +00:00
callchain.ll
calling-conv.ll
combine-min-max.ll	[NVPTX] Implement min/max in tablegen, rather than with custom DAGComine logic.	2017-01-18 00:09:01 +00:00
compare-int.ll
constant-vectors.ll
convergent-mir-call.ll	[NVPTX] Use different, convergent MIs for convergent calls.	2016-03-01 19:24:03 +00:00
convert-fp.ll	[NVPTX] Add fptosi tests to convert-fp.ll.	2017-01-15 16:55:54 +00:00
convert-int-sm20.ll
ctlz.ll	[NVPTX] Fix function names in ctlz.ll test. Test-only change.	2017-01-18 00:07:52 +00:00
ctpop.ll
cttz.ll
debug-file-loc.ll	[PR27284] Reverse the ownership between DICompileUnit and DISubprogram.	2016-04-15 15:57:41 +00:00
disable-opt.ll	[NVPTX] Disable performance optimizations when OptLevel==None	2016-02-04 04:15:36 +00:00
div-ri.ll
divrem-combine.ll	[NVPTX] Compute 'rem' using the result of 'div', if possible.	2016-10-28 21:44:00 +00:00
envreg.ll
extloadv.ll	[NVPTX] expand extload/truncstore for vectors of floats	2015-07-01 21:32:42 +00:00
f16-instructions.ll	[NVPTX] Unify vectorization of load/stores of aggregate arguments and return values.	2017-02-21 22:56:05 +00:00
fast-math.ll	[NVPTX] Enable combineRepeatedFPDivisors for NVPTX.	2017-02-03 15:13:50 +00:00
fcos-no-fast-math.ll	[NVPTX] Only lower sin/cos to approximate instructions if unsafe math is allowed.	2017-01-13 18:48:13 +00:00
fma-assoc.ll	[DAGCombine] require UnsafeFPMath for re-association of addition	2017-01-31 14:35:37 +00:00
fma-disable.ll
fma.ll	Check that the TLI callback enableAggressiveFMAFusion has the desired effect on FMA folding.	2015-01-14 15:36:28 +00:00
fp16.ll	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction	2015-02-27 21:17:42 +00:00
fp-contract.ll
fp-literals.ll
fsin-no-fast-math.ll	[NVPTX] Only lower sin/cos to approximate instructions if unsafe math is allowed.	2017-01-13 18:48:13 +00:00
function-align.ll	[NVPTXAsmPrinter] do not print .align on function headers	2015-03-12 01:50:30 +00:00
generic-to-nvvm-ir.ll	[IR] Remove the DIExpression field from DIGlobalVariable.	2016-12-20 02:09:43 +00:00
generic-to-nvvm.ll	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction	2015-02-27 21:17:42 +00:00
global-addrspace.ll	[NVPTX] Allow undef value as global initializer	2015-08-22 05:40:26 +00:00
global-ctor-empty.ll	[CUDA] Die if we ask the NVPTX backend to emit a global ctor/dtor.	2016-01-30 01:07:38 +00:00
global-ctor.ll	[CUDA] Die if we ask the NVPTX backend to emit a global ctor/dtor.	2016-01-30 01:07:38 +00:00
global-dtor.ll	[CUDA] Die if we ask the NVPTX backend to emit a global ctor/dtor.	2016-01-30 01:07:38 +00:00
global-ordering.ll
global-variable-big.ll	[NVPTX] Support global variables of integer type larger than i64.	2017-01-18 00:29:53 +00:00
global-visibility.ll	[NVPTX] Do not emit .hidden or .protected directives as they are not allowed by PTX.	2016-01-15 23:57:53 +00:00
globals_init.ll	The constant initialization for globals in NVPTX is generated as an	2015-06-09 16:29:34 +00:00
globals_lowering.ll	Force relocation mode to be default, regardless of what is passed to the backend.	2015-06-30 17:18:00 +00:00
gvar-init.ll
half.ll	[NVPTX] Added support for half-precision floating point.	2017-01-13 20:56:17 +00:00
i1-global.ll	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction	2015-02-27 21:17:42 +00:00
i1-int-to-fp.ll
i1-param.ll
i8-param.ll	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction	2015-02-27 21:17:42 +00:00
idioms.ll	[NVPTX] Lower integer absolute value idiom to abs instruction.	2017-01-18 00:08:44 +00:00
imad.ll
implicit-def.ll
inline-asm.ll
intrin-nocapture.ll	Reapply 239795 - [InstCombine] Propagate non-null facts to call parameters	2015-06-16 20:24:25 +00:00
intrinsic-old.ll	[NVVMIntrRange] Only set range metadata if none is already present	2016-12-22 00:51:59 +00:00
intrinsics.ll	Fix some broken CHECK lines.	2017-01-22 20:28:56 +00:00
isspacep.ll
ld-addrspace.ll	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction	2015-02-27 21:17:42 +00:00
ld-generic.ll	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction	2015-02-27 21:17:42 +00:00
ldg-invariant.ll	[NVPTX] Add tests that invariant vector loads get lowered to ld.global.nc.	2017-02-04 01:54:56 +00:00
ldparam-v4.ll	[NVPTX] Unify vectorization of load/stores of aggregate arguments and return values.	2017-02-21 22:56:05 +00:00
ldu-i8.ll
ldu-ldg.ll
ldu-reg-plus-offset.ll	[opaque pointer type] Add textual IR support for explicit type parameter to getelementptr instruction	2015-02-27 19:29:02 +00:00
lit.local.cfg
load-sext-i1.ll	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction	2015-02-27 21:17:42 +00:00
load-with-non-coherent-cache.ll	[NVPTX] Use LDG for pointer induction variables.	2015-08-05 23:11:57 +00:00
LoadStoreVectorizer.ll	[NVPTX] Enable the load-store vectorizer on nvptx.	2016-07-20 22:11:36 +00:00
local-stack-frame.ll	[NVPTX] Move NVPTXPeephole after NVPTXPrologEpilogPass	2015-07-01 20:08:06 +00:00
loop-vectorize.ll	[NVPTX] declare no vector registers	2015-07-10 04:31:56 +00:00
lower-aggr-copies.ll	[NVPTX] Unify vectorization of load/stores of aggregate arguments and return values.	2017-02-21 22:56:05 +00:00
lower-alloca.ll	NVPTX: Move InferAddressSpaces to generic code	2017-01-31 01:10:58 +00:00
lower-kernel-ptr-arg.ll	[NVPTX] Improve lowering of byval args of device functions.	2016-07-20 18:39:47 +00:00
machine-sink.ll	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction	2015-02-27 21:17:42 +00:00
MachineSink-call.ll	[NVPTX] Annotate call machine instructions as calls.	2016-02-17 17:46:50 +00:00
MachineSink-convergent.ll	NVPTX: Replace uses of cuda.syncthreads with nvvm.barrier0	2016-07-06 20:02:45 +00:00
managed.ll
math-intrins.ll	[NVPTX] Add codegen tests for llvm.fma.	2017-01-15 16:55:37 +00:00
misaligned-vector-ldst.ll	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction	2015-02-27 21:17:42 +00:00
module-inline-asm.ll
mulwide.ll
named-barriers.ll	[NVPTX] Add intrinsics to support named barriers.	2017-01-28 16:38:15 +00:00
noduplicate-syncthreads.ll	NVPTX: Replace uses of cuda.syncthreads with nvvm.barrier0	2016-07-06 20:02:45 +00:00
nounroll.ll	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction	2015-02-27 21:17:42 +00:00
nvcl-param-align.ll
nvvm-reflect-module-flag.ll	[NVPTX] Read __CUDA_FTZ from module flags in NVVMReflect.	2016-04-01 01:09:07 +00:00
nvvm-reflect.ll	[NVPTX] Let there be One True Way to set NVVMReflect params.	2017-01-15 16:54:35 +00:00
param-align.ll	[NVPTX] Make sure we adjust alignment at all call sites	2016-07-18 21:58:48 +00:00
param-load-store.ll	[NVPTX] Unify vectorization of load/stores of aggregate arguments and return values.	2017-02-21 22:56:05 +00:00
pr13291-i1-store.ll	[NVPTX] roll forward r239082	2015-06-04 21:28:26 +00:00
pr16278.ll	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction	2015-02-27 21:17:42 +00:00
pr17529.ll	[opaque pointer type] Add textual IR support for explicit type parameter to getelementptr instruction	2015-02-27 19:29:02 +00:00
refl1.ll	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction	2015-02-27 21:17:42 +00:00
reg-copy.ll	[NVPTX] allow register copy between float and int	2015-08-01 18:02:12 +00:00
reg-types.ll	[NVPTX] Use untyped (.b) integer registers in PTX.	2016-08-12 22:02:19 +00:00
rotate.ll
sched1.ll	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction	2015-02-27 21:17:42 +00:00
sched2.ll	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction	2015-02-27 21:17:42 +00:00
sext-in-reg.ll
sext-params.ll
shfl.ll	[NVPTX] Remove NVPTXFavorNonGenericAddrSpaces pass.	2016-10-31 21:51:42 +00:00
shift-parts.ll	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction	2015-02-27 21:17:42 +00:00
simple-call.ll	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction	2015-02-27 21:17:42 +00:00
sm-version-20.ll
sm-version-21.ll
sm-version-30.ll	[NVPTX] Associate a minimum PTX version for each SM architecture	2015-03-30 19:30:55 +00:00
sm-version-32.ll	[NVPTX] Associate a minimum PTX version for each SM architecture	2015-03-30 19:30:55 +00:00
sm-version-35.ll	[NVPTX] Associate a minimum PTX version for each SM architecture	2015-03-30 19:30:55 +00:00
sm-version-37.ll	[NVPTX] Associate a minimum PTX version for each SM architecture	2015-03-30 19:30:55 +00:00
sm-version-50.ll	[NVPTX] Associate a minimum PTX version for each SM architecture	2015-03-30 19:30:55 +00:00
sm-version-52.ll	[NVPTX] Associate a minimum PTX version for each SM architecture	2015-03-30 19:30:55 +00:00
sm-version-53.ll	[NVPTX] Associate a minimum PTX version for each SM architecture	2015-03-30 19:30:55 +00:00
sm-version-60.ll	[NVPTX] Add sm_60, sm_61, sm_62 targets to LLVM.	2016-07-06 21:06:10 +00:00
sm-version-61.ll	[NVPTX] Add sm_60, sm_61, sm_62 targets to LLVM.	2016-07-06 21:06:10 +00:00
sm-version-62.ll	[NVPTX] Add sm_60, sm_61, sm_62 targets to LLVM.	2016-07-06 21:06:10 +00:00
speculative-execution-divergent-target.ll	Move divergent-target test into CodeGen/NVPTX because it requires an NVPTX target.	2016-04-15 01:20:52 +00:00
sqrt-approx.ll	[NVPTX] Compute approx sqrt as 1/rsqrt(x) rather than x*rsqrt(x).	2017-01-31 23:08:57 +00:00
st-addrspace.ll
st-generic.ll
surf-read-cuda.ll	[NVPTX] roll forward r239082	2015-06-04 21:28:26 +00:00
surf-read.ll
surf-write-cuda.ll
surf-write.ll
symbol-naming.ll	Have a single way for creating unique value names.	2015-11-22 00:16:24 +00:00
TailDuplication-convergent.ll	NVPTX: Replace uses of cuda.syncthreads with nvvm.barrier0	2016-07-06 20:02:45 +00:00
tex-read-cuda.ll	[NVPTX] roll forward r239082	2015-06-04 21:28:26 +00:00
tex-read.ll
texsurf-queries.ll
tid-range.ll	[SelectionDAG] Correctly transform range metadata to AssertZExt	2017-01-06 00:11:46 +00:00
tuple-literal.ll
vec8.ll	[NVPTX] Unify vectorization of load/stores of aggregate arguments and return values.	2017-02-21 22:56:05 +00:00
vec-param-load.ll	[NVPTX] Unify vectorization of load/stores of aggregate arguments and return values.	2017-02-21 22:56:05 +00:00
vector-args.ll
vector-call.ll	[NVPTX] Unify vectorization of load/stores of aggregate arguments and return values.	2017-02-21 22:56:05 +00:00
vector-compare.ll	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction	2015-02-27 21:17:42 +00:00
vector-global.ll
vector-loads.ll	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction	2015-02-27 21:17:42 +00:00
vector-select.ll	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction	2015-02-27 21:17:42 +00:00
vector-stores.ll
weak-global.ll	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction	2015-02-27 21:17:42 +00:00
weak-linkage.ll
zero-cs.ll	llvm/test/CodeGen/NVPTX/zero-cs.ll: Relax an expression to match in -Asserts.	2016-09-21 04:43:11 +00:00
zeroext-32bit.ll	Only emit extension for zeroext/signext arguments if type is < 32 bits	2016-06-27 20:22:22 +00:00