llvm/test/CodeGen/NVPTX
Artem Belevich a710d0d215 [NVPTX] Improve lowering of byval args of device functions.
Avoid unnecessary spills of such vars to local space on SASS level and
pointer space conversion.

Instead, make a local copy with appropriate addrspacecasts and let
LLVM optimize them away when possible.

This allows loading value of the argument using [symbol+offset]
instead of converting argument to general space pointer and using it
for indexing (which also implicitly converts param space pointer to
local space one on SASS level and triggers copying of argument into
local space in the process).

This reduces call overhead, uses less registers and reduces overall
SASS size by 2-4%.

Differential Review: http://reviews.llvm.org/D21421

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273313 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-21 20:30:26 +00:00
..
access-non-generic.ll
add-128bit.ll
addrspacecast-gvar.ll
addrspacecast.ll
aggr-param.ll
alias.ll
annotations.ll
arg-lowering.ll
arithmetic-fp-sm20.ll
arithmetic-int.ll
atomics.ll
bfe.ll
branch-fold.ll
bug17709.ll
bug21465.ll [NVPTX] Improve lowering of byval args of device functions. 2016-06-21 20:30:26 +00:00
bug22246.ll
bug22322.ll
bug26185-2.ll
bug26185.ll
bypass-div.ll
call-with-alloca-buffer.ll
callchain.ll
calling-conv.ll
combine-min-max.ll
compare-int.ll
constant-vectors.ll
convergent-mir-call.ll
convert-fp.ll
convert-int-sm20.ll
ctlz.ll
ctpop.ll
cttz.ll
debug-file-loc.ll
disable-opt.ll
div-ri.ll
envreg.ll
extloadv.ll
fast-math.ll
fma-assoc.ll
fma-disable.ll
fma.ll
fp16.ll
fp-contract.ll
fp-literals.ll
function-align.ll
generic-to-nvvm.ll
global-addrspace.ll
global-ctor-empty.ll
global-ctor.ll
global-dtor.ll
global-ordering.ll
global-visibility.ll
globals_init.ll
globals_lowering.ll
gvar-init.ll
half.ll
i1-global.ll
i1-int-to-fp.ll
i1-param.ll
i8-param.ll
imad.ll
implicit-def.ll
inline-asm.ll
intrin-nocapture.ll
intrinsic-old.ll
intrinsics.ll
isspacep.ll
ld-addrspace.ll
ld-generic.ll
ldparam-v4.ll
ldu-i8.ll
ldu-ldg.ll
ldu-reg-plus-offset.ll
lit.local.cfg
load-sext-i1.ll
load-with-non-coherent-cache.ll
local-stack-frame.ll
loop-vectorize.ll
lower-aggr-copies.ll
lower-alloca.ll
lower-kernel-ptr-arg.ll [NVPTX] Improve lowering of byval args of device functions. 2016-06-21 20:30:26 +00:00
machine-sink.ll
MachineSink-call.ll
MachineSink-convergent.ll
managed.ll
misaligned-vector-ldst.ll
module-inline-asm.ll
mulwide.ll
noduplicate-syncthreads.ll
nounroll.ll
nvcl-param-align.ll
nvvm-reflect-module-flag.ll
nvvm-reflect.ll
param-align.ll
pr13291-i1-store.ll
pr16278.ll
pr17529.ll
refl1.ll
reg-copy.ll
rotate.ll
rsqrt.ll
sched1.ll
sched2.ll
sext-in-reg.ll
sext-params.ll
shfl.ll [NVPTX] Add intrinsics for shfl instructions. 2016-06-09 20:04:08 +00:00
shift-parts.ll
simple-call.ll
sm-version-20.ll
sm-version-21.ll
sm-version-30.ll
sm-version-32.ll
sm-version-35.ll
sm-version-37.ll
sm-version-50.ll
sm-version-52.ll
sm-version-53.ll
speculative-execution-divergent-target.ll
st-addrspace.ll
st-generic.ll
surf-read-cuda.ll
surf-read.ll
surf-write-cuda.ll
surf-write.ll
symbol-naming.ll
TailDuplication-convergent.ll
tex-read-cuda.ll
tex-read.ll
texsurf-queries.ll
tuple-literal.ll
vec8.ll
vec-param-load.ll
vector-args.ll
vector-call.ll
vector-compare.ll
vector-global.ll
vector-loads.ll
vector-return.ll
vector-select.ll
vector-stores.ll
weak-global.ll
weak-linkage.ll