llvm/NVPTX at 99719f40ee60d4dee69405fa25dbb88acda89bc5 - llvm

RPCSX/llvm

mirror of https://github.com/RPCSX/llvm.git synced 2025-01-07 20:40:46 +00:00

Artem Belevich a710d0d215 [NVPTX] Improve lowering of byval args of device functions.	2016-06-21 20:30:26 +00:00

Avoid unnecessary spills of such vars to local space on SASS level and pointer space conversion. Instead, make a local copy with appropriate addrspacecasts and let LLVM optimize them away when possible.

This allows loading value of the argument using [symbol+offset] instead of converting argument to general space pointer and using it for indexing (which also implicitly converts param space pointer to local space one on SASS level and triggers copying of argument into local space in the process).

This reduces call overhead, uses less registers and reduces overall SASS size by 2-4%.

Differential Review: http://reviews.llvm.org/D21421

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273313 91177308-0d34-0410-b5e6-96231b3b80d8
access-non-generic.ll	[NVPTX] Adds a new address space inference pass.	2016-03-20 20:59:20 +00:00
add-128bit.ll
addrspacecast-gvar.ll
addrspacecast.ll
aggr-param.ll
alias.ll	[CUDA] Die gracefully when trying to output an LLVM alias.	2016-01-23 21:12:20 +00:00
annotations.ll
arg-lowering.ll
arithmetic-fp-sm20.ll
arithmetic-int.ll	[NVPTX] expand mul_lohi to mul_lo and mul_hi	2016-01-22 19:47:26 +00:00
atomics.ll
bfe.ll
branch-fold.ll
bug17709.ll
bug21465.ll	[NVPTX] Improve lowering of byval args of device functions.	2016-06-21 20:30:26 +00:00
bug22246.ll
bug22322.ll
bug26185-2.ll	[NVPTX] Fix sign/zero-extending ldg/ldu instruction selection	2016-05-02 18:12:02 +00:00
bug26185.ll	[NVPTX] Handle ldg created from sign-/zero-extended load	2016-04-05 12:38:01 +00:00
bypass-div.ll
call-with-alloca-buffer.ll
callchain.ll
calling-conv.ll
combine-min-max.ll
compare-int.ll
constant-vectors.ll
convergent-mir-call.ll	[NVPTX] Use different, convergent MIs for convergent calls.	2016-03-01 19:24:03 +00:00
convert-fp.ll
convert-int-sm20.ll
ctlz.ll
ctpop.ll
cttz.ll
debug-file-loc.ll	[PR27284] Reverse the ownership between DICompileUnit and DISubprogram.	2016-04-15 15:57:41 +00:00
disable-opt.ll	[NVPTX] Disable performance optimizations when OptLevel==None	2016-02-04 04:15:36 +00:00
div-ri.ll
envreg.ll
extloadv.ll
fast-math.ll
fma-assoc.ll
fma-disable.ll
fma.ll
fp16.ll
fp-contract.ll
fp-literals.ll
function-align.ll
generic-to-nvvm.ll
global-addrspace.ll
global-ctor-empty.ll	[CUDA] Die if we ask the NVPTX backend to emit a global ctor/dtor.	2016-01-30 01:07:38 +00:00
global-ctor.ll	[CUDA] Die if we ask the NVPTX backend to emit a global ctor/dtor.	2016-01-30 01:07:38 +00:00
global-dtor.ll	[CUDA] Die if we ask the NVPTX backend to emit a global ctor/dtor.	2016-01-30 01:07:38 +00:00
global-ordering.ll
global-visibility.ll	[NVPTX] Do not emit .hidden or .protected directives as they are not allowed by PTX.	2016-01-15 23:57:53 +00:00
globals_init.ll
globals_lowering.ll
gvar-init.ll
half.ll
i1-global.ll
i1-int-to-fp.ll
i1-param.ll
i8-param.ll
imad.ll
implicit-def.ll
inline-asm.ll
intrin-nocapture.ll
intrinsic-old.ll	[NVPTX] Added NVVMIntrRange pass	2016-05-26 17:02:56 +00:00
intrinsics.ll
isspacep.ll
ld-addrspace.ll
ld-generic.ll
ldparam-v4.ll
ldu-i8.ll
ldu-ldg.ll
ldu-reg-plus-offset.ll
lit.local.cfg
load-sext-i1.ll
load-with-non-coherent-cache.ll
local-stack-frame.ll
loop-vectorize.ll
lower-aggr-copies.ll
lower-alloca.ll
lower-kernel-ptr-arg.ll	[NVPTX] Improve lowering of byval args of device functions.	2016-06-21 20:30:26 +00:00
machine-sink.ll
MachineSink-call.ll	[NVPTX] Annotate call machine instructions as calls.	2016-02-17 17:46:50 +00:00
MachineSink-convergent.ll	[NVPTX] Test that MachineSink won't sink across llvm.cuda.syncthreads.	2016-02-17 17:46:52 +00:00
managed.ll
misaligned-vector-ldst.ll
module-inline-asm.ll
mulwide.ll
noduplicate-syncthreads.ll
nounroll.ll
nvcl-param-align.ll
nvvm-reflect-module-flag.ll	[NVPTX] Read __CUDA_FTZ from module flags in NVVMReflect.	2016-04-01 01:09:07 +00:00
nvvm-reflect.ll
param-align.ll
pr13291-i1-store.ll
pr16278.ll
pr17529.ll
refl1.ll
reg-copy.ll
rotate.ll
rsqrt.ll
sched1.ll
sched2.ll
sext-in-reg.ll
sext-params.ll
shfl.ll	[NVPTX] Add intrinsics for shfl instructions.	2016-06-09 20:04:08 +00:00
shift-parts.ll
simple-call.ll
sm-version-20.ll
sm-version-21.ll
sm-version-30.ll
sm-version-32.ll
sm-version-35.ll
sm-version-37.ll
sm-version-50.ll
sm-version-52.ll
sm-version-53.ll
speculative-execution-divergent-target.ll	Move divergent-target test into CodeGen/NVPTX because it requires an NVPTX target.	2016-04-15 01:20:52 +00:00
st-addrspace.ll
st-generic.ll
surf-read-cuda.ll
surf-read.ll
surf-write-cuda.ll
surf-write.ll
symbol-naming.ll	Have a single way for creating unique value names.	2015-11-22 00:16:24 +00:00
TailDuplication-convergent.ll	Don't tail-duplicate blocks that contain convergent instructions.	2016-02-22 17:50:52 +00:00
tex-read-cuda.ll
tex-read.ll
texsurf-queries.ll
tuple-literal.ll
vec8.ll
vec-param-load.ll
vector-args.ll
vector-call.ll
vector-compare.ll
vector-global.ll
vector-loads.ll
vector-return.ll
vector-select.ll
vector-stores.ll
weak-global.ll
weak-linkage.ll