Mirror of https://github.com/RPCSX/llvm.git (synced 2024-11-24 04:09:45 +00:00)
[CUDA] Add section to docs about controlling fp optimizations.
Reviewers: rnk
Subscribers: llvm-commits, tra
Differential Revision: http://reviews.llvm.org/D20494
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@270789 91177308-0d34-0410-b5e6-96231b3b80d8
parent: 28abc1acf3
commit: 2969718630
@@ -148,6 +148,46 @@ compilation, in host and device modes:
Both clang and nvcc define ``__CUDACC__`` during CUDA compilation. You can
detect NVCC specifically by looking for ``__NVCC__``.
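A minimal sketch of that check in C++, compiled here as ordinary host code. The enum ``CudaCompiler`` and the function ``detect_cuda_compiler`` are hypothetical names chosen for illustration. Because nvcc also defines ``__CUDACC__``, the sketch tests ``__NVCC__`` first.

```cpp
// Hypothetical names, for illustration only.
enum class CudaCompiler { None, Nvcc, OtherCuda };

// Decides at preprocessing time which kind of compiler is building this file.
// nvcc defines both __NVCC__ and __CUDACC__, so __NVCC__ is tested first.
constexpr CudaCompiler detect_cuda_compiler() {
#if defined(__NVCC__)
  return CudaCompiler::Nvcc;
#elif defined(__CUDACC__)
  return CudaCompiler::OtherCuda;  // CUDA compilation by another compiler, such as clang
#else
  return CudaCompiler::None;  // not a CUDA compilation
#endif
}
```

In a plain (non-CUDA) C++ build, ``detect_cuda_compiler()`` evaluates to ``CudaCompiler::None``.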

Flags that control numerical code
=================================

If you're using GPUs, you probably care about making numerical code run fast.
GPU hardware allows for more control over numerical operations than most CPUs,
but this results in more compiler options for you to juggle.

Flags you may wish to tweak include:

* ``-ffp-contract={on,off,fast}`` (defaults to ``fast`` on host and device when
  compiling CUDA). Controls whether the compiler emits fused multiply-add (FMA)
  operations.

  * ``off``: never emit FMA operations, and prevent ptxas from fusing multiply
    and add instructions.
  * ``on``: fuse multiplies and adds within a single statement, but never
    across statements (C11 semantics). Prevent ptxas from fusing other
    multiplies and adds.
  * ``fast``: fuse multiplies and adds wherever profitable, even across
    statements. Does not prevent ptxas from fusing additional multiplies and
    adds.

  Fused multiply-add instructions can be much faster than the unfused
  equivalents, but because the intermediate result of an FMA is not rounded,
  this flag can change the results of numerical code.

* ``-fcuda-flush-denormals-to-zero`` (default: off). When this is enabled,
  floating-point operations may flush `denormal
  <https://en.wikipedia.org/wiki/Denormal_number>`_ inputs and/or outputs to 0.
  Operations on denormal numbers are often much slower than the same operations
  on normal numbers.

* ``-fcuda-approx-transcendentals`` (default: off). When this is enabled, the
  compiler may emit calls to faster, approximate versions of transcendental
  functions instead of the slower, fully IEEE-compliant versions. For example,
  this flag allows clang to emit the PTX ``sin.approx.f32`` instruction.

  This is implied by ``-ffast-math``.
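The three ``-ffp-contract`` modes differ in which source expressions the compiler is allowed to fuse. The sketch below uses plain C++ with hypothetical function names. Whether a fusion actually happens also depends on the optimizer and on the target having an FMA instruction, so the comments describe what each mode permits rather than what a particular build emits.

```cpp
#include <cassert>

// Hypothetical helpers: the same arithmetic written two ways, to show
// which forms each -ffp-contract mode allows the compiler to fuse.

// A single expression. `on` and `fast` both permit fusing this into one
// FMA; `off` never does.
double mul_add_one_statement(double a, double b, double c) {
  return a * b + c;
}

// The product is a separate statement. `on` (C11 semantics) must round
// `product` before the add; `fast` may still fuse the two into one FMA;
// `off` never fuses.
double mul_add_two_statements(double a, double b, double c) {
  double product = a * b;
  return product + c;
}
```

With small integer inputs such as ``(2.0, 3.0, 4.0)`` the product is exact, so both functions return ``10.0`` in every mode; the modes only diverge when rounding the product loses bits.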
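To see how skipping the intermediate rounding changes results, the sketch below computes ``a * b - c`` twice: once with the product rounded to ``double`` first, and once with ``std::fma`` from ``<cmath>``, which rounds only the final result. The function names are hypothetical, and ``volatile`` is used here only to stop the host compiler from contracting the unfused version itself.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: a*b - c with the product rounded to double first.
// Storing the product in a volatile variable forces that rounding and keeps
// the host compiler from contracting the two operations into an FMA.
double mul_sub_unfused(double a, double b, double c) {
  volatile double product = a * b;
  return product - c;
}

// Hypothetical helper: the same computation as one fused multiply-add.
// std::fma computes a*b + (-c) exactly and rounds only the final result.
double mul_sub_fused(double a, double b, double c) {
  return std::fma(a, b, -c);
}
```

For example, with ``a = b = 1 + 2^-27`` and ``c = 1 + 2^-26``, the exact product is ``1 + 2^-26 + 2^-54``. Rounding it to ``double`` drops the ``2^-54`` term, so ``mul_sub_unfused`` returns ``0.0`` while ``mul_sub_fused`` returns ``2^-54``.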
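Standard C++ offers no portable switch for flushing denormals, so the following sketch only emulates the effect of ``-fcuda-flush-denormals-to-zero`` in software. It assumes a flushed denormal (subnormal) value becomes a zero of the same sign. The helper names are hypothetical, and this is not how the GPU implements the flag.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper emulating flush-to-zero in software: a denormal
// (subnormal) float becomes a zero of the same sign; every other value,
// including normal numbers, zeros, infinities and NaNs, is returned as-is.
float flush_denormal_to_zero(float x) {
  if (std::fpclassify(x) == FP_SUBNORMAL) {
    return std::copysign(0.0f, x);
  }
  return x;
}

// Hypothetical FTZ-style multiply: flushes denormal inputs, multiplies,
// then flushes a denormal result.
float mul_ftz(float a, float b) {
  return flush_denormal_to_zero(flush_denormal_to_zero(a) *
                                flush_denormal_to_zero(b));
}
```

For instance, ``std::numeric_limits<float>::min() * 0.5f`` is a nonzero denormal, but ``mul_ftz`` returns ``0.0f`` for the same operands.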

Optimizations
=============