mirror of
https://github.com/RPCSX/llvm.git
synced 2024-11-23 03:39:42 +00:00
Add some tips on benchmarking.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303769 91177308-0d34-0410-b5e6-96231b3b80d8
parent
772effdbda
commit
01c176bc59
87
docs/Benchmarking.rst
Normal file
@ -0,0 +1,87 @@
==================================
Benchmarking tips
==================================

Introduction
============

When benchmarking a patch, we want to reduce every source of noise as
much as possible. How to do that depends heavily on the OS.

Low noise is necessary but not sufficient: it does not rule out
measurement bias. See
https://www.cis.upenn.edu/~cis501/papers/producing-wrong-data.pdf for
an example.

General
=======

* Use a high-resolution timer, e.g. ``perf`` on Linux.

* Run the benchmark multiple times so that noise can be recognized.

* Disable as many processes and services as possible on the target system.

* Disable frequency scaling, turbo boost and address space
  randomization (see the OS-specific sections).

* Link statically if the OS supports it. That avoids any variation that
  might be introduced by loading dynamic libraries. This can be done
  by passing ``-DLLVM_BUILD_STATIC=ON`` to CMake.

* Try to avoid storage. On some systems you can use tmpfs. Putting the
  program, inputs and outputs on tmpfs avoids touching a real storage
  device, whose latency can vary considerably.

  To mount it (on Linux and FreeBSD at least)::

    mount -t tmpfs -o size=<XX>g none dir_to_mount

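As a rough illustration of the advice to run the benchmark several
times, the sketch below summarizes a list of timings.
``summarize_samples`` is a hypothetical helper, not something shipped
with LLVM. It assumes one positive timing per line on standard input and
reports the sample count, the mean and the relative standard deviation.

```shell
# Hypothetical helper (an assumption for illustration, not part of LLVM).
# Reads one positive number per line on stdin and prints the sample count,
# the mean, and the relative standard deviation (sample stddev / mean).
summarize_samples() {
    awk '
        { n++; sum += $1; sumsq += $1 * $1 }
        END {
            if (n < 2) {
                print "need at least 2 samples" > "/dev/stderr"
                exit 1
            }
            mean = sum / n
            var = (sumsq - n * mean * mean) / (n - 1)
            if (var < 0) var = 0    # guard against rounding error
            printf "n=%d mean=%.3f rsd=%.2f%%\n", n, mean, 100 * sqrt(var) / mean
        }'
}
```

For example, ``printf '9\n11\n' | summarize_samples`` prints
``n=2 mean=10.000 rsd=14.14%``. A relative standard deviation that is
large compared to the effect you are trying to measure suggests the
setup is still too noisy.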
Linux
=====

* Disable address space randomization::

    echo 0 > /proc/sys/kernel/randomize_va_space

* Set scaling_governor to performance::

    for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    do
      echo performance > "$i"
    done

* Use https://github.com/lpechacek/cpuset to reserve cpus for just the
  program you are benchmarking. If using perf, shield at least two cpus
  so that perf runs on one and your program on another::

    cset shield -c N1,N2 -k on

  This will move all threads out of N1 and N2. The ``-k on`` means
  that even kernel threads are moved out.

* Disable the SMT siblings of the cpus you will use for the benchmark. The
  siblings of cpu N are listed in
  ``/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`` and
  each sibling X can be disabled with::

    echo 0 > /sys/devices/system/cpu/cpuX/online

* Run the program with::

    cset shield --exec -- perf stat -r 10 <cmd>

  This will run the command after ``--`` on the isolated cpus. The
  particular perf command runs ``<cmd>`` 10 times and reports
  statistics.

With these in place you can expect perf variations of less than 0.1%.

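Finding the SMT siblings by hand is error prone, because
``thread_siblings_list`` may contain a comma-separated list such as
``2,6`` or a range such as ``2-3``. The following is a minimal sketch of
one way to automate the lookup. ``expand_cpu_list`` and ``smt_siblings``
are hypothetical names, the optional second argument (an alternative
sysfs root) is an assumption added so the function can be tried against
a fake directory tree, and the sketch relies on ``seq`` being available.

```shell
# Hypothetical helpers (assumptions for illustration, not part of LLVM).

# Expand a kernel cpu list such as "0,4" or "2-3" into one id per line.
expand_cpu_list() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r lo hi; do
        # A single id has no upper bound, so fall back to the id itself.
        seq "$lo" "${hi:-$lo}"
    done
}

# Print the SMT siblings of cpu $1, i.e. every id listed in its
# thread_siblings_list except $1 itself.
# $2 optionally overrides the sysfs root (useful for dry runs).
smt_siblings() {
    cpu=$1
    root=${2:-/sys/devices/system/cpu}
    expand_cpu_list "$(cat "$root/cpu$cpu/topology/thread_siblings_list")" |
        grep -vx "$cpu"
}
```

Each id printed by ``smt_siblings N`` can then be taken offline, as
root, by writing ``0`` to the corresponding ``online`` file.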
Linux Intel
-----------

* Disable turbo mode::

    echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo

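Leaving turbo disabled after a benchmarking session is easy to forget.
One possible approach, sketched below, is to wrap the benchmark so that
the previous setting is restored afterwards. ``with_turbo_disabled`` and
the ``NO_TURBO`` variable are hypothetical names introduced for this
example; the default path is the one given above and writing to it
requires root.

```shell
# Hypothetical wrapper (an assumption for illustration, not part of LLVM).
# NO_TURBO may be pointed at another file to try the function without root.
NO_TURBO=${NO_TURBO:-/sys/devices/system/cpu/intel_pstate/no_turbo}

# Run "$@" with turbo disabled, then restore the previous setting and
# return the exit status of the command.
with_turbo_disabled() {
    old=$(cat "$NO_TURBO") || return 1
    echo 1 > "$NO_TURBO" || return 1
    "$@"
    status=$?
    echo "$old" > "$NO_TURBO"
    return "$status"
}
```

It could be combined with the shielded run from the Linux section, for
example ``with_turbo_disabled cset shield --exec -- perf stat -r 10 <cmd>``.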
@ -90,6 +90,7 @@ representation.
   CodeOfConduct
   CompileCudaWithLLVM
   ReportingGuide
   Benchmarking

:doc:`GettingStarted`
   Discusses how to get up and running quickly with the LLVM infrastructure.
