linux/kernel
Mathieu Desnoyers 97e1c18e8d tracing: Kernel Tracepoints
Implementation of kernel tracepoints. Inspired from the Linux Kernel
Markers. Allows complete typing verification by declaring both tracing
statement inline functions and probe registration/unregistration static
inline functions within the same macro "DEFINE_TRACE". No format string
is required. See the tracepoint Documentation and Samples patches for
usage examples.

Taken from the documentation patch :

"A tracepoint placed in code provides a hook to call a function (probe)
that you can provide at runtime. A tracepoint can be "on" (a probe is
connected to it) or "off" (no probe is attached). When a tracepoint is
"off" it has no effect, except for adding a tiny time penalty (checking
a condition for a branch) and space penalty (adding a few bytes for the
function call at the end of the instrumented function and adds a data
structure in a separate section).  When a tracepoint is "on", the
function you provide is called each time the tracepoint is executed, in
the execution context of the caller. When the function provided ends its
execution, it returns to the caller (continuing from the tracepoint
site).

You can put tracepoints at important locations in the code. They are
lightweight hooks that can pass an arbitrary number of parameters, which
prototypes are described in a tracepoint declaration placed in a header
file."

Addition and removal of tracepoints is synchronized by RCU using the
scheduler (and preempt_disable) as guarantees to find a quiescent state
(this is really RCU "classic"). The update side uses rcu_barrier_sched()
with call_rcu_sched() and the read/execute side uses
"preempt_disable()/preempt_enable()".

We make sure the previous array containing probes, which has been
scheduled for deletion by the rcu callback, is indeed freed before we
proceed to the next update. It therefore limits the rate of modification
of a single tracepoint to one update per RCU period. The objective here
is to permit fast batch add/removal of probes on _different_
tracepoints.

Changelog :
- Use #name ":" #proto as string to identify the tracepoint in the
  tracepoint table. This will make sure not type mismatch happens due to
  connexion of a probe with the wrong type to a tracepoint declared with
  the same name in a different header.
- Add tracepoint_entry_free_old.
- Change __TO_TRACE to get rid of the 'i' iterator.

Masami Hiramatsu <mhiramat@redhat.com> :
Tested on x86-64.

Performance impact of a tracepoint : same as markers, except that it
adds about 70 bytes of instructions in an unlikely branch of each
instrumented function (the for loop, the stack setup and the function
call). It currently adds a memory read, a test and a conditional branch
at the instrumentation site (in the hot path). Immediate values will
eventually change this into a load immediate, test and branch, which
removes the memory read which will make the i-cache impact smaller
(changing the memory read for a load immediate removes 3-4 bytes per
site on x86_32 (depending on mov prefixes), or 7-8 bytes on x86_64, it
also saves the d-cache hit).

About the performance impact of tracepoints (which is comparable to
markers), even without immediate values optimizations, tests done by
Hideo Aoki on ia64 show no regression. His test case was using hackbench
on a kernel where scheduler instrumentation (about 5 events in code
scheduler code) was added.

Quoting Hideo Aoki about Markers :

I evaluated overhead of kernel marker using linux-2.6-sched-fixes git
tree, which includes several markers for LTTng, using an ia64 server.

While the immediate trace mark feature isn't implemented on ia64, there
is no major performance regression. So, I think that we don't have any
issues to propose merging marker point patches into Linus's tree from
the viewpoint of performance impact.

I prepared two kernels to evaluate. The first one was compiled without
CONFIG_MARKERS. The second one was enabled CONFIG_MARKERS.

I downloaded the original hackbench from the following URL:
http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c

I ran hackbench 5 times in each condition and calculated the average and
difference between the kernels.

    The parameter of hackbench: every 50 from 50 to 800
    The number of CPUs of the server: 2, 4, and 8

Below is the results. As you can see, major performance regression
wasn't found in any case. Even if number of processes increases,
differences between marker-enabled kernel and marker- disabled kernel
doesn't increase. Moreover, if number of CPUs increases, the differences
doesn't increase either.

Curiously, marker-enabled kernel is better than marker-disabled kernel
in more than half cases, although I guess it comes from the difference
of memory access pattern.

* 2 CPUs

Number of | without      | with         | diff     | diff    |
processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
--------------------------------------------------------------
       50 |      4.811   |       4.872  |  +0.061  |  +1.27  |
      100 |      9.854   |      10.309  |  +0.454  |  +4.61  |
      150 |     15.602   |      15.040  |  -0.562  |  -3.6   |
      200 |     20.489   |      20.380  |  -0.109  |  -0.53  |
      250 |     25.798   |      25.652  |  -0.146  |  -0.56  |
      300 |     31.260   |      30.797  |  -0.463  |  -1.48  |
      350 |     36.121   |      35.770  |  -0.351  |  -0.97  |
      400 |     42.288   |      42.102  |  -0.186  |  -0.44  |
      450 |     47.778   |      47.253  |  -0.526  |  -1.1   |
      500 |     51.953   |      52.278  |  +0.325  |  +0.63  |
      550 |     58.401   |      57.700  |  -0.701  |  -1.2   |
      600 |     63.334   |      63.222  |  -0.112  |  -0.18  |
      650 |     68.816   |      68.511  |  -0.306  |  -0.44  |
      700 |     74.667   |      74.088  |  -0.579  |  -0.78  |
      750 |     78.612   |      79.582  |  +0.970  |  +1.23  |
      800 |     85.431   |      85.263  |  -0.168  |  -0.2   |
--------------------------------------------------------------

* 4 CPUs

Number of | without      | with         | diff     | diff    |
processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
--------------------------------------------------------------
       50 |      2.586   |       2.584  |  -0.003  |  -0.1   |
      100 |      5.254   |       5.283  |  +0.030  |  +0.56  |
      150 |      8.012   |       8.074  |  +0.061  |  +0.76  |
      200 |     11.172   |      11.000  |  -0.172  |  -1.54  |
      250 |     13.917   |      14.036  |  +0.119  |  +0.86  |
      300 |     16.905   |      16.543  |  -0.362  |  -2.14  |
      350 |     19.901   |      20.036  |  +0.135  |  +0.68  |
      400 |     22.908   |      23.094  |  +0.186  |  +0.81  |
      450 |     26.273   |      26.101  |  -0.172  |  -0.66  |
      500 |     29.554   |      29.092  |  -0.461  |  -1.56  |
      550 |     32.377   |      32.274  |  -0.103  |  -0.32  |
      600 |     35.855   |      35.322  |  -0.533  |  -1.49  |
      650 |     39.192   |      38.388  |  -0.804  |  -2.05  |
      700 |     41.744   |      41.719  |  -0.025  |  -0.06  |
      750 |     45.016   |      44.496  |  -0.520  |  -1.16  |
      800 |     48.212   |      47.603  |  -0.609  |  -1.26  |
--------------------------------------------------------------

* 8 CPUs

Number of | without      | with         | diff     | diff    |
processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
--------------------------------------------------------------
       50 |      2.094   |       2.072  |  -0.022  |  -1.07  |
      100 |      4.162   |       4.273  |  +0.111  |  +2.66  |
      150 |      6.485   |       6.540  |  +0.055  |  +0.84  |
      200 |      8.556   |       8.478  |  -0.078  |  -0.91  |
      250 |     10.458   |      10.258  |  -0.200  |  -1.91  |
      300 |     12.425   |      12.750  |  +0.325  |  +2.62  |
      350 |     14.807   |      14.839  |  +0.032  |  +0.22  |
      400 |     16.801   |      16.959  |  +0.158  |  +0.94  |
      450 |     19.478   |      19.009  |  -0.470  |  -2.41  |
      500 |     21.296   |      21.504  |  +0.208  |  +0.98  |
      550 |     23.842   |      23.979  |  +0.137  |  +0.57  |
      600 |     26.309   |      26.111  |  -0.198  |  -0.75  |
      650 |     28.705   |      28.446  |  -0.259  |  -0.9   |
      700 |     31.233   |      31.394  |  +0.161  |  +0.52  |
      750 |     34.064   |      33.720  |  -0.344  |  -1.01  |
      800 |     36.320   |      36.114  |  -0.206  |  -0.57  |
--------------------------------------------------------------

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: 'Peter Zijlstra' <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:28:28 +02:00
..
irq Merge branch 'linus' into x86/core 2008-08-14 14:58:01 +02:00
power ftrace: disable tracing for hibernation 2008-08-28 12:27:39 -07:00
time [CPUFREQ][5/6] cpufreq: Changes to get_cpu_idle_time_us(), used by ondemand governor 2008-10-09 13:52:44 -04:00
trace hrtimer: prevent migration of per CPU hrtimers 2008-09-29 17:09:14 +02:00
.gitignore Update kernel/.gitignore with new auto-generated files 2008-02-09 23:27:01 -08:00
acct.c tty: Fix abusers of current->sighand->tty 2008-10-13 09:51:42 -07:00
audit_tree.c [PATCH] list_for_each_rcu must die: audit 2008-05-17 03:30:23 -04:00
audit.c [PATCH] Fix the bug of using AUDIT_STATUS_RATE_LIMIT when set fail, no error output. 2008-08-01 12:15:16 -04:00
audit.h [PATCH 1/2] audit: move extern declarations to audit.h 2008-04-28 06:28:04 -04:00
auditfilter.c Re: [PATCH] the loginuid field should be output in all AUDIT_CONFIG_CHANGE audit messages 2008-08-01 12:15:03 -04:00
auditsc.c tty: Fix abusers of current->sighand->tty 2008-10-13 09:51:42 -07:00
backtracetest.c backtrace: replace timer with tasklet + completions 2008-06-27 18:09:16 +02:00
bounds.c Add kbuild.h that contains common definitions for kbuild users 2008-04-29 08:06:29 -07:00
capability.c security: Fix setting of PF_SUPERPRIV by __capable() 2008-08-14 22:59:43 +10:00
cgroup_debug.c CGroup API files: move "releasable" to cgroup_debug subsystem 2008-04-29 08:06:09 -07:00
cgroup.c mm owner: fix race between swapoff and exit 2008-09-29 08:41:47 -07:00
compat.c ntp: support for TAI 2008-05-01 08:03:59 -07:00
configs.c kernel: use non-racy method for proc entries creation 2008-04-29 08:06:22 -07:00
cpu.c Merge branches 'sched/devel', 'sched/cpu-hotplug', 'sched/cpusets' and 'sched/urgent' into sched/core 2008-10-08 11:31:02 +02:00
cpuset.c cpusets: scan_for_empty_cpusets(), cpuset doesn't seem to be so const 2008-10-03 13:39:50 +02:00
delayacct.c per-task-delay-accounting: update taskstats for memory reclaim delay 2008-07-25 10:53:47 -07:00
dma-coherent.c dma-coherent: export dma_[alloc|release]_from_coherent methods 2008-08-22 08:34:53 +02:00
dma.c kernel: use non-racy method for proc entries creation 2008-04-29 08:06:22 -07:00
exec_domain.c [PATCH] kill altroot 2008-07-26 20:53:20 -04:00
exit.c mm owner: fix race between swapoff and exit 2008-09-29 08:41:47 -07:00
extable.c module: Don't report discarded init pages as kernel text. 2008-01-29 17:13:18 +11:00
fork.c tty: Add a kref count 2008-10-13 09:51:40 -07:00
futex_compat.c futex_compat __user annotation 2008-03-30 14:18:41 -07:00
futex.c futexes: fix fault handling in futex_lock_pi 2008-06-23 13:31:15 +02:00
hrtimer.c hrtimer: prevent migration of per CPU hrtimers 2008-09-29 17:09:14 +02:00
itimer.c ITIMER_REAL: convert to use struct pid 2008-02-08 09:22:29 -08:00
kallsyms.c kallsyms: fix potential overflow in binary search 2008-07-25 10:53:27 -07:00
Kconfig.hz sched: fix SCHED_HRTICK dependency 2008-07-28 14:37:38 +02:00
Kconfig.preempt rcu: move PREEMPT_RCU config option back under PREEMPT 2008-03-10 18:01:20 -07:00
kexec.c kexec: fix segmentation fault in kimage_add_entry 2008-09-23 08:09:14 -07:00
kfifo.c is_power_of_2: kernel/kfifo.c 2007-07-16 09:05:50 -07:00
kgdb.c kgdb: call touch_softlockup_watchdog on resume 2008-10-06 13:50:59 -05:00
kmod.c call_usermodehelper(): increase reliability 2008-07-25 10:53:28 -07:00
kprobes.c kprobes: remove redundant config check 2008-07-25 10:53:30 -07:00
ksysfs.c Kobject: convert remaining kobject_unregister() to kobject_put() 2008-01-24 20:40:40 -08:00
kthread.c tracehook: wait_task_inactive 2008-07-26 12:00:09 -07:00
latencytop.c kernel: use non-racy method for proc entries creation 2008-04-29 08:06:22 -07:00
lockdep_internals.h lockdep: build fix 2008-08-13 12:55:10 +02:00
lockdep_proc.c lockstat: fix numerical output rounding error 2008-08-26 10:37:46 +02:00
lockdep.c lockdep: fix invalid list_del_rcu in zap_class 2008-08-27 08:40:36 +02:00
Makefile tracing: Kernel Tracepoints 2008-10-14 10:28:28 +02:00
marker.c markers: fix markers read barrier for multiple probes 2008-07-30 09:41:45 -07:00
module.c tracing: Kernel Tracepoints 2008-10-14 10:28:28 +02:00
mutex-debug.c mutex-debug: check mutex magic before owner 2008-05-16 16:53:35 +02:00
mutex-debug.h
mutex.c locking: fix mutex @key parameter kernel-doc notation 2008-07-28 18:12:36 +02:00
mutex.h
notifier.c ipc: re-enable msgmni automatic recomputing msgmni if set to negative 2008-04-29 08:06:13 -07:00
ns_cgroup.c cgroup_clone: use pid of newly created task for new cgroup 2008-07-25 10:53:37 -07:00
nsproxy.c removed unused #include <linux/version.h>'s 2008-08-23 12:14:12 -07:00
panic.c Add a WARN() macro; this is WARN_ON() + printk arguments 2008-07-25 10:53:29 -07:00
params.c Add new string functions strict_strto* and convert kernel params to use them 2008-02-08 09:22:41 -08:00
pid_namespace.c pid_ns: (BUG 11391) change ->child_reaper when init->group_leader exits 2008-09-02 19:21:38 -07:00
pid.c pidns: remove now unused find_pid function. 2008-07-25 10:53:45 -07:00
pm_qos_params.c pm_qos_requirement might sleep 2008-09-02 19:21:40 -07:00
posix-cpu-timers.c posix-timers: print RT watchdog message 2008-05-24 18:49:22 +02:00
posix-timers.c fix error-path NULL deref in alloc_posix_timer() 2008-10-02 15:53:13 -07:00
printk.c tty: Move tty_write_message out of kernel/printk 2008-10-13 09:51:41 -07:00
profile.c build kernel/profile.o only when requested 2008-07-25 10:53:27 -07:00
ptrace.c security: Fix setting of PF_SUPERPRIV by __capable() 2008-08-14 22:59:43 +10:00
rcuclassic.c rcu: RCU-based detection of stalled CPUs for Classic RCU, fix 2008-10-03 10:41:00 +02:00
rcupdate.c rcu: fix synchronize_rcu() so that kernel-doc works 2008-08-21 09:31:44 +02:00
rcupreempt_trace.c rcu: trace fix possible mem-leak 2008-08-15 17:54:40 +02:00
rcupreempt.c rcu: remove redundant ACCESS_ONCE definition from rcupreempt.c 2008-08-18 09:45:22 +02:00
rcutorture.c rcu: make rcutorture even more vicious: invoke RCU readers from irq handlers (timers) 2008-06-26 09:24:33 +02:00
relay.c relay: fix "full buffer with exactly full last subbuffer" accounting problem 2008-08-05 14:33:46 -07:00
res_counter.c cgroup files: convert res_counter_write() to be a cgroups write_string() handler 2008-07-25 10:53:36 -07:00
resource.c IO resources: add reserve_region_with_split() 2008-09-04 21:02:44 +02:00
rtmutex_common.h Don't operate with pid_t in rtmutex tester 2008-02-08 09:22:41 -08:00
rtmutex-debug.c Don't operate with pid_t in rtmutex tester 2008-02-08 09:22:41 -08:00
rtmutex-debug.h
rtmutex-tester.c sysdev: Pass the attribute to the low level sysdev show/store function 2008-07-21 21:55:02 -07:00
rtmutex.c hrtimer: more hrtimer_init_sleeper() fallout. 2008-02-13 15:45:36 +01:00
rtmutex.h
rwsem.c sched: mark rwsem functions as __sched for wchan/profiling 2007-12-18 15:21:13 +01:00
sched_clock.c sched_clock: fix cpu_clock() 2008-08-25 17:39:57 +02:00
sched_cpupri.c sched: use a 2-d bitmap for searching lowest-pri CPU 2008-06-06 15:19:28 +02:00
sched_cpupri.h sched: fix the cpuprio count really 2008-06-06 15:19:44 +02:00
sched_debug.c sched: add full schedstats to /proc/sched_debug 2008-06-27 14:31:31 +02:00
sched_fair.c sched: sync wakeups vs avg_overlap 2008-10-08 12:20:26 +02:00
sched_features.h sched: turn off WAKEUP_OVERLAP 2008-09-22 16:29:00 +02:00
sched_idletask.c sched: wakeup preempt when small overlap 2008-09-22 16:28:32 +02:00
sched_rt.c sched_rt.c: resch needed in rt_rq_enqueue() for the root rt_rq 2008-10-04 14:31:54 +02:00
sched_stats.h sched: fix accounting in task delay accounting & migration 2008-07-04 12:50:23 +02:00
sched.c sched debug: add name to sched_domain sysctl entries 2008-10-09 17:13:06 +02:00
seccomp.c make seccomp zerocost in schedule 2007-07-16 09:05:50 -07:00
semaphore.c semaphore: __down_common: use signal_pending_state() 2008-08-05 14:33:47 -07:00
signal.c tracehook: fix SA_NOCLDWAIT 2008-08-19 20:37:07 -07:00
smp.c smp: have smp_call_function_single() detect invalid CPUs 2008-08-25 17:45:48 -07:00
softirq.c Full conversion to early_initcall() interface, remove old interface 2008-07-26 12:00:04 -07:00
softlockup.c softlockup: minor cleanup, don't check task->state twice 2008-09-02 10:49:51 -07:00
spinlock.c lockdep: spin_lock_nest_lock(), checkpatch fixes 2008-08-13 13:56:51 +02:00
srcu.c make srcu_readers_active() static 2008-02-06 10:41:02 -08:00
stacktrace.c stacktrace: fix modular build, export print_stack_trace and save_stack_trace 2008-06-30 09:20:55 +02:00
stop_machine.c stop_machine: remove unused variable 2008-08-12 17:52:55 +10:00
sys_ni.c signalfd: fix undefined reference to `compat_sys_signalfd4' when !CONFIG_SIGNALFD 2008-07-25 11:35:41 -07:00
sys.c tty: Add a kref count 2008-10-13 09:51:40 -07:00
sysctl_check.c sysctl: check for bogus modes 2008-07-25 10:53:45 -07:00
sysctl.c Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6 2008-09-16 14:11:43 -07:00
taskstats.c taskstats: remove initialization of static per-cpu variable 2008-07-25 10:53:47 -07:00
test_kprobes.c kprobes: kretprobe user entry-handler 2008-02-06 10:41:11 -08:00
time.c Make constants in kernel/timeconst.h fixed 64 bits 2008-05-02 16:18:42 -07:00
timeconst.pl Make constants in kernel/timeconst.h fixed 64 bits 2008-05-02 16:18:42 -07:00
timer.c Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm 2008-07-14 16:06:58 -07:00
tracepoint.c tracing: Kernel Tracepoints 2008-10-14 10:28:28 +02:00
tsacct.c task IO accounting: move all IO statistics in struct task_io_accounting 2008-07-27 16:12:28 -07:00
uid16.c asmlinkage_protect replaces prevent_tail_call 2008-04-10 17:28:26 -07:00
user_namespace.c removed unused #include <linux/version.h>'s 2008-08-23 12:14:12 -07:00
user.c sched: rt-bandwidth for user grouping interface 2008-08-19 13:10:09 +02:00
utsname_sysctl.c removed unused #include <linux/version.h>'s 2008-08-23 12:14:12 -07:00
utsname.c removed unused #include <linux/version.h>'s 2008-08-23 12:14:12 -07:00
wait.c kernel: remove fastcall in kernel/* 2008-02-08 09:22:31 -08:00
workqueue.c Merge branch 'core/locking' into core/urgent 2008-08-12 00:11:49 +02:00