It will be used in following patch to expand or collapse only the
current browser entry.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1484904032-11040-2-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Using perf with call graph method dwarf fails to provide backtrace
support for stripped binary even though .gnu_debuglink points to *.dbg
flavor with properly populated debug symbols.
Problem is reproduced on ARM (v7, v8), kernels 3.14.y, 4.4.y and
4.10.rc3. Perf is configured with libunwind, and unwind dwarf support
[1]. Test code (stress_bt.c) can be found on [2].
Running (explicitly disable other unwinding methods):
$ gcc -g -o stress_bt -fomit-frame-pointer -fno-unwind-tables \
-fno-asynchronous-unwind-tables stress_bt.c
$ perf record -N --call-graph dwarf ./stress_bt
$ perf report
results in properly generated call graph. Stripping the binary and running
it results with missing call graph. Expected result is to have call graph:
$ gcc -g -o stress_bt -fomit-frame-pointer -fno-unwind-tables \
-fno-asynchronous-unwind-tables stress_bt.c
$ objcopy --only-keep-debug stress_bt stress_bt.dbg
$ objcopy --strip-debug stress_bt
$ objcopy --add-gnu-debuglink=stress_bt.dbg stress_bt
$ perf record -N --call-graph dwarf ./stress_bt
$ perf report
Problem is that perf doesn't try to read symbols pointed by gnu
debuglink. Patch adds checking, and reading of the symbols from
debuglink and symsrc. Order of the check is to first check within dso,
then check whether symsrc is defined and try to read from it. Finally,
debuglink is checked. Default locations of debug files are discussed in
[3] and [4]. Comments on RFC are on [5].
[1] https://wiki.linaro.org/LEG/Engineering/TOOLS/perf-callstack-unwinding
[2] [1]#Backtrace_stress_application
[3] https://sourceware.org/gdb/onlinedocs/gdb/Separate-Debug-Files.html
[4] https://sourceware.org/binutils/docs/binutils/objcopy.html
[5] https://lkml.org/lkml/2016/8/22/473
Signed-off-by: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nokia.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/d309d40a-463f-482b-68e1-1465326efdc1@nokia.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
New feature:
- Account thread wait time (off cpu time) separately: sleep, iowait and
preempt, based on the prev_state of the last event, show the breakdown
when using "perf sched timehist --state" (Namhyumg Kim)
Infrastructure:
- Factor out pmu scale conversion code (Andi Kleen)
- Remove unnecessary feature-dwarf warning (David Carrillo-Cisneros)
- Add missing member name in OPT_() macros (Soramichi AKIYAMA)
- Move variables referenced in libperf.a object files from perf's main()
file, so that other tools can use libperf.a with a different main()
(Soramichi AKIYAMA)
Documentation:
- Fix 'perf script' man page about --dump-raw-trace option (Michael Petlan)
- Also allow forcing reading of non-root owned files by root in 'perf
script' (Yannick Brosseau)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJYfi9UAAoJENZQFvNTUqpApucQALZpwoSQgwG9+9mMYoc56miv
SSmKUMi9E1urHtTqJ61W0F5/whywY3x3q+bJ/6iK7sNi+q5H+3OAhqMoGtawVnP/
O/sB0QhaUm/KyUg7V8loDVrPxXAImAEhm1dIGB7/YBBbJHrlJ/LDWZBhg+fDBNrh
44/0eXpmeSkzxZycx8qCx5rowxUoy14xa8xC418XAC2OxQO9EojwhW3C9wy1phYK
T2YEiGMpjUwT5BFU5cf0S7KKroqED3JDUIVTrO9qzuv6TvhThEfec4QhNlFmR7UL
x1LyprMM4a5XxE5mzfoWsOZTURg7XZigyjH+kTuN5+dEGeqrLXwHWSziYRVNNe5l
Ubm0aNUXp7NIDs63YWNAXfgYxHnBQLo5YXxsc/IWzzAY8oOb7lWLw6RrD59dUmFk
YOt67gCS77m7l50wGbn0vxLku8Rml4a+DX2oRrScLNIhcRrRU5WWx1EjjG1A8uPR
VaUdg2Reiwit1KGow7WIuvnFqJQySPZR26HyvahuAHSza6//YrbpQdcfiPSWKZ89
74lQuJOUMsVEeApIS3+pt4ExyIY2N7CazuJRBTlvCnk6OmyaEfyzAXW1dYi69dJe
/oJpCFH8TCrRn7piCxlY/8e8wVGFqv769okYR76wdZacr/MbRVhxmpfM4lDzGIHc
oM6XrZpjQEifYghRkTZA
=h4w+
-----END PGP SIGNATURE-----
Merge tag 'perf-core-for-mingo-4.11-20170117' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
New features:
- Account thread wait time (off CPU time) separately: sleep, iowait and
preempt, based on the prev_state of the last event, show the breakdown
when using "perf sched timehist --state" (Namhyumg Kim)
Infrastructure changes:
- Factor out PMU scale conversion code (Andi Kleen)
- Remove unnecessary feature-dwarf warning (David Carrillo-Cisneros)
- Add missing member name in OPT_() macros (Soramichi AKIYAMA)
- Move variables referenced in libperf.a object files from perf's main()
file, so that other tools can use libperf.a with a different main()
(Soramichi AKIYAMA)
Documentation changes:
- Fix 'perf script' man page about --dump-raw-trace option (Michael Petlan)
- Also allow forcing reading of non-root owned files by root in 'perf
script' (Yannick Brosseau)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The use_browser and perf_version_string variables are both declared in
perf.c but they are also referenced by other functions of libperf.a.
Therefore a user linking an own main() with libperf.a must declare those
two variables in their files even if the files never use the browser or
the version information.
This patch fixes this issue by moving use_browser and
perf_version_string out of perf.c to some other files.
Signed-off-by: Soramichi Akiyama <akiyama@m.soramichi.jp>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170117002237.c1aec0ce3b4d675dca018deb@m.soramichi.jp
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The --state option is to show task state when switched out. The state
is printed as a single character like in the /proc but I added 'I' for
idle state rather than 'R'.
$ perf sched timehist --state | head
Samples do not have callchains.
time cpu task name wait time sch delay run time state
[tid/pid] (msec) (msec) (msec)
-------- --- ----------------------- -------- ------------------ -----
1.753791 [3] <idle> 0.000 0.000 0.000 I
1.753834 [1] perf[27469] 0.000 0.000 0.000 S
1.753904 [3] perf[27470] 0.000 0.006 0.112 S
1.753914 [1] <idle> 0.000 0.000 0.079 I
1.753915 [3] migration/3[23] 0.000 0.002 0.011 S
1.754287 [2] <idle> 0.000 0.000 0.000 I
1.754335 [2] transmission[1773/1739] 0.000 0.004 0.047 S
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170113104523.31212-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Separate thread wait time into 3 parts - sleep, iowait and preempt based
on the prev_state of the last event.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170113104523.31212-1-namhyung@kernel.org
[ Fix the build on centos:5 where 'wait' shadows a global declaration ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In 2059fc7a5a ("perf symbols: Allow forcing reading of non-root owned
files by root") 'perf report' was added the option of forcing reading of
non-root owned symbol file.
This add the same behavior for perf script.
Reported-by: Mark Drayton <mbd@fb.com>
Signed-off-by: Yannick Brosseau <scientist@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: kernel-team@fb.com
Link: http://lkml.kernel.org/r/20170113182527.18625-1-scientist@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The "--dump-raw-script" is not a valid option, replace it with the valid
one, "--dump-raw-trace"
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 133dc4c39c ("perf: Rename 'perf trace' to 'perf script'")
LPU-Reference: 728644547.14560155.1484320012612.JavaMail.zimbra@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch adds missing member names to struct initializations.
Although in C99 for struct S {int x, int y} two init codes struct S s =
{.x = (a), (b)} and struct S s = {.x = (a), .y = (b)} are the same, it
is better to explicitly write .y (.argh in this patch) for readability
and robustness against language/compiler evolutions.
Signed-off-by: Soramichi Akiyama <akiyama@m.soramichi.jp>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170113215623.32fb1ac2d862af0048c30fe6@m.soramichi.jp
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Don't warn for feature-dwarf==0 if user explicitily disabled DWARF by
using NO_DWARF=1.
Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20170112210159.76143-1-davidcc@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Move the scale factor parsing code to an own function to reuse it in an
upcoming patch.
v2: Return error in case strdup returns NULL.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170103150833.6694-2-andi@firstfloor.org
[ Keep returning -ENOMEM when strdup() fails in perf_pmu__parse_scale()/convert_scale() ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Improve __kernel_text_address()/kernel_text_address() to return
true if the given address is on a kprobe's instruction slot
trampoline.
This can help stacktraces to determine the address is on a
text area or not.
To implement this atomically in is_kprobe_*_slot(), also change
the insn_cache page list to an RCU list.
This changes timings a bit (it delays page freeing to the RCU garbage
collection phase), but none of that is in the hot path.
Note: this change can add small overhead to stack unwinders because
it adds 2 additional checks to __kernel_text_address(). However, the
impact should be very small, because kprobe_insn_pages list has 1 entry
per 256 probes(on x86, on arm/arm64 it will be 1024 probes),
and kprobe_optinsn_pages has 1 entry per 32 probes(on x86).
In most use cases, the number of kprobe events may be less
than 20, which means that is_kprobe_*_slot() will check just one entry.
Tested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/148388747896.6869.6354262871751682264.stgit@devbox
[ Improved the changelog and coding style. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
New features:
- Add more triggers to switch the output file (perf.data.TIMESTAMP).
Now, in addition to switching to a different output file when
receiving a SIGUSR2, one can also specify file size and time based
triggers:
perf record -a --switch-output=signal
is equivalent to what we had before:
perf record -a --switch-output
While we can also ask for the file to be "sliced" by size, taking
into account that that will happen only when we get woken up by
the kernel, i.e. one has to take into account the --mmap-pages (the
size of the perf mmap ring buffer):
perf record -a --switch-output=2G
will break the perf.data output into multiple files limited to 2GB
of samples, right when generating the output.
For time based samples, alert() will be used, so to have 1 minute
limited perf.data output files:
perf record -a --switch-output=1m
(Jiri Olsa)
- Remove the need to use -e only for syscalls and --event only for
tracepoints/HW/SW/etc events, i.e. now one can use:
perf trace -e nanosleep,futex,sched:sched_switch ./workload
or:
perf trace --event nanosleep,futex,sched:sched_switch ./workload
And have it tracing raw_syscalls:sys_{enter,exit} for the nanosleep
and futex syscalls, formatting those as strace does while also
tracing sched:sched_switch, ordering it all into one strace like
output.
Using '!' as the first character in the -e/--event argument remains
a way to negate the list of syscalls, i.e. all syscalls except for
the ones specified, doesn't affect the other kinds of events.
E.g:
[root@jouet ~] # perf trace -e sched:sched_switch,nanosleep usleep 1
0.000 ( 0.028 ms): usleep/28150 nanosleep(rqtp: 0x7ffe4201b9f0) ...
0.028 ( ): sched:sched_switch:usleep:28150 [120] S ==> swapper/0:0 [120])
0.000 ( 0.065 ms): usleep/28150 ... [continued]: nanosleep()) = 0
[root@jouet ~]#
(Arnaldo Carvalho de Melo)
- 'perf kallsyms' toy tool to look for extended symbol information on
the running kernel and demonstrate the machine/thread/symbol APIs for
use in other tools, such as 'perf probe' (Arnaldo Carvalho de Melo)
Infrastructure:
- Add missing linux/kernel.h include to subcmd.h (Arnaldo Carvalho de Melo)
tools: Sync x86's vmx.h with the kernel
- Create libdir directory before installing libperf-jvmti.so (Laura Abbott)
- Fix typo in perf_evlist__start_workload() (Soramichi Akiyama)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJYdo7BAAoJENZQFvNTUqpA+dUP/0D4ha5gxJGnUFMUhYDj6HwC
CLIfr54fh11rEyLrz5Yp5i2NYKqRvKgYOOTy8tKdJcihXE3PuClUJLqMyenWt9NP
e9zXIKi4ug82kMeAImYrKDHXRaXy5U5X3DLx6//gRkTQwDC2pH+zK+WpUUANrlHL
nu9rR3L7zSgtjsHLfdC7Nj0aDxe5dZZJScOBjUv0+bzsZo2EnbNLR8z0WWpTSJ72
+0+4TgAm/3eg99G6MLAILn8Y7BCT0eIMZtEVXRgjqwm4b1xpE2BzK64yxfatU62f
GVd7PSwfGiKxVy62ughh9ZPvQASTr63AhdfFnfjLnfUZY1VRLIcY3/Dttw5Mv1m5
vHosPWkUjcXbpxZ7DNlgKJqkcXOTW4wZJ+RsYHJthszqcqQuRHxF2BEAp+utJdxK
UuQVORpqaG0W8pHqvDjBT+M/1gTRA+Rc9RUj/bCOUbKarpOUMW5XO/IGn8CpHgDB
Fpr2usPoAxT4FhYUUpbpB9gagdreI2Ntqs3FAMkbO2wxIxdNEztBJFrVhuac5wg4
YQzjwWhC44Dw4X5Yt0uIydOaeA8buEM6oKVKiyCuduz3rlWrjhAdKKoIXbpPQwBj
mhLMx6quQAokVCI1lYl9a6LpOwvE98HKzdYq69aDymdLaBssCBtblO2OFaM92dzT
SnbLWQzFbFWUizTMXn9v
=MfnZ
-----END PGP SIGNATURE-----
Merge tag 'perf-core-for-mingo-4.11-20170111' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
New features:
- Add more triggers to switch the output file (perf.data.TIMESTAMP).
Now, in addition to switching to a different output file when
receiving a SIGUSR2, one can also specify file size and time based
triggers:
perf record -a --switch-output=signal
is equivalent to what we had before:
perf record -a --switch-output
While we can also ask for the file to be "sliced" by size, taking
into account that that will happen only when we get woken up by
the kernel, i.e. one has to take into account the --mmap-pages (the
size of the perf mmap ring buffer):
perf record -a --switch-output=2G
will break the perf.data output into multiple files limited to 2GB
of samples, right when generating the output.
For time based samples, alert() will be used, so to have 1 minute
limited perf.data output files:
perf record -a --switch-output=1m
(Jiri Olsa)
- Remove the need to use -e only for syscalls and --event only for
tracepoints/HW/SW/etc events, i.e. now one can use:
perf trace -e nanosleep,futex,sched:sched_switch ./workload
or:
perf trace --event nanosleep,futex,sched:sched_switch ./workload
And have it tracing raw_syscalls:sys_{enter,exit} for the nanosleep
and futex syscalls, formatting those as strace does while also
tracing sched:sched_switch, ordering it all into one strace like
output.
Using '!' as the first character in the -e/--event argument remains
a way to negate the list of syscalls, i.e. all syscalls except for
the ones specified, doesn't affect the other kinds of events.
E.g:
[root@jouet ~] # perf trace -e sched:sched_switch,nanosleep usleep 1
0.000 ( 0.028 ms): usleep/28150 nanosleep(rqtp: 0x7ffe4201b9f0) ...
0.028 ( ): sched:sched_switch:usleep:28150 [120] S ==> swapper/0:0 [120])
0.000 ( 0.065 ms): usleep/28150 ... [continued]: nanosleep()) = 0
[root@jouet ~]#
(Arnaldo Carvalho de Melo)
- 'perf kallsyms' toy tool to look for extended symbol information on
the running kernel and demonstrate the machine/thread/symbol APIs for
use in other tools, such as 'perf probe' (Arnaldo Carvalho de Melo)
Infrastructure improvements:
- Add missing linux/kernel.h include to subcmd.h (Arnaldo Carvalho de Melo)
tools: Sync x86's vmx.h with the kernel
- Create libdir directory before installing libperf-jvmti.so (Laura Abbott)
- Fix typo in perf_evlist__start_workload() (Soramichi Akiyama)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
To pick the changes from:
1b07304c58 ("KVM: nVMX: support descriptor table exits")
That adds entries to VMX_EXIT_REASONS, that is used by
tools/perf/arch/x86/util/kvm-stat.c.
This also picks the changes in:
1dc35dacc1 ("KVM: nVMX: check host CR3 on vmentry and vmexit")
But these are not used in 'perf kvm stat', do it just to silence the
kernel/tools file cache coherency detector:
$ make -C tools/perf
make: Entering directory '/home/acme/git/linux/tools/perf'
BUILD: Doing 'make -j4' parallel build
Warning: arch/x86/include/uapi/asm/vmx.h differs from kernel
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ladi Prosek <lprosek@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-56uowkk8t5zje49a42asffcy@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It's now possible to specify the threshold time for perf.data like:
$ perf record --switch-output=30s ...
Once it's reached, the current data are dumped in to the
perf.data.<timestamp> file and session does on.
$ perf record --switch-output=30s ...
[ perf record: dump data: Woken up 44 times ]
[ perf record: Dump perf.data.2017010213043746 ]
...
The time is expected to be a number with appended unit
character - s/m/h/d.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Wang Nan <wangnan0@huawei.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1483955520-29063-7-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adding switch-output size warning if the requested
size of lower than the wakeup ring buffer size.
$ perf record --switch-output=1K ls
WARNING: switch-output data size lower than wakeup kernel buffer size (258K) expect bigger perf.data sizes
...
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Suggested-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1483955520-29063-6-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It's now possible to specify the threshold size for perf.data like:
$ perf record --switch-output=2G ...
Once it's reached, the current data are dumped in to the
perf.data.<timestamp> file and session does on.
$ perf record --switch-output=2G ...
[ perf record: dump data: Woken up 7244 times ]
[ perf record: Dump perf.data.2017010214093746 ]
...
The size is expected to be a number with appended unit character -
B/K/M/G.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1483955520-29063-5-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Next patches will add --switch-output option arguments, changing the
option to allow that and adding its default value to 'signal'.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1483955520-29063-4-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Next patches will add more --switch-output option arguments,
so preparing the data holder.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Wang Nan <wangnan0@huawei.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1483955520-29063-3-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add unit_number__scnprintf function to display size units and use it in
-m option info message.
Before:
$ perf record -m 10M ls
rounding mmap pages size to 16777216 bytes (4096 pages)
...
After:
$ perf record -m 10M ls
rounding mmap pages size to 16M (4096 pages)
...
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1483955520-29063-2-git-send-email-jolsa@kernel.org
[ Rename it to unit_number__scnprintf for consistency ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch fixes a typo: s/enable to/unable to/
Signed-off-by: Soramichi AKIYAMA <akiyama@m.soramichi.jp>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: bcf3145fbe ("perf evlist: Enhance perf_evlist__start_workload()")
Link: http://lkml.kernel.org/r/20170110200006.e1f7a766b4faf1f107ae2e1b@m.soramichi.jp
[ Wasn't applying, fixed it up by hand, added Fixes: tag ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Makes it easier to specify both events and syscalls (to be formatter
strace-like), i.e. previously one would have to do:
# perf trace -e nanosleep --event sched:sched_switch usleep 1
Now it is possible to do:
# perf trace -e nanosleep,sched:sched_switch usleep 1
0.000 ( 0.021 ms): usleep/17962 nanosleep(rqtp: 0x7ffdedd61ec0) ...
0.021 ( ): sched:sched_switch:usleep:17962 [120] S ==> swapper/1:0 [120])
0.000 ( 0.066 ms): usleep/17962 ... [continued]: nanosleep()) = 0
#
The old style --expr and using both -e and --event continues to work.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-ieg6bakub4657l9e6afn85r4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Its similar to doing grep on a /proc/kallsyms, but it also shows extra
information like the path to the kernel module and the unrelocated
addresses in it, to help in diagnosing problems.
It is also helps demonstrate the use of the symbols routines so that
tool writers can use them more effectively.
Using it:
$ perf kallsyms e1000_xmit_frame netif_rx usb_stor_set_xfer_buf
e1000_xmit_frame: [e1000e] /lib/modules/4.9.0+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko 0xffffffffc046fc10-0xffffffffc0470bb0 (0x19c80-0x1ac20)
netif_rx: [kernel] [kernel.kallsyms] 0xffffffff916f03a0-0xffffffff916f0410 (0xffffffff916f03a0-0xffffffff916f0410)
usb_stor_set_xfer_buf: [usb_storage] /lib/modules/4.9.0+/kernel/drivers/usb/storage/usb-storage.ko 0xffffffffc057aea0-0xffffffffc057af19 (0xf10-0xf89)
$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-79bk9pakujn4l4vq0f90klv3@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To reduce the boilerplate for searching for functions in the running
kernel and modules.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-93iqzayafpaxaguoiwjqezgz@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
As it was getting the BUILD_BUG_ON_ZERO() definition by luck.
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/n/tip-dh71o31ar72ajck8o2x4aoae@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The install command for libperf-jvmti.so does not check if libdir exists
before installing. This means that when the install command is run:
install libperf-jvmti.so '/tmp/test_root/usr/lib64';
libperf-jvmti.so will get installed to /usr/lib64 as a file and break
further installation. Fix this by ensuring the directory gets created
first.
See https://bugzilla.redhat.com/show_bug.cgi?id=1410296
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: d4dfdf00d4 ("perf jvmti: Plug compilation into perf build")
Link: http://lkml.kernel.org/r/1483741088-13543-1-git-send-email-labbott@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When x86_pmu.num_counters is 32 the shift of the integer constant 1 is
exceeding 32bit and therefor undefined behaviour.
Fix this by shifting 1ULL instead of 1.
Reported-by: CoverityScan CID#1192105 ("Bad bit shift operation")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Link: http://lkml.kernel.org/r/20170111114310.17928-1-colin.king@canonical.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
hswep_uncore_cpu_init() uses a hardcoded physical package id 0 for the boot
cpu. This works as long as the boot CPU is actually on the physical package
0, which is normaly the case after power on / reboot.
But it fails with a NULL pointer dereference when a kdump kernel is started
on a secondary socket which has a different physical package id because the
locigal package translation for physical package 0 does not exist.
Use the logical package id of the boot cpu instead of hard coded 0.
[ tglx: Rewrote changelog once more ]
Fixes: cf6d445f68 ("perf/x86/uncore: Track packages, not per CPU data")
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1483628965-2890-1-git-send-email-prarit@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The conversion of Intel PMU drivers into modules did not include reference
counting. The machine will crash when attempting to access deleted code
if an event from a module PMU is started and the module removed before the
event is destroyed.
i.e. this crashes the machine:
$ insmod intel-rapl-perf.ko
$ perf stat -e power/energy-cores/ -C 0 &
$ rmmod intel-rapl-perf.ko
Set THIS_MODULE to pmu->module in Intel module PMUs so that generic code
can handle reference counting and deny rmmod while an event still exists.
Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1482455860-116269-1-git-send-email-davidcc@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fixes:
- Fix prev/next_prio formatting for deadline tasks in libtraceevent (Daniel Bristot de Oliveira)
- Robustify reading of build-ids from /sys/kernel/note (Arnaldo Carvalho de Melo)
- Fix building some sample/bpf in Alpine Linux 3.4 (Arnaldo Carvalho de Melo)
- Fix 'make install-bin' to install libtraceevent plugins (Arnaldo Carvalho de Melo)
- Fix 'perf record --switch-output' documentation and comment (Jiri Olsa)
- 'perf probe' fixes for cross arch probing (Masami Hiramatsu)
Improvement:
- Show total scheduling time in 'perf sched timehist' (Namhyumg Kim)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJYbS6KAAoJENZQFvNTUqpAWqkP/1xdvbDqeTA2YjcWG+ffQJye
LULSNQpRShvBGpB3zsOzOebOtw0R5oWb7YAp4JFZNyMIk5qcnS9gVwdw9Qx1lc/W
Gq8OuhI28Pr9AHE37eTJ1+12HI2aAOylwKFqUpEWMzoaUqBDRh5JV2MeVGVQXBRe
YBJ4Qwk1a7Ng74uzN/pdiC6V7pjddPEoLZEDG5p35vrgtNAY1JGlX5rvNR7CO6nN
72QHRnB4r1d9fgNGS5oO5zEU1q5+JN5phe+gDANdk/8pCqDegRjZpHnj5gcm71jr
mEvbstmlV1XPZSA9aFVu2L5LZuv6asS02HjrbGydqO2DUTpDgrynHsjAuqqaTnwk
kLGwA2I+Kyh8g155H+HrQANONe2TTOXRkBVWIoFBWkhd1JUS/p3GE2qvqx5N36sr
U26d9i/Bk/pbOYOdRUFMc4lckZY2HrrRWj8dVr06qLr2rlbYoLv8hpqvdyHr9MJS
Id2Yqqd1JVzBDU3/+GfdFDBjltVtpSG1mdc6LWy9IpXFMJoSOQNsrbjf+mgMadeV
EbtY8MdDZ525EB8niv9E9Vfd56LiWTKCjg9Mc758+U+Ns50YlQFsDq2hBO2+jQ/k
W3QNX+WJiLgxqQK8adveXoYgCSWPNVttIvg8Lj1KNk8Y//vUC1KSrvDfaKiROzLB
iF0u5mLRAQ+Pfec6X5PP
=o9KL
-----END PGP SIGNATURE-----
Merge tag 'perf-urgent-for-mingo-4.10-20170104' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes and one improvement from Arnaldo Carvalho de Melo:
Fixes:
- Fix prev/next_prio formatting for deadline tasks in libtraceevent (Daniel Bristot de Oliveira)
- Robustify reading of build-ids from /sys/kernel/note (Arnaldo Carvalho de Melo)
- Fix building some sample/bpf in Alpine Linux 3.4 (Arnaldo Carvalho de Melo)
- Fix 'make install-bin' to install libtraceevent plugins (Arnaldo Carvalho de Melo)
- Fix 'perf record --switch-output' documentation and comment (Jiri Olsa)
- Fix 'perf probe' for cross arch probing (Masami Hiramatsu)
Improvement:
- Show total scheduling time in 'perf sched timehist' (Namhyumg Kim)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fix perf-probe to show probe definition on gcc generated symbols for
offline kernel (including cross-arch kernel image).
gcc sometimes optimizes functions and generate new symbols with suffixes
such as ".constprop.N" or ".isra.N" etc. Since those symbol names are
not recorded in DWARF, we have to find correct generated symbols from
offline ELF binary to probe on it (kallsyms doesn't correct it). For
online kernel or uprobes we don't need it because those are rebased on
_text, or a section relative address.
E.g. Without this:
$ perf probe -k build-arm/vmlinux -F __slab_alloc*
__slab_alloc.constprop.9
$ perf probe -k build-arm/vmlinux -D __slab_alloc
p:probe/__slab_alloc __slab_alloc+0
If you put above definition on target machine, it should fail
because there is no __slab_alloc in kallsyms.
With this fix, perf probe shows correct probe definition on
__slab_alloc.constprop.9:
$ perf probe -k build-arm/vmlinux -D __slab_alloc
p:probe/__slab_alloc __slab_alloc.constprop.9+0
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/148350060434.19001.11864836288580083501.stgit@devbox
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fix --funcs (-F) option to show correct symbols for offline module.
Since previous perf-probe uses machine__findnew_module_map() for offline
module, even if user passes a module file (with full path) which is for
other architecture, perf-probe always tries to load symbol map for
current kernel module.
This fix uses dso__new_map() to load the map from given binary as same
as a map for user applications.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/148350053478.19001.15435255244512631545.stgit@devbox
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Markus reported that perf segfaults when reading /sys/kernel/notes from
a kernel linked with GNU gold, due to what looks like a gold bug, so do
some bounds checking to avoid crashing in that case.
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Report-Link: http://lkml.kernel.org/r/20161219161821.GA294@x4
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-ryhgs6a6jxvz207j2636w31c@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Those are binaries as well, so should be installed by:
make -C tools/perf install-bin'
too.
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-3841b37u05evxrs1igkyu6ks@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently, the sched:sched_switch tracepoint reports deadline tasks with
priority -1. But when reading the trace via perf script I've got the
following output:
# ./d & # (d is a deadline task, see [1])
# perf record -e sched:sched_switch -a sleep 1
# perf script
...
swapper 0 [000] 2146.962441: sched:sched_switch: swapper/0:0 [120] R ==> d:2593 [4294967295]
d 2593 [000] 2146.972472: sched:sched_switch: d:2593 [4294967295] R ==> g:2590 [4294967295]
The task d reports the wrong priority [4294967295]. This happens because
the "int prio" is stored in an unsigned long long val. Although it is
set as a %lld, as int is shorter than unsigned long long,
trace_seq_printf prints it as a positive number.
The fix is just to cast the val as an int, and print it as a %d,
as in the sched:sched_switch tracepoint's "format".
The output with the fix is:
# ./d &
# perf record -e sched:sched_switch -a sleep 1
# perf script
...
swapper 0 [000] 4306.374037: sched:sched_switch: swapper/0:0 [120] R ==> d:10941 [-1]
d 10941 [000] 4306.383823: sched:sched_switch: d:10941 [-1] R ==> swapper/0:0 [120]
[1] d.c
---
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/types.h>
#include <linux/sched.h>
struct sched_attr {
__u32 size, sched_policy;
__u64 sched_flags;
__s32 sched_nice;
__u32 sched_priority;
__u64 sched_runtime, sched_deadline, sched_period;
};
int sched_setattr(pid_t pid, const struct sched_attr *attr, unsigned int flags)
{
return syscall(__NR_sched_setattr, pid, attr, flags);
}
int main(void)
{
struct sched_attr attr = {
.size = sizeof(attr),
.sched_policy = SCHED_DEADLINE, /* This creates a 10ms/30ms reservation */
.sched_runtime = 10 * 1000 * 1000,
.sched_period = attr.sched_deadline = 30 * 1000 * 1000,
};
if (sched_setattr(0, &attr, 0) < 0) {
perror("sched_setattr");
return -1;
}
for(;;);
}
---
Committer notes:
Got the program from the provided URL, http://bristot.me/lkml/d.c,
trimmed it and included in the cset log above, so that we have
everything needed to test it in one place.
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/866ef75bcebf670ae91c6a96daa63597ba981f0d.1483443552.git.bristot@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There's no --signal-trigger option, also adding the code comment into
record man page.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Wang Nan <wangnan0@huawei.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1483431600-19887-4-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There's no need for this one to be global.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Wang Nan <wangnan0@huawei.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1483431600-19887-3-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To allow string options with a default argument and variable set when
the option is used.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Wang Nan <wangnan0@huawei.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1483431600-19887-2-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since 'perf probe' supports cross-arch probes, it is possible to analyze
different arch kernel image which has different bits-per-long.
In that case, it fails to get the module name because it uses the
MOD_NAME_OFFSET macro based on the host machine bits-per-long, instead
of the target arch bits-per-long.
This fixes above issue by changing modname-offset based on the target
archs bit width. This is ok because linux kernel uses LP64 model on
64bit arch.
E.g. without this (on x86_64, and target module is arm32):
$ perf probe -m build-arm/fs/configfs/configfs.ko -D configfs_lookup
p:probe/configfs_lookup :configfs_lookup+0
^-Here is an empty module name.
With this fix, you can see correct module name:
$ perf probe -m build-arm/fs/configfs/configfs.ko -D configfs_lookup
p:probe/configfs_lookup configfs:configfs_lookup+0
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/148337043836.6752.383495516397005695.stgit@devbox
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull DAX updates from Dan Williams:
"The completion of Jan's DAX work for 4.10.
As I mentioned in the libnvdimm-for-4.10 pull request, these are some
final fixes for the DAX dirty-cacheline-tracking invalidation work
that was merged through the -mm, ext4, and xfs trees in -rc1. These
patches were prepared prior to the merge window, but we waited for
4.10-rc1 to have a stable merge base after all the prerequisites were
merged.
Quoting Jan on the overall changes in these patches:
"So I'd like all these 6 patches to go for rc2. The first three
patches fix invalidation of exceptional DAX entries (a bug which
is there for a long time) - without these patches data loss can
occur on power failure even though user called fsync(2). The other
three patches change locking of DAX faults so that ->iomap_begin()
is called in a more relaxed locking context and we are safe to
start a transaction there for ext4"
These have received a build success notification from the kbuild
robot, and pass the latest libnvdimm unit tests. There have not been
any -next releases since -rc1, so they have not appeared there"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
ext4: Simplify DAX fault path
dax: Call ->iomap_begin without entry lock during dax fault
dax: Finish fault completely when loading holes
dax: Avoid page invalidation races and unnecessary radix tree traversals
mm: Invalidate DAX radix tree entries only if appropriate
ext2: Return BH_New buffers for zeroed blocks
- A merge error on my part broke the DocBook build. I've requisitioned
one of tglx's frozen sharks for appropriate disciplinary action and
resolved to be more careful about testing the DocBook stuff as long as
it's still around.
- Fix an error in unaligned-memory-access.txt
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYZoL/AAoJEI3ONVYwIuV6zCQP/2q0yjKH0qVe+FQqjqBkAcfs
m6B/8QXKLEiTfS4sthvIa2iQgN6ZKWTg8DyVndXoGhE5qgZZAuctxxNj2nwpt0XD
jjTrfLVhyASXCCCfSDPcujtgrQElNOm814+ZsAo5e4KCJk1V7Ap1NlwJpd6beld5
ac6INH3r8X3Qgm2BNbiMJ3VApubcjAkjJRi75mc1JIZGtIAf8ePzjkKVpGyTsaaM
qQQDbx5mXNrYPXFJYHgCOGcVGkWRAqIcDiYscS6XHpmj5DWfz/x6OdicdeB9l7b4
Jw4TWVNTXTWz0THRUD74N1SdHwwetnWnZ3abcaorFIA/yr8tw1OuMmEqGn5VNGX5
Bxk8YYMjjezH4y7+XC7IGhZ1dcrGrdc3AJTqZYIoTOI+HxGJ4RjZsqHghgHy3Qvi
YgLumVmXLQQ6gSTE7sRixq/2hpLUlUrR0po+zQcwO/8Ka0JA860qhYcTjNzRdO1s
csQIJtybytbtX28lz4eMxadFcwTtMyYMyVTslUsP/HJQefNcpPz97OQ0zV7UTmi8
zpHRBo5GnDJNL0u/R6tNDsDo5XaXzy+uR/HDUz/oBn7QofhtrUsgBX2DHeV6pqr7
guvfVVhQ4I3r7/BGyFaiXf28W9f3ADwE1jbibLjEA2IrLxaClIGLxLS/fGql5MAI
FQhqlQhsKFeVTAxMtC9j
=W4e5
-----END PGP SIGNATURE-----
Merge tag 'docs-4.10-rc1-fix' of git://git.lwn.net/linux
Pull documentation fixes from Jonathan Corbet:
"Two small fixes:
- A merge error on my part broke the DocBook build. I've
requisitioned one of tglx's frozen sharks for appropriate
disciplinary action and resolved to be more careful about testing
the DocBook stuff as long as it's still around.
- Fix an error in unaligned-memory-access.txt"
* tag 'docs-4.10-rc1-fix' of git://git.lwn.net/linux:
Documentation/unaligned-memory-access.txt: fix incorrect comparison operator
docs: Fix build failure
Pull crypto fix from Herbert Xu:
"This fixes a boot failure on some platforms when crypto self test is
enabled along with the new acomp interface"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: testmgr - Use heap buffer for acomp test input
mm/filemap.c: In function 'clear_bit_unlock_is_negative_byte':
mm/filemap.c:933:9: error: too few arguments to function 'test_bit'
return test_bit(PG_waiters);
^~~~~~~~
Fixes: b91e1302ad ('mm: optimize PageWaiters bit use for unlock_page()')
Signed-off-by: Olof Johansson <olof@lixom.net>
Brown-paper-bag-by: Linus Torvalds <dummy@duh.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In commit 6290602709 ("mm: add PageWaiters indicating tasks are
waiting for a page bit") Nick Piggin made our page locking no longer
unconditionally touch the hashed page waitqueue, which not only helps
performance in general, but is particularly helpful on NUMA machines
where the hashed wait queues can bounce around a lot.
However, the "clear lock bit atomically and then test the waiters bit"
sequence turns out to be much more expensive than it needs to be,
because you get a nasty stall when trying to access the same word that
just got updated atomically.
On architectures where locking is done with LL/SC, this would be trivial
to fix with a new primitive that clears one bit and tests another
atomically, but that ends up not working on x86, where the only atomic
operations that return the result end up being cmpxchg and xadd. The
atomic bit operations return the old value of the same bit we changed,
not the value of an unrelated bit.
On x86, we could put the lock bit in the high bit of the byte, and use
"xadd" with that bit (where the overflow ends up not touching other
bits), and look at the other bits of the result. However, an even
simpler model is to just use a regular atomic "and" to clear the lock
bit, and then the sign bit in eflags will indicate the resulting state
of the unrelated bit #7.
So by moving the PageWaiters bit up to bit #7, we can atomically clear
the lock bit and test the waiters bit on x86 too. And architectures
with LL/SC (which is all the usual RISC suspects), the particular bit
doesn't matter, so they are fine with this approach too.
This avoids the extra access to the same atomic word, and thus avoids
the costly stall at page unlock time.
The only downside is that the interface ends up being a bit odd and
specialized: clear a bit in a byte, and test the sign bit. Nick doesn't
love the resulting name of the new primitive, but I'd rather make the
name be descriptive and very clear about the limitation imposed by
trying to work across all relevant architectures than make it be some
generic thing that doesn't make the odd semantics explicit.
So this introduces the new architecture primitive
clear_bit_unlock_is_negative_byte();
and adds the trivial implementation for x86. We have a generic
non-optimized fallback (that just does a "clear_bit()"+"test_bit(7)"
combination) which can be overridden by any architecture that can do
better. According to Nick, Power has the same hickup x86 has, for
example, but some other architectures may not even care.
All these optimizations mean that my page locking stress-test (which is
just executing a lot of small short-lived shell scripts: "make test" in
the git source tree) no longer makes our page locking look horribly bad.
Before all these optimizations, just the unlock_page() costs were just
over 3% of all CPU overhead on "make test". After this, it's down to
0.66%, so just a quarter of the cost it used to be.
(The difference on NUMA is bigger, but there this micro-optimization is
likely less noticeable, since the big issue on NUMA was not the accesses
to 'struct page', but the waitqueue accesses that were already removed
by Nick's earlier commit).
Acked-by: Nick Piggin <npiggin@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Joe Stringer <joe@ovn.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-3awp0nv8tpnblatojmwjww7z@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull crypto fix from Herbert Xu:
"This fixes a hash corruption bug in the marvell driver"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: marvell - Copy IVDIG before launching partial DMA ahash requests
To avoid the following build failure on Alpine Linux 3.4, that has
clang-3.8 with the bpf target:
HOSTCC samples/bpf/sock_example.o
In file included from /usr/include/net/ethernet.h:10:0,
from /git/linux/samples/bpf/sock_example.h:7,
from /git/linux/samples/bpf/sock_example.c:30:
/usr/include/netinet/if_ether.h:96:8: error: redefinition of 'struct
ethhdr'
struct ethhdr {
^
In file included from /git/linux/samples/bpf/sock_example.c:26:0:
./usr/include/linux/if_ether.h:144:8: note: originally defined here
struct ethhdr {
^
scripts/Makefile.host:124: recipe for target
'samples/bpf/sock_example.o' failed
make[2]: *** [samples/bpf/sock_example.o] Error 1
/git/linux/Makefile:1658: recipe for target 'samples/bpf/' failed
So include net/if_ether.h for the needs of sock_example.h, using the
same include that sock_example.c uses.
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Joe Stringer <joe@ovn.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-m9avekl1b651qe1r1zd5tzz9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>