Commit Graph

1648 Commits

Author SHA1 Message Date
Frederic Weisbecker
98e1da905c perf tools: Robustify dynamic sample content fetch
Ensure the size of the dynamic fields such as callchains
or raw events don't overlap the whole event boundaries.

This prevents from dereferencing junk if the given size of
the callchain goes too eager.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
2011-05-22 03:38:48 +02:00
Frederic Weisbecker
a285412479 perf tools: Pre-check sample size before parsing
Check that the total size of the sample fields having a fixed
size do not exceed the one of the whole event. This robustifies
the sample parsing.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
2011-05-22 03:38:36 +02:00
Frederic Weisbecker
74429964d8 perf tools: Move evlist sample helpers to evlist area
These APIs should belong to evlist.c as they may not be
exclusively tied to the headers.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com
2011-05-22 03:12:29 +02:00
Frederic Weisbecker
dd5f5fd108 perf tools: Remove junk code in mmap size handling
size is overriden later and used only then. Those
lines are only junk, probably a leftover.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
2011-05-22 03:12:28 +02:00
Frederic Weisbecker
eac9eacee1 perf tools: Check we are able to read the event size on mmap
Check we have enough mmaped space to read the current event
size from its headers, otherwise we may dereference some
hell there.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
2011-05-22 03:12:13 +02:00
Ingo Molnar
c3305257cd perf stat: Add more cache-miss percentage printouts
Print out the cache-miss percentage as well if the cache refs were
collected, for all the generic cache event types.

Before:

   11,103,723,230 dTLB-loads                #  622.471 M/sec                    ( +-  0.30% )
       87,065,337 dTLB-load-misses          #    4.881 M/sec                    ( +-  0.90% )

After:

   11,353,713,242 dTLB-loads                #  626.020 M/sec                    ( +-  0.35% )
      113,393,472 dTLB-load-misses          #    1.00% of all dTLB cache hits   ( +-  0.49% )

Also ASCII color highlight too high percentages, them when it's executed on the console.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-lkhwxsevdbd9a8nymx0vxc3y@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-19 14:30:50 +02:00
Ingo Molnar
2cba3ffb9a perf stat: Add -d -d and -d -d -d options to show more CPU events
Print even more detailed statistics if requested via perf stat -d:

       -d:          detailed events, L1 and LLC data cache
    -d -d:     more detailed events, dTLB and iTLB events
 -d -d -d:     very detailed events, adding prefetch events

Full output looks like this now:

 Performance counter stats for '/home/mingo/hackbench 10' (5 runs):

       1703.674707 task-clock                #    8.709 CPUs utilized            ( +-  4.19% )
            49,068 context-switches          #    0.029 M/sec                    ( +- 16.66% )
             8,303 CPU-migrations            #    0.005 M/sec                    ( +- 24.90% )
            17,397 page-faults               #    0.010 M/sec                    ( +-  0.46% )
     2,345,389,239 cycles                    #    1.377 GHz                      ( +-  4.61% ) [55.90%]
     1,884,503,527 stalled-cycles-frontend   #   80.35% frontend cycles idle     ( +-  5.67% ) [50.39%]
       743,919,737 stalled-cycles-backend    #   31.72% backend  cycles idle     ( +-  8.75% ) [49.91%]
     1,314,416,379 instructions              #    0.56  insns per cycle
                                             #    1.43  stalled cycles per insn  ( +-  2.53% ) [60.87%]
       272,592,567 branches                  #  160.003 M/sec                    ( +-  1.74% ) [56.56%]
         3,794,846 branch-misses             #    1.39% of all branches          ( +-  6.59% ) [58.50%]
       449,982,778 L1-dcache-loads           #  264.125 M/sec                    ( +-  2.47% ) [49.88%]
        22,404,961 L1-dcache-load-misses     #    4.98% of all L1-dcache hits    ( +-  6.08% ) [55.05%]
         6,204,750 LLC-loads                 #    3.642 M/sec                    ( +-  8.91% ) [43.75%]
         1,837,411 LLC-load-misses           #    1.078 M/sec                    ( +-  7.27% ) [12.07%]
       411,440,421 L1-icache-loads           #  241.502 M/sec                    ( +-  5.60% ) [36.52%]
        27,556,832 L1-icache-load-misses     #   16.175 M/sec                    ( +-  7.46% ) [46.72%]
       464,067,627 dTLB-loads                #  272.392 M/sec                    ( +-  4.46% ) [54.17%]
        10,765,648 dTLB-load-misses          #    6.319 M/sec                    ( +-  3.18% ) [48.68%]
     1,273,080,386 iTLB-loads                #  747.256 M/sec                    ( +-  3.38% ) [47.53%]
           117,481 iTLB-load-misses          #    0.069 M/sec                    ( +- 14.99% ) [47.01%]
         4,590,653 L1-dcache-prefetches      #    2.695 M/sec                    ( +-  4.49% ) [46.19%]
         1,712,660 L1-dcache-prefetch-misses #    1.005 M/sec                    ( +-  3.75% ) [44.82%]

        0.195622057  seconds time elapsed  ( +-  6.84% )

Also clean up the attribute construction code to be appending, and factor
it out into add_default_attributes().

Tweak the coverage percentage printout a bit, so that it's easier to view it
alongside the +- sttddev colum.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-19 14:29:51 +02:00
Ingo Molnar
b313207286 perf bench, x86: Add alternatives-asm.h wrapper
perf bench needs this to build the kernel's memcpy routine:

In file included from bench/mem-memcpy-x86-64-asm.S:2:0:
bench/../../../arch/x86/lib/memcpy_64.S:7:33: fatal error: asm/alternative-asm.h: No such file or directory

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-c5d41xibgullk8h2280q4gv0@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-18 21:00:44 +02:00
Stephane Eranian
94692349c4 perf: Fix multi-event parsing bug
This patch fixes an issue with event parsing.
The following commit appears to have broken the
ability to specify a comma separated list of events:

   commit ceb53fbf6d
   Author: Ingo Molnar <mingo@elte.hu>
   Date:   Wed Apr 27 04:06:33 2011 +0200

       perf stat: Fail more clearly when an invalid modifier is specified

This patch fixes this while preserving the desired effect:

$ perf stat -e instructions:u,instructions:k ls /dev/null /dev/null

 Performance counter stats for 'ls /dev/null':

            365956 instructions:u           #    0.00  insns per cycle
            731806 instructions:k           #    0.00  insns per cycle

        0.001108862  seconds time elapsed

$ perf stat -e task-clock-msecs true
invalid event modifier: '-msecs'
Run 'perf list' for a list of valid events and modifiers

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: acme@redhat.com
Cc: peterz@infradead.org
Cc: fweisbec@gmail.com
Link: http://lkml.kernel.org/r/20110517133619.GA6999@quad
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-17 20:45:36 +02:00
Lin Ming
2b348a7798 perf probe: Fix the missed parameter initialization
pubname_callback_param::found should be initialized to 0 in
fastpath lookup, the structure is on the stack and
uninitialized otherwise.

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Link: http://lkml.kernel.org/r/1304066518-30420-2-git-send-email-ming.m.lin@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-10 17:06:23 +02:00
Ingo Molnar
932fed4e2e Merge commit 'v2.6.39-rc7' into perf/core
Merge reason: pull in the latest fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-10 17:05:45 +02:00
Lin Ming
174a7b1f96 perf tools: Makefile: Use gcc to determine ARCH
The original Makefile uses "uname -m" to determine ARCH.
This causes problem on x86 when compile perf tool on 32 bit
userspace with a 64 bit kernel.

 bench/../../../arch/x86/lib/memcpy_64.S: Assembler messages:
 bench/../../../arch/x86/lib/memcpy_64.S:28: Error: bad register name `%rdi'

This is because "uname -m" returns x86_64 and memcpy_64.S is
included in 32 bit build.

Reported-by: Riccardo Magliocchetti <riccardo.magliocchetti@gmail.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Link: http://lkml.kernel.org/r/1304743274.3132.17.camel@localhost
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-07 11:40:59 +02:00
David Ahern
c63ca0c01d perf stat: Tell user about unsupported events in the list
Similar to perf-record, tell user about unsupported events
that will not be counted if invoked in verbose mode.

e.g.,

 $ perf stat -e dTLB-prefetch-misses -v -- sleep 1
 dTLB-prefetch-misses event is not supported by the kernel.
 dTLB-prefetch-misses: 0 0 0

 Performance counter stats for 'sleep 1':

     <not counted> dTLB-prefetch-misses

        1.001884783  seconds time elapsed

Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1304114655-10600-1-git-send-email-dsahern@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-30 00:18:14 +02:00
Ingo Molnar
947b4ad1d1 perf list: Fix max event string size
Recent stalled-cycles event names were larger than the 40 chars printout
used by perf list.

Extend that, make it robust for future extensions and also adjust alignments
in face of wider event names.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-7y40wib8n009io7hjpn1dsrm@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-29 22:52:54 +02:00
Ingo Molnar
370faf1dd0 perf stat: Fail softly on unsupported events
David Ahern reported this perf stat failure:

> # /tmp/build-perf/perf stat -- sleep 1
>   Error: stalled-cycles-frontend event is not supported.
>   Fatal: Not all events could be opened.
>
> This is a Dell R410 with an E5620 processor.

Fail in a softer fashion on unknown/unsupported events.

Reported-by: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-7y40wib8n006io7hjpn1dsrm@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-29 16:22:33 +02:00
Ingo Molnar
fce3c786d3 perf stat: Leave more room for percentages
Triple digit percentages do not fit otherwise.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-7y40wib8n005io7hjpn1dsrm@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-29 16:06:09 +02:00
Ingo Molnar
2b427e14b7 perf stat: Adjust stall cycles warning percentages
Adjust to color thresholds to better match the percentages seen in
real workloads. Both are now a bit more sensitive.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-7y40wib8n004io7hjpn1dsrm@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-29 14:35:57 +02:00
Ingo Molnar
d3d1e86da0 perf stat: Analyze front-end and back-end stall counts
Sample output:

 Performance counter stats for './loop_1b':

        873.691065 task-clock               #    1.000 CPUs utilized
                 1 context-switches         #    0.000 M/sec
                 1 CPU-migrations           #    0.000 M/sec
                96 page-faults              #    0.000 M/sec
     2,012,637,222 cycles                   #    2.304 GHz                      (66.58%)
     1,001,397,911 stalled-cycles-frontend  #   49.76% frontend cycles idle     (66.58%)
         7,523,398 stalled-cycles-backend   #    0.37%  backend cycles idle     (66.76%)
     2,004,551,046 instructions             #    1.00  insns per cycle
                                            #    0.50  stalled cycles per insn  (66.80%)
     1,001,304,992 branches                 # 1146.063 M/sec                    (66.76%)
            39,453 branch-misses            #    0.00% of all branches          (66.64%)

        0.874046121  seconds time elapsed

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-7y40wib8n003io7hjpn1dsrm@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-29 14:35:55 +02:00
Ingo Molnar
129c04cb8c perf tools: Add front-end and back-end stalled cycles support
Update perf tooling to deal with front-end and back-end stalled cycles events.

Add both the default 'perf stat' output.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-7y40wib8n002io7hjpn1dsrm@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-29 14:35:49 +02:00
Ingo Molnar
ede7029004 perf stat: Fix compatibility behavior
Instead of failing on an unknown event, when new perf stat is run on
older kernels:

  $ ./perf stat true
  Error: open_counter returned with 22 (Invalid argument). /bin/dmesg
  may provide additional information.

  Fatal: Not all events could be opened.

Just ignore EINVAL and ENOSYS, we'll print the results as not counted:

 Performance counter stats for 'true':

          0.239483 task-clock               #    0.493 CPUs utilized
                 0 context-switches         #    0.000 M/sec
                 0 CPU-migrations           #    0.000 M/sec
                86 page-faults              #    0.359 M/sec
           704,766 cycles                   #    2.943 GHz
     <not counted> stalled-cycles
           381,961 instructions             #    0.54  insns per cycle
            69,626 branches                 #  290.735 M/sec
             4,594 branch-misses            #    6.60% of all branches

        0.000485883  seconds time elapsed

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-7y40wib8n1eqio5hjpn3dsrm@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-28 08:48:42 +02:00
Ingo Molnar
f9cef0a90c perf stat: Add --sync/-S option
--sync will tell perf stat to run sync() before starting a command.

This allows IO-heavy tests to be used with --repeat, without one
iteration impacting the other.

Elapsed time will stabilize for example:

  before:        3.971525714  seconds time elapsed  ( +-  8.56% )
  after:         3.211098537  seconds time elapsed  ( +-  1.52% )

So measurements will be more accurate.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-7y40wib8n1eqio7hjpn1dsrm@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-28 08:39:39 +02:00
Ingo Molnar
9ceb1c3d1f perf stat: Fix printout vertical alignment
Before:

 |
 | Performance counter stats for '/home/mingo/hackbench 20' (5 runs):
 |
 |        71,321,607 instructions:u           #    0.42  insns per cycle  ( +-  0.00% )
 |       168,040,009 cycles:u                 #    0.000 GHz                      ( +-  0.81% )
 |
 |        1.468002368  seconds time elapsed  ( +-  1.33% )
 |

After:

 |
 | Performance counter stats for '/home/mingo/hackbench 20' (5 runs):
 |
 |        71,321,607 instructions:u           #    0.42  insns per cycle          ( +-  0.00% )
 |       168,040,009 cycles:u                 #    0.000 GHz                      ( +-  0.81% )
 |
 |        1.468002368  seconds time elapsed  ( +-  1.33% )
 |

The last column (stddev noise) is properly aligned, vertically.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-7y40wib8n1eqio7hjpn0dsrm@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-27 17:48:55 +02:00
Ingo Molnar
c6264deff7 perf stat: Add -d/--detailed flag to run with a lot of events
Add the new -d/--detailed flag, which generates a pretty detailed event list:

 Performance counter stats for './hackbench 10' (10 runs):

       1514.287888 task-clock               #   10.897 CPUs utilized            ( +-  3.05% )
            39,698 context-switches         #    0.026 M/sec                    ( +- 12.19% )
             8,147 CPU-migrations           #    0.005 M/sec                    ( +- 16.55% )
            17,918 page-faults              #    0.012 M/sec                    ( +-  0.37% )
     2,944,504,050 cycles                   #    1.944 GHz                      ( +-  3.89% )  (32.60%)
     1,043,971,283 stalled-cycles           #   35.45% of all cycles are idle   ( +-  5.22% )  (44.48%)
     1,655,906,768 instructions             #    0.56  insns per cycle
                                            #    0.63  stalled cycles per insn  ( +-  1.95% )  (55.09%)
       338,832,373 branches                 #  223.757 M/sec                    ( +-  1.96% )  (64.47%)
         3,892,416 branch-misses            #    1.15% of all branches          ( +-  5.49% )  (73.12%)
       606,410,482 L1-dcache-loads          #  400.459 M/sec                    ( +-  1.29% )  (71.21%)
        31,204,395 L1-dcache-load-misses    #    5.15% of all L1-dcache hits    ( +-  3.04% )  (60.43%)
         3,922,751 LLC-loads                #    2.590 M/sec                    ( +-  6.80% )  (46.87%)
         5,037,288 LLC-load-misses          #    3.327 M/sec                    ( +-  3.56% )  (13.00%)

        0.138966828  seconds time elapsed  ( +-  4.11% )

This can be used "at a glance" for narrower analysis.

-d can also be used in addition to other -e events, to further expand an event list.

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-cxs98quixs3qyvdqx3goojc4@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-26 21:03:16 +02:00
Ingo Molnar
8bb6c79f24 perf stat: Print out miss/hit ratio for L1 data-cache events
Print out this kind of l1-dcache-misses percentage:

 Performance counter stats for './bw_tcp localhost':

    29,956,262,201 cycles                   #    3.002 GHz                      (scaled from 85.14%)
     8,255,209,558 stalled-cycles           #   27.56% of all cycles are idle   (scaled from 86.56%)
     1,206,130,308 l1-dcache-misses         #   40.49% of all L1-dcache hits    (scaled from 86.30%)
     2,978,756,779 l1-dcache-refs           #  298.512 M/sec                    (scaled from 70.02%)
     8,861,956,159 instructions             #    0.30  insns per cycle
                                            #    0.93  stalled cycles per insn  (scaled from 84.27%)
     1,644,306,068 branches                 #  164.782 M/sec                    (scaled from 86.43%)
        74,778,443 branch-misses            #    4.55% of all branches          (scaled from 70.69%)
       9978.695711 task-clock               #    0.693 CPUs utilized

       14.404347983  seconds time elapsed

And color the result depending on the severity of cache-trashing.

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-54gmz0zymaid84zcs7joq02p@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-26 20:32:24 +02:00
Ingo Molnar
c78df6c1d4 perf stat: Print branch misses warning colors
Print the missed-branches percentage with different warning level ASCII colors,
as the percentage passes the 5%/10%/20% thresholds.

These thresholds are set to relatively low levels, because on most CPUs even a
moderate percentage of branch-misses already shows up as a slowdown.

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-ybqukg7p86leiup7gl03ecgk@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-26 20:04:58 +02:00
Ingo Molnar
a5d243d04a perf stat: Print stalled cycles warning colors
Print the stalled-cycles percentage with different warning level ASCII colors,
as the percentage passes the 25%/50%/75% thresholds.

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-e25zz44rcms7mu9az4fu5zp0@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-26 20:04:58 +02:00
Ingo Molnar
f99844cb76 perf stat: Fix -nan% output in perf stat noise printouts
Before:

                 0 CPU-migrations           #    0.000 M/sec                    ( +-  -nan% )

After:

                 0 CPU-migrations           #    0.000 M/sec                    ( +-  0.00% )

Also factor out the noise printing function.

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-z89h2v1bk1mikcbsf7e6v34q@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-26 20:04:57 +02:00
Ingo Molnar
1fc570ad89 perf stat: Add stalled cycles to the default output
The new default output looks like this:

 Performance counter stats for './loop_1b_instructions':

        236.010686 task-clock               #    0.996 CPUs utilized
                 0 context-switches         #    0.000 M/sec
                 0 CPU-migrations           #    0.000 M/sec
                99 page-faults              #    0.000 M/sec
       756,487,646 cycles                   #    3.205 GHz
       354,938,996 stalled-cycles           #   46.92% of all cycles are idle
     1,001,403,797 instructions             #    1.32  insns per cycle
                                            #    0.35  stalled cycles per insn
       100,279,773 branches                 #  424.895 M/sec
            12,646 branch-misses            #    0.013 % of all branches

        0.236902540  seconds time elapsed

We dropped cache-refs and cache-misses and added stalled-cycles - this is a
more generic "how well utilized is the CPU" metric.

If the stalled-cycles ratio is too high then more specific measurements can be
taken to figure out the source of the inefficiency.

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-pbpl2l4mn797s69bclfpwkwn@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-26 20:04:57 +02:00
Ingo Molnar
481f988a01 perf stat: Add stalled cycles accounting, prettify the resulting output
Add stalled cycles accounting and use it to print the "cycles stalled per
instruction" value.

Also change the unit of the cycles output from M/sec to GHz - this is more
intuitive.

Prettify the output to:

 Performance counter stats for './loop_1b_instructions':

        239.775036 task-clock               #    0.997 CPUs utilized
       761,903,912 cycles                   #    3.178 GHz
       356,620,620 stalled-cycles           #   46.81% of all cycles are idle
     1,001,578,351 instructions             #    1.31  insns per cycle
                                            #    0.36  stalled cycles per insn
            14,782 cache-references         #    0.062 M/sec
             5,694 cache-misses             #   38.520 % of all cache refs

        0.240493656  seconds time elapsed

Also adjust the --repeat output to make the percentages align vertically:

 Performance counter stats for './loop_1b_instructions' (10 runs):

        236.096793 task-clock               #    0.997 CPUs utilized             ( +-   0.011% )
       756,553,086 cycles                   #    3.204 GHz                       ( +-   0.002% )
       354,942,692 stalled-cycles           #   46.92% of all cycles are idle    ( +-   0.008% )
     1,001,389,700 instructions             #    1.32  insns per cycle
                                            #    0.35  stalled cycles per insn   ( +-   0.000% )
            10,166 cache-references         #    0.043 M/sec                     ( +-   0.742% )
               468 cache-misses             #    4.608 % of all cache refs       ( +-  13.385% )

        0.236874136  seconds time elapsed   ( +- 0.01% )

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-uapziqny39601apdmmhoz7hk@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-26 20:04:56 +02:00
Ingo Molnar
dcd9936a5a perf stat: Factor our shadow stats
Create update_shadow_stats() which is then used in both read_counter_aggr()
and read_counter().

This not only simplifies the code but also fixes a bug: HW_CACHE_REFERENCES
was not updated in read_counter().

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-9uc55z3g88r47exde7zxjm6p@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-26 20:04:56 +02:00
Ingo Molnar
749141d926 perf stat: Make all displayed event names parseable as well
Right now we display this by default:

          0.202204 task-clock-msecs         #      0.282 CPUs
                 0 context-switches         #      0.000 M/sec
                 0 CPU-migrations           #      0.000 M/sec
                85 page-faults              #      0.420 M/sec

The task-clock-msecs event cannot actually be passed back as an
event name, the event name we recognize is 'task-clock'.

So change the output of the cpu-clock and task-clock events
to be idempotent.

( Units should be printed out in the right-side column, if needed. )

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-lexrnbzy09asscgd4f7oac4i@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-26 20:04:55 +02:00
Ingo Molnar
ceb53fbf6d perf stat: Fail more clearly when an invalid modifier is specified
Currently we fail without printing any error message on "perf stat -e task-clock-msecs".

The reason is that the task-clock event is matched and the "-msecs" postfix is assumed
to be an event modifier - but is not recognized.

This patch changes the code to be more informative:

 $ perf stat -e task-clock-msecs true
 invalid event modifier: '-msecs'
 Run 'perf list' for a list of valid events and modifiers

And restructures the return value of parse_event_modifier() to allow
the printing of all variants of invalid event modifiers.

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-wlaw3dvz1ly6wple8l52cfca@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-26 20:04:55 +02:00
Ingo Molnar
b908debd4e perf tools: Accept case-insensitive symbolic event variants
We currently fail on something like '-e CPU-migrations', with:

  invalid or unsupported event: 'CPU-migrations'

While 'CPU-migrations' is how we actually print out the event
in the default perf stat output:

 Performance counter stats for 'true':

          0.202204 task-clock-msecs         #      0.282 CPUs
                 0 context-switches         #      0.000 M/sec
                 0 CPU-migrations           #      0.000 M/sec

So change the matching to be case-insensitive.

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-omcm3edjjtx83a4kh2e244se@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-26 20:04:55 +02:00
Ingo Molnar
d58f4c82fe perf stat: Print cache misses as percentage
Before:

       113,393,041 cache-references         #     83.636 M/sec
         7,052,454 cache-misses             #      5.202 M/sec

After:

       112,589,441 cache-references         #     87.925 M/sec
         6,556,354 cache-misses             #      5.823 %

misses/hits percentages are more expressive than absolute numbers
or rates.

(Also prettify the CPUs printout line to not have a trailing whitespace.)

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-axm28f43x439bl41zkvfzd63@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-26 20:04:54 +02:00
Ingo Molnar
11ba2b85f5 perf stat: Print stalled cycles percentage
Print:

           611,527 cycles
           400,553 instructions             # (  0.71 instructions per cycle )
            77,809 stalled-cycles           # ( 12.71% of all cycles )

        0.000610987  seconds time elapsed

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Link: http://lkml.kernel.org/n/tip-fd6x8r1cpyb6zhlrc4ix8m45@git.kernel.org
2011-04-26 20:04:54 +02:00
Ingo Molnar
94403f8863 perf events: Add stalled cycles generic event - PERF_COUNT_HW_STALLED_CYCLES
The new PERF_COUNT_HW_STALLED_CYCLES event tries to approximate
cycles the CPU does nothing useful, because it is stalled on a
cache-miss or some other condition.

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-fue11vymwqsoo5to72jxxjyl@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-26 20:04:53 +02:00
David Ahern
9cbdb70209 perf script: improve validation of sample attributes for output fields
Check for required sample attributes using evsel rather than sample_type
in the session header. If the attribute for a default field is not
present for the event type (e.g., new command operating on file from
older kernel) the field is removed from the output list.

Expected event types must exist. For example, if a user specifies

  -f trace:time,trace -f sw:time,cpu,sym

the perf.data file must contain both tracepoints and software events
(ie., it is an error if either does not exist in the file).

Attribute checking is done once at the beginning of perf-script rather
than for each sample.

v1 -> v2:
- addressed comments from acme

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1302148460-570-1-git-send-email-daahern@cisco.com
Signed-off-by: David Ahern <daahern@cisco.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-04-20 07:27:52 -03:00
Arun Sharma
0817a6a3a4 perf script: Add support for PERF_TYPE_RAW
Useful for getting stack traces for hardware events not handled by
PERF_TYPE_HARDWARE.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Arun Sharma <asharma@fb.com>
Link: http://lkml.kernel.org/n/tip-qimdcdpekjqxuyqovy4kjusx@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-04-19 11:08:58 -03:00
Michael Witten
f18568aae5 perf tools: git mv tools/perf/{features-tests.mak,config/}
Signed-off-by: Michael Witten <mfwitten@gmail.com>
Link: http://lkml.kernel.org/n/tip-a6zhefjayuounko1tk5sjji2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-04-19 08:18:36 -03:00
Michael Witten
7fbd065f5a perf tools: Move `try-cc'
The `try-cc' user-defined function was in tools/perf/feature-tests.mak;
this commit moves it to tools/perf/config/utilities.mak.

Signed-off-by: Michael Witten <mfwitten@gmail.com>
Link: http://lkml.kernel.org/n/tip-bqhwcuxsrve0iodn6q4ejaoi@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-04-19 08:18:36 -03:00
Michael Witten
ced465c400 perf tools: Makefile: PYTHON{,_CONFIG} to bandage Python 3 incompatibility
Currently, Python 3 is not supported by perf's code; this
can cause the build to fail for systems that have Python 3
installed as the default python:

  python{,-config}

The Correct Solution is to write compatibility code so that
Python 3 works out-of-the-box.

However, users often have an ancillary Python 2 installed:

  python2{,-config}

Therefore, a quick fix is to allow the user to specify those
ancillary paths as the python binaries that Makefile should
use, thereby avoiding Python 3 altogether; as an added benefit,
the Python binaries may be installed in non-standard locations
without the need for updating any PATH variable.

This commit adds the ability to set PYTHON and/or PYTHON_CONFIG
either as environment variables or as make variables on the
command line; the paths may be relative, and usually only PYTHON
is necessary in order for PYTHON_CONFIG to be defined implicitly.
Some rudimentary error checking is performed when the user
explicitly specifies a value for any of these variables.

In addition, this commit introduces significantly robust makefile
infrastructure for working with paths and communicating with the
shell; it's currently only used for handling Python, but I hope
it will prove useful in refactoring the makefiles.

Thanks to:

  Raghavendra D Prabhu <rprabhu@wnohang.net>

for motivating this patch.

Acked-by: Raghavendra D Prabhu <rprabhu@wnohang.net>
Link: http://lkml.kernel.org/r/e987828e-87ec-4973-95e7-47f10f5d9bab-mfwitten@gmail.com
Signed-off-by: Michael Witten <mfwitten@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-04-19 08:18:36 -03:00
Michael Witten
3643b133f2 perf tools: Makefile: Clean up `python/perf.so' rule
There is no need for a subshell or an explicit `export';
as per the POSIX Shell Command Language specification:

  http://pubs.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_09_01
  http://pubs.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_10_02

It is only necessary to include the environment variable
assignment just before the command to be run.

Also, it is better to use single-quotes, because GNU make
might expand `$(BASIC_CFLAGS)' into something that the shell
could interpret within double-quotes.

Acked-by: Raghavendra D Prabhu <rprabhu@wnohang.net>
Link: http://lkml.kernel.org/n/tip-58n38o02ocuzrm9qh096hsf5@git.kernel.org
Signed-off-by: Michael Witten <mfwitten@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-04-19 08:18:36 -03:00
Arnaldo Carvalho de Melo
aeafcbaf4f perf symbols: Give more useful names to 'self' parameters
One more installment on an area that is mostly dormant.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-04-19 08:18:35 -03:00
Ingo Molnar
68d2cf25d3 Merge branch 'perf/urgent' into perf/core
Merge reason: we'll be queueing up dependent changes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-19 07:56:17 +02:00
Arnaldo Carvalho de Melo
5d2cd90922 perf evsel: Fix use of inherit
perf stat doesn't mmap and its perfectly fine for it to use task-bound
counters with inheritance.

So set the attr.inherit on the caller and leave the syscall itself to
validate it.

When the mmap fails perf_evlist__mmap will just emit a warning if this
is the failure reason.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Link: http://lkml.kernel.org/r/20110414170121.GC3229@ghostprotocols.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-04-15 12:52:28 -03:00
Lin Ming
db9a9cbc81 perf hists browser: Fix seg fault when annotate null symbol
In hists browser, press hotkey 'a' to annotate current symbol.

Now it causes segment fault if 'a' is pressed on a null symbol.

Here are 2 small bugs:
- In perf_evsel__hists_browse, the condition check after 'a' is pressed
  is not correct, we should check ->sym instead of ->map.
- In symbol__tui_annotate we must check whether sym is NULL or not
  before getting annotation structure.

This patch fixes above 2 small bugs.

Link: http://lkml.kernel.org/r/1302244286.4106.36.camel@minggr.sh.intel.com
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-04-15 12:51:49 -03:00
Eric Dumazet
621d26567f perf: Fix a build error with some GCC versions
Fix this:

 util/cgroup.c: In function ‘open_cgroup’:
 util/cgroup.c:16:16: error: ‘saved_ptr’ may be used uninitialized in this function
 util/cgroup.c:16:16: note: ‘saved_ptr’ was declared here

Apparently newer GCC (4.6) can figure out that this variable is properly
initialized - but some versions of GCC (such as 4.5.2) need help.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-08 17:40:21 +02:00
Linus Torvalds
8b9686ff4d Merge branches 'x86-fixes-for-linus', 'sched-fixes-for-linus', 'timers-fixes-for-linus', 'irq-fixes-for-linus' and 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86-32, fpu: Fix FPU exception handling on non-SSE systems
  x86, hibernate: Initialize mmu_cr4_features during boot
  x86-32, NUMA: Fix ACPI NUMA init broken by recent x86-64 change
  x86: visws: Fixup irq overhaul fallout

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Clean up rebalance_domains() load-balance interval calculation

* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86/mrst/vrtc: Fix boot crash in mrst_rtc_init()
  rtc, x86/mrst/vrtc: Fix boot crash in rtc_read_alarm()

* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  genirq: Fix cpumask leak in __setup_irq()

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf probe: Fix listing incorrect line number with inline function
  perf probe: Fix to find recursively inlined function
  perf probe: Fix multiple --vars options behavior
  perf probe: Fix to remove redundant close
  perf probe: Fix to ensure function declared file
2011-04-07 12:12:58 -07:00
Linus Torvalds
42933bac11 Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6
* 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6:
  Fix common misspellings
2011-04-07 11:14:49 -07:00
Masami Hiramatsu
1d46ea2a6a perf probe: Fix listing incorrect line number with inline function
Fix a bug showing incorrect line number when a probe is put on the head of an
inline function. This patch updates find_perf_probe_point() and introduces new
rules to get correct line number.

 - If debuginfo doesn't have a correct file name, we shouldn't return line
   number too, because, without file name, line number is meaningless.

 - If the address is in a function, it stores the function name and the offset
   from the function entry.

   - If the address is on a line, it tries to get the relative line number from
     the function entry line, except for the address is same as the entry
     address of the function (in this case, the relative line number should
     be 0).

     - If the address is in an inline function entry (call-site), it uses the
       inline function call line number as the line on which the address is.

   - If the address is in an inline function body, it stores the inline
     function name and offset from the inline function call site instead of the
     (non-inlined) function.

Cc: 2nddept-manager@sdl.hitachi.co.jp
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20110330092605.2132.11629.stgit@ltc236.sdl.hitachi.co.jp>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-04-05 15:38:12 -03:00