590679 Commits

Author SHA1 Message Date
Linus Torvalds
ed1e33dded Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input subsystem fixes from Dmitry Torokhov.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: twl6040-vibra - fix DT node memory management
  Input: max8997-haptic - fix NULL pointer dereference
  Input: byd - update copyright header
2016-05-12 12:47:49 -07:00
Arnaldo Carvalho de Melo
42ef8a78c1 perf stat: Fallback to user only counters when perf_event_paranoid > 1
After 0161028b7c8a ("perf/core: Change the default paranoia level to 2")
'perf stat' fails for users without CAP_SYS_ADMIN, so just use
'perf_evsel__fallback()' to have the same behaviour as 'perf record',
i.e. set perf_event_attr.exclude_kernel to 1.

Now:

  [acme@jouet linux]$ perf stat usleep 1

   Performance counter stats for 'usleep 1':

          0.352536      task-clock:u (msec)  #   0.423 CPUs utilized
                 0      context-switches:u   #   0.000 K/sec
                 0      cpu-migrations:u     #   0.000 K/sec
                49      page-faults:u        #   0.139 M/sec
           309,407      cycles:u             #   0.878 GHz
           243,791      instructions:u       #   0.79  insn per cycle
            49,622      branches:u           # 140.757 M/sec
             3,884      branch-misses:u      #   7.83% of all branches

       0.000834174 seconds time elapsed

  [acme@jouet linux]$

Reported-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-b20jmx4dxt5hpaa9t2rroi0o@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-12 16:25:18 -03:00
Arnaldo Carvalho de Melo
08094828b7 perf evsel: Handle EACCESS + perf_event_paranoid=2 in fallback()
Now with the default for the kernel.perf_event_paranoid sysctl being 2 [1]
we need to fall back to :u, i.e. to set perf_event_attr.exclude_kernel
to 1.

Before:

  [acme@jouet linux]$ perf record usleep 1
  Error:
  You may not have permission to collect stats.

  Consider tweaking /proc/sys/kernel/perf_event_paranoid,
  which controls use of the performance events system by
  unprivileged users (without CAP_SYS_ADMIN).

  The current value is 2:

    -1: Allow use of (almost) all events by all users
  >= 0: Disallow raw tracepoint access by users without CAP_IOC_LOCK
  >= 1: Disallow CPU event access by users without CAP_SYS_ADMIN
  >= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN
  [acme@jouet linux]$

After:

  [acme@jouet linux]$ perf record usleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.016 MB perf.data (7 samples) ]
  [acme@jouet linux]$ perf evlist
  cycles:u
  [acme@jouet linux]$ perf evlist -v
  cycles:u: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, exclude_kernel: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
  [acme@jouet linux]$

And if the user turns on verbose mode, an explanation will appear:

  [acme@jouet linux]$ perf record -v usleep 1
  Warning:
  kernel.perf_event_paranoid=2, trying to fall back to excluding kernel samples
  mmap size 528384B
  [ perf record: Woken up 1 times to write data ]
  Looking at the vmlinux_path (8 entries long)
  Using /lib/modules/4.6.0-rc7+/build/vmlinux for symbols
  [ perf record: Captured and wrote 0.016 MB perf.data (7 samples) ]
  [acme@jouet linux]$

[1] 0161028b7c8a ("perf/core: Change the default paranoia level to 2")

Reported-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-b20jmx4dxt5hpaa9t2rroi0o@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-12 16:13:16 -03:00
Alex Deucher
c47b9e0944 drm/amdgpu: fix DP mode validation
Switch the order of the loops to walk the rates on the top
so we exhaust all DP 1.1 rate/lane combinations before trying
DP 1.2 rate/lane combos.

This avoids selecting rates that are supported by the monitor,
but not the connector leading to valid modes getting rejected.

bug:
https://bugs.freedesktop.org/show_bug.cgi?id=95206

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2016-05-12 15:03:49 -04:00
Alex Deucher
ff0bd441bd drm/radeon: fix DP mode validation
Switch the order of the loops to walk the rates on the top
so we exhaust all DP 1.1 rate/lane combinations before trying
DP 1.2 rate/lane combos.

This avoids selecting rates that are supported by the monitor,
but not the connector leading to valid modes getting rejected.

bug:
https://bugs.freedesktop.org/show_bug.cgi?id=95206

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2016-05-12 15:03:39 -04:00
Arnaldo Carvalho de Melo
7d173913a6 perf evsel: Improve EPERM error handling in open_strerror()
We were showing a hardcoded default value for the kernel.perf_event_paranoid
sysctl, now that it became more paranoid (1 -> 2 [1]), this would need to be
updated, instead show the current value:

  [acme@jouet linux]$ perf record ls
  Error:
  You may not have permission to collect stats.

  Consider tweaking /proc/sys/kernel/perf_event_paranoid,
  which controls use of the performance events system by
  unprivileged users (without CAP_SYS_ADMIN).

  The current value is 2:

    -1: Allow use of (almost) all events by all users
  >= 0: Disallow raw tracepoint access by users without CAP_IOC_LOCK
  >= 1: Disallow CPU event access by users without CAP_SYS_ADMIN
  >= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN
  [acme@jouet linux]$

[1] 0161028b7c8a ("perf/core: Change the default paranoia level to 2")

Reported-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-0gc4rdpg8d025r5not8s8028@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-12 15:44:55 -03:00
Linus Torvalds
422ce5a975 A single last pin control fix for v4.6:
- The pull up/down logic for the AT91 PIO4 controller was tilted:
   we need to mask the reverse pull when unmasking a pull direction.
   Setting both pull up & pull down is illegal and makes no sense.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXNILOAAoJEEEQszewGV1zLfAQAIvFWWkNEtuAC4+INKjyqI6/
 UZuc9CtqIAUx23C/Syco2itSQ40xIyQL5fnH9o9IBnSQc+Qv8H8jMvO3eLezZEnK
 s3fPfdid3jhRPQWa5qXP6z5TtRTMPFqoiT8JzVENL3j167pD2Ip1K0XPOR1lZsWY
 1zOHXZtiDGOMqA+PIOKXbst1r2Ney5AL1/WBkwrLisX7oEm57jUgtvJiiNyHQ56H
 GFU4kpa4ei0EBhySOUc1JnUKtQOx9g1ZlQS3vWp9uDDXdEMkgr7zL9mP63lACaG9
 Qfz65bpPzfwgYvYS7FrMkHTCYIODRPAHyyPj0Vm+JNzDTYD1Op99kZIzBEcMKhWq
 obEh9o8PT9/y1JYl+Xhzkv1wLYO/zNwdljh5H7zDVdIkkrAV5t3nFyOutA+lWWi4
 3GoaQAyrJre5UyrgdjI2ARBUwqYDGSziKLensqSakhmHCv6RQsvYYs0nBeVKDADj
 7DSHFcUwo44ivCc7CRy7Lm3Z/nyfwlHOoS4kvzIQTJ0ipiyHoUfgacknWsXoOZJ6
 mLl0HZwRxJ/OaNOtkXKrReph/CMpx3rdSbbTAQf2RXyqoOclOM/AXQJOnTGzVQZI
 jw1xnk2uMY7GM1eTnOADNxHDXUNh0dGHmpkJf/hlZ/oc4Tbu7h8fE8Kf38IzTZzT
 e3oZPmkuBt644pHqSpzT
 =Kzzn
 -----END PGP SIGNATURE-----

Merge tag 'pinctrl-v4.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pinctrl fix from Linus Walleij:
 "A single last pin control fix for v4.6.  t's tagged for stable and
  only hits a single driver with two added lines so should be safe.
  Tested in linux-next.

   - The pull up/down logic for the AT91 PIO4 controller was tilted: we
     need to mask the reverse pull when unmasking a pull direction.

     Setting both pull up & pull down is illegal and makes no sense"

* tag 'pinctrl-v4.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: at91-pio4: fix pull-up/down logic
2016-05-12 11:23:08 -07:00
Wanpeng Li
f7c17d26f4 workqueue: fix rebind bound workers warning
------------[ cut here ]------------
WARNING: CPU: 0 PID: 16 at kernel/workqueue.c:4559 rebind_workers+0x1c0/0x1d0
Modules linked in:
CPU: 0 PID: 16 Comm: cpuhp/0 Not tainted 4.6.0-rc4+ #31
Hardware name: IBM IBM System x3550 M4 Server -[7914IUW]-/00Y8603, BIOS -[D7E128FUS-1.40]- 07/23/2013
 0000000000000000 ffff881037babb58 ffffffff8139d885 0000000000000010
 0000000000000000 0000000000000000 0000000000000000 ffff881037babba8
 ffffffff8108505d ffff881037ba0000 000011cf3e7d6e60 0000000000000046
Call Trace:
 dump_stack+0x89/0xd4
 __warn+0xfd/0x120
 warn_slowpath_null+0x1d/0x20
 rebind_workers+0x1c0/0x1d0
 workqueue_cpu_up_callback+0xf5/0x1d0
 notifier_call_chain+0x64/0x90
 ? trace_hardirqs_on_caller+0xf2/0x220
 ? notify_prepare+0x80/0x80
 __raw_notifier_call_chain+0xe/0x10
 __cpu_notify+0x35/0x50
 notify_down_prepare+0x5e/0x80
 ? notify_prepare+0x80/0x80
 cpuhp_invoke_callback+0x73/0x330
 ? __schedule+0x33e/0x8a0
 cpuhp_down_callbacks+0x51/0xc0
 cpuhp_thread_fun+0xc1/0xf0
 smpboot_thread_fn+0x159/0x2a0
 ? smpboot_create_threads+0x80/0x80
 kthread+0xef/0x110
 ? wait_for_completion+0xf0/0x120
 ? schedule_tail+0x35/0xf0
 ret_from_fork+0x22/0x50
 ? __init_kthread_worker+0x70/0x70
---[ end trace eb12ae47d2382d8f ]---
notify_down_prepare: attempt to take down CPU 0 failed

This bug can be reproduced by below config w/ nohz_full= all cpus:

CONFIG_BOOTPARAM_HOTPLUG_CPU0=y
CONFIG_DEBUG_HOTPLUG_CPU0=y
CONFIG_NO_HZ_FULL=y

As Thomas pointed out:

| If a down prepare callback fails, then DOWN_FAILED is invoked for all
| callbacks which have successfully executed DOWN_PREPARE.
|
| But, workqueue has actually two notifiers. One which handles
| UP/DOWN_FAILED/ONLINE and one which handles DOWN_PREPARE.
|
| Now look at the priorities of those callbacks:
|
| CPU_PRI_WORKQUEUE_UP        = 5
| CPU_PRI_WORKQUEUE_DOWN      = -5
|
| So the call order on DOWN_PREPARE is:
|
| CB 1
| CB ...
| CB workqueue_up() -> Ignores DOWN_PREPARE
| CB ...
| CB X ---> Fails
|
| So we call up to CB X with DOWN_FAILED
|
| CB 1
| CB ...
| CB workqueue_up() -> Handles DOWN_FAILED
| CB ...
| CB X-1
|
| So the problem is that the workqueue stuff handles DOWN_FAILED in the up
| callback, while it should do it in the down callback. Which is not a good idea
| either because it wants to be called early on rollback...
|
| Brilliant stuff, isn't it? The hotplug rework will solve this problem because
| the callbacks become symetric, but for the existing mess, we need some
| workaround in the workqueue code.

The boot CPU handles housekeeping duty(unbound timers, workqueues,
timekeeping, ...) on behalf of full dynticks CPUs. It must remain
online when nohz full is enabled. There is a priority set to every
notifier_blocks:

workqueue_cpu_up > tick_nohz_cpu_down > workqueue_cpu_down

So tick_nohz_cpu_down callback failed when down prepare cpu 0, and
notifier_blocks behind tick_nohz_cpu_down will not be called any
more, which leads to workers are actually not unbound. Then hotplug
state machine will fallback to undo and online cpu 0 again. Workers
will be rebound unconditionally even if they are not unbound and
trigger the warning in this progress.

This patch fix it by catching !DISASSOCIATED to avoid rebind bound
workers.

Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: stable@vger.kernel.org
Suggested-by: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
2016-05-12 12:00:23 -04:00
Arnd Bergmann
8a934ccbf9 Second AT91 fix PR for 4.6:
- fix a regression on the clock subsystem while switching to syscon/regmap
   due to a stricter check of the register map.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQEcBAABAgAGBQJXM0NyAAoJEAf03oE53VmQ0IwIAI05Acjbko78+Hm/kWHlbJrx
 5BLpJqTGLavbjbPjFnElKTr1c8m7Bgd6HNcat/2z9ZNh8/82FXyGOjYSp4W0BE9V
 ovyD/lz/2fZyom4YLJjdCnVkwIV1pKHS+uNxy5zXMY2gJfz4VRtUh54H3ArdaQOE
 LOKvrVl9bGorLbzbTpH0rSeKqQ1XFGPDlp1A3GUtI6646RWd5Eqe6zDzgQUwWJis
 AR+sAcsqVb+IWY5rZwMyL0oZ1kjnEw4tihDTy+G+eBXRdRMYqSgEt8dBz65T3VYi
 5EFXLuLjknnf3qVaxEU+og6kiJkefTYdB4I7xVR+dKbcILBSSqdHxlG1HmNkBQ4=
 =CF/9
 -----END PGP SIGNATURE-----

Merge tag 'at91-fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/nferre/linux-at91 into fixes

Merge "Second AT91 fix PR for 4.6" from Nicolas Ferre:

- fix a regression on the clock subsystem while switching to syscon/regmap
  due to a stricter check of the register map.

* tag 'at91-fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/nferre/linux-at91:
  ARM: dts: at91: sam9x5: Fix the memory range assigned to the PMC
2016-05-12 17:44:53 +02:00
Felipe Balbi
09be4c824e cgroup: fix compile warning
commit 4f41fc59620f ("cgroup, kernfs: make mountinfo
 show properly scoped path for cgroup namespaces")
 added the following compile warning:

kernel/cgroup.c: In function ‘cgroup_show_path’:
kernel/cgroup.c:1634:15: warning: unused variable ‘ret’ [-Wunused-variable]
  int len = 0, ret = 0;
               ^
fix it.

Fixes: 4f41fc59620f ("cgroup, kernfs: make mountinfo show properly scoped path for cgroup namespaces")
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2016-05-12 11:05:27 -04:00
Serge E. Hallyn
3cc9b23c81 kernfs: kernfs_sop_show_path: don't return 0 after seq_dentry call
Our caller expects 0 on success, not >0.

This fixes a bug in the patch

	cgroup, kernfs: make mountinfo show properly scoped path for cgroup namespaces

where /sys does not show up in mountinfo, breaking criu.

Thanks for catching this, Andrei.

Reported-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2016-05-12 11:03:51 -04:00
Steven Rostedt
106b816cb4 tools lib traceevent: Do not reassign parg after collapse_tree()
At the end of process_filter(), collapse_tree() was changed to update
the parg parameter, but the reassignment after the call wasn't removed.

What happens is that the "current_op" gets modified and freed and parg
is assigned to the new allocated argument. But after the call to
collapse_tree(), parg is assigned again to the just freed "current_op",
and this causes the tool to crash.

The current_op variable must also be assigned to NULL in case of error,
otherwise it will cause it to be free()ed twice.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: stable@vger.kernel.org # 3.14+
Fixes: 42d6194d133c ("tools lib traceevent: Refactor process_filter()")
Link: http://lkml.kernel.org/r/20160511150936.678c18a1@gandalf.local.home
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-12 11:27:00 -03:00
Arnaldo Carvalho de Melo
4924734570 perf probe: Check if dwarf_getlocations() is available
If not, tell the user that:

  config/Makefile:273: Old libdw.h, finding variables at given 'perf probe' point will not work, install elfutils-devel/libdw-dev >= 0.157

And return -ENOTSUPP in die_get_var_range(), failing features that
need it, like the one pointed out above.

This fixes the build on older systems, such as Ubuntu 12.04.5.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Vinson Lee <vlee@freedesktop.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-9l7luqkq4gfnx7vrklkq4obs@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-12 11:26:59 -03:00
Arnaldo Carvalho de Melo
62aa0e177d perf dwarf: Guard !x86_64 definitions under #ifdef else clause
To fix the build on Fedora Rawhide (gcc 6.0.0 20160311 (Red Hat 6.0.0-0.17):

    CC       /tmp/build/perf/arch/x86/util/dwarf-regs.o
  arch/x86/util/dwarf-regs.c:66:36: error: 'x86_32_regoffset_table' defined but not used [-Werror=unused-const-variable=]
   static const struct pt_regs_offset x86_32_regoffset_table[] = {
                                      ^~~~~~~~~~~~~~~~~~~~~~
  cc1: all warnings being treated as errors

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-fghuksc1u8ln82bof4lwcj0o@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-12 11:26:59 -03:00
Arnaldo Carvalho de Melo
22a9f41b55 perf tools: Use readdir() instead of deprecated readdir_r()
The readdir() function is thread safe as long as just one thread uses a
DIR, which is the case when parsing tracepoint event definitions, to
avoid breaking the build with glibc-2.23.90 (upcoming 2.24), use it
instead of readdir_r().

See: http://man7.org/linux/man-pages/man3/readdir.3.html

"However, in modern implementations (including the glibc implementation),
concurrent calls to readdir() that specify different directory streams
are thread-safe.  In cases where multiple threads must read from the
same directory stream, using readdir() with external synchronization is
still preferable to the use of the deprecated readdir_r(3) function."

Noticed while building on a Fedora Rawhide docker container.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-wddn49r6bz6wq4ee3dxbl7lo@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-12 11:26:58 -03:00
Arnaldo Carvalho de Melo
7839b9f32e perf thread_map: Use readdir() instead of deprecated readdir_r()
The readdir() function is thread safe as long as just one thread uses a
DIR, which is the case in thread_map, so, to avoid breaking the build
with glibc-2.23.90 (upcoming 2.24), use it instead of readdir_r().

See: http://man7.org/linux/man-pages/man3/readdir.3.html

"However, in modern implementations (including the glibc implementation),
concurrent calls to readdir() that specify different directory streams
are thread-safe.  In cases where multiple threads must read from the
same directory stream, using readdir() with external synchronization is
still preferable to the use of the deprecated readdir_r(3) function."

Noticed while building on a Fedora Rawhide docker container.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-del8h2a0f40z75j4r42l96l0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-12 11:26:58 -03:00
Arnaldo Carvalho de Melo
9a5f3bf332 perf script: Use readdir() instead of deprecated readdir_r()
The readdir() function is thread safe as long as just one thread uses a
DIR, which is the case in 'perf script', so, to avoid breaking the build
with glibc-2.23.90 (upcoming 2.24), use it instead of readdir_r().

See: http://man7.org/linux/man-pages/man3/readdir.3.html

"However, in modern implementations (including the glibc implementation),
concurrent calls to readdir() that specify different directory streams
are thread-safe.  In cases where multiple threads must read from the
same directory stream, using readdir() with external synchronization is
still preferable to the use of the deprecated readdir_r(3) function."

Noticed while building on a Fedora Rawhide docker container.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-mt3xz7n2hl49ni2vx7kuq74g@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-12 11:26:57 -03:00
Arnaldo Carvalho de Melo
2515e61483 perf tools: Use readdir() instead of deprecated readdir_r()
The readdir() function is thread safe as long as just one thread uses a
DIR, which is the case when synthesizing events for pre-existing threads
by traversing /proc, so, to avoid breaking the build with glibc-2.23.90
(upcoming 2.24), use it instead of readdir_r().

See: http://man7.org/linux/man-pages/man3/readdir.3.html

"However, in modern implementations (including the glibc implementation),
concurrent calls to readdir() that specify different directory streams
are thread-safe.  In cases where multiple threads must read from the
same directory stream, using readdir() with external synchronization is
still preferable to the use of the deprecated readdir_r(3) function."

Noticed while building on a Fedora Rawhide docker container.

   CC       /tmp/build/perf/util/event.o
  util/event.c: In function '__event__synthesize_thread':
  util/event.c:466:2: error: 'readdir_r' is deprecated [-Werror=deprecated-declarations]
    while (!readdir_r(tasks, &dirent, &next) && next) {
    ^~~~~
  In file included from /usr/include/features.h:368:0,
                   from /usr/include/stdint.h:25,
                   from /usr/lib/gcc/x86_64-redhat-linux/6.0.0/include/stdint.h:9,
                   from /git/linux/tools/include/linux/types.h:6,
                   from util/event.c:1:
  /usr/include/dirent.h:189:12: note: declared here

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-i1vj7nyjp2p750rirxgrfd3c@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-12 10:22:54 -03:00
Alexander Shishkin
9f448cd3cb perf/core: Disable the event on a truncated AUX record
When the PMU driver reports a truncated AUX record, it effectively means
that there is no more usable room in the event's AUX buffer (even though
there may still be some room, so that perf_aux_output_begin() doesn't take
action). At this point the consumer still has to be woken up and the event
has to be disabled, otherwise the event will just keep spinning between
perf_aux_output_begin() and perf_aux_output_end() until its context gets
unscheduled.

Again, for cpu-wide events this means never, so once in this condition,
they will be forever losing data.

Fix this by disabling the event and waking up the consumer in case of a
truncated AUX record.

Reported-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/1462886313-13660-3-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 14:46:11 +02:00
Alexander Shishkin
ab92b232ae perf/x86/intel/pt: Generate PMI in the STOP region as well
Currently, the PT driver always sets the PMI bit one region (page) before
the STOP region so that we can wake up the consumer before we run out of
room in the buffer and have to disable the event. However, we also need
an interrupt in the last output region, so that we actually get to disable
the event (if no more room from new data is available at that point),
otherwise hardware just quietly refuses to start, but the event is
scheduled in and we end up losing trace data till the event gets removed.

For a cpu-wide event it is even worse since there may not be any
re-scheduling at all and no chance for the ring buffer code to notice
that its buffer is filled up and the event needs to be disabled (so that
the consumer can re-enable it when it finishes reading the data out). In
other words, all the trace data will be lost after the buffer gets filled
up.

This patch makes PT also generate a PMI when the last output region is
full.

Reported-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/1462886313-13660-2-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 14:45:59 +02:00
Dmitry V. Levin
9a7a076e8e x86: Use compat version for preadv2 and pwritev2
Similar to preadv and pwritev, preadv2 and pwritev2 need compat entries
in the 32-bit syscall table.

This bug was found by strace test suite.

Fixes: 4babf2c5efb7 ("x86: wire up preadv2 and pwritev2")
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Acked-by: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20160511084817.GA29823@altlinux.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-05-12 14:27:13 +02:00
David Howells
23c8a812dc KEYS: Fix ASN.1 indefinite length object parsing
This fixes CVE-2016-0758.

In the ASN.1 decoder, when the length field of an ASN.1 value is extracted,
it isn't validated against the remaining amount of data before being added
to the cursor.  With a sufficiently large size indicated, the check:

	datalen - dp < 2

may then fail due to integer overflow.

Fix this by checking the length indicated against the amount of remaining
data in both places a definite length is determined.

Whilst we're at it, make the following changes:

 (1) Check the maximum size of extended length does not exceed the capacity
     of the variable it's being stored in (len) rather than the type that
     variable is assumed to be (size_t).

 (2) Compare the EOC tag to the symbolic constant ASN1_EOC rather than the
     integer 0.

 (3) To reduce confusion, move the initialisation of len outside of:

	for (len = 0; n > 0; n--) {

     since it doesn't have anything to do with the loop counter n.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Peter Jones <pjones@redhat.com>
2016-05-12 12:01:49 +01:00
Alexander Shishkin
3f56e687a1 perf/core: Disable the event on a truncated AUX record
When the PMU driver reports a truncated AUX record, it effectively means
that there is no more usable room in the event's AUX buffer (even though
there may still be some room, so that perf_aux_output_begin() doesn't take
action). At this point the consumer still has to be woken up and the event
has to be disabled, otherwise the event will just keep spinning between
perf_aux_output_begin() and perf_aux_output_end() until its context gets
unscheduled.

Again, for cpu-wide events this means never, so once in this condition,
they will be forever losing data.

Fix this by disabling the event and waking up the consumer in case of a
truncated AUX record.

Reported-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/1462886313-13660-3-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 10:14:55 +02:00
Alexander Shishkin
5fbe4788b5 perf/x86/intel/pt: Generate PMI in the STOP region as well
Currently, the PT driver always sets the PMI bit one region (page) before
the STOP region so that we can wake up the consumer before we run out of
room in the buffer and have to disable the event. However, we also need
an interrupt in the last output region, so that we actually get to disable
the event (if no more room from new data is available at that point),
otherwise hardware just quietly refuses to start, but the event is
scheduled in and we end up losing trace data till the event gets removed.

For a cpu-wide event it is even worse since there may not be any
re-scheduling at all and no chance for the ring buffer code to notice
that its buffer is filled up and the event needs to be disabled (so that
the consumer can re-enable it when it finishes reading the data out). In
other words, all the trace data will be lost after the buffer gets filled
up.

This patch makes PT also generate a PMI when the last output region is
full.

Reported-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/1462886313-13660-2-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 10:14:55 +02:00
Ingo Molnar
5319e5ad0c Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 10:14:45 +02:00
Andrey Ryabinin
6d6f2833bf perf/x86: Fix undefined shift on 32-bit kernels
Jim reported:

	UBSAN: Undefined behaviour in arch/x86/events/intel/core.c:3708:12
	shift exponent 35 is too large for 32-bit type 'long unsigned int'

The use of 'unsigned long' type obviously is not correct here, make it
'unsigned long long' instead.

Reported-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Imre Palik <imrep@amazon.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 2c33645d366d ("perf/x86: Honor the architectural performance monitoring version")
Link: http://lkml.kernel.org/r/1462974711-10037-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 10:14:31 +02:00
Peter Zijlstra
3c3116b745 perf/x86/msr: Fix SMI overflow
We compute 'delta' and properly sign extend it and then ignore it and
recompute the raw value, loosing the sign extention.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: kan.liang@intel.com
Cc: linux-kernel@vger.kernel.org
Cc: luto@kernel.org
Cc: ray.huang@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 10:14:30 +02:00
hchrzani
ec336c879c perf/x86/intel/uncore: Fix CHA registers configuration procedure for Knights Landing platform
CHA events in Knights Landing platform require programming filter registers properly.
Remote node, local node and NonNearMemCachable bits should be set to 1 at all times.

Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Signed-off-by: Lawrence F Meadows <lawrence.f.meadows@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: bp@suse.de
Cc: harish.chegondi@intel.com
Cc: hpa@zytor.com
Cc: izumi.taku@jp.fujitsu.com
Cc: kan.liang@intel.com
Cc: lukasz.anaczkowski@intel.com
Cc: vthakkar1994@gmail.com
Fixes: 77af0037de0a ('perf/x86/intel/uncore: Add Knights Landing uncore PMU support')
Link: http://lkml.kernel.org/r/1462779419-17115-2-git-send-email-hubert.chrzaniuk@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 10:14:30 +02:00
Thomas Gleixner
50605ffbda sched/core: Provide a tsk_nr_cpus_allowed() helper
tsk_nr_cpus_allowed() is an accessor for task->nr_cpus_allowed which allows
us to change the representation of ->nr_cpus_allowed if required.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1462969411-17735-2-git-send-email-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:55:36 +02:00
Thomas Gleixner
ade42e092b sched/core: Use tsk_cpus_allowed() instead of accessing ->cpus_allowed
Use the future-safe accessor for struct task_struct's.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1462969411-17735-1-git-send-email-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:55:35 +02:00
Vik Heyndrickx
20878232c5 sched/loadavg: Fix loadavg artifacts on fully idle and on fully loaded systems
Systems show a minimal load average of 0.00, 0.01, 0.05 even when they
have no load at all.

Uptime and /proc/loadavg on all systems with kernels released during the
last five years up until kernel version 4.6-rc5, show a 5- and 15-minute
minimum loadavg of 0.01 and 0.05 respectively. This should be 0.00 on
idle systems, but the way the kernel calculates this value prevents it
from getting lower than the mentioned values.

Likewise but not as obviously noticeable, a fully loaded system with no
processes waiting, shows a maximum 1/5/15 loadavg of 1.00, 0.99, 0.95
(multiplied by number of cores).

Once the (old) load becomes 93 or higher, it mathematically can never
get lower than 93, even when the active (load) remains 0 forever.
This results in the strange 0.00, 0.01, 0.05 uptime values on idle
systems.  Note: 93/2048 = 0.0454..., which rounds up to 0.05.

It is not correct to add a 0.5 rounding (=1024/2048) here, since the
result from this function is fed back into the next iteration again,
so the result of that +0.5 rounding value then gets multiplied by
(2048-2037), and then rounded again, so there is a virtual "ghost"
load created, next to the old and active load terms.

By changing the way the internally kept value is rounded, that internal
value equivalent now can reach 0.00 on idle, and 1.00 on full load. Upon
increasing load, the internally kept load value is rounded up, when the
load is decreasing, the load value is rounded down.

The modified code was tested on nohz=off and nohz kernels. It was tested
on vanilla kernel 4.6-rc5 and on centos 7.1 kernel 3.10.0-327. It was
tested on single, dual, and octal cores system. It was tested on virtual
hosts and bare hardware. No unwanted effects have been observed, and the
problems that the patch intended to fix were indeed gone.

Tested-by: Damien Wyart <damien.wyart@free.fr>
Signed-off-by: Vik Heyndrickx <vik.heyndrickx@veribox.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Doug Smythies <dsmythies@telus.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 0f004f5a696a ("sched: Cure more NO_HZ load average woes")
Link: http://lkml.kernel.org/r/e8d32bff-d544-7748-72b5-3c86cc71f09f@veribox.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:55:34 +02:00
Morten Rasmussen
cfa1033431 sched/fair: Correct unit of load_above_capacity
In calculate_imbalance() load_above_capacity currently has the unit
[capacity] while it is used as being [load/capacity]. Not only is it
wrong it also makes it unlikely that load_above_capacity is ever used
as the subsequent code picks the smaller of load_above_capacity and
the avg_load

This patch ensures that load_above_capacity has the right unit
[load/capacity].

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
[ Changed changelog to note it was in capacity unit; +rebase. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1461958364-675-4-git-send-email-dietmar.eggemann@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:55:33 +02:00
Peter Zijlstra
1be0eb2a97 sched/fair: Clean up scale confusion
Wanpeng noted that the scale_load_down() in calculate_imbalance() was
weird. I agree, it should be SCHED_CAPACITY_SCALE, since we're going
to compare against busiest->group_capacity, which is in [capacity]
units.

Reported-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yuyang Du <yuyang.du@intel.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:55:33 +02:00
Wanpeng Li
444969223c sched/nohz: Fix affine unpinned timers mess
The following commit:

  9642d18eee2c ("nohz: Affine unpinned timers to housekeepers")'

intended to affine unpinned timers to housekeepers:

  unpinned timers(full dynaticks, idle)   =>   nearest busy housekeepers(otherwise, fallback to any housekeepers)
  unpinned timers(full dynaticks, busy)   =>   nearest busy housekeepers(otherwise, fallback to any housekeepers)
  unpinned timers(houserkeepers, idle)    =>   nearest busy housekeepers(otherwise, fallback to itself)

However, the !idle_cpu(i) && is_housekeeping_cpu(cpu) check modified the
intention to:

  unpinned timers(full dynaticks, idle)   =>   any housekeepers(no mattter cpu topology)
  unpinned timers(full dynaticks, busy)   =>   any housekeepers(no mattter cpu topology)
  unpinned timers(housekeepers, idle)     =>   any busy cpus(otherwise, fallback to any housekeepers)

This patch fixes it by checking if there are busy housekeepers nearby,
otherwise falls to any housekeepers/itself. After the patch:

  unpinned timers(full dynaticks, idle)   =>   nearest busy housekeepers(otherwise, fallback to any housekeepers)
  unpinned timers(full dynaticks, busy)   =>   nearest busy housekeepers(otherwise, fallback to any housekeepers)
  unpinned timers(housekeepers, idle)     =>   nearest busy housekeepers(otherwise, fallback to itself)

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[ Fixed the changelog. ]
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Fixes: 'commit 9642d18eee2c ("nohz: Affine unpinned timers to housekeepers")'
Link: http://lkml.kernel.org/r/1462344334-8303-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:55:32 +02:00
Peter Zijlstra
2f950354e6 sched/fair: Fix fairness issue on migration
Pavan reported that in the presence of very light tasks (or cgroups)
the placement of migrated tasks can cause severe fairness issues.

The problem is that enqueue_entity() places the task before it updates
time, thereby it can place the task far in the past (remember that
light tasks will shoot virtual time forward at a high speed, so in
relation to the pre-existing light task, we can land far in the past).

This is done because update_curr() needs the current task, and we
might be placing the current task.

The obvious solution is to differentiate between the current and any
other task; placing the current before we update time, and placing any
other task after, such that !curr tasks end up at the current moment
in time, and not in the past.

This commit re-introduces the previously reverted commit:

  3a47d5124a95 ("sched/fair: Fix fairness issue on migration")

... which is now safe to do, after we've also fixed another
underlying bug first, in:

  sched/fair: Prepare to fix fairness problems on migration

and cleaned up other details in the migration code:

  sched/core: Kill sched_class::task_waking

Reported-by: Pavan Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:55:32 +02:00
Peter Zijlstra
59efa0bac9 sched/core: Kill sched_class::task_waking to clean up the migration logic
With sched_class::task_waking being called only when we do
set_task_cpu(), we can make sched_class::migrate_task_rq() do the work
and eliminate sched_class::task_waking entirely.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Hunter <ahh@google.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Pavan Kondeti <pkondeti@codeaurora.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: byungchul.park@lge.com
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:55:31 +02:00
Peter Zijlstra
b5179ac70d sched/fair: Prepare to fix fairness problems on migration
Mike reported that our recent attempt to fix migration problems:

  3a47d5124a95 ("sched/fair: Fix fairness issue on migration")

broke interactivity and the signal starve test. We reverted that
commit and now let's try it again more carefully, with some other
underlying problems fixed first.

One problem is that I assumed ENQUEUE_WAKING was only set when we do a
cross-cpu wakeup (migration), which isn't true. This means we now
destroy the vruntime history of tasks and wakeup-preemption suffers.

Cure this by making my assumption true, only call
sched_class::task_waking() when we do a cross-cpu wakeup. This avoids
the indirect call in the case we do a local wakeup.

Reported-by: Mike Galbraith <mgalbraith@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Hunter <ahh@google.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Pavan Kondeti <pkondeti@codeaurora.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: byungchul.park@lge.com
Cc: linux-kernel@vger.kernel.org
Fixes: 3a47d5124a95 ("sched/fair: Fix fairness issue on migration")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:55:31 +02:00
Peter Zijlstra
c58d25f371 sched/fair: Move record_wakee()
Since I want to make ->task_woken() conditional on the task getting
migrated, we cannot use it to call record_wakee().

Move it to select_task_rq_fair(), which gets called in almost all the
same conditions. The only exception is if the woken task (@p) is
CPU-bound (as per the nr_cpus_allowed test in select_task_rq()).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Hunter <ahh@google.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Pavan Kondeti <pkondeti@codeaurora.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: byungchul.park@lge.com
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:55:30 +02:00
Ingo Molnar
4eb8676517 Merge branch 'smp/hotplug' into sched/core, to resolve conflicts
Conflicts:
	kernel/sched/core.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:51:36 +02:00
Ingo Molnar
eb60b3e5e8 Merge branch 'sched/urgent' into sched/core to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:18:13 +02:00
Yazen Ghannam
754a923059 x86/RAS: Add SMCA support to AMD Error Injector
Use SMCA MSRs when writing to MCA_{STATUS,ADDR,MISC} and
MCA_DE{STAT,ADDR} when injecting Deferred Errors on SMCA platforms.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462971509-3856-8-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:08:23 +02:00
Yazen Ghannam
a348ed83d9 EDAC, mce_amd: Detect SMCA using X86_FEATURE_SMCA
Use X86_FEATURE_SMCA when detecting if SMCA is available instead of
directly using CPUID 0x80000007_EBX.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462971509-3856-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:08:23 +02:00
Yazen Ghannam
14cddfd530 x86/mce: Update AMD mcheck init to use cpu_has() facilities
Use cpu_has() facilities to find available RAS features rather than
directly reading CPUID 0x80000007_EBX.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
[ Use the struct cpuinfo_x86 ptr instead. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462971509-3856-6-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:08:22 +02:00
Yazen Ghannam
71faad4306 x86/cpu: Add detection of AMD RAS Capabilities
Add a new CPUID leaf to hold the contents of CPUID 0x80000007_EBX (RasCap).

Define bits that are currently in use:

 Bit 0: McaOverflowRecov
 Bit 1: SUCCOR
 Bit 3: ScalableMca

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
[ Shorten comment. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462971509-3856-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:08:22 +02:00
Borislav Petkov
e128b4f483 x86/mce/AMD: Save an indentation level in prepare_threshold_block()
Do the !SMCA work first and then save us an indentation level for the
SMCA code.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462971509-3856-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:08:21 +02:00
Yazen Ghannam
32544f0603 x86/mce/AMD: Disable LogDeferredInMcaStat for SMCA systems
Disable Deferred Error logging in MCA_{STATUS,ADDR} additionally for
SMCA systems as this information will retrieved from MCA_DE{STAT,ADDR}
on those systems.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
[ Simplify, drop SMCA_MCAX_EN_OFF define too. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462971509-3856-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:08:20 +02:00
Yazen Ghannam
3410200958 x86/mce/AMD: Log Deferred Errors using SMCA MCA_DE{STAT,ADDR} registers
Scalable MCA provides new registers for all banks for logging deferred
errors: MCA_DESTAT and MCA_DEADDR. Deferred errors are always logged to
these registers.

Update the AMD deferred error handler to use these registers, if
available.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
[ Sanity-check __log_error() args, massage a bit. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462971509-3856-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:08:19 +02:00
Ingo Molnar
9ab975a029 perf/core improvements and fixes:
User visible:
 
 - Fix symbol insertion and callchain behavior in db-export (Chris Phlipot)
 
 Infrastructure:
 
 - Add libunwind build test (feature query), working towards supporting
   cross-platform DWARF callchains, starting with arm/arm64 (He Kuang)
 
 - Use lsdir() more extensively (Masami Hiramatsu)
 
 - Use SBUILD_ID_SIZE in places where the equivalent expression was
   being used (Masami Hiramatsu)
 
 - Split some more 'perf trace' syscall arg beautifiers (Arnaldo Carvalho de Melo)
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJXM1oVAAoJENZQFvNTUqpAHpkP/1k2Ciin/zvyH2fVAO+R3dsW
 yPwdZ6FI+O7gS/4X2ZM5qzbbwb4jNq3obJZrRL3xyY4/eoo4H2vdubwpCCJJMuSe
 NRLqmf2rU7W6fuVjdZAobnOYuUNzZrGwAcInrPhZ/YQTHvsvjDX+xjBFB3nbSClM
 m7a9UbOPw38p5irfdTZFARThovYYoUWHHECly4/QKRFB8v0aMOHpj8ai+aD03+3U
 i737l+MAKD9p0toz20RcStq0Wt0/2OsLlaQZ7D9bu278LZ0GccVQsug8J1cO9J+v
 0ZCbirA0FMdeQfyjsWrogQf7kTFBG3SI/6KXo4wcBqxsE4HaX340kPuuGubQQJFA
 1a7KXbyIry5qjZXXp2gFkiu/w3xiG7REmkRqRvXJ4+jeOiHnZvXwSXG+ciaC9gnP
 2yCWbsNkTN2CSkSqsgKxwN1Qjofffb8/FBQlLh7VkPVvIib6N9a8Ei8Vwz2CYezN
 fV2pL9nx57X3TX4uIOTqhwN+hau4vtwz0n59U+aaBGToAFeteMQhFvE1/+6ose5V
 dyqhwh8IhG8qjNBf6TOnXWBo2xKw9EVdQ1IXiHXplEcNrpH6DwhdKAhgxCo60dmQ
 1zNHE4g+iZOJcQ4mzfjD5KoiYi9OEz3tFckFS9wGKTvDso5DhIt/CV2tyAnAA0iG
 ab/j/kPg5FxPUP3GAtoG
 =9OqC
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-for-mingo-20160511' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

User visible changes:

- Fix symbol insertion and callchain behavior in db-export (Chris Phlipot)

Infrastructure changes:

- Add libunwind build test (feature query), working towards supporting
  cross-platform DWARF callchains, starting with arm/arm64 (He Kuang)

- Use lsdir() more extensively (Masami Hiramatsu)

- Use SBUILD_ID_SIZE in places where the equivalent expression was
  being used (Masami Hiramatsu)

- Split some more 'perf trace' syscall arg beautifiers (Arnaldo Carvalho de Melo)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 08:57:52 +02:00
David S. Miller
1b7cc307a8 Merge branch 'bnxt_en-fixes'
Michael Chan says:

====================
bnxt_en: Add workaround to detect bad opaque in rx completion.

2-part workaround for this hardware bug.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-11 23:46:09 -04:00
Michael Chan
fa7e28127a bnxt_en: Add workaround to detect bad opaque in rx completion (part 2)
Add detection and recovery code when the hardware returned opaque value
does not match the expected consumer index.  Once the issue is detected,
we skip the processing of all RX and LRO/GRO packets.  These completion
entries are discarded without sending the SKB to the stack and without
producing new buffers.  The function will be reset from a workqueue.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-11 23:46:09 -04:00