Notably: changed lib/rwsem* targets from lib- to obj-, no idea about
the ramifications of that.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-g0kynfh5feriwc6p3h6kpbw6@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In certain occasions it is possible for a hung task detector
positive to be false: continuation from a paused VM, for example.
Add a method to reset detection, similar as is done
with other kernel watchdogs.
Acked-by: Don Zickus <dzickus@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Conflicts:
kernel/Makefile
There are conflicts in kernel/Makefile due to file moving in the
scheduler tree - resolve them.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Completions already have their own header file: linux/completion.h
Move the implementation out of kernel/sched/core.c and into its own
file: kernel/sched/completion.c.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-x2y49rmxu5dljt66ai2lcfuw@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
For some reason only the wait part of the wait api lives in
kernel/sched/wait.c and the wake part still lives in kernel/sched/core.c;
ammend this.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-ftycee88naznulqk7ei5mbci@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There are conflicts in lockdep.c due to RCU changes, and also the RCU
tree changes kernel/Makefile - so pre-merge it to ease the moving of
locking related .c files to kernel/locking/.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The original SOFT_DISABLE patches didn't add support for soft disable
of syscall events; this adds it.
Add an array of ftrace_event_file pointers indexed by syscall number
to the trace array and remove the existing enabled bitmaps, which as a
result are now redundant. The ftrace_event_file structs in turn
contain the soft disable flags we need for per-syscall soft disable
accounting.
Adding ftrace_event_files also means we can remove the USE_CALL_FILTER
bit, thus enabling multibuffer filter support for syscall events.
Link: http://lkml.kernel.org/r/6e72b566e85d8df8042f133efbc6c30e21fb017e.1382620672.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The trace event filters are still tied to event calls rather than
event files, which means you don't get what you'd expect when using
filters in the multibuffer case:
Before:
# echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
bytes_alloc > 8192
# mkdir /sys/kernel/debug/tracing/instances/test1
# echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
bytes_alloc > 2048
# cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
bytes_alloc > 2048
Setting the filter in tracing/instances/test1/events shouldn't affect
the same event in tracing/events as it does above.
After:
# echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
bytes_alloc > 8192
# mkdir /sys/kernel/debug/tracing/instances/test1
# echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
bytes_alloc > 8192
# cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
bytes_alloc > 2048
We'd like to just move the filter directly from ftrace_event_call to
ftrace_event_file, but there are a couple cases that don't yet have
multibuffer support and therefore have to continue using the current
event_call-based filters. For those cases, a new USE_CALL_FILTER bit
is added to the event_call flags, whose main purpose is to keep the
old behavior for those cases until they can be updated with
multibuffer support; at that point, the USE_CALL_FILTER flag (and the
new associated call_filter_check_discard() function) can go away.
The multibuffer support also made filter_current_check_discard()
redundant, so this change removes that function as well and replaces
it with filter_check_discard() (or call_filter_check_discard() as
appropriate).
Link: http://lkml.kernel.org/r/f16e9ce4270c62f46b2e966119225e1c3cca7e60.1382620672.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Dave Jones reported that trinity would be able to trigger the following
back trace:
===============================
[ INFO: suspicious RCU usage. ]
3.10.0-rc2+ #38 Not tainted
-------------------------------
include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
other info that might help us debug this:
RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0
RCU used illegally from extended quiescent state!
1 lock held by trinity-child1/18786:
#0: (rcu_read_lock){.+.+..}, at: [<ffffffff8113dd48>] __perf_event_overflow+0x108/0x310
stack backtrace:
CPU: 3 PID: 18786 Comm: trinity-child1 Not tainted 3.10.0-rc2+ #38
0000000000000000 ffff88020767bac8 ffffffff816e2f6b ffff88020767baf8
ffffffff810b5897 ffff88021de92520 0000000000000000 ffff88020767bbf8
0000000000000000 ffff88020767bb78 ffffffff8113ded4 ffffffff8113dd48
Call Trace:
[<ffffffff816e2f6b>] dump_stack+0x19/0x1b
[<ffffffff810b5897>] lockdep_rcu_suspicious+0xe7/0x120
[<ffffffff8113ded4>] __perf_event_overflow+0x294/0x310
[<ffffffff8113dd48>] ? __perf_event_overflow+0x108/0x310
[<ffffffff81309289>] ? __const_udelay+0x29/0x30
[<ffffffff81076054>] ? __rcu_read_unlock+0x54/0xa0
[<ffffffff816f4000>] ? ftrace_call+0x5/0x2f
[<ffffffff8113dfa1>] perf_swevent_overflow+0x51/0xe0
[<ffffffff8113e08f>] perf_swevent_event+0x5f/0x90
[<ffffffff8113e1c9>] perf_tp_event+0x109/0x4f0
[<ffffffff8113e36f>] ? perf_tp_event+0x2af/0x4f0
[<ffffffff81074630>] ? __rcu_read_lock+0x20/0x20
[<ffffffff8112d79f>] perf_ftrace_function_call+0xbf/0xd0
[<ffffffff8110e1e1>] ? ftrace_ops_control_func+0x181/0x210
[<ffffffff81074630>] ? __rcu_read_lock+0x20/0x20
[<ffffffff81100cae>] ? rcu_eqs_enter_common+0x5e/0x470
[<ffffffff8110e1e1>] ftrace_ops_control_func+0x181/0x210
[<ffffffff816f4000>] ftrace_call+0x5/0x2f
[<ffffffff8110e229>] ? ftrace_ops_control_func+0x1c9/0x210
[<ffffffff816f4000>] ? ftrace_call+0x5/0x2f
[<ffffffff81074635>] ? debug_lockdep_rcu_enabled+0x5/0x40
[<ffffffff81074635>] ? debug_lockdep_rcu_enabled+0x5/0x40
[<ffffffff81100cae>] ? rcu_eqs_enter_common+0x5e/0x470
[<ffffffff8110112a>] rcu_eqs_enter+0x6a/0xb0
[<ffffffff81103673>] rcu_user_enter+0x13/0x20
[<ffffffff8114541a>] user_enter+0x6a/0xd0
[<ffffffff8100f6d8>] syscall_trace_leave+0x78/0x140
[<ffffffff816f46af>] int_check_syscall_exit_work+0x34/0x3d
------------[ cut here ]------------
Perf uses rcu_read_lock() but as the function tracer can trace functions
even when RCU is not currently active, this makes the rcu_read_lock()
used by perf ineffective.
As perf is currently the only user of the ftrace_ops_control_func() and
perf is also the only function callback that actively uses rcu_read_lock(),
the quick fix is to prevent the ftrace_ops_control_func() from calling
its callbacks if RCU is not active.
With Paul's new "rcu_is_watching()" we can tell if RCU is active or not.
Reported-by: Dave Jones <davej@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
As perf uses the rcu_read_lock() primitives for recording into its
ring buffer, perf tracing can not be called when RCU in inactive.
With the perf function tracing, there are functions that can be
traced when RCU is not active, and perf must not have its function
callback called when this is the case.
Luckily, Paul McKenney has created a way to detect when RCU is
active or not with the rcu_is_watching() function. Unfortunately,
this function can also be traced, and if that happens it can cause
a bit of overhead for the perf function calls that do the check.
Recursion protection prevents anything bad from happening, but
there is a bit of added overhead for every function being traced that
must detect that the rcu_is_watching() is also being traced.
As rcu_is_watching() is a helper routine and not part of the
critical logic in RCU, it does not need to be traced in order to
debug RCU itself. Add the "notrace" annotation to all the rcu_is_watching()
calls such that we never trace it.
Link: http://lkml.kernel.org/r/20131104202736.72dd8e45@gandalf.local.home
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Use rbtree_postorder_for_each_entry_safe() to destroy the rbtree instead
of opencoding an alternate postorder iteration that modifies the tree
Link: http://lkml.kernel.org/r/1383345566-25087-2-git-send-email-cody@linux.vnet.ibm.com
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Conflicts:
drivers/net/ethernet/emulex/benet/be.h
drivers/net/netconsole.c
net/bridge/br_private.h
Three mostly trivial conflicts.
The net/bridge/br_private.h conflict was a function signature (argument
addition) change overlapping with the extern removals from Joe Perches.
In drivers/net/netconsole.c we had one change adjusting a printk message
whilst another changed "printk(KERN_INFO" into "pr_info(".
Lastly, the emulex change was a new inline function addition overlapping
with Joe Perches's extern removals.
Signed-off-by: David S. Miller <davem@davemloft.net>
Resolve cherry-picking conflicts:
Conflicts:
mm/huge_memory.c
mm/memory.c
mm/mprotect.c
See this upstream merge commit for more details:
52469b4fcd4f Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently check_hung_task() prints a warning if it detects the
problem, but it is not convenient to watch the system logs if
user-space wants to be notified about the hang.
Add the new trace_sched_process_hang() into check_hung_task(),
this way a user-space monitor can easily wait for the hang and
potentially resolve a problem.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Dave Sullivan <dsulliva@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20131019161828.GA7439@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
uprobe_copy_process() does nothing if the child shares ->mm with
the forking process, but there is a special case: CLONE_VFORK.
In this case it would be more correct to do dup_utask() but avoid
dup_xol(). This is not that important, the child should not unwind
its stack too much, this can corrupt the parent's stack, but at
least we need this to allow to ret-probe __vfork() itself.
Note: in theory, it would be better to check task_pt_regs(p)->sp
instead of CLONE_VFORK, we need to dup_utask() if and only if the
child can return from the function called by the parent. But this
needs the arch-dependant helper, and I think that nobody actually
does clone(same_stack, CLONE_VM).
Reported-by: Martin Cermak <mcermak@redhat.com>
Reported-by: David Smith <dsmith@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
This finally fixes the serious bug in uretprobes: a forked child
crashes if the parent called fork() with the pending ret probe.
Trivial test-case:
# perf probe -x /lib/libc.so.6 __fork%return
# perf record -e probe_libc:__fork perl -le 'fork || print "OK"'
(the child doesn't print "OK", it is killed by SIGSEGV)
If the child returns from the probed function it actually returns
to trampoline_vaddr, because it got the copy of parent's stack
mangled by prepare_uretprobe() when the parent entered this func.
It crashes because a) this address is not mapped and b) until the
previous change it doesn't have the proper->return_instances info.
This means that uprobe_copy_process() has to create xol_area which
has the trampoline slot, and its vaddr should be equal to parent's
xol_area->vaddr.
Unfortunately, uprobe_copy_process() can not simply do
__create_xol_area(child, xol_area->vaddr). This could actually work
but perf_event_mmap() doesn't expect the usage of foreign ->mm. So
we offload this to task_work_run(), and pass the argument via not
yet used utask->vaddr.
We know that this vaddr is fine for install_special_mapping(), the
necessary hole was recently "created" by dup_mmap() which skips the
parent's VM_DONTCOPY area, and nobody else could use the new mm.
Unfortunately, this also means that we can not handle the errors
properly, we obviously can not abort the already completed fork().
So we simply print the warning if GFP_KERNEL allocation (the only
possible reason) fails.
Reported-by: Martin Cermak <mcermak@redhat.com>
Reported-by: David Smith <dsmith@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
uprobe_copy_process() assumes that the new child doesn't need
->utask, it should be allocated by demand.
But this is not true if the forking task has the pending ret-
probes, the child should report them as well and thus it needs
the copy of parent's ->return_instances chain. Otherwise the
child crashes when it returns from the probed function.
Alternatively we could cleanup the child's stack, but this needs
per-arch changes and this is not what we want. At least systemtap
expects a .return in the child too.
Note: this change alone doesn't fix the problem, see the next
change.
Reported-by: Martin Cermak <mcermak@redhat.com>
Reported-by: David Smith <dsmith@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Currently xol_add_vma() uses get_unmapped_area() for area->vaddr,
but the next patches need to use the fixed address. So this patch
adds the new "vaddr" argument to __create_xol_area() which should
be used as area->vaddr if it is nonzero.
xol_add_vma() doesn't bother to verify that the predefined addr is
not used, insert_vm_struct() should fail if find_vma_links() detects
the overlap with the existing vma.
Also, __create_xol_area() doesn't need __GFP_ZERO to allocate area.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
No functional changes, preparation.
Extract the code which actually allocates/installs the new area
into the new helper, __create_xol_area().
While at it remove the unnecessary "ret = ENOMEM" and "ret = 0"
in xol_add_vma(), they both have no effect.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Preparation for the next patches.
Move the callsite of uprobe_copy_process() in copy_process() down
to the succesfull return. We do not care if copy_process() fails,
uprobe_free_utask() won't be called in this case so the wrong
->utask != NULL doesn't matter.
OTOH, with this change we know that copy_process() can't fail when
uprobe_copy_process() is called, the new task should either return
to user-mode or call do_exit(). This way uprobe_copy_process() can:
1. setup p->utask != NULL if necessary
2. setup uprobes_state.xol_area
3. use task_work_add(p)
Also, move the definition of uprobe_copy_process() down so that it
can see get_utask().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Currently we only optimize the context switch between two
contexts that have the same parent; this forgoes the
optimization between parent and child context, even though these
contexts could be equivalent too.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Shishkin, Alexander <alexander.shishkin@intel.com>
Link: http://lkml.kernel.org/r/20131007164257.GH3081@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Oleg complained about the excessive 0-ing in perf_event_mmap_event(),
so try and be smarter about it while keeping it fairly fool proof and
avoid leaking random bits out to userspace.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-8jirlm99m6if2z13wd6rbyu6@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
perf_event_mmap_event() does kzalloc(PATH_MAX + sizeof(u64)) to
ensure we can align the size later. However this means that we
actually allocate PAGE_SIZE * 2 buffer, seems too much.
Change this code to allocate PATH_MAX==PAGE_SIZE bytes, but tell
d_path() to not use the last sizeof(u64) bytes.
Note: it is not clear why do we need __GFP_ZERO, see the next patch.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131016201004.GC23214@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
1. perf_event_mmap(vma) is never called with a gate_vma-like arg,
remove the "if (!vma->vm_mm)" code.
2. arch_vma_name() can use the chached value of mmap_event->vma.
3. Change the code to not call arch_vma_name() twice.
4. Purely cosmetic, but since we use "goto got_name" all the time
remove "else" from "[stack]" branch just for symmetry.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131016200945.GB23214@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
throttle_cfs_rq() doesn't check to make sure that period_timer is running,
and while update_curr/assign_cfs_runtime does, a concurrently running
period_timer on another cpu could cancel itself between this cpu's
update_curr and throttle_cfs_rq(). If there are no other cfs_rqs running
in the tg to restart the timer, this causes the cfs_rq to be stranded
forever.
Fix this by calling __start_cfs_bandwidth() in throttle if the timer is
inactive.
(Also add some sched_debug lines for cfs_bandwidth.)
Tested: make a run/sleep task in a cgroup, loop switching the cgroup
between 1ms/100ms quota and unlimited, checking for timer_active=0 and
throttled=1 as a failure. With the throttle_cfs_rq() change commented out
this fails, with the full patch it passes.
Signed-off-by: Ben Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: pjt@google.com
Link: http://lkml.kernel.org/r/20131016181632.22647.84174.stgit@sword-of-the-dawn.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently, group entity load-weights are initialized to zero. This
admits some races with respect to the first time they are re-weighted in
earlty use. ( Let g[x] denote the se for "g" on cpu "x". )
Suppose that we have root->a and that a enters a throttled state,
immediately followed by a[0]->t1 (the only task running on cpu[0])
blocking:
put_prev_task(group_cfs_rq(a[0]), t1)
put_prev_entity(..., t1)
check_cfs_rq_runtime(group_cfs_rq(a[0]))
throttle_cfs_rq(group_cfs_rq(a[0]))
Then, before unthrottling occurs, let a[0]->b[0]->t2 wake for the first
time:
enqueue_task_fair(rq[0], t2)
enqueue_entity(group_cfs_rq(b[0]), t2)
enqueue_entity_load_avg(group_cfs_rq(b[0]), t2)
account_entity_enqueue(group_cfs_ra(b[0]), t2)
update_cfs_shares(group_cfs_rq(b[0]))
< skipped because b is part of a throttled hierarchy >
enqueue_entity(group_cfs_rq(a[0]), b[0])
...
We now have b[0] enqueued, yet group_cfs_rq(a[0])->load.weight == 0
which violates invariants in several code-paths. Eliminate the
possibility of this by initializing group entity weight.
Signed-off-by: Paul Turner <pjt@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131016181627.22647.47543.stgit@sword-of-the-dawn.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
__start_cfs_bandwidth calls hrtimer_cancel while holding rq->lock,
waiting for the hrtimer to finish. However, if sched_cfs_period_timer
runs for another loop iteration, the hrtimer can attempt to take
rq->lock, resulting in deadlock.
Fix this by ensuring that cfs_b->timer_active is cleared only if the
_latest_ call to do_sched_cfs_period_timer is returning as idle. Then
__start_cfs_bandwidth can just call hrtimer_try_to_cancel and wait for
that to succeed or timer_active == 1.
Signed-off-by: Ben Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: pjt@google.com
Link: http://lkml.kernel.org/r/20131016181622.22647.16643.stgit@sword-of-the-dawn.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
hrtimer_expires_remaining does not take internal hrtimer locks and thus
must be guarded against concurrent __hrtimer_start_range_ns (but
returning HRTIMER_RESTART is safe). Use cfs_b->lock to make it safe.
Signed-off-by: Ben Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: pjt@google.com
Link: http://lkml.kernel.org/r/20131016181617.22647.73829.stgit@sword-of-the-dawn.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When we transition cfs_bandwidth_used to false, any currently
throttled groups will incorrectly return false from cfs_rq_throttled.
While tg_set_cfs_bandwidth will unthrottle them eventually, currently
running code (including at least dequeue_task_fair and
distribute_cfs_runtime) will cause errors.
Fix this by turning off cfs_bandwidth_used only after unthrottling all
cfs_rqs.
Tested: toggle bandwidth back and forth on a loaded cgroup. Caused
crashes in minutes without the patch, hasn't crashed with it.
Signed-off-by: Ben Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: pjt@google.com
Link: http://lkml.kernel.org/r/20131016181611.22647.80365.stgit@sword-of-the-dawn.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The PPC64 people noticed a missing memory barrier and crufty old
comments in the perf ring buffer code. So update all the comments and
add the missing barrier.
When the architecture implements local_t using atomic_long_t there
will be double barriers issued; but short of introducing more
conditional barrier primitives this is the best we can do.
Reported-by: Victor Kaplansky <victork@il.ibm.com>
Tested-by: Victor Kaplansky <victork@il.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: michael@ellerman.id.au
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: anton@samba.org
Cc: benh@kernel.crashing.org
Link: http://lkml.kernel.org/r/20131025173749.GG19466@laptop.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In commit ee23871389 ("genirq: Set irq thread to RT priority on
creation") we moved the assigment of the thread's priority from the
thread's function into __setup_irq(). That function may run in user
context for instance if the user opens an UART node and then driver
calls requests in the ->open() callback. That user may not have
CAP_SYS_NICE and so the irq thread won't run with the SCHED_OTHER
policy.
This patch uses sched_setscheduler_nocheck() so we omit the CAP_SYS_NICE
check which is otherwise required for the SCHED_OTHER policy.
[bigeasy: Rewrite the changelog]
Signed-off-by: Thomas Pfaff <tpfaff@pcs.com>
Cc: Ivo Sieben <meltedpianoman@gmail.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1381489240-29626-1-git-send-email-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull timer fix from Ingo Molnar:
"This tree contains a clockevents regression fix for certain ARM
subarchitectures"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clockevents: Sanitize ticks to nsec conversion
Pull perf fixes from Ingo Molnar:
"The tree contains three fixes:
- Two tooling fixes
- Reversal of the new 'MMAP2' extended mmap record ABI, introduced in
this merge window. (Patches were proposed to fix it but it was all
a bit late and we felt it's safer to just delay the ABI one more
kernel release and do it right)"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Disable PERF_RECORD_MMAP2 support
perf scripting perl: Fix build error on Fedora 12
perf probe: Fix to initialize fname always before use it
Pull locking fix from Ingo Molnar:
"This tree fixes a boot crash in CONFIG_DEBUG_MUTEXES=y kernels, on
kernels built with GCC 3.x (there are still such distros)"
Side note: it's not just a fix for old gcc versions, it's also removing
an incredibly broken/subtle check that LLVM had issues with, and that
made no sense.
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
mutex: Avoid gcc version dependent __builtin_constant_p() usage