11754 Commits

Author SHA1 Message Date
john stultz
e05b2efb82 clocksource: Install completely before selecting
Christian Hoffmann reported that the command line clocksource override
with acpi_pm timer fails:

 Kernel command line: <SNIP> clocksource=acpi_pm
 hpet clockevent registered
 Switching to clocksource hpet
 Override clocksource acpi_pm is not HRT compatible.
 Cannot switch while in HRT/NOHZ mode.

The watchdog code is what enables CLOCK_SOURCE_VALID_FOR_HRES, but we
actually end up selecting the clocksource before we enqueue it into
the watchdog list, so that's why we see the warning and fail to switch
to acpi_pm timer as requested. That's particularly bad when we want to
debug timekeeping related problems in early boot.

Put the selection call last.
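
The ordering problem can be illustrated with a small user-space model. This is only a sketch: `Clocksource`, `Registry` and both `register_*` methods are hypothetical stand-ins, not kernel code. The only behaviour taken from the message above is that the watchdog enqueue step sets the HRES-valid flag and that an override without that flag is refused.

```python
from dataclasses import dataclass


@dataclass
class Clocksource:
    name: str
    rating: int
    valid_for_hres: bool = False  # models CLOCK_SOURCE_VALID_FOR_HRES


class Registry:
    """Hypothetical model of clocksource registration with an override."""

    def __init__(self, override=None):
        self.override = override
        self.sources = []
        self.current = None

    def _enqueue_watchdog(self, cs):
        # In this model the watchdog step is what marks a source HRES-capable.
        cs.valid_for_hres = True

    def _select(self):
        # Default to the best-rated source; honour the override only if it
        # is already marked HRES-capable at selection time.
        best = max(self.sources, key=lambda c: c.rating)
        for cs in self.sources:
            if cs.name == self.override and cs.valid_for_hres:
                best = cs
        self.current = best

    def register_select_first(self, cs):
        # Old order: selection runs before the watchdog has seen the source.
        self.sources.append(cs)
        self._select()
        self._enqueue_watchdog(cs)

    def register_select_last(self, cs):
        # Fixed order: install completely, then select.
        self.sources.append(cs)
        self._enqueue_watchdog(cs)
        self._select()
```

With an `acpi_pm` override, registering `hpet` and then `acpi_pm` through `register_select_first()` leaves `hpet` selected, while `register_select_last()` ends up on `acpi_pm`.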

Reported-by: Christian Hoffmann <email@christianhoffmann.info>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: stable@kernel.org # 32...
Link: http://lkml.kernel.org/r/%3C1304558210.2943.24.camel%40work-vm%3E
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-05 15:23:26 +02:00
Ingo Molnar
98bb318864 Merge branch 'perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/urgent
2011-05-04 20:33:42 +02:00
Vladimir Davydov
931aeeda0d sched: Remove unused 'this_best_prio arg' from balance_tasks()
It's passed across multiple functions but is never really used, so
remove it.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1304447467-29200-1-git-send-email-vdavydov@parallels.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-04 09:07:21 +02:00
Ingo Molnar
e7e7ee2eab perf events: Clean up definitions and initializers, update copyrights
Fix a few inconsistent style bits that were added over the past few
months.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-yv4hwf9yhnzoada8pcpb3a97@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-04 08:49:24 +02:00
Thomas Gleixner
179eb03268 alarmtimer: Drop device refcount after rtc_open()
class_find_device() takes a refcount on the rtc device. rtc_open()
takes another one, so we can drop it after the rtc_open() call.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
2011-05-04 08:18:34 +02:00
Thomas Gleixner
ce788f930b alarmtimer: Check return value of class_find_device()
alarmtimer_late_init() uses class_find_device() to find an alarm
capable rtc device. The match callback stores a pointer to the name in
the char pointer handed in from the call site. alarmtimer_late_init()
checks the char pointer for NULL, but the pointer is on the stack and
not initialized to NULL before the call. So it can have random content
when the match function did not identify a device, which leads to
random access in the following rtc_open() call where the pointer is
dereferenced.

Instead of relying on the char pointer, check the return value of
class_find_device. If a device is found then the name pointer is valid
as well.

Reported-by: Ingo Molnar <mingo@elte.hu>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-04 08:18:17 +02:00
Borislav Petkov
48dbb6dc86 hw breakpoints: Move to kernel/events/
As part of the events subsystem unification, relocate hw_breakpoint.c
into its new destination.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-05-03 15:26:43 +02:00
Borislav Petkov
fae85b7c8b perf: Start the restructuring
mv kernel/perf_event.c -> kernel/events/core.c. From there, all further
sensible splitting can happen. The idea is that, with perf_event.c
becoming pretty sizable and with the advent of the marriage with ftrace,
splitting functionality into its logical parts should help speed up
the unification and manage the complexity of the subsystem.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-05-03 12:59:43 +02:00
Thomas Gleixner
99ee5315da timerfd: Allow timers to be cancelled when clock was set
Some applications must be aware of clock realtime being set
backward. A simple example is a clock applet which arms a timer for
the next minute display. If clock realtime is set backward then the
applet displays a stale time for the amount of time by which the clock
was set backwards. Because of that, applications poll the time, as we
don't have an interface to notify them.

Extend the timerfd interface by adding a flag which puts the timer
onto a different internal realtime clock. All timers on this clock are
expired whenever the clock was set.

The timerfd core records the monotonic offset when the timer is
created. When the timer is armed, the current offset is compared
to the previously recorded offset. When it has changed,
timerfd_settime returns -ECANCELED. When a timer is read the offset is
compared and if it changed -ECANCELED is returned to user space. Periodic
timers are not rearmed in the cancelation case.
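
The offset comparison described above can be modelled in user space as follows. This is a sketch only: `Clock` and `CancelOnSetTimer` are hypothetical names, the real timerfd interface is not used, and re-recording the offset after a cancellation is reported is an assumption of this model.

```python
import errno


class Clock:
    """Hypothetical stand-in for the realtime clock's offset to monotonic."""

    def __init__(self):
        self.offset = 0

    def set_realtime(self, delta):
        # Setting the wall clock changes the realtime/monotonic offset.
        self.offset += delta


class CancelOnSetTimer:
    """Models a timer that is cancelled whenever the clock was set."""

    def __init__(self, clock):
        self.clock = clock
        self.recorded_offset = clock.offset  # recorded at creation
        self.armed = False

    def _clock_was_set(self):
        changed = self.clock.offset != self.recorded_offset
        if changed:
            # Assumption: remember the new offset so the next call succeeds.
            self.recorded_offset = self.clock.offset
            self.armed = False  # a cancelled timer is not rearmed
        return changed

    def settime(self):
        if self._clock_was_set():
            return -errno.ECANCELED
        self.armed = True
        return 0

    def read(self):
        if self._clock_was_set():
            return -errno.ECANCELED
        return 1 if self.armed else 0
```

A caller that sees `-ECANCELED` knows the wall clock jumped and can recompute its next expiry before arming the timer again, instead of polling the time.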

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <johnstul@us.ibm.com>
Cc: Chris Friesen <chris.friesen@genband.com>
Tested-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davide Libenzi <davidel@xmailserver.org>
Reviewed-by: Alexander Shishkin <virtuoso@slind.org>
Link: http://lkml.kernel.org/r/%3Calpine.LFD.2.02.1104271359580.3323%40ionos%3E
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-02 21:39:15 +02:00
Thomas Gleixner
b12a03ce48 hrtimers: Prepare for cancel on clock was set timers
Make clock_was_set() unconditional and rename hres_timers_resume to
hrtimers_resume. This is a preparatory patch for hrtimers which are
cancelled when clock realtime was set.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-02 21:37:58 +02:00
Mike Frysinger
942c3c5c32 hrtimer: Make lookup table const
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Link: http://lkml.kernel.org/r/%3C1304364267-14489-1-git-send-email-vapier%40gentoo.org%3E
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-02 21:37:57 +02:00
Thomas Gleixner
3687a2c0d8 Merge branch 'linus' into timers/core
Reason: Pick up the hrtimer_clock_to_base_table fix from mainline

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-02 21:37:08 +02:00
John Stultz
472647dcd7 timers: Fix alarmtimer build issues when CONFIG_RTC_CLASS=n
Ingo pointed out that the alarmtimers won't build if CONFIG_RTC_CLASS=n.
This patch adds proper ifdefs to the alarmtimer code to disable the rtc
usage if it is not built in.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-02 21:36:57 +02:00
Geert Uytterhoeven
94b2c363dc genirq: Fix typo CONFIG_GENIRC_IRQ_SHOW_LEVEL
commit ab7798ffcf98b11a9525cf65bacdae3fd58d357f ("genirq: Expand generic
show_interrupts()") added the Kconfig option GENERIC_IRQ_SHOW_LEVEL to
accommodate PowerPC, but this doesn't actually enable the functionality due
to a typo in the #ifdef check.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Linux/PPC Development <linuxppc-dev@lists.ozlabs.org>
Link: http://lkml.kernel.org/r/%3Calpine.DEB.2.00.1104302251370.19068%40ayla.of.borg%3E
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-02 21:16:37 +02:00
Thomas Gleixner
c42321c76b genirq: Make generic irq chip depend on CONFIG_GENERIC_IRQ_CHIP
Only compile it in when there are users.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arm-kernel@lists.infradead.org
2011-05-02 18:16:22 +02:00
Ingo Molnar
ac0a3260f3 Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core
2011-05-01 19:11:42 +02:00
Ingo Molnar
809435ff4f Merge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core
2011-05-01 19:09:39 +02:00
Linus Torvalds
3fd9952df4 Merge branch 'fixes-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
* 'fixes-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: fix deadlock in worker_maybe_bind_and_lock()
  workqueue: Document debugging tricks

Fix up trivial spelling conflict in kernel/workqueue.c
2011-04-30 09:15:40 -07:00
Steven Rostedt
b9df92d2a9 ftrace: Consolidate the function match routines for normal and mods
The code used for matching functions is almost identical between normal
selecting of functions and using the :mod: feature of set_ftrace_notrace.

Consolidate the two users into one function.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-04-29 22:53:14 -04:00
Steven Rostedt
491d0dcfb9 ftrace: Consolidate updating of ftrace_trace_function
There are three locations that perform almost identical functions in order
to update the ftrace_trace_function (the ftrace function variable that gets
called by mcount).

Consolidate these into a single function called update_ftrace_function().

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-04-29 22:53:11 -04:00
Steven Rostedt
996e87be7f ftrace: Move record update for normal and modules into a separate function
The updating of a function record is moved to a single function. This will allow
us to add specific changes in one location for both modules and kernel
functions.

Later patches will determine if the function record itself needs to be updated
(which enables the mcount caller), or just the ftrace_ops needs the update.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-04-29 22:53:08 -04:00
Steven Rostedt
d2c8c3eafb ftrace: Remove FTRACE_FL_CONVERTED flag
Since we disable all function tracer processing if we detect
that a modification of an instruction had failed, we do not need
to track that the record has failed. No more ftrace processing
is allowed, and the FTRACE_FL_CONVERTED flag is pointless.

The FTRACE_FL_CONVERTED flag was used to denote records that were
successfully converted from mcount calls into nops. But if a single
record fails, all of ftrace is disabled.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-04-29 22:53:04 -04:00
Steven Rostedt
45a4a2372b ftrace: Remove FTRACE_FL_FAILED flag
Since we disable all function tracer processing if we detect
that a modification of an instruction had failed, we do not need
to track that the record has failed. No more ftrace processing
is allowed, and the FTRACE_FL_FAILED flag is pointless.

Removing this flag simplifies some of the code, but some ftrace_disabled
checks needed to be added or moved around a little.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-04-29 22:53:01 -04:00
Steven Rostedt
3499e46114 ftrace: Remove failures file
The failures file in the debugfs tracing directory would list the
functions that failed to convert when the old dead ftrace daemon
tried to update code but failed. Since this code is now dead along
with the daemon the failures file is useless. Remove it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-04-29 22:52:58 -04:00
Steven Rostedt
8ab2b7efd3 ftrace: Remove unnecessary disabling of irqs
The disabling of interrupts around ftrace_update_code() was used
to protect against the evil ftrace daemon from years past. But that
daemon has long been killed. It is safe to keep interrupts enabled
while updating the initial mcount into nops.

The ftrace_mutex is also held which keeps other users at bay.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-04-29 22:52:55 -04:00
Steven Rostedt
0778d9ad33 ftrace: Make FTRACE_WARN_ON() work in if condition
Let FTRACE_WARN_ON() be used as a standalone statement or
inside a conditional: if (FTRACE_WARN_ON(x))

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-04-29 22:52:52 -04:00
Steven Rostedt
058e297d34 ftrace: Only update the function code on write to filter files
If function tracing is enabled, a read of the filter files will
cause the call to stop_machine to update the function trace sites.
It should only call stop_machine on write.

Cc: stable@kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-04-29 22:42:59 -04:00
Rafael J. Wysocki
85eb8c8d0b PM / Runtime: Generic clock manipulation rountines for runtime PM (v6)
Many different platforms and subsystems may want to disable device
clocks during suspend and enable them during resume which is going to
be done in a very similar way in all those cases.  For this reason,
provide generic routines for the manipulation of device clocks during
suspend and resume.

Convert the ARM shmobile platform to using the new routines.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-04-30 00:25:44 +02:00
Linus Torvalds
40a963502c Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf, x86, nmi: Move LVT un-masking into irq handlers
  perf events, x86: Work around the Nehalem AAJ80 erratum
  perf, x86: Fix BTS condition
  ftrace: Build without frame pointers on Microblaze
2011-04-29 15:08:53 -07:00
Linus Torvalds
fcc4dc7151 Merge branch 'timer-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timer-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  hrtimer: Initialize CLOCK_ID to HRTIMER_BASE table statically
  rtc: max8925: Call dev_set_drvdata before rtc_device_register
2011-04-29 15:08:31 -07:00
Tejun Heo
5035b20fa5 workqueue: fix deadlock in worker_maybe_bind_and_lock()
If a rescuer and stop_machine() bringing down a CPU race with each
other, they may deadlock on a non-preemptive kernel.  The CPU won't
accept a new task, so the rescuer can't migrate to the target CPU,
while stop_machine() can't proceed because the rescuer is holding one
of the CPUs while retrying migration.  GCWQ_DISASSOCIATED is never cleared
and worker_maybe_bind_and_lock() retries indefinitely.

This problem can be reproduced semi-reliably while the system is
entering suspend.

 http://thread.gmane.org/gmane.linux.kernel/1122051

A lot of kudos to Thilo-Alexander for reporting this tricky issue and
painstaking testing.

stable: This affects all kernels with cmwq, so all kernels since and
        including v2.6.36 need this fix.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Thilo-Alexander Ginkel <thilo@ginkel.com>
Tested-by: Thilo-Alexander Ginkel <thilo@ginkel.com>
Cc: stable@kernel.org
2011-04-29 18:08:37 +02:00
Thomas Gleixner
ce31332d3c hrtimer: Initialize CLOCK_ID to HRTIMER_BASE table statically
Sedat and Bruno reported RCU stalls which turned out to be caused by
the following:

sched_init() calls init_rt_bandwidth() which calls hrtimer_init()
_BEFORE_ hrtimers_init() is called. While not entirely correct this
worked because hrtimer_init() only accessed statically initialized
data (hrtimer_bases.clock_base[CLOCK_MONOTONIC])

Commit e06383db9 (hrtimers: extend hrtimer base code to handle more
then 2 clockids) added an indirection to the hrtimer_bases.clock_base
lookup to avoid gap handling in the hot path. The table which is used
for the translation from CLOCK_ID to HRTIMER_BASE index is
initialized at runtime in hrtimers_init(). So the early call of the
scheduler code translates CLOCK_MONOTONIC to HRTIMER_BASE_REALTIME.

Thus the rt_bandwidth timer ends up on CLOCK_REALTIME. If the timer is
armed and the wall clock time is set (e.g. ntpdate in the early boot
process - which also gives the problem deterministic behaviour
i.e. magic recovery after N hours), then the timer ends up with an
expiry time far into the future. That breaks the RT throttler
mechanism as rt runtime is accumulated and never cleared, so the rt
throttler detects a false cpu hog condition and blocks all RT tasks
until the timer finally expires. That in turn stalls the RCU thread of
TINYRCU which leads to a huge number of RCU callbacks piling up.

Make the translation table statically initialized, so we are back to
the status of <= 2.6.38.
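
The difference between the two initialization strategies can be sketched with a user-space model. The constants mirror the names in the message; the numeric base indices and the helper names (`runtime_table`, `init_runtime_table`, `static_table`, `clockid_to_base`) are illustrative assumptions, not kernel code.

```python
CLOCK_REALTIME = 0
CLOCK_MONOTONIC = 1

# Base indices as assumed for this sketch; REALTIME being index 0 is what
# makes a zero-filled table translate every clock id to the realtime base.
HRTIMER_BASE_REALTIME = 0
HRTIMER_BASE_MONOTONIC = 1

# Runtime-initialized table: zero-filled until the init function has run.
runtime_table = [0, 0]


def init_runtime_table():
    """Models filling the table late, from an init call."""
    runtime_table[CLOCK_REALTIME] = HRTIMER_BASE_REALTIME
    runtime_table[CLOCK_MONOTONIC] = HRTIMER_BASE_MONOTONIC


# Statically initialized table: correct from the very first lookup.
static_table = [HRTIMER_BASE_REALTIME, HRTIMER_BASE_MONOTONIC]


def clockid_to_base(table, clock_id):
    """Translate a clock id to a base index through the given table."""
    return table[clock_id]
```

A lookup of `CLOCK_MONOTONIC` in `runtime_table` before `init_runtime_table()` has run yields `HRTIMER_BASE_REALTIME`, which is the early-caller failure described above; `static_table` gives the right answer regardless of call order.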

Reported-and-tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reported-by: Bruno Prémont <bonbons@linux-vserver.org>
Cc: John stultz <johnstul@us.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/%3Calpine.LFD.2.02.1104282353140.3005%40ionos%3E
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-04-29 10:57:11 +02:00
John Stultz
7068b7a162 timers: Remove delayed irqwork from alarmtimers implementation
Thomas asked about the delayed irq work in the alarmtimers code,
and I realized that it was a legacy from when the alarmtimer base
lock was a mutex (due to concerns that we'd be interacting with
the RTC device, which is protected by mutexes).

Since the alarmtimer base is now protected by a spinlock, we can
simply execute alarmtimer functions directly from the hrtimer
callback. Should any future alarmtimer functions sleep, they can
simply manage scheduling any delayed work themselves.

CC: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-04-28 13:39:18 -07:00
John Stultz
180bf812ce timers: Improve alarmtimer comments and minor fixes
This patch addresses a number of minor comment improvements and
other minor issues from Thomas' review of the alarmtimers code.

CC: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-04-28 13:39:17 -07:00
Hillf Danton
1409f141ac kernel/watchdog.c: disable nmi perf event in the error path of enabling watchdog
In corner cases where the softlockup watchdog is not set up successfully, the
relevant nmi perf event for the hardlockup watchdog should be disabled, so that
the status of the underlying hardware remains unchanged.

Also, if the kthread doesn't start then the hrtimer won't run and the
hardlockup detector will falsely fire.

Signed-off-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-28 11:28:21 -07:00
Oleg Nesterov
b013c39924 signal: cleanup sys_sigprocmask()
Cleanup. Remove the unneeded goto's, we can simply read blocked.sig[0]
unconditionally and then copy-to-user it if oset != NULL.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Matt Fleming <matt.fleming@linux.intel.com>
2011-04-28 13:01:40 +02:00
Oleg Nesterov
702a5073fd signal: rename signandsets() to sigandnsets()
As Tejun and Linus pointed out, "nand" is the wrong name for "x & ~y",
it should be "andn". Rename signandsets() as suggested.
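
The naming point is easy to check with plain bit masks. A minimal sketch, working on 8-bit values; `MASK`, `andn` and `nand` are illustrative helpers, not the kernel's sigset functions.

```python
MASK = 0xFF  # work on 8-bit values so that ~ stays within a fixed width


def andn(x, y):
    """x & ~y: the bits of x that are not set in y."""
    return x & ~y & MASK


def nand(x, y):
    """~(x & y): the complement of the bits common to x and y."""
    return ~(x & y) & MASK
```

For `x = 0b1100` and `y = 0b1010`, `andn` gives `0b0100` while `nand` gives `0b11110111`, so the two names describe clearly different operations.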

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
2011-04-28 13:01:39 +02:00
Oleg Nesterov
b182801ab3 signal: do_sigtimedwait() needs retarget_shared_pending()
do_sigtimedwait() changes current->blocked and thus it needs
set_current_blocked()->retarget_shared_pending().

We could use set_current_blocked() directly. It is fine to change
->real_blocked from all-zeroes to ->blocked and vice versa lockless,
but this is not immediately clear, looks racy, and needs a huge
comment to explain why this is correct.

To keep things simple this patch adds a new static helper,
__set_task_blocked(), which should be called with ->siglock held. This
way we can change both ->real_blocked and ->blocked atomically under
->siglock as the current code does. This is more understandable.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Matt Fleming <matt.fleming@linux.intel.com>
2011-04-28 13:01:39 +02:00
Oleg Nesterov
943df1485a signal: introduce do_sigtimedwait() to factor out compat/native code
Factor out the common code in sys_rt_sigtimedwait/compat_sys_rt_sigtimedwait
to the new helper, do_sigtimedwait().

Add the comment to document the extra tick we add to timespec_to_jiffies(ts),
thanks to Linus who explained this to me.
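
One way to see why an extra tick can matter is a small model of tick-granular sleeping. Everything here is an assumption made for illustration: the tick rate `HZ = 100`, the helper names, and the premise that a sleep of N ticks ends at the N-th tick boundary after the call. It is not the kernel's conversion code.

```python
HZ = 100                      # assumed tick rate for this sketch
NSEC_PER_SEC = 1_000_000_000
TICK_NSEC = NSEC_PER_SEC // HZ


def to_ticks(sec, nsec):
    """Round a relative timeout up to whole ticks."""
    total = sec * NSEC_PER_SEC + nsec
    return -(-total // TICK_NSEC)  # ceiling division


def timeout_ticks(sec, nsec):
    """Same conversion, plus one extra tick for any non-zero timeout."""
    return to_ticks(sec, nsec) + (1 if (sec or nsec) else 0)


def slept_nsec(ticks, nsec_into_current_tick):
    """Time until 'ticks' tick boundaries have passed, starting mid-tick."""
    return ticks * TICK_NSEC - nsec_into_current_tick
```

In this model a 10 ms request made 9 ms into the current tick would sleep only 1 ms with the plain conversion, but 11 ms with the extra tick, so the caller waits at least as long as it asked for. A zero timeout stays zero.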

Perhaps it would be better to move compat_sys_rt_sigtimedwait() into
signal.c under CONFIG_COMPAT, then we can make do_sigtimedwait() static.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Matt Fleming <matt.fleming@linux.intel.com>
2011-04-28 13:01:38 +02:00
Oleg Nesterov
fe0faa005d signal: sys_rt_sigtimedwait: simplify the timeout logic
No functional changes, cleanup compat_sys_rt_sigtimedwait() and
sys_rt_sigtimedwait().

Calculate the timeout before we take ->siglock, this simplifies and
shrinks the code. Use timespec_valid() to check the timespec.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Matt Fleming <matt.fleming@linux.intel.com>
2011-04-28 13:01:38 +02:00
Oleg Nesterov
bb7efee2ca signal: cleanup sys_rt_sigprocmask()
sys_rt_sigprocmask() looks unnecessarily complicated, simplify it.
We can just read current->blocked lockless unconditionally before
anything else and then copy-to-user it if needed.  At worst we
copy 4 words on mips.

We could copy-to-user the old mask first and simplify the code even
more, but the patch tries to keep the current behaviour: we change
current->blocked even if copy_to_user(oset) fails.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Matt Fleming <matt.fleming@linux.intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
2011-04-28 13:01:38 +02:00
Oleg Nesterov
e6fa16ab9c signal: sigprocmask() should do retarget_shared_pending()
In short, almost every change of current->blocked is wrong, or at least
can lead to unexpected results.

For example, take two threads T1 and T2, where T1 sleeps in sigtimedwait/pause/etc.
kill(tgid, SIG) can pick T2 for TIF_SIGPENDING. If T2 calls sigprocmask()
and blocks SIG before it notices the pending signal, nobody else can handle
this pending shared signal.

I am not sure this is a bug, but at least this looks strange imho. T1 should
not sleep forever, there is a signal which should wake it up.

This patch moves the code which actually changes ->blocked into the new
helper, set_current_blocked() and changes this code to call
retarget_shared_pending() as exit_signals() does. We should only care about
the signals we just blocked, we use "newset & ~current->blocked" as a mask.

We do not check !sigisemptyset(newblocked), retarget_shared_pending() is
cheap unless mask & shared_pending.

Note: for this particular case we could simply change sigprocmask() to
return -EINTR if signal_pending(), but then we should change other callers
and, more importantly, if we need this fix then set_current_blocked() will
have more callers and some of them can't restart. See the next patch as a
random example.
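
The mask computation mentioned above can be sketched with integers standing in for signal sets. The helper names are hypothetical; only the expressions "newset & ~current->blocked" and "mask & shared_pending" come from the message.

```python
def newly_blocked(newset, blocked):
    """Signals that this call blocks for the first time: newset & ~blocked."""
    return newset & ~blocked


def needs_retarget(newset, blocked, shared_pending):
    """True if a newly blocked signal is pending for the thread group."""
    return (newly_blocked(newset, blocked) & shared_pending) != 0
```

With `blocked = 0b01` and `newset = 0b11`, only bit `0b10` is newly blocked, so retargeting is needed when `shared_pending` contains that bit and not when it only contains `0b01`.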

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Matt Fleming <matt.fleming@linux.intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
2011-04-28 13:01:37 +02:00
Oleg Nesterov
73ef4aeb61 signal: sigprocmask: narrow the scope of ->siglock
No functional changes, preparation to simplify the review of the next change.

1. We can read current->blocked lockless, nobody else can ever change this mask.

2. Calculate the resulting sigset_t outside of ->siglock into the temporary
   variable, then take ->siglock and change ->blocked.

Also, kill the stale comment about BKL.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Matt Fleming <matt.fleming@linux.intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
2011-04-28 13:01:36 +02:00
Oleg Nesterov
fec9993db0 signal: retarget_shared_pending: optimize while_each_thread() loop
retarget_shared_pending() blindly does recalc_sigpending_and_wake() for
every sub-thread, this is suboptimal. We can check t->blocked and stop
looping once every bit in shared_pending has the new target.

Note: we do not take task_is_stopped_or_traced(t) into account, we are
not trying to speed up the signal delivery or to avoid the unnecessary
(but harmless) signal_wake_up(0) in this unlikely case.
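
The early-exit idea can be sketched with bit masks in place of signal sets. `retarget` and its arguments are hypothetical names for this model; waking a thread is represented by recording its index.

```python
def retarget(pending, thread_blocked_masks):
    """Return indices of threads to wake, stopping once all bits have a taker.

    pending: bitmask of shared pending signals that need a new target.
    thread_blocked_masks: each other thread's blocked mask, in loop order.
    """
    woken = []
    for idx, blocked in enumerate(thread_blocked_masks):
        if not (pending & ~blocked):
            continue  # this thread blocks everything still to be handed off
        pending &= blocked  # keep only the bits this thread cannot take
        woken.append(idx)
        if not pending:
            break  # every signal has a new target, stop looping
    return woken
```

For `pending = 0b11` and blocked masks `[0b11, 0b10, 0b01, 0b00]`, the first thread is skipped, the second and third each take one signal, and the fourth is never examined.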

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Matt Fleming <matt.fleming@linux.intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
2011-04-28 13:01:35 +02:00
Oleg Nesterov
f646e227b8 signal: retarget_shared_pending: consider shared/unblocked signals only
exit_signals() checks signal_pending() before retarget_shared_pending() but
this is suboptimal. We can avoid the while_each_thread() loop when
there are no shared signals visible to us.

Add the "shared_pending.signal & ~blocked" check. We don't use tsk->blocked
directly but pass ~blocked as an argument, this is needed for the next patch.

Note: we can optimize this more. while_each_thread(t) can take t->blocked
into account and stop after every pending signal has the new target, see the
next patch.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Matt Fleming <matt.fleming@linux.intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
2011-04-28 13:01:35 +02:00
Oleg Nesterov
0edceb7bcd signal: introduce retarget_shared_pending()
No functional changes. Move the notify-other-threads code from exit_signals()
to the new helper, retarget_shared_pending().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Matt Fleming <matt.fleming@linux.intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
2011-04-28 13:01:35 +02:00
Jeff Mahoney
e11feaa119 watchdog, hung_task_timeout: Add Kconfig configurable default
This patch allows the default value for sysctl_hung_task_timeout_secs
to be set at build time. The feature carries virtually no overhead,
so it makes sense to keep it enabled. On heavily loaded systems, though,
it can end up triggering stack traces when there is no bug other than
the system being underprovisioned. We use this patch to keep the hung task
facility available but disabled at boot-time.

The default of 120 seconds is preserved. As a note, commit e162b39a may
have accidentally reverted commit fb822db4, which raised the default from
120 seconds to 480 seconds.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Acked-by: Mandeep Singh Baines <msb@google.com>
Link: http://lkml.kernel.org/r/4DB8600C.8080000@suse.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-28 09:13:17 +02:00
Tony Jones
f562988350 audit: acquire creds selectively to reduce atomic op overhead
Commit c69e8d9c01db ("CRED: Use RCU to access another task's creds and to
release a task's own creds") added calls to get_task_cred and put_cred in
audit_filter_rules.  Profiling with a large number of audit rules active
on the exit chain shows that we are spending up to 48% in this routine for
syscall intensive tests, most of which is in the atomic ops.

1. The code should be accessing tsk->cred rather than tsk->real_cred.
2. Since tsk is current (or tsk is being created by copy_process) access to
tsk->cred without rcu read lock is possible.  At the request of the audit
maintainer, a new flag has been added to audit_filter_rules in order to make
this explicit and guide future code.

Signed-off-by: Tony Jones <tonyj@suse.de>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-04-27 15:11:03 +02:00
Ingo Molnar
32673822e4 Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core
Conflicts:
	include/linux/perf_event.h

Merge reason: pick up the latest jump-label enhancements, they are cooked ready.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-27 10:40:21 +02:00
Ingo Molnar
6c8a721327 Merge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent
2011-04-27 10:31:29 +02:00