Commit Graph

221 Commits

Author SHA1 Message Date
Paul Jackson
ca2f3daf77 [PATCH] undo partial cpu_exclusive sched domain disabling
The partial disabling of Dinakar's new facility to allow
cpu_exclusive cpusets to define dynamic sched domains
doesn't go far enough.  At the suggestion of Nick Piggin
and Dinakar, let us instead totally disable this facility
for 2.6.13, in order to avoid problems first reported
by John Hawkes (corrupt sched data structures and kernel oops).

This patch removes the partial disabling code in 2.6.13-rc7,
in anticipation of the next patch, which will totally disable
it instead.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-26 16:38:46 -07:00
Paul Jackson
3725822f7c [PATCH] cpu_exclusive sched domains build fix
As reported by Paul Mackerras <paulus@samba.org>, the previous patch
"cpu_exclusive sched domains fix" broke the ppc64 build with
CONFIC_CPUSET, yielding error messages:

kernel/cpuset.c: In function 'update_cpu_domains':
kernel/cpuset.c:648: error: invalid lvalue in unary '&'
kernel/cpuset.c:648: error: invalid lvalue in unary '&'

On some arch's, the node_to_cpumask() is a function, returning
a cpumask_t.  But the for_each_cpu_mask() requires an lvalue mask.

The following patch fixes this build failure by making a copy
of the cpumask_t on the stack.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-24 09:40:45 -07:00
Paul Jackson
d10689b68a [PATCH] cpu_exclusive sched domains on partial nodes temp fix
This keeps the kernel/cpuset.c routine update_cpu_domains() from
invoking the sched.c routine partition_sched_domains() if the cpuset in
question doesn't fall on node boundaries.

I have boot tested this on an SN2, and with the help of a couple of ad
hoc printk's, determined that it does indeed avoid calling the
partition_sched_domains() routine on partial nodes.

I did not directly verify that this avoids setting up bogus sched
domains or avoids the oops that Hawkes saw.

This patch imposes a silent artificial constraint on which cpusets can
be used to define dynamic sched domains.

This patch should allow proceeding with this new feature in 2.6.13 for
the configurations in which it is useful (node alligned sched domains)
while avoiding trying to setup sched domains in the less useful cases
that can cause the kernel corruption and oops.

Signed-off-by: Paul Jackson <pj@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Dinakar Guniguntala <dino@in.ibm.com>
Acked-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-23 20:02:52 -07:00
David Meybohm
4c5640cb5f [PATCH] preempt race in getppid
With CONFIG_PREEMPT && !CONFIG_SMP, it's possible for sys_getppid to
return a bogus value if the parent's task_struct gets reallocated after
current->group_leader->real_parent is read:

        asmlinkage long sys_getppid(void)
        {
                int pid;
                struct task_struct *me = current;
                struct task_struct *parent;

                parent = me->group_leader->real_parent;
RACE HERE =>    for (;;) {
                        pid = parent->tgid;
        #ifdef CONFIG_SMP
        {
                        struct task_struct *old = parent;

                        /*
                         * Make sure we read the pid before re-reading the
                         * parent pointer:
                         */
                        smp_rmb();
                        parent = me->group_leader->real_parent;
                        if (old != parent)
                                continue;
        }
        #endif
                        break;
                }
                return pid;
        }

If the process gets preempted at the indicated point, the parent process
can go ahead and call exit() and then get wait()'d on to reap its
task_struct. When the preempted process gets resumed, it will not do any
further checks of the parent pointer on !CONFIG_SMP: it will read the
bad pid and return.

So, the same algorithm used when SMP is enabled should be used when
preempt is enabled, which will recheck ->real_parent in this case.

Signed-off-by: David Meybohm <dmeybohmlkml@bellsouth.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-23 11:44:29 -07:00
Matt Mackall
024f474795 [PATCH] Make RLIMIT_NICE ranges consistent with getpriority(2)
As suggested by Michael Kerrisk <mtk-manpages@gmx.net>, make RLIMIT_NICE
consistent with getpriority before it becomes available in released glibc.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Chris Wright <chrisw@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-18 12:53:58 -07:00
Bhavesh P. Davda
dd12f48d4e [PATCH] NPTL signal delivery deadlock fix
This bug is quite subtle and only happens in a very interesting
situation where a real-time threaded process is in the middle of a
coredump when someone whacks it with a SIGKILL.  However, this deadlock
leaves the system pretty hosed and you have to reboot to recover.

Not good for real-time priority-preemption applications like our
telephony application, with 90+ real-time (SCHED_FIFO and SCHED_RR)
processes, many of them multi-threaded, interacting with each other for
high volume call processing.

Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-17 12:52:04 -07:00
James Bottomley
6068674437 [PATCH] remove name length check in a workqueue
We have a chek in there to make sure that the name won't overflow
task_struct.comm[], but it's triggering for scsi with lots of HBAs, only
scsi is using single-threaded workqueues which don't append the "/%d"
anyway.

All too hard.  Just kill the BUG_ON.

Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>

[ kthread_create() uses vsnprintf() and limits the thing, so no
  actual overflow can actually happen regardless ]

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-10 11:55:19 -07:00
Paul Jackson
3077a260e9 [PATCH] cpuset release ABBA deadlock fix
Fix possible cpuset_sem ABBA deadlock if 'notify_on_release' set.

For a particular usage pattern, creating and destroying cpusets fairly
frequently using notify_on_release, on a very large system, this deadlock
can be seen every few days.  If you are not using the cpuset
notify_on_release feature, you will never see this deadlock.

The existing code, on task exit (or cpuset deletion) did:

  get cpuset_sem
  if cpuset marked notify_on_release and is ready to release:
    compute cpuset path relative to /dev/cpuset mount point
    call_usermodehelper() forks /sbin/cpuset_release_agent with path
  drop cpuset_sem

Unfortunately, the fork in call_usermodehelper can allocate memory, and
allocating memory can require cpuset_sem, if the mems_generation values
changed in the interim.  This results in an ABBA deadlock, trying to obtain
cpuset_sem when it is already held by the current task.

To fix this, I put the cpuset path (which must be computed while holding
cpuset_sem) in a temporary buffer, to be used in the call_usermodehelper
call of /sbin/cpuset_release_agent only _after_ dropping cpuset_sem.

So the new logic is:

  get cpuset_sem
  if cpuset marked notify_on_release and is ready to release:
    compute cpuset path relative to /dev/cpuset mount point
    stash path in kmalloc'd buffer
  drop cpuset_sem
  call_usermodehelper() forks /sbin/cpuset_release_agent with path
  free path

The sharp eyed reader might notice that this patch does not contain any
calls to kmalloc.  The existing code in the check_for_release() routine was
already kmalloc'ing a buffer to hold the cpuset path.  In the old code, it
just held the buffer for a few lines, over the cpuset_release_agent() call
that in turn invoked call_usermodehelper().  In the new code, with the
application of this patch, it returns that buffer via the new char
**ppathbuf parameter, for later use and freeing in cpuset_release_agent(),
which is called after cpuset_sem is dropped.  Whereas the old code has just
one call to cpuset_release_agent(), right in the check_for_release()
routine, the new code has three calls to cpuset_release_agent(), from the
various places that a cpuset can be released.

This patch has been build and booted on SN2, and passed a stress test that
previously hit the deadlock within a few seconds.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-09 12:08:22 -07:00
Andrew Morton
c306895167 [PATCH] revert "timer exit cleanup"
Revert this June 17 patch: it broke persistence of timers across execve().

Cc: Roland McGrath <roland@redhat.com>
Cc: george anzinger <george@mvista.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-04 16:57:49 -07:00
Benjamin Herrenschmidt
c36f19e02a [PATCH] Remove suspend() calls from shutdown path
This removes the calls to device_suspend() from the shutdown path that
were added sometime during 2.6.13-rc*.  They aren't working properly on
a number of configs (I got reports from both ppc powerbook users and x86
users) causing the system to not shutdown anymore.

I think it isn't the right approach at the moment anyway.  We have
already a shutdown() callback for the drivers that actually care about
shutdown and the suspend() code isn't yet in a good enough shape to be
so much generalized.  Also, the semantics of suspend and shutdown are
slightly different on a number of setups and the way this was patched in
provides little way for drivers to cleanly differenciate.  It should
have been at least a different message.

For 2.6.13, I think we should revert to 2.6.12 behaviour and have a
working suspend back.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-04 08:20:47 -07:00
Rusty Russell
842bbaaa73 [PATCH] Module per-cpu alignment cannot always be met
The module code assumes noone will ever ask for a per-cpu area more than
SMP_CACHE_BYTES aligned.  However, as these cases show, gcc asks sometimes
asks for 32-byte alignment for the per-cpu section on a module, and if
CONFIG_X86_L1_CACHE_SHIFT is 4, we hit that BUG_ON().  This is obviously an
unusual combination, as there have been few reports, but better to warn
than die.

See:
	http://www.ussg.iu.edu/hypermail/linux/kernel/0409.0/0768.html

And more recently:
	http://bugs.gentoo.org/show_bug.cgi?id=97006

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-01 21:38:01 -07:00
Ingo Molnar
6cb54819d7 [PATCH] remove sys_set_zone_reclaim()
This removes sys_set_zone_reclaim() for now.  While i'm sure Martin is
trying to solve a real problem, we must not hard-code an incomplete and
insufficient approach into a syscall, because syscalls are pretty much
for eternity.  I am quite strongly convinced that this syscall must not
hit v2.6.13 in its current form.

Firstly, the syscall lacks basic syscall design: e.g. it allows the
global setting of VM policy for unprivileged users. (!) [ Imagine an
Oracle installation and a SAP installation on the same NUMA box fighting
over the 'optimal' setting for this flag. What will they do? Will they
try to set the flag to their own preferred value every second or so? ]

Secondly, it was added based on a single datapoint from Martin:

 http://marc.theaimsgroup.com/?l=linux-mm&m=111763597218177&w=2

where Martin characterizes the numbers the following way:

 ' Run-to-run variability for "make -j" is huge, so these numbers aren't
   terribly useful except to see that with reclaim the benchmark still
   finishes in a reasonable amount of time. '

in other words: the fundamental problem has likely not been solved, only
a tendential move into the right direction has been observed, and a
handful of numbers were picked out of a set of hugely variable results,
without showing the variability data. How much variance is there
run-to-run?

I'd really suggest to first walk the walk and see what's needed to get
stable & predictable kernel compilation numbers on that NUMA box, before
adding random syscalls to tune a particular aspect of the VM ... which
approach might not even matter once the whole picture has been analyzed
and understood!

The third, most important point is that the syscall exposes VM tuning
internals in a completely unstructured way. What sense does it make to
have a _GLOBAL_ per-node setting for 'should we go to another node for
reclaim'? If then it might make sense to do this per-app, via numalib or
so.

The change is minimalistic in that it doesnt remove the syscall and the
underlying infrastructure changes, only the user-visible changes.  We
could perhaps add a CAP_SYS_ADMIN-only sysctl for this hack, a'ka
/proc/sys/vm/swappiness, but even that looks quite counterproductive
when the generic approach is that we are trying to reduce the number of
external factors in the VM balance picture.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-01 10:03:56 -07:00
Andrew Morton
c70f5d6610 [PATCH] revert bogus softirq changes
This snuck in with an x86_64 change.  Thanks to Richard Purdie
<rpurdie@rpsys.net> for spotting it.

Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-30 10:49:59 -07:00
Eric W. Biederman
1108bae41e [PATCH] reboot: remove device_suspend(PMSG_FREEZE) from kernel_kexec
If device_suspend(PMSG_FREEZE) is not ready to be called in
kernel_restart it is definitely not ready to be called in the even more
fickle kernel_kexec.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-29 12:02:09 -07:00
George Anzinger
78fa74a23b [PATCH] posix timers: fix normalization problem
(We found this (after a customer complained) and it is in the kernel.org
kernel.  Seems that for CLOCK_MONOTONIC absolute timers and clock_nanosleep
calls both the request time and wall_to_monotonic are subtracted prior to
the normalize resulting in an overflow in the existing normalize test.
This causes the result to be shifted ~4 seconds ahead instead of ~2 seconds
back in time.)

The normalize code in posix-timers.c fails when the tv_nsec member is ~1.2
seconds negative.  This can happen on absolute timers (and
clock_nanosleeps) requested on CLOCK_MONOTONIC (both the request time and
wall_to_monotonic are subtracted resulting in the possibility of a number
close to -2 seconds.)

This fix uses the set_normalized_timespec() (which does not have an
overflow problem) to fix the problem and as a side effect makes the code
cleaner.

Signed-off-by: George Anzinger <george@mvista.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-28 21:46:05 -07:00
Andi Kleen
ed6b676ca8 [PATCH] x86_64: Switch to the interrupt stack when running a softirq in local_bh_enable()
This avoids some potential stack overflows with very deep softirq callchains.
i386 does this too.

TOADD CFI annotation

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-28 21:46:02 -07:00
Andrew Morton
e4ff4d7f9d [PATCH] Avoid device suspend on reboot
My fairly ordinary x86 test box gets stuck during reboot on the
wait_for_completion() in ide_do_drive_cmd():

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-27 16:46:37 -07:00
Jesper Juhl
77933d7276 [PATCH] clean up inline static vs static inline
`gcc -W' likes to complain if the static keyword is not at the beginning of
the declaration.  This patch fixes all remaining occurrences of "inline
static" up with "static inline" in the entire kernel tree (140 occurrences in
47 files).

While making this change I came across a few lines with trailing whitespace
that I also fixed up, I have also added or removed a blank line or two here
and there, but there are no functional changes in the patch.

Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-27 16:26:20 -07:00
Randy Dunlap
e77e17161c [PATCH] kernel/crash_dump.c: add kerneldoc
Add kerneldoc to kernel/crash_dump.c

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-27 16:26:06 -07:00
Randy Dunlap
d9fd8a6d44 [PATCH] kernel/cpuset.c: add kerneldoc, fix typos
Add kerneldoc to kernel/cpuset.c

Fix cpuset typos in init/Kconfig

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-27 16:26:06 -07:00
Randy Dunlap
207a7ba8dc [PATCH] kernel/capability.c: add kerneldoc
Add kerneldoc to kernel/capability.c

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-27 16:26:06 -07:00
Martin Schwidefsky
951f22d5b1 [PATCH] s390: spin lock retry
Split spin lock and r/w lock implementation into a single try which is done
inline and an out of line function that repeatedly tries to get the lock
before doing the cpu_relax().  Add a system control to set the number of
retries before a cpu is yielded.

The reason for the spin lock retry is that the diagnose 0x44 that is used to
give up the virtual cpu is quite expensive.  For spin locks that are held only
for a short period of time the costs of the diagnoses outweights the savings
for spin locks that are held for a longer timer.  The default retry count is
1000.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-27 16:26:04 -07:00
George Anzinger
d912d1ff21 [PATCH] itimer fixes
Fix the recent off-by-one fix in the itimer code:

1. The repeating timer is figured using the requested time
	(not +1 as we know where we are in the jiffie).

2. The tests for interval too large are left to the time_val to jiffie code.

Signed-off-by: George Anzinger <george@mvista.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-27 16:25:51 -07:00
Nigel Cunningham
bba0e4670a [PATCH] Address BUG: using smp_processor_id() in preemptible [00000001] code
This patch fixes a warning in the disable_nonboot_cpus call in
kernel/power/smp.c.

Signed-off by: Nigel Cunningham <nigel@suspend2.net>

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-27 16:25:50 -07:00
Steven Rostedt
d46523ea32 [PATCH] fix MAX_USER_RT_PRIO and MAX_RT_PRIO
Here's the patch again to fix the code to handle if the values between
MAX_USER_RT_PRIO and MAX_RT_PRIO are different.

Without this patch, an SMP system will crash if the values are
different.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-26 15:40:00 -07:00
Andreas Steinmetz
18586e7216 [PATCH] Fix RLIMIT_RTPRIO breakage
RLIMIT_RTPRIO is supposed to grant non privileged users the right to use
SCHED_FIFO/SCHED_RR scheduling policies with priorites bounded by the
RLIMIT_RTPRIO value via sched_setscheduler(). This is usually used by
audio users.

Unfortunately this is broken in 2.6.13rc3 as you can see in the excerpt
from sched_setscheduler below:

        /*
         * Allow unprivileged RT tasks to decrease priority:
         */
        if (!capable(CAP_SYS_NICE)) {
                /* can't change policy */
                if (policy != p->policy)
                        return -EPERM;

After the above unconditional test which causes sched_setscheduler to
fail with no regard to the RLIMIT_RTPRIO value the following check is made:

               /* can't increase priority */
                if (policy != SCHED_NORMAL &&
                    param->sched_priority > p->rt_priority &&
                    param->sched_priority >
                                p->signal->rlim[RLIMIT_RTPRIO].rlim_cur)
                        return -EPERM;

Thus I do believe that the RLIMIT_RTPRIO value must be taken into
account for the policy check, especially as the RLIMIT_RTPRIO limit is
of no use without this change.

The attached patch fixes this problem.

Signed-off-by: Andreas Steinmetz <ast@domdv.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-26 15:30:51 -07:00
Eric W. Biederman
fdde86ac50 [PATCH] swpsuspend: Have suspend to disk use factors of sys_reboot
The suspend to disk code was a poor copy of the code in
sys_reboot now that we have kernel_power_off, kernel_restart
and kernel_halt use them instead of poorly duplicating them inline.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-26 14:35:44 -07:00
Eric W. Biederman
2f048ea81d [PATCH] Call emergency_reboot from panic
We know the system is in trouble so there is no question if this
is an emergecy :)

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-26 14:35:43 -07:00
Eric W. Biederman
ff31977782 [PATCH] Use kernel_power_off in sysrq-o
We already do all of the gymnastics to run from process context
to call the power off code so call into the power off code cleanly.

This especially helps acpi as part of it's shutdown logic should
run acpi_shutdown called from device_shutdown which was not
being called from here.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-26 14:35:43 -07:00
Eric W. Biederman
7c9034735e [PATCH] Add emergency_restart()
When the kernel is working well and we want to restart cleanly
kernel_restart is the function to use.   But in many instances
the kernel wants to reboot when thing are expected to be working
very badly such as from panic or a software watchdog handler.

This patch adds the function emergency_restart() so that
callers can be clear what semantics they expect when calling
restart.  emergency_restart() is expected to be callable
from interrupt context and possibly reliable in even more
trying circumstances.

This is an initial generic implementation for all architectures.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-26 14:35:41 -07:00
Eric W. Biederman
abcd9e51f5 [PATCH] Make ctrl_alt_del call kernel_restart to get a proper reboot.
It is obvious we wanted to call kernel_restart here
but since we don't have it the code was expanded inline and hasn't
been correct since sometime in 2.4.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-26 14:35:41 -07:00
Eric W. Biederman
4a00ea1e18 [PATCH] Refactor sys_reboot into reusable parts
Because the factors of sys_reboot don't exist people calling
into the reboot path duplicate the code badly, leading to
inconsistent expectations of code in the reboot path.

This patch should is just code motion.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-26 14:35:41 -07:00
Eric W. Biederman
47f61f397c [PATCH] Add missing device_suspsend(PMSG_FREEZE) calls.
In the recent addition of device_suspend calls into
sys_reboot two code paths were missed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-26 14:35:41 -07:00
Robert Love
0399cb08c5 [PATCH] inotify: move sysctl
This moves the inotify sysctl knobs to "/proc/sys/fs/inotify" from
"/proc/sys/fs".  Also some related cleanup.

Signed-off-by: Robert Love <rml@novell.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-13 11:09:31 -07:00
Robert Love
0eeca28300 [PATCH] inotify
inotify is intended to correct the deficiencies of dnotify, particularly
its inability to scale and its terrible user interface:

        * dnotify requires the opening of one fd per each directory
          that you intend to watch. This quickly results in too many
          open files and pins removable media, preventing unmount.
        * dnotify is directory-based. You only learn about changes to
          directories. Sure, a change to a file in a directory affects
          the directory, but you are then forced to keep a cache of
          stat structures.
        * dnotify's interface to user-space is awful.  Signals?

inotify provides a more usable, simple, powerful solution to file change
notification:

        * inotify's interface is a system call that returns a fd, not SIGIO.
	  You get a single fd, which is select()-able.
        * inotify has an event that says "the filesystem that the item
          you were watching is on was unmounted."
        * inotify can watch directories or files.

Inotify is currently used by Beagle (a desktop search infrastructure),
Gamin (a FAM replacement), and other projects.

See Documentation/filesystems/inotify.txt.

Signed-off-by: Robert Love <rml@novell.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-12 20:38:38 -07:00
Linus Torvalds
3f603ed319 Merge master.kernel.org:/pub/scm/linux/kernel/git/lenb/linux-2.6 2005-07-12 16:04:50 -07:00
Hugh Dickins
3b6bfcdb11 [PATCH] lower VM_DONTCOPY total_vm
dup_mmap of a VM_DONTCOPY vma forgot to lower the child's total_vm.  (But
no way does this account for the recent report of total_vm seen too low.)

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-12 16:00:58 -07:00
Andrew Morton
d53d9f16ea [PATCH] name_to_dev_t warning fix
kernel/power/disk.c needs a declaration of name_to_dev_t() in scope.  mount.h
seems like an appropriate choice.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-12 16:00:58 -07:00
Len Brown
5028770a42 [ACPI] merge acpi-2.6.12 branch into latest Linux 2.6.13-rc...
Signed-off-by: Len Brown <len.brown@intel.com>
2005-07-12 17:21:56 -04:00
David Shaohua Li
5ae947ecc9 [ACPI] Suspend to RAM fix
Free some RAM before entering S3 so that upon
resume we can be sure early allocations will succeed.

http://bugzilla.kernel.org/show_bug.cgi?id=3469

Signed-off-by: David Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2005-07-11 23:21:54 -04:00
Alexey Starikovskiy
e2a5b420f7 [ACPI] ACPI poweroff fix
Register an "acpi" system device to be notified of shutdown preparation.
This depends on CONFIG_PM

http://bugzilla.kernel.org/show_bug.cgi?id=4041

Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2005-07-11 23:20:49 -04:00
Ingo Molnar
5bbcfd9000 [PATCH] cond_resched(): fix bogus might_sleep() warning
The BKS might be reacquired before we have dropped PREEMPT_ACTIVE, which
could trigger a second could trigger a second cond_resched() call.  Bug
found by Hirofumi Ogawa.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07 18:23:47 -07:00
Christoph Lameter
6c036527a6 [PATCH] mostly_read data section
Add a new section called ".data.read_mostly" for data items that are read
frequently and rarely written to like cpumaps etc.

If these maps are placed in the .data section then these frequenly read
items may end up in cachelines with data is is frequently updated.  In that
case all processors in an SMP system must needlessly reload the cachelines
again and again containing elements of those frequently used variables.

The ability to share these cachelines will allow each cpu in an SMP system
to keep local copies of those shared cachelines thereby optimizing
performance.

Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Signed-off-by: Shobhit Dayal <shobhit@calsoftinc.com>
Signed-off-by: Christoph Lameter <christoph@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07 18:23:46 -07:00
Pavel Machek
1322ad4151 [PATCH] pm: clean up process.c
freezeable() already tests for TRACED/STOPPED processes, no need to do it
twice.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07 18:23:43 -07:00
Pavel Machek
47b724f3fe [PATCH] swsusp: fix error handling
Fix error handling and whitespace in swsusp.c.  swsusp_free() was called when
there was nothing allocating, leading to oops.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07 18:23:43 -07:00
Pavel Machek
3efa147ad7 [PATCH] pm: Fix resume from initrd
Move device name resolution code around so that it is not called from
resume-from-initrd.  name_to_dev_t may be unavailable at that point.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07 18:23:43 -07:00
Rusty Lynch
6772926bef [PATCH] kprobes: fix namespace problem and sparc64 build
The following renames arch_init, a kprobes function for performing any
architecture specific initialization, to arch_init_kprobes in order to
cleanup the namespace.

Also, this patch adds arch_init_kprobes to sparc64 to fix the sparc64 kprobes
build from the last return probe patch.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-05 19:19:00 -07:00
Alan Cox
200803dfe4 [PATCH] irqpoll
Anyone reporting a stuck IRQ should try these options.  Its effectiveness
varies we've found in the Fedora case.  Quite a few systems with misdescribed
IRQ routing just work when you use irqpoll.  It also fixes up the VIA systems
although thats now fixed with the VIA quirk (which we could just make default
as its what Redmond OS does but Linus didn't like it historically).

A small number of systems have jammed IRQ sources or misdescribes that cause
an IRQ that we have no handler registered anywhere for.  In those cases it
doesn't help.

Signed-off-by: Alan Cox <number6@the-village.bc.nu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-28 21:20:35 -07:00
Oleg Nesterov
f01b1b0baa [PATCH] ITIMER_REAL: fix possible deadlock and race
As Steven Rostedt pointed out, there are 2 problems with ITIMER_REAL
timers.

1. do_setitimer() does not call del_timer_sync() in case
   when the timer is not pending (it_real_value() returns 0).
   This is wrong, the timer may still be running, and it can
   rearm itself.

2. It calls del_timer_sync() with tsk->sighand->siglock held.
   This is deadlockable, because timer's handler needs this
   lock too.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-28 21:20:30 -07:00
Luca Falavigna
47f176fdaf [PATCH] Using msleep() instead of HZ
Use msleep() in a few places.

Signed-off-by: Luca Falavigna <dktrkranz@gmail.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-28 21:20:29 -07:00