Introduce cpuset_cpus_allowed_fallback() helper to fix the cpuset problems
with select_fallback_rq(). It can be called from any context and can't use
any cpuset locks including task_lock(). It is called when the task doesn't
have online cpus in ->cpus_allowed but ttwu/etc must be able to find a
suitable cpu.
I am not proud of this patch. Everything which needs such a fat comment
can't be good even if correct. But I'd prefer to not change the locking
rules in the code I hardly understand, and in any case I believe this
simple change make the code much more correct compared to deadlocks we
currently have.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100315091027.GA9155@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
_cpu_down() changes the current task's affinity and then recovers it at
the end. The problems are well known: we can't restore old_allowed if it
was bound to the now-dead-cpu, and we can race with the userspace which
can change cpu-affinity during unplug.
_cpu_down() should not play with current->cpus_allowed at all. Instead,
take_cpu_down() can migrate the caller of _cpu_down() after __cpu_disable()
removes the dying cpu from cpu_online_mask.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100315091023.GA9148@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
sched_exec()->select_task_rq() reads/updates ->cpus_allowed lockless.
This can race with other CPUs updating our ->cpus_allowed, and this
looks meaningless to me.
The task is current and running, it must have online cpus in ->cpus_allowed,
the fallback mode is bogus. And, if ->sched_class returns the "wrong" cpu,
this likely means we raced with set_cpus_allowed() which was called
for reason, why should sched_exec() retry and call ->select_task_rq()
again?
Change the code to call sched_class->select_task_rq() directly and do
nothing if the returned cpu is wrong after re-checking under rq->lock.
From now task_struct->cpus_allowed is always stable under TASK_WAKING,
select_fallback_rq() is always called under rq-lock or the caller or
the caller owns TASK_WAKING (select_task_rq).
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100315091019.GA9141@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The previous patch preserved the retry logic, but it looks unneeded.
__migrate_task() can only fail if we raced with migration after we dropped
the lock, but in this case the caller of set_cpus_allowed/etc must initiate
migration itself if ->on_rq == T.
We already fixed p->cpus_allowed, the changes in active/online masks must
be visible to racer, it should migrate the task to online cpu correctly.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100315091014.GA9138@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
move_task_off_dead_cpu()->select_fallback_rq() reads/updates ->cpus_allowed
lockless. We can race with set_cpus_allowed() running in parallel.
Change it to take rq->lock around select_fallback_rq(). Note that it is not
trivial to move this spin_lock() into select_fallback_rq(), we must recheck
the task was not migrated after we take the lock and other callers do not
need this lock.
To avoid the races with other callers of select_fallback_rq() which rely on
TASK_WAKING, we also check p->state != TASK_WAKING and do nothing otherwise.
The owner of TASK_WAKING must update ->cpus_allowed and choose the correct
CPU anyway, and the subsequent __migrate_task() is just meaningless because
p->se.on_rq must be false.
Alternatively, we could change select_task_rq() to take rq->lock right
after it calls sched_class->select_task_rq(), but this looks a bit ugly.
Also, change it to not assume irqs are disabled and absorb __migrate_task_irq().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100315091010.GA9131@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch just states the fact the cpusets/cpuhotplug interaction is
broken and removes the deadlockable code which only pretends to work.
- cpuset_lock() doesn't really work. It is needed for
cpuset_cpus_allowed_locked() but we can't take this lock in
try_to_wake_up()->select_fallback_rq() path.
- cpuset_lock() is deadlockable. Suppose that a task T bound to CPU takes
callback_mutex. If cpu_down(CPU) happens before T drops callback_mutex
stop_machine() preempts T, then migration_call(CPU_DEAD) tries to take
cpuset_lock() and hangs forever because CPU is already dead and thus
T can't be scheduled.
- cpuset_cpus_allowed_locked() is deadlockable too. It takes task_lock()
which is not irq-safe, but try_to_wake_up() can be called from irq.
Kill them, and change select_fallback_rq() to use cpu_possible_mask, like
we currently do without CONFIG_CPUSETS.
Also, with or without this patch, with or without CONFIG_CPUSETS, the
callers of select_fallback_rq() can race with each other or with
set_cpus_allowed() pathes.
The subsequent patches try to to fix these problems.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100315091003.GA9123@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
USER_SCHED has been removed, so update the documentation
accordingly.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
LKML-Reference: <4BA9A07E.8070508@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is left over from commit 7c9414385e ("sched: Remove USER_SCHED"")
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Dhaval Giani <dhaval.giani@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: David Howells <dhowells@redhat.com>
LKML-Reference: <4BA9A05F.7010407@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (76 commits)
drm/radeon/kms: enable ACPI powermanagement mode on radeon gpus.
drm/radeon/kms: rs400/480 should set common registers.
drm/radeon/kms: add sanity check to wptr.
drm/radeon/kms/evergreen: get DP working
drm/radeon/kms: add hw_i2c module option
drm/radeon/kms: use new pre/post_xfer i2c bit algo hooks
drm/radeon/kms: disable MSI on IGP chips
drm/radeon/kms: display watermark updates (v2)
drm/radeon/kms/dp: disable training pattern on the sink at the end of link training
drm/radeon/kms: minor fixes for eDP with LCD* device tags (v2)
drm/radeon/kms/dp: remove extraneous training complete call
drm/radeon/kms/atom: minor fixes to transmitter setup
drm/radeon/kms: Only restrict BO to visible VRAM size when pinning to VRAM.
drm: fix build error when SYSRQ is disabled
drm/radeon/kms: fix macbookpro connector quirk
drm/radeon/r6xx/r7xx: further safe reg clean up
drm/radeon: bump the UMS driver version for r6xx/r7xx const buffer support
drm/radeon/kms: bump the version for r6xx/r7xx const buffer support
drm/radeon/r6xx/r7xx: CS parser fixes
drm/radeon/kms: fix some typos in r6xx/r7xx hpd setup
...
Fix up MSI-related conflicts in drivers/gpu/drm/radeon/radeon_irq_kms.c
* 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze: (35 commits)
microblaze: Support word copying in copy_tofrom_user
microblaze: Print early printk information to log buffer
microblaze: head.S typo fix
microblaze: Use MICROBLAZE_TLB_SIZE in asm code
microblaze: Kconfig Fix - pci
microblaze: Adding likely macros
microblaze: Add .type and .size to ASM functions
microblaze: Fix TLB macros
microblaze: Use instruction with delay slot
microblaze: Remove additional resr and rear loading
microblaze: Change register usage for ESR and EAR
microblaze: Prepare work for optimization in exception code
microblaze: Add DEBUG option
microblaze: Support systems without lmb bram
microblaze: uaccess: Sync strlen, strnlen, copy_to/from_user
microblaze: uaccess: Unify __copy_tofrom_user
microblaze: uaccess: Move functions to generic location
microblaze: uaccess: Fix put_user for noMMU
microblaze: uaccess: Fix get_user macro for noMMU
microblaze: uaccess: fix clear_user for noMMU kernel
...
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86:
eeepc-wmi: new driver for WMI based hotkeys on Eee PC laptops
asus-laptop: fix warning in asus_handle_init
proc_oom_score(task) has a reference to task_struct, but that is all.
If this task was already released before we take tasklist_lock
- we can't use task->group_leader, it points to nowhere
- it is not safe to call badness() even if this task is
->group_leader, has_intersects_mems_allowed() assumes
it is safe to iterate over ->thread_group list.
- even worse, badness() can hit ->signal == NULL
Add the pid_alive() check to ensure __unhash_process() was not called.
Also, use "task" instead of task->group_leader. badness() should return
the same result for any sub-thread. Currently this is not true, but
this should be changed anyway.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Word copying is used only for aligned addresses.
Here is space for improving to use any better copying technique.
Look at memcpy implementation.
Signed-off-by: Michal Simek <monstr@monstr.eu>
TLB size was hardcoded in asm code. This patch brings ability
to change TLB size only in one place. (mmu.h).
Signed-off-by: Michal Simek <monstr@monstr.eu>
When the system has no lmb bram, main memory should be start from
zero because of microblaze vectors.
DTS fragment could look like:
DDR2_SDRAM: memory@0 {
device_type = "memory";
reg = < 0x0 0x10000000 >;
} ;
Then you have to setup CONFIG_KERNEL_BASE_ADDR=0 which caused
that kernel physical start address will be zero. On reset vector place
will be jump to 0x100 and on 0x100 starts kernel text.
You have to solve how to load the kernel before cpu starts.
Tested with XMD.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Here is small regression on dhrystone tests and I think
that on all benchmarking tests. It is due to better checking
mechanism in put_user macro
Signed-off-by: Michal Simek <monstr@monstr.eu>
This is the first patch which does uaccess unification.
I choosed to do several patches to be able to use bisect
in future if any fault happens.
Signed-off-by: Michal Simek <monstr@monstr.eu>
If CONFIG_INITRAMFS_SOURCE is set, "scripts/gen_initramfs_list.sh"
checks if the cpio image exists. Remove the duplicate check from the
Makefile.
Remove the "clean-kernel" variable which is unused in the Makefile and
is not used by the Kbuild.
Signed-off-by: Arun Bhanu <arun@bhanu.net>
Signed-off-by: Michal Simek <monstr@monstr.eu>
'make clean' does not to delete the following build generated file:
arch/microblaze/boot/linux.bin.ub
'make mrproper' does not to delete the following build generated files:
arch/microblaze/boot/simpleImage.*
Fix the Makefile to delete these build generated files.
See [1] for a discussion on why simpleImage.* files are deleted with 'make
mrproper' and not with 'make clean'.
[1] http://lkml.org/lkml/2010/3/12/96
Signed-off-by: Arun Bhanu <arun@bhanu.net>
Signed-off-by: Michal Simek <monstr@monstr.eu>
'make ARCH=microblaze help' fails with the following error due to a
missing single quote.
/bin/sh: -c: line 0: unexpected EOF while looking for matching `''
/bin/sh: -c: line 1: syntax error: unexpected end of file
make: *** [help] Error 2
Signed-off-by: Arun Bhanu <arun@bhanu.net>
Signed-off-by: Michal Simek <monstr@monstr.eu>
The "kstack=" command line parameter is not parsed correctly.
All proper values are interpreted as zero.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
Some GPUs have an APM/ACPI PM mode selection switch and some BIOSes
set this to APM. We really want this in ACPI mode for Linux.
Signed-off-by: Dave Airlie <airlied@redhat.com>