linux/kernel
KOSAKI Motohiro 0753ba01e1 mm: revert "oom: move oom_adj value"
The commit 2ff05b2b (oom: move oom_adj value) moveed the oom_adj value to
the mm_struct.  It was a very good first step for sanitize OOM.

However Paul Menage reported the commit makes regression to his job
scheduler.  Current OOM logic can kill OOM_DISABLED process.

Why? His program has the code of similar to the following.

	...
	set_oom_adj(OOM_DISABLE); /* The job scheduler never killed by oom */
	...
	if (vfork() == 0) {
		set_oom_adj(0); /* Invoked child can be killed */
		execve("foo-bar-cmd");
	}
	....

vfork() parent and child are shared the same mm_struct.  then above
set_oom_adj(0) doesn't only change oom_adj for vfork() child, it's also
change oom_adj for vfork() parent.  Then, vfork() parent (job scheduler)
lost OOM immune and it was killed.

Actually, fork-setting-exec idiom is very frequently used in userland program.
We must not break this assumption.

Then, this patch revert commit 2ff05b2b and related commit.

Reverted commit list
---------------------
- commit 2ff05b2b4e (oom: move oom_adj value from task_struct to mm_struct)
- commit 4d8b9135c3 (oom: avoid unnecessary mm locking and scanning for OOM_DISABLE)
- commit 8123681022 (oom: only oom kill exiting tasks with attached memory)
- commit 933b787b57 (mm: copy over oom_adj value at fork time)

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-18 16:31:13 -07:00
..
gcov
irq genirq: Wake up irq thread after action has been installed 2009-08-18 17:22:43 +02:00
power headers: smp_lock.h redux 2009-07-12 12:22:34 -07:00
time clocksource: Prevent NULL pointer dereference 2009-07-19 17:15:54 +02:00
trace Remove double removal of blktrace directory 2009-08-12 18:50:08 +02:00
.gitignore
acct.c bsdacct: fix access to invalid filp in acct_on() 2009-06-30 18:56:00 -07:00
async.c
audit_tree.c Fix rule eviction order for AUDIT_DIR 2009-06-24 00:02:38 -04:00
audit_watch.c Audit: clean up all op= output to include string quoting 2009-06-24 00:00:52 -04:00
audit.c Fix rule eviction order for AUDIT_DIR 2009-06-24 00:02:38 -04:00
audit.h Fix rule eviction order for AUDIT_DIR 2009-06-24 00:02:38 -04:00
auditfilter.c Audit: clean up all op= output to include string quoting 2009-06-24 00:00:52 -04:00
auditsc.c Fix rule eviction order for AUDIT_DIR 2009-06-24 00:02:38 -04:00
backtracetest.c
bounds.c
capability.c
cgroup_debug.c
cgroup_freezer.c
cgroup.c cgroup avoid permanent sleep at rmdir 2009-07-29 19:10:35 -07:00
compat.c
configs.c
cpu.c mm/init: cpu_hotplug_init() must be initialized before SLAB 2009-06-22 21:18:12 -07:00
cpuset.c
cred-internals.h
cred.c
delayacct.c
dma-coherent.c
dma.c
exec_domain.c
exit.c headers: mnt_namespace.h redux 2009-07-08 09:31:56 -07:00
extable.c
fork.c mm: revert "oom: move oom_adj value" 2009-08-18 16:31:13 -07:00
freezer.c sched: fix nr_uninterruptible accounting of frozen tasks really 2009-07-18 14:19:53 +02:00
futex_compat.c futex: Fix compat_futex to be same as futex for REQUEUE_PI 2009-08-10 15:41:12 +02:00
futex.c futex: Fix handling of bad requeue syscall pairing 2009-08-10 20:38:11 +02:00
groups.c
hrtimer.c hrtimer: Fix migration expiry check 2009-07-10 17:32:55 +02:00
hung_task.c
itimer.c
kallsyms.c
Kconfig.freezer
Kconfig.hz
Kconfig.preempt
kexec.c kexec: fix omitting offset in extended crashkernel syntax 2009-07-29 19:10:34 -07:00
kfifo.c
kgdb.c
kmod.c headers: mnt_namespace.h redux 2009-07-08 09:31:56 -07:00
kprobes.c kprobes: Use kernel_text_address() for checking probe address 2009-07-30 16:44:06 -07:00
ksysfs.c
kthread.c update the comment in kthread_stop() 2009-07-27 12:15:46 -07:00
latencytop.c
lockdep_internals.h
lockdep_proc.c lockdep: Fix file mode of lock_stat 2009-08-07 11:58:38 +02:00
lockdep_states.h
lockdep.c
Makefile Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-06-28 11:05:04 -07:00
marker.c
module.c module: use MODULE_SYMBOL_PREFIX with module_layout 2009-07-27 12:15:45 -07:00
mutex-debug.c
mutex-debug.h
mutex.c
mutex.h
notifier.c
ns_cgroup.c
nsproxy.c
panic.c trace: stop tracer in oops_enter() 2009-07-24 15:30:45 -04:00
params.c
perf_counter.c perf_counter: Report the cloning task as parent on perf_counter_fork() 2009-08-13 16:17:15 +02:00
pid_namespace.c
pid.c kmemleak: Remove alloc_bootmem annotations introduced in the past 2009-07-09 17:07:02 +01:00
pm_qos_params.c
posix-cpu-timers.c posix_cpu_timers_exit_group(): Do not use thread_group_cputimer() 2009-08-08 18:30:25 +02:00
posix-timers.c posix-timers: Fix oops in clock_nanosleep() with CLOCK_MONOTONIC_RAW 2009-08-04 10:16:41 +02:00
printk.c
profile.c profile: suppress warning about large allocations when profile=1 is specified 2009-07-29 19:10:36 -07:00
ptrace.c cred_guard_mutex: do not return -EINTR to user-space 2009-07-06 13:57:04 -07:00
rcuclassic.c
rcupdate.c
rcupreempt_trace.c
rcupreempt.c
rcutorture.c
rcutree_trace.c
rcutree.c rcu: Mark Hierarchical RCU no longer experimental 2009-06-24 15:02:48 +02:00
rcutree.h
relay.c
res_counter.c
resource.c kernel/resource.c: fix sign extension in reserve_setup() 2009-06-30 18:56:00 -07:00
rtmutex_common.h
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c rtmutex: Avoid deadlock in rt_mutex_start_proxy_lock() 2009-08-06 05:50:21 +02:00
rtmutex.h
rwsem.c
sched_clock.c
sched_cpupri.c sched: Fix race in cpupri introduced by cpumask_var changes 2009-08-02 14:23:29 +02:00
sched_cpupri.h
sched_debug.c
sched_fair.c sched: Fix latencytop and sleep profiling vs group scheduling 2009-08-02 14:10:12 +02:00
sched_features.h
sched_idletask.c
sched_rt.c sched_rt: Fix overload bug on rt group scheduling 2009-07-10 10:43:29 +02:00
sched_stats.h
sched.c sched: fix load average accounting vs. cpu hotplug 2009-07-18 14:19:52 +02:00
seccomp.c
semaphore.c
signal.c do_sigaltstack: small cleanups 2009-08-01 11:18:56 -07:00
slow-work.c
smp.c generic-ipi: fix hotplug_cfd() 2009-08-07 10:39:55 -07:00
softirq.c softirq: introduce tasklet_hrtimer infrastructure 2009-07-22 17:01:17 +02:00
softlockup.c
spinlock.c
srcu.c
stacktrace.c
stop_machine.c
sys_ni.c
sys.c
sysctl_check.c
sysctl.c Security/SELinux: seperate lsm specific mmap_min_addr 2009-08-17 15:09:11 +10:00
taskstats.c
test_kprobes.c
time.c
timeconst.pl
timer.c timer: Avoid reading uninitialized data 2009-07-18 23:11:43 +02:00
tracepoint.c
tsacct.c
uid16.c
up.c
user_namespace.c
user.c
utsname_sysctl.c
utsname.c
wait.c locking, sched: Give waitqueue spinlocks their own lockdep classes 2009-08-10 14:43:09 +02:00
workqueue.c