linux

mirror of https://github.com/FEX-Emu/linux.git synced 2024-12-23 18:07:03 +00:00

David Rientjes a63d83f427 oom: badness heuristic rewrite

This is a complete rewrite of the oom killer's badness() heuristic which is used to determine which task to kill in oom conditions. The goal is to make it as simple and predictable as possible so the results are better understood and we end up killing the task which will lead to the most memory freeing while still respecting the fine-tuning from userspace.

Instead of basing the heuristic on mm->total_vm for each task, the task's rss and swap space is used instead. This is a better indication of the amount of memory that will be freeable if the oom killed task is chosen and subsequently exits. This helps specifically in cases where KDE or GNOME is chosen for oom kill on desktop systems instead of a memory hogging task.

The baseline for the heuristic is a proportion of memory that each task is currently using in memory plus swap compared to the amount of "allowable" memory. "Allowable," in this sense, means the system-wide resources for unconstrained oom conditions, the set of mempolicy nodes, the mems attached to current's cpuset, or a memory controller's limit. The proportion is given on a scale of 0 (never kill) to 1000 (always kill), roughly meaning that if a task has a badness() score of 500 that the task consumes approximately 50% of allowable memory resident in RAM or in swap space.

The proportion is always relative to the amount of "allowable" memory and not the total amount of RAM systemwide so that mempolicies and cpusets may operate in isolation; they shall not need to know the true size of the machine on which they are running if they are bound to a specific set of nodes or mems, respectively.

Root tasks are given 3% extra memory just like __vm_enough_memory() provides in LSMs. In the event of two tasks consuming similar amounts of memory, it is generally better to save root's task.

Because of the change in the badness() heuristic's baseline, it is also necessary to introduce a new user interface to tune it. It's not possible to redefine the meaning of /proc/pid/oom_adj with a new scale since the ABI cannot be changed for backward compatibility. Instead, a new tunable, /proc/pid/oom_score_adj, is added that ranges from -1000 to +1000. It may be used to polarize the heuristic such that certain tasks are never considered for oom kill while others may always be considered. The value is added directly into the badness() score so a value of -500, for example, means to discount 50% of its memory consumption in comparison to other tasks either on the system, bound to the mempolicy, in the cpuset, or sharing the same memory controller.

/proc/pid/oom_adj is changed so that its meaning is rescaled into the units used by /proc/pid/oom_score_adj, and vice versa. Changing one of these per-task tunables will rescale the value of the other to an equivalent meaning. Although /proc/pid/oom_adj was originally defined as a bitshift on the badness score, it now shares the same linear growth as /proc/pid/oom_score_adj but with different granularity. This is required so the ABI is not broken with userspace applications and allows oom_adj to be deprecated for future removal.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09 20:45:02 -07:00
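The arithmetic described in the commit message can be illustrated with a small userspace sketch. This is not the kernel's code: the function names, the parameter list, and the special handling of -1000 are hypothetical, chosen only to mirror the prose (rss plus swap as a proportion of "allowable" memory on a 0 to 1000 scale, a 3% discount for root, and oom_score_adj added directly). The oom_adj range of -17 to +15 used for the rescaling is an assumption taken from the tunable's historical definition; the message itself does not state it.

```c
/* Illustrative userspace sketch only; names and signatures are hypothetical. */

#define SCORE_MAX      1000  /* top of the 0..1000 badness scale */
#define SCORE_ADJ_MIN (-1000) /* lowest oom_score_adj value named in the message */
#define ROOT_BONUS       30  /* 3% of the scale, the "extra memory" for root tasks */
#define OOM_ADJ_MIN    (-17) /* assumed historical oom_adj minimum */
#define OOM_ADJ_MAX      15  /* assumed historical oom_adj maximum */

/*
 * rss, swap and allowable must share one unit (pages, for example).
 * "allowable" stands for whatever bounds the oom condition: system-wide
 * memory, the mempolicy nodes, the cpuset's mems, or a controller limit.
 */
static int sketch_badness(unsigned long long rss, unsigned long long swap,
                          unsigned long long allowable, int is_root,
                          int oom_score_adj)
{
    long long points;

    /* Assumption: the minimum adjustment means "never consider this task". */
    if (allowable == 0 || oom_score_adj == SCORE_ADJ_MIN)
        return 0;

    /* Proportion of allowable memory in use, scaled to 0..1000. */
    points = (long long)((rss + swap) * SCORE_MAX / allowable);

    /* Root tasks are treated as if they used 3% less memory. */
    if (is_root)
        points -= ROOT_BONUS;

    /* The userspace tunable is added directly to the score. */
    points += oom_score_adj;

    if (points < 0)
        return 0;
    return points > SCORE_MAX ? SCORE_MAX : (int)points;
}

/* Linear rescaling of oom_adj into oom_score_adj units. */
static int sketch_oom_adj_to_score_adj(int oom_adj)
{
    if (oom_adj == OOM_ADJ_MAX)
        return SCORE_MAX; /* top of one scale maps to top of the other */
    return oom_adj * SCORE_MAX / -OOM_ADJ_MIN;
}

/* Inverse rescaling; coarser, since oom_adj has far fewer steps. */
static int sketch_score_adj_to_oom_adj(int oom_score_adj)
{
    if (oom_score_adj == SCORE_MAX)
        return OOM_ADJ_MAX;
    return oom_score_adj * -OOM_ADJ_MIN / SCORE_MAX;
}
```

Under these assumptions, a non-root task holding half of the allowable memory with no adjustment scores 500, which matches the "approximately 50%" reading in the message, and the same task with an oom_score_adj of -500 scores 0. Because the two tunables have different granularity, converting a value to the other scale and back does not always return the original number.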
..
debug	Merge branch 'timers-timekeeping-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	2010-08-06 13:18:29 -07:00
gcov
irq	irq: Add new IRQ flag IRQF_NO_SUSPEND	2010-07-29 13:24:57 +02:00
power	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq	2010-08-07 12:42:58 -07:00
time	Merge branch 'timers-timekeeping-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	2010-08-06 13:18:29 -07:00
trace	Merge branch 'bkl/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing	2010-08-07 17:06:54 -07:00
.gitignore
acct.c	Merge branch 'next' into for-linus	2010-05-18 08:57:00 +10:00
async.c	async: use workqueue for worker pool	2010-07-14 11:29:46 +02:00
audit_tree.c	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h	2010-03-30 22:02:32 +09:00
audit_watch.c	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h	2010-03-30 22:02:32 +09:00
audit.c	drop_monitor: convert some kfree_skb call sites to consume_skb	2010-07-20 13:28:05 -07:00
audit.h
auditfilter.c	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h	2010-03-30 22:02:32 +09:00
auditsc.c	audit: preface audit printk with audit	2010-04-05 13:19:45 -07:00
backtracetest.c
bounds.c
capability.c	sched: Remove remaining USER_SCHED code	2010-04-02 20:12:00 +02:00
cgroup_freezer.c	Freezer / cgroup freezer: Update stale locking comments	2010-05-10 23:18:47 +02:00
cgroup.c	cgroupfs: create /sys/fs/cgroup to mount cgroupfs on	2010-08-05 13:53:35 -07:00
compat.c	cpumask: fix compat getaffinity	2010-05-19 11:48:18 -07:00
configs.c
cpu.c	sched: adjust when cpu_active and cpuset configurations are updated during cpu on/offlining	2010-06-08 21:40:36 +02:00
cpuset.c	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	2010-08-06 09:39:22 -07:00
cred.c	CRED: Fix get_task_cred() and task_state() to not resurrect dead credentials	2010-07-29 15:16:17 -07:00
delayacct.c
dma.c
early_res.c	kmemleak: Add support for NO_BOOTMEM configurations	2010-07-19 11:54:15 +01:00
elfcore.c
exec_domain.c	sys_personality: change sys_personality() to accept "unsigned int" instead of u_long	2010-06-04 15:21:45 -07:00
exit.c	proc: turn signal_struct->count into "int nr_threads"	2010-05-27 09:12:47 -07:00
extable.c
fork.c	oom: badness heuristic rewrite	2010-08-09 20:45:02 -07:00
freezer.c
futex_compat.c
futex.c	futex: futex_find_get_task remove credentails check	2010-06-30 15:43:44 -07:00
groups.c	security: remove dead hook task_setgroups	2010-04-12 12:19:18 +10:00
hrtimer.c	Merge branch 'timers-timekeeping-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	2010-08-06 13:18:29 -07:00
hung_task.c
hw_breakpoint.c	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	2010-08-06 09:30:52 -07:00
itimer.c
kallsyms.c	kdb: core for kgdb back end (2 of 2)	2010-05-20 21:04:21 -05:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kexec.c	kexec: fix Oops in crash_shrink_memory()	2010-06-29 15:29:31 -07:00
kfifo.c
kmod.c	call_usermodehelper: UMH_WAIT_EXEC ignores kernel_thread() failure	2010-05-27 09:12:45 -07:00
kprobes.c	kprobes: Move enable/disable_kprobe() out from debugfs code	2010-05-08 18:08:30 +02:00
ksysfs.c	sysfs: add struct file* to bin_attr callbacks	2010-05-21 09:37:31 -07:00
kthread.c	kthread: implement kthread_data()	2010-06-29 10:07:09 +02:00
latencytop.c	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h	2010-03-30 22:02:32 +09:00
lockdep_internals.h	lockdep: No need to disable preemption in debug atomic ops	2010-05-04 05:38:16 +02:00
lockdep_proc.c	lockstat: Make lockstat counting per cpu	2010-04-06 00:15:37 +02:00
lockdep_states.h
lockdep.c	sched_clock: Add local_clock() API and improve documentation	2010-06-09 10:34:49 +02:00
Makefile	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq	2010-08-07 12:42:58 -07:00
module.c	module: cleanup comments, remove noinline	2010-08-05 12:59:13 +09:30
mutex-debug.c
mutex-debug.h
mutex.c	mutex: Fix optimistic spinning vs. BKL	2010-05-19 08:18:44 +02:00
mutex.h
notifier.c
ns_cgroup.c
nsproxy.c	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h	2010-03-30 22:02:32 +09:00
padata.c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6	2010-08-04 15:23:14 -07:00
panic.c	panic: call console_verbose() in panic	2010-05-27 09:12:53 -07:00
params.c
perf_event.c	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	2010-08-06 09:39:22 -07:00
pid_namespace.c	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h	2010-03-30 22:02:32 +09:00
pid.c	pids: increase pid_max based on num_possible_cpus	2010-05-27 09:12:51 -07:00
pm_qos_params.c	pm_qos: Get rid of the allocation in pm_qos_add_request()	2010-07-19 02:00:34 +02:00
posix-cpu-timers.c	sched: Fix the racy usage of thread_group_cputimer() in fastpath_timer_check()	2010-06-18 10:46:57 +02:00
posix-timers.c	posix_timer: Move copy_to_user(created_timer_id) down in timer_create()	2010-07-23 15:08:12 +02:00
printk.c	printk: fix delayed messages from CPU hotplug events	2010-08-05 13:25:59 +01:00
profile.c	numa: in-kernel profiling: use cpu_to_mem() for per cpu allocations	2010-05-27 09:12:57 -07:00
ptrace.c	ptrace: PTRACE_GETFDPIC: fix the unsafe usage of child->mm	2010-05-27 09:12:44 -07:00
range.c
rcupdate.c	tree/tiny rcu: Add debug RCU head objects	2010-06-14 16:37:26 -07:00
rcutiny_plugin.h	rcu: slim down rcutiny by removing rcu_scheduler_active and friends	2010-05-10 11:08:34 -07:00
rcutiny.c	tree/tiny rcu: Add debug RCU head objects	2010-06-14 16:37:26 -07:00
rcutorture.c	sched_clock: Add local_clock() API and improve documentation	2010-06-09 10:34:49 +02:00
rcutree_plugin.h	rcu: remove all rcu head initializations, except on_stack initializations	2010-05-11 16:10:47 -07:00
rcutree_trace.c	rcu: reduce the number of spurious RCU_SOFTIRQ invocations	2010-05-10 11:08:35 -07:00
rcutree.c	tree/tiny rcu: Add debug RCU head objects	2010-06-14 16:37:26 -07:00
rcutree.h	rcu: reduce the number of spurious RCU_SOFTIRQ invocations	2010-05-10 11:08:35 -07:00
relay.c	kernel/: convert cpu notifier to return encapsulate errno value	2010-05-27 09:12:48 -07:00
res_counter.c	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h	2010-03-30 22:02:32 +09:00
resource.c	resource: shared I/O region support	2010-05-11 12:01:10 -07:00
rtmutex_common.h
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c
rtmutex.h
rwsem.c
sched_clock.c	sched_clock: Add local_clock() API and improve documentation	2010-06-09 10:34:49 +02:00
sched_cpupri.c	sched: No need for bootmem special cases	2010-07-17 12:06:22 +02:00
sched_cpupri.h	sched: No need for bootmem special cases	2010-07-17 12:06:22 +02:00
sched_debug.c	sched: Use correct macro to display sched_child_runs_first in /proc/sched_debug	2010-07-21 21:46:12 +02:00
sched_fair.c	Merge branch 'linus' into sched/core	2010-07-21 21:45:08 +02:00
sched_features.h
sched_idletask.c	sched: Cure load average vs NO_HZ woes	2010-04-23 11:02:02 +02:00
sched_rt.c	sched: task_tick_rt: Remove the obsolete ->signal != NULL check	2010-06-18 10:46:56 +02:00
sched_stats.h	sched: Remove the obsolete exit_state/signal hacks	2010-06-18 10:46:56 +02:00
sched.c	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	2010-08-06 09:39:22 -07:00
seccomp.c
semaphore.c
signal.c	CRED: Fix RCU warning due to previous patch fixing __task_cred()'s checks	2010-08-04 11:17:10 -07:00
smp.c	kernel/: convert cpu notifier to return encapsulate errno value	2010-05-27 09:12:48 -07:00
softirq.c	kernel/: fix BUG_ON checks for cpu notifier callbacks direct call	2010-06-04 15:21:45 -07:00
spinlock.c
srcu.c	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h	2010-03-30 22:02:32 +09:00
stacktrace.c
stop_machine.c	sched: Make sure timers have migrated before killing the migration_thread	2010-05-31 08:37:44 +02:00
sys_ni.c
sys.c	kmod: add init function to usermodehelper	2010-05-27 09:12:44 -07:00
sysctl_binary.c	sysctl: don't use own implementation of hex_to_bin()	2010-05-25 08:07:05 -07:00
sysctl_check.c
sysctl.c	oom: move sysctl declarations to oom.h	2010-08-09 20:44:57 -07:00
taskstats.c	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h	2010-03-30 22:02:32 +09:00
test_kprobes.c
time.c	time: Kill off CONFIG_GENERIC_TIME	2010-07-27 12:40:54 +02:00
timeconst.pl
timer.c	Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	2010-08-06 13:12:36 -07:00
tracepoint.c	tracing: Let tracepoints have data passed to tracepoint callbacks	2010-05-14 09:50:34 -04:00
tsacct.c
uid16.c
up.c
user_namespace.c	user_ns: Introduce user_nsmap_uid and user_ns_map_gid.	2010-06-16 14:55:34 -07:00
user-return-notifier.c
user.c	sched: Remove a stale comment	2010-05-10 08:48:39 +02:00
utsname_sysctl.c
utsname.c
wait.c
watchdog.c	kernel/watchdog: Initialize 'result'	2010-07-07 08:46:42 +02:00
workqueue_sched.h	workqueue: implement concurrency managed dynamic worker pool	2010-06-29 10:07:14 +02:00
workqueue.c	workqueue: workqueue_cpu_callback() should be cpu_notifier instead of hotcpu_notifier	2010-08-09 11:50:34 +02:00