linux/mm
Michal Hocko 26db62f179 oom: keep mm of the killed task available
oom_reap_task has to call exit_oom_victim in order to make sure that the
oom victim will not block the oom killer forever.  This, however, opens
up new problems (e.g. oom_killer_disable exclusion - see commit
7407054209 ("oom, suspend: fix oom_reaper vs. oom_killer_disable
race")).  Ideally, exit_oom_victim should be called only from the
victim's context.

One way to achieve this would be to rely on per mm_struct flags.  We
already have MMF_OOM_REAPED to hide a task from the oom killer since
"mm, oom: hide mm which is shared with kthread or global init". The
problem is that the exit path:

  do_exit
    exit_mm
      tsk->mm = NULL;
      mmput
        __mmput
      exit_oom_victim

doesn't guarantee that exit_oom_victim will get called in a bounded
amount of time.  At least exit_aio depends on IO which might get blocked
due to lack of memory and who knows what else is lurking there.

This patch takes a different approach.  We store tsk->mm in the
signal_struct and bind it to the signal_struct lifetime for all oom
victims.  __oom_reap_task_mm as well as oom_scan_process_thread do not
have to rely on find_lock_task_mm anymore and they will have a reliable
reference to the mm struct.  As a result all the oom specific
communication inside the OOM killer can be done via tsk->signal->oom_mm.
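
The lifetime rule described above can be sketched in userspace C.  This
is only a toy model under stated assumptions, not the code from the
patch: every toy_* name is hypothetical, a single counter stands in for
the kernel's mm reference counting, and C11 atomics stand in for the
kernel primitives.  It shows the mm being pinned once when a task is
marked as an oom victim and released only when the signal struct is
freed, so the pointer stays valid after the exit path clears tsk->mm.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for the kernel structures; only the fields used here. */
struct toy_mm {
	atomic_int count;		/* references pinning the struct */
};

struct toy_signal {
	_Atomic(struct toy_mm *) oom_mm;	/* set once for an oom victim */
};

struct toy_task {
	struct toy_mm *mm;		/* cleared on the exit path */
	struct toy_signal *signal;
};

static int toy_mm_freed;		/* counts how many mms were freed */

/* Drop one reference; free the mm when the last one goes away. */
static void toy_mmdrop(struct toy_mm *mm)
{
	int old = atomic_fetch_sub(&mm->count, 1);

	assert(old > 0);
	if (old == 1) {
		free(mm);
		toy_mm_freed++;
	}
}

/* Pin tsk->mm in the signal struct; only the first call takes a ref. */
static void toy_mark_oom_victim(struct toy_task *tsk)
{
	struct toy_mm *expected = NULL;

	if (atomic_compare_exchange_strong(&tsk->signal->oom_mm,
					   &expected, tsk->mm))
		atomic_fetch_add(&tsk->mm->count, 1);
}

/* Exit path of the model: the task forgets its mm, drops its own ref. */
static void toy_exit_mm(struct toy_task *tsk)
{
	struct toy_mm *mm = tsk->mm;

	tsk->mm = NULL;
	toy_mmdrop(mm);
}

/* The pinned mm is released together with the signal struct. */
static void toy_free_signal(struct toy_signal *sig)
{
	struct toy_mm *mm = atomic_load(&sig->oom_mm);

	if (mm)
		toy_mmdrop(mm);
	free(sig);
}
```

In this model a caller that only holds the task can still reach the mm
through tsk->signal->oom_mm after toy_exit_mm has run, which is the
property the oom reaper needs.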

Increasing the size of the signal_struct for something as unlikely as
the oom killer is far from ideal but this approach will make the code
much more reasonable and in the long term we might even want to move
task->mm into the signal_struct anyway.  In the next step we might want
to make the oom killer exclusion and access to memory reserves
completely independent, which would also be nice.

Link: http://lkml.kernel.org/r/1472119394-11342-4-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07 18:46:27 -07:00
kasan
backing-dev.c
balloon_compaction.c
bootmem.c
cleancache.c
cma_debug.c
cma.c
cma.h
compaction.c mm, compaction: require only min watermarks for non-costly orders 2016-10-07 18:46:27 -07:00
debug_page_ref.c
debug.c mm: avoid endless recursion in dump_page() 2016-09-19 15:36:16 -07:00
dmapool.c
early_ioremap.c
fadvise.c
failslab.c
filemap.c do_generic_file_read(): fail immediately if killed 2016-10-07 18:46:27 -07:00
frame_vector.c
frontswap.c
gup.c
highmem.c
huge_memory.c Merge branch 'linus' into sched/core, to pick up fixes 2016-09-30 10:44:27 +02:00
hugetlb_cgroup.c
hugetlb.c
hwpoison-inject.c
init-mm.c
internal.h mm, compaction: make whole_zone flag ignore cached scanner positions 2016-10-07 18:46:27 -07:00
interval_tree.c
Kconfig
Kconfig.debug PM / Hibernate: allow hibernation with PAGE_POISONING_ZERO 2016-09-13 02:35:27 +02:00
khugepaged.c mm, thp: fix leaking mapped pte in __collapse_huge_page_swapin() 2016-09-19 15:36:16 -07:00
kmemcheck.c
kmemleak-test.c
kmemleak.c
ksm.c mm,ksm: fix endless looping in allocating memory when ksm enable 2016-09-28 16:19:01 -07:00
list_lru.c
maccess.c
madvise.c
Makefile
memblock.c
memcontrol.c mm: memcontrol: add sanity checks for memcg->id.ref on get/put 2016-10-07 18:46:26 -07:00
memory_hotplug.c mem-hotplug: use nodes that contain memory as mask in new_node_page() 2016-09-28 16:19:02 -07:00
memory-failure.c
memory.c Merge branch 'linus' into sched/core, to pick up fixes 2016-09-30 10:44:27 +02:00
mempolicy.c
mempool.c
memtest.c
migrate.c
mincore.c
mlock.c
mm_init.c
mmap.c Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-10-03 17:29:01 -07:00
mmu_context.c
mmu_notifier.c
mmzone.c
mprotect.c
mremap.c
msync.c
nobootmem.c
nommu.c
oom_kill.c oom: keep mm of the killed task available 2016-10-07 18:46:27 -07:00
page_alloc.c mm/page_ext: support extra space allocation by page_ext user 2016-10-07 18:46:27 -07:00
page_counter.c
page_ext.c mm/page_ext: support extra space allocation by page_ext user 2016-10-07 18:46:27 -07:00
page_idle.c
page_io.c mm: fix the page_swap_info() BUG_ON check 2016-09-19 15:36:17 -07:00
page_isolation.c
page_owner.c mm/page_owner: don't define fields on struct page_ext by hard-coding 2016-10-07 18:46:27 -07:00
page_poison.c
page-writeback.c mm, vmscan: get rid of throttle_vm_writeout 2016-10-07 18:46:27 -07:00
pagewalk.c
percpu-km.c
percpu-vm.c
percpu.c
pgtable-generic.c
process_vm_access.c
quicklist.c
readahead.c
rmap.c
shmem.c huge tmpfs: fix Committed_AS leak 2016-09-24 11:20:01 -07:00
slab_common.c
slab.c
slab.h
slob.c
slub.c
sparse-vmemmap.c
sparse.c
swap_cgroup.c
swap_state.c
swap.c
swapfile.c mm, swap: add swap_cluster_list 2016-10-07 18:46:27 -07:00
truncate.c
usercopy.c mm: usercopy: Check for module addresses 2016-09-20 16:07:39 -07:00
userfaultfd.c
util.c
vmacache.c mm: unrig VMA cache hit ratio 2016-10-07 18:46:27 -07:00
vmalloc.c mm/vmalloc.c: fix align value calculation error 2016-10-07 18:46:26 -07:00
vmpressure.c
vmscan.c mm, vmscan: get rid of throttle_vm_writeout 2016-10-07 18:46:27 -07:00
vmstat.c mm/page_owner: move page_owner specific function to page_owner.c 2016-10-07 18:46:27 -07:00
workingset.c mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page() 2016-09-30 15:26:52 -07:00
z3fold.c
zbud.c
zpool.c
zsmalloc.c
zswap.c