linux/mm
KOSAKI Motohiro 929bea7c71 vmscan: all_unreclaimable() use zone->all_unreclaimable as a name
all_unreclaimable check in direct reclaim has been introduced at 2.6.19
by following commit.

	2006 Sep 25; commit 408d8544; oom: use unreclaimable info

And it went through strange history. firstly, following commit broke
the logic unintentionally.

	2008 Apr 29; commit a41f24ea; page allocator: smarter retry of
				      costly-order allocations

Two years later, I've found obvious meaningless code fragment and
restored original intention by following commit.

	2010 Jun 04; commit bb21c7ce; vmscan: fix do_try_to_free_pages()
				      return value when priority==0

But, the logic didn't works when 32bit highmem system goes hibernation
and Minchan slightly changed the algorithm and fixed it .

	2010 Sep 22: commit d1908362: vmscan: check all_unreclaimable
				      in direct reclaim path

But, recently, Andrey Vagin found the new corner case. Look,

	struct zone {
	  ..
	        int                     all_unreclaimable;
	  ..
	        unsigned long           pages_scanned;
	  ..
	}

zone->all_unreclaimable and zone->pages_scanned are neigher atomic
variables nor protected by lock.  Therefore zones can become a state of
zone->page_scanned=0 and zone->all_unreclaimable=1.  In this case, current
all_unreclaimable() return false even though zone->all_unreclaimabe=1.

This resulted in the kernel hanging up when executing a loop of the form

1. fork
2. mmap
3. touch memory
4. read memory
5. munmmap

as described in
http://www.gossamer-threads.com/lists/linux/kernel/1348725#1348725

Is this ignorable minor issue?  No.  Unfortunately, x86 has very small dma
zone and it become zone->all_unreclamble=1 easily.  and if it become
all_unreclaimable=1, it never restore all_unreclaimable=0.  Why?  if
all_unreclaimable=1, vmscan only try DEF_PRIORITY reclaim and
a-few-lru-pages>>DEF_PRIORITY always makes 0.  that mean no page scan at
all!

Eventually, oom-killer never works on such systems.  That said, we can't
use zone->pages_scanned for this purpose.  This patch restore
all_unreclaimable() use zone->all_unreclaimable as old.  and in addition,
to add oom_killer_disabled check to avoid reintroduce the issue of commit
d1908362 ("vmscan: check all_unreclaimable in direct reclaim path").

Reported-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-14 16:06:56 -07:00
..
backing-dev.c Fix common misspellings 2011-03-31 11:26:23 -03:00
bootmem.c crash_dump: export is_kdump_kernel to modules, consolidate elfcorehdr_addr, setup_elfcorehdr and saved_max_pfn 2011-03-23 19:47:19 -07:00
bounce.c
compaction.c mm: compaction: minimise the time IRQs are disabled while isolating pages for migration 2011-03-22 17:44:05 -07:00
debug-pagealloc.c
dmapool.c mm/dmapool.c: use TASK_UNINTERRUPTIBLE in dma_pool_alloc() 2011-01-13 17:32:48 -08:00
fadvise.c
failslab.c
filemap_xip.c
filemap.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2011-03-24 19:01:30 -07:00
fremap.c
highmem.c mm,x86: fix kmap_atomic_push vs ioremap_32.c 2010-10-27 18:03:05 -07:00
huge_memory.c mm: add VM counters for transparent hugepages 2011-04-14 16:06:55 -07:00
hugetlb.c Fix common misspellings 2011-03-31 11:26:23 -03:00
hwpoison-inject.c Fix common misspellings 2011-03-31 11:26:23 -03:00
init-mm.c
internal.h Fix common misspellings 2011-03-31 11:26:23 -03:00
Kconfig mm: compaction: don't depend on HUGETLB_PAGE 2011-01-26 10:50:02 +10:00
Kconfig.debug mm: debug-pagealloc: fix kconfig dependency warning 2011-03-22 17:44:02 -07:00
kmemcheck.c
kmemleak-test.c kmemleak: remove memset by using kzalloc 2011-01-27 18:31:51 +00:00
kmemleak.c Fix common misspellings 2011-03-31 11:26:23 -03:00
ksm.c Fix common misspellings 2011-03-31 11:26:23 -03:00
maccess.c MN10300: Save frame pointer in thread_info struct rather than global var 2010-10-27 17:29:01 +01:00
madvise.c thp: khugepaged: make khugepaged aware about madvise 2011-01-13 17:32:47 -08:00
Makefile bootmem: Separate out CONFIG_NO_BOOTMEM code into nobootmem.c 2011-02-24 14:43:05 +01:00
memblock.c mm/memblock: properly handle overlaps and fix error path 2011-03-22 17:44:09 -07:00
memcontrol.c Fix common misspellings 2011-03-31 11:26:23 -03:00
memory_hotplug.c mm: optimize pfn calculation in online_page() 2011-04-14 16:06:54 -07:00
memory-failure.c Fix common misspellings 2011-03-31 11:26:23 -03:00
memory.c mm: check that we have the right vma in __access_remote_vm() 2011-04-14 16:06:55 -07:00
mempolicy.c mempolicy: remove redundant check in __mpol_equal() 2011-03-22 17:44:04 -07:00
mempool.c
migrate.c Fix common misspellings 2011-03-31 11:26:23 -03:00
mincore.c thp: mincore transparent hugepage support 2011-01-13 17:32:44 -08:00
mlock.c vm: fix mlock() on stack guard page 2011-04-12 14:15:51 -07:00
mm_init.c
mmap.c brk: COMPAT_BRK: fix detection of randomized brk 2011-04-14 16:06:55 -07:00
mmu_context.c
mmu_notifier.c thp: mmu_notifier_test_young 2011-01-13 17:32:46 -08:00
mmzone.c mm: page allocator: adjust the per-cpu counter threshold when memory is low 2011-01-13 17:32:31 -08:00
mprotect.c thp: mprotect: transparent huge page support 2011-01-13 17:32:44 -08:00
mremap.c mm: avoid wrapping vm_pgoff in mremap() 2011-04-07 07:35:51 -07:00
msync.c
nobootmem.c Fix common misspellings 2011-03-31 11:26:23 -03:00
nommu.c NOMMU: implement access_remote_vm 2011-03-29 14:05:12 +01:00
oom_kill.c lib, arch: add filter argument to show_mem and fix private implementations 2011-03-24 17:49:37 -07:00
page_alloc.c mm/page_alloc.c: silence build_all_zonelists() section mismatch 2011-04-14 16:06:54 -07:00
page_cgroup.c Fix common misspellings 2011-03-31 11:26:23 -03:00
page_io.c block: kill off REQ_UNPLUG 2011-03-10 08:52:27 +01:00
page_isolation.c mm: page_isolation: codeclean fix comment and rm unneeded val init 2010-10-26 16:52:11 -07:00
page-writeback.c Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block 2011-03-24 10:16:26 -07:00
pagewalk.c pagewalk: only split huge pages when necessary 2011-03-22 17:44:04 -07:00
percpu-km.c percpu: clear memory allocated with the km allocator 2010-10-02 10:28:42 +03:00
percpu-vm.c mm: remove gfp mask from pcpu_get_vm_areas 2011-01-13 17:32:34 -08:00
percpu.c Fix common misspellings 2011-03-31 11:26:23 -03:00
pgtable-generic.c mm/pgtable-generic.c: fix CONFIG_SWAP=n build 2011-01-26 10:49:58 +10:00
prio_tree.c
quicklist.c
readahead.c read-ahead: use plugging 2011-03-10 08:52:26 +01:00
rmap.c fs: move i_wb_list out from under inode_lock 2011-03-24 21:17:51 -04:00
shmem.c tmpfs: fix off-by-one in max_blocks checks 2011-04-14 16:06:55 -07:00
slab.c Fix common misspellings 2011-03-31 11:26:23 -03:00
slob.c mm: Remove support for kmem_cache_name() 2011-01-23 21:00:05 +02:00
slub.c Fix common misspellings 2011-03-31 11:26:23 -03:00
sparse-vmemmap.c tree-wide: fix comment/printk typos 2010-11-01 15:38:34 -04:00
sparse.c Fix common misspellings 2011-03-31 11:26:23 -03:00
swap_state.c block: remove per-queue plugging 2011-03-10 08:52:07 +01:00
swap.c mm: simplify code of swap.c 2011-03-22 17:44:09 -07:00
swapfile.c Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block 2011-03-24 10:16:26 -07:00
thrash.c
truncate.c mm: deactivate invalidated pages 2011-03-22 17:44:03 -07:00
util.c Fix common misspellings 2011-03-31 11:26:23 -03:00
vmalloc.c vmalloc: remove confusing comment on vwrite() 2011-03-22 17:44:09 -07:00
vmscan.c vmscan: all_unreclaimable() use zone->all_unreclaimable as a name 2011-04-14 16:06:56 -07:00
vmstat.c mm: add VM counters for transparent hugepages 2011-04-14 16:06:55 -07:00