linux/mm
Mel Gorman fa5e084e43 vmscan: do not unconditionally treat zones that fail zone_reclaim() as full
On NUMA machines, the administrator can configure zone_reclaim_mode that
is a more targetted form of direct reclaim.  On machines with large NUMA
distances for example, a zone_reclaim_mode defaults to 1 meaning that
clean unmapped pages will be reclaimed if the zone watermarks are not
being met.  The problem is that zone_reclaim() failing at all means the
zone gets marked full.

This can cause situations where a zone is usable, but is being skipped
because it has been considered full.  Take a situation where a large tmpfs
mount is occuping a large percentage of memory overall.  The pages do not
get cleaned or reclaimed by zone_reclaim(), but the zone gets marked full
and the zonelist cache considers them not worth trying in the future.

This patch makes zone_reclaim() return more fine-grained information about
what occured when zone_reclaim() failued.  The zone only gets marked full
if it really is unreclaimable.  If it's a case that the scan did not occur
or if enough pages were not reclaimed with the limited reclaim_mode, then
the zone is simply skipped.

There is a side-effect to this patch.  Currently, if zone_reclaim()
successfully reclaimed SWAP_CLUSTER_MAX, an allocation attempt would go
ahead.  With this patch applied, zone watermarks are rechecked after
zone_reclaim() does some work.

This bug was introduced by commit 9276b1bc96
("memory page_alloc zonelist caching speedup") way back in 2.6.19 when the
zonelist_cache was introduced.  It was not intended that zone_reclaim()
aggressively consider the zone to be full when it failed as full direct
reclaim can still be an option.  Due to the age of the bug, it should be
considered a -stable candidate.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-16 19:47:45 -07:00
..
allocpercpu.c percpu: __percpu_depopulate_mask can take a const mask 2009-04-06 13:44:15 -07:00
backing-dev.c block: change the request allocation/congestion logic to be sync/async based 2009-04-06 08:04:53 -07:00
bootmem.c bootmem: fix slab fallback on numa 2009-06-11 19:15:54 +03:00
bounce.c Merge branch 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block 2009-06-11 11:10:35 -07:00
debug-pagealloc.c generic debug pagealloc 2009-04-01 08:59:13 -07:00
dmapool.c
fadvise.c readahead: move max_sane_readahead() calls into force_page_cache_readahead() 2009-06-16 19:47:28 -07:00
failslab.c kmemtrace, mm: fix slab.h dependency problem in mm/failslab.c 2009-04-03 12:23:01 +02:00
filemap_xip.c mm: do_xip_mapping_read: fix length calculation 2009-04-02 19:04:49 -07:00
filemap.c page allocator: do not check NUMA node ID when the caller knows the node is valid 2009-06-16 19:47:32 -07:00
fremap.c Do not account for the address space used by hugetlbfs using VM_ACCOUNT 2009-02-10 10:48:42 -08:00
highmem.c mm: introduce debug_kmap_atomic 2009-04-01 08:59:14 -07:00
hugetlb.c mm: introduce PageHuge() for testing huge/gigantic pages 2009-06-16 19:47:36 -07:00
init-mm.c mm: consolidate init_mm definition 2009-06-16 19:47:28 -07:00
internal.h vmscan: do not unconditionally treat zones that fail zone_reclaim() as full 2009-06-16 19:47:45 -07:00
Kconfig mm: remove CONFIG_UNEVICTABLE_LRU config option 2009-06-16 19:47:42 -07:00
Kconfig.debug generic debug pagealloc: build fix 2009-04-02 19:04:48 -07:00
kmemleak-test.c kmemleak: Simple testing module for kmemleak 2009-06-11 17:04:19 +01:00
kmemleak.c kmemleak: Add the base support 2009-06-11 17:03:28 +01:00
maccess.c [S390] maccess: add weak attribute to probe_kernel_write 2009-06-12 10:27:37 +02:00
madvise.c mm: madvise(): correct return code 2009-06-16 19:47:40 -07:00
Makefile mm: consolidate init_mm definition 2009-06-16 19:47:28 -07:00
memcontrol.c vmscan: evict use-once pages first 2009-06-16 19:47:38 -07:00
memory_hotplug.c page-allocator: reset wmark_min and inactive ratio of zone when hotplug happens 2009-06-16 19:47:42 -07:00
memory.c mm: introduce follow_pfn() 2009-06-16 19:47:40 -07:00
mempolicy.c page allocator: do not check NUMA node ID when the caller knows the node is valid 2009-06-16 19:47:32 -07:00
mempool.c
migrate.c migration: only migrate_prep() once per move_pages() 2009-06-16 19:47:41 -07:00
mincore.c [CVE-2009-0029] System call wrappers part 14 2009-01-14 14:15:24 +01:00
mlock.c mm: remove CONFIG_UNEVICTABLE_LRU config option 2009-06-16 19:47:42 -07:00
mm_init.c
mmap.c Merge branch 'perfcounters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-06-11 14:01:07 -07:00
mmu_notifier.c
mmzone.c [ARM] Double check memmap is actually valid with a memmap has unexpected holes V2 2009-05-18 11:22:24 +01:00
mprotect.c perf_counter: Add mmap event hooks to mprotect() 2009-06-08 23:10:43 +02:00
mremap.c [CVE-2009-0029] System call wrappers part 13 2009-01-14 14:15:23 +01:00
msync.c [CVE-2009-0029] System call wrappers part 13 2009-01-14 14:15:23 +01:00
nommu.c nommu: Provide mmap_min_addr definition. 2009-06-10 09:24:09 +10:00
oom_kill.c oom: only oom kill exiting tasks with attached memory 2009-06-16 19:47:45 -07:00
page_alloc.c vmscan: do not unconditionally treat zones that fail zone_reclaim() as full 2009-06-16 19:47:45 -07:00
page_cgroup.c memcg: fix page_cgroup fatal error in FLATMEM 2009-06-12 11:00:54 +03:00
page_io.c mm: remove file argument from swap_readpage() 2009-06-16 19:47:44 -07:00
page_isolation.c
page-writeback.c mm/page-writeback.c: dirty limit type should be unsigned long 2009-06-16 19:47:31 -07:00
pagewalk.c
pdflush.c Revert "mm: add /proc controls for pdflush threads" 2009-05-15 11:32:24 +02:00
percpu.c percpu: remove rbtree and use page->index instead 2009-04-08 18:31:31 +02:00
prio_tree.c
quicklist.c cpumask: replace node_to_cpumask with cpumask_of_node. 2009-03-13 14:49:46 +10:30
readahead.c readahead: introduce context readahead algorithm 2009-06-16 19:47:30 -07:00
rmap.c vmscan: report vm_flags in page_referenced() 2009-06-16 19:47:44 -07:00
shmem_acl.c
shmem.c mm cleanup: shmem_file_setup: 'char *' -> 'const char *' for name argument 2009-06-16 19:47:44 -07:00
slab.c page allocator: slab: use nr_online_nodes to check for a NUMA platform 2009-06-16 19:47:35 -07:00
slob.c page allocator: do not check NUMA node ID when the caller knows the node is valid 2009-06-16 19:47:32 -07:00
slub.c page allocator: use a pre-calculated value instead of num_online_nodes() in fast paths 2009-06-16 19:47:35 -07:00
sparse-vmemmap.c
sparse.c mm: mminit_validate_memmodel_limits(): remove redundant test 2009-04-01 08:59:11 -07:00
swap_state.c mm: remove file argument from swap_readpage() 2009-06-16 19:47:44 -07:00
swap.c mm: fix Committed_AS underflow on large NR_CPUS environment 2009-05-02 15:36:10 -07:00
swapfile.c mm: reuse unused swap entry if necessary 2009-06-16 19:47:42 -07:00
thrash.c
truncate.c mm: remove __invalidate_mapping_pages variant 2009-06-16 19:47:43 -07:00
util.c mm: clean up get_user_pages_fast() documentation 2009-06-16 19:47:30 -07:00
vmalloc.c Merge branch 'for-linus' of git://linux-arm.org/linux-2.6 2009-06-11 14:15:57 -07:00
vmscan.c vmscan: do not unconditionally treat zones that fail zone_reclaim() as full 2009-06-16 19:47:45 -07:00
vmstat.c mm: remove CONFIG_UNEVICTABLE_LRU config option 2009-06-16 19:47:42 -07:00