linux/mm
Tomasz Stanislawski 026b081479 mm/page_alloc.c: fix watermark check in __zone_watermark_ok()
The watermark check consists of two sub-checks.  The first one is:

	if (free_pages <= min + lowmem_reserve)
		return false;

The check assures that there is minimal amount of RAM in the zone.  If
CMA is used then the free_pages is reduced by the number of free pages
in CMA prior to the over-mentioned check.

	if (!(alloc_flags & ALLOC_CMA))
		free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES);

This prevents the zone from being drained from pages available for
non-movable allocations.

The second check prevents the zone from getting too fragmented.

	for (o = 0; o < order; o++) {
		free_pages -= z->free_area[o].nr_free << o;
		min >>= 1;
		if (free_pages <= min)
			return false;
	}

The field z->free_area[o].nr_free is equal to the number of free pages
including free CMA pages.  Therefore the CMA pages are subtracted twice.
This may cause a false positive fail of __zone_watermark_ok() if the CMA
area gets strongly fragmented.  In such a case there are many 0-order
free pages located in CMA.  Those pages are subtracted twice therefore
they will quickly drain free_pages during the check against
fragmentation.  The test fails even though there are many free non-cma
pages in the zone.

This patch fixes this issue by subtracting CMA pages only for a purpose of
(free_pages <= min + lowmem_reserve) check.

Laura said:

  We were observing allocation failures of higher order pages (order 5 =
  128K typically) under tight memory conditions resulting in driver
  failure.  The output from the page allocation failure showed plenty of
  free pages of the appropriate order/type/zone and mostly CMA pages in
  the lower orders.

  For full disclosure, we still observed some page allocation failures
  even after applying the patch but the number was drastically reduced and
  those failures were attributed to fragmentation/other system issues.

Signed-off-by: Tomasz Stanislawski <t.stanislaws@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Tested-by: Laura Abbott <lauraa@codeaurora.org>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: <stable@vger.kernel.org>	[3.7+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12 16:29:46 -07:00
..
backing-dev.c writeback: expose the bdi_wq workqueue 2013-04-01 19:08:06 -07:00
balloon_compaction.c
bootmem.c
bounce.c Merge branch 'for-3.10/core' of git://git.kernel.dk/linux-block 2013-05-08 10:13:35 -07:00
cleancache.c mm: cleancache: clean up cleancache_enabled 2013-04-30 17:04:01 -07:00
compaction.c
debug-pagealloc.c
dmapool.c
fadvise.c teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long long 2013-03-03 22:46:22 -05:00
failslab.c
filemap_xip.c lift sb_start_write() out of ->write() 2013-04-09 14:12:56 -04:00
filemap.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-01 17:51:54 -07:00
fremap.c Revert "mm: introduce VM_POPULATE flag to better deal with racy userspace programs" 2013-03-28 17:45:51 -07:00
frontswap.c frontswap: get rid of swap_lock dependency 2013-04-30 17:04:00 -07:00
highmem.c
huge_memory.c mm/THP: use pmd_populate() to update the pmd with pgtable_t pointer 2013-05-24 16:22:51 -07:00
hugetlb_cgroup.c
hugetlb.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2013-04-30 09:36:50 -07:00
hwpoison-inject.c
init-mm.c
internal.h
interval_tree.c
Kconfig mmKconfig: add an option to disable bounce 2013-04-29 15:54:40 -07:00
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c
ksm.c ksm: fix m68k build: only NUMA needs pfn_to_nid 2013-03-08 15:05:34 -08:00
maccess.c
madvise.c mm: madvise: complete input validation before taking lock 2013-04-29 15:54:37 -07:00
Makefile memcg: add memory.pressure_level events 2013-04-29 15:54:38 -07:00
memblock.c memblock: fix missing comment of memblock_insert_region() 2013-04-29 15:54:38 -07:00
memcontrol.c memcg: don't initialize kmem-cache destroying work for root caches 2013-06-12 16:29:45 -07:00
memory_hotplug.c mm/memory_hotplug.c: fix printk format warnings 2013-05-24 16:22:52 -07:00
memory-failure.c HWPOISON: check dirty flag to match against clean page 2013-04-29 15:54:28 -07:00
memory.c arch, mm: Remove tlb_fast_mode() 2013-06-06 10:07:26 +09:00
mempolicy.c mm/mempolicy.c: fix sp_node_init() argument ordering 2013-03-08 15:05:34 -08:00
mempool.c
migrate.c mm compaction: fix of improper cache flush in migration code 2013-05-24 16:22:51 -07:00
mincore.c
mlock.c Revert "mm: introduce VM_POPULATE flag to better deal with racy userspace programs" 2013-03-28 17:45:51 -07:00
mm_init.c
mmap.c shm: fix null pointer deref when userspace specifies invalid hugepage size 2013-05-09 14:22:47 -07:00
mmu_context.c mm: remove old aio use_mm() comment 2013-05-07 18:38:27 -07:00
mmu_notifier.c mm: mmu_notifier: re-fix freed page still mapped in secondary MMU 2013-05-24 16:22:51 -07:00
mmzone.c
mprotect.c
mremap.c
msync.c
nobootmem.c mm, nobootmem: do memset() after memblock_reserve() 2013-04-29 15:54:39 -07:00
nommu.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal 2013-05-01 07:21:43 -07:00
oom_kill.c
page_alloc.c mm/page_alloc.c: fix watermark check in __zone_watermark_ok() 2013-06-12 16:29:46 -07:00
page_cgroup.c
page_io.c Merge branch 'for-3.10/core' of git://git.kernel.dk/linux-block 2013-05-08 10:13:35 -07:00
page_isolation.c
page-writeback.c mm: make snapshotting pages for stable writes a per-bio operation 2013-04-29 15:54:33 -07:00
pagewalk.c mm/pagewalk.c: walk_page_range should avoid VM_PFNMAP areas 2013-05-24 16:22:53 -07:00
percpu-km.c
percpu-vm.c
percpu.c
pgtable-generic.c
process_vm_access.c Fix: compat_rw_copy_check_uvector() misuse in aio, readv, writev, and security keys 2013-03-12 11:05:45 -07:00
quicklist.c
readahead.c teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long long 2013-03-03 22:46:22 -05:00
rmap.c rmap: recompute pgoff for unmapping huge page 2013-04-29 15:54:28 -07:00
shmem.c aio: don't include aio.h in sched.h 2013-05-07 20:16:25 -07:00
slab_common.c mm/slab: Fix crash during slab init 2013-05-08 15:02:33 -07:00
slab.c Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux 2013-05-07 08:42:20 -07:00
slab.h
slob.c
slub.c Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux 2013-05-07 08:42:20 -07:00
sparse-vmemmap.c sparse-vmemmap: specify vmemmap population range in bytes 2013-04-29 15:54:35 -07:00
sparse.c mm, hotplug: avoid compiling memory hotremove functions when disabled 2013-04-29 15:54:37 -07:00
swap_state.c swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion 2013-06-12 16:29:45 -07:00
swap.c aio: don't include aio.h in sched.h 2013-05-07 20:16:25 -07:00
swapfile.c frontswap: get rid of swap_lock dependency 2013-04-30 17:04:00 -07:00
truncate.c
util.c
vmalloc.c mm/vmalloc.c: add vfree comment 2013-05-07 18:38:27 -07:00
vmpressure.c memcg: add memory.pressure_level events 2013-04-29 15:54:38 -07:00
vmscan.c mm: thp: add split tail pages to shrink page list in page reclaim 2013-04-29 15:54:38 -07:00
vmstat.c mm/vmstat: add note on safety of drain_zonestat 2013-04-29 15:54:38 -07:00