linux/mm
Yongseok Koh 88f5004430 vmalloc: remove BUG_ON due to racy counting of VM_LAZY_FREE
In free_unmap_area_noflush(), va->flags is marked as VM_LAZY_FREE first, and
then vmap_lazy_nr is increased atomically.

But, in __purge_vmap_area_lazy(), while traversing of vmap_are_list, nr
is counted by checking VM_LAZY_FREE is set to va->flags.  After counting
the variable nr, kernel reads vmap_lazy_nr atomically and checks a
BUG_ON condition whether nr is greater than vmap_lazy_nr to prevent
vmap_lazy_nr from being negative.

The problem is that, if interrupted right after marking VM_LAZY_FREE,
increment of vmap_lazy_nr can be delayed.  Consequently, BUG_ON
condition can be met because nr is counted more than vmap_lazy_nr.

It is highly probable when vmalloc/vfree are called frequently.  This
scenario have been verified by adding delay between marking VM_LAZY_FREE
and increasing vmap_lazy_nr in free_unmap_area_noflush().

Even the vmap_lazy_nr is for checking high watermark, it never be the
strict watermark.  Although the BUG_ON condition is to prevent
vmap_lazy_nr from being negative, vmap_lazy_nr is signed variable.  So,
it could go down to negative value temporarily.

Consequently, removing the BUG_ON condition is proper.

A possible BUG_ON message is like the below.

   kernel BUG at mm/vmalloc.c:517!
   invalid opcode: 0000 [#1] SMP
   EIP: 0060:[<c04824a4>] EFLAGS: 00010297 CPU: 3
   EIP is at __purge_vmap_area_lazy+0x144/0x150
   EAX: ee8a8818 EBX: c08e77d4 ECX: e7c7ae40 EDX: c08e77ec
   ESI: 000081fe EDI: e7c7ae60 EBP: e7c7ae64 ESP: e7c7ae3c
   DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
   Call Trace:
   [<c0482ad9>] free_unmap_vmap_area_noflush+0x69/0x70
   [<c0482b02>] remove_vm_area+0x22/0x70
   [<c0482c15>] __vunmap+0x45/0xe0
   [<c04831ec>] vmalloc+0x2c/0x30
   Code: 8d 59 e0 eb 04 66 90 89 cb 89 d0 e8 87 fe ff ff 8b 43 20 89 da 8d 48 e0 8d 43 20 3b 04 24 75 e7 fe 05 a8 a5 a3 c0 e9 78 ff ff ff <0f> 0b eb fe 90 8d b4 26 00 00 00 00 56 89 c6 b8 ac a5 a3 c0 31
   EIP: [<c04824a4>] __purge_vmap_area_lazy+0x144/0x150 SS:ESP 0068:e7c7ae3c

[ See also http://marc.info/?l=linux-kernel&m=126335856228090&w=2 ]

Signed-off-by: Yongseok Koh <yongseok.koh@samsung.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-21 07:20:06 -08:00
..
backing-dev.c
bootmem.c
bounce.c
debug-pagealloc.c
dmapool.c
fadvise.c
failslab.c
filemap_xip.c
filemap.c direct I/O fallback sync simplification 2009-12-16 12:16:50 -05:00
fremap.c
highmem.c
hugetlb.c mm: hugetlb: fix clear_huge_page() 2010-01-11 09:34:06 -08:00
hwpoison-inject.c HWPOISON: Don't do early filtering if filter is disabled 2009-12-16 12:20:01 +01:00
init-mm.c
internal.h HWPOISON: add an interface to switch off/on all the page filters 2009-12-16 12:19:59 +01:00
Kconfig HWPOISON: Add PROC_FS dependency to hwpoison injector v2 2009-12-21 19:56:42 +01:00
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c Merge branch 'kmemleak' of git://linux-arm.org/linux-2.6 2009-12-17 16:00:19 -08:00
ksm.c
maccess.c maccess,probe_kernel: Allow arch specific override probe_kernel_(read|write) 2010-01-07 11:58:36 -06:00
madvise.c HWPOISON: Add a madvise() injector for soft page offlining 2009-12-16 12:20:00 +01:00
Makefile make generic_acl slightly more generic 2009-12-16 12:16:49 -05:00
memcontrol.c memcg: ensure list is empty at rmdir 2010-01-16 12:15:39 -08:00
memory_hotplug.c
memory-failure.c HWPOISON: Add PROC_FS dependency to hwpoison injector v2 2009-12-21 19:56:42 +01:00
memory.c Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 2009-12-16 12:36:49 -08:00
mempolicy.c
mempool.c
migrate.c
mincore.c
mlock.c
mm_init.c
mmap.c mm: move sys_mmap_pgoff from util.c 2009-12-30 12:23:27 -08:00
mmu_context.c
mmu_notifier.c
mmzone.c
mprotect.c
mremap.c
msync.c
nommu.c nommu: fix shared mmap after truncate shrinkage problems 2010-01-16 12:15:40 -08:00
oom_kill.c memcg: avoid oom-killing innocent task in case of use_hierarchy 2009-12-16 07:20:07 -08:00
page_alloc.c page allocator: update NR_FREE_PAGES only when necessary 2010-01-16 16:53:55 -08:00
page_cgroup.c
page_io.c
page_isolation.c
page-writeback.c
pagewalk.c
percpu.c percpu: avoid calling __pcpu_ptr_to_addr(NULL) 2010-01-11 09:34:04 -08:00
prio_tree.c
quicklist.c
readahead.c readahead: add blk_run_backing_dev 2009-12-17 15:45:32 -08:00
rmap.c memcg: make memcg's file mapped consistent with global VM 2009-12-16 07:20:07 -08:00
shmem.c Fix breakage in shmem.c 2009-12-16 19:48:48 -05:00
slab.c SLAB: Fix lockdep annotation breakage 2009-12-28 20:57:27 +02:00
slob.c
slub.c
sparse-vmemmap.c
sparse.c
swap_state.c
swap.c
swapfile.c
thrash.c
truncate.c vfs: Fix vmtruncate() regression 2010-01-13 16:09:33 -08:00
util.c nommu: don't need get_unmapped_area() for NOMMU 2010-01-16 12:15:40 -08:00
vmalloc.c vmalloc: remove BUG_ON due to racy counting of VM_LAZY_FREE 2010-01-21 07:20:06 -08:00
vmscan.c vmscan: kswapd: don't retry balance_pgdat() if all zones are unreclaimable 2010-01-16 12:15:39 -08:00
vmstat.c