linux/mm
Hugh Dickins 59927fb984 memcg: free mem_cgroup by RCU to fix oops
After fixing the GPF in mem_cgroup_lru_del_list(), three times one
machine running a similar load (moving and removing memcgs while
swapping) has oopsed in mem_cgroup_zone_nr_lru_pages(), when retrieving
memcg zone numbers for get_scan_count() for shrink_mem_cgroup_zone():
this is where a struct mem_cgroup is first accessed after being chosen
by mem_cgroup_iter().

Just what protects a struct mem_cgroup from being freed, in between
mem_cgroup_iter()'s css_get_next() and its css_tryget()? css_tryget()
fails once css->refcnt is zero with CSS_REMOVED set in flags, yes: but
what if that memory is freed and reused for something else, which sets
"refcnt" non-zero? Hmm, and scope for an indefinite freeze if refcnt is
left at zero but flags are cleared.

It's tempting to move the css_tryget() into css_get_next(), to make it
really "get" the css, but I don't think that actually solves anything:
the same difficulty in moving from css_id found to stable css remains.

But we already have rcu_read_lock() around the two, so it's easily fixed
if __mem_cgroup_free() just uses kfree_rcu() to free mem_cgroup.

However, a big struct mem_cgroup is allocated with vzalloc() instead of
kzalloc(), and we're not allowed to vfree() at interrupt time: there
doesn't appear to be a general vfree_rcu() to help with this, so roll
our own using schedule_work().  The compiler decently removes
vfree_work() and vfree_rcu() when the config doesn't need them.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-15 17:03:03 -07:00
..
backing-dev.c backing-dev: fix wakeup timer races with bdi_unregister() 2012-02-01 16:52:49 +08:00
bootmem.c
bounce.c
cleancache.c
compaction.c mm: compaction: check for overlapping nodes during isolation for migration 2012-02-08 19:03:51 -08:00
debug-pagealloc.c
dmapool.c
fadvise.c
failslab.c
filemap_xip.c mm/filemap_xip.c: fix race condition in xip_file_fault() 2012-02-03 16:16:41 -08:00
filemap.c readahead: fix pipeline break caused by block plug 2012-02-03 16:16:41 -08:00
fremap.c
highmem.c
huge_memory.c mm: thp: fix BUG on mm->nr_ptes 2012-03-05 15:49:43 -08:00
hugetlb.c flush_tlb_range() needs ->page_table_lock when ->mmap_sem is not held 2012-03-05 13:51:32 -08:00
hwpoison-inject.c
init-mm.c
internal.h
Kconfig
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c kmemleak: Disable early logging when kmemleak is off by default 2012-01-20 16:57:05 +00:00
ksm.c memcg: fix GPF when cgroup removal races with last exit 2012-03-05 15:49:43 -08:00
maccess.c
madvise.c
Makefile
memblock.c memblock: Fix size aligning of memblock_alloc_base_nid() 2012-03-01 10:53:18 +01:00
memcontrol.c memcg: free mem_cgroup by RCU to fix oops 2012-03-15 17:03:03 -07:00
memory_hotplug.c mm: compaction: introduce sync-light migration for use by compaction 2012-01-12 20:13:09 -08:00
memory-failure.c mm: compaction: introduce sync-light migration for use by compaction 2012-01-12 20:13:09 -08:00
memory.c mm: fix rss count leakage during migration 2012-01-23 08:38:49 -08:00
mempolicy.c vm: avoid using find_vma_prev() unnecessarily 2012-03-06 18:23:36 -08:00
mempool.c
migrate.c memcg: fix GPF when cgroup removal races with last exit 2012-03-05 15:49:43 -08:00
mincore.c
mlock.c vm: avoid using find_vma_prev() unnecessarily 2012-03-06 18:23:36 -08:00
mm_init.c
mmap.c mm: fix find_vma_prev 2012-03-06 16:48:03 -08:00
mmu_context.c
mmu_notifier.c
mmzone.c
mprotect.c vm: avoid using find_vma_prev() unnecessarily 2012-03-06 18:23:36 -08:00
mremap.c
msync.c
nobootmem.c
nommu.c NOMMU: Don't need to clear vm_mm when deleting a VMA 2012-02-24 08:59:04 -08:00
oom_kill.c mm: unify remaining mem_cont, mem, etc. variable names to memcg 2012-01-12 20:13:06 -08:00
page_alloc.c vfs: fix panic in __d_lookup() with high dentry hashtable counts 2012-02-13 20:45:38 -05:00
page_cgroup.c page_cgroup: fix horrid swap accounting regression 2012-03-06 08:18:23 -08:00
page_io.c
page_isolation.c
page-writeback.c
pagewalk.c
percpu-km.c
percpu-vm.c percpu: use bitmap_clear 2012-01-20 09:23:16 -08:00
percpu.c Kmemleak patches 2012-01-14 18:11:11 -08:00
pgtable-generic.c
prio_tree.c
process_vm_access.c Fix race in process_vm_rw_core 2012-02-02 12:55:17 -08:00
quicklist.c
readahead.c
rmap.c mm: unify remaining mem_cont, mem, etc. variable names to memcg 2012-01-12 20:13:06 -08:00
shmem.c SHM_UNLOCK: fix Unevictable pages stranded after swap 2012-01-23 08:38:48 -08:00
slab.c
slob.c
slub.c mm,x86,um: move CMPXCHG_DOUBLE config option 2012-01-12 20:13:03 -08:00
sparse-vmemmap.c
sparse.c
swap_state.c memcg: fix GPF when cgroup removal races with last exit 2012-03-05 15:49:43 -08:00
swap.c memcg: fix GPF when cgroup removal races with last exit 2012-03-05 15:49:43 -08:00
swapfile.c mm: unify remaining mem_cont, mem, etc. variable names to memcg 2012-01-12 20:13:06 -08:00
thrash.c
truncate.c
util.c
vmalloc.c mm/vmalloc.c: eliminate extra loop in pcpu_get_vm_areas error path 2012-01-12 20:13:10 -08:00
vmscan.c SHM_UNLOCK: fix Unevictable pages stranded after swap 2012-01-23 08:38:48 -08:00
vmstat.c mm,x86,um: move CMPXCHG_LOCAL config option 2012-01-12 20:13:03 -08:00