linux/mm
Glauber Costa 3f13461939 memcg: decrement static keys at real destroy time
We call the destroy function when a cgroup starts to be removed, such as
by a rmdir event.

However, because of our reference counters, some objects are still
inflight.  Right now, we are decrementing the static_keys at destroy()
time, meaning that if we get rid of the last static_key reference, some
objects will still have charges, but the code to properly uncharge them
won't be run.

This becomes a problem specially if it is ever enabled again, because now
new charges will be added to the staled charges making keeping it pretty
much impossible.

We just need to be careful with the static branch activation: since there
is no particular preferred order of their activation, we need to make sure
that we only start using it after all call sites are active.  This is
achieved by having a per-memcg flag that is only updated after
static_key_slow_inc() returns.  At this time, we are sure all sites are
active.

This is made per-memcg, not global, for a reason: it also has the effect
of making socket accounting more consistent.  The first memcg to be
limited will trigger static_key() activation, therefore, accounting.  But
all the others will then be accounted no matter what.  After this patch,
only limited memcgs will have its sockets accounted.

[akpm@linux-foundation.org: move enum sock_flag_bits into sock.h,
                            document enum sock_flag_bits,
                            convert memcg_proto_active() and memcg_proto_activated() to test_bit(),
                            redo tcp_update_limit() comment to 80 cols]
Signed-off-by: Glauber Costa <glommer@parallels.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:28 -07:00
..
backing-dev.c backing-dev: fix wakeup timer races with bdi_unregister() 2012-02-01 16:52:49 +08:00
bootmem.c mm/bootmem.c: cleanup on addition to bootmem data list 2012-05-29 16:22:24 -07:00
bounce.c mm: remove the second argument of k[un]map_atomic() 2012-03-20 21:48:27 +08:00
cleancache.c mm: cleancache: Use __read_mostly as appropiate. 2012-01-23 16:08:09 -05:00
compaction.c mm/memcg: apply add/del_page to lruvec 2012-05-29 16:22:28 -07:00
debug-pagealloc.c mm, x86: Remove debug_pagealloc_enabled 2011-12-06 09:24:07 +01:00
dmapool.c
fadvise.c fadvise: only initiate writeback for specified range with FADV_DONTNEED 2012-01-10 16:30:43 -08:00
failslab.c switch debugfs to umode_t 2012-01-03 22:54:56 -05:00
filemap_xip.c mm/filemap_xip.c: fix race condition in xip_file_fault() 2012-02-03 16:16:41 -08:00
filemap.c mm: move readahead syscall to mm/readahead.c 2012-05-29 16:22:23 -07:00
fremap.c
highmem.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
huge_memory.c mm/memcg: apply add/del_page to lruvec 2012-05-29 16:22:28 -07:00
hugetlb.c hugetlb: fix resv_map leak in error path 2012-05-29 16:22:24 -07:00
hwpoison-inject.c HWPOISON: Clean up memory_failure() vs. __memory_failure() 2012-01-03 12:06:32 -08:00
init-mm.c
internal.h mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks 2012-05-29 16:22:22 -07:00
Kconfig Cross Memory Attach: make it Kconfigurable 2012-05-29 16:22:20 -07:00
Kconfig.debug mm: more intensive memory corruption debugging 2012-01-10 16:30:42 -08:00
kmemcheck.c
kmemleak-test.c
kmemleak.c kmemleak: Disable early logging when kmemleak is off by default 2012-01-20 16:57:05 +00:00
ksm.c ksm: cleanup: introduce find_mergeable_vma() 2012-03-21 17:54:59 -07:00
maccess.c
madvise.c mm/fs: route MADV_REMOVE to FALLOC_FL_PUNCH_HOLE 2012-05-29 16:22:22 -07:00
Makefile Cross Memory Attach: make it Kconfigurable 2012-05-29 16:22:20 -07:00
memblock.c mm/memblock: fix memory leak on extending regions 2012-05-29 16:22:24 -07:00
memcontrol.c memcg: decrement static keys at real destroy time 2012-05-29 16:22:28 -07:00
memory_hotplug.c mm: print physical addresses consistently with other parts of kernel 2012-05-29 16:22:21 -07:00
memory-failure.c mm/memory_failure: let the compiler add the function name 2012-05-29 16:22:18 -07:00
memory.c thp, memcg: split hugepage for memcg oom on cow 2012-05-29 16:22:19 -07:00
mempolicy.c mm: do_migrate_pages(): rename arguments 2012-05-29 16:22:20 -07:00
mempool.c mempool: fix first round failure behavior 2012-01-10 16:30:45 -08:00
migrate.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2012-05-23 17:42:39 -07:00
mincore.c mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode 2012-03-21 17:54:54 -07:00
mlock.c vm: avoid using find_vma_prev() unnecessarily 2012-03-06 18:23:36 -08:00
mm_init.c
mmap.c mm/mmap.c: find_vma(): remove unnecessary if(mm) check 2012-05-29 16:22:19 -07:00
mmu_context.c mm, counters: remove task argument to sync_mm_rss() and __sync_task_rss_stat() 2012-03-21 17:54:59 -07:00
mmu_notifier.c
mmzone.c mm: add link from struct lruvec to struct zone 2012-05-29 16:22:26 -07:00
mprotect.c Merge branch 'akpm' (Andrew's patch-bomb) 2012-03-22 09:04:48 -07:00
mremap.c mm: collapse security_vm_enough_memory() variants into a single function 2012-02-14 10:45:39 +11:00
msync.c
nobootmem.c mm: remove sparsemem allocation details from the bootmem allocator 2012-05-29 16:22:22 -07:00
nommu.c kill mm argument of vm_munmap() 2012-04-21 01:58:20 -04:00
oom_kill.c mm, oom: normalize oom scores to oom_score_adj scale only for userspace 2012-05-29 16:22:24 -07:00
page_alloc.c mm: add link from struct lruvec to struct zone 2012-05-29 16:22:26 -07:00
page_cgroup.c page_cgroup: fix horrid swap accounting regression 2012-03-06 08:18:23 -08:00
page_io.c
page_isolation.c mm: page_isolation: MIGRATE_CMA isolation functions added 2012-05-21 15:09:33 +02:00
page-writeback.c writeback: initialize global_dirty_limit 2012-05-06 13:41:58 +08:00
pagewalk.c mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode 2012-03-21 17:54:54 -07:00
percpu-km.c
percpu-vm.c percpu: use bitmap_clear 2012-01-20 09:23:16 -08:00
percpu.c kmemleak: Fix the kmemleak tracking of the percpu areas with !SMP 2012-05-09 10:13:29 -07:00
pgtable-generic.c arch/tile: allow building Linux with transparent huge pages enabled 2012-05-25 12:48:21 -04:00
prio_tree.c
process_vm_access.c Fix race in process_vm_rw_core 2012-02-02 12:55:17 -08:00
quicklist.c
readahead.c mm: move readahead syscall to mm/readahead.c 2012-05-29 16:22:23 -07:00
rmap.c mm: remove swap token code 2012-05-29 16:22:19 -07:00
shmem.c tmpfs: support SEEK_DATA and SEEK_HOLE 2012-05-29 16:22:23 -07:00
slab.c Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux 2012-03-28 15:04:26 -07:00
slob.c
slub.c slub: missing test for partial pages flush work in flush_all() 2012-05-17 18:00:51 -07:00
sparse-vmemmap.c
sparse.c mm: remove sparsemem allocation details from the bootmem allocator 2012-05-29 16:22:22 -07:00
swap_state.c mm: fix s390 BUG by __set_page_dirty_no_writeback on swap 2012-04-23 18:19:22 -07:00
swap.c mm/memcg: apply add/del_page to lruvec 2012-05-29 16:22:28 -07:00
swapfile.c memcg: fix/change behavior of shared anon at moving task 2012-05-29 16:22:24 -07:00
truncate.c mm/fs: remove truncate_range 2012-05-29 16:22:23 -07:00
util.c procfs: mark thread stack correctly in proc/<pid>/maps 2012-03-21 17:54:58 -07:00
vmalloc.c mm: fix faulty initialization in vmalloc_init() 2012-05-29 16:22:24 -07:00
vmscan.c mm/memcg: apply add/del_page to lruvec 2012-05-29 16:22:28 -07:00
vmstat.c mm/vmstat.c: remove debug fs entries on failure of file creation and made extfrag_debug_root dentry local 2012-05-29 16:22:19 -07:00