linux/mm
KAMEZAWA Hiroyuki 32047e2a85 memcg: avoid lock in updating file_mapped (Was fix race in file_mapped accouting flag management
At accounting file events per memory cgroup, we need to find memory cgroup
via page_cgroup->mem_cgroup.  Now, we use lock_page_cgroup() for guarantee
pc->mem_cgroup is not overwritten while we make use of it.

But, considering the context which page-cgroup for files are accessed,
we can use alternative light-weight mutual execusion in the most case.

At handling file-caches, the only race we have to take care of is "moving"
account, IOW, overwriting page_cgroup->mem_cgroup.  (See comment in the
patch)

Unlike charge/uncharge, "move" happens not so frequently. It happens only when
rmdir() and task-moving (with a special settings.)
This patch adds a race-checker for file-cache-status accounting v.s. account
moving. The new per-cpu-per-memcg counter MEM_CGROUP_ON_MOVE is added.
The routine for account move
  1. Increment it before start moving
  2. Call synchronize_rcu()
  3. Decrement it after the end of moving.
By this, file-status-counting routine can check it needs to call
lock_page_cgroup(). In most case, I doesn't need to call it.

Following is a perf data of a process which mmap()/munmap 32MB of file cache
in a minute.

Before patch:
    28.25%     mmap  mmap               [.] main
    22.64%     mmap  [kernel.kallsyms]  [k] page_fault
     9.96%     mmap  [kernel.kallsyms]  [k] mem_cgroup_update_file_mapped
     3.67%     mmap  [kernel.kallsyms]  [k] filemap_fault
     3.50%     mmap  [kernel.kallsyms]  [k] unmap_vmas
     2.99%     mmap  [kernel.kallsyms]  [k] __do_fault
     2.76%     mmap  [kernel.kallsyms]  [k] find_get_page

After patch:
    30.00%     mmap  mmap               [.] main
    23.78%     mmap  [kernel.kallsyms]  [k] page_fault
     5.52%     mmap  [kernel.kallsyms]  [k] mem_cgroup_update_file_mapped
     3.81%     mmap  [kernel.kallsyms]  [k] unmap_vmas
     3.26%     mmap  [kernel.kallsyms]  [k] find_get_page
     3.18%     mmap  [kernel.kallsyms]  [k] __do_fault
     3.03%     mmap  [kernel.kallsyms]  [k] filemap_fault
     2.40%     mmap  [kernel.kallsyms]  [k] handle_mm_fault
     2.40%     mmap  [kernel.kallsyms]  [k] do_page_fault

This patch reduces memcg's cost to some extent.
(mem_cgroup_update_file_mapped is called by both of map/unmap)

Note: It seems some more improvements are required..but no idea.
      maybe removing set/unset flag is required.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-27 18:03:09 -07:00
..
backing-dev.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2010-10-26 17:58:44 -07:00
bootmem.c x86, memblock: Replace e820_/_early string with memblock_ 2010-08-27 11:13:47 -07:00
bounce.c bounce: call flush_dcache_page() after bounce_copy_vec() 2010-09-09 18:57:25 -07:00
compaction.c mm: compaction: handle active and inactive fairly in too_many_isolated 2010-09-09 18:57:24 -07:00
debug-pagealloc.c
dmapool.c mm: add a might_sleep_if() to dma_pool_alloc() 2010-10-26 16:52:08 -07:00
fadvise.c
failslab.c
filemap_xip.c
filemap.c mm: remove temporary variable on generic_file_direct_write() 2010-10-26 16:52:09 -07:00
fremap.c Avoid pgoff overflow in remap_file_pages 2010-09-25 09:34:58 -07:00
highmem.c mm,x86: fix kmap_atomic_push vs ioremap_32.c 2010-10-27 18:03:05 -07:00
hugetlb.c mm/hugetlb.c: add missing spin_lock() to hugetlb_cow() 2010-10-26 16:52:11 -07:00
hwpoison-inject.c HWPOISON, hugetlb: support hwpoison injection for hugepage 2010-08-11 09:23:11 +02:00
init-mm.c mm: provide init_mm mm_context initializer 2010-08-09 20:44:54 -07:00
internal.h mm: fix is_mem_section_removable() page_order BUG_ON check 2010-10-26 16:52:11 -07:00
Kconfig Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2010-10-22 17:31:36 -07:00
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c kmemleak: Fix typo in the comment 2010-08-08 21:57:23 +01:00
ksm.c ksm: fix bad user data when swapping 2010-10-04 11:09:53 -07:00
maccess.c
madvise.c
Makefile percpu: use percpu allocator on UP too 2010-10-02 10:26:05 +03:00
memblock.c memblock: Annotate memblock functions with __init_memblock 2010-10-11 16:00:52 -07:00
memcontrol.c memcg: avoid lock in updating file_mapped (Was fix race in file_mapped accouting flag management 2010-10-27 18:03:09 -07:00
memory_hotplug.c mm: do_migrate_range: reduce list_empty() check 2010-10-26 16:52:11 -07:00
memory-failure.c mm: compaction: fix COMPACTPAGEFAILED counting 2010-10-26 16:52:06 -07:00
memory.c use clear_page()/copy_page() in favor of memset()/memcpy() on whole pages 2010-10-26 16:52:13 -07:00
mempolicy.c mm/mempolicy.c: check return code of check_range 2010-10-26 16:52:06 -07:00
mempool.c
migrate.c mm: fix error reporting in move_pages() syscall 2010-10-26 16:52:11 -07:00
mincore.c
mlock.c mm: Move vma_stack_continue into mm.h 2010-09-09 09:05:06 -07:00
mm_init.c
mmap.c mmap: call unlink_anon_vmas() in __split_vma() in case of error 2010-09-22 17:22:40 -07:00
mmu_context.c
mmu_notifier.c
mmzone.c mm: page allocator: calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake 2010-09-09 18:57:25 -07:00
mprotect.c
mremap.c mm: remove pte_*map_nested() 2010-10-26 16:52:08 -07:00
msync.c
nommu.c mm: add vzalloc() and vzalloc_node() helpers 2010-10-26 16:52:10 -07:00
oom_kill.c oom: kill all threads sharing oom killed task's mm 2010-10-26 16:52:05 -07:00
page_alloc.c mm: add casts to/from gfp_t in gfp_to_alloc_flags() 2010-10-26 16:52:09 -07:00
page_cgroup.c
page_io.c
page_isolation.c mm: page_isolation: codeclean fix comment and rm unneeded val init 2010-10-26 16:52:11 -07:00
page-writeback.c writeback: remove the internal 5% low bound on dirty_ratio 2010-10-26 16:52:08 -07:00
pagewalk.c
percpu-km.c percpu: clear memory allocated with the km allocator 2010-10-02 10:28:42 +03:00
percpu-vm.c
percpu.c Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2010-10-24 13:41:39 -07:00
prio_tree.c
quicklist.c
readahead.c
rmap.c rmap: make anon_vma_chain_free() static 2010-10-26 16:52:10 -07:00
shmem.c fs: do not assign default i_ino in new_inode 2010-10-25 21:26:11 -04:00
slab.c replace nested max/min macros with {max,min}3 macro 2010-10-26 16:52:12 -07:00
slob.c slob: fix gfp flags for order-0 page allocations 2010-10-02 10:24:28 +03:00
slub.c SLUB: Fix memory hotplug with !NUMA 2010-10-06 21:16:42 +03:00
sparse-vmemmap.c x86: Use memblock to replace early_res 2010-08-27 11:12:29 -07:00
sparse.c
swap_state.c
swap.c
swapfile.c /proc/swaps: support polling 2010-10-26 16:52:11 -07:00
thrash.c
truncate.c check ATTR_SIZE contraints in inode_change_ok 2010-08-09 16:47:39 -04:00
util.c export __get_user_pages_fast() function 2010-10-24 10:51:24 +02:00
vmalloc.c mm: add vzalloc() and vzalloc_node() helpers 2010-10-26 16:52:10 -07:00
vmscan.c vmscan,tmpfs: treat used once pages on tmpfs as used once 2010-10-26 16:52:08 -07:00
vmstat.c vmstat: include compaction.h when CONFIG_COMPACTION 2010-10-26 16:52:10 -07:00