linux/mm
KAMEZAWA Hiroyuki 7a81b88cb5 memcg: introduce charge-commit-cancel style of functions
There is a small race in do_swap_page().  When the page swapped-in is
charged, the mapcount can be greater than 0.  But, at the same time some
process (shares it ) call unmap and make mapcount 1->0 and the page is
uncharged.

      CPUA 			CPUB
       mapcount == 1.
   (1) charge if mapcount==0     zap_pte_range()
                                (2) mapcount 1 => 0.
			        (3) uncharge(). (success)
   (4) set page's rmap()
       mapcount 0=>1

Then, this swap page's account is leaked.

For fixing this, I added a new interface.
  - charge
   account to res_counter by PAGE_SIZE and try to free pages if necessary.
  - commit
   register page_cgroup and add to LRU if necessary.
  - cancel
   uncharge PAGE_SIZE because of do_swap_page failure.

     CPUA
  (1) charge (always)
  (2) set page's rmap (mapcount > 0)
  (3) commit charge was necessary or not after set_pte().

This protocol uses PCG_USED bit on page_cgroup for avoiding over accounting.
Usual mem_cgroup_charge_common() does charge -> commit at a time.

And this patch also adds following function to clarify all charges.

  - mem_cgroup_newpage_charge() ....replacement for mem_cgroup_charge()
	called against newly allocated anon pages.

  - mem_cgroup_charge_migrate_fixup()
        called only from remove_migration_ptes().
	we'll have to rewrite this later.(this patch just keeps old behavior)
	This function will be removed by additional patch to make migration
	clearer.

Good for clarifying "what we do"

Then, we have 4 following charge points.
  - newpage
  - swap-in
  - add-to-cache.
  - migration.

[akpm@linux-foundation.org: add missing inline directives to stubs]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:04 -08:00
..
allocpercpu.c
backing-dev.c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-01-06 17:10:04 -08:00
bootmem.c bootmem: print request details before BUG_ON(them) 2009-01-06 15:59:10 -08:00
bounce.c bounce: don't rely on a zeroed bio_vec list 2008-12-29 08:29:52 +01:00
dmapool.c
fadvise.c
failslab.c SLUB: failslab support 2008-12-29 11:27:46 +02:00
filemap_xip.c badpage: remove vma from page_remove_rmap 2009-01-06 15:59:07 -08:00
filemap.c mm: pagecache gfp flags fix 2009-01-06 15:59:09 -08:00
fremap.c badpage: remove vma from page_remove_rmap 2009-01-06 15:59:07 -08:00
highmem.c
hugetlb.c mm: hugetlb: remove redundant `if' operation 2009-01-06 15:59:10 -08:00
internal.h mm: make get_user_pages() interruptible 2009-01-06 15:59:08 -08:00
Kconfig Remove obsolete CONFIG_RESOURCES_64BIT 2009-01-06 15:59:14 -08:00
maccess.c
madvise.c
Makefile shmem: unify regular and tiny shmem 2009-01-06 15:59:08 -08:00
memcontrol.c memcg: introduce charge-commit-cancel style of functions 2009-01-08 08:31:04 -08:00
memory_hotplug.c mm: remove GFP_HIGHUSER_PAGECACHE 2009-01-06 15:59:01 -08:00
memory.c memcg: introduce charge-commit-cancel style of functions 2009-01-08 08:31:04 -08:00
mempolicy.c Merge branch 'master' into next 2008-11-14 11:29:12 +11:00
mempool.c
migrate.c memcg: introduce charge-commit-cancel style of functions 2009-01-08 08:31:04 -08:00
mincore.c
mlock.c mm: make get_user_pages() interruptible 2009-01-06 15:59:08 -08:00
mm_init.c
mmap.c mm: check for no mmaps in exit_mmap() 2009-01-06 15:59:10 -08:00
mmu_notifier.c
mmzone.c
mprotect.c mm: cleanup: remove #ifdef CONFIG_MIGRATION 2009-01-06 15:59:00 -08:00
mremap.c mm: update my address 2009-01-05 17:44:42 -08:00
msync.c add a vfs_fsync helper 2009-01-05 11:54:28 -05:00
nommu.c inode->i_op is never NULL 2009-01-05 11:54:28 -05:00
oom_kill.c oom: print triggering task's cpuset and mems allowed 2009-01-06 15:58:59 -08:00
page_alloc.c mm: remove CONFIG_OUT_OF_LINE_PFN_TO_PAGE 2009-01-06 15:59:10 -08:00
page_cgroup.c mm: make init_section_page_cgroup() static 2009-01-06 15:59:04 -08:00
page_io.c mm: try_to_free_swap replaces remove_exclusive_swap_page 2009-01-06 15:59:03 -08:00
page_isolation.c memory hotplug: fix page_zone() calculation in test_pages_isolated() 2008-11-06 15:41:19 -08:00
page-writeback.c mm: add dirty_background_bytes and dirty_bytes sysctls 2009-01-06 15:59:03 -08:00
pagewalk.c
pdflush.c cpumask: convert mm/ 2009-01-01 10:12:29 +10:30
prio_tree.c
quicklist.c
readahead.c
rmap.c badpage: remove vma from page_remove_rmap 2009-01-06 15:59:07 -08:00
shmem_acl.c
shmem.c shmem: unify regular and tiny shmem 2009-01-06 15:59:08 -08:00
slab.c cpumask: convert mm/ 2009-01-01 10:12:29 +10:30
slob.c slob: do not pass the SLAB flags as GFP in kmem_cache_create() 2008-12-15 16:27:06 -08:00
slub.c trivial: fix an -> a typos in documentation and comments 2009-01-06 11:28:07 +01:00
sparse-vmemmap.c vmemmap: warn about page_structs with remote distance 2008-11-06 15:41:19 -08:00
sparse.c meminit section warnings 2008-11-30 10:03:35 -08:00
swap_state.c mm: remove gfp_mask from add_to_swap 2009-01-06 15:59:04 -08:00
swap.c mm: try_to_free_swap replaces remove_exclusive_swap_page 2009-01-06 15:59:03 -08:00
swapfile.c memcg: introduce charge-commit-cancel style of functions 2009-01-08 08:31:04 -08:00
thrash.c
truncate.c
util.c
vmalloc.c mm: vmalloc make lazy unmapping configurable 2009-01-06 15:59:01 -08:00
vmscan.c mm: stop kswapd's infinite loop at high order allocation 2009-01-06 15:59:10 -08:00
vmstat.c cpumask: convert mm/ 2009-01-01 10:12:29 +10:30