linux/Documentation
Greg Thelen c4843a7593 memcg: add per cgroup dirty page accounting
When modifying PG_Dirty on cached file pages, update the new
MEM_CGROUP_STAT_DIRTY counter.  This is done in the same places where
global NR_FILE_DIRTY is managed.  The new memcg stat is visible in the
per memcg memory.stat cgroupfs file.  The most recent past attempt at
this was http://thread.gmane.org/gmane.linux.kernel.cgroups/8632

The new accounting supports future efforts to add per cgroup dirty
page throttling and writeback.  It also helps an administrator break
down a container's memory usage and provides evidence to understand
memcg oom kills (the new dirty count is included in memcg oom kill
messages).

The ability to move page accounting between memcg
(memory.move_charge_at_immigrate) makes this accounting more
complicated than the global counter.  The existing
mem_cgroup_{begin,end}_page_stat() lock is used to serialize move
accounting with stat updates.
Typical update operation:
	memcg = mem_cgroup_begin_page_stat(page)
	if (TestSetPageDirty()) {
		[...]
		mem_cgroup_update_page_stat(memcg)
	}
	mem_cgroup_end_page_stat(memcg)

Summary of mem_cgroup_end_page_stat() overhead:
- Without CONFIG_MEMCG it's a no-op
- With CONFIG_MEMCG and no inter memcg task movement, it's just
  rcu_read_lock()
- With CONFIG_MEMCG and inter memcg  task movement, it's
  rcu_read_lock() + spin_lock_irqsave()

A memcg parameter is added to several routines because their callers
now grab mem_cgroup_begin_page_stat() which returns the memcg later
needed by for mem_cgroup_update_page_stat().

Because mem_cgroup_begin_page_stat() may disable interrupts, some
adjustments are needed:
- move __mark_inode_dirty() from __set_page_dirty() to its caller.
  __mark_inode_dirty() locking does not want interrupts disabled.
- use spin_lock_irqsave(tree_lock) rather than spin_lock_irq() in
  __delete_from_page_cache(), replace_page_cache_page(),
  invalidate_complete_page2(), and __remove_mapping().

   text    data     bss      dec    hex filename
8925147 1774832 1785856 12485835 be84cb vmlinux-!CONFIG_MEMCG-before
8925339 1774832 1785856 12486027 be858b vmlinux-!CONFIG_MEMCG-after
                            +192 text bytes
8965977 1784992 1785856 12536825 bf4bf9 vmlinux-CONFIG_MEMCG-before
8966750 1784992 1785856 12537598 bf4efe vmlinux-CONFIG_MEMCG-after
                            +773 text bytes

Performance tests run on v4.0-rc1-36-g4f671fe2f952.  Lower is better for
all metrics, they're all wall clock or cycle counts.  The read and write
fault benchmarks just measure fault time, they do not include I/O time.

* CONFIG_MEMCG not set:
                            baseline                              patched
  kbuild                 1m25.030000(+-0.088% 3 samples)       1m25.426667(+-0.120% 3 samples)
  dd write 100 MiB          0.859211561 +-15.10%                  0.874162885 +-15.03%
  dd write 200 MiB          1.670653105 +-17.87%                  1.669384764 +-11.99%
  dd write 1000 MiB         8.434691190 +-14.15%                  8.474733215 +-14.77%
  read fault cycles       254.0(+-0.000% 10 samples)            253.0(+-0.000% 10 samples)
  write fault cycles     2021.2(+-3.070% 10 samples)           1984.5(+-1.036% 10 samples)

* CONFIG_MEMCG=y root_memcg:
                            baseline                              patched
  kbuild                 1m25.716667(+-0.105% 3 samples)       1m25.686667(+-0.153% 3 samples)
  dd write 100 MiB          0.855650830 +-14.90%                  0.887557919 +-14.90%
  dd write 200 MiB          1.688322953 +-12.72%                  1.667682724 +-13.33%
  dd write 1000 MiB         8.418601605 +-14.30%                  8.673532299 +-15.00%
  read fault cycles       266.0(+-0.000% 10 samples)            266.0(+-0.000% 10 samples)
  write fault cycles     2051.7(+-1.349% 10 samples)           2049.6(+-1.686% 10 samples)

* CONFIG_MEMCG=y non-root_memcg:
                            baseline                              patched
  kbuild                 1m26.120000(+-0.273% 3 samples)       1m25.763333(+-0.127% 3 samples)
  dd write 100 MiB          0.861723964 +-15.25%                  0.818129350 +-14.82%
  dd write 200 MiB          1.669887569 +-13.30%                  1.698645885 +-13.27%
  dd write 1000 MiB         8.383191730 +-14.65%                  8.351742280 +-14.52%
  read fault cycles       265.7(+-0.172% 10 samples)            267.0(+-0.000% 10 samples)
  write fault cycles     2070.6(+-1.512% 10 samples)           2084.4(+-2.148% 10 samples)

As expected anon page faults are not affected by this patch.

tj: Updated to apply on top of the recent cancel_dirty_page() changes.

Signed-off-by: Sha Zhengju <handai.szj@gmail.com>
Signed-off-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-02 08:33:33 -06:00
..
ABI platform-drivers-x86 for 4.1 2015-04-26 13:44:46 -07:00
accounting
acpi Power management and ACPI updates for v4.1-rc1 2015-04-14 20:21:54 -07:00
aoe
arm ARM: SoC multiplatform code changes for v4.1 2015-04-22 09:20:15 -07:00
arm64 ARM64 / ACPI: additions of ACPI documentation for arm64 2015-03-26 15:13:09 +00:00
auxdisplay
backlight
blackfin Documentation: blackfin: Makefile: Typo building issue 2015-04-11 15:19:31 +02:00
block Documentation: Remove mentioning of block barriers 2015-03-20 07:41:56 -06:00
blockdev Merge branch 'for-4.1/drivers' of git://git.kernel.dk/linux-block 2015-04-16 22:05:27 -04:00
bus-devices
cdrom
cgroups memcg: add per cgroup dirty page accounting 2015-06-02 08:33:33 -06:00
cma cma: debug: document new debugfs interface 2015-04-14 16:49:00 -07:00
connector
console
cpu-freq
cpuidle
cris
crypto crypto: doc - AEAD / RNG AF_ALG interface 2015-03-09 21:06:18 +11:00
development-process
device-mapper dm crypt: update URLs to new cryptsetup project page 2015-04-15 12:10:24 -04:00
devicetree CRIS changes for 4.1 2015-04-26 13:31:05 -07:00
dmaengine Merge branch 'for-linus' of git://git.infradead.org/users/vkoul/slave-dma 2015-02-18 08:49:20 -08:00
DocBook Merge branch 'drm-next-merged' of git://people.freedesktop.org/~airlied/linux into v4l_for_linus 2015-04-21 09:44:55 -03:00
driver-model Char/Misc driver patches for 4.1-rc1 2015-04-21 09:42:58 -07:00
dvb
early-userspace
EDID
extcon
fault-injection
fb
filesystems xfs: update for 4.1-rc1 2015-04-24 07:08:41 -07:00
firmware_class
fmc
frv
gpio The documentation tree update for 4.1. Numerous fixes, the overdue removal 2015-04-18 11:10:49 -04:00
hid HID: sensor: Update document for custom sensor 2015-04-10 22:22:56 +02:00
hwmon hwmon: (it87) Add support for IT8620E 2015-04-05 06:01:00 -07:00
i2c i2c: slave: add documentation for i2c-slave-eeprom 2015-03-27 16:53:39 +01:00
ia64 virtual: Documentation: simplify and generalize paravirt_ops.txt 2015-02-13 17:15:44 +10:30
ide
infiniband
input Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2015-04-21 12:54:08 -07:00
ioctl platform/chrome: Add Chrome OS EC userspace device interface 2015-02-26 15:45:06 -08:00
isdn
ja_JP
kbuild Merge branch 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild 2015-02-19 10:31:37 -08:00
kdump
ko_KR
laptops thinkpad_acpi: Add adaptive_kbd_mode sysfs attr 2015-03-03 09:00:08 -08:00
leds Documentation: leds: Add description of LED Flash class extension 2015-03-09 17:18:00 -07:00
locking Documentation changes for 3.20 2015-02-11 13:03:11 -08:00
m68k
memory-devices
metag
mic
mips
misc-devices
mmc
mn10300
mtd
namespaces
netlabel
networking net: rfs: fix crash in get_rps_cpus() 2015-04-26 16:07:57 -04:00
nfc
nios2
parisc
PCI The documentation tree update for 4.1. Numerous fixes, the overdue removal 2015-04-18 11:10:49 -04:00
pcmcia
phy
platform
power Power management and ACPI updates for v4.1-rc1 2015-04-14 20:21:54 -07:00
powerpc Revert "powerpc/tm: Abort syscalls in active transactions" 2015-04-30 15:24:58 +10:00
pps
prctl
pti
ptp
rapidio
RCU
s390
scheduler docs/completion.txt: Various tweaks and corrections 2015-04-04 15:20:26 +02:00
scsi genirq: Remove the deprecated 'IRQF_DISABLED' request_irq() flag entirely 2015-03-05 20:53:06 +01:00
security Smack: Updates for Smack documentation 2015-03-31 10:35:31 -07:00
serial
sh
sound ASoC: Updates for v4.1 2015-04-13 14:14:29 +02:00
spi Documentation/spi/spidev_test.c: fix warning 2015-04-17 09:04:12 -04:00
sysctl Doc/sysctl/kernel.txt: document threads-max 2015-04-17 09:04:07 -04:00
target target: Version 2 of TCMU ABI 2015-04-19 22:40:26 -07:00
thermal
timers documentation: Update NO_HZ_FULL interaction with POSIX timers 2015-02-26 11:57:29 -08:00
tpm
trace coresight: Correcting documentation typographical error 2015-04-03 16:17:03 +02:00
usb usb: patches for v4.1 merge window 2015-03-24 22:57:49 +01:00
vDSO
video4linux [media] media/Documentation: New flag EXECUTE_ON_WRITE 2015-04-08 06:35:16 -03:00
virtual KVM: PPC: Book3S HV: Add fast real-mode H_RANDOM implementation. 2015-04-21 15:21:29 +02:00
vm The documentation tree update for 4.1. Numerous fixes, the overdue removal 2015-04-18 11:10:49 -04:00
w1
watchdog
wimax
x86 x86/mm/KASLR: Propagate KASLR status to kernel proper 2015-04-03 15:26:15 +02:00
xtensa
zh_CN Documentation:Update Documentation/zh_CN/arm64/memory.txt 2015-04-04 15:20:26 +02:00
00-INDEX
applying-patches.txt
assoc_array.txt
atomic_ops.txt documentation: Clarify memory-barrier semantics of atomic operations 2015-02-26 11:57:31 -08:00
bad_memory.txt
basic_profiling.txt
bcache.txt
binfmt_misc.txt
braille-console.txt
bt8xxgpio.txt
btmrvl.txt
BUG-HUNTING
bus-virt-phys-mapping.txt
cachetlb.txt rmap: drop support of non-linear mappings 2015-02-10 14:30:31 -08:00
Changes
circular-buffers.txt
clk.txt
coccinelle.txt
CodeOfConflict Code of Conflict 2015-02-27 11:44:24 -08:00
CodingStyle The documentation tree update for 4.1. Numerous fixes, the overdue removal 2015-04-18 11:10:49 -04:00
cpu-hotplug.txt cpumask: fix cpu-hotplug documentation 2015-03-05 13:37:01 +10:30
cpu-load.txt
cputopology.txt
crc32.txt
dcdbas.txt
debugging-modules.txt
debugging-via-ohci1394.txt
dell_rbu.txt
devices.txt
digsig.txt
DMA-API-HOWTO.txt
DMA-API.txt
DMA-attributes.txt
dma-buf-sharing.txt dma-buf: cleanup dma_buf_export() to make it easily extensible 2015-04-21 14:47:16 +05:30
DMA-ISA-LPC.txt
dontdiff
dynamic-debug-howto.txt
edac.txt
efi-stub.txt
eisa.txt
email-clients.txt Documentation/email-clients.txt: Fix one grammar mistake, add extra info about TB 2015-03-20 07:41:55 -06:00
flexible-arrays.txt
futex-requeue-pi.txt
gcov.txt
gdb-kernel-debugging.txt scripts/gdb: add basic documentation 2015-02-17 14:34:54 -08:00
highuid.txt
HOWTO
hsi.txt
hw_random.txt
hwspinlock.txt
init.txt
initrd.txt
intel_txt.txt
Intel-IOMMU.txt
io_ordering.txt
io-mapping.txt
iostats.txt
IPMI.txt
IRQ-affinity.txt
IRQ-domain.txt IRQCHIP: Update docs regarding irq_domain_add_tree() 2015-04-01 17:21:35 +02:00
IRQ.txt
irqflags-tracing.txt
isapnp.txt
java.txt
kasan.txt kasan: enable instrumentation of global variables 2015-02-13 21:21:42 -08:00
kernel-doc-nano-HOWTO.txt
kernel-docs.txt
kernel-parameters.txt uas: Add US_FL_MAX_SECTORS_240 flag 2015-04-28 12:48:57 +02:00
kernel-per-CPU-kthreads.txt documentation: Update per-CPU kthreads documentation 2015-02-26 11:57:30 -08:00
kmemcheck.txt Documentation: update the CONFIG_DEBUG_PAGEALLOC description 2015-03-20 07:41:55 -06:00
kmemleak.txt
kobject.txt
kprobes.txt kprobes: Update Documentation/kprobes.txt 2015-03-20 07:41:55 -06:00
kref.txt
kselftest.txt
ldm.txt
local_ops.txt
lockup-watchdogs.txt
logo.gif
logo.txt
lzo.txt
magic-number.txt
mailbox.txt
Makefile Documentation: Remove ZBOOT MMC/SDHI utility and docs 2015-02-24 06:45:25 +09:00
ManagementStyle
md-cluster.txt md-cluster: Design Documentation 2015-02-23 07:16:46 -06:00
md.txt
media-framework.txt
memory-barriers.txt The documentation tree update for 4.1. Numerous fixes, the overdue removal 2015-04-18 11:10:49 -04:00
memory-hotplug.txt mem-hotplug: fix typo in Documentation/memory-hotplug.txt 2015-03-20 07:41:55 -06:00
module-signing.txt modsign: change default key details 2015-04-30 09:35:41 -07:00
mono.txt
nommu-mmap.txt
numastat.txt
oops-tracing.txt
padata.txt
parport-lowlevel.txt
parport.txt
percpu-rw-semaphore.txt
phy.txt
pi-futex.txt
pinctrl.txt pinctrl: fix example .get_group_pins implementation signature 2015-03-18 02:02:20 +01:00
pnp.txt
preempt-locking.txt
printk-formats.txt The documentation tree update for 4.1. Numerous fixes, the overdue removal 2015-04-18 11:10:49 -04:00
pwm.txt
ramoops.txt
rbtree.txt
remoteproc.txt
rfkill.txt
robust-futex-ABI.txt
robust-futexes.txt
rpmsg.txt
rtc.txt Documentation, split up rtc.txt into documentation and test file 2015-03-24 22:01:58 -06:00
SAK.txt
SecurityBugs
serial-console.txt
sgi-ioc4.txt
SM501.txt
smsc_ece1099.txt
sparse.txt
stable_api_nonsense.txt
stable_kernel_rules.txt stable_kernel_rules: Add clause about specification of kernel versions to patch. 2015-03-26 23:52:24 +01:00
static-keys.txt
SubmitChecklist
SubmittingDrivers
SubmittingPatches checkpatch, SubmittingPatches: suggest line wrapping commit messages at 75 columns 2015-04-17 09:03:57 -04:00
svga.txt
sysfs-rules.txt
sysrq.txt
this_cpu_ops.txt
unaligned-memory-access.txt
unicode.txt
unshare.txt
vfio.txt
VGA-softcursor.txt
vgaarbiter.txt
video-output.txt
vme_api.txt
volatile-considered-harmful.txt
workqueue.txt
xillybus.txt
xz.txt
zorro.txt