linux/mm
Jack Steiner 6adef3ebe5 cpusets: new round-robin rotor for SLAB allocations
We have observed several workloads running on multi-node systems where
memory is assigned unevenly across the nodes in the system.  There are
numerous reasons for this but one is the round-robin rotor in
cpuset_mem_spread_node().

For example, a simple test that writes a multi-page file will allocate
pages on nodes 0 2 4 6 ...  Odd nodes are skipped.  (Sometimes it
allocates on odd nodes & skips even nodes).

An example is shown below.  The program "lfile" writes a file consisting
of 10 pages.  The program then mmaps the file & uses get_mempolicy(...,
MPOL_F_NODE) to determine the nodes where the file pages were allocated.
The output is shown below:

	# ./lfile
	 allocated on nodes: 2 4 6 0 1 2 6 0 2

There is a single rotor that is used for allocating both file pages & slab
pages.  Writing the file allocates both a data page & a slab page
(buffer_head).  This advances the RR rotor 2 nodes for each page
allocated.

A quick confirmation seems to confirm this is the cause of the uneven
allocation:

	# echo 0 >/dev/cpuset/memory_spread_slab
	# ./lfile
	 allocated on nodes: 6 7 8 9 0 1 2 3 4 5

This patch introduces a second rotor that is used for slab allocations.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Paul Menage <menage@google.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:44 -07:00
..
backing-dev.c writeback: fixups for !dirty_writeback_centisecs 2010-05-21 20:00:35 +02:00
bootmem.c
bounce.c
compaction.c mm: compaction: add a tunable that decides when memory should be compacted and when it should be reclaimed 2010-05-25 08:06:59 -07:00
debug-pagealloc.c
dmapool.c
fadvise.c
failslab.c
filemap_xip.c
filemap.c do_generic_file_read: clear page errors when issuing a fresh read of the page 2010-05-26 10:20:27 -07:00
fremap.c
highmem.c highmem: remove unneeded #ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT for debug_kmap_atomic() 2010-05-25 08:07:01 -07:00
hugetlb.c cpuset,mm: fix no node to alloc memory when changing cpuset's mems 2010-05-25 08:06:57 -07:00
hwpoison-inject.c
init-mm.c
internal.h
Kconfig mm: allow CONFIG_MIGRATION to be set without CONFIG_NUMA or memory hot-remove 2010-05-25 08:06:59 -07:00
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c
ksm.c mm: migration: share the anon_vma ref counts between KSM and page migration 2010-05-25 08:06:58 -07:00
maccess.c
madvise.c
Makefile mm: compaction: memory compaction core 2010-05-25 08:06:59 -07:00
memcontrol.c memcg: clean up memory thresholds 2010-05-27 09:12:44 -07:00
memory_hotplug.c mem-hotplug: fix potential race while building zonelist for new populated zone 2010-05-25 08:07:02 -07:00
memory-failure.c
memory.c mm: document follow_page() 2010-05-25 08:07:00 -07:00
mempolicy.c mempolicy: ERR_PTR dereference in mpol_shared_policy_init() 2010-05-26 08:19:23 -07:00
mempool.c
migrate.c memcg: fix mis-accounting of file mapped racy with migration 2010-05-27 09:12:44 -07:00
mincore.c mincore: do nested page table walks 2010-05-25 08:06:58 -07:00
mlock.c
mm_init.c
mmap.c
mmu_context.c
mmu_notifier.c
mmzone.c
mprotect.c
mremap.c
msync.c sanitize vfs_fsync calling conventions 2010-05-21 18:31:21 -04:00
nommu.c nommu: allow private mappings of read-only devices 2010-05-26 08:19:23 -07:00
oom_kill.c memcg: make oom killer a no-op when no killable task can be found 2010-05-27 09:12:43 -07:00
page_alloc.c mem-hotplug: fix potential race while building zonelist for new populated zone 2010-05-25 08:07:02 -07:00
page_cgroup.c
page_io.c
page_isolation.c
page-writeback.c writeback: fix mixed up arguments to bdi_start_writeback() 2010-05-21 20:01:54 +02:00
pagewalk.c
percpu_up.c
percpu-km.c
percpu-vm.c
percpu.c
prio_tree.c
quicklist.c
readahead.c readahead.c: fix comment 2010-05-25 08:07:00 -07:00
rmap.c mm: migration: avoid race between shift_arg_pages() and rmap_walk() during migration by not migrating temporary stacks 2010-05-25 08:06:59 -07:00
shmem.c memcg: move charge of file pages 2010-05-27 09:12:43 -07:00
slab.c cpusets: new round-robin rotor for SLAB allocations 2010-05-27 09:12:44 -07:00
slob.c mm: Move ARCH_SLAB_MINALIGN and ARCH_KMALLOC_MINALIGN to <linux/slob_def.h> 2010-05-19 22:03:13 +03:00
slub.c cpuset,mm: fix no node to alloc memory when changing cpuset's mems 2010-05-25 08:06:57 -07:00
sparse-vmemmap.c
sparse.c sparsemem: on no vmemmap path put mem_map on node high too 2010-05-25 08:06:56 -07:00
swap_state.c
swap.c
swapfile.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6 2010-05-21 15:26:46 -07:00
thrash.c
truncate.c
util.c
vmalloc.c
vmscan.c vmscan: remove isolate_pages callback scan control 2010-05-25 08:07:00 -07:00
vmstat.c mm: compaction: direct compact when a high-order allocation fails 2010-05-25 08:06:59 -07:00