linux/arch/x86/mm
Shaohua Li 9329672021 x86: Spread tlb flush vector between nodes
Currently, TLB flush vector allocation is based on the equation below:
	sender = smp_processor_id() % 8
This isn't optimal: CPUs from different nodes can share the same vector, which
causes a lot of lock contention. Instead, we can assign the same vectors to
CPUs from the same node, while different nodes get different vectors. This has
the following advantages:
a. If there is lock contention, it is between CPUs from one node. This should
be much cheaper than contention between nodes.
b. Lock contention between nodes is avoided completely. This especially
benefits kswapd, which is the biggest user of TLB flush, since kswapd sets its
affinity to a specific node.
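
The difference between the two schemes can be sketched in user-space C. This
is only an illustration under stated assumptions, not the code in tlb.c: the
function names old_vector() and node_vector(), the idx_in_node parameter (a
CPU's position within its node) and the equal per-node split of the 8 vectors
are all hypothetical. Only the "% 8" rule comes from the commit message.

```c
#include <assert.h>

/* 8 comes from the "% 8" in the old equation; the macro name is illustrative. */
#define NR_FLUSH_VECTORS 8

/* Old scheme: the vector depends only on the CPU id. */
static int old_vector(int cpu)
{
	return cpu % NR_FLUSH_VECTORS;
}

/*
 * Hypothetical node-aware scheme: give each node its own slice of the
 * vectors, then spread that node's CPUs round-robin inside the slice.
 * With more nodes than vectors, nodes have to share (one vector each).
 */
static int node_vector(int node, int idx_in_node, int nr_nodes)
{
	int per_node, base;

	assert(nr_nodes > 0 && node >= 0 && idx_in_node >= 0);

	per_node = nr_nodes > NR_FLUSH_VECTORS ?
			1 : NR_FLUSH_VECTORS / nr_nodes;
	base = (node % NR_FLUSH_VECTORS) * per_node;

	return base + idx_in_node % per_node;
}
```

Assuming CPUs are numbered contiguously per node on a 4-node, 16-CPU-per-node
machine, CPU 0 (node 0) and CPU 16 (node 1) both get vector 0 under
old_vector(), so they contend across nodes. Under node_vector(), node 0 only
uses vectors 0-1 and node 1 only uses vectors 2-3, so any contention stays
inside a node. If the vector count is not a multiple of the node count, some
vectors go unused in this sketch.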

In my test, this could reduce CPU overhead by more than 20% in the extreme
case. The test machine has 4 nodes, each with 16 CPUs. I bind each node's
kswapd to the first CPU of the node, then run a workload with 4 threads doing
sequential mmap file reads. The files are empty sparse files. This workload
triggers a lot of page reclaim and TLB flushes. Binding kswapd makes it easy
to trigger the extreme TLB flush lock contention, because otherwise kswapd
keeps migrating between CPUs of a node and I can't get a stable result. In
real workloads we won't always see such heavy TLB flush lock contention, but
it is possible.

[ hpa: folded in fix from Eric Dumazet to use this_cpu_read() ]

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
LKML-Reference: <1287544023.4571.8.camel@sli10-conroe.sh.intel.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-20 14:44:42 -07:00
kmemcheck x86, kmemcheck: Remove double test 2010-08-30 09:19:28 +02:00
dump_pagetables.c x86, mm: Create symbolic index into address_markers array 2010-07-20 16:56:19 -07:00
extable.c x86, 64-bit: Move K8 B step iret fixup to fault entry asm 2009-10-12 18:29:46 +02:00
fault.c x86, mm: Fix incorrect data type in vmalloc_sync_all() 2010-10-20 12:54:04 -07:00
gup.c x86, doc: Fix minor spelling error in arch/x86/mm/gup.c 2010-02-02 16:00:44 -08:00
highmem_32.c kmap_atomic: make kunmap_atomic() harder to misuse 2010-08-09 20:44:54 -07:00
hugetlbpage.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
init_32.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
init_64.c x86, mm: Hold mm->page_table_lock while doing vmalloc_sync 2010-10-19 13:57:08 -07:00
init.c Merge branch 'master' into export-slabh 2010-04-05 11:37:28 +09:00
iomap_32.c x86, pat: Add PAT reserve free to io_mapping* APIs 2009-08-26 15:41:16 -07:00
ioremap.c x86, iomap: Fix wrong page aligned size calculation in ioremapping code 2010-07-20 16:56:35 -07:00
k8topology_64.c x86: Move find_smp_config() earlier and avoid bootmem usage 2009-11-24 12:10:51 +01:00
kmmio.c x86, kmmio/mmiotrace: Fix double free of kmmio_fault_pages 2010-06-18 11:30:09 +02:00
Makefile x86, pat: Migrate to rbtree only backend for pat memtype management 2010-02-18 15:41:36 -08:00
memtest.c x86: memtest: use pointers of equal type for comparison 2009-06-11 16:26:35 +02:00
mmap.c x86: Use helpers for rlimits 2010-01-27 15:17:31 -08:00
mmio-mod.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
numa_32.c x86: Make 32bit support NO_BOOTMEM 2010-02-12 09:42:39 -08:00
numa_64.c numa: x86_64: use generic percpu var numa_node_id() implementation 2010-05-27 09:12:57 -07:00
numa.c x86/mm: Remove unused DBG() macro 2010-05-31 10:01:53 +02:00
pageattr-test.c
pageattr.c Merge branch 'drm-ttm-pool' into drm-core-next 2010-04-20 13:12:28 +10:00
pat_internal.h x86, pat: Fix memory leak in free_memtype 2010-05-26 11:26:04 -07:00
pat_rbtree.c rbtree: Undo augmented trees performance damage and regression 2010-07-05 14:43:50 +02:00
pat.c Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-08-06 10:17:52 -07:00
pf_in.c x86,mmiotrace: Add support for tracing STOS instruction 2010-08-02 01:32:01 +02:00
pf_in.h
pgtable_32.c x86: remove last traces of quicklist usage 2010-05-24 13:33:31 -07:00
pgtable.c x86, mm: Hold mm->page_table_lock while doing vmalloc_sync 2010-10-19 13:57:08 -07:00
physaddr.c x86: split __phys_addr out into separate file 2009-09-10 11:48:55 -07:00
physaddr.h x86: split __phys_addr out into separate file 2009-09-10 11:48:55 -07:00
setup_nx.c x86, mm: Report state of NX protections during boot 2009-11-16 13:44:59 -08:00
srat_32.c x86: Fix checking of SRAT when node 0 ram is not from 0 2009-12-16 16:43:37 -08:00
srat_64.c Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-05-18 09:17:50 -07:00
testmmiotrace.c x86, kmmio/mmiotrace: Fix double free of kmmio_fault_pages 2010-06-18 11:30:09 +02:00
tlb.c x86: Spread tlb flush vector between nodes 2010-10-20 14:44:42 -07:00