linux

mirror of https://github.com/FEX-Emu/linux.git synced 2024-12-22 09:22:37 +00:00

Shaohua Li ebc2a1a691 swap: make cluster allocation per-cpu

Swap cluster allocation exists to get better request merging and so improve performance. But the cluster is shared globally, so if multiple tasks are swapping, disk access becomes interleaved. Multiple tasks swapping at once is quite common: for example, each NUMA node has a kswapd thread doing swap, and multiple threads/processes do direct page reclaim.

The I/O scheduler can't help much here, because the tasks don't send swapout IO down to the block layer at the same time. The block layer does merge some IOs, but many are not merged, depending on how many tasks are doing swapout concurrently. In practice, I've seen a lot of small-size IO in swapout workloads.

We make the cluster allocation per-cpu here. The interleaved disk access issue goes away. Each task swaps out to its own cluster, so swapout becomes sequential and can easily be merged into big-size IO. If one CPU can't get its per-cpu cluster (for example, there is no free cluster left in the swap), it falls back to scanning swap_map, so the CPU can still continue to swap. We don't need to recycle free swap entries of other CPUs.

In my test (swap to a 2-disk raid0 partition), this improves swapout throughput by around 10%, and request size is increased significantly.

How this impacts swap readahead is uncertain, though. On one side, page reclaim always isolates and swaps several adjacent pages, which makes page reclaim write the pages sequentially and benefits readahead. On the other side, with several CPUs writing pages interleaved, the pages don't live _sequentially_ but relatively _near_. In the per-cpu allocation case, if adjacent pages are written by different CPUs, they will live relatively _far_. So how this impacts swap readahead depends on how many pages page reclaim isolates and swaps at one time. If the number is big, this patch will benefit swap readahead. Of course, this is about the sequential access pattern. The patch has no impact on the random access pattern, because the new cluster allocation algorithm is just for SSD.

An alternative solution is organizing the swap layout to be per-mm instead of this per-cpu approach. In the per-mm layout, we allocate a disk range for each mm, so pages of one mm live adjacently on the swap disk. The per-mm layout has potential lock contention issues if multiple reclaimers are swapping pages from one mm. For a sequential workload, the per-mm layout is better for implementing swap readahead, because pages from the mm are adjacent on disk. But the per-cpu layout isn't very bad in this workload, as page reclaim always isolates and swaps several pages at one time; such pages will still live on disk sequentially and readahead can utilize this. For a random workload, the per-mm layout doesn't benefit request merging, because it's quite possible that pages from different mms are swapped out at the same time and IO can't be merged in the per-mm layout, while with the per-cpu layout we can merge requests from any mm. Considering that random workloads are more common among workloads with swap (and the per-cpu approach isn't too bad for sequential workloads either), I'm choosing the per-cpu layout.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Shaohua Li <shli@fusionio.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Kyungmin Park <kmpark@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:57:17 -07:00
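The allocation policy described in the commit message above (each CPU fills its own cluster sequentially, and a CPU that cannot get a cluster falls back to scanning swap_map) can be illustrated with a small userspace simulation. This is only a sketch under stated assumptions, not the code from mm/swapfile.c: the struct swap_sim type, the swap_sim_init(), find_free_cluster() and swap_sim_alloc() functions, and the tiny sizes are all hypothetical names and values chosen for illustration, and locking is omitted entirely.

```c
#include <assert.h>
#include <string.h>

/* All names and sizes below are hypothetical, chosen for illustration only. */
#define NR_CPUS        2
#define CLUSTER_SLOTS  4                      /* slots per cluster */
#define NR_CLUSTERS    3
#define NR_SLOTS       (NR_CLUSTERS * CLUSTER_SLOTS)
#define NO_CLUSTER     (-1)

struct swap_sim {
    unsigned char swap_map[NR_SLOTS];  /* 0 = free slot, nonzero = in use */
    int cpu_cluster[NR_CPUS];          /* cluster currently owned by each CPU */
    int cpu_next[NR_CPUS];             /* next offset to try inside that cluster */
};

static void swap_sim_init(struct swap_sim *s)
{
    memset(s->swap_map, 0, sizeof(s->swap_map));
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        s->cpu_cluster[cpu] = NO_CLUSTER;
        s->cpu_next[cpu] = 0;
    }
}

/* A cluster counts as free when no slot in it is used and no CPU owns it. */
static int find_free_cluster(const struct swap_sim *s)
{
    for (int c = 0; c < NR_CLUSTERS; c++) {
        int used = 0;
        for (int i = 0; i < CLUSTER_SLOTS; i++)
            used |= s->swap_map[c * CLUSTER_SLOTS + i];
        for (int cpu = 0; cpu < NR_CPUS; cpu++)
            used |= (s->cpu_cluster[cpu] == c);
        if (!used)
            return c;
    }
    return NO_CLUSTER;
}

/* Returns the allocated slot index, or -1 if the swap area is full. */
static int swap_sim_alloc(struct swap_sim *s, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);

    for (;;) {
        int c = s->cpu_cluster[cpu];
        if (c == NO_CLUSTER) {
            c = find_free_cluster(s);
            if (c == NO_CLUSTER)
                break;                 /* no free cluster left: fall back */
            s->cpu_cluster[cpu] = c;
            s->cpu_next[cpu] = 0;
        }
        /* Fill this CPU's own cluster sequentially. */
        while (s->cpu_next[cpu] < CLUSTER_SLOTS) {
            int slot = c * CLUSTER_SLOTS + s->cpu_next[cpu]++;
            if (!s->swap_map[slot]) {
                s->swap_map[slot] = 1;
                return slot;
            }
        }
        s->cpu_cluster[cpu] = NO_CLUSTER;  /* cluster exhausted, try another */
    }

    /* Fallback: linear scan of swap_map for any free slot. */
    for (int slot = 0; slot < NR_SLOTS; slot++) {
        if (!s->swap_map[slot]) {
            s->swap_map[slot] = 1;
            return slot;
        }
    }
    return -1;
}
```

In this sketch, alternating calls for CPU 0 and CPU 1 hand out slots from two different clusters, so each CPU's slots stay contiguous instead of interleaving. Once no free cluster remains, a CPU simply takes whatever free slot the linear scan finds, including one inside a cluster still owned by another CPU.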
backing-dev.c	backing-dev: convert class code to use dev_groups	2013-08-19 21:22:34 -07:00
balloon_compaction.c	mm: introduce a common interface for balloon pages mobility	2012-12-11 17:22:26 -08:00
bootmem.c	mm: kill free_all_bootmem_node()	2013-07-03 16:07:39 -07:00
bounce.c	Merge branch 'for-3.10/core' of git://git.kernel.dk/linux-block	2013-05-08 10:13:35 -07:00
cleancache.c	mm: cleancache: clean up cleancache_enabled	2013-04-30 17:04:01 -07:00
compaction.c	mm: add & use zone_end_pfn() and zone_spans_pfn()	2013-02-23 17:50:20 -08:00
debug-pagealloc.c	mm, x86: Remove debug_pagealloc_enabled	2011-12-06 09:24:07 +01:00
dmapool.c	dmapool: make DMAPOOL_DEBUG detect corruption of free marker	2012-12-11 17:22:24 -08:00
fadvise.c	teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long long	2013-03-03 22:46:22 -05:00
failslab.c	switch debugfs to umode_t	2012-01-03 22:54:56 -05:00
filemap_xip.c	lift sb_start_write() out of ->write()	2013-04-09 14:12:56 -04:00
filemap.c	direct-io: Handle O_(D)SYNC AIO	2013-09-04 09:23:46 -04:00
fremap.c	mm: save soft-dirty bits on file pages	2013-08-13 17:57:48 -07:00
frontswap.c	frontswap: fix incorrect zeroing and allocation size for frontswap_map	2013-06-12 16:29:46 -07:00
highmem.c	Some nice cleanups, and even a patch my wife did as a "live" demo for	2012-12-20 08:37:05 -08:00
huge_memory.c	mm: replace strict_strtoul() with kstrtoul()	2013-09-11 15:57:11 -07:00
hugetlb_cgroup.c	cgroup: pass around cgroup_subsys_state instead of cgroup in file methods	2013-08-08 20:11:24 -04:00
hugetlb.c	mm: replace strict_strtoul() with kstrtoul()	2013-09-11 15:57:11 -07:00
hwpoison-inject.c	memcg: rename config variables	2012-07-31 18:42:43 -07:00
init-mm.c	atomic: use <linux/atomic.h>	2011-07-26 16:49:47 -07:00
internal.h	mm: remove unused __put_page()	2013-07-09 10:33:22 -07:00
interval_tree.c	mm: add CONFIG_DEBUG_VM_RB build option	2012-10-09 16:22:42 +09:00
Kconfig	Merge remote-tracking branch 'origin/next' into kvm-ppc-next	2013-08-29 00:41:59 +02:00
Kconfig.debug	mm: more intensive memory corruption debugging	2012-01-10 16:30:42 -08:00
kmemcheck.c
kmemleak-test.c
kmemleak.c	mm: replace strict_strtoul() with kstrtoul()	2013-09-11 15:57:11 -07:00
ksm.c	mm: replace strict_strtoul() with kstrtoul()	2013-09-11 15:57:11 -07:00
maccess.c	mm: Map most files to use export.h instead of module.h	2011-10-31 09:20:12 -04:00
madvise.c	mm/madvise.c: fix coding-style errors	2013-09-11 15:57:00 -07:00
Makefile	zswap: add to mm/	2013-07-10 18:11:34 -07:00
memblock.c	mm/memblock.c: fix wrong comment in __next_free_mem_range()	2013-07-09 10:33:23 -07:00
memcontrol.c	Merge branch 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup	2013-09-03 18:25:03 -07:00
memory_hotplug.c	mm/memory_hotplug.c: fix return value of online_pages()	2013-07-09 10:33:25 -07:00
memory-failure.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial	2013-09-06 09:36:28 -07:00
memory.c	Merge 3.11-rc6 into char-misc-next	2013-08-18 20:40:33 -07:00
mempolicy.c	mm: mempolicy: turn vma_set_policy() into vma_dup_policy()	2013-09-11 15:57:00 -07:00
mempool.c	mempool: add @gfp_mask to mempool_create_node()	2012-06-25 11:53:47 +02:00
migrate.c	mm: migration: add migrate_entry_wait_huge()	2013-06-12 16:29:46 -07:00
mincore.c	swap: make each swap partition have one address_space	2013-02-23 17:50:17 -08:00
mlock.c	Revert "mm: introduce VM_POPULATE flag to better deal with racy userspace programs"	2013-03-28 17:45:51 -07:00
mm_init.c	mm: tune vm_committed_as percpu_counter batching size	2013-07-03 16:07:32 -07:00
mmap.c	mm: mmap_region: kill correct_wcount/inode, use allow_write_access()	2013-09-11 15:57:07 -07:00
mmu_context.c	mm: remove old aio use_mm() comment	2013-05-07 18:38:27 -07:00
mmu_notifier.c	treewide: relase -> release	2013-06-28 14:34:33 +02:00
mmzone.c	mm: rename page struct field helpers	2013-02-23 17:50:18 -08:00
mprotect.c	mm/mprotect.c: coding-style cleanups	2012-12-18 15:02:15 -08:00
mremap.c	mm: move_ptes -- Set soft dirty bit depending on pte type	2013-08-27 09:36:17 -07:00
msync.c
nobootmem.c	mm: concentrate modification of totalram_pages into the mm core	2013-07-03 16:07:33 -07:00
nommu.c	mm: remove free_area_cache	2013-07-10 18:11:34 -07:00
oom_kill.c	mm/oom_kill: remove weird use of ERR_PTR()/PTR_ERR().	2013-07-15 11:25:05 +09:30
page_alloc.c	mm/page_alloc.c: use '__paginginit' instead of '__init'	2013-09-11 15:57:13 -07:00
page_cgroup.c	memcontrol: use N_MEMORY instead N_HIGH_MEMORY	2012-12-12 17:38:32 -08:00
page_io.c	mm: remove compressed copy from zram in-memory	2013-07-03 16:07:26 -07:00
page_isolation.c	page_isolation: Fix a comment typo in test_pages_isolated()	2013-08-20 13:03:41 +02:00
page-writeback.c	kernel: delete __cpuinit usage from all core kernel files	2013-07-14 19:36:59 -04:00
pagewalk.c	mm/pagewalk.c: walk_page_range should avoid VM_PFNMAP areas	2013-05-24 16:22:53 -07:00
percpu-km.c
percpu-vm.c	mm: fix kernel-doc warnings	2012-06-20 14:39:36 -07:00
percpu.c	mm, percpu: Make sure percpu_alloc early parameter has an argument	2012-12-02 06:23:04 -08:00
pgtable-generic.c	mm/THP: add pmd args to pgtable deposit and withdraw APIs	2013-06-20 16:55:07 +10:00
process_vm_access.c	Fix: compat_rw_copy_check_uvector() misuse in aio, readv, writev, and security keys	2013-03-12 11:05:45 -07:00
quicklist.c	mm: delete various needless include <linux/module.h>	2011-10-31 09:20:11 -04:00
readahead.c	mm: change invalidatepage prototype to accept length	2013-05-21 23:17:23 -04:00
rmap.c	s390/mm: implement software referenced bits	2013-08-29 13:20:11 +02:00
shmem.c	shm_mnt is as longterm as it gets, TYVM...	2013-09-03 22:50:27 -04:00
slab_common.c	Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux	2013-07-14 15:14:29 -07:00
slab.c	kernel: delete __cpuinit usage from all core kernel files	2013-07-14 19:36:59 -04:00
slab.h	memcg: check that kmem_cache has memcg_params before accessing it	2013-08-28 19:26:38 -07:00
slob.c	Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux	2013-07-14 15:14:29 -07:00
slub.c	mm: replace strict_strtoul() with kstrtoul()	2013-09-11 15:57:11 -07:00
sparse-vmemmap.c	sparse-vmemmap: specify vmemmap population range in bytes	2013-04-29 15:54:35 -07:00
sparse.c	mm/sparse.c: put clear_hwpoisoned_pages within CONFIG_MEMORY_HOTREMOVE	2013-07-09 10:33:22 -07:00
swap_state.c	swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion	2013-06-12 16:29:45 -07:00
swap.c	thp, mm: avoid PageUnevictable on active/inactive lru lists	2013-07-31 14:41:03 -07:00
swapfile.c	swap: make cluster allocation per-cpu	2013-09-11 15:57:17 -07:00
truncate.c	mm: teach truncate_inode_pages_range() to handle non page aligned ranges	2013-05-27 23:32:35 -04:00
util.c	mm: remove free_area_cache	2013-07-10 18:11:34 -07:00
vmalloc.c	mm/vmalloc.c: fix an overflow bug in alloc_vmap_area()	2013-07-09 10:33:23 -07:00
vmpressure.c	Merge branch 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup	2013-09-03 18:25:03 -07:00
vmscan.c	mm: vmscan: do not scale writeback pages when deciding whether to set ZONE_WRITEBACK	2013-07-09 10:33:23 -07:00
vmstat.c	mm: vmstats: track TLB flush stats on UP too	2013-09-11 15:57:09 -07:00
zbud.c	mm: zbud: fix condition check on allocation size	2013-07-31 14:41:03 -07:00
zswap.c	mm/zswap.c: get swapper address_space by using macro	2013-09-11 15:57:08 -07:00