linux

mirror of https://github.com/FEX-Emu/linux.git synced 2024-12-17 22:41:25 +00:00


Wu Fengguang 10be0b372c readahead: introduce context readahead algorithm

Introduce page cache context based readahead algorithm. This is to better support concurrent read streams in general.

RATIONALE
---------
The current readahead algorithm detects interleaved reads in a _passive_ way. Given a sequence of interleaved streams 1,1001,2,1002,3,4,1003,5,1004,1005,6,... By checking for (offset == prev_offset + 1), it will discover the sequentialness between 3,4 and between 1004,1005, and start doing sequential readahead for the individual streams from page 4 and page 1005 onwards.

The context readahead algorithm guarantees to discover the sequentialness no matter how the streams are interleaved. For the above example, it will start sequential readahead from page 2 and page 1002 onwards.

The trick is to poke for page @offset-1 in the page cache when it has no other clues on the sequentialness of request @offset: if the current request belongs to a sequential stream, that stream must have accessed page @offset-1 recently, and the page will still be cached now. So if page @offset-1 is there, we can take request @offset as a sequential access.

BENEFICIARIES
-------------
- strictly interleaved reads, i.e. 1,1001,2,1002,3,1003,...
  The current readahead will take them as silly random reads; the context readahead will take them as two sequential streams.

- cooperative IO processes, i.e. NFS and SCST
  They create a thread pool, farming off (sequential) IO requests to different threads which will be performing interleaved IO.

  It was not easy (or possible) to reliably tell from file->f_ra all those cooperative processes working on the same sequential stream, since they will have different file->f_ra instances. And NFSD's file->f_ra is particularly unusable, since its file objects are dynamically created for each request. The nfsd does have code trying to restore the f_ra bits, but it is not satisfactory.

  The new scheme is to detect the sequential pattern by looking up the page cache, which provides one single and consistent view of the pages recently accessed. That makes sequential detection for cooperative processes possible.

USER REPORT
-----------
Vladislav recommends the addition of context readahead as a result of his SCST benchmarks. It leads to 6%~40% performance gains in various cases and achieves equal performance in others. http://lkml.org/lkml/2009/3/19/239

OVERHEADS
---------
In theory, it introduces one extra page cache lookup per random read. However the below benchmark shows context readahead to be slightly faster, wondering..

Randomly reading 200MB of data on a sparse file, repeated 20 times for each block size. The average throughputs are:

                        original ra     context ra      gain
 4K random reads:        65.561MB/s      65.648MB/s     +0.1%
16K random reads:       124.767MB/s     124.951MB/s     +0.1%
64K random reads:       162.123MB/s     162.278MB/s     +0.1%

Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Tested-by: Vladislav Bolkhovitin <vst@vlnb.net>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2009-06-16 19:47:30 -07:00
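The difference between the passive check and the context check described in the commit message can be illustrated with a small userspace sketch. This is not the code in readahead.c: the function names, the `NR_PAGES` bound, and the boolean array standing in for the page cache are all assumptions made for illustration, and the toy ignores eviction and the pages that readahead itself would bring in.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 2048 /* size of the toy page cache (hypothetical) */

/*
 * Passive detection: a request is treated as sequential only when it
 * immediately follows the previous request (offset == prev_offset + 1).
 * Returns the first offset in [lo, hi) flagged as sequential, or -1.
 */
long first_seq_passive(const long *reqs, size_t n, long lo, long hi)
{
    bool have_prev = false;
    long prev_offset = 0;

    for (size_t i = 0; i < n; i++) {
        long offset = reqs[i];

        if (have_prev && offset == prev_offset + 1 &&
            offset >= lo && offset < hi)
            return offset;
        prev_offset = offset;
        have_prev = true;
    }
    return -1;
}

/*
 * Context detection: a request is treated as sequential when page
 * offset-1 is already present in the (toy) page cache.
 * Returns the first offset in [lo, hi) flagged as sequential, or -1.
 */
long first_seq_context(const long *reqs, size_t n, long lo, long hi)
{
    bool cached[NR_PAGES] = { false }; /* stand-in for the page cache */

    for (size_t i = 0; i < n; i++) {
        long offset = reqs[i];

        if (offset < 0 || offset >= NR_PAGES)
            continue; /* outside the toy cache; skip */
        if (offset > 0 && cached[offset - 1] &&
            offset >= lo && offset < hi)
            return offset;
        cached[offset] = true; /* the read leaves this page cached */
    }
    return -1;
}
```

Fed the interleaved sequence from the commit message (1,1001,2,1002,3,4,1003,5,1004,1005,6) and asked about the two streams separately via the ranges [0, 1000) and [1000, 2048), the passive check first fires at pages 4 and 1005, while the context check fires at pages 2 and 1002, matching the figures quoted above.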
..
allocpercpu.c	percpu: __percpu_depopulate_mask can take a const mask	2009-04-06 13:44:15 -07:00
backing-dev.c	block: change the request allocation/congestion logic to be sync/async based	2009-04-06 08:04:53 -07:00
bootmem.c	bootmem: fix slab fallback on numa	2009-06-11 19:15:54 +03:00
bounce.c	Merge branch 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block	2009-06-11 11:10:35 -07:00
debug-pagealloc.c	generic debug pagealloc	2009-04-01 08:59:13 -07:00
dmapool.c
fadvise.c	readahead: move max_sane_readahead() calls into force_page_cache_readahead()	2009-06-16 19:47:28 -07:00
failslab.c	kmemtrace, mm: fix slab.h dependency problem in mm/failslab.c	2009-04-03 12:23:01 +02:00
filemap_xip.c	mm: do_xip_mapping_read: fix length calculation	2009-04-02 19:04:49 -07:00
filemap.c	readahead: record mmap read-around states in file_ra_state	2009-06-16 19:47:29 -07:00
fremap.c
highmem.c	mm: introduce debug_kmap_atomic	2009-04-01 08:59:14 -07:00
hugetlb.c	mm: account for MAP_SHARED mappings using VM_MAYSHARE and not VM_SHARED in hugetlbfs	2009-05-29 08:40:03 -07:00
init-mm.c	mm: consolidate init_mm definition	2009-06-16 19:47:28 -07:00
internal.h	nommu: there is no mlock() for NOMMU, so don't provide the bits	2009-04-01 08:59:14 -07:00
Kconfig	security: use mmap_min_addr independently of security models	2009-06-04 12:07:48 +10:00
Kconfig.debug	generic debug pagealloc: build fix	2009-04-02 19:04:48 -07:00
kmemleak-test.c	kmemleak: Simple testing module for kmemleak	2009-06-11 17:04:19 +01:00
kmemleak.c	kmemleak: Add the base support	2009-06-11 17:03:28 +01:00
maccess.c	[S390] maccess: add weak attribute to probe_kernel_write	2009-06-12 10:27:37 +02:00
madvise.c	readahead: move max_sane_readahead() calls into force_page_cache_readahead()	2009-06-16 19:47:28 -07:00
Makefile	mm: consolidate init_mm definition	2009-06-16 19:47:28 -07:00
memcontrol.c	memcg: fix build warning and avoid checking for mem != null again and again	2009-05-29 08:40:03 -07:00
memory_hotplug.c
memory.c	mm: close page_mkwrite races	2009-05-02 15:36:09 -07:00
mempolicy.c
mempool.c
migrate.c	FS-Cache: Recruit a page flags for cache management	2009-04-03 16:42:36 +01:00
mincore.c
mlock.c	x86, bts, mm: clean up buffer allocation	2009-04-24 10:18:52 +02:00
mm_init.c
mmap.c	Merge branch 'perfcounters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	2009-06-11 14:01:07 -07:00
mmu_notifier.c
mmzone.c	[ARM] Double check memmap is actually valid with a memmap has unexpected holes V2	2009-05-18 11:22:24 +01:00
mprotect.c	perf_counter: Add mmap event hooks to mprotect()	2009-06-08 23:10:43 +02:00
mremap.c
msync.c
nommu.c	nommu: Provide mmap_min_addr definition.	2009-06-10 09:24:09 +10:00
oom_kill.c	oom: fix possible oom_dump_tasks NULL pointer	2009-05-29 08:40:01 -07:00
page_alloc.c	kmemleak: Add kmemleak_alloc callback from alloc_large_system_hash	2009-06-11 17:03:30 +01:00
page_cgroup.c	memcg: fix page_cgroup fatal error in FLATMEM	2009-06-12 11:00:54 +03:00
page_io.c	block: fix bad definition of BIO_RW_SYNC	2009-02-18 10:32:00 +01:00
page_isolation.c
page-writeback.c	page-writeback: fix the calculation of the oldest_jif in wb_kupdate()	2009-05-17 16:36:11 -07:00
pagewalk.c
pdflush.c	Revert "mm: add /proc controls for pdflush threads"	2009-05-15 11:32:24 +02:00
percpu.c	percpu: remove rbtree and use page->index instead	2009-04-08 18:31:31 +02:00
prio_tree.c
quicklist.c	cpumask: replace node_to_cpumask with cpumask_of_node.	2009-03-13 14:49:46 +10:30
readahead.c	readahead: introduce context readahead algorithm	2009-06-16 19:47:30 -07:00
rmap.c	hugh: update email address	2009-05-21 13:14:32 -07:00
shmem_acl.c
shmem.c	integrity: move ima_counts_get	2009-05-22 09:45:33 +10:00
slab.c	slab: setup cpu caches later on when interrupts are enabled	2009-06-12 18:53:58 +03:00
slob.c	kmemleak: Add the slob memory allocation/freeing hooks	2009-06-11 17:03:30 +01:00
slub.c	slab,slub: don't enable interrupts during early boot	2009-06-12 18:53:33 +03:00
sparse-vmemmap.c
sparse.c	mm: mminit_validate_memmodel_limits(): remove redundant test	2009-04-01 08:59:11 -07:00
swap_state.c	memcg: fix deadlock between lock_page_cgroup and mapping tree_lock	2009-05-29 08:40:02 -07:00
swap.c	mm: fix Committed_AS underflow on large NR_CPUS environment	2009-05-02 15:36:10 -07:00
swapfile.c	PM/hibernate: fix "swap breaks after hibernation failures"	2009-02-21 14:17:17 -08:00
thrash.c
truncate.c	memcg: fix deadlock between lock_page_cgroup and mapping tree_lock	2009-05-29 08:40:02 -07:00
util.c	Merge branch 'linus' into tracing/core	2009-05-07 11:17:34 +02:00
vmalloc.c	Merge branch 'for-linus' of git://linux-arm.org/linux-2.6	2009-06-11 14:15:57 -07:00
vmscan.c	PM/Suspend: Do not shrink memory before suspend	2009-06-12 21:32:32 +02:00
vmstat.c	[ARM] Double check memmap is actually valid with a memmap has unexpected holes V2	2009-05-18 11:22:24 +01:00