linux/fs/nfs
David Howells 201a15428b FS-Cache: Handle pages pending storage that get evicted under OOM conditions
Handle netfs pages that the vmscan algorithm wants to evict from the pagecache
under OOM conditions, but that are waiting for write to the cache.  Under these
conditions, vmscan calls the releasepage() function of the netfs, asking if a
page can be discarded.

The problem is typified by the following trace of a stuck process:

	kslowd005     D 0000000000000000     0  4253      2 0x00000080
	 ffff88001b14f370 0000000000000046 ffff880020d0d000 0000000000000007
	 0000000000000006 0000000000000001 ffff88001b14ffd8 ffff880020d0d2a8
	 000000000000ddf0 00000000000118c0 00000000000118c0 ffff880020d0d2a8
	Call Trace:
	 [<ffffffffa00782d8>] __fscache_wait_on_page_write+0x8b/0xa7 [fscache]
	 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34
	 [<ffffffffa0078240>] ? __fscache_check_page_write+0x63/0x70 [fscache]
	 [<ffffffffa00b671d>] nfs_fscache_release_page+0x4e/0xc4 [nfs]
	 [<ffffffffa00927f0>] nfs_release_page+0x3c/0x41 [nfs]
	 [<ffffffff810885d3>] try_to_release_page+0x32/0x3b
	 [<ffffffff81093203>] shrink_page_list+0x316/0x4ac
	 [<ffffffff8109372b>] shrink_inactive_list+0x392/0x67c
	 [<ffffffff813532fa>] ? __mutex_unlock_slowpath+0x100/0x10b
	 [<ffffffff81058df0>] ? trace_hardirqs_on_caller+0x10c/0x130
	 [<ffffffff8135330e>] ? mutex_unlock+0x9/0xb
	 [<ffffffff81093aa2>] shrink_list+0x8d/0x8f
	 [<ffffffff81093d1c>] shrink_zone+0x278/0x33c
	 [<ffffffff81052d6c>] ? ktime_get_ts+0xad/0xba
	 [<ffffffff81094b13>] try_to_free_pages+0x22e/0x392
	 [<ffffffff81091e24>] ? isolate_pages_global+0x0/0x212
	 [<ffffffff8108e743>] __alloc_pages_nodemask+0x3dc/0x5cf
	 [<ffffffff81089529>] grab_cache_page_write_begin+0x65/0xaa
	 [<ffffffff8110f8c0>] ext3_write_begin+0x78/0x1eb
	 [<ffffffff81089ec5>] generic_file_buffered_write+0x109/0x28c
	 [<ffffffff8103cb69>] ? current_fs_time+0x22/0x29
	 [<ffffffff8108a509>] __generic_file_aio_write+0x350/0x385
	 [<ffffffff8108a588>] ? generic_file_aio_write+0x4a/0xae
	 [<ffffffff8108a59e>] generic_file_aio_write+0x60/0xae
	 [<ffffffff810b2e82>] do_sync_write+0xe3/0x120
	 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34
	 [<ffffffff810b18e1>] ? __dentry_open+0x1a5/0x2b8
	 [<ffffffff810b1a76>] ? dentry_open+0x82/0x89
	 [<ffffffffa00e693c>] cachefiles_write_page+0x298/0x335 [cachefiles]
	 [<ffffffffa0077147>] fscache_write_op+0x178/0x2c2 [fscache]
	 [<ffffffffa0075656>] fscache_op_execute+0x7a/0xd1 [fscache]
	 [<ffffffff81082093>] slow_work_execute+0x18f/0x2d1
	 [<ffffffff8108239a>] slow_work_thread+0x1c5/0x308
	 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34
	 [<ffffffff810821d5>] ? slow_work_thread+0x0/0x308
	 [<ffffffff8104be91>] kthread+0x7a/0x82
	 [<ffffffff8100beda>] child_rip+0xa/0x20
	 [<ffffffff8100b87c>] ? restore_args+0x0/0x30
	 [<ffffffff8102ef83>] ? tg_shares_up+0x171/0x227
	 [<ffffffff8104be17>] ? kthread+0x0/0x82
	 [<ffffffff8100bed0>] ? child_rip+0x0/0x20

In the above backtrace, the following is happening:

 (1) A page storage operation is being executed by a slow-work thread
     (fscache_write_op()).

 (2) FS-Cache farms the operation out to the cache to perform
     (cachefiles_write_page()).

 (3) CacheFiles is then calling Ext3 to perform the actual write, using Ext3's
     standard write (do_sync_write()) under KERNEL_DS directly from the netfs
     page.

 (4) However, for Ext3 to perform the write, it must allocate some memory; in
     particular, it must allocate at least one page cache page into which it
     can copy the data from the netfs page.

 (5) Under OOM conditions, the memory allocator can't immediately come up with
     a page, so it uses vmscan to find something to discard
     (try_to_free_pages()).

 (6) vmscan finds a clean netfs page it might be able to discard (possibly the
     one it's trying to write out).

 (7) The netfs is called to throw the page away (nfs_release_page()) - but it's
     called with __GFP_WAIT, so the netfs decides to wait for the store to
     complete (__fscache_wait_on_page_write()).

 (8) This blocks a slow-work processing thread - possibly against itself.

The system ends up stuck because it can't write out any netfs pages to the
cache without allocating more memory.

To avoid this, we make FS-Cache cancel some writes that aren't in the middle of
actually being performed.  This means that some data won't make it into the
cache this time.  To support this, a new FS-Cache function,
fscache_maybe_release_page(), is added that replaces what the netfs
releasepage() functions used to do with respect to the cache.
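
The policy described above can be sketched as a small userspace model.  This is
not the kernel code: the enum, the function name model_release_page() and its
boolean return convention are hypothetical, chosen only to illustrate the rule
that a write which is queued but not yet started gets cancelled, while a write
that is actually in progress is left alone and never waited for.

```c
#include <stdbool.h>

/* Hypothetical model of a page's relationship to the cache; these names do
 * not exist in the kernel. */
enum store_state {
	STORE_NONE,	/* page is not waiting to be written to the cache */
	STORE_PENDING,	/* write is queued but has not started */
	STORE_WRITING,	/* the cache is in the middle of writing the page */
};

/*
 * Decide whether vmscan may discard the page.  Returns true if the page can
 * be released.  *cancelled is set to true when a queued write was dropped,
 * meaning the data will not reach the cache this time.
 *
 * Assumption: a page that is actively being written is refused rather than
 * waited for, since sleeping here is what stalled the slow-work thread.
 */
bool model_release_page(enum store_state state, bool *cancelled)
{
	*cancelled = false;

	switch (state) {
	case STORE_WRITING:
		return false;		/* busy: do not wait, do not release */
	case STORE_PENDING:
		*cancelled = true;	/* drop the queued write */
		return true;
	case STORE_NONE:
	default:
		return true;		/* nothing pending, free to go */
	}
}
```

The point of the sketch is that no branch sleeps: every outcome is decided
immediately, so the memory allocator is never held up behind a cache write.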

The decisions fscache_maybe_release_page() makes are counted and displayed
through /proc/fs/fscache/stats on a line labelled "VmScan".  There are four
counters provided: "nos=N" - pages that weren't pending storage; "gon=N" -
pages that were pending storage when we first looked, but weren't by the time
we got the object lock; "bsy=N" - pages that we ignored as they were actively
being written when we looked; and "can=N" - pages that we cancelled the storage
of.
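
For monitoring, the four counters can be pulled out of the stats file with a
few lines of C.  The following is a minimal sketch that assumes the line looks
like "VmScan : nos=N gon=N bsy=N can=N"; only the label and the four field
names are given above, so the exact spacing is an assumption, and the struct
and function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the four VmScan counters. */
struct vmscan_stats {
	unsigned int nos;	/* pages that weren't pending storage */
	unsigned int gon;	/* pending at first look, gone once locked */
	unsigned int bsy;	/* ignored: actively being written */
	unsigned int can;	/* storage cancelled */
};

/*
 * Parse one line of text.  Returns 0 and fills *out if the line is a VmScan
 * line carrying all four counters; returns -1 otherwise and leaves *out
 * untouched.  The assumed layout is "VmScan : nos=N gon=N bsy=N can=N".
 */
int parse_vmscan_line(const char *line, struct vmscan_stats *out)
{
	struct vmscan_stats tmp;
	const char *p;

	if (strncmp(line, "VmScan", 6) != 0)
		return -1;

	p = strchr(line, ':');
	if (!p)
		return -1;

	if (sscanf(p + 1, " nos=%u gon=%u bsy=%u can=%u",
		   &tmp.nos, &tmp.gon, &tmp.bsy, &tmp.can) != 4)
		return -1;

	*out = tmp;
	return 0;
}
```

A caller would read /proc/fs/fscache/stats line by line with fgets() and pass
each line to parse_vmscan_line() until it returns 0.  A rising "can" count
would indicate that memory pressure is costing cache population.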

What I'd really like to do is alter the behaviour of the cancellation
heuristics, depending on how necessary it is to expel pages.  If there are
plenty of other pages that aren't waiting to be written to the cache that
could be ejected first, then it would be nice to hold up on immediate
cancellation of cache writes - but I don't see a way of doing that.

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:11:35 +00:00
cache_lib.c NFS: Add a dns resolver for use with NFSv4 referrals and migration 2009-08-19 18:22:15 -04:00
cache_lib.h NFS: Add a dns resolver for use with NFSv4 referrals and migration 2009-08-19 18:22:15 -04:00
callback_proc.c nfs41: Backchannel: CB_SEQUENCE validation 2009-06-17 14:11:43 -07:00
callback_xdr.c trivial: remove unnecessary semicolons 2009-09-21 15:14:58 +02:00
callback.c NFSv4: Clean up the nfs.callback_tcpport option 2009-08-09 15:06:19 -04:00
callback.h nfs41: Backchannel: update cb_sequence args and results 2009-06-17 14:11:40 -07:00
client.c nfs: Avoid overrun when copying client IP address string 2009-10-06 15:42:18 -04:00
delegation.c headers: smp_lock.h redux 2009-07-12 12:22:34 -07:00
delegation.h
dir.c NFSv4: The link() operation should return any delegation on the file 2009-10-26 08:09:46 -04:00
direct.c nfs: Panic when commit fails 2009-10-23 14:16:30 -04:00
dns_resolve.c NFS: Add a dns resolver for use with NFSv4 referrals and migration 2009-08-19 18:22:15 -04:00
dns_resolve.h NFS: Add a dns resolver for use with NFSv4 referrals and migration 2009-08-19 18:22:15 -04:00
file.c const: mark struct vm_struct_operations 2009-09-27 11:39:25 -07:00
fscache-index.c
fscache.c FS-Cache: Handle pages pending storage that get evicted under OOM conditions 2009-11-19 18:11:35 +00:00
fscache.h NFS: Propagate 'fsc' mount option through automounts 2009-09-23 14:36:39 -04:00
getroot.c headers: mnt_namespace.h redux 2009-07-08 09:31:56 -07:00
idmap.c SUNRPC: Replace rpc_client->cl_dentry and cl_mnt, with a cl_path 2009-08-09 15:14:24 -04:00
inode.c truncate: use new helpers 2009-09-24 08:41:47 -04:00
internal.h NFS: Allow the "nfs" file system type to support NFSv4 2009-09-08 19:50:03 -04:00
iostat.h remove put_cpu_no_resched() 2009-06-16 19:47:48 -07:00
Kconfig Merge branch 'for-2.6.31' of git://fieldses.org/git/linux-nfsd 2009-06-22 12:55:50 -07:00
Makefile NFS: Add a dns resolver for use with NFSv4 referrals and migration 2009-08-19 18:22:15 -04:00
mount_clnt.c Merge branch 'patches_cel-for-2.6.32' into nfs-for-2.6.32 2009-08-10 17:45:50 -04:00
namespace.c NFS: Fix nfs_path() to always return a '/' at the beginning of the path 2009-06-22 21:28:25 -07:00
nfs2xdr.c headers: utsname.h redux 2009-09-23 18:13:10 -07:00
nfs3acl.c nfs: remove unnecessary NFS_INO_INVALID_ACL checks 2009-06-17 18:02:14 -07:00
nfs3proc.c headers: utsname.h redux 2009-09-23 18:13:10 -07:00
nfs3xdr.c headers: utsname.h redux 2009-09-23 18:13:10 -07:00
nfs4_fs.h NFSv4: Fix an NFSv4 mount regression 2009-07-21 16:48:07 -04:00
nfs4namespace.c NFSv4: Fix the referral mount code 2009-10-06 15:42:20 -04:00
nfs4proc.c NFSv4: Fix two unbalanced put_rpccred() issues. 2009-10-26 08:09:46 -04:00
nfs4renewd.c NFSv4: Kill nfs4_renewd_prepare_shutdown() 2009-10-08 11:50:55 -04:00
nfs4state.c const: make file_lock_operations const 2009-09-22 07:17:25 -07:00
nfs4xdr.c NFSv4: Fix a bug when the server returns NFS4ERR_RESOURCE 2009-10-23 14:46:42 -04:00
nfsroot.c NFS: Update MNT and MNT3 reply decoding functions 2009-06-17 18:02:13 -07:00
pagelist.c
proc.c headers: utsname.h redux 2009-09-23 18:13:10 -07:00
read.c NFS: Fix an O_DIRECT Oops... 2009-08-12 08:21:39 -07:00
super.c nfs: Fix nfs_parse_mount_options() kfree() leak 2009-10-22 08:15:23 +09:00
symlink.c
sysctl.c
unlink.c nfs41: use rpc prepare call state for session reset 2009-06-17 12:25:07 -07:00
write.c writeback: get rid of wbc->for_writepages 2009-09-16 15:16:18 +02:00