linux/net
Mel Gorman c76562b670 netvm: prevent a stream-specific deadlock
This patch series is based on top of "Swap-over-NBD without deadlocking
v15" as it depends on the same reservation of PF_MEMALLOC reserves logic.

When a user or administrator requires swap for their application, they
create a swap partition and file, format it with mkswap and activate it
with swapon.  In diskless systems this is not an option so if swap if
required then swapping over the network is considered.  The two likely
scenarios are when blade servers are used as part of a cluster where the
form factor or maintenance costs do not allow the use of disks and thin
clients.

The Linux Terminal Server Project recommends the use of the Network Block
Device (NBD) for swap but this is not always an option.  There is no
guarantee that the network attached storage (NAS) device is running Linux
or supports NBD.  However, it is likely that it supports NFS so there are
users that want support for swapping over NFS despite any performance
concern.  Some distributions currently carry patches that support swapping
over NFS but it would be preferable to support it in the mainline kernel.

Patch 1 avoids a stream-specific deadlock that potentially affects TCP.

Patch 2 is a small modification to SELinux to avoid using PFMEMALLOC
	reserves.

Patch 3 adds three helpers for filesystems to handle swap cache pages.
	For example, page_file_mapping() returns page->mapping for
	file-backed pages and the address_space of the underlying
	swap file for swap cache pages.

Patch 4 adds two address_space_operations to allow a filesystem
	to pin all metadata relevant to a swapfile in memory. Upon
	successful activation, the swapfile is marked SWP_FILE and
	the address space operation ->direct_IO is used for writing
	and ->readpage for reading in swap pages.

Patch 5 notes that patch 3 is bolting
	filesystem-specific-swapfile-support onto the side and that
	the default handlers have different information to what
	is available to the filesystem. This patch refactors the
	code so that there are generic handlers for each of the new
	address_space operations.

Patch 6 adds an API to allow a vector of kernel addresses to be
	translated to struct pages and pinned for IO.

Patch 7 adds support for using highmem pages for swap by kmapping
	the pages before calling the direct_IO handler.

Patch 8 updates NFS to use the helpers from patch 3 where necessary.

Patch 9 avoids setting PF_private on PG_swapcache pages within NFS.

Patch 10 implements the new swapfile-related address_space operations
	for NFS and teaches the direct IO handler how to manage
	kernel addresses.

Patch 11 prevents page allocator recursions in NFS by using GFP_NOIO
	where appropriate.

Patch 12 fixes a NULL pointer dereference that occurs when using
	swap-over-NFS.

With the patches applied, it is possible to mount a swapfile that is on an
NFS filesystem.  Swap performance is not great with a swap stress test
taking roughly twice as long to complete than if the swap device was
backed by NBD.

This patch: netvm: prevent a stream-specific deadlock

It could happen that all !SOCK_MEMALLOC sockets have buffered so much data
that we're over the global rmem limit.  This will prevent SOCK_MEMALLOC
buffers from receiving data, which will prevent userspace from running,
which is needed to reduce the buffered data.

Fix this by exempting the SOCK_MEMALLOC sockets from the rmem limit.  Once
this change it applied, it is important that sockets that set
SOCK_MEMALLOC do not clear the flag until the socket is being torn down.
If this happens, a warning is generated and the tokens reclaimed to avoid
accounting errors until the bug is fixed.

[davem@davemloft.net: Warning about clearing SOCK_MEMALLOC]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:47 -07:00
..
9p net: Fix (nearly-)kernel-doc comments for various functions 2012-07-10 23:13:45 -07:00
802 tokenring: delete all remaining driver support 2012-05-15 20:23:16 -04:00
8021q netpoll: move np->dev and np->dev_name init into __netpoll_setup() 2012-07-17 09:02:36 -07:00
appletalk net: Fix (nearly-)kernel-doc comments for various functions 2012-07-10 23:13:45 -07:00
atm net: Remove casts to same type 2012-06-04 11:45:11 -04:00
ax25 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-07-19 11:17:30 -07:00
batman-adv Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-07-10 23:56:33 -07:00
bluetooth Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2012-07-20 12:30:48 -04:00
bridge net: fix race condition in several drivers when reading stats 2012-07-22 12:12:32 -07:00
caif netvm: prevent a stream-specific deadlock 2012-07-31 18:42:47 -07:00
can can: gw: Remove pointless casts 2012-07-10 22:36:17 +02:00
ceph Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2012-07-24 10:01:50 -07:00
core netvm: prevent a stream-specific deadlock 2012-07-31 18:42:47 -07:00
dcb net: Fix non-kernel-doc comments with kernel-doc start marker 2012-07-10 23:13:45 -07:00
dccp ipv4: Prepare for change of rt->rt_iif encoding. 2012-07-23 16:36:26 -07:00
decnet decnet: Don't set RTCF_DIRECTSRC. 2012-07-23 13:16:59 -07:00
dns_resolver Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2012-05-21 20:27:36 -07:00
dsa dsa: Convert compare_ether_addr to ether_addr_equal 2012-05-09 20:49:19 -04:00
ethernet ipx: move peII functions 2012-07-19 10:48:00 -07:00
ieee802154 6lowpan: Change byte order when storing/accessing to len field 2012-07-16 22:52:02 -07:00
ipv4 netvm: prevent a stream-specific deadlock 2012-07-31 18:42:47 -07:00
ipv6 net: introduce sk_gfp_atomic() to allow addition of GFP flags depending on the individual socket 2012-07-31 18:42:46 -07:00
ipx ipx: move peII functions 2012-07-19 10:48:00 -07:00
irda irda: Fix typo in irda 2012-07-16 23:23:52 -07:00
iucv net: remove skb_orphan_try() 2012-06-15 15:30:15 -07:00
key net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
l2tp net: l2tp_eth: provide tx_dropped counter 2012-06-29 00:52:32 -07:00
lapb lapb: Neaten debugging 2012-05-17 18:45:20 -04:00
llc net: Fix (nearly-)kernel-doc comments for various functions 2012-07-10 23:13:45 -07:00
mac80211 Merge branch 'for-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds 2012-07-26 20:26:27 -07:00
mac802154 mac802154: sparse warnings: make symbols static 2012-07-12 07:54:45 -07:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-07-19 11:17:30 -07:00
netlabel netlabel: use GFP flags from caller instead of GFP_ATOMIC 2012-03-22 19:29:57 -04:00
netlink genetlink: define lockdep_genl_is_held() when CONFIG_LOCKDEP 2012-07-24 00:01:30 -07:00
netrom net: Convert all sysctl registrations to register_net_sysctl 2012-04-20 21:22:30 -04:00
nfc Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2012-07-20 12:30:48 -04:00
openvswitch Revert "openvswitch: potential NULL deref in sample()" 2012-07-27 13:45:51 -07:00
packet net: added support for 40GbE link. 2012-06-27 15:42:24 -07:00
phonet net: remove my future former mail address 2012-06-17 16:29:38 -07:00
rds rds: set correct msg_namelen 2012-07-23 01:01:44 -07:00
rfkill rfkill: Add the capability to switch all devices of all type in __rfkill_switch_all(). 2012-06-06 15:18:17 -04:00
rose net: Convert all sysctl registrations to register_net_sysctl 2012-04-20 21:22:30 -04:00
rxrpc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-07-10 23:56:33 -07:00
sched ipv4: Prepare for change of rt->rt_iif encoding. 2012-07-23 16:36:26 -07:00
sctp netvm: prevent a stream-specific deadlock 2012-07-31 18:42:47 -07:00
sunrpc NFS client updates for Linux 3.6 2012-07-30 19:16:57 -07:00
tipc tipc: remove print_buf and deprecated log buffer code 2012-07-13 19:34:43 -04:00
unix net: make sock diag per-namespace 2012-07-16 22:31:34 -07:00
wanrouter wanmain: comparing array with NULL 2012-07-24 13:55:21 -07:00
wimax net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
wireless Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2012-07-20 12:30:48 -04:00
x25 net: Fix (nearly-)kernel-doc comments for various functions 2012-07-10 23:13:45 -07:00
xfrm net: Document dst->obsolete better. 2012-07-20 13:31:21 -07:00
compat.c net: Fix references to out-of-scope variables in put_cmsg_compat() 2012-07-22 17:50:49 -07:00
Kconfig net: drop NET dependency from HAVE_BPF_JIT 2012-05-21 12:50:12 -07:00
Makefile econet: remove ancient bug ridden protocol 2012-05-18 01:35:08 -04:00
nonet.c
socket.c net: netprio_cgroup: rework update socket logic 2012-07-22 12:44:01 -07:00
sysctl_net.c net: delete all instances of special processing for token ring 2012-05-15 20:14:35 -04:00