Commit Graph

478 Commits

Author SHA1 Message Date
David Howells
c88d5a7fff afs: Enable IPv6 DNS lookups
Remove the restriction on DNS lookup upcalls that prevents ipv6 addresses
from being looked up.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-06-15 15:27:09 +01:00
David Howells
0aac4bce4b afs: Show all of a server's addresses in /proc/fs/afs/servers
Show all of a server's addresses in /proc/fs/afs/servers, placing the
second plus addresses on padded lines of their own.  The current address is
marked with a star.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-06-15 00:52:59 -04:00
David Howells
b6cfbecafb afs: Handle CONFIG_PROC_FS=n
The AFS filesystem depends at the moment on /proc for configuration and
also presents information that way - however, this causes a compilation
failure if procfs is disabled.

Fix it so that the procfs bits aren't compiled in if procfs is disabled.

This means that you can't configure the AFS filesystem directly, but it is
still usable provided that an up-to-date keyutils is installed to look up
cells by SRV or AFSDB DNS records.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
2018-06-15 00:52:55 -04:00
Al Viro
de52cf922a AFS fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIVAwUAWvmaZvu3V2unywtrAQKZoA/9HzO6QsB7h7hWY6tTuoL0gD8T8S4hC7l3
 UYFtTgq0rFHJYiET4SWoy0Sfs8rY1iFPtaIeFVQG804SrnXu5/Q1tsv+1lRhZIuo
 /upAtZ3xEcqvAqU8pgcksKl/KUdmm7ZHUbhAFCasu+1eczGF5Q55UAUgonFrnEMi
 9N0WviRUkRAlTre7cvCMRI05c+HJV+PCYrJPjStAkJeuS1CuTEAT/d58NumquMAt
 6ENkpR4OhRUJZDhYH7XIRLm7hsYjr9v3VIeCiLpYqUZGuvhaj3jzPi0e9zD5PDzZ
 lyyodQVegBs88V2rXrjjZHohNQRiuSzI+42pMXrdaDu5jBFFqYLEeaBoperJY7nl
 W6l6HSb/I8VValM7iwkyzNWeQ6KhdUhYvA5ljYaJufZvqxp4di9xT4mAxRqbHSX+
 H5I/n+R27FEOFAqnWInaksj5IO80HGThrGhdz9O/4pa8xITz7W2ZKg5YMLEoF9yp
 /QUxsn3lz4VD4tjPrqampJ+IwbpQB+XDiJhM4boI47kC2IxEc9L2QiYWlFl/okZ4
 CGuXsluQFPleR3Mo8xq1WaQzmT40iYQ+aBOPq1/OhDisexZJ55Cjha1GHk/8aHDu
 GL5UiL7AfWEwY20mJiCObg8u2nnkwg/0YPR3awDBlCMDBeYhxbSFOLrKiQxUjWM9
 Pp6PUhTtSjU=
 =1ow3
 -----END PGP SIGNATURE-----

Merge tag 'afs-fixes-20180514' into afs-proc

backmerge AFS fixes that went into mainline and deal with
the conflict in fs/afs/fsclient.c

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-06-02 18:09:27 -04:00
David Howells
5b86d4ff5d afs: Implement network namespacing
Implement network namespacing within AFS, but don't yet let mounts occur
outside the init namespace.  An additional patch will be required propagate
the network namespace across automounts.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-23 12:01:15 +01:00
David Howells
1588def91d afs: Mark afs_net::ws_cell as __rcu and set using rcu functions
The afs_net::ws_cell member is sometimes used under RCU conditions from
within an seq-readlock.  It isn't, however, marked __rcu and it isn't set
using the proper RCU barrier-imposing functions.

Fix this by annotating it with __rcu and using appropriate barriers to
make sure accesses are correctly ordered.

Without this, the code can produce the following warning:

>> fs/afs/proc.c:151:24: sparse: incompatible types in comparison expression (different address spaces)

Fixes: f044c8847b ("afs: Lay the groundwork for supporting network namespaces")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-23 11:51:29 +01:00
David Howells
c875c76a06 afs: Fix a Sparse warning in xdr_decode_AFSFetchStatus()
Sparse doesn't appear able to handle the conditionally-taken locks in
xdr_decode_AFSFetchStatus(), even though the lock and unlock are both
contingent on the same unvarying function argument.

Deal with this by interpolating a wrapper function that takes the lock if
needed and calls xdr_decode_AFSFetchStatus() on two separate branches, one
with the lock held and one without.

This allows Sparse to work out the locking.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-23 11:32:06 +01:00
David Howells
5d9de25d93 afs: Rearrange fs/afs/proc.c to remove remaining predeclarations.
Rearrange fs/afs/proc.c to get rid of all the remaining predeclarations.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-18 11:46:15 +01:00
David Howells
f06916895b afs: Rearrange fs/afs/proc.c to move the show routines up
Rearrange fs/afs/proc.c to move the show routines up to the top of each
block so the order is show, iteration, ops, file ops, fops.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-18 11:46:14 +01:00
David Howells
22ade7e7a8 afs: Rearrange fs/afs/proc.c by moving fops and open functions down
Rearrange fs/afs/proc.c by moving fops and open functions down so as to
remove predeclarations.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-18 11:46:14 +01:00
David Howells
10495a0071 afs: Move /proc management functions to the end of the file
In fs/afs/proc.c, move functions that create and remove /proc files to the
end of the source file as a first stage in getting rid of all the forward
declarations.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-18 11:46:14 +01:00
Christoph Hellwig
353861cf05 afs: simplify procfs code
Use remove_proc_subtree to remove the whole subtree on cleanup, and
unwind the registration loop into individual calls.  Switch to use
proc_create_seq where applicable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-16 07:24:30 +02:00
David Howells
4776cab43f afs: Fix the non-encryption of calls
Some AFS servers refuse to accept unencrypted traffic, so can't be accessed
with kAFS.  Set the AF_RXRPC security level to encrypt client calls to deal
with this.

Note that incoming service calls are set by the remote client and so aren't
affected by this.

This requires an AF_RXRPC patch to pass the value set by setsockopt to calls
begun by the kernel.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 15:15:19 +01:00
David Howells
428edade4e afs: Fix CB.CallBack handling
The handling of CB.CallBack messages sent by the fileserver to the client
is broken in that they are currently being processed after the reply has
been transmitted.

This is not what the fileserver expects, however.  It holds up change
visibility until the reply comes so as to maintain cache coherency, and so
expects the client to have to refetch the state on the affected files.

Fix CB.CallBack handling to perform the callback break before sending the
reply.

The fileserver is free to hold up status fetches issued by other threads on
the same client that occur in reponse to the callback until any pending
changes have been committed.

Fixes: d001648ec7 ("rxrpc: Don't expose skbs to in-kernel users [ver #2]")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 15:15:19 +01:00
David Howells
68251f0a68 afs: Fix whole-volume callback handling
It's possible for an AFS file server to issue a whole-volume notification
that callbacks on all the vnodes in the file have been broken.  This is
done for R/O and backup volumes (which don't have per-file callbacks) and
for things like a volume being taken offline.

Fix callback handling to detect whole-volume notifications, to track it
across operations and to check it during inode validation.

Fixes: c435ee3455 ("afs: Overhaul the callback handling")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 15:15:18 +01:00
Marc Dionne
f9c1bba3d3 afs: Fix afs_find_server search loop
The code that looks up servers by addresses makes the assumption
that the list of addresses for a server is sorted.  It exits the
loop if it finds that the target address is larger than the
current candidate.  As the list is not currently sorted, this
can lead to a failure to find a matching server, which can cause
callbacks from that server to be ignored.

Remove the early exit case so that the complete list is searched.

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 15:15:18 +01:00
David Howells
a86b06d1cc afs: Fix the handling of an unfound server in CM operations
If the client cache manager operations that need the server record
(CB.Callback, CB.InitCallBackState, and CB.InitCallBackState3) can't find
the server record, they abort the call from the file server with
RX_CALL_DEAD when they should return okay.

Fixes: c35eccb1f6 ("[AFS]: Implement the CB.InitCallBackState3 operation.")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 15:15:18 +01:00
David Howells
3709a399c1 afs: Add a tracepoint to record callbacks from unlisted servers
Add a tracepoint to record callbacks from servers for which we don't have a
record.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 15:15:18 +01:00
David Howells
001ab5a67e afs: Fix the handling of CB.InitCallBackState3 to find the server by UUID
Fix the handling of the CB.InitCallBackState3 service call to find the
record of a server that we're using by looking it up by the UUID passed as
the parameter rather than by its address (of which it might have many, and
which may change).

Fixes: c35eccb1f6 ("[AFS]: Implement the CB.InitCallBackState3 operation.")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 15:15:18 +01:00
David Howells
3d9fa91161 afs: Fix VNOVOL handling in address rotation
If a volume location record lists multiple file servers for a volume, then
it's possible that due to a misconfiguration or a changing configuration
that one of the file servers doesn't know about it yet and will abort
VNOVOL.  Currently, the rotation algorithm will stop with EREMOTEIO.

Fix this by moving on to try the next server if VNOVOL is returned.  Once
all the servers have been tried and the record rechecked, the algorithm
will stop with EREMOTEIO or ENOMEDIUM.

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 15:15:18 +01:00
David Howells
684b0f68cf afs: Fix AFSFetchStatus decoder to provide OpenAFS compatibility
The OpenAFS server's RXAFS_InlineBulkStatus implementation has a bug
whereby if an error occurs on one of the vnodes being queried, then the
errorCode field is set correctly in the corresponding status, but the
interfaceVersion field is left unset.

Fix kAFS to deal with this by evaluating the AFSFetchStatus blob against
the following cases when called from FS.InlineBulkStatus delivery:

 (1) If InterfaceVersion == 0 then:

     (a) If errorCode != 0 then it indicates the abort code for the
         corresponding vnode.

     (b) If errorCode == 0 then the status record is invalid.

 (2) If InterfaceVersion == 1 then:

     (a) If errorCode != 0 then it indicates the abort code for the
         corresponding vnode.

     (b) If errorCode == 0 then the status record is valid and can be
     	 parsed.

 (3) If InterfaceVersion is anything else then the status record is
     invalid.

Fixes: dd9fbcb8e1 ("afs: Rearrange status mapping")
Reported-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 15:15:18 +01:00
David Howells
ec5a3b4b50 afs: Fix server rotation's handling of fileserver probe failure
The server rotation algorithm just gives up if it fails to probe a
fileserver.  Fix this by rotating to the next fileserver instead.

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 13:26:44 +01:00
David Howells
d4a96bec7a afs: Fix refcounting in callback registration
The refcounting on afs_cb_interest struct objects in
afs_register_server_cb_interest() is wrong as it uses the server list
entry's call back interest pointer without regard for the fact that it
might be replaced at any time and the object thrown away.

Fix this by:

 (1) Put a lock on the afs_server_list struct that can be used to
     mediate access to the callback interest pointers in the servers array.

 (2) Keep a ref on the callback interest that we get from the entry.

 (3) Dropping the old reference held by vnode->cb_interest if we replace
     the pointer.

Fixes: c435ee3455 ("afs: Overhaul the callback handling")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 13:17:35 +01:00
David Howells
f2686b0926 afs: Fix giving up callbacks on server destruction
When a server record is destroyed, we want to send a message to the server
telling it that we're giving up all the callbacks it has promised us.

Apply two fixes to this:

 (1) Only send the FS.GiveUpAllCallBacks message if we actually got a
     callback from that server.  We assume this to be the case if we
     performed at least one successful FS operation on that server.

 (2) Send it to the address last used for that server rather than always
     picking the first address in the list (which might be unreachable).

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 13:17:35 +01:00
David Howells
01fd79e6de afs: Fix address list parsing
The parsing of port specifiers in the address list obtained from the DNS
resolution upcall doesn't work as in4_pton() and in6_pton() will fail on
encountering an unexpected delimiter (in this case, the '+' marking the
port number).  However, in*_pton() can't be given multiple specifiers.

Fix this by finding the delimiter in advance and not relying on in*_pton()
to find the end of the address for us.

Fixes: 8b2a464ced ("afs: Add an address list concept")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 13:17:35 +01:00
David Howells
b61f7dcf4e afs: Fix directory page locking
The afs directory loading code (primarily afs_read_dir()) locks all the
pages that hold a directory's content blob to defend against
getdents/getdents races and getdents/lookup races where the competitors
issue conflicting reads on the same data.  As the reads will complete
consecutively, they may retrieve different versions of the data and
one may overwrite the data that the other is busy parsing.

Fix this by not locking the pages at all, but rather by turning the
validation lock into an rwsem and getting an exclusive lock on it whilst
reading the data or validating the attributes and a shared lock whilst
parsing the data.  Sharing the attribute validation lock should be fine as
the data fetch will retrieve the attributes also.

The individual page locks aren't needed at all as the only place they're
being used is to serialise data loading.

Without this patch, the:

 	if (!test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags)) {
		...
	}

part of afs_read_dir() may be skipped, leaving the pages unlocked when we
hit the success: clause - in which case we try to unlock the not-locked
pages, leading to the following oops:

  page:ffffe38b405b4300 count:3 mapcount:0 mapping:ffff98156c83a978 index:0x0
  flags: 0xfffe000001004(referenced|private)
  raw: 000fffe000001004 ffff98156c83a978 0000000000000000 00000003ffffffff
  raw: dead000000000100 dead000000000200 0000000000000001 ffff98156b27c000
  page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
  page->mem_cgroup:ffff98156b27c000
  ------------[ cut here ]------------
  kernel BUG at mm/filemap.c:1205!
  ...
  RIP: 0010:unlock_page+0x43/0x50
  ...
  Call Trace:
   afs_dir_iterate+0x789/0x8f0 [kafs]
   ? _cond_resched+0x15/0x30
   ? kmem_cache_alloc_trace+0x166/0x1d0
   ? afs_do_lookup+0x69/0x490 [kafs]
   ? afs_do_lookup+0x101/0x490 [kafs]
   ? key_default_cmp+0x20/0x20
   ? request_key+0x3c/0x80
   ? afs_lookup+0xf1/0x340 [kafs]
   ? __lookup_slow+0x97/0x150
   ? lookup_slow+0x35/0x50
   ? walk_component+0x1bf/0x490
   ? path_lookupat.isra.52+0x75/0x200
   ? filename_lookup.part.66+0xa0/0x170
   ? afs_end_vnode_operation+0x41/0x60 [kafs]
   ? __check_object_size+0x9c/0x171
   ? strncpy_from_user+0x4a/0x170
   ? vfs_statx+0x73/0xe0
   ? __do_sys_newlstat+0x39/0x70
   ? __x64_sys_getdents+0xc9/0x140
   ? __x64_sys_getdents+0x140/0x140
   ? do_syscall_64+0x5b/0x160
   ? entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: f3ddee8dc4 ("afs: Fix directory handling")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 13:17:35 +01:00
David Howells
660625922b afs: Fix server record deletion
AFS server records get removed from the net->fs_servers tree when
they're deleted, but not from the net->fs_addresses{4,6} lists, which
can lead to an oops in afs_find_server() when a server record has been
removed, for instance during rmmod.

Fix this by deleting the record from the by-address lists before posting
it for RCU destruction.

The reason this hasn't been noticed before is that the fileserver keeps
probing the local cache manager, thereby keeping the service record
alive, so the oops would only happen when a fileserver eventually gets
bored and stops pinging or if the module gets rmmod'd and a call comes
in from the fileserver during the window between the server records
being destroyed and the socket being closed.

The oops looks something like:

  BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
  ...
  Workqueue: kafsd afs_process_async_call [kafs]
  RIP: 0010:afs_find_server+0x271/0x36f [kafs]
  ...
  Call Trace:
   afs_deliver_cb_init_call_back_state3+0x1f2/0x21f [kafs]
   afs_deliver_to_call+0x1ee/0x5e8 [kafs]
   afs_process_async_call+0x5b/0xd0 [kafs]
   process_one_work+0x2c2/0x504
   worker_thread+0x1d4/0x2ac
   kthread+0x11f/0x127
   ret_from_fork+0x24/0x30

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-20 09:59:33 -07:00
Linus Torvalds
19e8a2f875 Merge branch 'afs-dh' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull AFS updates from Al Viro:
 "The AFS series posted by dhowells depended upon lookup_one_len()
  rework; now that prereq is in the mainline, that series had been
  rebased on top of it and got some exposure and testing..."

* 'afs-dh' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  afs: Do better accretion of small writes on newly created content
  afs: Add stats for data transfer operations
  afs: Trace protocol errors
  afs: Locally edit directory data for mkdir/create/unlink/...
  afs: Adjust the directory XDR structures
  afs: Split the directory content defs into a header
  afs: Fix directory handling
  afs: Split the dynroot stuff out and give it its own ops tables
  afs: Keep track of invalid-before version for dentry coherency
  afs: Rearrange status mapping
  afs: Make it possible to get the data version in readpage
  afs: Init inode before accessing cache
  afs: Introduce a statistics proc file
  afs: Dump bad status record
  afs: Implement @cell substitution handling
  afs: Implement @sys substitution handling
  afs: Prospectively look up extra files when doing a single lookup
  afs: Don't over-increment the cell usage count when pinning it
  afs: Fix checker warnings
  vfs: Remove the const from dir_context::actor
2018-04-12 11:59:06 -07:00
Matthew Wilcox
b93b016313 page cache: use xa_lock
Remove the address_space ->tree_lock and use the xa_lock newly added to
the radix_tree_root.  Rename the address_space ->page_tree to ->i_pages,
since we don't really care that it's a tree.

[willy@infradead.org: fix nds32, fs/dax.c]
  Link: http://lkml.kernel.org/r/20180406145415.GB20605@bombadil.infradead.orgLink: http://lkml.kernel.org/r/20180313132639.17387-9-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:39 -07:00
David Howells
5a81327616 afs: Do better accretion of small writes on newly created content
Processes like ld that do lots of small writes that aren't necessarily
contiguous result in a lot of small StoreData operations to the server, the
idea being that if someone else changes the data on the server, we only
write our changes over that and not the space between.  Further, we don't
want to write back empty space if we can avoid it to make it easier for the
server to do sparse files.

However, making lots of tiny RPC ops is a lot less efficient for the server
than one big one because each op requires allocation of resources and the
taking of locks, so we want to compromise a bit.

Reduce the load by the following:

 (1) If a file is just created locally or has just been truncated with
     O_TRUNC locally, allow subsequent writes to the file to be merged with
     intervening space if that space doesn't cross an entire intervening
     page.

 (2) Don't flush the file on ->flush() but rather on ->release() if the
     file was open for writing.

Just linking vmlinux.o, without this patch, looking in /proc/fs/afs/stats:

	file-wr : n=441 nb=513581204

and after the patch:

	file-wr : n=62 nb=513668555

there were 379 fewer StoreData RPC operations at the expense of an extra
87K being written.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:54:48 +01:00
David Howells
76a5cb6fc1 afs: Add stats for data transfer operations
Add statistics to /proc/fs/afs/stats for data transfer RPC operations.  New
lines are added that look like:

	file-rd : n=55794 nb=10252282150
	file-wr : n=9789 nb=3247763645

where n= indicates the number of ops completed and nb= indicates the number
of bytes successfully transferred.  file-rd is the counts for read/fetch
operations and file-wr the counts for write/store operations.

Note that directory and symlink downloading are included in the file-rd
stats at the moment.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:54:48 +01:00
David Howells
5f702c8e12 afs: Trace protocol errors
Trace protocol errors detected in afs.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:54:48 +01:00
David Howells
63a4681ff3 afs: Locally edit directory data for mkdir/create/unlink/...
Locally edit the contents of an AFS directory upon a successful inode
operation that modifies that directory (such as mkdir, create and unlink)
so that we can avoid the current practice of re-downloading the directory
after each change.

This is viable provided that the directory version number we get back from
the modifying RPC op is exactly incremented by 1 from what we had
previously.  The data in the directory contents is in a defined format that
we have to parse locally to perform lookups and readdir, so modifying isn't
a problem.

If the edit fails, we just clear the VALID flag on the directory and it
will be reloaded next time it is needed.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:54:48 +01:00
David Howells
0031763698 afs: Adjust the directory XDR structures
Adjust the AFS directory XDR structures in a number of superficial ways:

 (1) Rename them to all begin afs_xdr_.

 (2) Use u8 instead of uint8_t.

 (3) Mark the structures as __packed so they don't get rearranged by the
     compiler.

 (4) Rename the hdr member of afs_xdr_dir_block to meta.

 (5) Rename the pagehdr member of afs_xdr_dir_block to hdr.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:54:48 +01:00
David Howells
4ea219a839 afs: Split the directory content defs into a header
Split the directory content definitions into a header file so that they can
be used by multiple .c files.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:54:48 +01:00
David Howells
f3ddee8dc4 afs: Fix directory handling
AFS directories are structured blobs that are downloaded just like files
and then parsed by the lookup and readdir code and, as such, are currently
handled in the pagecache like any other file, with the entire directory
content being thrown away each time the directory changes.

However, since the blob is a known structure and since the data version
counter on a directory increases by exactly one for each change committed
to that directory, we can actually edit the directory locally rather than
fetching it from the server after each locally-induced change.

What we can't do, though, is mix data from the server and data from the
client since the server is technically at liberty to rearrange or compress
a directory if it sees fit, provided it updates the data version number
when it does so and breaks the callback (ie. sends a notification).

Further, lookup with lookup-ahead, readdir and, when it arrives, local
editing are likely want to scan the whole of a directory.

So directory handling needs to be improved to maintain the coherency of the
directory blob prior to permitting local directory editing.

To this end:

 (1) If any directory page gets discarded, invalidate and reread the entire
     directory.

 (2) If readpage notes that if when it fetches a single page that the
     version number has changed, the entire directory is flagged for
     invalidation.

 (3) Read as much of the directory in one go as we can.

Note that this removes local caching of directories in fscache for the
moment as we can't pass the pages to fscache_read_or_alloc_pages() since
page->lru is in use by the LRU.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:54:48 +01:00
David Howells
66c7e1d319 afs: Split the dynroot stuff out and give it its own ops tables
Split the AFS dynamic root stuff out of the main directory handling file
and into its own file as they share little in common.

The dynamic root code also gets its own dentry and inode ops tables.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:54:00 +01:00
David Howells
a4ff7401fb afs: Keep track of invalid-before version for dentry coherency
Each afs dentry is tagged with the version that the parent directory was at
last time it was validated and, currently, if this differs, the directory
is scanned and the dentry is refreshed.

However, this leads to an excessive amount of revalidation on directories
that get modified on the client without conflict with another client.  We
know there's no conflict because the parent directory's data version number
got incremented by exactly 1 on any create, mkdir, unlink, etc., therefore
we can trust the current state of the unaffected dentries when we perform a
local directory modification.

Optimise by keeping track of the last version of the parent directory that
was changed outside of the client in the parent directory's vnode and using
that to validate the dentries rather than the current version.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:53:59 +01:00
David Howells
dd9fbcb8e1 afs: Rearrange status mapping
Rearrange the AFSFetchStatus to inode attribute mapping code in a number of
ways:

 (1) Use an XDR structure rather than a series of incremented pointer
     accesses when decoding an AFSFetchStatus object.  This allows
     out-of-order decode.

 (2) Don't store the if_version value but rather just check it and abort if
     it's not something we can handle.

 (3) Store the owner and group in the status record as raw values rather
     than converting them to kuid/kgid.  Do that when they're mapped into
     i_uid/i_gid.

 (4) Validate the type and abort code up front and abort if they're wrong.

 (5) Split the inode attribute setting out into its own function from the
     XDR decode of an AFSFetchStatus object.  This allows it to be called
     from elsewhere too.

 (6) Differentiate changes to data from changes to metadata.

 (7) Use the split-out attribute mapping function from afs_iget().

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:53:59 +01:00
David Howells
0c3a5ac281 afs: Make it possible to get the data version in readpage
Store the data version number indicated by an FS.FetchData op into the read
request structure so that it's accessible by the page reader.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:53:56 +01:00
David Howells
5800db810a afs: Init inode before accessing cache
We no longer parse symlinks when we get the inode to determine if this
symlink is actually a mountpoint as we detect that by examining the mode
instead (symlinks are always 0777 and mountpoints 0644).

Access the cache after mapping the status so that we don't have to manually
set the inode size now.

Note that this may need adjusting if the disconnected operation is
implemented as the file metadata may have to be obtained from the cache.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:53:55 +01:00
David Howells
d55b4da433 afs: Introduce a statistics proc file
Introduce a proc file that displays a bunch of statistics for the AFS
filesystem in the current network namespace.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:53:54 +01:00
David Howells
888b338461 afs: Dump bad status record
Dump an AFS FileStatus record that is detected as invalid.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:53:52 +01:00
David Howells
37ab636880 afs: Implement @cell substitution handling
Implement @cell substitution handling such that if @cell is seen as a name
in a dynamic root mount, then the name of the root cell for that network
namespace will be substituted for @cell during lookup.

The substitution of @cell for the current net namespace is set by writing
the cell name to /proc/fs/afs/rootcell.  The value can be obtained by
reading the file.

For example:

	# mount -t afs none /kafs -o dyn
	# echo grand.central.org >/proc/fs/afs/rootcell
	# ls /kafs/@cell
	archive/  cvs/  doc/  local/  project/  service/  software/  user/  www/
	# cat /proc/fs/afs/rootcell
	grand.central.org

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:18:58 +01:00
David Howells
6f8880d8e6 afs: Implement @sys substitution handling
Implement the AFS feature by which @sys at the end of a pathname component
may be substituted for one of a list of values, typically naming the
operating system.  Up to 16 alternatives may be specified and these are
tried in turn until one works.  Each network namespace has[*] a separate
independent list.

Upon creation of a new network namespace, the list of values is
initialised[*] to a single OpenAFS-compatible string representing arch type
plus "_linux26".  For example, on x86_64, the sysname is "amd64_linux26".

[*] Or will, once network namespace support is finalised in kAFS.

The list may be set by:

	# for i in foo bar linux-x86_64; do echo $i; done >/proc/fs/afs/sysname

for which separate writes to the same fd are amalgamated and applied on
close.  The LF character may be used as a separator to specify multiple
items in the same write() call.

The list may be cleared by:

	# echo >/proc/fs/afs/sysname

and read by:

	# cat /proc/fs/afs/sysname
	foo
	bar
	linux-x86_64

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:12:31 +01:00
David Howells
5cf9dd55a0 afs: Prospectively look up extra files when doing a single lookup
When afs_lookup() is called, prospectively look up the next 50 uncached
fids also from that same directory and cache the results, rather than just
looking up the one file requested.

This allows us to use the FS.InlineBulkStatus RPC op to increase efficiency
by fetching up to 50 file statuses at a time.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:12:31 +01:00
David Howells
17814aef57 afs: Don't over-increment the cell usage count when pinning it
AFS cells that are added or set as the workstation cell through /proc are
pinned against removal by setting the AFS_CELL_FL_NO_GC flag on them and
taking a ref.  The ref should be only taken if the flag wasn't already set.

Fix this by making it conditional.

Without this an assertion failure will occur during module removal
indicating that the refcount is too elevated.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:12:31 +01:00
David Howells
fe342cf77b afs: Fix checker warnings
Fix warnings raised by checker, including:

 (*) Warnings raised by unequal comparison for the purposes of sorting,
     where the endianness doesn't matter:

fs/afs/addr_list.c:246:21: warning: restricted __be16 degrades to integer
fs/afs/addr_list.c:246:30: warning: restricted __be16 degrades to integer
fs/afs/addr_list.c:248:21: warning: restricted __be32 degrades to integer
fs/afs/addr_list.c:248:49: warning: restricted __be32 degrades to integer
fs/afs/addr_list.c:283:21: warning: restricted __be16 degrades to integer
fs/afs/addr_list.c:283:30: warning: restricted __be16 degrades to integer

 (*) afs_set_cb_interest() is not actually used and can be removed.

 (*) afs_cell_gc_delay() should be provided with a sysctl.

 (*) afs_cell_destroy() needs to use rcu_access_pointer() to read
     cell->vl_addrs.

 (*) afs_init_fs_cursor() should be static.

 (*) struct afs_vnode::permit_cache needs to be marked __rcu.

 (*) afs_server_rcu() needs to use rcu_access_pointer().

 (*) afs_destroy_server() should use rcu_access_pointer() on
     server->addresses as the server object is no longer accessible.

 (*) afs_find_server() casts __be16/__be32 values to int in order to
     directly compare them for the purpose of finding a match in a list,
     but is should also annotate the cast with __force to avoid checker
     warnings.

 (*) afs_check_permit() accesses vnode->permit_cache outside of the RCU
     readlock, though it doesn't then access the value; the extraneous
     access is deleted.

False positives:

 (*) Conditional locking around the code in xdr_decode_AFSFetchStatus.  This
     can be dealt with in a separate patch.

fs/afs/fsclient.c:148:9: warning: context imbalance in 'xdr_decode_AFSFetchStatus' - different lock contexts for basic block

 (*) Incorrect handling of seq-retry lock context balance:

fs/afs/inode.c:455:38: warning: context imbalance in 'afs_getattr' - different
lock contexts for basic block
fs/afs/server.c:52:17: warning: context imbalance in 'afs_find_server' - different lock contexts for basic block
fs/afs/server.c:128:17: warning: context imbalance in 'afs_find_server_by_uuid' - different lock contexts for basic block

Errors:

 (*) afs_lookup_cell_rcu() needs to break out of the seq-retry loop, not go
     round again if it successfully found the workstation cell.

 (*) Fix UUID decode in afs_deliver_cb_probe_uuid().

 (*) afs_cache_permit() has a missing rcu_read_unlock() before one of the
     jumps to the someone_else_changed_it label.  Move the unlock to after
     the label.

 (*) afs_vl_get_addrs_u() is using ntohl() rather than htonl() when
     encoding to XDR.

 (*) afs_deliver_yfsvl_get_endpoints() is using htonl() rather than ntohl()
     when decoding from XDR.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:12:31 +01:00
David Howells
ee1235a9a0 fscache: Pass object size in rather than calling back for it
Pass the object size in to fscache_acquire_cookie() and
fscache_write_page() rather than the netfs providing a callback by which it
can be received.  This makes it easier to update the size of the object
when a new page is written that extends the object.

The current object size is also passed by fscache to the check_aux
function, obviating the need to store it in the aux data.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Anna Schumaker <anna.schumaker@netapp.com>
Tested-by: Steve Dickson <steved@redhat.com>
2018-04-06 14:05:14 +01:00
David Howells
402cb8dda9 fscache: Attach the index key and aux data to the cookie
Attach copies of the index key and auxiliary data to the fscache cookie so
that:

 (1) The callbacks to the netfs for this stuff can be eliminated.  This
     can simplify things in the cache as the information is still
     available, even after the cache has relinquished the cookie.

 (2) Simplifies the locking requirements of accessing the information as we
     don't have to worry about the netfs object going away on us.

 (3) The cache can do lazy updating of the coherency information on disk.
     As long as the cache is flushed before reboot/poweroff, there's no
     need to update the coherency info on disk every time it changes.

 (4) Cookies can be hashed or put in a tree as the index key is easily
     available.  This allows:

     (a) Checks for duplicate cookies can be made at the top fscache layer
     	 rather than down in the bowels of the cache backend.

     (b) Caching can be added to a netfs object that has a cookie if the
     	 cache is brought online after the netfs object is allocated.

A certain amount of space is made in the cookie for inline copies of the
data, but if it won't fit there, extra memory will be allocated for it.

The downside of this is that live cache operation requires more memory.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Anna Schumaker <anna.schumaker@netapp.com>
Tested-by: Steve Dickson <steved@redhat.com>
2018-04-04 13:41:28 +01:00