44370 Commits

Author SHA1 Message Date
Jiri Kosina
9938b04472 Merge branch 'master' into for-next
Sync with Linus' tree so that patches against newer codebase can be applied.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-04-18 11:18:55 +02:00
Masanari Iida
bd7ced9881 Doc: treewide : Fix typos in DocBook/filesystem.xml
This patch fix spelling typos found in DocBook/filesystem.xml.
It is because the file was generated from comments in code,
I have to fix the comments in codes, instead of xml file.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-04-18 11:13:05 +02:00
Linus Torvalds
e1e22b27ec "driver core" fixes for 4.6-rc4
Here are 3 small fixes 4.6-rc4.  Two fix up some lz4 issues with big
 endian systems, and the remaining one resolves a minor debugfs issue
 that was reported.
 
 All have been in linux-next with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlcS3ucACgkQMUfUDdst+ynw1QCbBGgbY7Xt08whNFcAP81z1Q5X
 fmsAn1fyrhfsxe+JnybzswsTOjFw99Xd
 =YX6h
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-4.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull misc fixes from Greg KH:
 "Here are three small fixes for 4.6-rc4.

  Two fix up some lz4 issues with big endian systems, and the remaining
  one resolves a minor debugfs issue that was reported.

  All have been in linux-next with no reported issues"

* tag 'driver-core-4.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  lib: lz4: cleanup unaligned access efficiency detection
  lib: lz4: fixed zram with lz4 on big endian machines
  debugfs: Make automount point inodes permanently empty
2016-04-16 20:53:50 -07:00
Linus Torvalds
dfe70581c1 for-linus-4.6-rc4
These patches fix f2fs and fscrypto based on -rc3 bug fixes in ext4 crypto,
 which have not yet been fully propagated as follows.
  - use of dget_parent and file_dentry to avoid crashes
  - disallow RCU-mode lookup in d_invalidate
  - disallow -ENOMEM in the core data encryption path
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXDbkhAAoJEEAUqH6CSFDS6yMP/1L+1PcmTJKloe0wCft57fL0
 hXRXI57DeNwCxc9fKmaBnAn9uv96qc7fWq1oMqT0oYPTf8qGUaIc0bALxx/IFCwh
 Ul1QOzlGJmomPq4kGleOEPmDGBGUwDYgH55rEadhmevbOMX5IGQOSDGu9qxZ+LMA
 orYTYTHX5tjeR1UAxzjMkwa0WRyYJGRsqqycgIl8f/kIg06GIpxOx1IIuFQOBv1N
 BJlkKbbiX/snWKXXVCNlFzVAZOWjpgzfvCGB0Fge4HZlMZL876jKIj+4dVmQ5xiI
 K6i5NXk1U13hhQEFzK4McicHBwlMnMxeMbdR8wm1vQ/4UfQFZoQAeClAxz5/ZyPr
 lHi+VamyTbMfRwijp49hNQ8SKjMKPd70yFd2d2csQAwVra2t4fmXqieh6AEppkJb
 EDeFAK8Y+PGis6XVnAJ1wkXW7OSwzfJUVAmj3oFsZ7Onny+mm++f+i7a7I3RKUib
 SH23Imv1OQaoGcAFVSCYe/JP9dcfKUwzSFFYOiQneqdNTNN8rzoPYYhQ4hNCKv6/
 zHSEI2EcpGipmSTh4dROPJvNf42QG8Sw4lEuyGsXy251t7wxBrc7e43BzXtjyKsl
 /2d5hgAQbR9HeU3Oz41Dq/LEjReEAcdJDKMETP9/qrzs7qDJgtI0XoUkVSn3NyOR
 i3JSEix2JljR97DzUIW/
 =ML1+
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs/fscrypto fixes from Jaegeuk Kim:
 "In addition to f2fs/fscrypto fixes, I've added one patch which
  prevents RCU mode lookup in d_revalidate, as Al mentioned.

  These patches fix f2fs and fscrypto based on -rc3 bug fixes in ext4
  crypto, which have not yet been fully propagated as follows.

   - use of dget_parent and file_dentry to avoid crashes
   - disallow RCU-mode lookup in d_invalidate
   - disallow -ENOMEM in the core data encryption path"

* tag 'for-linus-4.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
  ext4/fscrypto: avoid RCU lookup in d_revalidate
  fscrypto: don't let data integrity writebacks fail with ENOMEM
  f2fs: use dget_parent and file_dentry in f2fs_file_open
  fscrypto: use dget_parent() in fscrypt_d_revalidate()
2016-04-14 18:22:42 -07:00
Linus Torvalds
34dbbcdbf6 Make file credentials available to the seqfile interfaces
A lot of seqfile users seem to be using things like %pK that uses the
credentials of the current process, but that is actually completely
wrong for filesystem interfaces.

The unix semantics for permission checking files is to check permissions
at _open_ time, not at read or write time, and that is not just a small
detail: passing off stdin/stdout/stderr to a suid application and making
the actual IO happen in privileged context is a classic exploit
technique.

So if we want to be able to look at permissions at read time, we need to
use the file open credentials, not the current ones.  Normal file
accesses can just use "f_cred" (or any of the helper functions that do
that, like file_ns_capable()), but the seqfile interfaces do not have
any such options.

It turns out that seq_file _does_ save away the user_ns information of
the file, though.  Since user_ns is just part of the full credential
information, replace that special case with saving off the cred pointer
instead, and suddenly seq_file has all the permission information it
needs.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-14 12:56:09 -07:00
Bob Peterson
e97321fa09 GFS2: Don't dereference inode in gfs2_inode_lookup until it's valid
Function gfs2_inode_lookup was dereferencing the inode, and after,
it checks for the value being NULL. We need to check that first.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-04-14 09:52:50 -04:00
Hannes Frederic Sowa
fafc4e1ea1 sock: tigthen lockdep checks for sock_owned_by_user
sock_owned_by_user should not be used without socket lock held. It seems
to be a common practice to check .owned before lock reclassification, so
provide a little help to abstract this check away.

Cc: linux-cifs@vger.kernel.org
Cc: linux-bluetooth@vger.kernel.org
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-13 22:37:20 -04:00
Jaegeuk Kim
03a8bb0e53 ext4/fscrypto: avoid RCU lookup in d_revalidate
As Al pointed, d_revalidate should return RCU lookup before using d_inode.
This was originally introduced by:
commit 34286d666230 ("fs: rcu-walk aware d_revalidate method").

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: stable <stable@vger.kernel.org>
2016-04-12 20:01:35 -07:00
Seth Forshee
87243deb88 debugfs: Make automount point inodes permanently empty
Starting with 4.1 the tracing subsystem has its own filesystem
which is automounted in the tracing subdirectory of debugfs.
Prior to this debugfs could be bind mounted in a cloned mount
namespace, but if tracefs has been mounted under debugfs this
now fails because there is a locked child mount. This creates
a regression for container software which bind mounts debugfs
to satisfy the assumption of some userspace software.

In other pseudo filesystems such as proc and sysfs we're already
creating mountpoints like this in such a way that no dirents can
be created in the directories, allowing them to be exceptions to
some MNT_LOCKED tests. In fact we're already do this for the
tracefs mountpoint in sysfs.

Do the same in debugfs_create_automount(), since the intention
here is clearly to create a mountpoint. This fixes the regression,
as locked child mounts on permanently empty directories do not
cause a bind mount to fail.

Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 15:01:53 -07:00
Nicolai Stange
c4a74f63df debugfs: unproxify files created through debugfs_create_u32_array()
The struct file_operations u32_array_fops associated with files created
through debugfs_create_u32_array() has been lifetime aware already:
everything needed for subsequent operation is copied to a ->f_private
buffer at file opening time in u32_array_open(). Now, ->open() is always
protected against file removal issues by the debugfs core.

There is no need for the debugfs core to wrap the u32_array_fops
with a file lifetime managing proxy.

Make debugfs_create_u32_array() create its files in non-proxying operation
mode by means of debugfs_create_file_unsafe().

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 14:14:21 -07:00
Nicolai Stange
83b711cbf4 debugfs: unproxify files created through debugfs_create_blob()
Currently, the struct file_operations fops_blob associated with files
created through the debugfs_create_blob() helpers are not file
lifetime aware.

Thus, a lifetime managing proxy is created around fops_blob each time such
a file is opened which is an unnecessary waste of resources.

Implement file lifetime management for the fops_bool file_operations.
Namely, make read_file_blob() safe gainst file removals by means of
debugfs_use_file_start() and debugfs_use_file_finish().

Make debugfs_create_blob() create its files in non-proxying operation mode
by means of debugfs_create_file_unsafe().

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 14:14:21 -07:00
Nicolai Stange
4d45f7974c debugfs: unproxify files created through debugfs_create_bool()
Currently, the struct file_operations fops_bool associated with files
created through the debugfs_create_bool() helpers are not file
lifetime aware.

Thus, a lifetime managing proxy is created around fops_bool each time such
a file is opened which is an unnecessary waste of resources.

Implement file lifetime management for the fops_bool file_operations.
Namely, make debugfs_read_file_bool() and debugfs_write_file_bool() safe
against file removals by means of debugfs_use_file_start() and
debugfs_use_file_finish().

Make debugfs_create_bool() create its files in non-proxying operation mode
through debugfs_create_mode_unsafe().

Finally, purge debugfs_create_mode() as debugfs_create_bool() had been its
last user.

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 14:14:21 -07:00
Nicolai Stange
4909f16810 debugfs: unproxify integer attribute files
Currently, the struct file_operations associated with the integer attribute
style files created through the debugfs_create_*() helpers are not file
lifetime aware as they are defined by means of DEFINE_SIMPLE_ATTRIBUTE().

Thus, a lifetime managing proxy is created around the original fops each
time such a file is opened which is an unnecessary waste of resources.

Migrate all usages of DEFINE_SIMPLE_ATTRIBUTE() within debugfs itself
to DEFINE_DEBUGFS_ATTRIBUTE() in order to implement file lifetime managing
within the struct file_operations thus defined.

Introduce the debugfs_create_mode_unsafe() helper, analogous to
debugfs_create_mode(), but distinct in that it creates the files in
non-proxying operation mode through debugfs_create_file_unsafe().

Feed all struct file_operations migrated to DEFINE_DEBUGFS_ATTRIBUTE()
into debugfs_create_mode_unsafe() instead of former debugfs_create_mode().

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 14:14:21 -07:00
Nicolai Stange
c646880814 debugfs: add support for self-protecting attribute file fops
In order to protect them against file removal issues, debugfs_create_file()
creates a lifetime managing proxy around each struct file_operations
handed in.

In cases where this struct file_operations is able to manage file lifetime
by itself already, the proxy created by debugfs is a waste of resources.

The most common class of struct file_operations given to debugfs are those
defined by means of the DEFINE_SIMPLE_ATTRIBUTE() macro.

Introduce a DEFINE_DEBUGFS_ATTRIBUTE() macro to allow any
struct file_operations of this class to be easily made file lifetime aware
and thus, to be operated unproxied.

Specifically, introduce debugfs_attr_read() and debugfs_attr_write()
which wrap simple_attr_read() and simple_attr_write() under the protection
of a debugfs_use_file_start()/debugfs_use_file_finish() pair.

Make DEFINE_DEBUGFS_ATTRIBUTE() set the defined struct file_operations'
->read() and ->write() members to these wrappers.

Export debugfs_create_file_unsafe() in order to allow debugfs users to
create their files in non-proxying operation mode.

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 14:14:21 -07:00
Nicolai Stange
49d200deaa debugfs: prevent access to removed files' private data
Upon return of debugfs_remove()/debugfs_remove_recursive(), it might
still be attempted to access associated private file data through
previously opened struct file objects. If that data has been freed by
the caller of debugfs_remove*() in the meanwhile, the reading/writing
process would either encounter a fault or, if the memory address in
question has been reassigned again, unrelated data structures could get
overwritten.

However, since debugfs files are seldomly removed, usually from module
exit handlers only, the impact is very low.

Currently, there are ~1000 call sites of debugfs_create_file() spread
throughout the whole tree and touching all of those struct file_operations
in order to make them file removal aware by means of checking the result of
debugfs_use_file_start() from within their methods is unfeasible.

Instead, wrap the struct file_operations by a lifetime managing proxy at
file open:
- In debugfs_create_file(), the original fops handed in has got stashed
  away in ->d_fsdata already.
- In debugfs_create_file(), install a proxy file_operations factory,
  debugfs_full_proxy_file_operations, at ->i_fop.

This proxy factory has got an ->open() method only. It carries out some
lifetime checks and if successful, dynamically allocates and sets up a new
struct file_operations proxy at ->f_op. Afterwards, it forwards to the
->open() of the original struct file_operations in ->d_fsdata, if any.

The dynamically set up proxy at ->f_op has got a lifetime managing wrapper
set for each of the methods defined in the original struct file_operations
in ->d_fsdata.

Its ->release()er frees the proxy again and forwards to the original
->release(), if any.

In order not to mislead the VFS layer, it is strictly necessary to leave
those fields blank in the proxy that have been NULL in the original
struct file_operations also, i.e. aren't supported. This is why there is a
need for dynamically allocated proxies. The choice made not to allocate a
proxy instance for every dentry at file creation, but for every
struct file object instantiated thereof is justified by the expected usage
pattern of debugfs, namely that in general very few files get opened more
than once at a time.

The wrapper methods set in the struct file_operations implement lifetime
managing by means of the SRCU protection facilities already in place for
debugfs:
They set up a SRCU read side critical section and check whether the dentry
is still alive by means of debugfs_use_file_start(). If so, they forward
the call to the original struct file_operation stored in ->d_fsdata, still
under the protection of the SRCU read side critical section.
This SRCU read side critical section prevents any pending debugfs_remove()
and friends to return to their callers. Since a file's private data must
only be freed after the return of debugfs_remove(), the ongoing proxied
call is guarded against any file removal race.

If, on the other hand, the initial call to debugfs_use_file_start() detects
that the dentry is dead, the wrapper simply returns -EIO and does not
forward the call. Note that the ->poll() wrapper is special in that its
signature does not allow for the return of arbitrary -EXXX values and thus,
POLLHUP is returned here.

In order not to pollute debugfs with wrapper definitions that aren't ever
needed, I chose not to define a wrapper for every struct file_operations
method possible. Instead, a wrapper is defined only for the subset of
methods which are actually set by any debugfs users.
Currently, these are:

  ->llseek()
  ->read()
  ->write()
  ->unlocked_ioctl()
  ->poll()

The ->release() wrapper is special in that it does not protect the original
->release() in any way from dead files in order not to leak resources.
Thus, any ->release() handed to debugfs must implement file lifetime
management manually, if needed.
For only 33 out of a total of 434 releasers handed in to debugfs, it could
not be verified immediately whether they access data structures that might
have been freed upon a debugfs_remove() return in the meanwhile.

Export debugfs_use_file_start() and debugfs_use_file_finish() in order to
allow any ->release() to manually implement file lifetime management.

For a set of common cases of struct file_operations implemented by the
debugfs_core itself, future patches will incorporate file lifetime
management directly within those in order to allow for their unproxied
operation. Rename the original, non-proxying "debugfs_create_file()" to
"debugfs_create_file_unsafe()" and keep it for future internal use by
debugfs itself. Factor out code common to both into the new
__debugfs_create_file().

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 14:14:21 -07:00
Nicolai Stange
9fd4dcece4 debugfs: prevent access to possibly dead file_operations at file open
Nothing prevents a dentry found by path lookup before a return of
__debugfs_remove() to actually get opened after that return. Now, after
the return of __debugfs_remove(), there are no guarantees whatsoever
regarding the memory the corresponding inode's file_operations object
had been kept in.

Since __debugfs_remove() is seldomly invoked, usually from module exit
handlers only, the race is hard to trigger and the impact is very low.

A discussion of the problem outlined above as well as a suggested
solution can be found in the (sub-)thread rooted at

  http://lkml.kernel.org/g/20130401203445.GA20862@ZenIV.linux.org.uk
  ("Yet another pipe related oops.")

Basically, Greg KH suggests to introduce an intermediate fops and
Al Viro points out that a pointer to the original ones may be stored in
->d_fsdata.

Follow this line of reasoning:
- Add SRCU as a reverse dependency of DEBUG_FS.
- Introduce a srcu_struct object for the debugfs subsystem.
- In debugfs_create_file(), store a pointer to the original
  file_operations object in ->d_fsdata.
- Make debugfs_remove() and debugfs_remove_recursive() wait for a
  SRCU grace period after the dentry has been delete()'d and before they
  return to their callers.
- Introduce an intermediate file_operations object named
  "debugfs_open_proxy_file_operations". It's ->open() functions checks,
  under the protection of a SRCU read lock, whether the dentry is still
  alive, i.e. has not been d_delete()'d and if so, tries to acquire a
  reference on the owning module.
  On success, it sets the file object's ->f_op to the original
  file_operations and forwards the ongoing open() call to the original
  ->open().
- For clarity, rename the former debugfs_file_operations to
  debugfs_noop_file_operations -- they are in no way canonical.

The choice of SRCU over "normal" RCU is justified by the fact, that the
former may also be used to protect ->i_private data from going away
during the execution of a file's readers and writers which may (and do)
sleep.

Finally, introduce the fs/debugfs/internal.h header containing some
declarations internal to the debugfs implementation.

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 14:14:21 -07:00
Jaegeuk Kim
b32e4482aa fscrypto: don't let data integrity writebacks fail with ENOMEM
This patch fixes the issue introduced by the ext4 crypto fix in a same manner.
For F2FS, however, we flush the pending IOs and wait for a while to acquire free
memory.

Fixes: c9af28fdd4492 ("ext4 crypto: don't let data integrity writebacks fail with ENOMEM")
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-12 10:25:30 -07:00
Jaegeuk Kim
33b1395124 f2fs: use dget_parent and file_dentry in f2fs_file_open
This patch synced with the below two ext4 crypto fixes together.

In 4.6-rc1, f2fs newly introduced accessing f_path.dentry which crashes
overlayfs. To fix, now we need to use file_dentry() to access that field.

Fixes: c0a37d487884 ("ext4: use file_dentry()")
Fixes: 9dd78d8c9a7b ("ext4: use dget_parent() in ext4_file_open()")
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-12 10:24:22 -07:00
Jaegeuk Kim
d7d7535289 fscrypto: use dget_parent() in fscrypt_d_revalidate()
This patch updates fscrypto along with the below ext4 crypto change.

Fixes: 3d43bcfef5f0 ("ext4 crypto: use dget_parent() in ext4_d_revalidate()")
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-12 10:24:04 -07:00
Denys Vlasenko
a527b38e14 GFS2: fs/gfs2/glock.c: Deinline do_error, save 1856 bytes
This function compiles to 522 bytes of machine code.

Error paths are not very time critical.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-04-12 12:39:12 -04:00
David Howells
5ac7eace2d KEYS: Add a facility to restrict new links into a keyring
Add a facility whereby proposed new links to be added to a keyring can be
vetted, permitting them to be rejected if necessary.  This can be used to
block public keys from which the signature cannot be verified or for which
the signature verification fails.  It could also be used to provide
blacklisting.

This affects operations like add_key(), KEYCTL_LINK and KEYCTL_INSTANTIATE.

To this end:

 (1) A function pointer is added to the key struct that, if set, points to
     the vetting function.  This is called as:

	int (*restrict_link)(struct key *keyring,
			     const struct key_type *key_type,
			     unsigned long key_flags,
			     const union key_payload *key_payload),

     where 'keyring' will be the keyring being added to, key_type and
     key_payload will describe the key being added and key_flags[*] can be
     AND'ed with KEY_FLAG_TRUSTED.

     [*] This parameter will be removed in a later patch when
     	 KEY_FLAG_TRUSTED is removed.

     The function should return 0 to allow the link to take place or an
     error (typically -ENOKEY, -ENOPKG or -EKEYREJECTED) to reject the
     link.

     The pointer should not be set directly, but rather should be set
     through keyring_alloc().

     Note that if called during add_key(), preparse is called before this
     method, but a key isn't actually allocated until after this function
     is called.

 (2) KEY_ALLOC_BYPASS_RESTRICTION is added.  This can be passed to
     key_create_or_update() or key_instantiate_and_link() to bypass the
     restriction check.

 (3) KEY_FLAG_TRUSTED_ONLY is removed.  The entire contents of a keyring
     with this restriction emplaced can be considered 'trustworthy' by
     virtue of being in the keyring when that keyring is consulted.

 (4) key_alloc() and keyring_alloc() take an extra argument that will be
     used to set restrict_link in the new key.  This ensures that the
     pointer is set before the key is published, thus preventing a window
     of unrestrictedness.  Normally this argument will be NULL.

 (5) As a temporary affair, keyring_restrict_trusted_only() is added.  It
     should be passed to keyring_alloc() as the extra argument instead of
     setting KEY_FLAG_TRUSTED_ONLY on a keyring.  This will be replaced in
     a later patch with functions that look in the appropriate places for
     authoritative keys.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2016-04-11 22:37:37 +01:00
David Howells
dc44b3a09a rxrpc: Differentiate local and remote abort codes in structs
In the rxrpc_connection and rxrpc_call structs, there's one field to hold
the abort code, no matter whether that value was generated locally to be
sent or was received from the peer via an abort packet.

Split the abort code fields in two for cleanliness sake and add an error
field to hold the Linux error number to the rxrpc_call struct too
(sometimes this is generated in a context where we can't return it to
userspace directly).

Furthermore, add a skb mark to indicate a packet that caused a local abort
to be generated so that recvmsg() can pick up the correct abort code.  A
future addition will need to be to indicate to userspace the difference
between aborts via a control message.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-11 15:34:40 -04:00
David Howells
2f02f7aea7 afs: Wait for outstanding async calls before closing rxrpc socket
The afs filesystem needs to wait for any outstanding asynchronous calls
(such as FS.GiveUpCallBacks cleaning up the callbacks lodged with a server)
to complete before closing the AF_RXRPC socket when unloading the module.

This may occur if the module is removed too quickly after unmounting all
filesystems.  This will produce an error report that looks like:

	AFS: Assertion failed
	1 == 0 is false
	0x1 == 0x0 is false
	------------[ cut here ]------------
	kernel BUG at ../fs/afs/rxrpc.c:135!
	...
	RIP: 0010:[<ffffffffa004111c>] afs_close_socket+0xec/0x107 [kafs]
	...
	Call Trace:
	 [<ffffffffa004a160>] afs_exit+0x1f/0x57 [kafs]
	 [<ffffffff810c30a0>] SyS_delete_module+0xec/0x17d
	 [<ffffffff81610417>] entry_SYSCALL_64_fastpath+0x12/0x6b

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-11 15:34:40 -04:00
Al Viro
ce23e64013 ->getxattr(): pass dentry and inode as separate arguments
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-11 00:48:00 -04:00
Al Viro
b296821a7c xattr_handler: pass dentry and inode as separate arguments of ->get()
... and do not assume they are already attached to each other

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-10 20:48:24 -04:00
Linus Torvalds
9f2394c9be Revert "ext4: allow readdir()'s of large empty directories to be interrupted"
This reverts commit 1028b55bafb7611dda1d8fed2aeca16a436b7dff.

It's broken: it makes ext4 return an error at an invalid point, causing
the readdir wrappers to write the the position of the last successful
directory entry into the position field, which means that the next
readdir will now return that last successful entry _again_.

You can only return fatal errors (that terminate the readdir directory
walk) from within the filesystem readdir functions, the "normal" errors
(that happen when the readdir buffer fills up, for example) happen in
the iterorator where we know the position of the actual failing entry.

I do have a very different patch that does the "signal_pending()"
handling inside the iterator function where it is allowable, but while
that one passes all the sanity checks, I screwed up something like four
times while emailing it out, so I'm not going to commit it today.

So my track record is not good enough, and the stars will have to align
better before that one gets committed.  And it would be good to get some
review too, of course, since celestial alignments are always an iffy
debugging model.

IOW, let's just revert the commit that caused the problem for now.

Reported-by: Greg Thelen <gthelen@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-10 16:52:24 -07:00
Al Viro
79a628d14e reiserfs: switch to generic_{get,set,remove}xattr()
reiserfs_xattr_[sg]et() will fail with -EOPNOTSUPP for V1 inodes anyway,
and all reiserfs instances of ->[sg]et() call it and so does ->set_acl().

Checks for name length in the instances had been bogus; they should've
been "bugger off if it's _exactly_ the prefix" (as generic would
do on its own) and not "bugger off if it's shorter than the prefix" -
that can't happen.

xattr_full_name() is needed to adjust for the fact that generic instances
will skip the prefix in the name passed to ->[gs]et(); reiserfs homegrown
analogues didn't.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-10 19:31:09 -04:00
Al Viro
5fdccfef48 cifs: kill more bogus checks in ->...xattr() methods
none of that stuff can ever be called for NULL or negative
dentry.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-10 17:12:03 -04:00
Al Viro
fc64005c93 don't bother with ->d_inode->i_sb - it's always equal to ->d_sb
... and neither can ever be NULL

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-10 17:11:51 -04:00
David S. Miller
ae95d71261 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-04-09 17:41:41 -04:00
Linus Torvalds
839a3f7657 Merge branch 'for-linus-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "These are bug fixes, including a really old fsync bug, and a few trace
  points to help us track down problems in the quota code"

* 'for-linus-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix file/data loss caused by fsync after rename and new inode
  btrfs: Reset IO error counters before start of device replacing
  btrfs: Add qgroup tracing
  Btrfs: don't use src fd for printk
  btrfs: fallback to vmalloc in btrfs_compare_tree
  btrfs: handle non-fatal errors in btrfs_qgroup_inherit()
  btrfs: Output more info for enospc_debug mount option
  Btrfs: fix invalid reference in replace_path
  Btrfs: Improve FL_KEEP_SIZE handling in fallocate
2016-04-09 10:41:34 -07:00
Linus Torvalds
6759212640 Orangefs: cleanups and a strncpy vulnerability fix.
Cleanups:
 
   - remove an unused variable from orangefs_readdir.
   - clean up printk wrapper used for ofs "gossip" debugging.
   - clean up truncate ctime and mtime setting in inode.c
   - remove a useless null check found by coccinelle.
   - optimize some memcpy/memset boilerplate code.
   - remove some useless sanity checks from xattr.c
 
 Fix:
 
   - fix a potential strncpy vulnerability.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJXCAvtAAoJEM9EDqnrzg2+U3kP/RcypLnTdnOLFI7GtTr7erpc
 4ys4UE3CdvOdIkDNgTeW+qTvy/b7qo3JBdaRqMAAmnWkxWn4cGXDGLhmfKD/6a3Q
 LL9vKxiczYN8gAYgnxoF5rNwoVScFVD7PospKkWMPedkRu00K0dqiJvomPXsvwM7
 MEl4ueewq3KMrXHu1GYSdoZhU8BWtku789Oa5leXRqEO2pRit0amX/qxB9KG6Urz
 Ebz0yCGMPP61nTN9A/Q7IM2imcJO5F0wc5uaSaWimjHeHb040ytMVqFa4GOh4Pky
 oQjjw5OYSsX/O7Hh66wKQ68YqYb762OKE1y5v8K43vRhWQtnQo9pdKpFVEK0An8Z
 wOSaUFd5mhPD2KENcKB/kSiH3eOyGdyD2QamptLJ6Opl/6UdPQMt++1EkuSuYEdW
 wnCFsJM5yam3Ot+REnQiYAVjLsDZ0XhPfNIuAp+d4LIV32JGTQPOBurKECwsJbj5
 fK0lsBNk7b8qgBwG41JjV7XyGlf5HWT2TwpuC2S5PysVXcxsvdfR3oVeUnzdL6aB
 fv0mp7bBqg2lNLKoeEYlxb+vbSiLutOCIPWSNptYjbCWa/q/bqnZMhXOUQjS5XRj
 HIG/91XeRUxOZk+y1gs5N5F8pCT9mfWRTvSCX5ex5x17rUFto/jhn2mQLAhxSyPD
 /AGQgNGz2VmmVBlaCngI
 =ZRlK
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.6-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux

Pull orangefs fixes from Mike Marshall:
 "Orangefs cleanups and a strncpy vulnerability fix.

  Cleanups:
   - remove an unused variable from orangefs_readdir.
   - clean up printk wrapper used for ofs "gossip" debugging.
   - clean up truncate ctime and mtime setting in inode.c
   - remove a useless null check found by coccinelle.
   - optimize some memcpy/memset boilerplate code.
   - remove some useless sanity checks from xattr.c

  Fix:
   - fix a potential strncpy vulnerability"

* tag 'for-linus-4.6-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
  orangefs: remove unused variable
  orangefs: Add KERN_<LEVEL> to gossip_<level> macros
  orangefs: strncpy -> strscpy
  orangefs: clean up truncate ctime and mtime setting
  Orangefs: fix ifnullfree.cocci warnings
  Orangefs: optimize boilerplate code.
  Orangefs: xattr.c cleanup
2016-04-09 10:33:58 -07:00
Martin Brandenburg
e56f498142 orangefs: remove unused variable
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-04-08 15:50:44 -04:00
Joe Perches
1917a69328 orangefs: Add KERN_<LEVEL> to gossip_<level> macros
Emit the logging messages at the appropriate levels.

Miscellanea:

o Change format to fmt
o Use the more common ##__VA_ARGS__

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-04-08 14:10:45 -04:00
Martin Brandenburg
2eacea74cc orangefs: strncpy -> strscpy
It would have been possible for a rogue client-core to send in a symlink
target which is not NUL terminated. This returns EIO if the client-core
gives us corrupt data.

Leave debugfs and superblock code as is for now.

Other dcache.c and namei.c strncpy instances are safe because
ORANGEFS_NAME_MAX = NAME_MAX + 1; there is always enough space for a
name plus a NUL byte.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-04-08 14:10:34 -04:00
Martin Brandenburg
f83140c146 orangefs: clean up truncate ctime and mtime setting
The ctime and mtime are always updated on a successful ftruncate and
only updated on a successful truncate where the size changed.

We handle the ``if the size changed'' bit.

This matches FUSE's behavior.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-04-08 14:10:31 -04:00
kbuild test robot
2fa37fd713 Orangefs: fix ifnullfree.cocci warnings
fs/orangefs/orangefs-debugfs.c:130:2-26: WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values.

 NULL check before some freeing functions is not needed.

 Based on checkpatch warning
 "kfree(NULL) is safe this check is probably not required"
 and kfreeaddr.cocci by Julia Lawall.

Generated by: scripts/coccinelle/free/ifnullfree.cocci

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-04-08 14:08:38 -04:00
Mike Marshall
a9bb3ba81f Orangefs: optimize boilerplate code.
Suggested by David Binderman <dcb314@hotmail.com>
The former can potentially be a performance win over the latter.

memcpy(d, s, len);
memset(d+len, c, size-len);

memset(d, c, size);
memcpy(d, s, len);

Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-04-08 14:08:27 -04:00
Mike Marshall
2d09a2ca6a Orangefs: xattr.c cleanup
1. It is nonsense to test for negative size_t, suggested by
   David Binderman <dcb314@hotmail.com>

2. By the time Orangefs gets called, the vfs has ensured that
   name != NULL, and that buffer and size are sane.

Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-04-08 14:08:27 -04:00
Linus Torvalds
93061f390f These changes contains a fix for overlayfs interacting with some
(badly behaved) dentry code in various file systems.  These have been
 reviewed by Al and the respective file system mtinainers and are going
 through the ext4 tree for convenience.
 
 This also has a few ext4 encryption bug fixes that were discovered in
 Android testing (yes, we will need to get these sync'ed up with the
 fs/crypto code; I'll take care of that).  It also has some bug fixes
 and a change to ignore the legacy quota options to allow for xfstests
 regression testing of ext4's internal quota feature and to be more
 consistent with how xfs handles this case.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJXBn4aAAoJEPL5WVaVDYGjHWgH/2wXnlQnC2ndJhblBWtPzprz
 OQW4dawdnhxqbTEGUqWe942tZivSb/liu/lF+urCGbWsbgz9jNOCmEAg7JPwlccY
 mjzwDvtVq5U4d2rP+JDWXLy/Gi8XgUclhbQDWFVIIIea6fS7IuFWqoVBR+HPMhra
 9tEygpiy5lNtJA/hqq3/z9x0AywAjwrYR491CuWreo2Uu1aeKg0YZsiDsuAcGioN
 Waa2TgbC/ZZyJuJcPBP8If+VOFAa0ea3F+C/o7Tb9bOqwuz0qSTcaMRgt6eQ2KUt
 P4b9Ecp1XLjJTC7IYOknUOScY3lCyREx/Xya9oGZfFNTSHzbOlLBoplCr3aUpYQ=
 =/HHR
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 bugfixes from Ted Ts'o:
 "These changes contains a fix for overlayfs interacting with some
  (badly behaved) dentry code in various file systems.  These have been
  reviewed by Al and the respective file system mtinainers and are going
  through the ext4 tree for convenience.

  This also has a few ext4 encryption bug fixes that were discovered in
  Android testing (yes, we will need to get these sync'ed up with the
  fs/crypto code; I'll take care of that).  It also has some bug fixes
  and a change to ignore the legacy quota options to allow for xfstests
  regression testing of ext4's internal quota feature and to be more
  consistent with how xfs handles this case"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: ignore quota mount options if the quota feature is enabled
  ext4 crypto: fix some error handling
  ext4: avoid calling dquot_get_next_id() if quota is not enabled
  ext4: retry block allocation for failed DIO and DAX writes
  ext4: add lockdep annotations for i_data_sem
  ext4: allow readdir()'s of large empty directories to be interrupted
  btrfs: fix crash/invalid memory access on fsync when using overlayfs
  ext4 crypto: use dget_parent() in ext4_d_revalidate()
  ext4: use file_dentry()
  ext4: use dget_parent() in ext4_file_open()
  nfs: use file_dentry()
  fs: add file_dentry()
  ext4 crypto: don't let data integrity writebacks fail with ENOMEM
  ext4: check if in-inode xattr is corrupted in ext4_expand_extra_isize_ea()
2016-04-07 17:22:20 -07:00
Filipe Manana
56f23fdbb6 Btrfs: fix file/data loss caused by fsync after rename and new inode
If we rename an inode A (be it a file or a directory), create a new
inode B with the old name of inode A and under the same parent directory,
fsync inode B and then power fail, at log tree replay time we end up
removing inode A completely. If inode A is a directory then all its files
are gone too.

Example scenarios where this happens:
This is reproducible with the following steps, taken from a couple of
test cases written for fstests which are going to be submitted upstream
soon:

   # Scenario 1

   mkfs.btrfs -f /dev/sdc
   mount /dev/sdc /mnt
   mkdir -p /mnt/a/x
   echo "hello" > /mnt/a/x/foo
   echo "world" > /mnt/a/x/bar
   sync
   mv /mnt/a/x /mnt/a/y
   mkdir /mnt/a/x
   xfs_io -c fsync /mnt/a/x
   <power failure happens>

   The next time the fs is mounted, log tree replay happens and
   the directory "y" does not exist nor do the files "foo" and
   "bar" exist anywhere (neither in "y" nor in "x", nor the root
   nor anywhere).

   # Scenario 2

   mkfs.btrfs -f /dev/sdc
   mount /dev/sdc /mnt
   mkdir /mnt/a
   echo "hello" > /mnt/a/foo
   sync
   mv /mnt/a/foo /mnt/a/bar
   echo "world" > /mnt/a/foo
   xfs_io -c fsync /mnt/a/foo
   <power failure happens>

   The next time the fs is mounted, log tree replay happens and the
   file "bar" does not exists anymore. A file with the name "foo"
   exists and it matches the second file we created.

Another related problem that does not involve file/data loss is when a
new inode is created with the name of a deleted snapshot and we fsync it:

   mkfs.btrfs -f /dev/sdc
   mount /dev/sdc /mnt
   mkdir /mnt/testdir
   btrfs subvolume snapshot /mnt /mnt/testdir/snap
   btrfs subvolume delete /mnt/testdir/snap
   rmdir /mnt/testdir
   mkdir /mnt/testdir
   xfs_io -c fsync /mnt/testdir # or fsync some file inside /mnt/testdir
   <power failure>

   The next time the fs is mounted the log replay procedure fails because
   it attempts to delete the snapshot entry (which has dir item key type
   of BTRFS_ROOT_ITEM_KEY) as if it were a regular (non-root) entry,
   resulting in the following error that causes mount to fail:

   [52174.510532] BTRFS info (device dm-0): failed to delete reference to snap, inode 257 parent 257
   [52174.512570] ------------[ cut here ]------------
   [52174.513278] WARNING: CPU: 12 PID: 28024 at fs/btrfs/inode.c:3986 __btrfs_unlink_inode+0x178/0x351 [btrfs]()
   [52174.514681] BTRFS: Transaction aborted (error -2)
   [52174.515630] Modules linked in: btrfs dm_flakey dm_mod overlay crc32c_generic ppdev xor raid6_pq acpi_cpufreq parport_pc tpm_tis sg parport tpm evdev i2c_piix4 proc
   [52174.521568] CPU: 12 PID: 28024 Comm: mount Tainted: G        W       4.5.0-rc6-btrfs-next-27+ #1
   [52174.522805] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
   [52174.524053]  0000000000000000 ffff8801df2a7710 ffffffff81264e93 ffff8801df2a7758
   [52174.524053]  0000000000000009 ffff8801df2a7748 ffffffff81051618 ffffffffa03591cd
   [52174.524053]  00000000fffffffe ffff88015e6e5000 ffff88016dbc3c88 ffff88016dbc3c88
   [52174.524053] Call Trace:
   [52174.524053]  [<ffffffff81264e93>] dump_stack+0x67/0x90
   [52174.524053]  [<ffffffff81051618>] warn_slowpath_common+0x99/0xb2
   [52174.524053]  [<ffffffffa03591cd>] ? __btrfs_unlink_inode+0x178/0x351 [btrfs]
   [52174.524053]  [<ffffffff81051679>] warn_slowpath_fmt+0x48/0x50
   [52174.524053]  [<ffffffffa03591cd>] __btrfs_unlink_inode+0x178/0x351 [btrfs]
   [52174.524053]  [<ffffffff8118f5e9>] ? iput+0xb0/0x284
   [52174.524053]  [<ffffffffa0359fe8>] btrfs_unlink_inode+0x1c/0x3d [btrfs]
   [52174.524053]  [<ffffffffa038631e>] check_item_in_log+0x1fe/0x29b [btrfs]
   [52174.524053]  [<ffffffffa0386522>] replay_dir_deletes+0x167/0x1cf [btrfs]
   [52174.524053]  [<ffffffffa038739e>] fixup_inode_link_count+0x289/0x2aa [btrfs]
   [52174.524053]  [<ffffffffa038748a>] fixup_inode_link_counts+0xcb/0x105 [btrfs]
   [52174.524053]  [<ffffffffa038a5ec>] btrfs_recover_log_trees+0x258/0x32c [btrfs]
   [52174.524053]  [<ffffffffa03885b2>] ? replay_one_extent+0x511/0x511 [btrfs]
   [52174.524053]  [<ffffffffa034f288>] open_ctree+0x1dd4/0x21b9 [btrfs]
   [52174.524053]  [<ffffffffa032b753>] btrfs_mount+0x97e/0xaed [btrfs]
   [52174.524053]  [<ffffffff8108e1b7>] ? trace_hardirqs_on+0xd/0xf
   [52174.524053]  [<ffffffff8117bafa>] mount_fs+0x67/0x131
   [52174.524053]  [<ffffffff81193003>] vfs_kern_mount+0x6c/0xde
   [52174.524053]  [<ffffffffa032af81>] btrfs_mount+0x1ac/0xaed [btrfs]
   [52174.524053]  [<ffffffff8108e1b7>] ? trace_hardirqs_on+0xd/0xf
   [52174.524053]  [<ffffffff8108c262>] ? lockdep_init_map+0xb9/0x1b3
   [52174.524053]  [<ffffffff8117bafa>] mount_fs+0x67/0x131
   [52174.524053]  [<ffffffff81193003>] vfs_kern_mount+0x6c/0xde
   [52174.524053]  [<ffffffff8119590f>] do_mount+0x8a6/0x9e8
   [52174.524053]  [<ffffffff811358dd>] ? strndup_user+0x3f/0x59
   [52174.524053]  [<ffffffff81195c65>] SyS_mount+0x77/0x9f
   [52174.524053]  [<ffffffff814935d7>] entry_SYSCALL_64_fastpath+0x12/0x6b
   [52174.561288] ---[ end trace 6b53049efb1a3ea6 ]---

Fix this by forcing a transaction commit when such cases happen.
This means we check in the commit root of the subvolume tree if there
was any other inode with the same reference when the inode we are
fsync'ing is a new inode (created in the current transaction).

Test cases for fstests, covering all the scenarios given above, were
submitted upstream for fstests:

  * fstests: generic test for fsync after renaming directory
    https://patchwork.kernel.org/patch/8694281/

  * fstests: generic test for fsync after renaming file
    https://patchwork.kernel.org/patch/8694301/

  * fstests: add btrfs test for fsync after snapshot deletion
    https://patchwork.kernel.org/patch/8670671/

Cc: stable@vger.kernel.org
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-04-06 17:01:44 -07:00
Abhi Das
611526756a gfs2: Use gfs2 wrapper to sync inode before calling generic_file_splice_read()
gfs2_file_splice_read() f_op grabs and releases the cluster-wide
inode glock to sync the inode size to the latest.

Without this, generic_file_splice_read() uses an older i_size value
and can return EOF for valid offsets in the inode.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-04-05 12:06:15 -04:00
Bob Peterson
73cc86252b GFS2: Get rid of dead code in inode_go_demote_ok
Function inode_go_demote_ok had some code that was only executed
if gl_holders was not empty. However, if gl_holders was not empty,
the only caller, demote_ok(), returns before inode_go_demote_ok
would ever be called. Therefore, it's dead code, so I removed it.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2016-04-05 11:59:18 -04:00
Bob Copeland
8f6fd83c6c rhashtable: accept GFP flags in rhashtable_walk_init
In certain cases, the 802.11 mesh pathtable code wants to
iterate over all of the entries in the forwarding table from
the receive path, which is inside an RCU read-side critical
section.  Enable walks inside atomic sections by allowing
GFP_ATOMIC allocations for the walker state.

Change all existing callsites to pass in GFP_KERNEL.

Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
[also adjust gfs2/glock.c and rhashtable tests]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-04-05 10:56:32 +02:00
Linus Torvalds
e865f4965f Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull quota fixes from Jan Kara:
 "Fixes for oopses when the new quotactl gets used with quotas disabled"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ocfs2: Fix Q_GETNEXTQUOTA for filesystem without quotas
  quota: Handle Q_GETNEXTQUOTA when quota is disabled
2016-04-04 13:18:27 -07:00
Linus Torvalds
c7e82c6485 Merge tag 'f2fs-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs fixes from Jaegeuk Kim.

* tag 'f2fs-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
  f2fs: retrieve IO write stat from the right place
  f2fs crypto: fix corrupted symlink in encrypted case
  f2fs: cover large section in sanity check of super
2016-04-04 13:00:39 -07:00
Linus Torvalds
4a2d057e4f Merge branch 'PAGE_CACHE_SIZE-removal'
Merge PAGE_CACHE_SIZE removal patches from Kirill Shutemov:
 "PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
  ago with promise that one day it will be possible to implement page
  cache with bigger chunks than PAGE_SIZE.

  This promise never materialized.  And unlikely will.

  Let's stop pretending that pages in page cache are special.  They are
  not.

  The first patch with most changes has been done with coccinelle.  The
  second is manual fixups on top.

  The third patch removes macros definition"

[ I was planning to apply this just before rc2, but then I spaced out,
  so here it is right _after_ rc2 instead.

  As Kirill suggested as a possibility, I could have decided to only
  merge the first two patches, and leave the old interfaces for
  compatibility, but I'd rather get it all done and any out-of-tree
  modules and patches can trivially do the converstion while still also
  working with older kernels, so there is little reason to try to
  maintain the redundant legacy model.    - Linus ]

* PAGE_CACHE_SIZE-removal:
  mm: drop PAGE_CACHE_* and page_cache_{get,release} definition
  mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage
  mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
2016-04-04 10:50:24 -07:00
Kirill A. Shutemov
ea1754a084 mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage
Mostly direct substitution with occasional adjustment or removing
outdated comments.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-04 10:41:08 -07:00
Kirill A. Shutemov
09cbfeaf1a mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-04 10:41:08 -07:00
Yauhen Kharuzhy
7ccefb98ce btrfs: Reset IO error counters before start of device replacing
If device replace entry was found on disk at mounting and its num_write_errors
stats counter has non-NULL value, then replace operation will never be
finished and -EIO error will be reported by btrfs_scrub_dev() because
this counter is never reset.

 # mount -o degraded /media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/
 # btrfs replace status /media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/
 Started on 25.Mar 07:28:00, canceled on 25.Mar 07:28:01 at 0.0%, 40 write errs, 0 uncorr. read errs
 # btrfs replace start -B 4 /dev/sdg /media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/
 ERROR: ioctl(DEV_REPLACE_START) failed on "/media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/": Input/output error, no error

Reset num_write_errors and num_uncorrectable_read_errors counters in the
dev_replace structure before start of replacing.

Signed-off-by: Yauhen Kharuzhy <yauhen.kharuzhy@zavadatar.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-04 16:29:22 +02:00