28276 Commits

Author SHA1 Message Date
Jan Kara
e7848683ae btrfs: Push mnt_want_write() outside of i_mutex
When mnt_want_write() starts to handle freezing it will get a full lock
semantics requiring proper lock ordering. So push mnt_want_write() call
consistently outside of i_mutex.

CC: Chris Mason <chris.mason@oracle.com>
CC: linux-btrfs@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-31 01:02:51 +04:00
Jan Kara
e24f17da35 fat: Push mnt_want_write() outside of i_mutex
When mnt_want_write() starts to handle freezing it will get a full lock
semantics requiring proper lock ordering. So push mnt_want_write() call
outside of i_mutex as in other places.

CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-31 01:02:50 +04:00
Jan Kara
c30dabfe5d fs: Push mnt_want_write() outside of i_mutex
Currently, mnt_want_write() is sometimes called with i_mutex held and sometimes
without it. This isn't really a problem because mnt_want_write() is a
non-blocking operation (essentially has a trylock semantics) but when the
function starts to handle also frozen filesystems, it will get a full lock
semantics and thus proper lock ordering has to be established. So move
all mnt_want_write() calls outside of i_mutex.

One non-trivial case needing conversion is kern_path_create() /
user_path_create() which didn't include mnt_want_write() but now needs to
because it acquires i_mutex.  Because there are virtual file systems which
don't bother with freeze / remount-ro protection we actually provide both
versions of the function - one which calls mnt_want_write() and one which does
not.

[AV: scratch the previous, mnt_want_write() has been moved to kern_path_create()
by now]

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-31 01:02:49 +04:00
Jan Kara
14ae417c6f sysfs: Push file_update_time() into bin_page_mkwrite()
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-31 01:02:47 +04:00
Jan Kara
a63e9b2e76 gfs2: Push file_update_time() into gfs2_page_mkwrite()
CC: Steven Whitehouse <swhiteho@redhat.com>
CC: cluster-devel@redhat.com
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-31 01:02:46 +04:00
Jan Kara
120c2bcad8 9p: Push file_update_time() into v9fs_vm_page_mkwrite()
CC: Eric Van Hensbergen <ericvh@gmail.com>
CC: Ron Minnich <rminnich@sandia.gov>
CC: Latchesar Ionkov <lucho@ionkov.net>
CC: v9fs-developer@lists.sourceforge.net
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-31 01:02:46 +04:00
Jan Kara
3ca9c3bd8a ceph: Push file_update_time() into ceph_page_mkwrite()
CC: Sage Weil <sage@newdream.net>
CC: ceph-devel@vger.kernel.org
Acked-by: Sage Weil <sage@newdream.net>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-31 01:02:45 +04:00
Jan Kara
5e8830dc85 fs: Push file_update_time() into __block_page_mkwrite()
Tested-by: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Massimo Morana <massimo.morana@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-31 01:02:44 +04:00
Al Viro
64894cf843 simplify lookup_open()/atomic_open() - do the temporary mnt_want_write() early
The write ref to vfsmount taken in lookup_open()/atomic_open() is going to
be dropped; we take the one to stay in dentry_open().  Just grab the temporary
in caller if it looks like we are going to need it (create/truncate/writable open)
and pass (by value) "has it succeeded" flag.  Instead of doing mnt_want_write()
inside, check that flag and treat "false" as "mnt_want_write() has just failed".
mnt_want_write() is cheap and the things get considerably simpler and more robust
that way - we get it and drop it in the same function, to start with, rather
than passing a "has something in the guts of really scary functions taken it"
back to caller.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-31 00:53:35 +04:00
Linus Torvalds
37cd9600a9 xfs: update for 3.6-rc1
Numerous cleanups and several bug fixes.  Here are some highlights:
 
 * Discontiguous directory buffer support
 * Inode allocator refactoring
 * Removal of the IO lock in inode reclaim
 * Implementation of .update_time
 * Fix for handling of EOF in xfs_vm_writepage
 * Fix for races in xfsaild, and idle mode is re-enabled
 * Fix for a crash in xfs_buf completion handlers on unmount.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJQFtCvAAoJENaLyazVq6ZOIuEQAINJXb4SK9oBrdwGmq+Vsqf2
 Eh4OmzZmdnSPrxfFGmqvyL9DdUBvGBuidwOcVLMAXGtzbxE9USK9NuKC5zN/hJip
 8tIyv/8bqZ0aD4RJlHGN5zKFoQh/9Tag+JsaaqWstO8Ir1tA/5p04hDAz492btfT
 49SvnV64sJ1fi7pmaJblMWMMtlWJjD6iOldaHwnKBQ3LKmcgy9sD9DY5HiGOTr1j
 ecKtucX7B8Q9oFLKHaKEwTYZRRYDNuTbqZmI6hlEcA5hT280jotsGA4q/aXx/gHS
 lZuBaqVtNFT5WCKm+j/et76tmTfIh0CSbo64ZfgSOESy2BkEVXHg5XJ1gDvPdV+L
 6eBlUx3jaiNyFVHxVzFhzwKC/XdaITCd/ixFEogRDmoppDXencTCibLJXHNXxupN
 BCAyTLCxEJIE9WCeOMmwHA0450bMY4or13NGep57pIvG8GomtdG1WncTRIo84KV5
 0W5ocaUTGP7ROsr+KF8U9C7H866OHzVFijA+vvcTy8GtsT/xOCFxuJrqPVb+kgD7
 mIKaoK7iH6Kufu433TzsLEcUkF36gq/7NytPKjQhURLpZhxkHG3rq6LC0HXp6uuZ
 QgX5Y5Gl7SwDovIrndXmQXRnGrzvqHLguZl65+rB1CKggjemkLSdSLhryoNVjLU2
 iB7/hvzOUdYFMRRz2mLc
 =2wkC
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-v3.6-rc1' of git://oss.sgi.com/xfs/xfs

Pull xfs update from Ben Myers:
 "Numerous cleanups and several bug fixes.  Here are some highlights:

   - Discontiguous directory buffer support
   - Inode allocator refactoring
   - Removal of the IO lock in inode reclaim
   - Implementation of .update_time
   - Fix for handling of EOF in xfs_vm_writepage
   - Fix for races in xfsaild, and idle mode is re-enabled
   - Fix for a crash in xfs_buf completion handlers on unmount."

Fix up trivial conflicts in fs/xfs/{xfs_buf.c,xfs_log.c,xfs_log_priv.h}
due to duplicate patches that had already been merged for 3.5.

* tag 'for-linus-v3.6-rc1' of git://oss.sgi.com/xfs/xfs: (44 commits)
  xfs: wait for the write the superblock on unmount
  xfs: re-enable xfsaild idle mode and fix associated races
  xfs: remove iolock lock classes
  xfs: avoid the iolock in xfs_free_eofblocks for evicted inodes
  xfs: do not take the iolock in xfs_inactive
  xfs: remove xfs_inactive_attrs
  xfs: clean up xfs_inactive
  xfs: do not read the AGI buffer in xfs_dialloc until nessecary
  xfs: refactor xfs_ialloc_ag_select
  xfs: add a short cut to xfs_dialloc for the non-NULL agbp case
  xfs: remove the alloc_done argument to xfs_dialloc
  xfs: split xfs_dialloc
  xfs: remove xfs_ialloc_find_free
  Prefix IO_XX flags with XFS_IO_XX to avoid namespace colision.
  xfs: remove xfs_inotobp
  xfs: merge xfs_itobp into xfs_imap_to_bp
  xfs: handle EOF correctly in xfs_vm_writepage
  xfs: implement ->update_time
  xfs: fix comment typo of struct xfs_da_blkinfo.
  xfs: do not call xfs_bdstrat_cb in xfs_buf_iodone_callbacks
  ...
2012-07-30 13:37:53 -07:00
Sage Weil
8842b3be96 ceph: clean up useless d_parent checks
d_parent is never NULL, and IS_ROOT() is the proper way to check for a
(non-self-referential) parent.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-30 09:29:54 -07:00
Al Viro
f8310c5920 fix O_EXCL handling for devices
O_EXCL without O_CREAT has different semantics; it's "fail if already opened",
not "fail if already exists".  commit 71574865 broke that...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-30 11:50:30 +04:00
Mark Tinguely
9a57fa8ee7 xfs: wait for the write the superblock on unmount
v2: Add the xfs_buf_lock to xfs_quiesce_attr().
    Add explaination why xfs_buf_lock() is used to wait for write.

xfs_wait_buftarg() does not wait for the completion of the write of the
uncached superblock. This write can race with the shutdown of the log
and causes a panic if the write does not win the race.

During the log write, xfsaild_push() will lock the buffer and set the
XBF_ASYNC flag. Because the XBF_FLAG is set, complete() is not performed
on the buffer's iowait entry, we cannot call xfs_buf_iowait() to wait
for the write to complete. The buffer's lock is held until the write is
complete, so we can block on a xfs_buf_lock() request to be notified
that the write is complete.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:34:19 -05:00
Brian Foster
8375f922aa xfs: re-enable xfsaild idle mode and fix associated races
xfsaild idle mode logic currently leads to a couple hangs:

1.) If xfsaild is rescheduled in during an incremental scan
    (i.e., tout != 0) and the target has been updated since
    the previous run, we can hit the new target and go into
    idle mode with a still populated ail.
2.) A wake up is only issued when the target is pushed forward.
    The wake up can race with xfsaild if it is currently in the
    process of entering idle mode, causing future wake up
    events to be lost.

These hangs have been reproduced and verified as fixed by
running xfstests 273 in a loop on a slightly modified upstream
kernel. The kernel is modified to re-enable idle mode as
previously implemented (when count == 0) and with a revert of
commit 670ce93f, which includes performance improvements that
make this harder to reproduce.

The solution, the algorithm for which has been outlined by
Dave Chinner, is to modify xfsaild to enter idle mode only when
the ail is empty and the push target has not been moved forward
since the last push.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:27:57 -05:00
Christoph Hellwig
4f59af758f xfs: remove iolock lock classes
Content-Disposition: inline; filename=xfs-remove-iolock-classes

Now that we never take the iolock during inode reclaim we don't need
to play games with lock classes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Rich Johnston <rjohnston@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:23:51 -05:00
Christoph Hellwig
5a15322da1 xfs: avoid the iolock in xfs_free_eofblocks for evicted inodes
Same rational as the last patch - these inodes are not reachable, so
don't bother with locking.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Rich Johnston <rjohnston@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:22:20 -05:00
Christoph Hellwig
0b56185b0d xfs: do not take the iolock in xfs_inactive
An inode that enters xfs_inactive has been removed from all global
lists but the inode hash, and can't be recycled in xfs_iget before
it has been marked reclaimable.  Thus taking the iolock in here
is not nessecary at all, and given the amount of lockdep false
positives it has triggered already I'd rather remove the locking.

The only change outside of xfs_inactive is relaxing an assert in
xfs_itruncate_extents.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Rich Johnston <rjohnston@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:16:49 -05:00
Christoph Hellwig
fe67be036f xfs: remove xfs_inactive_attrs
Remove this helper as the code flow is a lot more obvious when it gets
merged into its only caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Rich Johnston <rjohnston@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:15:33 -05:00
Christoph Hellwig
b373e98daa xfs: clean up xfs_inactive
The code to reserve log space and join the inode to the transaction is
common for all cases, so don't duplicate it.  Also remove the trivial
xfs_inactive_symlink_local helper which can simply be opencode now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Rich Johnston <rjohnston@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:13:09 -05:00
Christoph Hellwig
be60fe54b2 xfs: do not read the AGI buffer in xfs_dialloc until nessecary
Refactor the AG selection loop in xfs_dialloc to operate on the in-memory
perag data as much as possible.  We only read the AGI buffer once we have
selected an AG to allocate inodes now instead of for every AG considered.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:10:54 -05:00
Christoph Hellwig
55d6af64cb xfs: refactor xfs_ialloc_ag_select
Loop over the in-core perag structures and prefer using pagi_freecount over
going out to the AGI buffer where possible.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:08:13 -05:00
Christoph Hellwig
4bb61069d2 xfs: add a short cut to xfs_dialloc for the non-NULL agbp case
In this case we already have selected an AG and know it has free space
beause the buffer lock never got released.  Jump directly into xfs_dialloc_ag
and short cut the AG selection loop.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:03:23 -05:00
Christoph Hellwig
08358906ed xfs: remove the alloc_done argument to xfs_dialloc
We can simplify check the IO_agbp pointer for being non-NULL instead of
passing another argument through two layers of function calls.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 16:00:31 -05:00
Christoph Hellwig
f2ecc5e453 xfs: split xfs_dialloc
Move the actual allocation once we have selected an allocation group into a
separate helper, and make xfs_dialloc a wrapper around it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29 15:56:49 -05:00
Al Viro
bf8848918d lockd: handle lockowner allocation failure in nlmclnt_proc()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29 23:17:39 +04:00
Al Viro
446945ab9a lockd: shift grabbing a reference to nlm_host into nlm_alloc_call()
It's used both for client and server hosts; we can't do nlmclnt_release_host()
on failure exits, since the host might need nlmsvc_release_host(), with BUG_ON()
for calling the wrong one.  Makes life simpler for callers, actually...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29 23:09:57 +04:00
Kees Cook
a51d9eaa41 fs: add link restriction audit reporting
Adds audit messages for unexpected link restriction violations so that
system owners will have some sort of potentially actionable information
about misbehaving processes.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29 21:43:08 +04:00
Kees Cook
800179c9b8 fs: add link restrictions
This adds symlink and hardlink restrictions to the Linux VFS.

Symlinks:

A long-standing class of security issues is the symlink-based
time-of-check-time-of-use race, most commonly seen in world-writable
directories like /tmp. The common method of exploitation of this flaw
is to cross privilege boundaries when following a given symlink (i.e. a
root process follows a symlink belonging to another user). For a likely
incomplete list of hundreds of examples across the years, please see:
http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp

The solution is to permit symlinks to only be followed when outside
a sticky world-writable directory, or when the uid of the symlink and
follower match, or when the directory owner matches the symlink's owner.

Some pointers to the history of earlier discussion that I could find:

 1996 Aug, Zygo Blaxell
  http://marc.info/?l=bugtraq&m=87602167419830&w=2
 1996 Oct, Andrew Tridgell
  http://lkml.indiana.edu/hypermail/linux/kernel/9610.2/0086.html
 1997 Dec, Albert D Cahalan
  http://lkml.org/lkml/1997/12/16/4
 2005 Feb, Lorenzo Hernández García-Hierro
  http://lkml.indiana.edu/hypermail/linux/kernel/0502.0/1896.html
 2010 May, Kees Cook
  https://lkml.org/lkml/2010/5/30/144

Past objections and rebuttals could be summarized as:

 - Violates POSIX.
   - POSIX didn't consider this situation and it's not useful to follow
     a broken specification at the cost of security.
 - Might break unknown applications that use this feature.
   - Applications that break because of the change are easy to spot and
     fix. Applications that are vulnerable to symlink ToCToU by not having
     the change aren't. Additionally, no applications have yet been found
     that rely on this behavior.
 - Applications should just use mkstemp() or O_CREATE|O_EXCL.
   - True, but applications are not perfect, and new software is written
     all the time that makes these mistakes; blocking this flaw at the
     kernel is a single solution to the entire class of vulnerability.
 - This should live in the core VFS.
   - This should live in an LSM. (https://lkml.org/lkml/2010/5/31/135)
 - This should live in an LSM.
   - This should live in the core VFS. (https://lkml.org/lkml/2010/8/2/188)

Hardlinks:

On systems that have user-writable directories on the same partition
as system files, a long-standing class of security issues is the
hardlink-based time-of-check-time-of-use race, most commonly seen in
world-writable directories like /tmp. The common method of exploitation
of this flaw is to cross privilege boundaries when following a given
hardlink (i.e. a root process follows a hardlink created by another
user). Additionally, an issue exists where users can "pin" a potentially
vulnerable setuid/setgid file so that an administrator will not actually
upgrade a system fully.

The solution is to permit hardlinks to only be created when the user is
already the existing file's owner, or if they already have read/write
access to the existing file.

Many Linux users are surprised when they learn they can link to files
they have no access to, so this change appears to follow the doctrine
of "least surprise". Additionally, this change does not violate POSIX,
which states "the implementation may require that the calling process
has permission to access the existing file"[1].

This change is known to break some implementations of the "at" daemon,
though the version used by Fedora and Ubuntu has been fixed[2] for
a while. Otherwise, the change has been undisruptive while in use in
Ubuntu for the last 1.5 years.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/linkat.html
[2] http://anonscm.debian.org/gitweb/?p=collab-maint/at.git;a=commitdiff;h=f4114656c3a6c6f6070e315ffdf940a49eda3279

This patch is based on the patches in Openwall and grsecurity, along with
suggestions from Al Viro. I have added a sysctl to enable the protected
behavior, and documentation.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29 21:37:58 +04:00
Jeff Layton
3134f37e93 vfs: don't let do_last pass negative dentry to audit_inode
I can reliably reproduce the following panic by simply setting an audit
rule on a recent 3.5.0+ kernel:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
 IP: [<ffffffff810d1250>] audit_copy_inode+0x10/0x90
 PGD 7acd9067 PUD 7b8fb067 PMD 0
 Oops: 0000 [#86] SMP
 Modules linked in: nfs nfs_acl auth_rpcgss fscache lockd sunrpc tpm_bios btrfs zlib_deflate libcrc32c kvm_amd kvm joydev virtio_net pcspkr i2c_piix4 floppy virtio_balloon microcode virtio_blk cirrus drm_kms_helper ttm drm i2c_core [last unloaded: scsi_wait_scan]
 CPU 0
 Pid: 1286, comm: abrt-dump-oops Tainted: G      D      3.5.0+ #1 Bochs Bochs
 RIP: 0010:[<ffffffff810d1250>]  [<ffffffff810d1250>] audit_copy_inode+0x10/0x90
 RSP: 0018:ffff88007aebfc38  EFLAGS: 00010282
 RAX: 0000000000000000 RBX: ffff88003692d860 RCX: 00000000000038c4
 RDX: 0000000000000000 RSI: ffff88006baf5d80 RDI: ffff88003692d860
 RBP: ffff88007aebfc68 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
 R13: ffff880036d30f00 R14: ffff88006baf5d80 R15: ffff88003692d800
 FS:  00007f7562634740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000040 CR3: 000000003643d000 CR4: 00000000000006f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Process abrt-dump-oops (pid: 1286, threadinfo ffff88007aebe000, task ffff880079614530)
 Stack:
  ffff88007aebfdf8 ffff88007aebff28 ffff88007aebfc98 ffffffff81211358
  ffff88003692d860 0000000000000000 ffff88007aebfcc8 ffffffff810d4968
  ffff88007aebfcc8 ffff8800000038c4 0000000000000000 0000000000000000
 Call Trace:
  [<ffffffff81211358>] ? ext4_lookup+0xe8/0x160
  [<ffffffff810d4968>] __audit_inode+0x118/0x2d0
  [<ffffffff811955a9>] do_last+0x999/0xe80
  [<ffffffff81191fe8>] ? inode_permission+0x18/0x50
  [<ffffffff81171efa>] ? kmem_cache_alloc_trace+0x11a/0x130
  [<ffffffff81195b4a>] path_openat+0xba/0x420
  [<ffffffff81196111>] do_filp_open+0x41/0xa0
  [<ffffffff811a24bd>] ? alloc_fd+0x4d/0x120
  [<ffffffff811855cd>] do_sys_open+0xed/0x1c0
  [<ffffffff810d40cc>] ? __audit_syscall_entry+0xcc/0x300
  [<ffffffff811856c1>] sys_open+0x21/0x30
  [<ffffffff81611ca9>] system_call_fastpath+0x16/0x1b
  RSP <ffff88007aebfc38>
 CR2: 0000000000000040

The problem is that do_last is passing a negative dentry to audit_inode.
The comments on lookup_open note that it can pass back a negative dentry
if O_CREAT is not set.

This patch fixes the oops, but I'm not clear on whether there's a better
approach.

Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29 21:27:03 +04:00
Al Viro
e4fad8e5d2 consolidate pipe file creation
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29 21:24:19 +04:00
Al Viro
b5bcdda327 take grabbing f->f_path to do_dentry_open()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29 21:24:18 +04:00
Al Viro
5c33b183a3 uninline file_free_rcu()
What inline?  Its only use is passing its address to call_rcu(), for fuck sake!

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29 21:24:17 +04:00
Al Viro
0b1d90119a ecryptfs_lookup_interpose(): allocate dentry_info first
less work on failure that way

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29 21:24:17 +04:00
Al Viro
bc65a1215e sanitize ecryptfs_lookup()
* ->lookup() never gets hit with . or ..
* dentry it gets is unhashed, so unless we had gone and hashed it ourselves, there's
no need to d_drop() the sucker.
* wrong name printed in one of the printks (NULL, in fact)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29 21:24:16 +04:00
Al Viro
a8104a9fcd pull mnt_want_write()/mnt_drop_write() into kern_path_create()/done_path_create() resp.
One side effect - attempt to create a cross-device link on a read-only fs fails
with EROFS instead of EXDEV now.  Makes more sense, POSIX allows, etc.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29 21:24:15 +04:00
Al Viro
8e4bfca1d1 mknod: take sanity checks on mode into the very beginning
Note that applying umask can't affect their results.  While
that affects errno in cases like
	mknod("/no_such_directory/a", 030000)
yielding -EINVAL (due to impossible mode_t) instead of
-ENOENT (due to inexistent directory), IMO that makes a lot
more sense, POSIX allows to return either and any software
that relies on getting -ENOENT instead of -EINVAL in that
case deserves everything it gets.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29 21:24:14 +04:00
Al Viro
921a1650de new helper: done_path_create()
releases what needs to be released after {kern,user}_path_create()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29 21:24:13 +04:00
Linus Torvalds
173f865474 The usual collection of bug fixes and optimizations. Perhaps of
greatest note is a speed up for parallel, non-allocating DIO writes,
 since we no longer take the i_mutex lock in that case.  For bug fixes,
 we fix an incorrect overhead calculation which caused slightly
 incorrect results for df(1) and statfs(2).  We also fixed bugs in the
 metadata checksum feature.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABCAAGBQJQE1SzAAoJENNvdpvBGATwzOMQAKxVkaTqkMc+cUahLLUkFdGQ
 xnmigHufVuOvv+B8l1p6vbYx+qOftztqXF1WC41j8mTfEDs19jKXv3LI57TJ+bIW
 a/Kp1CjMicRs/HGhFPNWp1D+N8J95MTFP6Y9biXSmBBvefg2NSwxpg48yZtjUy1/
 Zl0414AqMvTJyqKKOUa++oyl3XlawnzDZ+6a7QPKsrAaDOU5977W4y2tZkNFk84d
 qRjTfaiX13aVe7XupgQHe/jtk40Sj38pyGVGiAGlHOZhZtYKE6MzB8OreGiMTy8z
 rmJz/CQUsQ8+sbKYhAcDru+bMrKghbO0u78XRz9CY+YpVFF/Xs2QiXoV0ZOGkIm6
 eokq7jEV0K+LtDEmM3PkmUPgXfYw5damTv8qWuBUFd4wtVE5x/DmK8AJVMidCAUN
 GkVR+rEbbEi7RCwsuac/aKB8baVQCTiJ5tfNTgWh9zll+9GZSk+U71Pp0KdcJGiS
 nxitAZ+20hZN2CQctlmaGbCPTPYCWU4hQ3IuMdTlQTQAs8S0y1FtylTRsXcC1eVR
 i1hBS/dVw5PVCaqoX79zYrByUymgX0ZaYY6seRT6+U9xPGDCSNJ9mjXZL3fttnOC
 d5gsx/pbMIAv52G5Hj6DfsXR2JFmmxsaIzsLtRvKi9q89d84XaZfbUsHYjn4Neym
 5lTKaSQHU71cKCxrStHC
 =VAVB
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "The usual collection of bug fixes and optimizations.  Perhaps of
  greatest note is a speed up for parallel, non-allocating DIO writes,
  since we no longer take the i_mutex lock in that case.

  For bug fixes, we fix an incorrect overhead calculation which caused
  slightly incorrect results for df(1) and statfs(2).  We also fixed
  bugs in the metadata checksum feature."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits)
  ext4: undo ext4_calc_metadata_amount if we fail to claim space
  ext4: don't let i_reserved_meta_blocks go negative
  ext4: fix hole punch failure when depth is greater than 0
  ext4: remove unnecessary argument from __ext4_handle_dirty_metadata()
  ext4: weed out ext4_write_super
  ext4: remove unnecessary superblock dirtying
  ext4: convert last user of ext4_mark_super_dirty() to ext4_handle_dirty_super()
  ext4: remove useless marking of superblock dirty
  ext4: fix ext4 mismerge back in January
  ext4: remove dynamic array size in ext4_chksum()
  ext4: remove unused variable in ext4_update_super()
  ext4: make quota as first class supported feature
  ext4: don't take the i_mutex lock when doing DIO overwrites
  ext4: add a new nolock flag in ext4_map_blocks
  ext4: split ext4_file_write into buffered IO and direct IO
  ext4: remove an unused statement in ext4_mb_get_buddy_page_lock()
  ext4: fix out-of-date comments in extents.c
  ext4: use s_csum_seed instead of i_csum_seed for xattr block
  ext4: use proper csum calculation in ext4_rename
  ext4: fix overhead calculation used by ext4_statfs()
  ...
2012-07-27 20:52:25 -07:00
Stanislav Kinsbursky
2c142baa7b NFSd: make boot_time variable per network namespace
NFSd's boot_time represents grace period start point in time.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:49:22 -04:00
Stanislav Kinsbursky
a51c84ed50 NFSd: make grace end flag per network namespace
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:49:22 -04:00
Stanislav Kinsbursky
5630f7fa97 Lockd: move grace period management from lockd() to per-net functions
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:49:22 -04:00
Stanislav Kinsbursky
5ccb0066f2 LockD: pass actual network namespace to grace period management functions
Passed network namespace replaced hard-coded init_net

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:49:22 -04:00
Stanislav Kinsbursky
db9c455341 LockD: manage grace list per network namespace
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:49:22 -04:00
Stanislav Kinsbursky
9695c7057f SUNRPC: service request network namespace helper introduced
This is a cleanup patch - makes code looks simplier.
It replaces widely used rqstp->rq_xprt->xpt_net by introduced SVC_NET(rqstp).

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:49:21 -04:00
Stanislav Kinsbursky
5e1533c788 NFSd: make nfsd4_manager allocated per network namespace context.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:49:21 -04:00
Stanislav Kinsbursky
08d44a35a9 LockD: make lockd manager allocated per network namespace
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:48:44 -04:00
Stanislav Kinsbursky
66547b0251 LockD: manage grace period per network namespace
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:48:44 -04:00
Stanislav Kinsbursky
e2edaa98cb Lockd: add more debug to host shutdown functions
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:48:44 -04:00
Stanislav Kinsbursky
d5850ff9ea Lockd: host complaining function introduced
Just a small cleanup.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:48:44 -04:00
Stanislav Kinsbursky
caa4e76b6f LockD: manage used host count per networks namespace
This patch introduces moves nrhosts in per-net data.
It also adds kernel warning to nlm_shutdown_hosts_net() about remaining hosts
in specified network namespace context.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:48:43 -04:00