Currently, the NFS I/O counters count the number of bytes requested
by applications, rather than the number of bytes actually read by the
system calls.
The number of bytes requested for reads is actually not that useful,
because the value is usually a buffer size for reads. That is, that
requested number is usually a maximum, and frequently doesn't reflect
the actual number of bytes read.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Nit: The VFSOPEN and VFSFLUSH counters are function call counters.
Count every call to these routines.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
When session is reset, client can renegotiate slot table size.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Drain the fore channel and reset the max_slots to the new value.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
For now the back channel ca_maxresponsesize_cached is 0 and there is no
backchannel DRC. Return NFS4ERR_REP_TOO_BIG_TO_CACHE when a cb_sequence
cachethis is true. When it is false, return NFS4ERR_RETRY_UNCACHED_REP as the
next operation error.
Remember the replay error accross compound operation processing.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Make all cb_sequence arguments available to verify_seqid which will make
replay decisions.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
All callback operations have arguments to decode and require processing.
The preprocess_nfs4X_op functions catch unsupported or illegal ops so
decode_args and process_op pointers are always non NULL.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Skip all other processing when error is encountered.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Set NFS4ERR_RESOURCE as CB_COMPOUND status and do not return an op on
decode_op_hdr or encode_op_hdr buffer overflow.
NFS4ERR_RESOURCE is correct for v4.0. Will fix the return for v4.1 along with
all the other NFS4ERR_RESOURCE errors in a later patch.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If a CB_SEQUENCE referring call triple matches a slot table entry, the
client is still waiting for a response to the original request. In this
case, return NFS4ERR_DELAY as the response to the callback.
Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Traverse a list of referring calls and look for a session/slot/seq number
match.
Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
For the CREATE_SESSION attribute ca_maxresponsesize_cached, calculate
the value based on the rpc reply header size plus the maximum nfs compound
reply size.
Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add a wrapper around rpc_call_sync that handles -EKEYEXPIRED errors from
the RPC layer as it would an -EJUKEBOX error if NFSv2 had such a thing.
Also, add a handler for that error for async calls that makes it
resubmit the RPC on -EKEYEXPIRED.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We're using -EKEYEXPIRED to indicate that a krb5 credcache contains an
expired ticket and that we should have the NFS layer retry the RPC call
instead of returning an error back to the caller. Handle this as we
would an -EJUKEBOX error return.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If a KRB5 TGT ticket expires, we don't want to return an error
immediatel. If someone has a long running job and just forgets to run
"kinit" in time then this will make it fail.
Instead, we want to treat this situation as we would NFS4ERR_DELAY and
retry the upcall after delaying a bit with an exponential backoff.
This patch just makes any place that would handle NFS4ERR_DELAY also
handle -EKEYEXPIRED the same way. In the future, we may want to be more
sophisticated however and handle hard vs. soft mounts differently, or
specify some upper limit on how long we'll wait for a new TGT to be
acquired.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
9p: fix p9_client_destroy unconditional calling v9fs_put_trans
9p: fix memory leak in v9fs_parse_options()
9p: Fix the kernel crash on a failed mount
9p: fix option parsing
9p: Include fsync support for 9p client
net/9p: fix statsize inside twstat
net/9p: fail when user specifies a transport which we can't find
net/9p: fix virtio transport to correctly update status on connect
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
ocfs2/cluster: Make o2net connect messages KERN_NOTICE
ocfs2/dlm: Fix printing of lockname
ocfs2: Fix contiguousness check in ocfs2_try_to_merge_extent_map()
ocfs2/dlm: Remove BUG_ON in dlm recovery when freeing locks of a dead node
ocfs2: Plugs race between the dc thread and an unlock ast message
ocfs2: Remove overzealous BUG_ON during blocked lock processing
ocfs2: Do not downconvert if the lock level is already compatible
ocfs2: Prevent a livelock in dlmglue
ocfs2: Fix setting of OCFS2_LOCK_BLOCKED during bast
ocfs2: Use compat_ptr in reflink_arguments.
ocfs2/dlm: Handle EAGAIN for compatibility - v2
ocfs2: Add parenthesis to wrap the check for O_DIRECT.
ocfs2: Only bug out when page size is larger than cluster size.
ocfs2: Fix memory overflow in cow_by_page.
ocfs2/dlm: Print more messages during lock migration
ocfs2/dlm: Ignore LVBs of locks in the Blocked list
ocfs2/trivial: Remove trailing whitespaces
ocfs2: fix a misleading variable name
ocfs2: Sync max_inline_data_with_xattr from tools.
ocfs2: Fix refcnt leak on ocfs2_fast_follow_link() error path
If match_strdup() fail this function exits without freeing the options string.
Signed-off-by: Venkateswararao Jujjuri <jvrao@us.ibm.com>
Sigend-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Options pointer is being moved before calling kfree() which seems
to cause problems. This uses a separate pointer to track and free
original allocation.
Signed-off-by: Venkateswararao Jujjuri <jvrao@us.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>w
Implement the fsync in the client side by marking stat field values to 'don't touch' so that server may
interpret it as a request to guarantee that the contents of the associated file are committed to stable
storage before the Rwstat message is returned.
Without this patch, calling fsync on a 9p file results in "Invalid argument" error. Please check the attached
C program.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Acked-by: Venkateswararao Jujjuri (JV) <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Connect and disconnect messages are more than informational as they are required
during root cause analysis for failures. This patch changes them from KERN_INFO
to KERN_NOTICE.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Acked-by: Mark Faseh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
The debug call printing the name of the lock resource was chopping
off the last character. This patch fixes the problem.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Commit f39bde24b2 fixed the error return from PUTROOTFH in the
case where there is no pseudofilesystem.
This is really a case we shouldn't hit on a correctly configured server:
in the absence of a root filehandle, there's no point accepting version
4 NFS rpc calls at all.
But the shared responsibility between kernel and userspace here means
the kernel on its own can't eliminate the possiblity of this happening.
And we have indeed gotten this wrong in distro's, so new client-side
mount code that attempts to negotiate v4 by default first has to work
around this case.
Therefore when commit f39bde24b2 arrived at roughly the same
time as the new v4-default mount code, which explicitly checked only for
the previous error, the result was previously fine mounts suddenly
failing.
We'll fix both sides for now: revert the error change, and make the
client-side mount workaround more robust.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
Take ima_file_free() to proper place.
ima: rename PATH_CHECK to FILE_CHECK
ima: rename ima_path_check to ima_file_check
ima: initialize ima before inodes can be allocated
fix ima breakage
Take ima_path_check() in nfsd past dentry_open() in nfsd_open()
freeze_bdev: don't deactivate successfully frozen MS_RDONLY sb
befs: fix leak
This reverts commit 7036251180 ("tty: fix race in tty_fasync") and
commit b04da8bfdf ("fnctl: f_modown should call write_lock_irqsave/
restore") that tried to fix up some of the fallout but was incomplete.
It turns out that we really cannot hold 'tty->ctrl_lock' over calling
__f_setown, because not only did that cause problems with interrupt
disables (which the second commit fixed), it also causes a potential
ABBA deadlock due to lock ordering.
Thanks to Tetsuo Handa for following up on the issue, and running
lockdep to show the problem. It goes roughly like this:
- f_getown gets filp->f_owner.lock for reading without interrupts
disabled, so an interrupt that happens while that lock is held can
cause a lockdep chain from f_owner.lock -> sighand->siglock.
- at the same time, the tty->ctrl_lock -> f_owner.lock chain that
commit 7036251180 introduced, together with the pre-existing
sighand->siglock -> tty->ctrl_lock chain means that we have a lock
dependency the other way too.
So instead of extending tty->ctrl_lock over the whole __f_setown() call,
we now just take a reference to the 'pid' structure while holding the
lock, and then release it after having done the __f_setown. That still
guarantees that 'struct pid' won't go away from under us, which is all
we really ever needed.
Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Américo Wang <xiyou.wangcong@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ima_path_check actually deals with files! call it ima_file_check instead.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The "Untangling ima mess, part 2 with counters" patch messed
up the counters. Based on conversations with Al Viro, this patch
streamlines ima_path_check() by removing the counter maintaince.
The counters are now updated independently, from measuring the file,
in __dentry_open() and alloc_file() by calling ima_counts_get().
ima_path_check() is called from nfsd and do_filp_open().
It also did not measure all files that should have been measured.
Reason: ima_path_check() got bogus value passed as mask.
[AV: mea culpa]
[AV: add missing nfsd bits]
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Thanks Thomas and Christoph for testing and review.
I removed 'smp_wmb()' before up_write from the previous patch,
since up_write() should have necessary ordering constraints.
(I.e. the change of s_frozen is visible to others after up_write)
I'm quite sure the change is harmless but if you are uncomfortable
with Tested-by/Reviewed-by on the modified patch, please remove them.
If MS_RDONLY, freeze_bdev should just up_write(s_umount) instead of
deactivate_locked_super().
Also, keep sb->s_frozen consistent so that remount can check the frozen state.
Otherwise a crash reported here can happen:
http://lkml.org/lkml/2010/1/16/37http://lkml.org/lkml/2010/1/28/53
This patch should be applied for 2.6.32 stable series, too.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Thomas Backlund <tmb@mandriva.org>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The wrong member was compared in the continguousness check.
Acked-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: apply updated fallocate i_size fix
Btrfs: do not try and lookup the file extent when finishing ordered io
Btrfs: Fix oopsen when dropping empty tree.
Btrfs: remove BUG_ON() due to mounting bad filesystem
Btrfs: make error return negative in btrfs_sync_file()
Btrfs: fix race between allocate and release extent buffer.
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFS: Don't clobber the attribute type in nfs_update_inode()
NFS: Fix a umount race
NFS: Fix an Oops when truncating a file
NFS: Ensure that we handle NFS4ERR_STALE_STATEID correctly
NFSv4.1: Don't call nfs4_schedule_state_recovery() unnecessarily
NFSv4: Don't allow posix locking against servers that don't support it
NFSv4: Ensure that the NFSv4 locking can recover from stateid errors
NFS: Avoid warnings when CONFIG_NFS_V4=n
NFS: Make nfs_commitdata_release static
NFS: Try to commit unstable writes in nfs_release_page()
NFS: Fix a reference leak in nfs_wb_cancel_page()
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes:
GFS2: Extend umount wait coverage to full glock lifetime
GFS2: Wait for unlock completion on umount
This version of the i_size fix for fallocate makes sure we only update
the i_size when the current fallocate is really operating outside of
i_size.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
When running the following fio job
[torrent]
filename=torrent-test
rw=randwrite
size=4g
filesize=4g
bs=4k
ioengine=sync
you would see long stalls where no work was being done. That is because we were
doing all this extra work to read in the file extent outside of the transaction,
however in the random io case this ends up hurting us because the file extents
are not there to begin with. So axe this logic, since we end up reading in the
file extent when we go to update it anyway. This took the fio job from 11 mb/s
with several ~10 second stalls to 24 mb/s to a couple of 1-2 second stalls.
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
When dropping a empty tree, walk_down_tree() skips checking
extent information for the tree root. This will triggers a
BUG_ON in walk_up_proc().
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Mounting a bad filesystem caused a BUG_ON(). The following is steps to
reproduce it.
# mkfs.btrfs /dev/sda2
# mount /dev/sda2 /mnt
# mkfs.btrfs /dev/sda1 /dev/sda2
(the program says that /dev/sda2 was mounted, and then exits. )
# umount /mnt
# mount /dev/sda1 /mnt
At the third step, mkfs.btrfs exited in the way of make filesystem. So the
initialization of the filesystem didn't finish. So the filesystem was bad, and
it caused BUG_ON() when mounting it. But BUG_ON() should be called by the wrong
code, not user's operation, so I think it is a bug of btrfs.
This patch fixes it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Increase extent buffer's reference count while holding the lock.
Otherwise it can race with try_release_extent_buffer.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
During recovery, the dlm frees the locks for the dead node. If it finds a
lock in a resource for the dead node, it expects that node to also have a
ref in that lock resource. If not, it BUGs.
ossbz#1175 was filed with the above BUG. Now, while it is correct that we
should be expecting the ref, I see no reason why we have to BUG. After all,
we are freeing up the lock and clearing the ref.
This patch replaces the BUG_ON with a printk(). Hopefully, that will give
us more clues next time this happens.
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1175
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
This patch plugs a race between the downconvert thread and an unlock ast message.
Specifically, after the downconvert worker has done its task, the dc thread needs
to check whether an unlock ast made the downconvert moot.
Reported-by: David Teigland <teigland@redhat.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Acked-by: Mark Fasheh <mfasheh@sus.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
If the NFS_ATTR_FATTR_TYPE field isn't set in fattr->valid, then we should
not set the S_IFMT part of inode->i_mode.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Ensure that we unregister the bdi before kill_anon_super() calls
ida_remove() on our device name.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org