xfs_mount_free mostly frees the perag data, which is something that is
duplicated in the mount error path.
Move the XFS_QM_DONE call to the caller and remove the useless
mutex_destroy/spinlock_destroy calls so that we can re-use it for the
mount error path. Also rename it to xfs_free_perag to reflect what it
does.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31836a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
xfs_readsb is called before xfs_mount so xfs_freesb should be called after
xfs_unmountfs, too. This means it now happens after a few things during
the of xfs_unmount which all have nothing to do with the superblock.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31835a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
xfs_unmounts can't and shouldn't return errors so declare it as returning
void.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31833a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Remove all the useless flags and code keyed off it in xfs_mountfs.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31831a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
The root inode is allocated in xfs_mountfs so it should be release in
xfs_unmountfs. For the unmount case that means we do it after the the
xfs_sync(mp, SYNC_WAIT | SYNC_CLOSE) in the forced shutdown case and the
dmapi unmount event. Note that both reference the rip variable which might
be freed by that time in case inode flushing has kicked in, so strictly
speaking this might count as a bug fix
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31830a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
xfs_ichtime updates the xfs_inode and Linux inode timestamps just fine, no
need to call file_update_time and then copy the values over to the XFS
inode. The only additional thing in file_update_time are checks not
applicable to the write path.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31829a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
Port a little optmization from file_update_time to xfs_ichgtime, and only
update the timestamp and mark the inode dirty if the timestamp actually
changes in the timer tick resultion supported by the running kernel.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31827a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
In xfs_ialloc we just want to set all timestamps to the current time. We
don't need to mark the inode dirty like xfs_ichgtime does, and we don't
need nor want the opimizations in xfs_ichgtime that I will introduce in
the next patch.
So just opencode the timestamp update in xfs_ialloc, and remove the new
unused XFS_ICHGTIME_ACC case in xfs_ichgtime.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31825a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Now that all users of the sema_t are gone from XFS we can finally kill it.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31823a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Use the new completion flush code to implement the dquot flush lock.
Removes one of the final users of semaphores in the XFS code base.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31822a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Use the new completion flush code to implement the inode flush lock.
Removes one of the final users of semaphores in the XFS code base.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31817a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
The xfs_buf_t b_iodonesema is really just a semaphore that wants to be a
completion. Change it to a completion and remove the last user of the
sema_t from XFS.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31815a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
A lot of code has been converted away from semaphores, but there are still
comments that reference semaphore behaviour. The log code is the worst
offender. Update the comments to reflect what the code really does now.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31814a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
The alloc and inobt btree use the same agbp/agno pair in the btree_cur
union. Make them use the same bc_private.a union member so that code for
these two short form btree implementations can be shared.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31788a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Sanitize setting up the Linux indode.
Setting up the xfs_inode <-> inode link is opencoded in xfs_iget_core now
because that's the only place it needs to be done, xfs_initialize_vnode is
renamed to xfs_setup_inode and loses all superflous paramaters. The check
for I_NEW is removed because it always is true and the di_mode check moves
into xfs_iget_core because it's only needed there.
xfs_set_inodeops and xfs_revalidate_inode are merged into xfs_setup_inode
and the whole things is moved into xfs_iops.c where it belongs.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31782a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
All remaining bhv_vnode_t instance are in code that's more or less Linux
specific. (Well, for xfs_acl.c that could be argued, but that code is on
the removal list, too). So just do an s/bhv_vnode_t/struct inode/ over the
whole tree. We can clean up variable naming and some useless helpers
later.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31781a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
In various places we can just move a VFS_I call into the argument list of
called functions/macros instead of having a local bhv_vnode_t.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31776a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
When multiple inodes are locked in XFS it happens in order of the inode
number, with the everything but the first inode trylocked if any of the
previous inodes is in the AIL.
Except for the sorting of the inodes this logic is implemented in
xfs_lock_inodes, but also partially duplicated in xfs_lock_dir_and_entry
in a particularly stupid way adds a lock roundtrip if the inode ordering
is not optimal.
This patch adds a new helper xfs_lock_two_inodes that takes two inodes and
locks them in the most optimal way according to the above locking protocol
and uses it for all places that want to lock two inodes.
The only caller of xfs_lock_inodes is xfs_rename which might lock up to
four inodes.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31772a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
All the error injection is already enabled through ifdef DEBUG, so kill
the never set second cpp symbol to activate it without the rest of the
debugging infrastructure.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31771a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Now that all direct calls to VN_HOLD/VN_RELE are gone we can implement
IHOLD/IRELE directly.
For the IHOLD case also replace igrab with a direct increment of i_count
because we are guaranteed to already have a live and referenced inode by
the VFS. Also remove the vn_hold statistic because it's been rather
meaningless for some time with most references done by other callers.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31764a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
All the ACL routines are called from inode operations which are guaranteed
to have a referenced inode by the VFS, so there's no need for the ACL code
to grab another temporary one.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31763a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
bhv_vnode_t is just a typedef for struct inode, so there's
no need for a helper to convert between the two.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31761a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
bhv_vnode_t is just a typedef for struct inode, so there's
no need for a helper to convert between the two.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31760a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Looks like somehow xfs got missed in the conversion that took place in
e231c2ee64, "Convert ERR_PTR(PTR_ERR(p))
instances to ERR_CAST(p)
<http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit
diff;h=e231c2ee64eb1c5cd3c63c31da9dac7d888dcf7f>"
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31757a
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Thanks to hch's endian work, INT_GET etc are no longer used, and may as
well be removed. INT_SET is still used in the acl code, though.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31756a
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Move it from the attr code to the transaction code and make
the attr code call the new function.
We rolltrans is really usefull whenever we want to use rolling
transaction, should be generic, it isn't dependent on any part
of the attr code anyway.
We use this excuse to change all the:
if ((error = xfs_attr_rolltrans()))
calls into:
error = xfs_trans_roll();
if (error)
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31729a
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Add a helper to free the m_fsname/m_rtname/m_logname allocations and use
it properly for all mount failure cases. Also switch the allocations for
these to kstrdup while we're at it.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31728a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
We will need that to be able to calculate the size of log we need for a
specific attr (for Create+EA). The local flag is needed so that we can
fail if we run into ENOSPC when trying to alloc blocks.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31727a
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
If we allow incore extent tree allocations to recurse into the
filesystem under memory pressure, new delayed allocations through
xfs_iomap_write_delay() can deadlock on themselves if memory
reclaim tries to write back dirty pages from that inode.
It will deadlock in xfs_iomap_write_allocate() trying to take the
ilock we already hold. This can also show up as complex ABBA deadlocks
when multiple threads are triggering memory reclaim when trying to
allocate extents.
The main cause of this is the fact that delayed allocation is not done in
a transaction, so KM_NOFS is not automatically added to the allocations to
prevent this recursion.
Mark all allocations done for the incore inode extent tree as KM_NOFS to
ensure they never recurse back into the filesystem.
Version 2: o KM_NOFS implies KM_SLEEP, so just use KM_NOFS
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31726a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
xfs_vtoi() is redundant and only unsed in small sections of code.
Replace them with widely used XFS_I() inline and kill xfs_vtoi().
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31725a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
In several places we directly convert from the XFS inode
to the linux (VFS) inode by a simple deference of ip->i_vnode.
We should not do this - a helper function should be used to
extract the VFS inode from the XFS inode.
Introduce the function VFS_I() to extract the VFS inode
from the XFS inode. The name was chosen to match XFS_I() which
is used to extract the XFS inode from the VFS inode.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31720a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
We should not access a buffer after dropping it's reference count
otherwise we could race with another thread that releases the final
reference count and frees the buffer causing us to access potentially
unmapped memory. The bug this change fixes only occured on DEBUG XFS since
the offending code was in an ASSERT.
SGI-PV: 984429
SGI-Modid: xfs-linux-melb:xfs-kern:31715a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
This keeps xfs_lowbit64 as it was since there aren't good generic helpers
there ... Patch inspired by Andi Kleen.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31472a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Short enough reads from /proc/irq/*/smp_affinity return -EINVAL for no
good reason.
This became noticed with NR_CPUS=4096 patches, when length of printed
representation of cpumask becase 1152, but cat(1) continued to read with
1024-byte chunks. bitmap_scnprintf() in good faith fills buffer, returns
1023, check returns -EINVAL.
Fix it by switching to seq_file, so handler will just fill buffer and
doesn't care about offsets, length, filling EOF and all this crap.
For that add seq_bitmap(), and wrappers around it -- seq_cpumask() and
seq_nodemask().
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Paul Jackson <pj@sgi.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This provides a FLAT_PLAT_INIT() arch hook for platforms that need to set
up specific register state prior to calling in to the process, as per
ELF_PLAT_INIT().
Signed-off-by: Takashi YOSHII <yoshii.takashi@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
the names were too generic:
drivers/uio/uio.c:87: error: expected identifier or '(' before 'do'
drivers/uio/uio.c:87: error: expected identifier or '(' before 'while'
drivers/uio/uio.c:113: error: 'map_release' undeclared here (not in a function)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Most the free-standing lock_acquire() usages look remarkably similar, sweep
them into a new helper.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
[CIFS] list entry can not return null
turn cifs_setattr into a multiplexor that calls the correct function
move file time and dos attribute setting logic into new function
spin off cifs_setattr with unix extensions to its own function
[CIFS] Code cleanup in old sessionsetup code
[CIFS] cifs_mkdir and cifs_create should respect the setgid bit on parent dir
Rename CIFSSMBSetFileTimes to CIFSSMBSetFileInfo and add PID arg
change CIFSSMBSetTimes to CIFSSMBSetPathInfo
[CIFS] fix trailing whitespace
bundle up Unix SET_PATH_INFO args into a struct and change name
Fix missing braces in cifs_revalidate()
remove locking around tcpSesAllocCount atomic variable
[CIFS] properly account for new user= field in SPNEGO upcall string allocation
[CIFS] remove level of indentation from decode_negTokenInit
[CIFS] cifs send2 not retrying enough in some cases on full socket
[CIFS] oid should also be checked against class in cifs asn
There doesn't seem to be a compelling reason why nfsd4_op_name() is
marked as "inline":
It's only used in a dprintk(), and as long as it has only one caller
non-ancient gcc versions anyway inline it automatically.
This patch fixes the following compile error with gcc 3.4:
...
CC fs/nfsd/nfs4proc.o
nfs4proc.c: In function `nfsd4_proc_compound':
nfs4proc.c:854: sorry, unimplemented: inlining failed in call to
nfs4proc.c:897: sorry, unimplemented: called from here
make[3]: *** [fs/nfsd/nfs4proc.o] Error 1
Reported-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
[ Also made it "const char *" - Linus]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Break up cifs_setattr further by moving the logic that sets file times
and dos attributes into a separate function. This patch also refactors
the logic a bit so that when the file is already open then we go ahead
and do a SetFileInfo call. SetPathInfo seems to be unreliable when
setting times on open files.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Create a new cifs_setattr_unix function to handle a setattr when unix
extensions are enabled and have cifs_setattr call it. Also, clean up
variable declarations in cifs_setattr.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Since introduced in 7ba1ba12ee, it should be made use of.
Signed-off-by: Denis ChengRq <crquan@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
If a server supports unix extensions but does not support POSIX create
routines, then the client will create a new inode with a standard SMB
mkdir or create/open call and then will set the mode. When it does this,
it does not take the setgid bit on the parent directory into account.
This patch has CIFS flip on the setgid bit when the parent directory has
it. If the share is mounted with "setuids" then also change the group
owner to the gid of the parent.
This patch should apply cleanly on top of the setattr cleanup patches
that I sent a few weeks ago.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
The new name is more clear since this is also used to set file
attributes. We'll need the pid_of_opener arg so that we can
pass in filehandles of other pids and spare ourselves an open
call.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
CIFSSMBSetTimes is a deceptive name. This function does more that just
set file times. Change it to CIFSSMBSetPathInfo, which is closer to its
real purpose.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
We'd like to be able to use the unix SET_PATH_INFO_BASIC args to set
file times as well, but that makes the argument list rather long. Bundle
up the args for unix SET_PATH_INFO call into a struct. For now, we don't
actually use the times fields anywhere. That will be done in a follow-on
patch.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
proc: fix warnings
fs/proc/base.c:2429: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'u64'
fs/proc/base.c:2429: warning: format '%llu' expects type 'long long unsigned int', but argument 4 has type 'u64'
fs/proc/base.c:2429: warning: format '%llu' expects type 'long long unsigned int', but argument 5 has type 'u64'
fs/proc/base.c:2429: warning: format '%llu' expects type 'long long unsigned int', but argument 6 has type 'u64'
fs/proc/base.c:2429: warning: format '%llu' expects type 'long long unsigned int', but argument 7 has type 'u64'
fs/proc/base.c:2429: warning: format '%llu' expects type 'long long unsigned int', but argument 8 has type 'u64'
fs/proc/base.c:2429: warning: format '%llu' expects type 'long long unsigned int', but argument 9 has type 'u64'
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Acked-by: Andrea Righi <righi.andrea@gmail.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fs/omfs/inode.c:495: warning: format '%llx' expects type 'long long
unsigned int', but argument 2 has type 'u64'
fs/omfs/inode.c:495: warning: format '%llx' expects type 'long
long unsigned int', but argument 3 has type '__be64'
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Acked-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix missing braces introduced during commit
cea218054a. Though setting wbrc to 0
keeps this from causing real bug, this should have been there.
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Like the page lock change, this also requires name change, so convert the
raw test_and_set bitop to a trylock.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Converting page lock to new locking bitops requires a change of page flag
operation naming, so we might as well convert it to something nicer
(!TestSetPageLocked_Lock => trylock_page, SetPageLocked => set_page_locked).
This also facilitates lockdeping of page lock.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit f9247273cb (and
fb2e405fc1 - "fix fs/nfs/nfsroot.c
compilation" - that fixed a missed conversion).
The changes cause problems for at least the sparc build. Let's re-do
them when the exact issues are resolved.
Requested-by: Andrew Morton <akpm@linux-foundation.org>
Requested-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The global tcpSesAllocCount variable is an atomic already and doesn't
really need the extra locking around it. Remove the locking and just use
the atomic_inc_return and atomic_dec_return functions to make sure we
access it correctly.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: remove write-only variables from ext4_ordered_write_end
ext4: unexport jbd2_journal_update_superblock
ext4: Cleanup whitespace and other miscellaneous style issues
ext4: improve ext4_fill_flex_info() a bit
ext4: Cleanup the block reservation code path
ext4: don't assume extents can't cross block groups when truncating
ext4: Fix lack of credits BUG() when deleting a badly fragmented inode
ext4: Fix ext4_ext_journal_restart()
ext4: fix ext4_da_write_begin error path
jbd2: don't abort if flushing file data failed
ext4: don't read inode block if the buffer has a write error
ext4: Don't allow lg prealloc list to be grow large.
ext4: Convert the usage of NR_CPUS to nr_cpu_ids.
ext4: Improve error handling in mballoc
ext4: lock block groups when initializing
ext4: sync up block and inode bitmap reading functions
ext4: Allow read/only mounts with corrupted block group checksums
ext4: Fix data corruption when writing to prealloc area
The variables 'from' and 'to' are not used anywhere.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
FAT has to handle the newly introduced ATTR_TIMES_SET for allow_utime
option.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-pull: (64 commits)
[XFS] Remove vn_revalidate calls in xfs.
[XFS] Now that xfs_setattr is only used for attributes set from ->setattr
[XFS] xfs_setattr currently doesn't just handle the attributes set through
[XFS] fix use after free with external logs or real-time devices
[XFS] A bug was found in xfs_bmap_add_extent_unwritten_real(). In a
[XFS] fix compilation without CONFIG_PROC_FS
[XFS] s/XFS_PURGE_INODE/IRELE/g s/VN_HOLD(XFS_ITOV())/IHOLD()/
[XFS] fix mount option parsing in remount
[XFS] Disable queue flag test in barrier check.
[XFS] streamline init/exit path
[XFS] Fix up problem when CONFIG_XFS_POSIX_ACL is not set and yet we still
[XFS] Don't assert if trying to mount with blocksize > pagesize
[XFS] Don't update mtime on rename source
[XFS] Allow xfs_bmbt_split() to fallback to the lowspace allocator
[XFS] Restore the lowspace extent allocator algorithm
[XFS] use minleft when allocating in xfs_bmbt_split()
[XFS] attrmulti cleanup
[XFS] Check for invalid flags in xfs_attrlist_by_handle.
[XFS] Fix CI lookup in leaf-form directories
[XFS] Use the generic xattr methods.
...
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
[PATCH] ocfs2: Release mutex in error handling code
[PATCH] ocfs2: Fix oops when racing files truncates with writes into an mmap region
[PATCH 2/2] ocfs2: Fix race between mount and recovery
[PATCH 1/2] ocfs2: Add counter in struct ocfs2_dinode to track journal replays
[PATCH] configfs: Convenience macros for attribute definition.
[PATCH] configfs: Pin configfs subsystems separately from new config_items.
[PATCH] configfs: Fix open directory making rmdir() fail
[PATCH] configfs: Lock new directory inodes before removing on cleanup after failure
[PATCH] configfs: Prevent userspace from creating new entries under attaching directories
[PATCH] configfs: Fix failing symlink() making rmdir() fail
[PATCH] configfs: Fix symlink() to a removing item
[PATCH] configfs: Include linux/err.h in linux/configfs.h
* git://git.infradead.org/mtd-2.6:
[MTD] [NAND] drivers/mtd/nand/nandsim.c: fix printk warnings
[MTD] [NAND] Blackfin NFC Driver: Cleanup the error exit path of bf5xx_nand_probe function
[MTD] [NAND] Blackfin NFC Driver: use standard dev_err() rather than printk()
[MTD] [NAND] Blackfin NFC Driver: enable Blackfin nand HWECC support by default
[MTD] [NAND] Blackfin NFC Driver: add proper devinit/devexit markings to probe/remove functions
[MTD] [NAND] Blackfin NFC Driver: add support for the ECC layout the Blackfin bootrom uses
[MTD] [NAND] Blackfin NFC Driver: fix bug - hw ecc calc by making sure we extract 11 bits from each register instead of 10
[MTD] [NAND] Blackfin NFC Driver: fix bug - do not clobber the status from the first 256 bytes if operating on 512 pages
[MTD] [NAND] diskonchip.c fix sparse endian warnings
[MTD] [NAND] drivers/mtd/nand/nandsim.c needs div64.h
[JFFS2] Fix allocation of summary buffer
Fix rename of at91_nand -> atmel_nand
[MTD] [NOR] drivers/mtd/chips/jedec_probe.c: fix Am29DL800BB device ID
[MTD] MTD_DEBUG always does compile-time typechecks
[MTD] DataFlash: bugfix, binary page sizes now handled
[MTD] [NAND] fsl_elbc_nand.c: fix printk warning
[MTD] [NAND] nandsim: support random page read command
[MTD] [NAND] fix subpage read for small page NAND
...it doesn't look like it's being accounted for at the moment. Also
try to reorganize the calculation to make it a little more evident
what each piece means.
This should probably go to the stable series as well...
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
proc doesn't use "associate pointer with id" feature of IDR, so switch
to IDA.
NOTE, NOTE, NOTE:
Do not apply if release_inode_number() still mantions MAX_ID_MASK!
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Id which proc gets from IDR for inode number and id which proc removes
from IDR do not match. E.g. 0x11a transforms into 0x8000011a.
Which stayed unnoticed for a long time because, surprise, idr_remove()
masks out that high bit before doing anything.
All of this due to "| ~MAX_ID_MASK" in release_inode_number().
I still don't understand how it's supposed to work, because "| ~MASK"
is not an inversion for "& MAX" operation.
So, use just one nice, working addition. Make start offset unsigned int,
while I'm at it. It's longness is not used anywhere.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* new helper: vfs_quota_on_path(); equivalent of vfs_quota_on() sans the
pathname resolution.
* callers of vfs_quota_on() that do their own pathname resolution and
checks based on it are switched to vfs_quota_on_path(); that way we
avoid the races.
* reiserfs leaked dentry/vfsmount references on several failure exits.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
New primitive: alloc_fd(start, flags). get_unused_fd() and
get_unused_fd_flags() become wrappers on top of it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
for July 17: early crash on x86-64)
SELinux needs MAY_APPEND to be passed down to the security hook.
Otherwise, we get permission denials when only append permission is
granted by policy even if the opening process specified O_APPEND.
Shows up as a regression in the ltp selinux testsuite, fixed by
this patch.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We can't use vmalloc for the buffer we use for writing summaries,
because some drivers may want to DMA from it. So limit the size to 64KiB
and use kmalloc for it instead.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The mutex is released on a successful return, so it would seem that it
should be released on an error return as well.
The semantic patch finds this problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
expression l;
@@
mutex_lock(l);
... when != mutex_unlock(l)
when any
when strict
(
if (...) { ... when != mutex_unlock(l)
+ mutex_unlock(l);
return ...;
}
|
mutex_unlock(l);
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
This patch fixes an oops that is reproduced when one races writes to a mmap-ed
region with another process truncating the file.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
As the fs recovery is asynchronous, there is a small chance that another
node can mount (and thus recover) the slot before the recovery thread
gets to it.
If this happens, the recovery thread will block indefinitely on the
journal/slot lock as that lock will be held for the duration of the mount
(by design) by the node assigned to that slot.
The solution implemented is to keep track of the journal replays using
a recovery generation in the journal inode, which will be incremented by the
thread replaying that journal. The recovery thread, before attempting the
blocking lock on the journal/slot lock, will compare the generation on disk
with what it has cached and skip recovery if it does not match.
This bug appears to have been inadvertently introduced during the mount/umount
vote removal by mainline commit 34d024f843. In the
mount voting scheme, the messaging would indirectly indicate that the slot
was being recovered.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
This patch renames the ij_pad to ij_recovery_generation in struct ocfs2_dinode.
This will be used to keep count of journal replays after an unclean shutdown.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
configfs_mkdir() creates a new item by calling its parent's
->make_item/group() functions. Once that object is created,
configfs_mkdir() calls try_module_get() on the new item's module. If it
succeeds, the module owning the new item cannot be unloaded, and
configfs is safe to reference the item.
If the item and the subsystem it belongs to are part of the same module,
the subsystem is also pinned. This is the common case.
However, if the subsystem is made up of multiple modules, this may not
pin the subsystem. Thus, it would be possible to unload the toplevel
subsystem module while there is still a child item. Thus, we now
try_module_get() the subsystem's module. This only really affects
children of the toplevel subsystem group. Deeper children already have
their parents pinned.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
When checking for user-created elements under an item to be removed by rmdir(),
configfs_detach_prep() counts fake configfs_dirents created by dir_open() as
user-created and fails when finding one. It is however perfectly valid to remove
a directory that is open.
Simply make configfs_detach_prep() skip fake configfs_dirent, like it already
does for attributes, and like detach_groups() does.
Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Once a new configfs directory is created by configfs_attach_item() or
configfs_attach_group(), a failure in the remaining initialization steps leads
to removing a directory which inode the VFS may have already accessed.
This commit adds the necessary inode locking to safely remove configfs
directories while cleaning up after a failure. As an advantage, the locking
rules of populate_groups() and detach_groups() become the same: the caller must
have the group's inode mutex locked.
Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
process 1: process 2:
configfs_mkdir("A")
attach_group("A")
attach_item("A")
d_instantiate("A")
populate_groups("A")
mutex_lock("A")
attach_group("A/B")
attach_item("A")
d_instantiate("A/B")
mkdir("A/B/C")
do_path_lookup("A/B/C", LOOKUP_PARENT)
ok
lookup_create("A/B/C")
mutex_lock("A/B")
ok
configfs_mkdir("A/B/C")
ok
attach_group("A/C")
attach_item("A/C")
d_instantiate("A/C")
populate_groups("A/C")
mutex_lock("A/C")
attach_group("A/C/D")
attach_item("A/C/D")
failure
mutex_unlock("A/C")
detach_groups("A/C")
nothing to do
mkdir("A/C/E")
do_path_lookup("A/C/E", LOOKUP_PARENT)
ok
lookup_create("A/C/E")
mutex_lock("A/C")
ok
configfs_mkdir("A/C/E")
ok
detach_item("A/C")
d_delete("A/C")
mutex_unlock("A")
detach_groups("A")
mutex_lock("A/B")
detach_group("A/B")
detach_groups("A/B")
nothing since no _default_ group
detach_item("A/B")
mutex_unlock("A/B")
d_delete("A/B")
detach_item("A")
d_delete("A")
Two bugs:
1/ "A/B/C" and "A/C/E" are created, but never removed while their parent are
removed in the end. The same could happen with symlink() instead of mkdir().
2/ "A" and "A/C" inodes are not locked while detach_item() is called on them,
which may probably confuse VFS.
This commit fixes 1/, tagging new directories with CONFIGFS_USET_CREATING before
building the inode and instantiating the dentry, and validating the whole
group+default groups hierarchy in a second pass by clearing
CONFIGFS_USET_CREATING.
mkdir(), symlink(), lookup(), and dir_open() simply return -ENOENT if
called in (or linking to) a directory tagged with CONFIGFS_USET_CREATING. This
does not prevent userspace from calling stat() successfuly on such directories,
but this prevents userspace from adding (children to | symlinking from/to |
read/write attributes of | listing the contents of) not validated items. In
other words, userspace will not interact with the subsystem on a new item until
the new item creation completes correctly.
It was first proposed to re-use CONFIGFS_USET_IN_MKDIR instead of a new
flag CONFIGFS_USET_CREATING, but this generated conflicts when checking the
target of a new symlink: a valid target directory in the middle of attaching
a new user-created child item could be wrongly detected as being attached.
2/ is fixed by next commit.
Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
On a similar pattern as mkdir() vs rmdir(), a failing symlink() may make rmdir()
fail for the symlink's parent and the symlink's target as well.
failing symlink() making target's rmdir() fail:
process 1: process 2:
symlink("A/S" -> "B")
allow_link()
create_link()
attach to "B" links list
rmdir("B")
detach_prep("B")
error because of new link
configfs_create_link("A", "S")
error (eg -ENOMEM)
failing symlink() making parent's rmdir() fail:
process 1: process 2:
symlink("A/D/S" -> "B")
allow_link()
create_link()
attach to "B" links list
configfs_create_link("A/D", "S")
make_dirent("A/D", "S")
rmdir("A")
detach_prep("A")
detach_prep("A/D")
error because of "S"
create("S")
error (eg -ENOMEM)
We cannot use the same solution as for mkdir() vs rmdir(), since rmdir() on the
target cannot wait on the i_mutex of the new symlink's parent without risking a
deadlock (with other symlink() or sys_rename()). Instead we define a global
mutex protecting all configfs symlinks attachment, so that rmdir() can avoid the
races above.
Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
The rule for configfs symlinks is that symlinks always point to valid
config_items, and prevent the target from being removed. However,
configfs_symlink() only checks that it can grab a reference on the target item,
without ensuring that it remains alive until the symlink is correctly attached.
This patch makes configfs_symlink() fail whenever the target is being removed,
using the CONFIGFS_USET_DROPPING flag set by configfs_detach_prep() and
protected by configfs_dirent_lock.
This patch introduces a similar (weird?) behavior as with mkdir failures making
rmdir fail: if symlink() races with rmdir() of the parent directory (or its
youngest user-created ancestor if parent is a default group) or rmdir() of the
target directory, and then fails in configfs_create(), this can make the racing
rmdir() fail despite the concerned directory having no user-created entry (resp.
no symlink pointing to it or one of its default groups) in the end.
This behavior is fixed in later patches.
Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
We now use PTR_ERR() in the ->make_item() and ->make_group() operations.
Folks including configfs.h need err.h.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>