When building v2.6.31-rc2-344-g69ca06c, the following build errors are
found due to missing includes:
CC [M] fs/fuse/dev.o
fs/fuse/dev.c: In function ‘request_end’:
fs/fuse/dev.c:289: error: ‘BLK_RW_SYNC’ undeclared (first use in this function)
...
fs/nfs/write.c: In function ‘nfs_set_page_writeback’:
fs/nfs/write.c:207: error: ‘BLK_RW_ASYNC’ undeclared (first use in this function)
Signed-off-by: Larry Finger@lwfinger.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
nilfs2: fix disorder in cp count on error during deleting checkpoints
nilfs2: fix lockdep warning between regular file and inode file
nilfs2: fix incorrect KERN_CRIT messages in case of write failures
nilfs2: fix hang problem of log writer which occurs after write failures
nilfs2: remove unlikely directive causing mis-conversion of error code
I overlooked SG_DXFER_TO_FROM_DEV support when I converted sg to use
the block layer mapping API (2.6.28).
Douglas Gilbert explained SG_DXFER_TO_FROM_DEV:
http://www.spinics.net/lists/linux-scsi/msg37135.html
=
The semantics of SG_DXFER_TO_FROM_DEV were:
- copy user space buffer to kernel (LLD) buffer
- do SCSI command which is assumed to be of the DATA_IN
(data from device) variety. This would overwrite
some or all of the kernel buffer
- copy kernel (LLD) buffer back to the user space.
The idea was to detect short reads by filling the original
user space buffer with some marker bytes ("0xec" it would
seem in this report). The "resid" value is a better way
of detecting short reads but that was only added this century
and requires co-operation from the LLD.
=
This patch changes the block layer mapping API to support this
semantics. This simply adds another field to struct rq_map_data and
enables __bio_copy_iov() to copy data from user space even with READ
requests.
It's better to add the flags field and kills null_mapped and the new
from_user fields in struct rq_map_data but that approach makes it
difficult to send this patch to stable trees because st and osst
drivers use struct rq_map_data (they were converted to use the block
layer in 2.6.29 and 2.6.30). Well, I should clean up the block layer
mapping API.
zhou sf reported this regiression and tested this patch:
http://www.spinics.net/lists/linux-scsi/msg37128.htmlhttp://www.spinics.net/lists/linux-scsi/msg37168.html
Reported-by: zhou sf <sxzzsf@gmail.com>
Tested-by: zhou sf <sxzzsf@gmail.com>
Cc: stable@kernel.org
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Commit 1faa16d228 accidentally broke
the bdi congestion wait queue logic, causing us to wait on congestion
for WRITE (== 1) when we really wanted BLK_RW_ASYNC (== 0) instead.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
cifs: when ATTR_READONLY is set, only clear write bits on non-directories
cifs: remove cifsInodeInfo->inUse counter
cifs: convert cifs_get_inode_info and non-posix readdir to use cifs_iget
[CIFS] update cifs version number
cifs: add and use CIFSSMBUnixSetFileInfo for setattr calls
cifs: make a separate function for filling out FILE_UNIX_BASIC_INFO
cifs: rename CIFSSMBUnixSetInfo to CIFSSMBUnixSetPathInfo
cifs: add pid of initiating process to spnego upcall info
cifs: fix regression with O_EXCL creates and optimize away lookup
cifs: add new cifs_iget function and convert unix codepath to use it
cifs: when ATTR_READONLY is set, only clear write bits on non-directories
On windows servers, ATTR_READONLY apparently either has no meaning or
serves as some sort of queue to certain applications for unrelated
behavior. This MS kbase article has details:
http://support.microsoft.com/kb/326549/
Don't clear the write bits directory mode when ATTR_READONLY is set.
Reported-by: pouchat@peewiki.net
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs: remove cifsInodeInfo->inUse counter
It was purported to be a refcounter of some sort, but was never
used that way. It never served any purpose that wasn't served equally well
by the I_NEW flag.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs: convert cifs_get_inode_info and non-posix readdir to use cifs_iget
Rather than allocating an inode and filling it out, have
cifs_get_inode_info fill out a cifs_fattr and call cifs_iget. This means
a pretty hefty reorganization of cifs_get_inode_info.
For the readdir codepath, add a couple of new functions for filling out
cifs_fattr's from different FindFile response infolevels.
Finally, remove cifs_new_inode since there are no more callers.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs: add and use CIFSSMBUnixSetFileInfo for setattr calls
When there's an open filehandle, SET_FILE_INFO is apparently preferred
over SET_PATH_INFO. Add a new variant that sets a FILE_UNIX_INFO_BASIC
infolevel via SET_FILE_INFO and switch cifs_setattr_unix to use the
new call when there's an open filehandle available.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs: make a separate function for filling out FILE_UNIX_BASIC_INFO
The SET_FILE_INFO variant will need to do the same thing here. Break
this code out into a separate function that both variants can call.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs: rename CIFSSMBUnixSetInfo to CIFSSMBUnixSetPathInfo
...in preparation of adding a SET_FILE_INFO variant.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs: add pid of initiating process to spnego upcall info
This will allow the upcall to poke in /proc/<pid>/environ and get
the value of the $KRB5CCNAME env var for the process.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
In the 'ubifs_recover_leb()' function, when we find corrupted
empty space, we dump 8K starting from the offset where the last
node ends. This is OK if the corrupted empty space is somewhere
near that offset. But if the corruption is far at the end of the
LEB, we will dump all 0xFF bytes and complitely ignore the
interesting data. This is observed on a PPC ("kilauea") with
NOR flash.
This patch changes the behavior and teaches UBIFS to print only
interesting data. I.e., now we find where corruption starts and
start dumping from that offset.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Adrian Hunter <Adrian.Hunter@nokia.com>
recovery.c has 'is_empty()' helper and it is better to use
this helper instead of re-implementing it in several places.
This patch does this and removes some amount of unneeded code.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Adrian Hunter <Adrian.Hunter@nokia.com>
This patch fixes few minor things I've spotted while going through
code:
1. Better document return codes
2. If 'ubifs_scan_a_node()' returns some thing we do not expect,
treat this as an error.
3. Try to do recovery only when 'ubifs_scan()' returns %-EUCLEAN,
not on any error.
4. If empty space starts at a non-aligned address, print a message.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Adrian Hunter <Adrian.Hunter@nokia.com>
In case of corruptions, dump 8192 bytes instead of 4096. The
largest node is 4096+ bytes, so it is better to see a node
boundary, which is not always possible when only 4096 bytes
are printed.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Adrian Hunter <Adrian.Hunter@nokia.com>
Commit 5fd29d6ccb ("printk: clean up
handling of log-levels and newlines") changed printk semantics. printk
lines with multiple KERN_<level> prefixes are no longer emitted as
before the patch.
<level> is now included in the output on each additional use.
Remove all uses of multiple KERN_<level>s in formats.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 1c8542c7bb replaced kmalloc() with memdup_user() in the write()
function but also dropped the kfree(temp). The memdup_user() function
allocates memory which is never freed.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Parag Warudkar <parag.warudkar@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix various silly problems wrt mnt_namespace.h:
- exit_mnt_ns() isn't used, remove it
- done that, sched.h and nsproxy.h inclusions aren't needed
- mount.h inclusion was need for vfsmount_lock, but no longer
- remove mnt_namespace.h inclusion from files which don't use anything
from mnt_namespace.h
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The following test script triggers a deadlock on ext2 filesystem:
while true; do quotaon /dev/hda >&/dev/null; usleep $RANDOM; done &
while true; do quotaoff /dev/hda >&/dev/null; usleep $RANDOM; done &
I found there is a potential deadlock between quotaon and quotaoff (or
quotasync). Basically, all of quotactl operations need to be protected by
dqonoff_mutex. vfs_quota_off and vfs_quota_sync also call sb->s_op->quota_write
that needs to grab the i_mutex of the quota file. But in vfs_quota_on_inode
(called from quotaon operation), the current code tries to grab the i_mutex of
the quota file first before getting quonoff_mutex.
Reverse the order in which we take locks in vfs_quota_on_inode().
Jan Kara: Changed changelog to be more readable, made lockdep happy with
I_MUTEX_QUOTA.
Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
do_execve() and ptrace_attach() return -EINTR if
mutex_lock_interruptible(->cred_guard_mutex) fails.
This is not right, change the code to return ERESTARTNOINTR.
Perhaps we should also change proc_pid_attr_write().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I run many ffsb test cases on JBODs (typically 13/12 disks). Comparing
with kernel 2.6.30, 2.6.31-rc1 has about 16% regression with
ffsb_create_4k. The sub test case creates files continuously for 10
minitues and every file is 1MB.
Bisect located below patch.
5cee5815d1 is first bad commit
commit 5cee5815d1
Author: Jan Kara <jack@suse.cz>
Date: Mon Apr 27 16:43:51 2009 +0200
vfs: Make sys_sync() use fsync_super() (version 4)
It is unnecessarily fragile to have two places (fsync_super() and do_sync())
doing data integrity sync of the filesystem. Alter __fsync_super() to
accommodate needs of both callers and use it. So after this patch
__fsync_super() is the only place where we gather all the calls needed to
properly send all data on a filesystem to disk.
As a matter of fact, ffsb calls sys_sync in the end to make sure all data
is flushed to disks and the flushing is counted into the result. vmstat
shows ffsb is blocked when syncing for a long time. With 2.6.30, ffsb is
blocked for a short time.
I checked the patch and did experiments to recover the original methods.
Eventually, the root cause is the patch deletes the calling to
wakeup_pdflush when syncing, so only ffsb is blocked on disk I/O.
wakeup_pdflush could ask pdflush to write back pages with ffsb at the
same time.
[akpm@linux-foundation.org: restore comment too]
Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
UBIFS uses a bdi device per volume, but does not care to hand out unique
names to each of them. This causes an error when trying to mount more
than one volumes. Append the UBI volume and device ID to avoid that.
[Amended a bit by Artem Bityutskiy]
Signed-off-by: Daniel Mack <daniel@caiaq.de>
Cc: Artem Bityutskiy <dedekind@infradead.org>
Cc: Adrian Hunter <ext-adrian.hunter@nokia.com>
Cc: linux-mtd@lists.infradead.org
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
When debugging is enabled and an unclean file-system is mounter,
the following assertion is triggered:
UBIFS assert failed in ubifs_tnc_start_commit at 805 (pid 1081)
Call Trace:
[cfaffbd0] [c0006cf8] show_stack+0x44/0x16c (unreliable)
[cfaffc10] [c011b738] ubifs_tnc_start_commit+0xbb8/0xd18
[cfaffc90] [c0112670] do_commit+0x150/0xa44
[cfaffd10] [c0125234] ubifs_rcvry_gc_commit+0xd8/0x544
[cfaffd60] [c0100e9c] ubifs_fill_super+0xe78/0x15f8
[cfaffdf0] [c0102118] ubifs_get_sb+0x20c/0x320
[cfaffe70] [c007f764] vfs_kern_mount+0x58/0xe0
[cfaffe90] [c007f83c] do_kern_mount+0x40/0xf8
[cfaffeb0] [c0095c24] do_mount+0x550/0x758
[cfafff10] [c0095ebc] sys_mount+0x90/0xe0
[cfafff40] [c000ed4c] ret_from_syscall+0x0/0x3c
The reason is that we initialize 'c->min_leb_idx' early, and do
not re-calculate it after journal replay.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch adds the following minor optimization:
1. If write-buffer does not use the timer, indicate it with the
wbuf->no_timer variable, instead of using the wbuf->softlimit
variable. This is better because wbuf->softlimit is of ktime_t
type, and the ktime_to_ns function contains 64-bit multiplication.
2. Do not call the 'hrtimer_cancel()' function for write-buffers
which do not use timers.
3. Do not cancel the timer in 'ubifs_put_super()' because the
synchronization function does this.
This patch also removes a confusing comment.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
1. Make the I/O debugging message print the journal head number.
2. Add prints to timer functions.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Fix the following warning:
fs/ubifs/io.c: In function 'ubifs_wbuf_init':
fs/ubifs/io.c:860: warning: integer overflow in expression
And limit maximum hrtimer delta to ULONG_MAX because the
argument is 'unsigned long'.
Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This fixes a bug that checkpoint count gets wrong on errors when
deleting a series of checkpoints.
The count error is persistent since the checkpoint count is stored on
disk. Some userland programs refer to the count via ioctl, and this
bugfix is needed to prevent malfunction of such programs.
Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: stable@kernel.org
This will fix the following false positive of recursive locking which
lockdep has detected:
=============================================
[ INFO: possible recursive locking detected ]
2.6.30-nilfs #42
---------------------------------------------
nilfs_cleanerd/10607 is trying to acquire lock:
(&bmap->b_sem){++++-.}, at: [<e0d025b7>] nilfs_bmap_lookup_at_level+0x1a/0x74 [nilfs2]
but task is already holding lock:
(&bmap->b_sem){++++-.}, at: [<e0d024e0>] nilfs_bmap_truncate+0x19/0x6a [nilfs2]
other info that might help us debug this:
2 locks held by nilfs_cleanerd/10607:
#0: (&nilfs->ns_segctor_sem){++++.+}, at: [<e0d0d75a>] nilfs_transaction_begin+0xb6/0x10c [nilfs2]
#1: (&bmap->b_sem){++++-.}, at: [<e0d024e0>] nilfs_bmap_truncate+0x19/0x6a [nilfs2]
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Leandro Lucarella gave me a report that nilfs gets stuck after its
write function fails.
The problem turned out to be caused by bugs which leave writeback flag
on pages. This fixes the problem by ensuring to clear the writeback
flag in error path.
Reported-by: Leandro Lucarella <llucax@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: stable@kernel.org
The following error code handling in nilfs_segctor_write() function
wrongly converted negative error codes to a truth value (i.e. 1):
err = unlikely(err) ? : res;
which originaly meant to be
err = err ? : res;
This mis-conversion caused that write or sync functions receive the
unexpected error code. This fixes the bug by removing the unlikely
directive.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: stable@kernel.org
nfsd_open() gets an unrefcounted pointer to the current process's effective
credentials at the top of the function, then calls nfsd_setuser() via
fh_verify() - which may replace and destroy the current process's effective
credentials - and then passes the unrefcounted pointer to dentry_open() - but
the credentials may have been destroyed by this point.
Instead, the value from current_cred() should be passed directly to
dentry_open() as one of its arguments, rather than being cached in a variable.
Possibly fh_verify() should return the creds to use.
This is a regression introduced by
745ca2475a "CRED: Pass credentials through
dentry_open()".
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-and-Verified-By: Steve Dickson <steved@redhat.com>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: fix error message formatting
Btrfs: fix use after free in btrfs_start_workers fail path
Btrfs: honor nodatacow/sum mount options for new files
Btrfs: update backrefs while dropping snapshot
Btrfs: account for space we may use in fallocate
Btrfs: fix the file clone ioctl for preallocated extents
Btrfs: don't log the inode in file_write while growing the file
Make an error msg look nicer by inserting a space between number and word.
Signed-off-by: Hu Tao <hu.taoo@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
worker memory is already freed on one fail path in btrfs_start_workers,
but is still dereferenced. Switch the dereference and kfree.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The btrfs attr patches unconditionally inherited the inode flags field
without honoring nodatacow and nodatasum. This fix makes sure
we properly record the nodatacow/sum mount options in new inodes.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The new backref format has restriction on type of backref item. If a tree
block isn't referenced by its owner tree, full backrefs must be used for the
pointers in it. When a tree block loses its owner tree's reference, backrefs
for the pointers in it should be updated to full backrefs. Current
btrfs_drop_snapshot misses the code that updates backrefs, so it's unsafe for
general use.
This patch adds backrefs update code to btrfs_drop_snapshot. It isn't a
problem in the restricted form btrfs_drop_snapshot is used today, but for
general snapshot deletion this update is required.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Using Eric Sandeen's xfstest for fallocate, you can easily trigger a ENOSPC
panic on btrfs. This is because we do not account for data we may use when
doing the fallocate. This patch fixes the problem by properly reserving space,
and then just freeing it when we are done. The reservation stuff was made with
delalloc in mind, so its a little crude for this case, but it keeps the box
from panicing.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The per-user inotify_devs value is incremented each time a new file is
allocated, but never decremented. This led to inotify_init failing after a
limited number of calls.
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
cifs: add new cifs_iget function and convert unix codepath to use it
In order to unify some codepaths, introduce a common cifs_fattr struct
for storing inode attributes. The different codepaths (unix, legacy,
normal, etc...) can fill out this struct with inode info. It can then be
passed as an arg to a common set of routines to get and update inodes.
Add a new cifs_iget function that uses iget5_locked to identify inodes.
This will compare inodes based on the uniqueid value in a cifs_fattr
struct.
Rather than filling out an already-created inode, have
cifs_get_inode_info_unix instead fill out cifs_fattr and hand that off
to cifs_iget. cifs_iget can then properly look for hardlinked inodes.
On the readdir side, add a new cifs_readdir_lookup function that spawns
populated dentries. Redefine FILE_UNIX_INFO so that it's basically a
FILE_UNIX_BASIC_INFO that has a few fields wrapped around it. This
allows us to more easily use the same function for filling out the fattr
as the non-readdir codepath.
With this, we should then have proper hardlink detection and can
eventually get rid of some nasty CIFS-specific hacks for handing them.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
* git://git.infradead.org/mtd-2.6:
mtd: nand: fix build failure and incorrect return from omap_wait()
mtd: Use BLOCK_NIL consistently in NFTL/INFTL
mtd: m25p80 timeout too short for worst-case m25p16 devices
mtd: atmel_nand: Fix typo s/parititions/partitions/
mtd: cmdlineparts: Use 64-bit format when printing a debug message.
mtd: maps: Remove BUS_ID_SIZE from integrator_flash
jffs2: fix another potential leak on error path in scan.c
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: invalidation reverse calls
fuse: allow umask processing in userspace
fuse: fix bad return value in fuse_file_poll()
fuse: fix return value of fuse_dev_write()
Check before use it.
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch restores stacking ability to the block layer integrity
infrastructure by creating a set of dedicated bip slabs. Each bip slab
has an embedded bio_vec array at the end. This cuts down on memory
allocations and also simplifies the code compared to the original bvec
version. Only the largest bip slab is backed by a mempool. The pool is
contained in the bio_set so stacking drivers can ensure forward
progress.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@carl.(none)>
Maximum file size for hostfs mounts defaults to 2GB, so bigger files cannot be
read/written through hostfs. This patch initializes the maximum file size to
MAX_LFS_SIZE.
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13531
Signed-off-by: Wolfgang Illmeyer <wolfgang@illmeyer.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ext2_iget() returns -ESTALE if invoked on a deleted inode, in order to
report errors to NFS properly. However, in ext[234]_lookup(), this
-ESTALE can be propagated to userspace if the filesystem is corrupted such
that a directory entry references a deleted inode. This leads to a
misleading error message - "Stale NFS file handle" - and confusion on the
part of the admin.
The bug can be easily reproduced by creating a new filesystem, making a
link to an unused inode using debugfs, then mounting and attempting to ls
-l said link.
This patch thus changes ext2_lookup to return -EIO if it receives -ESTALE
from ext2_iget(), as ext2 does for other filesystem metadata corruption;
and also invokes the appropriate ext*_error functions when this case is
detected.
Signed-off-by: Bryan Donlan <bdonlan@gmail.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With ELF, at generating coredump, some more headers other than used
vmas are added.
When max_map_count == 65536, a core generated by following kinds of
code can be unreadable because the number of ELF's program header is
written in 16bit in Ehdr (please see elf.h) and the number overflows.
==
... = mmap(); (munmap, mprotect, etc...)
if (failed)
abort();
==
This can happen in mmap/munmap/mprotect/etc...which calls split_vma().
I think 65536 is not safe as _default_ and reduce it to 65530 is good
for avoiding unexpected corrupted core.
Anyway, max_map_count can be enlarged by sysctl if a user is brave..
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Jakub Jelinek <jakub@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change the eventfd interface to de-couple the eventfd memory context, from
the file pointer instance.
Without such change, there is no clean way to racely free handle the
POLLHUP event sent when the last instance of the file* goes away. Also,
now the internal eventfd APIs are using the eventfd context instead of the
file*.
This patch is required by KVM's IRQfd code, which is still under
development.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Gregory Haskins <ghaskins@novell.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Avi Kivity <avi@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Don't unlock on vfs_rejected_lock path in afs_do_setlk, since the lock
is unlocked after abort_attempt label.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add notification messages that allow the filesystem to invalidate VFS
caches.
Two notifications are added:
1) inode invalidation
- invalidate cached attributes
- invalidate a range of pages in the page cache (this is optional)
2) dentry invalidation
- try to invalidate a subtree in the dentry cache
Care must be taken while accessing the 'struct super_block' for the
mount, as it can go away while an invalidation is in progress. To
prevent this, introduce a rw-semaphore, that is taken for read during
the invalidation and taken for write in the ->kill_sb callback.
Cc: Csaba Henk <csaba@gluster.com>
Cc: Anand Avati <avati@zresearch.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
This patch lets filesystems handle masking the file mode on creation.
This is needed if filesystem is using ACLs.
- The CREATE, MKDIR and MKNOD requests are extended with a "umask"
parameter.
- A new FUSE_DONT_MASK flag is added to the INIT request/reply. With
this the filesystem may request that the create mode is not masked.
CC: Jean-Pierre André <jean-pierre.andre@wanadoo.fr>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
On 64 bit systems -- where sizeof(ssize_t) > sizeof(int) -- the following test
exposes a bug due to a non-careful return of an int or unsigned value:
implement a FUSE filesystem which sends an unsolicited notification to
the kernel with invalid opcode. The respective write to /dev/fuse
will return (1 << 32) - EINVAL with errno == 0 instead of -1 with
errno == EINVAL.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: stable@kernel.org
This patch fixes an imbalance message as reported by J.R. Okajima.
The IMA file counters are incremented in ima_path_check. If the
actual open fails, such as ETXTBSY, decrement the counters to
prevent unnecessary imbalance messages.
Reported-by: J.R. Okajima <hooanon05@yahoo.co.jp>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Fixes a regression caused by commit a6ce4932fb
When this lock was converted to a mutex, the locks were turned into
unlocks and vice-versa.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Cc: Stable Tree <stable@kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
[CIFS] remove unknown mount option warning message
[CIFS] remove bkl usage from umount begin
cifs: Fix incorrect return code being printed in cFYI messages
[CIFS] cleanup asn handling for ntlmssp
[CIFS] Copy struct *after* setting the port, instead of before.
cifs: remove rw/ro options
cifs: fix problems with earlier patches
cifs: have cifs parse scope_id out of IPv6 addresses and use it
[CIFS] Do not send tree disconnect if session is already disconnected
[CIFS] Fix build break
cifs: display scopeid in /proc/mounts
cifs: add new routine for converting AF_INET and AF_INET6 addrs
cifs: have cifs_show_options show forceuid/forcegid options
cifs: remove unneeded NULL checks from cifs_show_options
Jeff's previous patch which removed the unneeded rw/ro
parsing can cause a minor warning in dmesg (about the
unknown rw or ro mount option) at mount time. This
patch makes cifs ignore them in kernel to remove the warning
(they are already handled in the mount helper and VFS).
Signed-off-by: Steve French <sfrench@us.ibm.com>
The lock_kernel call moved into the fs for umount_begin
is not needed. This adds a check to make sure we don't
call umount_begin twice on the same fs.
umount_begin for cifs is probably not needed and
may eventually be able to be removed, but in
the meantime this smaller patch is safe and
gets rid of the bkl from this path which provides
some benefit.
Acked-by: Jeff Layton <redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
FreeXid() along with freeing Xid does add a cifsFYI debug message that
prints rc (return code) as well. In some code paths where we set/return
error code after calling FreeXid(), incorrect error code is being
printed when cifsFYI is enabled.
This could be misleading in few cases. For eg.
In cifs_open() if cifs_fill_filedata() returns a valid pointer to
cifsFileInfo, FreeXid() prints rc=-13 whereas 0 is actually being
returned. Fix this by setting rc before calling FreeXid().
Basically convert
FreeXid(xid); rc = -ERR;
return -ERR; => FreeXid(xid);
return rc;
[Note that Christoph would like to replace the GetXid/FreeXid
calls, which are primarily used for debugging. This seems
like a good longer term goal, but although there is an
alternative tracing facility, there are no examples yet
available that I know of that we can use (yet) to
convert this cifs function entry/exit logging, and for
creating an identifier that we can use to correlate
all dmesg log entries for a particular vfs operation
(ie identify all log entries for a particular vfs
request to cifs: e.g. a particular close or read or write
or byte range lock call ... and just using the thread id
is harder). Eventually when a replacement
for this is available (e.g. when NFS switches over and various
samples to look at in other file systems) we can remove the
GetXid/FreeXid macro but in the meantime multiple people
use this run time configurable logging all the time
for debugging, and Suresh's patch fixes a problem
which made it harder to notice some low
memory problems in the log so it is worthwhile
to fix this problem until a better logging
approach is able to be used]
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Also removes obsolete distinction between rawntlmssp and ntlmssp (in asn/SPNEGO)
since as jra noted we can always send raw ntlmssp in session setup now.
remove check for experimental runtime flag (/proc/fs/cifs/Experimental) in
ntlmssp path.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs: remove rw/ro options
These options are handled at the VFS layer. They only ever set the
option in the smb_vol struct. Nothing was ever done with them afterward
anyway.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs: fix problems with earlier patches
cifs_show_address hasn't been introduced yet, and fix a typo that was
silently fixed by a later patch in the series.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
This patch has CIFS look for a '%' in an IPv6 address. If one is
present then it will try to treat that value as a numeric interface
index suitable for stuffing into the sin6_scope_id field.
This should allow people to mount servers on IPv6 link-local addresses.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: David Holder <david@erion.co.uk>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Noticed this when tree connect timed out (due to Samba server crash) -
we try to send a tree disconnect for a tid that does not exist
since we don't have a valid tree id yet. This checks that the
session is valid before sending the tree disconnect to handle
this case.
Signed-off-by: Steve French <sfrench@us.ibm.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (23 commits)
switch xfs to generic acl caching helpers
helpers for acl caching + switch to those
switch shmem to inode->i_acl
switch reiserfs to inode->i_acl
switch reiserfs to usual conventions for caching ACLs
reiserfs: minimal fix for ACL caching
switch nilfs2 to inode->i_acl
switch btrfs to inode->i_acl
switch jffs2 to inode->i_acl
switch jfs to inode->i_acl
switch ext4 to inode->i_acl
switch ext3 to inode->i_acl
switch ext2 to inode->i_acl
add caching of ACLs in struct inode
fs: Add new pre-allocation ioctls to vfs for compatibility with legacy xfs ioctls
cleanup __writeback_single_inode
... and the same for vfsmount id/mount group id
Make allocation of anon devices cheaper
update Documentation/filesystems/Locking
devpts: remove module-related code
...
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6:
udf: remove redundant tests on unsigned
udf: Use device size when drive reported bogus number of written blocks
helpers: get_cached_acl(inode, type), set_cached_acl(inode, type, acl),
forget_cached_acl(inode, type).
ubifs/xattr.c needed includes reordered, the rest is a plain switchover.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
reiserfs uses NULL as "unknown" and ERR_PTR(-ENODATA) as "no ACL";
several codepaths store the former instead of the latter.
All those codepaths go through iset_acl() and all cases when it's
called with NULL acl are for the second variety, so the minimal
fix is to teach iset_acl() to deal with that.
Proper fix is to switch to more usual conventions and avoid back
and forth between internally used ERR_PTR(-ENODATA) and NULL
expected by the rest of the kernel.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This patch adds ioctls to vfs for compatibility with legacy XFS
pre-allocation ioctls (XFS_IOC_*RESVP*). The implementation
effectively invokes sys_fallocate for the new ioctls.
Also handles the compat_ioctl case.
Note: These legacy ioctls are also implemented by OCFS2.
[AV: folded fixes from hch]
Signed-off-by: Ankit Jain <me@ankitjain.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
There is no reason to for the split between __writeback_single_inode and
__sync_single_inode, the former just does a couple of checks before
tail-calling the latter. So merge the two, and while we're at it split
out the I_SYNC waiting case for data integrity writers, as it's
logically separate function. Finally rename __writeback_single_inode to
writeback_single_inode.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Standard trick - add a new variable (start) such that
for each n < start n is known to be busy. Allocation can
skip checking everything in [0..start) and if it returns
n, we can set start to n + 1. Freeing below start sets
start to what we'd just freed.
Of course, it still sucks if we do something like
free 0
allocate
allocate
in a loop - still O(n^2) time. However, on saner loads it
improves the things a lot and the entire thing is not worth
the trouble of switching to something with better worst-case
behaviour.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>