- export __blk_put_request and blk_execute_rq_nowait
needed for async REQ_BLOCK_PC requests
- separate max_hw_sectors and max_sectors for block/scsi_ioctl.c and
SG_IO bio.c helpers per Jens's last comments. Since block/scsi_ioctl.c SG_IO was
already testing against max_sectors and SCSI-ml was setting max_sectors and
max_hw_sectors to the same value this does not change any scsi SG_IO behavior. It only
prepares ll_rw_blk.c, scsi_ioctl.c and bio.c for when SCSI-ml begins to set
a valid max_hw_sectors for all LLDs. Today, if an LLD does not set it,
SCSI-ml sets it to a safe default, and some LLDs set it to an artificially low
value to overcome memory and feedback issues.
Note: Since we now cap max_sectors to BLK_DEF_MAX_SECTORS, which is 1024,
drivers that used to call blk_queue_max_sectors with a large value of
max_sectors will now see the fs requests capped to BLK_DEF_MAX_SECTORS.
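The capping rule in the note above can be modelled in userspace roughly as
follows. This is a minimal sketch, not the kernel code: struct sketch_queue,
sketch_set_max_sectors() and sketch_fs_limit() are hypothetical stand-ins for
the request queue and blk_queue_max_sectors(); only the value 1024 is taken
from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Value stated in the commit message above. */
#define BLK_DEF_MAX_SECTORS 1024

/* Hypothetical stand-in for the two request-queue limits. */
struct sketch_queue {
	unsigned int max_sectors;    /* limit applied to fs requests */
	unsigned int max_hw_sectors; /* limit the driver asked for */
};

/*
 * Assumed behaviour: remember the driver's value as the hardware limit,
 * but never let the fs limit exceed BLK_DEF_MAX_SECTORS.
 */
static void sketch_set_max_sectors(struct sketch_queue *q,
				   unsigned int max_sectors)
{
	assert(q != NULL);
	q->max_hw_sectors = max_sectors;
	q->max_sectors = max_sectors > BLK_DEF_MAX_SECTORS ?
			 BLK_DEF_MAX_SECTORS : max_sectors;
}

/* Convenience helper: which fs limit results from a driver request? */
static unsigned int sketch_fs_limit(unsigned int requested)
{
	struct sketch_queue q;

	sketch_set_max_sectors(&q, requested);
	return q.max_sectors;
}
```

Under these assumptions a driver asking for 8192 sectors keeps 8192 as its
hardware limit while fs requests are limited to 1024; a driver asking for 256
sees 256 for both.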
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Add scsi helpers to create really-large-requests and convert
scsi-ml to scsi_execute_async().
Per Jens's previous comments, I placed this function in scsi_lib.c.
I made it follow all the queue's limits - I think I did at least :), so
I removed the warning on the function header.
I think the scsi_execute_* functions should eventually take a request_queue
and be placed some place where the dm-multipath hw_handler can use them
if that failover code is going to stay in the kernel. That conversion
patch will be sent in another mail though.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The below patch lets userspace have more control over the inodes that
inotify will watch. It introduces two new flags.
IN_ONLYDIR -- only watch the inode if it is a directory.
This is needed to avoid the race that can occur when we want to be
sure that we are watching a directory.
IN_DONT_FOLLOW -- don't follow a symlink. In combination
with IN_ONLYDIR we can make sure that we don't watch the target of
symlinks.
The issues the flags fix came up when writing the gnome-vfs inotify
backend. Default behaviour is unchanged.
Signed-off-by: John McCutchan <ttb@tentacle.dhs.org>
Acked-by: Robert Love <rml@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Commit f549d6c18c introduced a generic
fallback for security xattrs, but appears to include a subtle bug.
Gentoo users with kernels with selinux compiled in, and coreutils compiled
with acl support, noticed that they could not copy files on tmpfs using
'cp'.
cp (compiled with acl support) copies the file, lists the extended
attributes on the old file, copies them all to the new file, and then
exits. However the listxattr() calls were failing with this odd behaviour:
llistxattr("a.out", (nil), 0) = 17
llistxattr("a.out", 0x7fffff8c6cb0, 17) = -1 ERANGE (Numerical result out of
range)
I believe this is a simple problem in the logic used to check the buffer
sizes; if the user sends a buffer the exact size of the data, then it's ok
:)
This change solves the problem.
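The buffer-size rule described above can be illustrated with a small userspace
model. This is only a sketch of the comparison, not the kernel code:
sketch_listxattr() is a hypothetical function, and the attribute name used in
the usage note is an assumption chosen because it is 17 bytes long including
its terminating NUL.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical model of a listxattr-style call. `list`/`len` is the name
 * list held by the filesystem, `buf`/`size` is the caller's buffer.
 */
static ssize_t sketch_listxattr(const char *list, size_t len,
				char *buf, size_t size)
{
	assert(list != NULL);
	if (size == 0)
		return (ssize_t)len;	/* size query: report bytes needed */
	if (len > size)			/* a ">=" test here would also reject an exact fit */
		return -ERANGE;
	memcpy(buf, list, len);
	return (ssize_t)len;
}
```

With a 17-byte list such as "security.selinux" plus its NUL, a size query
returns 17, a 17-byte buffer succeeds, and a 16-byte buffer gets -ERANGE.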
More info can be found at http://bugs.gentoo.org/113138
Signed-off-by: Daniel Drake <dsd@gentoo.org>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Missing initialisation of attribute bitmask in _nfs4_proc_write()
- On success, _nfs4_proc_write() must return number of bytes written.
- Missing post_op_update_inode() in _nfs4_proc_write()
- Missing initialisation of attribute bitmask in _nfs4_proc_commit()
- Missing post_op_update_inode() in _nfs4_proc_commit()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Ensure that we use set_page_writeback() in the appropriate places
to help the VM in keeping its page radix_tree in sync.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Steve Dickson writes:
Doing the following:
1. On server:
$ mkdir ~/t
$ echo Hello > ~/t/tmp
2. On client, wait for a string to appear in this file:
$ until grep -q foo t/tmp ; do echo -n . ; sleep 1 ; done
3. On server, create a *new* file with the same name containing that
string:
$ mv ~/t/tmp ~/t/tmp.old; echo foo > ~/t/tmp
will show how the client will never (and I mean never ;-) ) see
the updated file.
The problem is that we do not update nfsi->cache_change_attribute when the
file changes on the server (we only update it when our client makes the
changes). This again means that functions like nfs_check_verifier() will
fail to register when the parent directory has changed and should trigger
a dentry lookup revalidation.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Make sure cache_change_attribute is initialized to jiffies
so when the mtime changes on directory, the directory
will be refreshed.
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
the request queue. Also periodically wake up response_q so threads can
check if stuck requests have timed out. Work around Windows server illegal smb
length on transact2 findfirst response.
Signed-off-by: Steve French <sfrench@us.ibm.com>
disabled. Also set mode, uid, gid better on mkdir and create for the
case when Unix Extensions is not enabled and setuids is enabled. This is
necessary to fix the hole in which chown could be allowed for non-root
users in some cases if root mounted, and also to display the mode and uid
properly in some cases.
Signed-off-by: Steve French <sfrench@us.ibm.com>
Access to a journaled HFS+ volume is not officially supported under Linux, so
mount such a volume read-only, but users can override this behaviour using the
"force" mount option.
The minimum requirement to relax this check is to at least check that the
journal is empty and so nothing needs to be replayed to make sure the volume
is consistent.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If an external device is used for a journal, by default it will use the
entire device. The reiserfs journal code allocates structures per journal
block when it mounts the file system. If the journal device is too large,
and memory cannot be allocated for the structures, it will continue and
ultimately panic when it can't pull one off the free list.
This patch handles the allocation failure gracefully and prints an error
message at mount time.
Changes: Updated error message to be more descriptive to the user.
Discussed and approved on ReiserFS Mailing List, Nov 28.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
JFFS2 initializes the f->sem mutex as "locked" in the slab constructor, which is
a bug. Objects are freed with f->sem unlocked, so when they are allocated
again f->sem is still unlocked, because the slab cache constructor is not called
for them. The constructor is called only once, when memory pages are allocated
for objects (namely, when the slab layer allocates new slabs). As a result,
'struct jffs2_inode_info' objects are sometimes allocated with f->sem unlocked
and sometimes with it locked. Instead, initialize f->sem as unlocked in the
constructor; i.e., in the "constructed" state f->sem must be unlocked.
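The constructor behaviour described above can be reproduced with a toy object
cache. This is a userspace sketch under simplifying assumptions, not the slab
allocator: all sketch_* names, ctor_locked(), ctor_unlocked() and
reuse_is_consistent() are hypothetical, and a bool stands in for f->sem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_SLOTS 2

struct sketch_inode_info {
	bool sem_locked;	/* stands in for f->sem */
	bool in_use;
};

struct sketch_cache {
	struct sketch_inode_info slots[SKETCH_SLOTS];
	void (*ctor)(struct sketch_inode_info *);
	bool populated;		/* set once the "slab" has been constructed */
};

static void ctor_locked(struct sketch_inode_info *f)   { f->sem_locked = true; }
static void ctor_unlocked(struct sketch_inode_info *f) { f->sem_locked = false; }

static struct sketch_inode_info *sketch_alloc(struct sketch_cache *c)
{
	size_t i;

	assert(c != NULL && c->ctor != NULL);
	if (!c->populated) {	/* the constructor runs only here */
		for (i = 0; i < SKETCH_SLOTS; i++) {
			c->slots[i].in_use = false;
			c->ctor(&c->slots[i]);
		}
		c->populated = true;
	}
	for (i = 0; i < SKETCH_SLOTS; i++) {
		if (!c->slots[i].in_use) {
			c->slots[i].in_use = true;
			return &c->slots[i];	/* handed out as-is, no ctor call */
		}
	}
	return NULL;
}

static void sketch_free(struct sketch_inode_info *f)
{
	f->in_use = false;
}

/*
 * Allocate, unlock as a user would before freeing, free, allocate again.
 * Returns true if both allocations saw the same semaphore state.
 */
static bool reuse_is_consistent(void (*ctor)(struct sketch_inode_info *))
{
	struct sketch_cache c = { .ctor = ctor };
	struct sketch_inode_info *f = sketch_alloc(&c);
	bool first = f->sem_locked;

	f->sem_locked = false;	/* objects are freed with f->sem unlocked */
	sketch_free(f);
	f = sketch_alloc(&c);
	return f->sem_locked == first;
}
```

In this model reuse_is_consistent(ctor_locked) is false, because the recycled
object comes back unlocked, while reuse_is_consistent(ctor_unlocked) is true.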
From: Keijiro Yano <keijiro_yano@yahoo.co.jp>
Acked-by: Artem B. Bityutskiy <dedekind@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Check for invalid node ID values in the new atomic create+open method.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Check the created directory inode for aliases in the mkdir() method.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When the quota file specified in the mount options did not exist, we later
tried to dereference a NULL pointer. Fix it.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Assign the appropriate dentry operations to the dentry. Fixes memory leak.
Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch corrects the return value for the EXT3_IOC_GROUP_ADD in case it
fails due to the presence of multiple resizers on the filesystem.
The problem is a little bit more serious than a wrong return value in this
case, since the clause err=0 in the exit_journal path will lead to a call
to update_backups which in turn causes a NULL pointer dereference.
Signed-off-by: Glauber de Oliveira Costa <glommer@br.ibm.com>
Cc: "Stephen C. Tweedie" <sct@redhat.com>
Cc: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I now see another overflow in reiserfs that should lead to data corruptions
with files that are bigger than 4G under certain circumstances when using
mmap.
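The message does not say where the overflow occurs. One common shape for this
class of bug, shown here purely as an assumption, is a page index shifted into
a byte offset using 32-bit arithmetic. The names and the 4 KiB page size below
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumed 4 KiB pages */

/* 32-bit arithmetic: the result wraps once the byte offset reaches 4 GiB. */
static uint64_t offset_truncated(uint32_t page_index)
{
	uint32_t off;

	/* the shift count must be valid for a 32-bit operand */
	assert(SKETCH_PAGE_SHIFT > 0 && SKETCH_PAGE_SHIFT < 32);
	off = page_index << SKETCH_PAGE_SHIFT;
	return off;
}

/* Widen to 64 bits first, then shift: no bits are lost. */
static uint64_t offset_widened(uint32_t page_index)
{
	return (uint64_t)page_index << SKETCH_PAGE_SHIFT;
}
```

For page index 0x100000, the first page at the 4 GiB mark, the truncated form
yields offset 0 while the widened form yields 4294967296, which is how data
intended for the region above 4G could end up at the start of the file.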
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This replaces the (in my opinion horrible) VM_UNMAPPED logic with very
explicit support for a "remapped page range" aka VM_PFNMAP. It allows a
VM area to contain an arbitrary range of page table entries that the VM
never touches, and never considers to be normal pages.
Any user of "remap_pfn_range()" automatically gets this new
functionality, and doesn't even have to mark the pages reserved or
indeed mark them any other way. It just works. As a side effect, doing
mmap() on /dev/mem works for arbitrary ranges.
Sparc update from David in the next commit.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
fix cifs negative dentries so they are freed faster (not requiring
umount or readdir e.g.) so the client recognizes the new file on
the server more quickly.
Signed-off-by: Steve French <sfrench@us.ibm.com>
In cases where the server has gone insane, nfs_update_inode() may end
up calling nfs_invalidate_inode(), which again calls stuff that takes
the inode->i_lock that we're already holding.
In addition, given the sort of things we have in NFS these days that
need to be cleaned up on inode release, I'm not sure we should ever
be calling make_bad_inode().
Fix up spinlock recursion, and limit nfs_invalidate_inode() to clearing
the caches, and marking the inode as being stale.
Thanks to Steve Dickson <SteveD@redhat.com> for spotting this.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
When caching locks due to holding a file delegation, we must always
check against local locks before sending anything to the server.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
its queue of IO completion callbacks, thus creating the deadlock between
umount and xfslogd. Breaking the loop solves the problem.
SGI-PV: 943821
SGI-Modid: xfs-linux-melb:xfs-kern:202363a
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
Work around gcc-2.95.x macro expansion bug.
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When a non-leader thread does exec, de_thread() adds the old leader to init's
->children list in EXIT_ZOMBIE state and drops tasklist_lock.
This means that release_task(leader) in de_thread() is racy vs do_wait()
from init task.
I think de_thread() should set old leader's state to EXIT_DEAD instead.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: george anzinger <george@mvista.com>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently, if a hugetlbfs is mounted without limits (the default), statfs()
will return -1 for max/free/used blocks. This does not appear to be in
line with normal convention: simple_statfs() and shmem_statfs() both return
0 in similar cases. Worse, it confuses the translation logic in
put_compat_statfs(), causing it to return -EOVERFLOW on such a mount.
This patch alters hugetlbfs_statfs() to return 0 for max/free/used blocks
on a mount without limits. Note that we need the test in the patch below,
rather than just using 0 in the sbinfo structure, because the -1 marked in
the free blocks field is used internally to tell the
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In fs/compat.c, whenever put_compat_statfs() returns an error, the
containing syscall returns -EFAULT. This is presumably by analogy with the
non-compat case, where any non-zero code from copy_to_user() should be
translated into an EFAULT. However, put_compat_statfs() can also return
-EOVERFLOW. The same applies for put_compat_statfs64().
This bug can be observed with a statfs() on a hugetlbfs directory.
hugetlbfs, when mounted without limits, reports available, free and total
blocks as -1 (itself a bug, another patch coming). statfs() will
mysteriously return EFAULT although its parameters are perfectly valid
addresses.
This patch causes the compat versions of statfs() and statfs64() to
correctly propagate the return values from put_compat_statfs() and
put_compat_statfs64().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
From: Alexandra Kossovsky <Alexandra.Kossovsky@oktetlabs.ru>
From http://bugzilla.kernel.org/show_bug.cgi?id=4746
There is user data corruption when using ioctl(SIOCGIFCONF) in a 32-bit
application running on an amd64 kernel. I do not think that this problem is
exploitable, but any data corruption may lead to security problems.
The following code demonstrates the problem:
#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <net/if.h>
#include <sys/ioctl.h>

char buf[256];

int main(void)
{
	int s = socket(AF_INET, SOCK_DGRAM, 0);
	struct ifconf req;
	int i;

	/* Offer only 41 bytes; the kernel must not write past that. */
	req.ifc_buf = buf;
	req.ifc_len = 41;
	printf("Result %d\n", ioctl(s, SIOCGIFCONF, &req));
	printf("Len %d\n", req.ifc_len);
	/* buf starts out zeroed, so a non-zero byte from 41 on was corrupted. */
	for (i = 41; i < 256; i++)
		if (buf[i] != 0)
			printf("Byte %d is corrupted\n", i);
	return 0;
}
Steps to reproduce:
Compile the code above into a 32-bit ELF binary and run it. You'll get:
Result 0
Len 32
Byte 48 is corrupted
Byte 52 is corrupted
Byte 53 is corrupted
Byte 54 is corrupted
Byte 55 is corrupted
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Originally for 2.6.16, but the semaphore causes problems for some
people so get rid of it now.
It's not needed anymore because the ioctl hash table is never changed
at run time now.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix the case in which readdir reset the file type when the SFU mount option
was specified.
Also fix SFU-related functions to not request EAs (xattrs) when they are not
configured in Kconfig.
Signed-off-by: Steve French <sfrench@us.ibm.com>
writev and aio_write to flush properly.
This is Christoph's patch merged with the new nobrl file operations
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
From: Christoph Hellwig <hch@lst.de>
- support vectored and async aio ops unconditionally - this is above
the pagecache and transparent to the fs
- remove cifs_read_wrapper. it was only doing silly checks and
calling generic_file_write in all cases.
- use do_sync_read/do_sync_write as read/write operations. They call
->readv/->writev which we now always implement.
- add the filemap_fdatawrite calls to writev/aio_write which were
missing previously compared to plain write. no idea what the point
behind them is, but let's be consistent at least..
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>