Change de_thread() to use KILLABLE rather than UNINTERRUPTIBLE while
waiting for other threads. The only complication is that we should
clear ->group_exit_task and ->notify_count before we return, and we
should do this under tasklist_lock. -EAGAIN is used to match the
initial signal_group_exit() check/return, it doesn't really matter.
This fixes the (unlikely) race with coredump. de_thread() checks
signal_group_exit() before it starts to kill the subthreads, but this
can't help if another CLONE_VM (but non CLONE_THREAD) task starts the
coredumping after de_thread() unlocks ->siglock. In this case the
killed sub-thread can block in exit_mm() waiting for coredump_finish(),
execing thread waits for that sub-thead, and the coredumping thread
waits for execing thread. Deadlock.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Give the key type the opportunity to preparse the payload prior to the
instantiation and update routines being called. This is done with the
provision of two new key type operations:
int (*preparse)(struct key_preparsed_payload *prep);
void (*free_preparse)(struct key_preparsed_payload *prep);
If the first operation is present, then it is called before key creation (in
the add/update case) or before the key semaphore is taken (in the update and
instantiate cases). The second operation is called to clean up if the first
was called.
preparse() is given the opportunity to fill in the following structure:
struct key_preparsed_payload {
char *description;
void *type_data[2];
void *payload;
const void *data;
size_t datalen;
size_t quotalen;
};
Before the preparser is called, the first three fields will have been cleared,
the payload pointer and size will be stored in data and datalen and the default
quota size from the key_type struct will be stored into quotalen.
The preparser may parse the payload in any way it likes and may store data in
the type_data[] and payload fields for use by the instantiate() and update()
ops.
The preparser may also propose a description for the key by attaching it as a
string to the description field. This can be used by passing a NULL or ""
description to the add_key() system call or the key_create_or_update()
function. This cannot work with request_key() as that required the description
to tell the upcall about the key to be created.
This, for example permits keys that store PGP public keys to generate their own
name from the user ID and public key fingerprint in the key.
The instantiate() and update() operations are then modified to look like this:
int (*instantiate)(struct key *key, struct key_preparsed_payload *prep);
int (*update)(struct key *key, struct key_preparsed_payload *prep);
and the new payload data is passed in *prep, whether or not it was preparsed.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Apparently this was lost when we converted to the standard option
parser in 8830d7e07a5e38bc47650a7554b7c1cfd49902bf
Cc: Sachin Prabhu <sprabhu@redhat.com>
Cc: stable@vger.kernel.org # v3.4+
Reported-by: Gregory Lee Bartholomew <gregory.lee.bartholomew@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
wchar_t is currently 16bit so converting a utf8 encoded characters not
in plane 0 (>= 0x10000) to wchar_t (that is calling char2uni) lead to a
-EINVAL return. This patch detect utf8 in cifs_strtoUTF16 and add special
code calling utf8s_to_utf16s.
Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
kernel_sendmsg() is less likely to return -ENOSPC and it might be
a bug to do so. However, in the past there might have been cases
where a -ENOSPC was returned from a low level driver.
Add a WARN_ON_ONCE() to ensure that it is safe to assume that -ENOSPC
is no longer returned. This -ENOSPC specific handling will be removed
once we are sure it is no longer returned.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Pull ceph updates from Sage Weil:
"The bulk of this pull is a series from Alex that refactors and cleans
up the RBD code to lay the groundwork for supporting the new image
format and evolving feature set. There are also some cleanups in
libceph, and for ceph there's fixed validation of file striping
layouts and a bugfix in the code handling a shrinking MDS cluster."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (71 commits)
ceph: avoid 32-bit page index overflow
ceph: return EIO on invalid layout on GET_DATALOC ioctl
rbd: BUG on invalid layout
ceph: propagate layout error on osd request creation
libceph: check for invalid mapping
ceph: convert to use le32_add_cpu()
ceph: Fix oops when handling mdsmap that decreases max_mds
rbd: update remaining header fields for v2
rbd: get snapshot name for a v2 image
rbd: get the snapshot context for a v2 image
rbd: get image features for a v2 image
rbd: get the object prefix for a v2 rbd image
rbd: add code to get the size of a v2 rbd image
rbd: lay out header probe infrastructure
rbd: encapsulate code that gets snapshot info
rbd: add an rbd features field
rbd: don't use index in __rbd_add_snap_dev()
rbd: kill create_snap sysfs entry
rbd: define rbd_dev_image_id()
rbd: define some new format constants
...
using the meta_bg feature. This allows us to resize file systems
which are greater than 16TB. In addition, the speed of online
resizing has been improved in general.
We also fix a number of races, some of which could lead to deadlocks,
in ext4's Asynchronous I/O and online defrag support, thanks to good
work by Dmitry Monakhov.
There are also a large number of more minor bug fixes and cleanups
from a number of other ext4 contributors, quite of few of which have
submitted fixes for the first time.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABCAAGBQJQbxMXAAoJENNvdpvBGATwlg4QAJZ4mHNSL2eaaxjRtTbL1pAz
+FVXpJ3lhw1lSfE9hJGqPVE8EfU2fWjIqxEI7dgh95Tukc5pUnPAQ2/hBz8ZA0qq
o0AFMk3mRnvCEh6HsZfumsV83eqpR3k/zEy4uFH+KtxBskPe2sEKy3B7qOxvgdKW
Gh8B2WqF2BpIj9WIT1P9G6xsxZW64EMHTbWcgRhuoRD7bakDNnwQ3kElz/TJQU5q
bM/5wE7pqKwU2J1L0Ho0mxDi0f/BbXeJdA9k1tQy2KM1pZwHtpj4Ls0qmfoi49GE
KyZqQOXlFbAz/9tidPDceY5KoRRQm1MwZ+1MimQX1P+40cs/w3pNu3yiibcaXIru
UZ63AQMCj5JHMcFNVi20sVCwjU/ibNtEO75cfDD4bzPgHJvfCj73EbHTLl21nbTu
izIMffhJEHmRnmRXiiortYVuI4b19oIfnXg7eclrJoUWSuGwKKsJOc5nMjDqidG4
B7Gq4TD89sGkIYzx+50E+ll2ispcBN0BQnGqp4k2BzgDyEHhuFYk7VuVQvJgCGTi
eobzQJj7JUXPWxyemcAVkQTtUq4vVbkm/IwS+/GA9b9Z80X8hR8x6EVHUW5lX3qC
YHoBSCU4XKZXXWqzx0fIVCXyKKFiBzM+OXcgHOKH90vK8k6kPmPODhNCxvV3pITU
jfl9q+X1dY4SpybZjLt5
=iYeV
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"The big new feature added this time is supporting online resizing
using the meta_bg feature. This allows us to resize file systems
which are greater than 16TB. In addition, the speed of online
resizing has been improved in general.
We also fix a number of races, some of which could lead to deadlocks,
in ext4's Asynchronous I/O and online defrag support, thanks to good
work by Dmitry Monakhov.
There are also a large number of more minor bug fixes and cleanups
from a number of other ext4 contributors, quite of few of which have
submitted fixes for the first time."
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (69 commits)
ext4: fix ext4_flush_completed_IO wait semantics
ext4: fix mtime update in nodelalloc mode
ext4: fix ext_remove_space for punch_hole case
ext4: punch_hole should wait for DIO writers
ext4: serialize truncate with owerwrite DIO workers
ext4: endless truncate due to nonlocked dio readers
ext4: serialize unlocked dio reads with truncate
ext4: serialize dio nonlocked reads with defrag workers
ext4: completed_io locking cleanup
ext4: fix unwritten counter leakage
ext4: give i_aiodio_unwritten a more appropriate name
ext4: ext4_inode_info diet
ext4: convert to use leXX_add_cpu()
ext4: ext4_bread usage audit
fs: reserve fallocate flag codepoint
ext4: remove redundant offset check in mext_check_arguments()
ext4: don't clear orphan list on ro mount with errors
jbd2: fix assertion failure in commit code due to lacking transaction credits
ext4: release donor reference when EXT4_IOC_MOVE_EXT ioctl fails
ext4: enable FITRIM ioctl on bigalloc file system
...
and no longer use its debugfs knobs. The change slightly touches
kernel/trace directory, but it got the needed ack from Steven Rostedt:
http://lkml.org/lkml/2012/8/21/688
2. Added maintainers entry;
3. A bunch of fixes, nothing special.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABAgAGBQJQbjoFAAoJEGgI9fZJve1bZgUP/A/ZwFGfdnochDgRhK5p7ljY
baRZpgSh2B+BxIDTEPLfVh6HbOivmYJ8WF0unD9kKTzCCS71ZUMiLB25G/bV4lnZ
fawAOhGfOLG3rXmldxf6nJllHr9JpoSmVEHypvjFNcbjYZ04zhe7jM+YsaWmBw68
eHXQkOSdfPPpKXZ2B0Eef/EoGWhORW0kTD7xFlorsxYAkksSheY0PC0nYgCFhvCZ
168y9pi4T4lucr4s44x8AJ/r/5BQ1jEQAY/A2qUE/iBfRFP4XyE1Oao4OHtVDYdU
KjVPA1VYmwkKSfnkiVFrpb/94IyrKslblgR8nX0kK3L/ccFYjQix4nd9jR+n857s
xfAuj9nfhUO6fI5qoaVSOBufxKyPp1S7X8INEAJ7WQ0c9VoMv00biK9M77ifDGZg
ll/Ecq1CADtcbOnQXf6qwGwRKmpR+qgPkIzpNXcuGMuM4AEPwtckOhCyXFr37Txk
6ZoGM8IIaBJ0yXxHkfpUA7l9ZF0gXR+qHMQCwpUS8tIMx35On+IbybEaKbniKEi1
AURgQ7ZimVYAHPi0Y0L00+EKI3IPVQJvCFH7SG+wUfLWcbEtNbTv3MAer5o3DANJ
GMnWBwNw9ClTydWKI0GMNmnWpFukWhd4OXleyl2+q4qRJi3HhNacrok3s/2r+CnT
QRg8i/0SDvxGuXazrTZT
=1HAE
-----END PGP SIGNATURE-----
Merge tag 'for-v3.7' of git://git.infradead.org/users/cbou/linux-pstore
Pull pstore changes from Anton Vorontsov:
1) We no longer ad-hoc to the function tracer "high level"
infrastructure and no longer use its debugfs knobs. The change
slightly touches kernel/trace directory, but it got the needed ack
from Steven Rostedt:
http://lkml.org/lkml/2012/8/21/688
2) Added maintainers entry;
3) A bunch of fixes, nothing special.
* tag 'for-v3.7' of git://git.infradead.org/users/cbou/linux-pstore:
pstore: Avoid recursive spinlocks in the oops_in_progress case
pstore/ftrace: Convert to its own enable/disable debugfs knob
pstore/ram: Add missing platform_device_unregister
MAINTAINERS: Add pstore maintainers
pstore/ram: Mark ramoops_pstore_write_buf() as notrace
pstore/ram: Fix printk format warning
pstore/ram: Fix possible NULL dereference
Split it into two functions, one which checks if layoutgets are blocked,
and one which checks if the layout stateid has expired.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Convert cpu_to_beXX(beXX_to_cpu(E1) + E2) to use beXX_add_cpu().
dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This cleanup also fixes the following sparse warning:
fs/proc/root.c:64:45: warning: Using plain integer as NULL pointer
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The use of if (!head) BUG(); can be replaced with the single line
BUG_ON(!head).
Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Part of the memory will be written twice after this change, but that
should be negligible.
[akpm@linux-foundation.org: fix __proc_create() coding-style issues, remove unneeded zero-initialisations]
Signed-off-by: yan <clouds.yan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
proc_get_inode() obtains the inode via a call to iget_locked().
iget_locked() calls alloc_inode() which will call proc_alloc_inode() which
clears proc_inode.fd, so there is no need to clear this field in
proc_get_inode().
If iget_locked() instead found the inode via find_inode_fast(), that inode
will not have I_NEW set so this change has no effect.
Signed-off-by: yan <clouds.yan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If proc_get_inode() returns NULL then presumably it encountered memory
exhaustion. proc_lookup_de() should return -ENOMEM in this case, not
-EINVAL.
Signed-off-by: yan <clouds.yan@gmail.com>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This note has the following format:
long count -- how many files are mapped
long page_size -- units for file_ofs
array of [COUNT] elements of
long start
long end
long file_ofs
followed by COUNT filenames in ASCII: "FILE1" NUL "FILE2" NUL...
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Amerigo Wang <amwang@redhat.com>
Cc: "Jonathan M. Foote" <jmfoote@cert.org>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Existing PRSTATUS note contains only si_signo, si_code, si_errno fields
from the siginfo of the signal which caused core to be dumped.
There are tools which try to analyze crashes for possible security
implications, and they want to use, among other data, si_addr field from
the SIGSEGV.
This patch adds a new elf note, NT_SIGINFO, which contains the complete
siginfo_t of the signal which killed the process.
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Amerigo Wang <amwang@redhat.com>
Cc: "Jonathan M. Foote" <jmfoote@cert.org>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a preparatory patch for the introduction of NT_SIGINFO elf note.
With this patch we pass "siginfo_t *siginfo" instead of "int signr" to
do_coredump() and put it into coredump_params. It will be used by the
next patch. Most changes are simple s/signr/siginfo->si_signo/.
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Amerigo Wang <amwang@redhat.com>
Cc: "Jonathan M. Foote" <jmfoote@cert.org>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cosmetic. Change setup_new_exec() and task_dumpable() to use
SUID_DUMPABLE_ENABLED for /bin/grep.
[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some coredump handlers want to create a core file in a way compatible with
standard behavior. Standard behavior with fs.suid_dumpable = 2 is to
create core file with uid=gid=0. However, there was no way for coredump
handler to know that the process being dumped was suid'ed.
This patch adds the new %d specifier for format_corename() which simply
reports __get_dumpable(mm->flags), this is compatible with
/proc/sys/fs/suid_dumpable we already have.
Addresses https://bugzilla.redhat.com/show_bug.cgi?id=787135
Developed during a discussion with Denys Vlasenko.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Alex Kelly <alex.page.kelly@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Cong Wang <amwang@redhat.com>
Cc: Jiri Moskovcak <jmoskovc@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Create a new header file, fs/coredump.h, which contains functions only
used by the new coredump.c. It also moves do_coredump to the
include/linux/coredump.h header file, for consistency.
Signed-off-by: Alex Kelly <alex.page.kelly@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adds an expert Kconfig option, CONFIG_COREDUMP, which allows disabling of
core dump. This saves approximately 2.6k in the compiled kernel, and
complements CONFIG_ELF_CORE, which now depends on it.
CONFIG_COREDUMP also disables coredump-related sysctls, except for
suid_dumpable and related functions, which are necessary for ptrace.
[akpm@linux-foundation.org: fix binfmt_aout.c build]
Signed-off-by: Alex Kelly <alex.page.kelly@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
#define FAT_ENT_EOF(EOF_FAT32)
there is no need to reset value of 'new' for FAT32 as the values is
already correct
Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Amit Sahrawat <amit.sahrawat83@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Simply remove the spacing between function definitions and
EXPORT_SYMBOL_GPL calls, which were previously generating warnings.
Signed-off-by: Cruz Julian Bishop <cruzjbishop@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This does the following:
1: Splits the arguments of a function call to stop it
from exceeding 80 characters
2: Re-indents the arguments of another function call
to prevent the splitting of a quoted string.
Signed-off-by: Cruz Julian Bishop <cruzjbishop@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The comments were not lined up properly, so I just re-indented them.
This also fixes a stupid checkpatch issue unknowingly
Signed-off-by: Cruz Julian Bishop <cruzjbishop@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a space before an equals sign/operator in line 410.
Signed-off-by: Cruz Julian Bishop <cruzjbishop@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Maintain an index of directory inodes by starting cluster, so that
fat_get_parent() can return the proper cached inode rather than inventing
one that cannot be traced back to the filesystem root.
Add a new msdos/vfat binary mount option "nfs" so that FAT filesystems
that are _not_ exported via NFS are not saddled with maintenance of an
index they will never use.
Finally, simplify NFS file handle generation and lookups. An
ext2-congruent implementation is adequate for FAT needs.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Under memory pressure, the system may evict dentries from cache. When the
FAT driver receives a NFS request involving an evicted dentry, it is
unable to reconnect it to the filesystem root. This causes the request to
fail, often with ENOENT.
This is partially due to ineffectiveness of the current FAT NFS
implementation, and partially due to an unimplemented fh_to_parent method.
The latter can cause file accesses to fail on shares exported with
subtree_check.
This patch set provides the FAT driver with the ability to
reconnect dentries. NFS file handle generation and lookups are simplified
and made congruent with ext2.
Testing has involved a memory-starved virtual machine running 3.5-rc5 that
exports a ~2 GB vfat filesystem containing a kernel tree (~770 MB, ~40000
files, 9 levels). Both 'cp -r' and 'ls -lR' operations were performed
from a client, some overlapping, some consecutive. Exports with
'subtree_check' and 'no_subtree_check' have been tested.
Note that while this patch set improves FAT's NFS support, it does not
eliminate ESTALE errors completely.
The following should be considered for NFS clients who are sensitive to ESTALE:
* Mounting with lookupcache=none
Unfortunately this can degrade performance severely, particularly for deep
filesystems.
* Incorporating VFS patches to retry ESTALE failures on the client-side,
such as https://lkml.org/lkml/2012/6/29/381
* Handling ESTALE errors in client application code
This patch:
Move NFS-related code into its own C file. No functional changes.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Convert cpu_to_leXX(leXX_to_cpu(E1) + E2) to use leXX_add_cpu().
dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
load_elf_interp() has interp_map_addr carefully described as
"uninitialized_var" and marked so as to avoid a warning. However if you
trace the code it is passed into load_elf_interp and then this value is
checked against NULL.
As this return value isn't used this is actually safe but it freaks
various analysis tools that see un-initialized memory addresses being read
before their value is ever defined.
Set it to NULL as a matter of programming good taste if nothing else
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Enhanced epoll_ctl to support EPOLL_CTL_DISABLE, which disables an epoll
item. If epoll_ctl doesn't return -EBUSY in this case, it is then safe to
delete the epoll item in a multi-threaded environment. Also added a new
test_epoll self- test app to both demonstrate the need for this feature
and test it.
Signed-off-by: Paton J. Lewis <palewis@adobe.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Paul Holland <pholland@adobe.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To avoid name conflicts:
drivers/video/riva/fbdev.c:281:9: sparse: preprocessor token MAX_LEVEL redefined
While at it, also make the other names more consistent and add
parentheses.
[akpm@linux-foundation.org: repair fallout]
[sfr@canb.auug.org.au: IB/mlx4: fix for MAX_ID_MASK to MAX_IDR_MASK name change]
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: walter harms <wharms@bfs.de>
Cc: Glauber Costa <glommer@parallels.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fallocate should wait for pended ext4_convert_unwritten_extents()
otherwise following race may happen:
ftruncate( ,12288);
fallocate( ,0, 4096)
io_sibmit( ,0, 4096); /* Write to fallocated area, split extent if needed */
fallocate( ,0, 8192); /* Grow extent and broke assumption about extent */
Later kwork completion will do:
->ext4_convert_unwritten_extents (0, 4096)
->ext4_map_blocks(handle, inode, &map, EXT4_GET_BLOCKS_IO_CONVERT_EXT);
->ext4_ext_map_blocks() /* Will find new extent: ex = [0,2] !!!!!! */
->ext4_ext_handle_uninitialized_extents()
->ext4_convert_unwritten_extents_endio()
/* convert [0,2] extent to initialized, but only[0,1] was written */
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
BUG #1) All places where we call ext4_flush_completed_IO are broken
because buffered io and DIO/AIO goes through three stages
1) submitted io,
2) completed io (in i_completed_io_list) conversion pended
3) finished io (conversion done)
And by calling ext4_flush_completed_IO we will flush only
requests which were in (2) stage, which is wrong because:
1) punch_hole and truncate _must_ wait for all outstanding unwritten io
regardless to it's state.
2) fsync and nolock_dio_read should also wait because there is
a time window between end_page_writeback() and ext4_add_complete_io()
As result integrity fsync is broken in case of buffered write
to fallocated region:
fsync blkdev_completion
->filemap_write_and_wait_range
->ext4_end_bio
->end_page_writeback
<-- filemap_write_and_wait_range return
->ext4_flush_completed_IO
sees empty i_completed_io_list but pended
conversion still exist
->ext4_add_complete_io
BUG #2) Race window becomes wider due to the 'ext4: completed_io
locking cleanup V4' patch series
This patch make following changes:
1) ext4_flush_completed_io() now first try to flush completed io and when
wait for any outstanding unwritten io via ext4_unwritten_wait()
2) Rename function to more appropriate name.
3) Assert that all callers of ext4_flush_unwritten_io should hold i_mutex to
prevent endless wait
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Clamp the layout barrier sequence id to the current sequence id
minus the maximum number of outstanding layoutget requests.
Also ensure that we correctly initialise lo->plh_barrier if there are
no layout segments associated to this layout header.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Pull ext3 & udf fixes from Jan Kara:
"Shortlog pretty much says it all.
The interesting bits are UDF support for direct IO and ext3 fix for a
long standing oops in data=journal mode."
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
jbd: Fix assertion failure in commit code due to lacking transaction credits
UDF: Add support for O_DIRECT
ext3: Replace 0 with NULL for pointer in super.c file
udf: add writepages support for udf
ext3: don't clear orphan list on ro mount with errors
reiserfs: Make reiserfs_xattr_handlers static
Hi,
the patch si simple, but it has user visible impact and I'm not quite sure how
to resolve it.
In short, $subj says it, chattr -C supports it and we want to use it.
The conditions that acutally allow to change the NOCOW flag are clear. What if
I try to set the flag on a file that is not empty? Options:
1) whole ioctl will fail, EINVAL
2.1) ioctl will succeed, the NOCOW flag will be silently removed, but the file
will stay COW-ed and checksummed
2.2) ioctl will succeed, flag will not be removed and a syslog message will
warn that the COW flag has not been changed
2.2.1) dtto, no syslog message
Man page of chattr states that
"If it is set on a file which already has data blocks, it is undefined when
the blocks assigned to the file will be fully stable."
Yes, it's undefined and with current implementation it'll never happen. So from
this end, the user cannot expect anything. I'm trying to find a reasonable
behaviour, so that a command like 'chattr -R -aijS +C' to tweak a broad set of
flags in a deep directory does not fail unnecessarily and does not pollute the
log.
My personal preference is 2.2.1, but my dev's oppinion is skewed, not counting
the fact that I know the code and otherwise would look there before consulting
the documentation.
The patch implements 2.2.1.
david
-------------8<-------------------
From: David Sterba <dsterba@suse.cz>
It's safe to turn off checksums for a zero sized file.
http://thread.gmane.org/gmane.comp.file-systems.btrfs/18030
"We cannot switch on NODATASUM for a file that already has extents that
are checksummed. The invariant here is that either all the extents or
none are checksummed.
Theoretically it's possible to add/remove all checksums from a given
file, but it's a potentially longtime operation, the file has to be in
some intermediate state where the checksums partially exist but have to
be ignored (for the csum->nocsum) until the file is fully converted,
this brings more special cases to extent handling, it has to survive
power failure and remain consistent, and probably needs to be restarted
after next mount."
Signed-off-by: David Sterba <dsterba@suse.cz>
I saw the warning in btrfs_drop_extent_cache where our end is less than our
start while running xfstests 68 in a loop. This is because we
unconditionally do drop_end = min(end, extent_end) in
__btrfs_drop_extents(), even though we may not have found an extent in the
range we were looking to drop. So keep track of wether or not we found
something, and if we didn't just use our end. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
We do not need to do anything special to freeze or unfreeze, it's all taken
care of by the generic work, and what we currently have is wrong anyway
since we shouldn't be returnning to userspace with mutexes held anyway.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
The btree inode has it's own write cache pages so we can remove this write
cache pages hook as it's not used. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
We can race when checking wether PagePrivate is set on a page and we
actually have an eb saved in the pages private pointer. We could have
easily written out this page and released it in the time that we did the
pagevec lookup and actually got around to looking at this page. So use
mapping->private_lock to ensure we get a consistent view of the
page->private pointer. This is inline with the alloc and releasepage paths
which use private_lock when manipulating page->private. Thanks,
Reported-by: David Sterba <dave@jikos.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Dave Sterba pointed out a sleeping while atomic bug while doing fsync. This
is because I'm an idiot and didn't realize that rwlock's were spin locks, so
we've been holding this thing while doing allocations and such which is not
good. This patch fixes this by dropping the write lock before we do
anything heavy and re-acquire it when it is done. We also need to take a
ref on the em's in case their corresponding pages are evicted and mark them
as being logged so that releasepage does not remove them and doesn't remove
them from our local list. Thanks,
Reported-by: Dave Sterba <dave@jikos.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
So we start our freeze, somebody comes in and does an fsync() on a file
where we have to commit a transaction for whatever reason, and we will
deadlock because the freeze is waiting on FS_FREEZE people to stop writing
to the file system, but the transaction is waiting for its free space inodes
to be written out, which are in turn waiting on sb_start_intwrite while
trying to write the file extents. To fix this we'll just skip the
sb_start_intwrite() if we TRANS_JOIN_NOLOCK since we're being waited on by a
transaction commit so we're safe wrt to freeze and this will keep us from
deadlocking. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>