This renames CAP_EPOLLWAKEUP to CAP_BLOCK_SUSPEND to encourage future
reuse of the capability in question in related cases.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iQIcBAABAgAGBQJQBcRhAAoJEKhOf7ml8uNsnIoP/2XhSul9N/AWC5jfEAh4Af07
QdhfJmYXnXC1Irndh/IoAITu+vHQecm0XjbvAy/9QOBn9oSkM7kNilvOLrCrdzzQ
j9/BRMRCJRcu/vMyJmt37z0OIgfiktgDoOBaE6nC5t+1nHotcByAMWdy/AGwqqaL
q3lbYcoRtDDQpDr9XPm68cyRdddvWnq81gXb90gNovvfgCjNFVvscshXmMGv3Luy
Dx29zROJHJNOWG3kV1Xq7PdNffZj1ChCgIsBRKkzKWROcVEGPEuH5O0wjf4I4rCV
PW6nRV9WOykqJI5CAnrWzr9bf8AvpclXtGYWFiwPvUF0kMggSoNFb5xQyRy45SBC
nC+daLZNO123yU8xKb3qXaotsKPJ0qRTKAWUqWaGkRkQ0Mg90VmanyYkmP5PkeUX
ZABNS4QlxnLGDtZuhSBioUO5pf0iDdzSrYkIOuYD81DGM8yKWWmUyxupOoVW5Kmu
QD0d34+ZgEndv9znZzBF8DdGxkwjwljJW6sIBw7PGDq3qXcYdzd4awgtPlnGEOh/
oi6iG24r8oysB8w5IJpwj20/zCvJyYVR+m+eHXxEs373xIGpbAfJbHYRKHqkYgTo
nYkZyLgE0g46Izqbb42yrN7y5dUhSsrbImTI8L5xaLVkBYhspEuSO/eSLgoklWiw
VgbmreU3R0apj0hwPcA5
=oZrz
-----END PGP SIGNATURE-----
Merge tag 'pm-post-3.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull a last-minute PM update from Rafael J. Wysocki:
"This renames CAP_EPOLLWAKEUP to CAP_BLOCK_SUSPEND to encourage future
reuse of the capability in question in related cases."
* tag 'pm-post-3.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: Rename CAP_EPOLLWAKEUP to CAP_BLOCK_SUSPEND
As discussed in
http://thread.gmane.org/gmane.linux.kernel/1249726/focus=1288990,
the capability introduced in 4d7e30d989
to govern EPOLLWAKEUP seems misnamed: this capability is about governing
the ability to suspend the system, not using a particular API flag
(EPOLLWAKEUP). We should make the name of the capability more general
to encourage reuse in related cases. (Whether or not this capability
should also be used to govern the use of /sys/power/wake_lock is a
question that needs to be separately resolved.)
This patch renames the capability to CAP_BLOCK_SUSPEND. In order to ensure
that the old capability name doesn't make it out into the wild, could you
please apply and push up the tree to ensure that it is incorporated
for the 3.5 release.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
- Really fix a cursor leak in xfs_alloc_ag_vextent_near
- Fix a performance regression related to doing allocation in workqueues
- Prevent recursion in xfs_buf_iorequest which is causing stack overflows
- Don't call xfs_bdstrat_cb in xfs_buf_iodone callbacks
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABAgAGBQJQAGW0AAoJENaLyazVq6ZOOyEP/0xLuQKFF71eB/VL8BFylEHY
H22/aDEWTo9pWDxxwBirVioBeCU07ByEv7zeQM1nqEm9pXESTSzsBUYKX2tWSRzY
YclXYA0rpdCK2cdXpuz+0kWBFr9Y1Q1BIYNll6C3ZqhADgubAMHa13rKVUQlQqpD
EZhvGrh42ujRhckwmi1E3+g3Ll79fty47WzyzEOa18ij3LI3q5Dm7WZpZlhv/MVW
Fj975Q+LdJVchoQZ7gTiddMkZ936TwjLxM4EtWZd46CMG2/YRcPHx2YaItI0k6Xa
Q34pbUHidZjqzng28iO6Y6BaB/rPLX/f7KcoZib+rc85zt8sShoDFXS9eq+DdTeH
f5AgkPzpFfr3QpK3Fv5ZjICj5SkC6KhI14qnxLZhVGIWgJLGD8lb0fYDwN2y5BHq
HmA7G4hALBaZ+TOpXQ2+XfxFyrffkQcM/Ja57yfDj37aOKYxMubAco4JlADBzBkg
m1gF2QebQrbZ49k9x85vpIvoNFXcAg6FeuesYXciMcORCpMXp4f0oLjGlfwf13Fo
upZ2AGfcpIcMl3fZzliw64VuJpX4QswTzUZFqXH8+Vn9eB6cwFqA8PJpLxBSshCk
Y6E+LE6oYbsN+T6R+6Xj0DsPl4OIE0eWlGPaZZu9eWku02e0vZMj57RxUWpNcOmH
6lq8TJaS/N5Qri51NPVs
=TP1A
-----END PGP SIGNATURE-----
Merge tag 'for-linus-v3.5-rc7' of git://oss.sgi.com/xfs/xfs
Pull xfs regression fixes from Ben Myers:
- Really fix a cursor leak in xfs_alloc_ag_vextent_near
- Fix a performance regression related to doing allocation in
workqueues
- Prevent recursion in xfs_buf_iorequest which is causing stack
overflows
- Don't call xfs_bdstrat_cb in xfs_buf_iodone callbacks
* tag 'for-linus-v3.5-rc7' of git://oss.sgi.com/xfs/xfs:
xfs: do not call xfs_bdstrat_cb in xfs_buf_iodone_callbacks
xfs: prevent recursion in xfs_buf_iorequest
xfs: don't defer metadata allocation to the workqueue
xfs: really fix the cursor leak in xfs_alloc_ag_vextent_near
If a parent and child process open the two ends of a fifo, and the
child immediately exits, the parent may receive a SIGCHLD before its
open() returns. In that case, we need to make sure that open() will
return successfully after the SIGCHLD handler returns, instead of
throwing EINTR or being restarted. Otherwise, the restarted open()
would incorrectly wait for a second partner on the other end.
The following test demonstrates the EINTR that was wrongly thrown from
the parent’s open(). Change .sa_flags = 0 to .sa_flags = SA_RESTART
to see a deadlock instead, in which the restarted open() waits for a
second reader that will never come. (On my systems, this happens
pretty reliably within about 5 to 500 iterations. Others report that
it manages to loop ~forever sometimes; YMMV.)
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#define CHECK(x) do if ((x) == -1) {perror(#x); abort();} while(0)
void handler(int signum) {}
int main()
{
struct sigaction act = {.sa_handler = handler, .sa_flags = 0};
CHECK(sigaction(SIGCHLD, &act, NULL));
CHECK(mknod("fifo", S_IFIFO | S_IRWXU, 0));
for (;;) {
int fd;
pid_t pid;
putc('.', stderr);
CHECK(pid = fork());
if (pid == 0) {
CHECK(fd = open("fifo", O_RDONLY));
_exit(0);
}
CHECK(fd = open("fifo", O_WRONLY));
CHECK(close(fd));
CHECK(waitpid(pid, NULL, 0));
}
}
This is what I suspect was causing the Git test suite to fail in
t9010-svn-fe.sh:
http://bugs.debian.org/678852
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
xfs_bdstrat_cb only adds a check for a shutdown filesystem over
xfs_buf_iorequest, but xfs_buf_iodone_callbacks just checked for a shut down
filesystem a little earlier. In addition the shutdown handling in
xfs_bdstrat_cb is not very suitable for this caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
If the b_iodone handler is run in calling context in xfs_buf_iorequest we
can run into a recursion where xfs_buf_iodone_callbacks keeps calling back
into xfs_buf_iorequest because an I/O error happened, which keeps calling
back into xfs_buf_iorequest. This chain will usually not take long
because the filesystem gets shut down because of log I/O errors, but even
over a short time it can cause stack overflows if run on the same context.
As a short term workaround make sure we always call the iodone handler in
workqueue context.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Almost all metadata allocations come from shallow stack usage
situations. Avoid the overhead of switching the allocation to a
workqueue as we are not in danger of running out of stack when
making these allocations. Metadata allocations are already marked
through the args that are passed down, so this is trivial to do.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reported-by: Mel Gorman <mgorman@suse.de>
Tested-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
The current cursor is reallocated when retrying the allocation, so
the existing cursor needs to be destroyed in both the restart and
the failure cases.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
This can be trivially triggered from userspace by passing in something unexpected.
kernel BUG at fs/locks.c:1468!
invalid opcode: 0000 [#1] SMP
RIP: 0010:generic_setlease+0xc2/0x100
Call Trace:
__vfs_setlease+0x35/0x40
fcntl_setlease+0x76/0x150
sys_fcntl+0x1c6/0x810
system_call_fastpath+0x1a/0x1f
Signed-off-by: Dave Jones <davej@redhat.com>
Cc: stable@kernel.org # 3.2+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 080399aaaf ("block: don't mark buffers beyond end of disk as
mapped") exposed a bug in __getblk_slow that causes mount to hang as it
loops infinitely waiting for a buffer that lies beyond the end of the
disk to become uptodate.
The problem was initially reported by Torsten Hilbrich here:
https://lkml.org/lkml/2012/6/18/54
and also reported independently here:
http://www.sysresccd.org/forums/viewtopic.php?f=13&t=4511
and then Richard W.M. Jones and Marcos Mello noted a few separate
bugzillas also associated with the same issue. This patch has been
confirmed to fix:
https://bugzilla.redhat.com/show_bug.cgi?id=835019
The main problem is here, in __getblk_slow:
for (;;) {
struct buffer_head * bh;
int ret;
bh = __find_get_block(bdev, block, size);
if (bh)
return bh;
ret = grow_buffers(bdev, block, size);
if (ret < 0)
return NULL;
if (ret == 0)
free_more_memory();
}
__find_get_block does not find the block, since it will not be marked as
mapped, and so grow_buffers is called to fill in the buffers for the
associated page. I believe the for (;;) loop is there primarily to
retry in the case of memory pressure keeping grow_buffers from
succeeding. However, we also continue to loop for other cases, like the
block lying beond the end of the disk. So, the fix I came up with is to
only loop when grow_buffers fails due to memory allocation issues
(return value of 0).
The attached patch was tested by myself, Torsten, and Rich, and was
found to resolve the problem in call cases.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Reported-and-Tested-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Josh Boyer <jwboyer@redhat.com>
Cc: Stable <stable@vger.kernel.org> # 3.0+
[ Jens is on vacation, taking this directly - Linus ]
--
Stable Notes: this patch requires backport to 3.0, 3.2 and 3.3.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fat_encode_fh() can fetch an invalid i_pos value on systems where 64-bit
accesses are not atomic. Make it use the same accessor as the rest of the
FAT code.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is a bug in the below scenario for !CONFIG_MMU:
1. create a new file
2. mmap the file and write to it
3. read the file can't get the correct value
Because
sys_read() -> generic_file_aio_read() -> simple_readpage() -> clear_page()
which causes the page to be zeroed.
Add SetPageUptodate() to ramfs_nommu_expand_for_mapping() so that
generic_file_aio_read() do not call simple_readpage().
Signed-off-by: Bob Liu <lliubbo@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As ocfs2_fallocate() will invoke __ocfs2_change_file_space() with a NULL
as the first parameter (file), it may trigger a NULL pointer dereferrence
due to a missing check.
Addresses http://bugs.launchpad.net/bugs/1006012
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Reported-by: Bret Towe <magnade@gmail.com>
Tested-by: Bret Towe <magnade@gmail.com>
Cc: Sunil Mushran <sunil.mushran@oracle.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The helper nfs_fs_mount() will always call nfs4_try_mount with the
mount_info->fill_super argument pointing to nfs_fill_super, which is
NFSv2/v3 only.
Fix is to have nfs4_try_mount replace it with nfs4_fill_super.
The regression was introduced by commit c40f8d1d (NFS: Create a common
fs_mount() function)
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fix 2 bugs in nfs_direct_write_reschedule:
- The request needs to be removed from the 'reqs' list before it can
be added to 'failed'.
- Fix an infinite loop if the 'failed' list is non-empty.
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We already use them for openat() and friends, but fchdir() also wants to
be able to use O_PATH file descriptors. This should make it comparable
to the O_SEARCH of Solaris. In particular, O_PATH allows you to access
(not-quite-open) a directory you don't have read persmission to, only
execute permission.
Noticed during development of multithread support for ksh93.
Reported-by: ольга крыжановская <olga.kryzhanovska@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org # O_PATH introduced in 3.0+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
filesystem. This isn't an urgent fix, but it is simple and the check was
obviously incorrect.
Also fixes a couple important bugs in the eCryptfs miscdev interface. These
changes are low risk due to the small number of users that use the miscdev
interface. I was able to keep the changes minimal and I have some cleaner, more
complete changes queued up for the next merge window that will build on these
patches.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABCgAGBQJP92LsAAoJENaSAD2qAscKCWkP/3BWpv0AS0fnrZPniXv/+vjf
gdV4NcQhE/86VsQ7CtZS7jqfSVTzm+YTta9BTKj6jWZuGUZGcjXsZdyMpleBZukh
TvRSW3HKCRtC8XNHzle3YUukD1o465nMEiCUQOYcWjAa3in7cZTiFU+3S2Unn5UF
yh2Slfzjxkl2EUHEbcBiBayzaMH2gqwAvRR4sjM0P175m/jjDF6pDGT5vc0skvcP
kLzFr/3Ia9BW1nU0yblTtSNcHzYV8GTJVEpj1NR7q59x2gVJubF6hBDtbZdaaGK0
rYlKV+w9mRwzUCuVdb4zPCa9EGrbqH4gYvIWsCW+R0zoK57rfIRolQVYEglGE2TU
K3HHL6UOsPASZCQqhi+K+tCmYtZaCfeMhDRgxyDOaxS4rQ6dy+XO6f9zM30qw1UB
QHeVEQl7bM0IpByCcjVbuNJT4zTlW7xmsLm/pbGv60UBdZpqaUZptEBEpgUFjq30
shgNLlHHWvelhf52gbff+ytCHf+IDVPT/Q2aGjhC2fgqWiQno44vR88gtMQz6b7g
4yEL7t0TqBB9jCBu/ikTITGpRH5S149e3oYGm2P/+YYZUGlw0Gf9N6TBkctJFSg/
/vk6aobMnjfxmeM80xOKey5Y1zDis660sgt1hX8NVAuo4hp7VQfWGhEZ8lYqzCzP
aJci4ZXaDzwXx6UCC5w2
=TEei
-----END PGP SIGNATURE-----
Merge tag 'ecryptfs-3.5-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
Pull eCryptfs fixes from Tyler Hicks:
"Fixes an incorrect access mode check when preparing to open a file in
the lower filesystem. This isn't an urgent fix, but it is simple and
the check was obviously incorrect.
Also fixes a couple important bugs in the eCryptfs miscdev interface.
These changes are low risk due to the small number of users that use
the miscdev interface. I was able to keep the changes minimal and I
have some cleaner, more complete changes queued up for the next merge
window that will build on these patches."
* tag 'ecryptfs-3.5-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
eCryptfs: Gracefully refuse miscdev file ops on inherited/passed files
eCryptfs: Fix lockdep warning in miscdev operations
eCryptfs: Properly check for O_RDONLY flag before doing privileged open
File operations on /dev/ecryptfs would BUG() when the operations were
performed by processes other than the process that originally opened the
file. This could happen with open files inherited after fork() or file
descriptors passed through IPC mechanisms. Rather than calling BUG(), an
error code can be safely returned in most situations.
In ecryptfs_miscdev_release(), eCryptfs still needs to handle the
release even if the last file reference is being held by a process that
didn't originally open the file. ecryptfs_find_daemon_by_euid() will not
be successful, so a pointer to the daemon is stored in the file's
private_data. The private_data pointer is initialized when the miscdev
file is opened and only used when the file is released.
https://launchpad.net/bugs/994247
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
Pull ocfs2 fixes from Joel Becker.
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
aio: make kiocb->private NUll in init_sync_kiocb()
ocfs2: Fix bogus error message from ocfs2_global_read_info
ocfs2: for SEEK_DATA/SEEK_HOLE, return internal error unchanged if ocfs2_get_clusters_nocache() or ocfs2_inode_lock() call failed.
ocfs2: use spinlock irqsave for downconvert lock.patch
ocfs2: Misplaced parens in unlikley
ocfs2: clear unaligned io flag when dio fails
Pull cifs fixes from Steve French.
* git://git.samba.org/sfrench/cifs-2.6:
cifs: when server doesn't set CAP_LARGE_READ_X, cap default rsize at MaxBufferSize
cifs: fix parsing of password mount option
Pull btrfs updates from Chris Mason:
"I held off on my rc5 pull because I hit an oops during log recovery
after a crash. I wanted to make sure it wasn't a regression because
we have some logging fixes in here.
It turns out that a commit during the merge window just made it much
more likely to trigger directory logging instead of full commits,
which exposed an old bug.
The new backref walking code got some additional fixes. This should
be the final set of them.
Josef fixed up a corner where our O_DIRECT writes and buffered reads
could expose old file contents (not stale, just not the most recent).
He and Liu Bo fixed crashes during tree log recover as well.
Ilya fixed errors while we resume disk balancing operations on
readonly mounts."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: run delayed directory updates during log replay
Btrfs: hold a ref on the inode during writepages
Btrfs: fix tree log remove space corner case
Btrfs: fix wrong check during log recovery
Btrfs: use _IOR for BTRFS_IOC_SUBVOL_GETFLAGS
Btrfs: resume balance on rw (re)mounts properly
Btrfs: restore restriper state on all mounts
Btrfs: fix dio write vs buffered read race
Btrfs: don't count I/O statistic read errors for missing devices
Btrfs: resolve tree mod log locking issue in btrfs_next_leaf
Btrfs: fix tree mod log rewind of ADD operations
Btrfs: leave critical region in btrfs_find_all_roots as soon as possible
Btrfs: always put insert_ptr modifications into the tree mod log
Btrfs: fix tree mod log for root replacements at leaf level
Btrfs: support root level changes in __resolve_indirect_ref
Btrfs: avoid waiting for delayed refs when we must not
'status' variable in ocfs2_global_read_info() is always != 0 when leaving the
function because it happens to contain number of read bytes. Thus we always log
error message although everything is OK. Since all error cases properly call
mlog_errno() before jumping to out_err, there's no reason to call mlog_errno()
on exit at all. This is a fallout of c1e8d35e (conversion of mlog_exit()
calls).
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
Hello,
Since ENXIO only means "offset beyond EOF" for SEEK_DATA/SEEK_HOLE,
Hence we should return the internal error unchanged if ocfs2_inode_lock() or
ocfs2_get_clusters_nocache() call failed rather than ENXIO.
Otherwise, it will confuse the user applications when they trying to understand the root cause.
Thanks Dave for pointing this out.
Thanks,
-Jeff
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
When ocfs2dc thread holds dc_task_lock spinlock and receives soft IRQ it
deadlock itself trying to get same spinlock in ocfs2_wake_downconvert_thread.
Below is the stack snippet.
The patch disables interrupts when acquiring dc_task_lock spinlock.
ocfs2_wake_downconvert_thread
ocfs2_rw_unlock
ocfs2_dio_end_io
dio_complete
.....
bio_endio
req_bio_endio
....
scsi_io_completion
blk_done_softirq
__do_softirq
do_softirq
irq_exit
do_IRQ
ocfs2_downconvert_thread
[kthread]
Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
The unaligned io flag is set in the kiocb when an unaligned
dio is issued, it should be cleared even when the dio fails,
or it may affect the following io which are using the same
kiocb.
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Joel Becker <jlbec@evilplan.org>
If the first attempt at opening the lower file read/write fails,
eCryptfs will retry using a privileged kthread. However, the privileged
retry should not happen if the lower file's inode is read-only because a
read/write open will still be unsuccessful.
The check for determining if the open should be retried was intended to
be based on the access mode of the lower file's open flags being
O_RDONLY, but the check was incorrectly performed. This would cause the
open to be retried by the privileged kthread, resulting in a second
failed open of the lower file. This patch corrects the check to
determine if the open request should be handled by the privileged
kthread.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Dan Carpenter <dan.carpenter@oracle.com>
Pull block bits from Jens Axboe:
"As vacation is coming up, thought I'd better get rid of my pending
changes in my for-linus branch for this iteration. It contains:
- Two patches for mtip32xx. Killing a non-compliant sysfs interface
and moving it to debugfs, where it belongs.
- A few patches from Asias. Two legit bug fixes, and one killing an
interface that is no longer in use.
- A patch from Jan, making the annoying partition ioctl warning a bit
less annoying, by restricting it to !CAP_SYS_RAWIO only.
- Three bug fixes for drbd from Lars Ellenberg.
- A fix for an old regression for umem, it hasn't really worked since
the plugging scheme was changed in 3.0.
- A few fixes from Tejun.
- A splice fix from Eric Dumazet, fixing an issue with pipe
resizing."
* 'for-linus' of git://git.kernel.dk/linux-block:
scsi: Silence unnecessary warnings about ioctl to partition
block: Drop dead function blk_abort_queue()
block: Mitigate lock unbalance caused by lock switching
block: Avoid missed wakeup in request waitqueue
umem: fix up unplugging
splice: fix racy pipe->buffers uses
drbd: fix null pointer dereference with on-congestion policy when diskless
drbd: fix list corruption by failing but already aborted reads
drbd: fix access of unallocated pages and kernel panic
xen/blkfront: Add WARN to deal with misbehaving backends.
blkcg: drop local variable @q from blkg_destroy()
mtip32xx: Create debugfs entries for troubleshooting
mtip32xx: Remove 'registers' and 'flags' from sysfs
blkcg: fix blkg_alloc() failure path
block: blkcg_policy_cfq shouldn't be used if !CONFIG_CFQ_GROUP_IOSCHED
block: fix return value on cfq_init() failure
mtip32xx: Remove version.h header file inclusion
xen/blkback: Copy id field when doing BLKIF_DISCARD.
When the server doesn't advertise CAP_LARGE_READ_X, then MS-CIFS states
that you must cap the size of the read at the client's MaxBufferSize.
Unfortunately, testing with many older servers shows that they often
can't service a read larger than their own MaxBufferSize.
Since we can't assume what the server will do in this situation, we must
be conservative here for the default. When the server can't do large
reads, then assume that it can't satisfy any read larger than its
MaxBufferSize either.
Luckily almost all modern servers can do large reads, so this won't
affect them. This is really just for older win9x and OS/2 era servers.
Also, note that this patch just governs the default rsize. The admin can
always override this if he so chooses.
Cc: <stable@vger.kernel.org> # 3.2
Reported-by: David H. Durgee <dhdurgee@acm.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steven French <sfrench@w500smf.(none)>
While we are resolving directory modifications in the
tree log, we are triggering delayed metadata updates to
the filesystem btrees.
This commit forces the delayed updates to run so the
replay code can find any modifications done. It stops
us from crashing because the directory deleltion replay
expects items to be removed immediately from the tree.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
cc: stable@kernel.org
We can race with unlink and not actually be able to do our igrab in
btrfs_add_ordered_extent. This will result in all sorts of problems.
Instead of doing the complicated work to try and handle returning an error
properly from btrfs_add_ordered_extent, just hold a ref to the inode during
writepages. If we cannot grab a ref we know we're freeing this inode anyway
and can just drop the dirty pages on the floor, because screw them we're
going to invalidate them anyway. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
The tree log stuff can have allocated space that we end up having split
across a bitmap and a real extent. The free space code does not deal with
this, it assumes that if it finds an extent or bitmap entry that the entire
range must fall within the entry it finds. This isn't necessarily the case,
so rework the remove function so it can handle this case properly. This
fixed two panics the user hit, first in the case where the space was
initially in a bitmap and then in an extent entry, and then the reverse
case. Thanks,
Reported-and-tested-by: Shaun Reich <sreich@kde.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
When we're evicting an inode during log recovery, we need to ensure that the inode
is not in orphan state any more, which means inode's run_time flags has _no_
BTRFS_INODE_HAS_ORPHAN_ITEM. Thus, the BUG_ON was triggered because of a wrong
check for the flags.
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
We used the wrong ioctl macro for the getflags ioctl before.
As we don't have the set/getflags ioctls in the user space ioctl.h
at the moment, it's safe to fix it now.
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
This introduces btrfs_resume_balance_async(), which, given that
restriper state was recovered earlier by btrfs_recover_balance(),
resumes balance in btrfs-balance kthread.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Fix a bug that triggered asserts in btrfs_balance() in both normal and
resume modes -- restriper state was not properly restored on read-only
mounts. This factors out resuming code from btrfs_restore_balance(),
which is now also called earlier in the mount sequence to avoid the
problem of some early writes getting the old profile.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Miao pointed out there's a problem with mixing dio writes and buffered
reads. If the read happens between us invalidating the page range and
actually locking the extent we can bring in pages into page cache. Then
once the write finishes if somebody tries to read again it will just find
uptodate pages and we'll read stale data. So we need to lock the extent and
check for uptodate bits in the range. If there are uptodate bits we need to
unlock and invalidate again. This will keep this race from happening since
we will hold the extent locked until we create the ordered extent, and then
teh read side always waits for ordered extents. There was also a race in
how we updated i_size, previously we were relying on the generic DIO stuff
to adjust the i_size after the DIO had completed, but this happens outside
of the extent lock which means reads could come in and not see the updated
i_size. So instead move this work into where we create the extents, and
then this way the update ordered i_size stuff works properly in the endio
handlers. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
It is normal behaviour of the low level btrfs function btrfs_map_bio()
to complete a bio with -EIO if the device is missing, instead of just
preventing the bio creation in an earlier step.
This used to cause I/O statistic read error increments and annoying
printk_ratelimited messages. This commit fixes the issue.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Reported-by: Carey Underwood <cwillu@cwillu.com>
Pull UDF fixes from Jan Kara:
"Make UDF more robust in presence of corrupted filesystem"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Fortify loading of sparing table
udf: Avoid run away loop when partition table length is corrupted
udf: Use 'ret' instead of abusing 'i' in udf_load_logicalvol()
'IS_ENABLED()' macro usage: should be 'IS_ENABLED(CONFIG_DEBUG_FS)',
but we had 'IS_ENABLED(DEBUG_FS)'. Also fix incorrect assertion.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJP7Hx+AAoJECmIfjd9wqK04FEP/3NC0qMlleuEcHv9AFwOJ1PB
rn1PDjz6kuEjZ1/xhGFoboBOUj577qyzIP6IvG7MqSH66Yc8gBF+sPbKWWglx7wB
i2y7/Fi4lHM3w59GfvnOhdI7McklhPyl3R183MZ3EBJk4V0LPi2rXsl7G5puLNgG
XkJuOjXLXZPgyeMR+DlBsoaaxBMihnh/pdpUAyLER1cQdzQwCzba82tNrMgnCp7i
TIFTPtn+LmEQyHcqXx5ub/FV6BiEUXIJbkKlp5Ajqyh/olSNtGdCHRrP5MXpU2kI
DmtyMvp+3PBHxrUYQjXT6uerL9uXhIUyRv49qO0tS68fUg44JCFflmPkoV9qEvvl
ADbOqklx1DWdVCiZdhXWe1GFhf6U+TOoUyeiIzGIy0fIIlycNl915F4LzxVUqQKm
yoqouEvzqd1LIAsopakF2DDIKoK6ViWmHBkuN04B+u+iab4DC4aX3vgxq+Ie8mqA
0QNIamovk/2MR2665XhbARu0yDSEmGZvD6dkuSAgXIxjw7tdvvlY7pkSouWTOSR0
fbqPrhgbRON+mT4Fcrb5dMq+PAiOTw5kp7az90+U6i1oLm5TY8CaHt3UvmdspOM/
UeHRhLR8o/RhfnvnexiOxIWtQGCX+CCuePe9oN/fQabj9dfvKI1iR+zoyheFLwiT
HxaU6I7oVYQVkeifNJVV
=iZTv
-----END PGP SIGNATURE-----
Merge tag 'upstream-3.5-rc5' of git://git.infradead.org/linux-ubifs
Pull ubi/ubifs fixes from Artem Bityutskiy:
"Fix the debugfs regression - we never enable it because incorrect
'IS_ENABLED()' macro usage: should be 'IS_ENABLED(CONFIG_DEBUG_FS)',
but we had 'IS_ENABLED(DEBUG_FS)'. Also fix incorrect assertion."
* tag 'upstream-3.5-rc5' of git://git.infradead.org/linux-ubifs:
UBI: correct usage of IS_ENABLED()
UBIFS: correct usage of IS_ENABLED()
UBIFS: fix assertion
Check provided length of partition table so that (possibly maliciously)
corrupted partition table cannot cause accessing data beyond current buffer.
Signed-off-by: Jan Kara <jack@suse.cz>
With the tree mod log, we may end up with two roots (the current root and a
rewinded version of it) both pointing to two leaves, l1 and l2, of which l2
had already been cow-ed in the current transaction. If we don't rewind any
tree blocks, we cannot have two roots both pointing to an already cowed tree
block.
Now there is btrfs_next_leaf, which has a leaf locked and wants a lock on
the next (right) leaf. And there is push_leaf_left, which has a (cowed!)
leaf locked and wants a lock on the previous (left) leaf.
In order to solve this dead lock situation, we use try_lock in
btrfs_next_leaf (only in case it's called with a tree mod log time_seq
paramter) and if we fail to get a lock on the next leaf, we give up our lock
on the current leaf and retry from the very beginning.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
When a MOD_LOG_KEY_ADD operation is rewinded, we remove the key from the
tree block. If its not the last key, removal involves a move operation.
This move operation was explicitly done before this commit.
However, at insertion time, there's a move operation before the actual
addition to make room for the new key, which is recorded in the tree mod
log as well. This means, we must drop the move operation when rewinding the
add operation, because the next operation we'll be rewinding will be the
corresponding MOD_LOG_MOVE_KEYS operation.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
When delayed refs exist, btrfs_find_all_roots used to hold the delayed ref
mutex way longer than actually required. We ought to drop it immediately
after we're done collecting all the delayed refs.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Several callers of insert_ptr set the tree_mod_log parameter to 0 to avoid
addition to the tree mod log. In fact, we need all of those operations. This
commit simply removes the additional parameter and makes addition to the
tree mod log unconditional.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
For the tree mod log, we don't log any operations at leaf level. If the root
is at the leaf level (i.e. the tree consists only of the root), then
__tree_mod_log_oldest_root will find a ROOT_REPLACE operation in the log
(because we always log that one no matter which level), but no other
operations.
With this patch __tree_mod_log_oldest_root exits cleanly instead of
BUGging in this situation. get_old_root checks if its really a root at leaf
level in case we don't have any operations and WARNs if this assumption
breaks.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>