There are two locks involved in managing the journal lists: the general
reiserfs_write_lock and the journal->j_flush_mutex.
While flush_journal_list is sleeping to acquire the j_flush_mutex or to
submit a block for write, it will drop the write lock. This allows
another thread to acquire the write lock and ultimately call
flush_used_journal_lists to traverse the list of journal lists and
select one for flushing. It can select the journal_list that has just
had flush_journal_list called on it in the original thread and call it
again with the same journal_list.
The second thread then drops the write lock to acquire j_flush_mutex and
the first thread reacquires it and continues execution and eventually
clears and frees the journal list before dropping j_flush_mutex and
returning.
The second thread acquires j_flush_mutex and ends up operating on a
journal_list that has already been released. If the memory hasn't
been reused, we'll soon after hit a BUG_ON because the transaction id
has already been cleared. If it's been reused, we'll crash in other
fun ways.
Since flush_journal_list will synchronize on j_flush_mutex, we can fix
the race by taking a proper reference in flush_used_journal_lists
and checking to see if it's still valid after the mutex is taken. It's
safe to iterate the list of journal lists and pick a list with
just the write lock as long as a reference is taken on the journal list
before we drop the lock. We already have code to handle whether a
transaction has been flushed already so we can use that to handle the
race and get rid of the trans_id BUG_ON.
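The take-a-reference-then-revalidate pattern described above can be sketched in user-space C. This is a minimal single-threaded model, not the reiserfs code: struct jlist, jlist_get(), jlist_put() and flush_if_valid() are hypothetical names, and a trans_id of 0 is assumed to mean "already flushed".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a journal list; not the reiserfs structure. */
struct jlist {
    int refcount;           /* number of holders keeping the memory alive */
    unsigned long trans_id; /* assumed: 0 means the list was already flushed */
};

static struct jlist *jlist_new(unsigned long trans_id)
{
    struct jlist *jl = malloc(sizeof(*jl));

    assert(jl != NULL);
    jl->refcount = 1;
    jl->trans_id = trans_id;
    return jl;
}

/* Take a reference while the write lock is still held. */
static void jlist_get(struct jlist *jl)
{
    jl->refcount++;
}

/* Drop a reference; returns true when this call freed the list. */
static bool jlist_put(struct jlist *jl)
{
    if (--jl->refcount > 0)
        return false;
    free(jl);
    return true;
}

/*
 * Called after the flush mutex has been taken. The caller's reference
 * guarantees that jl is still valid memory, so reading trans_id is safe.
 * Returns true if a flush was performed, false if another thread won.
 */
static bool flush_if_valid(struct jlist *jl, unsigned long expected_trans_id)
{
    if (jl->trans_id != expected_trans_id)
        return false;   /* already flushed by the other thread */
    jl->trans_id = 0;   /* mark as flushed */
    return true;
}
```

The point of the sketch is that the stale case becomes an ordinary "nothing to do" return instead of a use-after-free, because the second holder's reference keeps the memory alive until it has looked.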
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Commit a3172027 introduced test_transaction as a requirement for
flushing old lists -- but it can never return 1 unless the transaction
has already been flushed.
As a result, we have a routine that iterates the j_realblocks list but
doesn't actually do anything. Since it's been this way since 2006 and
the latency numbers were what Chris expected, let's just rip it out.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
A user has reported an oops in udf_statfs() that was caused by the
numOfPartitions entry in the LVID structure being corrupted. Fix the problem
by verifying that numOfPartitions makes sense, at least to the extent
that the LVID fits into a single block as it should.
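A bounds check of this kind might look like the following sketch. The layout is an assumption made for illustration (a fixed header, two 32-bit tables with one entry per partition, and a trailing implementation-use area); lvid_fits_in_block() and its parameters are hypothetical and are not taken from the UDF code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns true if an integrity descriptor with num_partitions entries
 * fits into one block. Assumed layout (illustration only): a fixed
 * header of header_size bytes, two 32-bit tables with one entry per
 * partition, and trailer_size bytes of implementation-use data.
 */
static bool lvid_fits_in_block(uint32_t num_partitions, size_t block_size,
                               size_t header_size, size_t trailer_size)
{
    size_t fixed = header_size + trailer_size;

    if (fixed > block_size)
        return false;
    /* Divide instead of multiplying so a corrupted count cannot overflow. */
    return num_partitions <= (block_size - fixed) / (2 * sizeof(uint32_t));
}
```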
Reported-by: Juergen Weigert <jw@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Pull block IO fixes from Jens Axboe:
"After merge window, no new stuff this time only a collection of neatly
confined and simple fixes"
* 'for-3.12/core' of git://git.kernel.dk/linux-block:
cfq: explicitly use 64bit divide operation for 64bit arguments
block: Add nr_bios to block_rq_remap tracepoint
If the queue is dying then we only call the rq->end_io callout. This leaves bios setup on the request, because the caller assumes when the blk_execute_rq_nowait/blk_execute_rq call has completed that the rq->bios have been cleaned up.
bio-integrity: Fix use of bs->bio_integrity_pool after free
blkcg: relocate root_blkg setting and clearing
block: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node(...)
block: trace all devices plug operation
Pull btrfs fixes from Chris Mason:
"These are mostly bug fixes and two small performance fixes. The
most important of the bunch are Josef's fix for a snapshotting
regression and Mark's update to fix compile problems on arm"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (25 commits)
Btrfs: create the uuid tree on remount rw
btrfs: change extent-same to copy entire argument struct
Btrfs: dir_inode_operations should use btrfs_update_time also
btrfs: Add btrfs: prefix to kernel log output
btrfs: refuse to remount read-write after abort
Btrfs: btrfs_ioctl_default_subvol: Revert back to toplevel subvolume when arg is 0
Btrfs: don't leak transaction in btrfs_sync_file()
Btrfs: add the missing mutex unlock in write_all_supers()
Btrfs: iput inode on allocation failure
Btrfs: remove space_info->reservation_progress
Btrfs: kill delay_iput arg to the wait_ordered functions
Btrfs: fix worst case calculator for space usage
Revert "Btrfs: rework the overcommit logic to be based on the total size"
Btrfs: improve replacing nocow extents
Btrfs: drop dir i_size when adding new names on replay
Btrfs: replay dir_index items before other items
Btrfs: check roots last log commit when checking if an inode has been logged
Btrfs: actually log directory we are fsync()'ing
Btrfs: actually limit the size of delalloc range
Btrfs: allocate the free space by the existed max extent size when ENOSPC
...
Users have been complaining of the uuid tree stuff warning that there is no uuid
root when trying to do snapshot operations. This is because if you mount -o ro
we will not create the uuid tree. But then if you mount -o rw,remount we will
still not create it and then any subsequent snapshot/subvol operations you try
to do will fail gloriously. Fix this by creating the uuid_root on remount rw if
it was not already there. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
btrfs_ioctl_file_extent_same() uses __put_user_unaligned() to copy some data
back to its argument struct. Unfortunately, not all architectures provide
__put_user_unaligned(), so compiles break on them if btrfs is selected.
Instead, just copy the whole struct in / out at the start and end of
operations, respectively.
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Commit 2bc5565286 (Btrfs: don't update atime on
RO subvolumes) ensures that the access time of an inode is not updated when
the inode lives in a read-only subvolume.
However, if a directory on a read-only subvolume is accessed, the atime is
updated. This results in a write operation to a read-only subvolume. I
believe that access times should never be updated on read-only subvolumes.
To reproduce:
# mkfs.btrfs -f /dev/dm-3
(...)
# mount /dev/dm-3 /mnt
# btrfs subvol create /mnt/sub
Create subvolume '/mnt/sub'
# mkdir /mnt/sub/dir
# echo "abc" > /mnt/sub/dir/file
# btrfs subvol snapshot -r /mnt/sub /mnt/rosnap
Create a readonly snapshot of '/mnt/sub' in '/mnt/rosnap'
# stat /mnt/rosnap/dir
File: `/mnt/rosnap/dir'
Size: 8 Blocks: 0 IO Block: 4096 directory
Device: 16h/22d Inode: 257 Links: 1
Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2013-09-11 07:21:49.389157126 -0400
Modify: 2013-09-11 07:22:02.330156079 -0400
Change: 2013-09-11 07:22:02.330156079 -0400
# ls /mnt/rosnap/dir
file
# stat /mnt/rosnap/dir
File: `/mnt/rosnap/dir'
Size: 8 Blocks: 0 IO Block: 4096 directory
Device: 16h/22d Inode: 257 Links: 1
Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2013-09-11 07:22:56.797151670 -0400
Modify: 2013-09-11 07:22:02.330156079 -0400
Change: 2013-09-11 07:22:02.330156079 -0400
Reported-by: Koen De Wit <koen.de.wit@oracle.com>
Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
The kernel log entries for device label %s and device fsid %pU
are missing the btrfs: prefix. Add those here.
Signed-off-by: Frank Holton <fholton@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
It's still possible to flip the filesystem into RW mode after it's
remounted RO due to an abort. There are lots of places that check for
the superblock error bit and will not write data, but we should not let
the filesystem appear read-write.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This patch makes it possible to set BTRFS_FS_TREE_OBJECTID as the default
subvolume by passing a subvolume id of 0.
Signed-off-by: chandan <chandan@linux.vnet.ibm.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
In btrfs_sync_file(), if the call to btrfs_log_dentry_safe() returns
a negative error (e.g. -ENOMEM via btrfs_log_inode()), we would
return without ending/freeing the transaction.
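The shape of such a fix is the usual single-exit cleanup. The sketch below is a user-space model under stated assumptions: trans_start(), trans_end(), log_step() and sync_file_sketch() are hypothetical stand-ins, and the counter exists only so the leak can be observed.

```c
#include <errno.h>

/* Hypothetical counter of transactions that were started but not ended. */
static int open_transactions;

static void trans_start(void) { open_transactions++; }
static void trans_end(void)   { open_transactions--; }

/* Stand-in for a logging step that may fail; returns 0 or a negative errno. */
static int log_step(int simulated_result)
{
    return simulated_result;
}

/* Every path after trans_start() goes through trans_end(). */
static int sync_file_sketch(int simulated_result)
{
    int ret;

    trans_start();
    ret = log_step(simulated_result);
    if (ret < 0)
        goto out;       /* the error path must not skip the cleanup */

    /* ... further work on success would go here ... */
    ret = 0;
out:
    trans_end();
    return ret;
}
```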
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
The BUG() was replaced by btrfs_error() and return -EIO with the
patch "get rid of one BUG() in write_all_supers()", but the missing
mutex_unlock() was overlooked.
The 0-DAY kernel build service from Intel reported the missing
unlock which was found by the coccinelle tool:
fs/btrfs/disk-io.c:3422:2-8: preceding lock on line 3374
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
We don't do the iput when we fail to allocate our delayed delalloc work in
__start_delalloc_inodes, fix this.
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This isn't used for anything anymore, just remove it.
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This is a left over of how we used to wait for ordered extents, which was to
grab the inode and then run filemap flush on it. However if we have an ordered
extent then we already are holding a ref on the inode, and we just use
btrfs_start_ordered_extent anyway, so there is no reason to have an extra ref on
the inode to start work on the ordered extent. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Forever ago I made the worst case calculator say that we could potentially split
into 3 blocks for every level on the way down, which isn't right. If we split
we're only going to get two new blocks, the one we originally cow'ed and the new
one we're going to split. Thanks,
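To make the arithmetic concrete, here is a sketch of a worst-case reservation under the assumption that it is computed as "bytes on one root-to-leaf path, times blocks touched per level, times number of items". The function name, its parameters and that formula shape are assumptions for illustration, not a quote of the btrfs helper.

```c
#include <stdint.h>

/*
 * Worst-case metadata bytes for num_items tree modifications, assuming
 * every level on the path from root to leaf is split. blocks_per_level
 * is 2 with the corrected estimate (the cow'ed block plus the new block
 * from the split); the old estimate used 3. Requires max_level >= 1.
 */
static uint64_t worst_case_bytes(uint64_t leaf_size, uint64_t node_size,
                                 unsigned max_level, unsigned blocks_per_level,
                                 unsigned num_items)
{
    /* One leaf plus (max_level - 1) interior nodes on the path down. */
    uint64_t path_bytes = leaf_size + node_size * (max_level - 1);

    return path_bytes * blocks_per_level * num_items;
}
```

With 4 KiB leaves and nodes and an assumed depth of 8, one item reserves 64 KiB at two blocks per level versus 96 KiB at three, so the old estimate over-reserved by half.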
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This reverts commit 70afa3998c. It is causing
performance issues and wasn't actually correct. There were problems with the
way we flushed delalloc and that was the real cause of the early enospc.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Various people have hit a deadlock when running btrfs/011. This is because when
replacing nocow extents we will take the i_mutex to make sure nobody messes with
the file while we are replacing the extent. The problem is we are already
holding a transaction open, which is a locking inversion, so instead we need to
save these inodes we find and then process them outside of the transaction.
Further we can't just lock the inode and assume we are good to go. We need to
lock the extent range and then read back the extent cache for the inode to make
sure the extent really still points at the physical block we want. If it
doesn't we don't have to copy it. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
So if we have dir_index items in the log that means we also have the inode item
as well, which means that the inode's i_size is correct. However when we
process dir_index'es we call btrfs_add_link() which will increase the
directory's i_size for the new entry. To fix this we just need to set the
directory's i_size to 0, and then adjust the i_size as we find dir_index items.
btrfs_add_link() will do it for new entries, and if the entry already exists we
can just add the name_len to the i_size ourselves. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
A user reported a bug where his log would not replay because he was getting
-EEXIST back. This was because he had a file moved into a directory that was
logged. What happens is the file had a lower inode number, and so it is
processed first when replaying the log, and so we add the inode ref in for the
directory it was moved to. But then we process the directory's DIR_INDEX item
and try to add the inode ref for that inode and it fails because we already
added it when we replayed the inode. To solve this problem we need to just
process any DIR_INDEX items we have in the log first so this all is taken care
of, and then we can replay the rest of the items. With this patch my reproducer
can remount the file system properly instead of erroring out. Thanks,
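The ordering change can be pictured as a two-pass walk over the log. The sketch below only models the ordering; enum item_kind, struct log_item and replay_order() are hypothetical and do not correspond to the btrfs replay code.

```c
#include <stddef.h>

/* Hypothetical log item kinds; only the distinction matters here. */
enum item_kind { ITEM_INODE_REF, ITEM_DIR_INDEX, ITEM_OTHER };

struct log_item {
    enum item_kind kind;
    int id;             /* illustrative identifier */
};

/*
 * Writes the ids of items into out[] in replay order: all DIR_INDEX
 * items first, then everything else, each group in log order.
 * out must have room for n entries. Returns the number written.
 */
static size_t replay_order(const struct log_item *items, size_t n, int *out)
{
    size_t written = 0;

    for (size_t i = 0; i < n; i++)      /* pass 1: directory index items */
        if (items[i].kind == ITEM_DIR_INDEX)
            out[written++] = items[i].id;

    for (size_t i = 0; i < n; i++)      /* pass 2: the rest */
        if (items[i].kind != ITEM_DIR_INDEX)
            out[written++] = items[i].id;

    return written;
}
```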
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Liu introduced a local copy of the last log commit for an inode to make sure we
actually log an inode even if a log commit has already taken place. In order to
make sure we didn't relog the same inode multiple times he set this local copy
to the current trans when we log the inode, because usually we log the inode and
then sync the log. The exception to this is during rename, we will relog an
inode if the name changed and it is already in the log. The problem with this
is then we go to sync the inode, and our check to see if the inode has already
been logged is tripped and we don't sync the log. To fix this we need to _also_
check against the root's last log commit, because it could be less than what is
in our local copy of the log commit. This fixes a bug where we rename a file
into a directory and then fsync the directory and then on remount the directory
is no longer there. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
If you just create a directory and then fsync that directory and then pull the
power plug you will come back up and the directory will not be there. That is
because we won't actually create directories if we've logged files inside of
them since they will be created on replay, but in this check we will set our
logged_trans of our current directory if it happens to be a directory, making us
think it doesn't need to be logged. Fix the logic to only do this to parent
directories. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
So forever we have had this thing to limit the amount of delalloc pages we'll
set up to be written out to 128MB. This is because we have to lock all the pages
in this range, so anything above this gets a bit unwieldy, and also without a
limit we'll happily allocate gigantic chunks of disk space. Turns out our check
for this wasn't quite right: we wouldn't actually limit the chunk we wanted to
write out, we'd just stop looking for more space after we went over the limit.
So if you do a giant 20GB dd on my box with lots of RAM I could get 2GB
extents. This is fine normally, except when you go to relocate these extents
and we can't find enough space to relocate these monster extents, since we have
to be able to allocate exactly the same sized extent to move it around. So fix
this by actually enforcing the limit. With this patch I'm no longer seeing
giant 1.5GB extents. Thanks,
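Enforcing the limit, as opposed to merely stopping the search once it has been exceeded, amounts to clamping the end of the range. A minimal sketch, assuming inclusive byte offsets; clamp_delalloc_end() is a hypothetical helper, not the btrfs function.

```c
#include <stdint.h>

/*
 * Clamp the inclusive byte range [start, end] so that it covers at most
 * max_bytes bytes. Returns the (possibly reduced) inclusive end offset.
 * Assumes start <= end and max_bytes > 0.
 */
static uint64_t clamp_delalloc_end(uint64_t start, uint64_t end,
                                   uint64_t max_bytes)
{
    /* end - start >= max_bytes means the length exceeds max_bytes. */
    if (end - start >= max_bytes)
        return start + max_bytes - 1;
    return end;
}
```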
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
With the current code, if the requested size is very large and all the extents
in the free space cache are small, we waste a lot of CPU time cutting the
requested size in half and searching the cache again and again until the size
gets down to something the allocator can return. In fact, we know the max extent
size in the cache after the first search, so we needn't cut the size in half
repeatedly; we can just use the max extent size directly. This saves a lot of
CPU time and improves performance when there are only fragments in the free
space cache.
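The difference between the two strategies can be shown by counting searches over a toy free-space list. This is a sketch under simplifying assumptions: the cache is modelled as a flat array of extent sizes, and find_extent(), searches_by_halving() and searches_with_max_hint() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Search extents[] for the first extent of at least want bytes.
 * Returns its index, or -1 if none fits; *max_seen is set to the
 * largest extent size encountered during the scan.
 */
static int find_extent(const uint64_t *extents, size_t n, uint64_t want,
                       uint64_t *max_seen)
{
    *max_seen = 0;
    for (size_t i = 0; i < n; i++) {
        if (extents[i] > *max_seen)
            *max_seen = extents[i];
        if (extents[i] >= want)
            return (int)i;
    }
    return -1;
}

/* Old approach: halve the request until a search succeeds. */
static unsigned searches_by_halving(const uint64_t *extents, size_t n,
                                    uint64_t want)
{
    unsigned searches = 0;
    uint64_t unused;

    while (want > 0) {
        searches++;
        if (find_extent(extents, n, want, &unused) >= 0)
            break;
        want /= 2;
    }
    return searches;
}

/* New approach: retry once with the largest size seen by the failed search. */
static unsigned searches_with_max_hint(const uint64_t *extents, size_t n,
                                       uint64_t want)
{
    unsigned searches = 1;
    uint64_t largest, unused;

    if (find_extent(extents, n, want, &largest) < 0 && largest > 0) {
        searches++;
        (void)find_extent(extents, n, largest, &unused);
    }
    return searches;
}
```

For a 1 MiB request against a cache holding only 4 KiB extents, the halving loop needs nine searches while the hinted version needs two.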
According to my test, if there are only 4KB free space extents in the fs,
and the total size of those extents is 256MB, we can reduce the execution
time of the following test from 5.4s to 1.4s.
dd if=/dev/zero of=<testfile> bs=1MB count=1 oflag=sync
Changelog v2 -> v3:
- fix the problem that we skip the block group with the space which is
less than we need.
Changelog v1 -> v2:
- address the problem that we return a wrong start position when searching
the free space in a bitmap.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
We want to know if there are debugging features compiled in, this may
affect performance. The message is printed before the sanity checks.
(This commit message is a copy of David Sterba's commit message when
he introduced btrfs_print_info()).
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Instead of removing the current inode from the red black tree
and then add the new one, just use the red black tree replace
operation, which is more efficient.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Reviewed-by: Zach Brown <zab@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
If replace was suspended by the umount, replace target device is added
to the fs_devices->alloc_list during a later mount. This is obviously
wrong. ->is_tgtdev_for_dev_replace is supposed to guard against that,
but ->is_tgtdev_for_dev_replace is (and can only ever be) initialized
*after* everything is opened and fs_devices lists are populated. Fix
this by checking the devid instead: for replace targets it's always
equal to BTRFS_DEV_REPLACE_DEVID.
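As a sketch of the check, assuming the decision is made when the device is opened: the struct, the helper name and the numeric value of the reserved devid below are illustrative assumptions; only the constant's name comes from the text above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value for illustration; the text above only names the constant. */
#define SKETCH_DEV_REPLACE_DEVID 0ULL

/* Hypothetical, reduced view of a device as seen at open time. */
struct sketch_device {
    uint64_t devid;
    bool writeable;
    bool is_tgtdev_for_dev_replace; /* not yet initialized at open time */
};

/*
 * Decide whether a freshly opened device belongs on the allocation list.
 * The devid is known as soon as the device is opened, so it is used
 * instead of the is_tgtdev_for_dev_replace flag, which is set later.
 */
static bool should_add_to_alloc_list(const struct sketch_device *dev)
{
    return dev->writeable && dev->devid != SKETCH_DEV_REPLACE_DEVID;
}
```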
Cc: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
If we failed to actually allocate the correct size of the extent to relocate we
will end up in an infinite loop because we won't return an error, we'll just
move on to the next extent. So fix this up by returning an error, and then fix
all the callers to return an error up the stack rather than BUG_ON()'ing.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Don't try to dump the index key that distinguishes an object if netfs
data in the cookie the object refers to has been cleared (ie. the
cookie has passed most of the way through
__fscache_relinquish_cookie()).
Since the netfs holds the index key, we can't get at it once the ->def
and ->netfs_data pointers have been cleared - and a NULL pointer
exception will ensue, usually just after a:
CacheFiles: Error: Unexpected object collision
error is reported.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In cachefiles_check_auxdata(), we allocate auxbuf but fail to free it if
we determine there's an error or that the data is stale.
Further, assigning the output of vfs_getxattr() to auxbuf->len gives
problems with checking for errors as auxbuf->len is a u16. We don't
actually need to set auxbuf->len, so keep the length in a variable for
now. We shouldn't need to check the upper limit of the buffer as an
overflow there should be indicated by -ERANGE.
While we're at it, fscache_check_aux() returns an enum value, not an
int, so assign it to an appropriately typed variable rather than to ret.
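The u16 problem is easy to reproduce in plain C. The sketch below does not use the cachefiles structures; the function names are hypothetical and the "result" parameter stands in for a return value that is either a length or a negative errno.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* What a signed result looks like after being stored in a u16 field. */
static long value_after_u16_store(long result)
{
    uint16_t len = (uint16_t)result;    /* conversion is modulo 65536 */

    return (long)len;
}

/* Buggy check: inspects the truncated copy, which is never negative. */
static bool error_detected_via_u16(long result)
{
    return value_after_u16_store(result) < 0;
}

/* Fixed check: keep the result in a signed variable of adequate width. */
static bool error_detected_via_signed(long result)
{
    return result < 0;
}
```

A -ERANGE result stored in the 16-bit field reads back as a large positive length, which is why the length is better kept in a signed local variable until it has been checked.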
Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hongyi Jia <jiayisuse@gmail.com>
cc: Milosz Tanski <milosz@adfin.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull ceph fixes from Sage Weil:
"These fix several bugs with RBD from 3.11 that didn't get tested in
time for the merge window: some error handling, a use-after-free, and
a sequencing issue when unmapping an image races with a notify
operation.
There is also a patch fixing a problem with the new ceph + fscache
code that just went in"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
fscache: check consistency does not decrement refcount
rbd: fix error handling from rbd_snap_name()
rbd: ignore unmapped snapshots that no longer exist
rbd: fix use-after free of rbd_dev->disk
rbd: make rbd_obj_notify_ack() synchronous
rbd: complete notifies before cleaning up osd_client and rbd_dev
libceph: add function to ensure notifies are complete
Pull vfs fixes from Al Viro:
"atomic_open-related fixes (Miklos' series, with EEXIST-related parts
replaced with fix in fs/namei.c:atomic_open() instead of messing with
the instances) + race fix in autofs + leak on failure exit in 9p"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
9p: don't forget to destroy inode cache if fscache registration fails
atomic_open: take care of EEXIST in no-open case with O_CREAT|O_EXCL in fs/namei.c
vfs: don't set FILE_CREATED before calling ->atomic_open()
nfs: set FILE_CREATED
gfs2: set FILE_CREATED
cifs: fix filp leak in cifs_atomic_open()
vfs: improve i_op->atomic_open() documentation
autofs4: close the races around autofs4_notify_daemon()
Merge tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
Pull pstore/compression fixes from Tony Luck:
"Three pstore fixes related to compression:
1) Better adjustment of size of compression buffer (was too big for
EFIVARS backend resulting in compression failure)
2) Use zlib_inflateInit2 instead of zlib_inflateInit
3) Don't print messages about compression failure. They will waste
space that may better be used to log console output leading to the
crash"
* tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
pstore: Remove the messages related to compression failure
pstore: Use zlib_inflateInit2 instead of zlib_inflateInit
pstore: Adjust buffer size for compression for smaller registered buffers
This fixes a copy and paste error introduced by 9f060e2231
("block: Convert integrity to bvec_alloc_bs()").
Found by Coverity (CID 1020654).
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Kent Overstreet <koverstreet@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If O_CREAT|O_EXCL are passed to open, then we know that either
- the file is successfully created, or
- the operation fails in some way.
So previously we set FILE_CREATED before calling ->atomic_open() so the
filesystem doesn't have to. This, however, led to bugs in the
implementation that went unnoticed when the filesystem didn't check for
existence, yet returned success. To prevent this kind of bug, require
filesystems to always explicitly set FILE_CREATED on O_CREAT|O_EXCL and
verify this in the VFS.
Also added a couple more verifications for the result of atomic_open():
- Warn if filesystem set FILE_CREATED despite the lack of O_CREAT.
- Warn if filesystem set FILE_CREATED but gave a negative dentry.
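The three consistency rules above, applied to a successful atomic open, can be written down as a small checker. This is a user-space sketch: SKETCH_FILE_CREATED is a made-up bit value, and atomic_open_inconsistencies() is a hypothetical helper rather than the code in fs/namei.c.

```c
#include <fcntl.h>      /* O_CREAT, O_EXCL */
#include <stdbool.h>

/* Hypothetical bit for the "opened" result flags; not the kernel's value. */
#define SKETCH_FILE_CREATED 0x1

/*
 * Count the inconsistencies in what a filesystem reported for a
 * successful atomic open. open_flags: flags passed to open();
 * opened: result flags set by the filesystem; dentry_positive:
 * whether the resulting dentry has an inode attached.
 */
static int atomic_open_inconsistencies(int open_flags, int opened,
                                       bool dentry_positive)
{
    bool created = (opened & SKETCH_FILE_CREATED) != 0;
    bool excl_create = (open_flags & (O_CREAT | O_EXCL)) == (O_CREAT | O_EXCL);
    int problems = 0;

    if (excl_create && !created)
        problems++;     /* success on O_CREAT|O_EXCL must mean "created" */
    if (created && !(open_flags & O_CREAT))
        problems++;     /* created without being asked to create */
    if (created && !dentry_positive)
        problems++;     /* created, yet the dentry is negative */
    return problems;
}
```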
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Set FILE_CREATED on O_CREAT|O_EXCL. If the NFS server honored our request
for exclusivity then this must be correct.
Currently this is a no-op, since the VFS sets FILE_CREATED anyway. The
next patch will, however, require this flag to be always set by
filesystems.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
In gfs2_create_inode() set FILE_CREATED in *opened.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If an error occurs after having called finish_open() then fput() needs to
be called on the already opened file.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Steve French <sfrench@samba.org>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Fix documentation of ->atomic_open() and related functions: finish_open()
and finish_no_open(). Also add details that seem to be unclear and a
source of bugs (some of which are fixed in the following series).
Cc-ing maintainers of all filesystems implementing ->atomic_open().
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Sage Weil <sage@inktank.com>
Cc: Steve French <sfrench@samba.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Don't drop ->wq_mutex before calling autofs4_notify_daemon() only to regain it
there. Besides being pointless, that opens a race window where autofs4_wait_release()
could've come and freed wq->name.name. And do the debugging printk in the "reused an
existing wq" case before dropping ->wq_mutex - the same reason...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Ian Kent <raven@themaw.net>
Pull CIFS fixes from Steve French:
"Two minor cifs fixes and a minor documentation cleanup for cifs.txt"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update cifs.txt and remove some outdated infos
cifs: Avoid calling unlock_page() twice in cifs_readpage() when using fscache
cifs: Do not take a reference to the page in cifs_readpage_worker()
Merge tag 'upstream-3.12-rc1' of git://git.infradead.org/linux-ubifs
Pull ubifs fix from Artem Bityutskiy:
"Just one patch which fixes the power-cut recovery testing mode.
I'll start using a single UBI/UBIFS tree instead of 2 trees from now
on. So in the future you'll get 1 small pull request instead of 2
tiny ones"
* tag 'upstream-3.12-rc1' of git://git.infradead.org/linux-ubifs:
UBIFS: remove invalid warn msg with tst_recovery enabled
Remove the messages indicating compression failure, as they
consume log space on the panic path.
Reported-by: Seiji Aguchi <seiji.aguchi@hds.com>
Tested-by: Seiji Aguchi <seiji.aguchi@hds.com>
Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Since zlib_deflateInit2() is used to specify the window bits during compression,
zlib_inflateInit2() is appropriate for decompression.
Reported-by: Seiji Aguchi <seiji.aguchi@hds.com>
Tested-by: Seiji Aguchi <seiji.aguchi@hds.com>
Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When backends (e.g. efivars) have smaller registered buffers, the
big_oops_buf is too big for them, as there will be fewer repeated
occurrences in the captured text. What happens is that pstore takes
too big a bite from the dmesg log and then finds it cannot compress it
enough to meet the backend block size. This patch adjusts the buffer
size based on the registered buffer size. The cmpr values were arrived
at by experimenting with plain text for buffers of size 1k - 4k (the
smaller the buffer, the fewer the repeated occurrences) and with a
sample crash log for buffers ranging from 4k - 10k.
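One plausible way to size the capture buffer from the registered buffer is sketched below. It assumes cmpr is the expected compressed size expressed as a percentage of the input; capture_buffer_size() and that interpretation are assumptions for illustration, and no specific cmpr values from the patch are reproduced here.

```c
#include <stddef.h>

/*
 * Size of the uncompressed capture buffer for a backend whose registered
 * buffer holds backend_size bytes, given an expected compressed size of
 * cmpr_percent percent of the input. Returns 0 for invalid arguments.
 * Assumption: the percentage comes from measurements per size range.
 */
static size_t capture_buffer_size(size_t backend_size, unsigned cmpr_percent)
{
    if (cmpr_percent == 0 || cmpr_percent > 100)
        return 0;
    return backend_size * 100 / cmpr_percent;
}
```

A higher percentage (poorer expected compression, as with small buffers) yields a smaller bite from the log, which is the direction of adjustment the text describes.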
Reported-by: Seiji Aguchi <seiji.aguchi@hds.com>
Tested-by: Seiji Aguchi <seiji.aguchi@hds.com>
Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>