udf_count_free_bitmap() does not need BKL because bitmaps are in a fixed
place on disk and so we can count set bits without serialization.
udf_count_free_table() is now protected by s_alloc_mutex instead of BKL
to get a consistent view of free space extents.
Signed-off-by: Jan Kara <jack@suse.cz>
There's no need to call udf_add_free_space() for one block at a time. It saves
us noticeable amount of work and yields different result from the original
code only if the filesystem is corrupted and bitmap bit is already cleared.
In such case counter of free blocks is probably wrong anyways so the change
does not matter.
Signed-off-by: Jan Kara <jack@suse.cz>
udf_put_super() does not need BKL because the filesystem is shut down so
there's nothing to race with. The credential changes in udf_remount_fs()
and LVID changes are now protected by dedicated locks so we can remove BKL
from this function as well.
Signed-off-by: Jan Kara <jack@suse.cz>
Superblock carries credentials (uid, gid, etc.) which are used as default
values in __udf_read_inode() when media does not provide these. These
credentials can change during remount so we protect them by a rwlock so that
each inode gets a consistent set of credentials.
Signed-off-by: Jan Kara <jack@suse.cz>
udf_open_lvid() and udf_close_lvid() were modifying LVID without
s_alloc_mutex. Since they can be called from remount, the modification
could race with other filesystem modifications of LVID so protect them
by s_alloc_mutex just to be sure.
Signed-off-by: Jan Kara <jack@suse.cz>
uniqueID handling has been duplicated in three places. Move it into a common
helper. Since we modify an LVID buffer with uniqueID update, we take
sbi->s_alloc_mutex to protect agaist other modifications of the structure.
Signed-off-by: Jan Kara <jack@suse.cz>
udf_update_inode() does not need BKL since on-disk inode modifications are
protected by the buffer lock and reading of values of in-memory inode is
safe without any lock. In some cases we can write inconsistent inode state
to disk but in that case inode will be marked dirty and overwritten later.
Also make unnecessarily global udf_sync_inode() static.
Signed-off-by: Jan Kara <jack@suse.cz>
Add __attribute__((format... to udf_warning.
All arguments matched formats, no other changes necessary.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Check return value of ext3_journal_get_write_access() and
ext3_journal_dirty_metadata().
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Use the search_dirblock() in ext3_dx_find_entry(). It makes the code
easier to read, and it takes advantage of common code. It also saves
100 bytes or so of text space.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Jan Kara <jack@suse.cz>
If the first htree directory is missing '.' or '..' but is otherwise a
valid directory, and we do a lookup for '.' or '..', it's possible to
dereference an uninitialized memory pointer in ext3_htree_next_block().
Avoid this.
We avoid this by moving the special case from ext3_dx_find_entry() to
ext3_find_entry(); this also means we can optimize ext3_find_entry()
slightly when NFS looks up "..".
Thanks to Brad Spengler for pointing a Clang warning that led me to
look more closely at this code. The warning was harmless, but it was
useful in pointing out code that was too ugly to live. This warning was
also reported by Roman Borisov.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
Cc: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Jan Kara <jack@suse.cz>
ext3_fill_super should return the error code that generic_check_accessible
returns when an error condition occurs.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Check return value of ext3_journal_get_write_access() and
ext3_journal_dirty_metadata().
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Check return value of ext3_journal_get_write_access, ext3_journal_dirty_metadata
and ext3_mark_inode_dirty. Consolidate error path under new label 'out_clear_inode'
and adjust bh releasing appropriately.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Using %pV reduces the number of printk calls and
eliminates any possible message interleaving from
other printk calls.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Using %pV reduces the number of printk calls and
eliminates any possible message interleaving from
other printk calls.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jan Kara <jack@suse.cz>
An ext3 filesystem on a read-only device, with an external journal
which is at a different device number then recorded in the superblock
will fail to honor the read-only setting of the device and trigger
a superblock update (write).
For example:
- ext3 on a software raid which is in read-only mode
- external journal on a read-write device which has changed device num
- attempt to mount with -o journal_dev=<new_number>
- hits BUG_ON(mddev->ro = 1) in md.c
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Maciej Żenczykowski <zenczykowski@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
This reverts commit 3825bdb7ed920845961f32f364454bee5f469abb.
You cannot dget() a dentry without having a reference, or holding
a lock that guarantees it remains valid.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
There's no sense on keeping it on 2.6.38, as nobody is using it
anymore, at the kernel tree, and installing it at the userspace
API.
As two deprecated drivers still need it, move it to their internal
directories.
Reviewed-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
flush_scheduled_work() is deprecated and scheduled to be removed.
Directly flush the used works on stop instead.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Petr Vandrovec <petr@vandrovec.name>
flush_scheduled_work() is deprecated and scheduled to be removed.
* cancel_delayed_work() + flush_schedule_work() ->
cancel_delayed_work_sync().
* flush qs->qs_work directly on exit instead.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Joel Becker <joel.becker@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
ocfs2: Fix system inodes cache overflow.
ocfs2: Hold ip_lock when set/clear flags for indexed dir.
ocfs2: Adjust masklog flag values
Ocfs2: Teach 'coherency=full' O_DIRECT writes to correctly up_read i_alloc_sem.
ocfs2/dlm: Migrate lockres with no locks if it has a reference
https://bugzilla.kernel.org/show_bug.cgi?id=25352
This regression was caused by commit a31437b85: "ext4: use
sb_issue_zeroout in setup_new_group_blocks", by accidentally dropping
the code which reserved the block group descriptor and inode table
blocks.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This happens when __logfs_create() tries to write a new inode to the disk
which is full.
__logfs_create() associates the transaction pointer with inode. During
the logfs_write_inode() function call chain this transaction pointer is
moved from inode to page->private using function move_inode_to_page
(do_write_inode() -> inode_to_page() -> move_inode_to_page)
When the write inode fails, the transaction is aborted and iput is called
on the failed inode. During delete_inode the same transaction pointer
associated with the page is getting used. Thus causing kernel BUG.
The patch checks for error in write_inode() and restores the page->private
to NULL.
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=20162
Signed-off-by: Prasad Joshi <prasadjoshi124@gmail.com>
Cc: Joern Engel <joern@logfs.org>
Cc: Florian Mickler <florian@mickler.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
do_logfs_journal_wl_pass() should use GFP_NOFS for memory allocation GC
code calls btree_insert32 with GFP_KERNEL while holding a mutex
super->s_write_mutex.
The same mutex is used in address_space_operations->writepage(), and a
call to writepage() could be triggered as a result of memory allocation
in btree_insert32, causing a deadlock.
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=20342
Signed-off-by: Prasad Joshi <prasadjoshi124@gmail.com>
Cc: Joern Engel <joern@logfs.org>
Cc: Florian Mickler <florian@mickler.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adds debugfs dentry o2net/stats to show the o2net timing statistics.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Tracks total time taken to process messages received on a socket.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Tracks total send and status times for all messages sent on a socket.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Replace time trackers in struct o2net_sock_container from struct timeval to
union ktime.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Replace time trackers in struct o2net_send_tracking from struct timeval to
union ktime.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
In o2dlm, the enumerated message values are part of the protocol.
The patch hard codes each value so as to reduce the chance of an editing
error causing a protocol mismatch.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Patch makes use of task_pid_nr(). Also removes the null check before calling
debugfs_remove().
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
In ocfs2_double_lock, when ocfs2_inode_lock for inode1 fails, we
just unlock inode2 and return without releasing buffer we get from
inode_lock(inode2). The good thing is that it is freed by the only
caller ocfs2_rename when it exits.
But I don't think this is a right way for error handling. We should
free the buffer_head we get in ocfs2_double_lock before exit so that
the caller doesn't need to take care of it.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
When we store system inodes cache in ocfs2_super,
we use a array for global system inodes. But unfortunately,
the range is calculated wrongly which makes it overflow and
pollute ocfs2_super->local_system_inodes.
This patch fix it by setting the range properly.
The corresponding bug is ossbug1303.
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1303
Cc: stable@kernel.org
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: handle partial result from get_user_pages
ceph: mark user pages dirty on direct-io reads
ceph: fix null pointer dereference in ceph_init_dentry for nfs reexport
ceph: fix direct-io on non-page-aligned buffers
ceph: fix msgr_init error path
The only thing that the grant lock remains to protect is the grant head
manipulations when adding or removing space from the log. These calculations
are already based on atomic variables, so we can already update them safely
without locks. However, the grant head manpulations require atomic multi-step
calculations to be executed, which the algorithms currently don't allow.
To make these multi-step calculations atomic, convert the algorithms to
compare-and-exchange loops on the atomic variables. That is, we sample the old
value, perform the calculation and use atomic64_cmpxchg() to attempt to update
the head with the new value. If the head has not changed since we sampled it,
it will succeed and we are done. Otherwise, we rerun the calculation again from
a new sample of the head.
This allows us to remove the grant lock from around all the grant head space
manipulations, and that effectively removes the grant lock from the log
completely. Hence we can remove the grant lock completely from the log at this
point.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
The log grant ticket wait queues are currently protected by the log
grant lock. However, the queues are functionally independent from
each other, and operations on them only require serialisation
against other queue operations now that all of the other log
variables they use are atomic values.
Hence, we can make them independent of the grant lock by introducing
new locks just to protect the lists operations. because the lists
are independent, we can use a lock per list and ensure that reserve
and write head queuing do not contend.
To ensure forced shutdowns work correctly in conjunction with the
new fast paths, ensure that we check whether the log has been shut
down in the grant functions once we hold the relevant spin locks but
before we go to sleep. This is needed to co-ordinate correctly with
the wakeups that are issued on the ticket queues so we don't leave
any processes sleeping on the queues during a shutdown.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Using %pV reduces the number of printk calls and eliminates any
possible message interleaving from other printk calls.
In function __ext4_grp_locked_error also added KERN_CONT to some
printks.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
When nanosecond timestamp resolution isn't supported on an ext4
partition (inode size = 128), stat() appears to be returning
uninitialized garbage in the nanosecond component of timestamps.
EXT4_INODE_GET_XTIME should zero out tv_nsec when EXT4_FITS_IN_INODE
evaluates to false.
Reported-by: Jordan Russell <jr-list-2010@quo.to>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This function gets called a lot for large directories, and the answer
is almost always "no, no, there's no problem". This means using
unlikely() is a good thing.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>