This patch forms a pair with the previous patch which shrunk
di_height. Like that patch di_depth is renamed i_depth and moved
into struct gfs2_inode directly. Also the field goes from 16 bits
to 8 bits since it is also limited to a max value which is rather
small (17 in this case). In addition we also now validate the field
against this maximum value when its read in.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch further reduces GFS2's memory requirements by
eliminating the 64-bit version number fields in lieu of
a couple bits.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The functions in lm.c were just wrappers which were mostly
only used in one other file. By moving the functions to
the files where they are being used, they can be marked
static and also this will usually result in them being inlined
since they are often only used from one point in the code.
A couple of really trivial functions have been inlined by hand
into the function which called them as it makes the code clearer
to do that.
We also gain from one fewer function call in the glock lock and
unlock paths.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch further reduces the memory needs of GFS2 by
eliminating the gl_req_bh variable from struct gfs2_glock.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
There are a couple of routines which scan bitmaps where we can
mark the bitmaps const, plus a couple of call sites that can
be updated too.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The reason for doing this is to allow all the block mapping code
to share the same array. As a result we can remove two arguments
from lookup_metapath since they are now returned via the array.
We also add a function to drop all refs to buffer heads when we
are done with the metapath. The build_height function shares the
struct metapath, but currently still frees its own buffers, and
this will change in a future patch.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch reduces memory by replacing the int variable
gl_waiters2 by a single bit in the gl_flags.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch reduces the memory required by GFS2 by combining
the rd_flags and rg_flags (in core only).
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch moves the gfs2_rgrpd structure to its own slab
memory. This makes it easier to control and monitor, and
yields less memory fragmentation.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch only wakes up the glock reclaim daemon if there is
actually something to be reclaimed.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch contains two small fixups that didn't fit elsewhere.
They are: (1) get rid of temp variable in find_metapath.
(2) Remove vestigial "ret" variable from gfs2_writepage_common.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch is performance related. When we're doing a log flush,
I noticed we were calling buf_lo_incore_commit twice: once for
data bufs and once for metadata bufs. Since this is the same
function and does the same thing in both cases, there should be
no reason to call it twice. Since we only need to call it once,
we can also make it faster by removing it from the generic "lops"
code and making it a stand-along static function.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
I noticed that the latest change to i_height got rid of the
value from the inode dump. This patch adds it back.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch removed the unnecessary parameter from function
gfs2_rlist_alloc. The parameter was always passed in as 0.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch improves the calculation of the tree height in order to reduce
the number of operations which are carried out on each call to gfs2_block_map.
In the common case, we now make a single comparison, rather than calculating
the required tree height from scratch each time. Also in the case that the
tree does need some extra height, we start from the current height rather from
zero when we work out what the new height ought to be.
In addition the di_height field is moved into the inode proper and reduced
in size to a u8 since the value must be between 0 and GFS2_MAX_META_HEIGHT (10).
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch removes the call to gfs2_extent_map from gfs2_write_alloc_required,
instead we call gfs2_block_map directly. This results in fewer overall calls
to gfs2_block_map in the multi-block case.
Also, gfs2_extent_map is marked as deprecated so that people know that its
going away as soon as all the callers have been converted.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
[PATCH] mnt_expire is protected by namespace_sem, no need for vfsmount_lock
[PATCH] do shrink_submounts() for all fs types
[PATCH] sanitize locking in mark_mounts_for_expiry() and shrink_submounts()
[PATCH] count ghost references to vfsmounts
[PATCH] reduce stack footprint in namespace.c
kafs doesn't check if the cell already exists - so if you do an echo "add
newcell.org 1.2.3.4" >/proc/fs/afs/cells it will try to create this cell
again. kobject will also complain about a double registration. To prevent
such problems, return -EEXIST in that case.
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
... and take it out of ->umount_begin() instances. Call with all locks
already taken (by do_umount()) and leave calling release_mounts() to
caller (it will do release_mounts() anyway, so we can just put into
the same list).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
make propagate_mount_busy() exclude references from the vfsmounts
that had been isolated by umount_tree() and are just waiting for
release_mounts() to dispose of their ->mnt_parent/->mnt_mountpoint.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
[PATCH] get stack footprint of pathname resolution back to relative sanity
[PATCH] double iput() on failure exit in hugetlb
[PATCH] double dput() on failure exit in tiny-shmem
[PATCH] fix up new filp allocators
[PATCH] check for null vfsmount in dentry_open()
[PATCH] reiserfs: eliminate private use of struct file in xattr
[PATCH] sanitize hppfs
hppfs pass vfsmount to dentry_open()
[PATCH] restore export of do_kern_mount()
Try to find the culprit who caused
http://bugzilla.kernel.org/show_bug.cgi?id=10150
Cc: <balajirrao@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
[CIFS] Fix mem leak on dfs referral
[CIFS] file create with acl support enabled is slow
[CIFS] Fix mtime on cp -p when file data cached but written out too late
[CIFS] Fix build problem
[CIFS] cifs: replace remaining __FUNCTION__ occurrences
[CIFS] DFS patch that connects inode with dfs handling ops
Change pagemap output format to allow for future reporting of huge pages.
(Format comment and minor cleanups: mpm@selenic.com)
Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
commit e9720ac ([NET]: Make /proc/net a symlink on /proc/self/net (v3))
broke ganglia and probably other applications that read /proc/net/dev.
This is due to the change of permissions of /proc/net that was
introduced in that commit.
Before: dr-xr-xr-x 5 root root 0 Mar 19 11:30 /proc/net
After: dr-xr--r-- 5 root root 0 Mar 19 11:29 /proc/self/net
This patch restores the permissions to the old value which makes
ganglia happy again.
Pavel Emelyanov says:
This also broke the postfix, as it was reported in bug #10286
and described in detail by Benjamin.
Signed-off-by: Andre Noll <maan@systemlinux.org>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
fs/ufs/balloc.c: In function `ufs_change_blocknr':
fs/ufs/balloc.c:317: warning: long long unsigned int format, long unsigned int arg (arg 2)
fs/ufs/balloc.c:317: warning: long long unsigned int format, long unsigned int arg (arg 3)
sector_t is u64 and we don't know what type the architecture uses to implement
u64.
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A read request outside i_size will be handled in do_generic_file_read(). So
we just return 0 to avoid getting -EIO as normal reading, let
do_generic_file_read do the rest.
At the same time we need unlock the page to avoid system stuck.
Fixes http://bugzilla.kernel.org/show_bug.cgi?id=10227
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Acked-by: Jan Kara <jack@suse.cz>
Report-by: Christian Perle <chris@linuxinfotag.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix kernel-doc notation warnings in fs/.
Warning(mmotm-2008-0314-1449//fs/super.c:560): missing initial short description on line:
* mark_files_ro
Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line:
* lease_get_mtime
Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line:
* lease_get_mtime
Warning(mmotm-2008-0314-1449//fs/namei.c:1368): missing initial short description on line:
* lookup_one_len: filesystem helper to lookup single pathname component
Warning(mmotm-2008-0314-1449//fs/buffer.c:3221): missing initial short description on line:
* bh_uptodate_or_lock: Test whether the buffer is uptodate
Warning(mmotm-2008-0314-1449//fs/buffer.c:3240): missing initial short description on line:
* bh_submit_read: Submit a locked buffer for reading
Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:30): missing initial short description on line:
* writeback_acquire: attempt to get exclusive writeback access to a device
Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:47): missing initial short description on line:
* writeback_in_progress: determine whether there is writeback in progress
Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:58): missing initial short description on line:
* writeback_release: relinquish exclusive writeback access against a device.
Warning(mmotm-2008-0314-1449//include/linux/jbd.h:351): contents before sections
Warning(mmotm-2008-0314-1449//include/linux/jbd.h:561): contents before sections
Warning(mmotm-2008-0314-1449//fs/jbd/transaction.c:1935): missing initial short description on line:
* void journal_invalidatepage()
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ecryptfs_d_release() is doing a mntput before doing the dput. This patch
moves the dput before the mntput.
Thanks to Rajouri Jammu for reporting this.
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Rajouri Jammu <rajouri.jammu@gmail.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix a long-standing typo (predating git) that will cause data corruption if a
journal data block needs unescaping. At the moment the wrong buffer head's
data is being unescaped.
To test this case mount a filesystem with data=journal, start creating and
deleting a bunch of files containing only JBD2_MAGIC_NUMBER (0xc03b3998), then
pull the plug on the device. Without this patch the files will contain zeros
instead of the correct data after recovery.
Signed-off-by: Duane Griffin <duaneg@dghda.com>
Acked-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix a long-standing typo (predating git) that will cause data corruption if a
journal data block needs unescaping. At the moment the wrong buffer head's
data is being unescaped.
To test this case mount a filesystem with data=journal, start creating and
deleting a bunch of files containing only JFS_MAGIC_NUMBER (0xc03b3998), then
pull the plug on the device. Without this patch the files will contain zeros
instead of the correct data after recovery.
Signed-off-by: Duane Griffin <duaneg@dghda.com>
Acked-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix up an error in iget removal in which romfs_lookup() making a successful
call to romfs_iget() continues through the negative/error handling (previously
the successful case jumped around the negative/error handling case):
(1) inode is initialised to NULL at the top of the function, eliminating the
need for specific negative-inode handling. This means the positive
success handling now flows straight through.
(2) Rename the labels to be clearer about what they mean.
Also make romfs_lookup()'s result variable of type long so as to avoid
32-bit/64-bit conversions with PTR_ERR() and friends.
Based upon a report and patch from Adam Richter.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: "Adam J. Richter" <adam@yggdrasil.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are several places where we make allocations with GFP_KERNEL while under
a transaction, which could lead to an assertion panic or lockup if under
memory pressure. This patch switches these problem areas to use GFP_NOFS to
keep these problems from happening.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We should always put inode we have reference to, even if quota was reenabled
in the mean time.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
My group ran into a AIO process hang on a 2.6.24 kernel with the process
sleeping indefinitely in io_getevents(2) waiting for the last wakeup to come
and it never would.
We ran the tests on x86_64 SMP. The hang only occurred on a Xeon box
("Clovertown") but not a Core2Duo ("Conroe"). On the Xeon, the L2 cache isn't
shared between all eight processors, but is L2 is shared between between all
two processors on the Core2Duo we use.
My analysis of the hang is if you go down to the second while-loop
in read_events(), what happens on processor #1:
1) add_wait_queue_exclusive() adds thread to ctx->wait
2) aio_read_evt() to check tail
3) if aio_read_evt() returned 0, call [io_]schedule() and sleep
In aio_complete() with processor #2:
A) info->tail = tail;
B) waitqueue_active(&ctx->wait)
C) if waitqueue_active() returned non-0, call wake_up()
The way the code is written, step 1 must be seen by all other processors
before processor 1 checks for pending events in step 2 (that were recorded by
step A) and step A by processor 2 must be seen by all other processors
(checked in step 2) before step B is done.
The race I believed I was seeing is that steps 1 and 2 were
effectively swapped due to the __list_add() being delayed by the L2
cache not shared by some of the other processors. Imagine:
proc 2: just before step A
proc 1, step 1: adds to ctx->wait, but is not visible by other processors yet
proc 1, step 2: checks tail and sees no pending events
proc 2, step A: updates tail
proc 1, step 3: calls [io_]schedule() and sleeps
proc 2, step B: checks ctx->wait, but sees no one waiting, skips wakeup
so proc 1 sleeps indefinitely
My patch adds a memory barrier between steps A and B. It ensures that the
update in step 1 gets seen on processor 2 before continuing. If processor 1
was just before step 1, the memory barrier makes sure that step A (update
tail) gets seen by the time processor 1 makes it to step 2 (check tail).
Before the patch our AIO process would hang virtually 100% of the time. After
the patch, we have yet to see the process ever hang.
Signed-off-by: Quentin Barnes <qbarnes+linux@yahoo-inc.com>
Reviewed-by: Zach Brown <zach.brown@oracle.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: <stable@kernel.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ We should probably disallow that "if (waitqueue_active()) wake_up()"
coding pattern, because it's so often buggy wrt memory ordering ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ignoring the return value from nfs_pageio_add_request can cause deadlocks.
In read path:
call nfs_pageio_add_request from readpage_async_filler
assume at this point that there are requests already in desc, that
can't be merged with the current request.
so nfs_pageio_doio is fired up to clear out desc.
assume something goes wrong in setting up the io, so desc->pg_error is set.
This causes nfs_pageio_add_request to return 0, *WITHOUT* adding the original
request.
BUT, since return code is ignored, readpage_async_filler assumes it has
been added, and does nothing further, leaving page locked.
do_generic_mapping_read will eventually call lock_page, resulting in deadlock
In write path:
page is marked dirty by generic_perform_write
nfs_writepages is called
call nfs_pageio_add_request from nfs_page_async_flush
assume at this point that there are requests already in desc, that
can't be merged with the current request.
so nfs_pageio_doio is fired up to clear out desc.
assume something goes wrong in setting up the io, so desc->pg_error is set.
This causes nfs_page_async_flush to return 0, *WITHOUT* adding the original
request, yet marking the request as locked (PG_BUSY) and in writeback,
clearing dirty marks.
The next time a write is done to the page, deadlock will result as
nfs_write_end calls nfs_update_request
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>