24842 Commits

Author SHA1 Message Date
Yongqiang Yang
e7b319e397 ext4: trace punch_hole correctly in ext4_ext_map_blocks
When ext4_ext_map_blocks() is called by punch_hole, trace should
trace blocks punched out.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-29 09:39:51 -04:00
Yongqiang Yang
02dc62fba8 ext4: clean up AGGRESSIVE_TEST code
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-29 09:29:11 -04:00
Yongqiang Yang
81fdbb4a8d ext4: move variables to their scope
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-29 09:23:38 -04:00
Dmitry Monakhov
5cb81dabcc ext4: fix quota accounting during migration
The tmp_inode should have same uid/gid as the original inode.
Otherwise new metadata blocks will be accounted to wrong quota-id,
which will result in a quota leak after the inode migration is
completed.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-29 09:05:00 -04:00
Dmitry Monakhov
fba90ffee8 ext4: migrate cleanup
This patch cleans up the code a bit; the actual logic is not changed.
- Move current block pointer to migrate_structure, so that all
  walk info is kept in one structure.
- Get rid of useless NULL ind-block ptr checks; the caller already
  does that check.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-29 09:03:00 -04:00
Linus Torvalds
97d2eb13a0 Merge branch 'for-linus' of git://ceph.newdream.net/git/ceph-client
* 'for-linus' of git://ceph.newdream.net/git/ceph-client:
  libceph: fix double-free of page vector
  ceph: fix 32-bit ino numbers
  libceph: force resend of osd requests if we skip an osdmap
  ceph: use kernel DNS resolver
  ceph: fix ceph_monc_init memory leak
  ceph: let the set_layout ioctl set single traits
  Revert "ceph: don't truncate dirty pages in invalidate work thread"
  ceph: replace leading spaces with tabs
  libceph: warn on msg allocation failures
  libceph: don't complain on msgpool alloc failures
  libceph: always preallocate mon connection
  libceph: create messenger with client
  ceph: document ioctls
  ceph: implement (optional) max read size
  ceph: rename rsize -> rasize
  ceph: make readpages fully async
2011-10-28 16:42:18 -07:00
Steve French
8ea00c6977 [CIFS] Update cifs version to 1.76
Update cifs version to 1.76 now that async read,
lock caching, and changes to oplock enabled interface
are in.

Thanks to Pavel for reminding me.

Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
2011-10-28 14:49:46 -05:00
Pavel Shilovsky
d12799b4c3 CIFS: Remove extra mutex_unlock in cifs_lock_add_if
to prevent the mutex being unlocked twice if we interrupt a blocked lock.

Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
2011-10-28 14:09:23 -05:00
Linus Torvalds
f362f98e7c Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue: (21 commits)
  leases: fix write-open/read-lease race
  nfs: drop unnecessary locking in llseek
  ext4: replace cut'n'pasted llseek code with generic_file_llseek_size
  vfs: add generic_file_llseek_size
  vfs: do (nearly) lockless generic_file_llseek
  direct-io: merge direct_io_walker into __blockdev_direct_IO
  direct-io: inline the complete submission path
  direct-io: separate map_bh from dio
  direct-io: use a slab cache for struct dio
  direct-io: rearrange fields in dio/dio_submit to avoid holes
  direct-io: fix a wrong comment
  direct-io: separate fields only used in the submission path from struct dio
  vfs: fix spinning prevention in prune_icache_sb
  vfs: add a comment to inode_permission()
  vfs: pass all mask flags check_acl and posix_acl_permission
  vfs: add hex format for MAY_* flag values
  vfs: indicate that the permission functions take all the MAY_* flags
  compat: sync compat_stats with statfs.
  vfs: add "device" tag to /proc/self/mountstats
  cleanup: vfs: small comment fix for block_invalidatepage
  ...

Fix up trivial conflict in fs/gfs2/file.c (llseek changes)
2011-10-28 10:49:34 -07:00
Linus Torvalds
f793f29611 Merge http://sucs.org/~rohan/git/gfs2-3.0-nmw
* http://sucs.org/~rohan/git/gfs2-3.0-nmw: (24 commits)
  GFS2: Move readahead of metadata during deallocation into its own function
  GFS2: Remove two unused variables
  GFS2: Misc fixes
  GFS2: rewrite fallocate code to write blocks directly
  GFS2: speed up delete/unlink performance for large files
  GFS2: Fix off-by-one in gfs2_blk2rgrpd
  GFS2: Clean up ->page_mkwrite
  GFS2: Correctly set goal block after allocation
  GFS2: Fix AIL flush issue during fsync
  GFS2: Use cached rgrp in gfs2_rlist_add()
  GFS2: Call do_strip() directly from recursive_scan()
  GFS2: Remove obsolete assert
  GFS2: Cache the most recently used resource group in the inode
  GFS2: Make resource groups "append only" during life of fs
  GFS2: Use rbtree for resource groups and clean up bitmap buffer ref count scheme
  GFS2: Fix lseek after SEEK_DATA, SEEK_HOLE have been added
  GFS2: Clean up gfs2_create
  GFS2: Use ->dirty_inode()
  GFS2: Fix bug trap and journaled data fsync
  GFS2: Fix inode allocation error path
  ...
2011-10-28 10:44:50 -07:00
Linus Torvalds
dabcbb1bae Merge branch '3.2-without-smb2' of git://git.samba.org/sfrench/cifs-2.6
* '3.2-without-smb2' of git://git.samba.org/sfrench/cifs-2.6: (52 commits)
  Fix build break when freezer not configured
  Add definition for share encryption
  CIFS: Make cifs_push_locks send as many locks at once as possible
  CIFS: Send as many mandatory unlock ranges at once as possible
  CIFS: Implement caching mechanism for posix brlocks
  CIFS: Implement caching mechanism for mandatory brlocks
  CIFS: Fix DFS handling in cifs_get_file_info
  CIFS: Fix error handling in cifs_readv_complete
  [CIFS] Fixup trivial checkpatch warning
  [CIFS] Show nostrictsync and noperm mount options in /proc/mounts
  cifs, freezer: add wait_event_freezekillable and have cifs use it
  cifs: allow cifs_max_pending to be readable under /sys/module/cifs/parameters
  cifs: tune bdi.ra_pages in accordance with the rsize
  cifs: allow for larger rsize= options and change defaults
  cifs: convert cifs_readpages to use async reads
  cifs: add cifs_async_readv
  cifs: fix protocol definition for READ_RSP
  cifs: add a callback function to receive the rest of the frame
  cifs: break out 3rd receive phase into separate function
  cifs: find mid earlier in receive codepath
  ...
2011-10-28 10:43:32 -07:00
Linus Torvalds
5619a69396 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs: (69 commits)
  xfs: add AIL pushing tracepoints
  xfs: put in missed fix for merge problem
  xfs: do not flush data workqueues in xfs_flush_buftarg
  xfs: remove XFS_bflush
  xfs: remove xfs_buf_target_name
  xfs: use xfs_ioerror_alert in xfs_buf_iodone_callbacks
  xfs: clean up xfs_ioerror_alert
  xfs: clean up buffer allocation
  xfs: remove buffers from the delwri list in xfs_buf_stale
  xfs: remove XFS_BUF_STALE and XFS_BUF_SUPER_STALE
  xfs: remove XFS_BUF_SET_VTYPE and XFS_BUF_SET_VTYPE_REF
  xfs: remove XFS_BUF_FINISH_IOWAIT
  xfs: remove xfs_get_buftarg_list
  xfs: fix buffer flushing during unmount
  xfs: optimize fsync on directories
  xfs: reduce the number of log forces from tail pushing
  xfs: Don't allocate new buffers on every call to _xfs_buf_find
  xfs: simplify xfs_trans_ijoin* again
  xfs: unlock the inode before log force in xfs_change_file_space
  xfs: unlock the inode before log force in xfs_fs_nfs_commit_metadata
  ...
2011-10-28 10:31:42 -07:00
J. Bruce Fields
f3c7691e8d leases: fix write-open/read-lease race
In setlease, we use i_writecount to decide whether we can give out a
read lease.

In open, we break leases before incrementing i_writecount.

There is therefore a window between the break lease and the i_writecount
increment when setlease could add a new read lease.

This would leave us with a simultaneous write open and read lease, which
shouldn't happen.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 14:59:00 +02:00
Andi Kleen
79835a710d nfs: drop unnecessary locking in llseek
This makes NFS follow the standard generic_file_llseek locking scheme.

Cc: Trond.Myklebust@netapp.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 14:59:00 +02:00
Andi Kleen
4cce0e28b9 ext4: replace cut'n'pasted llseek code with generic_file_llseek_size
This gives ext4 the benefits of unlocked llseek.

Cc: tytso@mit.edu
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 14:58:59 +02:00
Andi Kleen
5760495a87 vfs: add generic_file_llseek_size
Add a generic_file_llseek variant to the VFS that allows passing in
the maximum file size of the file system, instead of always
using maxbytes from the superblock.

This can be used to eliminate some cut'n'paste seek code in ext4.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 14:58:59 +02:00
Andi Kleen
ef3d0fd27e vfs: do (nearly) lockless generic_file_llseek
The i_mutex lock use of generic_file_llseek hurts.  Independent processes
accessing the same file synchronize over a single lock, even though
they have no need for synchronization at all.

Under high utilization this can cause llseek to scale very poorly on larger
systems.

This patch does some rethinking of the llseek locking model:

First the 64bit f_pos is not necessarily atomic without locks
on 32bit systems. This can already cause races with read() today.
This was discussed on linux-kernel in the past and deemed acceptable.
The patch does not change that.

Let's look at the different seek variants:

SEEK_SET: Doesn't really need any locking.
If there's a race one writer wins, the other loses.

For 32bit the non atomic update races against read()
stay the same. Without a lock they can also happen
against write() now.  The read() race was deemed
acceptable in past discussions, and I think if it's
ok for read it's ok for write too.

=> Don't need a lock.

SEEK_END: This behaves like SEEK_SET plus it reads
the maximum size too. Reading the maximum size would have the
32bit atomic problem. But luckily we already have a way to read
the maximum size without locking (i_size_read), so we
can just use that instead.

Without i_mutex there is no synchronization with write() anymore,
however since the write() update is atomic on 64bit it just behaves
like another racy SEEK_SET.  On non atomic 32bit it's the same
as SEEK_SET.

=> Don't need a lock, but need to use i_size_read()

SEEK_CUR: This has a read-modify-write race window
on the same file. One could argue that any application
doing unsynchronized seeks on the same file is already broken.
But for the sake of not adding a regression here I'm
using the file->f_lock to synchronize this. Using this
lock is much better than the inode mutex because it doesn't
synchronize between processes.

=> So still need a lock, but can use a f_lock.

This patch implements this new scheme in generic_file_llseek.
I dropped generic_file_llseek_unlocked and changed all callers.
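
The per-variant locking rules above can be sketched in userspace as
follows. This is only a model under stated assumptions, not the kernel
implementation: struct model_file, model_llseek() and the C11
atomic_flag spinlock standing in for file->f_lock are all hypothetical
names introduced for illustration, and maxsize models the limit that
a generic_file_llseek_size-style variant takes as a parameter.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

/* Hypothetical stand-in for struct file plus inode size; not the kernel layout. */
struct model_file {
    long long f_pos;      /* current file position */
    long long i_size;     /* file size; the kernel would use i_size_read() */
    atomic_flag f_lock;   /* per-file spinlock, models file->f_lock */
};

static void model_lock(atomic_flag *l)
{
    while (atomic_flag_test_and_set(l))
        ;   /* spin until the flag was previously clear */
}

static void model_unlock(atomic_flag *l)
{
    atomic_flag_clear(l);
}

/* Seek using the locking rule the commit message gives for each variant. */
long long model_llseek(struct model_file *f, long long off, int whence,
                       long long maxsize)
{
    long long pos;

    switch (whence) {
    case SEEK_SET:   /* no lock: if two seeks race, one simply wins */
        pos = off;
        break;
    case SEEK_END:   /* no lock: the size is read without i_mutex */
        pos = f->i_size + off;
        break;
    case SEEK_CUR:   /* read-modify-write of f_pos: take the per-file lock */
        model_lock(&f->f_lock);
        pos = f->f_pos + off;
        if (pos < 0 || pos > maxsize) {
            model_unlock(&f->f_lock);
            return -EINVAL;
        }
        f->f_pos = pos;
        model_unlock(&f->f_lock);
        return pos;
    default:
        return -EINVAL;
    }

    if (pos < 0 || pos > maxsize)
        return -EINVAL;
    f->f_pos = pos;
    return pos;
}
```

Only the SEEK_CUR path takes a lock in this model, and that lock is
per open file rather than per inode, so independent openers of the same
file never contend on it.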

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 14:58:58 +02:00
Andi Kleen
847cc6371b direct-io: merge direct_io_walker into __blockdev_direct_IO
This doesn't change anything for the compiler, but hch thought it would
make the code clearer.

I moved the reference counting into its own little inline.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 14:58:58 +02:00
Andi Kleen
ba253fbf6d direct-io: inline the complete submission path
Add inlines to all the submission path functions. While this increases
code size it also gives gcc a lot of optimization opportunities
in this critical hotpath.

In particular -- together with some other changes -- this
allows gcc to get rid of the unnecessary clearing of
sdio at the beginning and optimize the messy parameter passing.
Any non inlining of a function which takes a sdio parameter
would break this optimization because they cannot be done if the
address of a structure is taken.

Note that benefits are only seen with CONFIG_OPTIMIZE_INLINING
and CONFIG_CC_OPTIMIZE_FOR_SIZE both set to off.

This gives about 2.2% improvement on a large database benchmark
with a high IOPS rate.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 14:58:58 +02:00
Andi Kleen
18772641db direct-io: separate map_bh from dio
Only a single b_private field in the map_bh buffer head is needed after
the submission path. Move map_bh separately to avoid storing
this information in the long term slab.

This avoids the weird 104 byte hole in struct dio_submit which also needed
to be memseted early.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 14:58:57 +02:00
Andi Kleen
6e8267f532 direct-io: use a slab cache for struct dio
A direct slab call is slightly faster than kmalloc and can be better cached
per CPU. It also avoids rounding to the next kmalloc slab.

In addition this enforces cache line alignment for struct dio to avoid
any false sharing.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 14:58:57 +02:00
Andi Kleen
0dc2bc49be direct-io: rearrange fields in dio/dio_submit to avoid holes
Fix most problems reported by pahole.

There is still a weird 104 byte hole after map_bh. I'm not sure what
causes this.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 14:58:56 +02:00
Andi Kleen
cde1ecb324 direct-io: fix a wrong comment
There's nothing on the stack, even before my changes.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 14:58:56 +02:00
Andi Kleen
eb28be2b4c direct-io: separate fields only used in the submission path from struct dio
This large, but largely mechanical, patch moves all fields in struct dio
that are only used in the submission path into a separate on stack
data structure. This has the advantage that the memory is very likely
cache hot, which is not guaranteed for memory fresh out of kmalloc.

This also gives gcc more optimization potential because it can more easily
determine that there are no external aliases for these variables.

The sdio initialization is now an initializer instead of a memset.
This allows gcc to break sdio into individual fields and optimize
away unnecessary zeroing (after all the functions are inlined).

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 14:58:56 +02:00
Christoph Hellwig
62a3ddef61 vfs: fix spinning prevention in prune_icache_sb
We need to move the inode to the end of the list to actually make the
spinning prevention explained in the comment above it work.  With a
plain list_move it will simply stay in place as we're always reclaiming
from the head of the list.
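
To illustrate why the plain move has no effect here, the sketch below
re-implements a tiny circular doubly linked list in userspace. The
function names mirror the kernel's list helpers, but the bodies are a
simplified model written for this example, not the kernel's list.h.

```c
#include <assert.h>

/* Minimal circular doubly linked list; a model, not the kernel's list.h. */
struct list_head {
    struct list_head *next, *prev;
};

void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Unlink e from whatever list it is on. */
static void list_del_entry(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

/* Link e between two adjacent nodes prev and next. */
static void list_insert(struct list_head *e, struct list_head *prev,
                        struct list_head *next)
{
    next->prev = e;
    e->next = next;
    e->prev = prev;
    prev->next = e;
}

void list_add_tail(struct list_head *e, struct list_head *head)
{
    list_insert(e, head->prev, head);
}

/* Move e to the front of the list: a no-op if e is already first. */
void list_move(struct list_head *e, struct list_head *head)
{
    list_del_entry(e);
    list_insert(e, head, head->next);
}

/* Move e to the end of the list, so the next scan from the head skips it. */
void list_move_tail(struct list_head *e, struct list_head *head)
{
    list_del_entry(e);
    list_insert(e, head->prev, head);
}
```

With entries a, b, c on the list and reclaim always taking the first
entry, list_move() on a leaves a first, while list_move_tail() makes b
the next candidate.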

Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 14:58:55 +02:00
Andreas Gruenbacher
948409c74d vfs: add a comment to inode_permission()
Acked-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruen@kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 14:58:55 +02:00
Andreas Gruenbacher
d124b60a83 vfs: pass all mask flags check_acl and posix_acl_permission
Acked-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruen@kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 14:58:54 +02:00
Andreas Gruenbacher
8fd90c8d1d vfs: indicate that the permission functions take all the MAY_* flags
Acked-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruen@kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 14:58:54 +02:00
Eric W. Biederman
1448c721e4 compat: sync compat_stats with statfs.
This was found by inspection while tracking a similar
bug in compat_statfs64, that has been fixed in mainline
since December.

- This fixes a bug where not all of the f_spare fields
  were cleared on mips and s390.
- Add the f_flags field to struct compat_statfs
- Copy f_flags to userspace in case someone cares.
- Use __clear_user to copy the f_spare field to userspace
  to ensure that all of the elements of f_spare are cleared.
  On some architectures f_spare has 5 ints and on some
  architectures f_spare only has 4 ints.  Which makes
  the previous technique of clearing each int individually
  broken.
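
The difference between clearing each int by hand and clearing the whole
array by its size can be sketched in userspace as below. This is a
model only: struct statfs_model, SPARE_INTS and fill_statfs_model() are
hypothetical, and memset() stands in for the clearing of the userspace
buffer that the kernel patch performs.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical length; per the commit message it is 4 or 5 depending on arch. */
#ifndef SPARE_INTS
#define SPARE_INTS 5
#endif

/* Hypothetical stand-in for the compat statfs structure. */
struct statfs_model {
    int f_flags;
    int f_spare[SPARE_INTS];
};

/* Fill the result, clearing f_spare by size so every element is zeroed. */
void fill_statfs_model(struct statfs_model *out, int flags)
{
    out->f_flags = flags;
    /* sizeof covers the whole array whatever SPARE_INTS is on this build. */
    memset(out->f_spare, 0, sizeof(out->f_spare));
}
```

Because the length comes from sizeof, the same line stays correct when
the array has 4 elements on one architecture and 5 on another.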

I don't expect anyone actually uses the old statfs system
call anymore but if they do let them benefit from having
the compat and the native version working the same.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 14:58:53 +02:00
Bryan Schumaker
a877ee03ac vfs: add "device" tag to /proc/self/mountstats
nfsiostat was failing to find mounted filesystems on kernels after
2.6.38 because of changes to show_vfsstat() by commit
c7f404b40a3665d9f4e9a927cc5c1ee0479ed8f9.  This patch adds back the
"device" tag before the nfs server entry so scripts can parse the
mountstats file correctly.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
CC: stable@kernel.org [>=2.6.39]
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 13:55:08 +02:00
Wang Sheng-Hui
814e1d25a5 cleanup: vfs: small comment fix for block_invalidatepage
The patch is against 3.1-rc3.

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 13:55:08 +02:00
Steve French
96814ecb40 Add definition for share encryption
Samba supports a setfs info level to negotiate encrypted
shares.  This patch adds the defines so we recognize
this info level.  Later patches will add the enablement
for it.

Acked-by: Jeremy Allison <jra@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2011-10-27 16:53:31 -05:00
Eric Gouriou
80e675f906 ext4: optimize memmmove lengths in extent/index insertions
ext4_ext_insert_extent() (respectively ext4_ext_insert_index())
was using EXT_MAX_EXTENT() (resp. EXT_MAX_INDEX()) to determine
how many entries needed to be moved beyond the insertion point.
In practice this means that (320 - I) * 24 bytes were memmove()'d
when I is the insertion point, rather than (#entries - I) * 24 bytes.

This patch uses EXT_LAST_EXTENT() (resp. EXT_LAST_INDEX()) instead
to only move existing entries. The code flow is also simplified
slightly to highlight similarities and reduce code duplication in
the insertion logic.
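
The idea of moving only the entries that exist, rather than everything
up to the block's capacity, can be sketched as follows. The node
layout, MAX_ENTRIES and insert_entry() are hypothetical names for this
illustration and do not reflect the ext4 on-disk extent format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 8   /* hypothetical capacity of one node */

struct entry {
    int key;            /* hypothetical payload */
};

struct node {
    size_t count;                   /* number of entries currently in use */
    struct entry e[MAX_ENTRIES];
};

/*
 * Insert ne at index pos, shifting only the live entries after pos.
 * Returns the number of entries moved, or -1 if the node is full or
 * pos is past the last live entry.
 */
int insert_entry(struct node *n, size_t pos, struct entry ne)
{
    size_t to_move;

    if (n->count >= MAX_ENTRIES || pos > n->count)
        return -1;

    /* (count - pos) entries, not (MAX_ENTRIES - pos). */
    to_move = n->count - pos;
    if (to_move > 0)
        memmove(&n->e[pos + 1], &n->e[pos], to_move * sizeof(n->e[0]));

    n->e[pos] = ne;
    n->count++;
    return (int)to_move;
}
```

Appending to a node moves zero entries in this model, which is the case
an append-heavy workload hits most often.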

This patch reduces system CPU consumption by over 25% on a 4kB
synchronous append DIO write workload when used with the
pre-2.6.39 x86_64 memmove() implementation. With the much faster
2.6.39 memmove() implementation we still see a decrease in
system CPU usage between 2% and 7%.

Note that the ext_debug() output changes with this patch, splitting
some log information between entries. Users of the ext_debug() output
should note that the "move %d" units changed from reporting the number
of bytes moved to reporting the number of entries moved.

Signed-off-by: Eric Gouriou <egouriou@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-27 11:52:18 -04:00
Eric Gouriou
6f91bc5fda ext4: optimize ext4_ext_convert_to_initialized()
This patch introduces a fast path in ext4_ext_convert_to_initialized()
for the case when the conversion can be performed by transferring
the newly initialized blocks from the uninitialized extent into
an adjacent initialized extent. Doing so removes the expensive
invocations of memmove() which occur during extent insertion and
the subsequent merge.

In practice this should be the common case for clients performing
append writes into files pre-allocated via
fallocate(FALLOC_FL_KEEP_SIZE). In such a workload performed via
direct IO and when using a suboptimal implementation of memmove()
(x86_64 prior to the 2.6.39 rewrite), this patch reduces kernel CPU
consumption by 32%.

Two new trace points are added to ext4_ext_convert_to_initialized()
to offer visibility into its operations. No exit trace point has
been added due to the multiplicity of return points. This can be
revisited once the upstream cleanup is backported.

Signed-off-by: Eric Gouriou <egouriou@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-27 11:43:23 -04:00
Randy Dunlap
4470575461 jbd2: fix build when CONFIG_BUG is not enabled
Fix build error when CONFIG_BUG is not enabled:

fs/jbd2/transaction.c:1175:3: error: implicit declaration of function '__WARN'

by changing __WARN() to WARN_ON(), as suggested by
Arnaud Lacombe <lacombar@gmail.com>.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Arnaud Lacombe <lacombar@gmail.com>
2011-10-27 04:05:13 -04:00
Boaz Harrosh
60325f0c6e fs/Makefile: Stupid typo breakage of exofs inclusion
In my last patch I made a stupid mistake and broke the exofs
compilation completely. Fix it ASAP.

Instead of obj-y I did obj-$(y)

Really Really sorry. Me totally blushing :-{|

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-27 08:36:51 +02:00
Linus Torvalds
c28cfd60e4 Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd
* 'for-linus' of git://git.open-osd.org/linux-open-osd: (21 commits)
  ore: Enable RAID5 mounts
  exofs: Support for RAID5 read-4-write interface.
  ore: RAID5 Write
  ore: RAID5 read
  fs/Makefile: Always inspect exofs/
  ore: Make ore_calc_stripe_info EXPORT_SYMBOL
  ore/exofs: Change ore_check_io API
  ore/exofs: Define new ore_verify_layout
  ore: Support for partial component table
  ore: Support for short read/writes
  exofs: Support for short read/writes
  ore: Remove check for ios->kern_buff in _prepare_for_striping to later
  ore: cleanup: Embed an ore_striping_info inside ore_io_state
  ore: Only IO one group at a time (API change)
  ore/exofs: Change the type of the devices array (API change)
  ore: Make ore_striping_info and ore_calc_stripe_info public
  exofs: Remove unused data_map member from exofs_sb_info
  exofs: Rename struct ore_components comps => oc
  exofs/super.c: local functions should be static
  exofs/ore.c: local functions should be static
  ...
2011-10-26 21:33:50 +02:00
Linus Torvalds
39adff5f69 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  time, s390: Get rid of compile warning
  dw_apb_timer: constify clocksource name
  time: Cleanup old CONFIG_GENERIC_TIME references that snuck in
  time: Change jiffies_to_clock_t() argument type to unsigned long
  alarmtimers: Fix error handling
  clocksource: Make watchdog reset lockless
  posix-cpu-timers: Cure SMP accounting oddities
  s390: Use direct ktime path for s390 clockevent device
  clockevents: Add direct ktime programming function
  clockevents: Make minimum delay adjustments configurable
  nohz: Remove "Switched to NOHz mode" debugging messages
  proc: Consider NO_HZ when printing idle and iowait times
  nohz: Make idle/iowait counter update conditional
  nohz: Fix update_ts_time_stat idle accounting
  cputime: Clean up cputime_to_usecs and usecs_to_cputime macros
  alarmtimers: Rework RTC device selection using class interface
  alarmtimers: Add try_to_cancel functionality
  alarmtimers: Add more refined alarm state tracking
  alarmtimers: Remove period from alarm structure
  alarmtimers: Remove interval cap limit hack
  ...
2011-10-26 17:15:03 +02:00
Tao Ma
b3ff056908 ext4: don't check io->flag when setting EXT4_STATE_DIO_UNWRITTEN inode state
When we want to convert the uninitialized extent in direct write, we can
either do it in ext4_end_io_nolock(AIO case) or in
ext4_ext_direct_IO(non AIO case) and EXT4_I(inode)->cur_aio_dio is a
guard for ext4_ext_map_blocks to find the right case.  In e9e3bcecf,
we mistakenly change it by:

-			if (io)
+			if (io && !(io->flag & EXT4_IO_END_UNWRITTEN)) {
 				io->flag = EXT4_IO_END_UNWRITTEN;
-			else
+				atomic_inc(&EXT4_I(inode)->i_aiodio_unwritten);
+			} else
 				ext4_set_inode_state(inode,
 						     EXT4_STATE_DIO_UNWRITTEN);

So now if we map 2 blocks, and the first one sets
EXT4_IO_END_UNWRITTEN, the 2nd mapping will set the inode state because of
the check for the flag. This is wrong.
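
The two-mapping problem can be reproduced with a small userspace model.
Everything here is an assumption made for illustration: the struct
names, the flag value and both functions are hypothetical, and the
"fixed" variant shows one plausible way to keep the inode state for the
non-AIO case only, not necessarily the exact change in this patch.

```c
#include <assert.h>
#include <stddef.h>

#define IO_END_UNWRITTEN 0x1   /* models EXT4_IO_END_UNWRITTEN; value assumed */

struct io_end_model {
    unsigned int flag;
};

struct dio_inode_model {
    int aiodio_unwritten;      /* models i_aiodio_unwritten */
    int state_dio_unwritten;   /* models EXT4_STATE_DIO_UNWRITTEN being set */
};

/* Combined condition: a second mapping with io set falls into the else. */
void mark_unwritten_buggy(struct io_end_model *io, struct dio_inode_model *inode)
{
    if (io && !(io->flag & IO_END_UNWRITTEN)) {
        io->flag = IO_END_UNWRITTEN;
        inode->aiodio_unwritten++;
    } else {
        inode->state_dio_unwritten = 1;
    }
}

/* Nested condition: the inode state is set only when there is no io. */
void mark_unwritten_fixed(struct io_end_model *io, struct dio_inode_model *inode)
{
    if (io) {
        if (!(io->flag & IO_END_UNWRITTEN)) {
            io->flag = IO_END_UNWRITTEN;
            inode->aiodio_unwritten++;
        }
    } else {
        inode->state_dio_unwritten = 1;
    }
}
```

Calling either function twice with the same io models mapping 2 blocks
in one AIO request; only the buggy variant ends up setting the inode
state.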

Cc: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-26 11:08:39 -04:00
Robin Dong
0a10da73e1 ext4: fix a wrong comment in __mb_check_buddy()
The comment says the bit should be 0, but the code after it asserts that
the bit is 1.  This is confusing, so fix the comment.

Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-26 08:48:54 -04:00
Linus Torvalds
e33bae14fd Merge branch 'for-linus' of git://github.com/ericvh/linux
* 'for-linus' of git://github.com/ericvh/linux:
  9p: fix 9p.txt to advertise msize instead of maxdata
  net/9p: Convert net/9p protocol dumps to tracepoints
  fs/9p: change an int to unsigned int
  fs/9p: Cleanup option parsing in 9p
  9p: move dereference after NULL check
  fs/9p: inode file operation is properly initialized init_special_inode
  fs/9p: Update zero-copy implementation in 9p
2011-10-26 14:20:53 +02:00
Robin Dong
b051d8dc4e ext4: remove unused variable in mb_find_extent()
The variable 'ord' in function mb_find_extent() is redundant, so
remove it.

Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-26 05:30:30 -04:00
Robin Dong
66a83cde47 ext4: remove unused variable in ext4_mb_generate_from_pa()
The variable 'count' in function ext4_mb_generate_from_pa() looks
useless, so remove it.

Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-26 05:29:21 -04:00
Robin Dong
ebbe027797 ext4: use stream-alloc when mb_group_prealloc set to zero
The kernel will crash on 

ext4_mb_mark_diskspace_used:
	BUG_ON(ac->ac_b_ex.fe_len <= 0);

after we set /sys/fs/ext4/sda/mb_group_prealloc to zero and create new files in an ext4 filesystem.

The reason is that ac_b_ex.fe_len is also set to zero (mb_group_prealloc) in
ext4_mb_normalize_group_request because ac_flags contains EXT4_MB_HINT_GROUP_ALLOC.

I think when someone sets mb_group_prealloc to zero, it means DO NOT USE GROUP PREALLOCATION,
so we should set alloc-strategy to STREAM in this case.

Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-26 05:14:27 -04:00
Yongqiang Yang
fcbb551582 ext4: let ext4_page_mkwrite stop started handle in failure
The started journal handle should be stopped in failure case.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
2011-10-26 05:00:19 -04:00
Curt Wohlgemuth
6f8ff53726 ext4: handle NULL p_ext in ext4_ext_next_allocated_block()
In ext4_ext_next_allocated_block(), the path[depth] might
have a p_ext that is NULL -- see ext4_ext_binsearch().  In
such a case, dereferencing it will crash the machine.

This patch checks for p_ext == NULL in
ext4_ext_next_allocated_block() before dereferencing it.

Tested using a hand-crafted inode with eh_entries == 0 in
an extent block; verified that running FIEMAP on it crashes
without this patch and works fine with it.

Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-26 04:38:59 -04:00
Dan Carpenter
f85b287a01 ext4: error handling fix in ext4_ext_convert_to_initialized()
When allocated is unsigned it breaks the error handling at the end
of the function when we call:
	allocated = ext4_split_extent(...);
	if (allocated < 0)
		err = allocated;

I've made it a signed int instead of unsigned.
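
The effect of the variable's signedness on the error check can be shown
with a small standalone sketch. split_extent_model() is a hypothetical
helper that always fails with -5; the two callers differ only in the
type of the local variable.

```c
#include <assert.h>

/* Hypothetical helper that fails with a negative errno-style value. */
static int split_extent_model(void)
{
    return -5;
}

/* Unsigned local: the negative return wraps to a large positive value. */
int convert_with_unsigned(void)
{
    unsigned int allocated;
    int err = 0;

    allocated = split_extent_model();
    if (allocated < 0)      /* never true for an unsigned type */
        err = allocated;
    return err;
}

/* Signed local: the negative return is seen and propagated. */
int convert_with_signed(void)
{
    int allocated;
    int err = 0;

    allocated = split_extent_model();
    if (allocated < 0)
        err = allocated;
    return err;
}
```

Compilers can flag the first comparison as always false when the
relevant warning is enabled, which is one way such bugs get noticed.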

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-26 03:42:36 -04:00
Eric Sandeen
665436175c ext4: use ext4_reserve_inode_write in ext4_xattr_set_handle
ext4_mark_iloc_dirty() says:

 * The caller must have previously called ext4_reserve_inode_write().
 * Give this, we know that the caller already has write access to iloc->bh.

ext4_xattr_set_handle, however, just open-codes it.  May as well use
the helper function for consistency.

No bug here, just tidiness.

(Note: on cleanup path, ext4_reserve_inode_write sets
the bh to NULL if it returns an error, and brelse() of 
a null bh is handled gracefully).

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-26 03:32:07 -04:00
Andreas Dilger
909a4cf1ff ext4: avoid setting directory i_nlink to zero
If a directory has more than EXT4_LINK_MAX subdirectories, the nlink
count is set to 1.  Subsequently, if any subdirectories are deleted,
ext4_dec_count() decrements the i_nlink count, which may go to 0
temporarily before being incremented back to 1.

While this is done under i_mutex, which prevents races for directory
and inode operations that check i_nlink, the temporary i_nlink == 0
case is exposed to userspace via stat() and similar calls that do not
hold i_mutex.

Instead, change the code to not decrement i_nlink count for any
directories that do not already have i_nlink larger than 2.
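
The rule in the last paragraph can be sketched as a guard around the
decrement. This is a userspace model under stated assumptions:
struct dir_inode_model and dec_count_model() are hypothetical names,
and the real function operates on a kernel inode with a journal handle.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the inode fields that matter here. */
struct dir_inode_model {
    bool is_dir;            /* true for directories */
    unsigned int i_nlink;   /* link count as seen by stat() */
};

/*
 * Drop one link. Directories whose count is not larger than 2 are left
 * alone, so a directory pinned at nlink == 1 never passes through 0.
 */
void dec_count_model(struct dir_inode_model *inode)
{
    if (!inode->is_dir || inode->i_nlink > 2) {
        assert(inode->i_nlink > 0);   /* guard against underflow in the model */
        inode->i_nlink--;
    }
}
```

In this model a directory whose count was set to 1 after overflowing the
link limit stays at 1 however many subdirectories are removed.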

Reported-by: Cliff White <cliffw@whamcloud.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-26 03:22:31 -04:00
Sage Weil
3395734067 libceph: fix double-free of page vector
ceph_release_page_vector() kfrees the vector; we shouldn't do it here too.

Reported-by: Jeff Wu <cpwu@tnsoft.com.cn>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-10-25 16:10:17 -07:00