23067 Commits

Author SHA1 Message Date
Amerigo Wang
69a60c4d17 ocfs2: remove the /sys/o2cb symlink
It is obsoleted since Dec 2005.

Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
2011-05-23 23:37:14 -07:00
Tiger Yang
e2b0c215c2 ocfs2: clean up mount option about atime in ocfs2.txt
As ocfs2 supports relatime and strictatime, we need update the
relative document. Atime_quantum need work with strictatime, so only
show it in procfs when mount with strictatime.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
2011-05-23 23:37:12 -07:00
Linus Torvalds
343800e7d2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6:
  fat: Fix statfs->f_namelen
  fat: Replace all printk with fat_msg()
  fat: Add fat_msg() function for preformated FAT messages
  fat: Convert fat_fs_error to use %pV
  fat: Fix possible null deref in fat_cache_add()
  fat: use new setup() for ->dir_ops too
2011-05-23 21:11:38 -07:00
Jeff Layton
3c1105df69 cifs: don't call mid_q_entry->callback under the Global_MidLock (try #5)
Minor revision to the last version of this patch -- the only difference
is the fix to the cFYI statement in cifs_reconnect.

Holding the spinlock while we call this function means that it can't
sleep, which really limits what it can do. Taking it out from under
the spinlock also means less contention for this global lock.

Change the semantics such that the Global_MidLock is not held when
the callback is called. To do this requires that we take extra care
not to have sync_mid_result remove the mid from the list when the
mid is in a state where that has already happened. This prevents
list corruption when the mid is sitting on a private list for
reconnect or when cifsd is coming down.

Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-05-24 03:11:33 +00:00
Pavel Shilovsky
724d9f1cfb CIFS: Simplify mount code for further shared sb capability
Reorganize code to get mount option at first and when get a superblock.
This lets us use shared superblock model further for equal mounts.

Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-05-24 03:07:42 +00:00
Eryu Guan
c2b67735e5 jbd: Fix comment to match the code in journal_start()
journal_start returns an ERR_PTR() value rather than NULL on failure.

Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-05-24 00:27:53 +02:00
Linus Torvalds
a77febbef1 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: obey minleft values during extent allocation correctly
  xfs: reset buffer pointers before freeing them
  xfs: avoid getting stuck during async inode flushes
  xfs: fix xfs_itruncate_start tracing
  xfs: fix duplicate workqueue initialisation
  xfs: kill off xfs_printk()
  xfs: fix race condition in AIL push trigger
  xfs: make AIL target updates and compares 32bit safe.
  xfs: always push the AIL to the target
  xfs: exit AIL push work correctly when AIL is empty
  xfs: ensure reclaim cursor is reset correctly at end of AG
  xfs: add an x86 compat handler for XFS_IOC_ZERO_RANGE
  xfs: fix compiler warning in xfs_trace.h
  xfs: cleanup duplicate initializations
  xfs: reduce the number of pagb_lock roundtrips in xfs_alloc_clear_busy
  xfs: exact busy extent tracking
  xfs: do not immediately reuse busy extent ranges
  xfs: optimize AGFL refills
2011-05-23 15:19:16 -07:00
Linus Torvalds
99dff58562 Merge branch 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6
* 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (48 commits)
  serial: 8250_pci: add support for Cronyx Omega PCI multiserial board.
  tty/serial: Fix break handling for PORT_TEGRA
  tty/serial: Add explicit PORT_TEGRA type
  n_tracerouter and n_tracesink ldisc additions.
  Intel PTI implementaiton of MIPI 1149.7.
  Kernel documentation for the PTI feature.
  export kernel call get_task_comm().
  tty: Remove to support serial for S5P6442
  pch_phub: Support new device ML7223
  8250_pci: Add support for the Digi/IBM PCIe 2-port Adapter
  ASoC: Update cx20442 for TTY API change
  pch_uart: Support new device ML7223 IOH
  parport: Use request_muxed_region for IT87 probe and lock
  tty/serial: add support for Xilinx PS UART
  n_gsm: Use print_hex_dump_bytes
  drivers/tty/moxa.c: Put correct tty value
  TTY: tty_io, annotate locking functions
  TTY: serial_core, remove superfluous set_task_state
  TTY: serial_core, remove invalid test
  Char: moxa, fix locking in moxa_write
  ...

Fix up trivial conflicts in drivers/bluetooth/hci_ldisc.c and
drivers/tty/serial/Makefile.

I did the hci_ldisc thing as an evil merge, cleaning things up.
2011-05-23 12:23:20 -07:00
Theodore Ts'o
072bd7ea74 ext4: use truncate_setsize() unconditionally
In commit c8d46e41 (ext4: Add flag to files with blocks intentionally
past EOF), if the EOFBLOCKS_FL flag is set, we call ext4_truncate()
before calling vmtruncate().  This caused any allocated but unwritten
blocks created by calling fallocate() with the FALLOC_FL_KEEP_SIZE
flag to be dropped.  This was done to make to make sure that
EOFBLOCKS_FL would not be cleared while still leaving blocks past
i_size allocated.  This was not necessary, since ext4_truncate()
guarantees that blocks past i_size will be dropped, even in the case
where truncate() has increased i_size before calling ext4_truncate().

So fix this by removing the EOFBLOCKS_FL special case treatment in
ext4_setattr().  In addition, use truncate_setsize() followed by a
call to ext4_truncate() instead of using vmtruncate().  This is more
efficient since it skips the call to inode_newsize_ok(), which has
been checked already by inode_change_ok().  This is also in a win in
the case where EOFBLOCKS_FL is set since it avoids calling
ext4_truncate() twice.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-23 15:13:02 -04:00
Pavel Shilovsky
37bb04e5a0 CIFS: Simplify connection structure search calls
Use separate functions for comparison between existing structure
and what we are requesting for to make server, session and tcon
search code easier to use on next superblock match call.

Reviewed-by: Jeff Layton <jlayton@samba.org>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-05-23 19:05:09 +00:00
Chris Mason
d6c0cb379c Merge branch 'cleanups_and_fixes' into inode_numbers
Conflicts:
	fs/btrfs/tree-log.c
	fs/btrfs/volumes.c

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23 14:37:47 -04:00
Linus Torvalds
30cb6d5f2e Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  hrtimers: Reorder clock bases
  hrtimers: Avoid touching inactive timer bases
  hrtimers: Make struct hrtimer_cpu_base layout less stupid
  timerfd: Manage cancelable timers in timerfd
  clockevents: Move C3 stop test outside lock
  alarmtimer: Drop device refcount after rtc_open()
  alarmtimer: Check return value of class_find_device()
  timerfd: Allow timers to be cancelled when clock was set
  hrtimers: Prepare for cancel on clock was set timers
2011-05-23 11:30:28 -07:00
Christoph Hellwig
c02324a6ae cifs: remove unused SMB2 config and mount options
There's no SMB2 support in the CIFS filesystem driver, so there's no need to
have a config and mount option for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-05-23 18:08:05 +00:00
Namhyung Kim
825cdcb1a5 splice: add wakeup_pipe_readers()
Add and use wakeup_pipe_readers() to consolidate duplicated codes.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-23 19:58:53 +02:00
Xiao Guangrong
1f78160ce1 Btrfs: using rcu lock in the reader side of devices list
fs_devices->devices is only updated on remove and add device paths, so we can
use rcu to protect it in the reader side

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23 13:24:43 -04:00
Xiao Guangrong
4622470565 Btrfs: drop unnecessary device lock
Drop device_list_mutex for the reader side  on clone_fs_devices and
btrfs_rm_device pathes since the fs_info->volume_mutex can ensure the device
list is not updated

btrfs_close_extra_devices is the initialized path, we can not add or remove
device at this time, so we can simply drop the mutex safely, like other
initialized function does(add_missing_dev, __find_device, __btrfs_open_devices
...).

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23 13:24:43 -04:00
Xiao Guangrong
0c1daee085 Btrfs: fix the race between remove dev and alloc chunk
On remove device path, it updates device->dev_alloc_list but does not hold
chunk lock

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23 13:24:43 -04:00
Xiao Guangrong
c9513edb00 Btrfs: fix the race between reading and updating devices
On btrfs_congested_fn and __unplug_io_fn paths, we should hold
device_list_mutex to avoid remove/add device path to
update fs_devices->devices

On __btrfs_close_devices and btrfs_prepare_sprout paths, the devices in
fs_devices->devices or fs_devices->devices is updated, so we should hold
the mutex to avoid the reader side to reach them

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23 13:24:42 -04:00
Xiao Guangrong
4f6c9328c6 Btrfs: fix bh leak on __btrfs_open_devices path
'bh' is forgot to release if no error is detected

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23 13:24:42 -04:00
Xiao Guangrong
c7f895a2b2 Btrfs: fix unsafe usage of merge_state
merge_state can free the current state if it can be merged with the next node,
but in set_extent_bit(), after merge_state, we still use the current extent to
get the next node and cache it into cached_state

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23 13:24:41 -04:00
Xiao Guangrong
8233767a22 Btrfs: allocate extent state and check the result properly
It doesn't allocate extent_state and check the result properly:
- in set_extent_bit, it doesn't allocate extent_state if the path is not
  allowed wait

- in clear_extent_bit, it doesn't check the result after atomic-ly allocate,
  we trigger BUG_ON() if it's fail

- if allocate fail, we trigger BUG_ON instead of returning -ENOMEM since
  the return value of clear_extent_bit() is ignored by many callers

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23 13:24:41 -04:00
Julia Lawall
b083916638 fs/btrfs: Add missing btrfs_free_path
Btrfs_alloc_path should be matched with btrfs_free_path in error-handling code.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@r exists@
local idexpression struct btrfs_path * x;
expression ra,rb;
position p1,p2;
@@

x = btrfs_alloc_path@p1(...)
...  when != btrfs_free_path(x,...)
     when != if (...) { ... btrfs_free_path(x,...) ...}
     when != x = ra
if(...) { ... when != x = rb
     when forall
     when != btrfs_free_path(x,...)
 \(return <+...x...+>; \| return@p2...; \) }

@script:python@
p1 << r.p1;
p2 << r.p2;
@@

cocci.print_main("alloc",p1)
cocci.print_secs("return",p2)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23 13:24:41 -04:00
Tsutomu Itoh
37daa4f968 Btrfs: check return value of btrfs_inc_extent_ref()
If return value of btrfs_inc_extent_ref() is not 0, BUG() is called.

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23 13:24:40 -04:00
Tsutomu Itoh
c00e9493f1 Btrfs: return error to caller if read_one_inode() fails
When read_one_inode() fails, error code is returned to caller instead
of BUG_ON().

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23 13:24:40 -04:00
Tsutomu Itoh
1cd307990d Btrfs: BUG_ON is deleted from the caller of btrfs_truncate_item & btrfs_extend_item
Currently, btrfs_truncate_item and btrfs_extend_item returns only 0.
So, the check by BUG_ON in the caller is unnecessary.

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23 13:24:39 -04:00
Tsutomu Itoh
65a246c5ff Btrfs: return error code to caller when btrfs_del_item fails
The error code is returned instead of calling BUG_ON when
btrfs_del_item returns the error.

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23 13:24:39 -04:00
Tsutomu Itoh
b0b802d7e3 Btrfs: return error code to caller when btrfs_previous_item fails
The error code is returned instead of calling BUG_ON when
btrfs_previous_item returns the error.

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23 13:24:39 -04:00
Sergei Trofimovich
27160b6b5a btrfs: fix typo 'testeing' -> 'testing'
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23 13:24:32 -04:00
Sergei Trofimovich
9694b3fcbb btrfs: typo: 'btrfS' -> 'btrfs'
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23 13:24:25 -04:00
Sergei Trofimovich
c4f675cd40 btrfs: don't spin in shrink_delalloc if there is nothing to free
Observed as a large delay when --mixed filesystem is filled up.
Test example:
1. create tiny --mixed FS:
   $ dd if=/dev/zero of=2G.img seek=$((2048 * 1024 * 1024 - 1)) count=1 bs=1
   $ mkfs.btrfs --mixed 2G.img
   $ mount -oloop 2G.img /mnt/ut/
2. Try to fill it up:
   $ dd if=/dev/urandom of=10M.file bs=10240 count=1024
   $ seq 1 256 | while read file_no; do echo $file_no; time cp 10M.file ${file_no}.copy; done

Up to '200.copy' it goes fast, but when disk fills-up each -ENOSPC
message takes 3 seconds to pop-up _every_ ENOSPC (and in usermode linux
it's even more: 30-60 seconds!). (Maybe, time depends on kernel's timer resolution).

No IO, no CPU load, just rescheduling. Some debugging revealed busy spinning
in shrink_delalloc.

Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23 13:24:14 -04:00
Jamey Sharp
0f3b708c11 btrfs: Delete unused version.sh script.
In 2008, commit b4f6c45dfbf84f47c21f73f6370ad1292b0627fd dropped the use
of fs/btrfs/version.sh, but left the script behind. Kill it.

Commit by Jamey Sharp and Josh Triplett.

Signed-off-by: Jamey Sharp <jamey@minilop.net>
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Cc: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23 13:05:39 -04:00
Hugo Mills
e215686715 btrfs: Ensure the tree search ioctl returns the right number of records
Btrfs's tree search ioctl has a field to indicate that no more than a
given number of records should be returned. The ioctl doesn't honour
this, as the tested value is not incremented until the end of the
copy_to_sk function. This patch removes an unnecessary local variable,
and updates the num_found counter as each key is found in the tree.

Signed-off-by: Hugo Mills <hugo@carfax.org.uk>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23 13:05:39 -04:00
Andi Kleen
0956c798ef BTRFS: Remove unused node_lock
240f62c8756 replaced the node_lock with rcu_read_lock, but forgot
to remove the actual lock in the data structure. Remove it here.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-23 13:05:39 -04:00
Josef Bacik
d90c732122 Btrfs: leave spinning on lookup and map the leaf
On lookup we only want to read the inode item, so leave the path spinning.  Also
we're just wholesale reading the leaf off, so map the leaf so we don't do a
bunch of kmap/kunmaps.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-05-23 13:03:17 -04:00
Josef Bacik
207dde8289 Btrfs: check for duplicate entries in the free space cache
If there are duplicate entries in the free space cache, discard the entire cache
and load it the old fashioned way.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-05-23 13:03:16 -04:00
Josef Bacik
cca1c81f43 Btrfs: don't try to allocate from a block group that doesn't have enough space
If we have a very large filesystem, we can spend a lot of time in
find_free_extent just trying to allocate from empty block groups.  So instead
check to see if the block group even has enough space for the allocation, and if
not go on to the next block group.

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-05-23 13:03:15 -04:00
Josef Bacik
026fd31782 Btrfs: don't always do readahead
Our readahead is sort of sloppy, and really isn't always needed.  For example if
ls is doing a stating ls (which is the default) it's going to stat in non-disk
order, so if say you have a directory with a stupid amount of files, readahead
is going to do nothing but waste time in the case of doing the stat.  Taking the
unconditional readahead out made my test go from 57 minutes to 36 minutes.  This
means that everywhere we do loop through the tree we want to make sure we do set
path->reada properly, so I went through and found all of the places where we
loop through the path and set reada to 1.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-05-23 13:03:14 -04:00
Josef Bacik
589d8ade83 Btrfs: try not to sleep as much when doing slow caching
When the fs is super full and we unmount the fs, we could get stuck in this
thing where unmount is waiting for the caching kthread to make progress and the
caching kthread keeps scheduling because we're in the middle of a commit.  So
instead just let the caching kthread keep going and only yeild if
need_resched().  This makes my horrible umount case go from taking up to 10
minutes to taking less than 20 seconds.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-05-23 13:03:13 -04:00
Josef Bacik
d82a6f1d7e Btrfs: kill BTRFS_I(inode)->block_group
Originally this was going to be used as a way to give hints to the allocator,
but frankly we can get much better hints elsewhere and it's not even used at all
for anything usefull.  In addition to be completely useless, when we initialize
an inode we try and find a freeish block group to set as the inodes block group,
and with a completely full 40gb fs this takes _forever_, so I imagine with say
1tb fs this is just unbearable.  So just axe the thing altoghether, we don't
need it and it saves us 8 bytes in the inode and saves us 500 microseconds per
inode lookup in my testcase.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-05-23 13:03:12 -04:00
Josef Bacik
7e2355ba1a Btrfs: don't look at the extent buffer level 3 times in a row
We have a bit of debugging in btrfs_search_slot to make sure the level of the
cow block is the same as the original block we were cow'ing.  I don't think I've
ever seen this tripped, so kill it.  This saves us 2 kmap's per level in our
search.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-05-23 13:03:11 -04:00
Josef Bacik
cb25c2ea6a Btrfs: map the node block when looking for readahead targets
If we have particularly full nodes, we could call btrfs_node_blockptr up to 32
times, which is 32 pairs of kmap/kunmap, which _sucks_.  So go ahead and map the
extent buffer while we look for readahead targets.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-05-23 13:03:10 -04:00
Josef Bacik
af60bed24e Btrfs: set range_start to the right start in count_range_bits
In count_range_bits we are adjusting total_bytes based on the range we are
searching for, but we don't adjust the range start according to the range we are
searching for, which makes for weird results.  For example, if the range

[0-8192]

is set DELALLOC, but I search for 4096-8192, I will get back 4096 for the number
of bytes found, but the range_start will be 0, which makes it look like the
range is [0-4096].  So instead set range_start = max(cur_start, state->start).
This makes everything come out right.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-05-23 13:03:09 -04:00
Josef Bacik
fcb80c2aff Btrfs: fix how we do space reservation for truncate
The ceph guys keep running into problems where we have space reserved in our
orphan block rsv when freeing it up.  This is because they tend to do snapshots
alot, so their truncates tend to use a bunch of space, so when we go to do
things like update the inode we have to steal reservation space in order to make
the reservation happen.  This happens because truncate can use as much space as
it freaking feels like, but we still have to hold space for removing the orphan
item and updating the inode, which will definitely always happen.  So in order
to fix this we need to split all of the reservation stuf up.  So with this patch
we have

1) The orphan block reserve which only holds the space for deleting our orphan
item when everything is over.

2) The truncate block reserve which gets allocated and used specifically for the
space that the truncate will use on a per truncate basis.

3) The transaction will always have 1 item's worth of data reserved so we can
update the inode normally.

Hopefully this will make the ceph problem go away.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-05-23 13:03:08 -04:00
Josef Bacik
a4abeea41a Btrfs: kill trans_mutex
We use trans_mutex for lots of things, here's a basic list

1) To serialize trans_handles joining the currently running transaction
2) To make sure that no new trans handles are started while we are committing
3) To protect the dead_roots list and the transaction lists

Really the serializing trans_handles joining is not too hard, and can really get
bogged down in acquiring a reference to the transaction.  So replace the
trans_mutex with a trans_lock spinlock and use it to do the following

1) Protect fs_info->running_transaction.  All trans handles have to do is check
this, and then take a reference of the transaction and keep on going.
2) Protect the fs_info->trans_list.  This doesn't get used too much, basically
it just holds the current transactions, which will usually just be the currently
committing transaction and the currently running transaction at most.
3) Protect the dead roots list.  This is only ever processed by splicing the
list so this is relatively simple.
4) Protect the fs_info->reloc_ctl stuff.  This is very lightweight and was using
the trans_mutex before, so this is a pretty straightforward change.
5) Protect fs_info->no_trans_join.  Because we don't hold the trans_lock over
the entirety of the commit we need to have a way to block new people from
creating a new transaction while we're doing our work.  So we set no_trans_join
and in join_transaction we test to see if that is set, and if it is we do a
wait_on_commit.
6) Make the transaction use count atomic so we don't need to take locks to
modify it when we're dropping references.
7) Add a commit_lock to the transaction to make sure multiple people trying to
commit the same transaction don't race and commit at the same time.
8) Make open_ioctl_trans an atomic so we don't have to take any locks for ioctl
trans.

I have tested this with xfstests, but obviously it is a pretty hairy change so
lots of testing is greatly appreciated.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-05-23 13:00:57 -04:00
Josef Bacik
2a1eb4614d Btrfs: if we've already started a trans handle, use that one
We currently track trans handles in current->journal_info, but we don't actually
use it.  This patch fixes it.  This will cover the case where we have multiple
people starting transactions down the call chain.  This keeps us from having to
allocate a new handle and all of that, we just increase the use count of the
current handle, save the old block_rsv, and return.  I tested this with xfstests
and it worked out fine.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-05-23 13:00:57 -04:00
Josef Bacik
7a7eaa40a3 Btrfs: take away the num_items argument from btrfs_join_transaction
I keep forgetting that btrfs_join_transaction() just ignores the num_items
argument, which leads me to sending pointless patches and looking stupid :).  So
just kill the num_items argument from btrfs_join_transaction and
btrfs_start_ioctl_transaction, since neither of them use it.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-05-23 13:00:56 -04:00
Josef Bacik
74b2107543 Btrfs: make sure to use the delalloc reserve when filling delalloc
In the prealloc filling code and compressed code we don't set trans->block_rsv
to the delalloc block reserve properly, which is going to make us use metadata
from the wrong pool, this patch fixes that.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-05-23 13:00:56 -04:00
Linus Torvalds
57d19e80f4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
  b43: fix comment typo reqest -> request
  Haavard Skinnemoen has left Atmel
  cris: typo in mach-fs Makefile
  Kconfig: fix copy/paste-ism for dell-wmi-aio driver
  doc: timers-howto: fix a typo ("unsgined")
  perf: Only include annotate.h once in tools/perf/util/ui/browsers/annotate.c
  md, raid5: Fix spelling error in comment ('Ofcourse' --> 'Of course').
  treewide: fix a few typos in comments
  regulator: change debug statement be consistent with the style of the rest
  Revert "arm: mach-u300/gpio: Fix mem_region resource size miscalculations"
  audit: acquire creds selectively to reduce atomic op overhead
  rtlwifi: don't touch with treewide double semicolon removal
  treewide: cleanup continuations and remove logging message whitespace
  ath9k_hw: don't touch with treewide double semicolon removal
  include/linux/leds-regulator.h: fix syntax in example code
  tty: fix typo in descripton of tty_termios_encode_baud_rate
  xtensa: remove obsolete BKL kernel option from defconfig
  m68k: fix comment typo 'occcured'
  arch:Kconfig.locks Remove unused config option.
  treewide: remove extra semicolons
  ...
2011-05-23 09:12:26 -07:00
Tejun Heo
ff2a9941ca block: move bd_set_size() above rescan_partitions() in __blkdev_get()
02e352287a4 (block: rescan partitions on invalidated devices on
-ENOMEDIA too) relocated partition rescan above explicit bd_set_size()
to simplify condition check.  As rescan_partitions() does its own bdev
size setting, this doesn't break anything; however,
rescan_partitions() prints out the following messages when adjusting
bdev size, which can be confusing.

  sda: detected capacity change from 0 to 146815737856
  sdb: detected capacity change from 0 to 146815737856

This patch restores the original order and remove the warning
messages.

stable: Please apply together with 02e352287a4 (block: rescan
        partitions on invalidated devices on -ENOMEDIA too).

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Tony Luck <tony.luck@gmail.com>
Tested-by: Tony Luck <tony.luck@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-23 08:50:48 -07:00
David Teigland
901025d2f3 dlm: make plock operation killable
Allow processes blocked on plock requests to be interrupted
when they are killed.  This leaves the problem of cleaning
up the lock state in userspace.  This has three parts:

1. Add a flag to unlock operations sent to userspace
indicating the file is being closed.  Userspace will
then look for and clear any waiting plock operations that
were abandoned by an interrupted process.

2. Queue an unlock-close operation (like in 1) to clean up
userspace from an interrupted plock request.  This is needed
because the vfs will not send a cleanup-unlock if it sees no
locks on the file, which it won't if the interrupted operation
was the only one.

3. Do not use replies from userspace for unlock-close operations
because they are unnecessary (they are just cleaning up for the
process which did not make an unlock call).  This also simplifies
the new unlock-close generated from point 2.

Signed-off-by: David Teigland <teigland@redhat.com>
2011-05-23 10:47:06 -05:00