linux/fs/xfs
Dave Chinner 1316d4da3f xfs: unpin stale inodes directly in IOP_COMMITTED
When inodes are marked stale in a transaction, they are treated
specially when the inode log item is being inserted into the AIL.
It tries to avoid moving the log item forward in the AIL due to a
race condition with the writing the underlying buffer back to disk.
The was "fixed" in commit de25c18 ("xfs: avoid moving stale inodes
in the AIL").

To avoid moving the item forward, we return a LSN smaller than the
commit_lsn of the completing transaction, thereby trying to trick
the commit code into not moving the inode forward at all. I'm not
sure this ever worked as intended - it assumes the inode is already
in the AIL, but I don't think the returned LSN would have been small
enough to prevent moving the inode. It appears that the reason it
worked is that the lower LSN of the inodes meant they were inserted
into the AIL and flushed before the inode buffer (which was moved to
the commit_lsn of the transaction).

The big problem is that with delayed logging, the returning of the
different LSN means insertion takes the slow, non-bulk path.  Worse
yet is that insertion is to a position -before- the commit_lsn so it
is doing a AIL traversal on every insertion, and has to walk over
all the items that have already been inserted into the AIL. It's
expensive.

To compound the matter further, with delayed logging inodes are
likely to go from clean to stale in a single checkpoint, which means
they aren't even in the AIL at all when we come across them at AIL
insertion time. Hence these were all getting inserted into the AIL
when they simply do not need to be as inodes marked XFS_ISTALE are
never written back.

Transactional/recovery integrity is maintained in this case by the
other items in the unlink transaction that were modified (e.g. the
AGI btree blocks) and committed in the same checkpoint.

So to fix this, simply unpin the stale inodes directly in
xfs_inode_item_committed() and return -1 to indicate that the AIL
insertion code does not need to do any further processing of these
inodes.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-07-06 15:44:40 -05:00
..
linux-2.6 xfs: make log devices with write back caches work 2011-06-16 10:52:39 -05:00
quota vmscan: change shrinker API by passing shrink_control struct 2011-05-25 08:39:26 -07:00
support
Kconfig
Makefile Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs 2011-03-21 14:24:56 -07:00
xfs_acl.h
xfs_ag.h xfs: do not discard alloc btree blocks 2011-05-24 11:17:22 -05:00
xfs_alloc_btree.c xfs: do not discard alloc btree blocks 2011-05-24 11:17:22 -05:00
xfs_alloc_btree.h
xfs_alloc.c xfs: do not discard alloc btree blocks 2011-05-24 11:17:22 -05:00
xfs_alloc.h xfs: do not discard alloc btree blocks 2011-05-24 11:17:22 -05:00
xfs_arch.h
xfs_attr_leaf.c
xfs_attr_leaf.h
xfs_attr_sf.h
xfs_attr.c xfs: prevent bogus assert when trying to remove non-existent attribute 2011-06-23 22:13:51 -05:00
xfs_attr.h
xfs_bit.c
xfs_bit.h
xfs_bmap_btree.c
xfs_bmap_btree.h
xfs_bmap.c xfs: correctly decrement the extent buffer index in xfs_bmap_del_extent 2011-05-25 10:48:38 -05:00
xfs_bmap.h xfs: remove the unused XFS_BMAPI_RSVBLOCKS flag 2011-05-25 10:48:36 -05:00
xfs_btree_trace.c
xfs_btree_trace.h
xfs_btree.c
xfs_btree.h
xfs_buf_item.c Fix common misspellings 2011-03-31 11:26:23 -03:00
xfs_buf_item.h
xfs_da_btree.c
xfs_da_btree.h
xfs_dfrag.c xfs: cleanup duplicate initializations 2011-04-28 13:25:29 -05:00
xfs_dfrag.h
xfs_dinode.h
xfs_dir2_block.c
xfs_dir2_block.h
xfs_dir2_data.c
xfs_dir2_data.h
xfs_dir2_leaf.c
xfs_dir2_leaf.h
xfs_dir2_node.c
xfs_dir2_node.h
xfs_dir2_sf.c
xfs_dir2_sf.h
xfs_dir2.c
xfs_dir2.h
xfs_error.c
xfs_error.h
xfs_extfree_item.c
xfs_extfree_item.h
xfs_filestream.c
xfs_filestream.h
xfs_fs.h
xfs_fsops.c
xfs_fsops.h
xfs_ialloc_btree.c
xfs_ialloc_btree.h
xfs_ialloc.c
xfs_ialloc.h
xfs_iget.c xfs: reset inode per-lifetime state when recycling it 2011-06-23 22:13:31 -05:00
xfs_inode_item.c xfs: unpin stale inodes directly in IOP_COMMITTED 2011-07-06 15:44:40 -05:00
xfs_inode_item.h
xfs_inode.c Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs 2011-05-26 10:49:11 -07:00
xfs_inode.h xfs: reset inode per-lifetime state when recycling it 2011-06-23 22:13:31 -05:00
xfs_inum.h
xfs_iomap.c
xfs_iomap.h
xfs_itable.c xfs: fix variable set but not used warnings 2011-04-08 08:09:12 -05:00
xfs_itable.h
xfs_log_cil.c xfs: add online discard support 2011-05-24 11:17:13 -05:00
xfs_log_priv.h xfs: exact busy extent tracking 2011-04-28 13:18:04 -05:00
xfs_log_recover.c xfs: reset buffer pointers before freeing them 2011-05-19 12:03:45 -05:00
xfs_log_recover.h
xfs_log.c xfs: make log devices with write back caches work 2011-06-16 10:52:39 -05:00
xfs_log.h xfs: exact busy extent tracking 2011-04-28 13:18:04 -05:00
xfs_mount.c xfs: cleanup duplicate initializations 2011-04-28 13:25:29 -05:00
xfs_mount.h xfs: add online discard support 2011-05-24 11:17:13 -05:00
xfs_mru_cache.c
xfs_mru_cache.h
xfs_quota.h
xfs_rename.c
xfs_rtalloc.c
xfs_rtalloc.h
xfs_rw.c
xfs_rw.h
xfs_sb.h
xfs_trans_ail.c xfs: fix race condition in AIL push trigger 2011-05-09 18:35:04 -05:00
xfs_trans_buf.c xfs: xfs_trans_read_buf() should return an error on failure 2011-03-26 09:14:44 +11:00
xfs_trans_extfree.c
xfs_trans_inode.c Fix common misspellings 2011-03-31 11:26:23 -03:00
xfs_trans_priv.h xfs: push the AIL from memory reclaim and periodic sync 2011-04-08 12:45:07 +10:00
xfs_trans_space.h
xfs_trans.c xfs: unpin stale inodes directly in IOP_COMMITTED 2011-07-06 15:44:40 -05:00
xfs_trans.h
xfs_types.h xfs: exact busy extent tracking 2011-04-28 13:18:04 -05:00
xfs_utils.c
xfs_utils.h
xfs_vnodeops.c xfs: clear XFS_IDIRTY_RELEASE on truncate down 2011-06-23 22:13:46 -05:00
xfs_vnodeops.h xfs: preallocation transactions do not need to be synchronous 2011-03-26 09:13:08 +11:00
xfs.h