linux

mirror of https://github.com/FEX-Emu/linux.git synced 2024-12-22 17:33:01 +00:00

Author	SHA1	Message	Date
Sergey Senozhatsky	c39aa7056f	btrfs compression: reuse recently used workspace Add compression `workspace' in free_workspace() to `idle_workspace' list head, instead of tail. So we have better chances to reuse most recently used `workspace'. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-28 13:48:46 -07:00
Liu Bo	5588383ece	Btrfs: fix crash when mounting raid5 btrfs with missing disks The reproducer is $ mkfs.btrfs D1 D2 D3 -mraid5 $ mkfs.ext4 D2 && mkfs.ext4 D3 $ mount D1 /btrfs -odegraded ------------------- [ 87.672992] ------------[ cut here ]------------ [ 87.673845] kernel BUG at fs/btrfs/raid56.c:1828! ... [ 87.673845] RIP: 0010:[<ffffffff813efc7e>] [<ffffffff813efc7e>] __raid_recover_end_io+0x4ae/0x4d0 ... [ 87.673845] Call Trace: [ 87.673845] [<ffffffff8116bbc6>] ? mempool_free+0x36/0xa0 [ 87.673845] [<ffffffff813f0255>] raid_recover_end_io+0x75/0xa0 [ 87.673845] [<ffffffff81447c5b>] bio_endio+0x5b/0xa0 [ 87.673845] [<ffffffff81447cb2>] bio_endio_nodec+0x12/0x20 [ 87.673845] [<ffffffff81374621>] end_workqueue_fn+0x41/0x50 [ 87.673845] [<ffffffff813ad2aa>] normal_work_helper+0xca/0x2c0 [ 87.673845] [<ffffffff8108ba2b>] process_one_work+0x1eb/0x530 [ 87.673845] [<ffffffff8108b9c9>] ? process_one_work+0x189/0x530 [ 87.673845] [<ffffffff8108c15b>] worker_thread+0x11b/0x4f0 [ 87.673845] [<ffffffff8108c040>] ? rescuer_thread+0x290/0x290 [ 87.673845] [<ffffffff810939c4>] kthread+0xe4/0x100 [ 87.673845] [<ffffffff810938e0>] ? kthread_create_on_node+0x220/0x220 [ 87.673845] [<ffffffff817e7c7c>] ret_from_fork+0x7c/0xb0 [ 87.673845] [<ffffffff810938e0>] ? kthread_create_on_node+0x220/0x220 ------------------- It's because that we miscalculate @rbio->bbio->error so that it doesn't reach maximum of tolerable errors while it should have. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Tested-by: Satoru Takeuchi<takeuchi_satoru@jp.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-28 13:48:45 -07:00
Anand Jain	b2373f255c	btrfs: create sprout should rename fsid on the sysfs as well Creating sprout will change the fsid of the mounted root. do the same on the sysfs as well. reproducer: mount /dev/sdb /btrfs (seed disk) btrfs dev add /dev/sdc /btrfs mount -o rw,remount /btrfs btrfs dev del /dev/sdb /btrfs mount /dev/sdb /btrfs Error: kobject_add_internal failed for fe350492-dc28-4051-a601-e017b17e6145 with -EEXIST, don't try to register things with the same name in the same directory. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-28 13:48:44 -07:00
Anand Jain	49c6f736f3	btrfs: dev replace should replace the sysfs entry when we replace the device its corresponding sysfs entry has to be replaced as well Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-28 13:48:44 -07:00
Anand Jain	0d39376aa2	btrfs: dev add should add its sysfs entry we would need the device links to be created, when device is added. Signed-off-by: Anand Jain <Anand.Jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-28 13:48:43 -07:00
Anand Jain	99994cde9c	btrfs: dev delete should remove sysfs entry when we delete the device from the mounted btrfs, we would need its corresponding sysfs enty to be removed as well. Signed-off-by: Anand Jain <Anand.Jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-28 13:48:42 -07:00
Anand Jain	9b4eaf43f4	btrfs: rename add_device_membership to btrfs_kobj_add_device Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-28 13:48:41 -07:00
J. Bruce Fields	76f47128f9	nfsd: fix rare symlink decoding bug An NFS operation that creates a new symlink includes the symlink data, which is xdr-encoded as a length followed by the data plus 0 to 3 bytes of zero-padding as required to reach a 4-byte boundary. The vfs, on the other hand, wants null-terminated data. The simple way to handle this would be by copying the data into a newly allocated buffer with space for the final null. The current nfsd_symlink code tries to be more clever by skipping that step in the (likely) case where the byte following the string is already 0. But that assumes that the byte following the string is ours to look at. In fact, it might be the first byte of a page that we can't read, or of some object that another task might modify. Worse, the NFSv4 code tries to fix the problem by actually writing to that byte. In the NFSv2/v3 cases this actually appears to be safe: - nfs3svc_decode_symlinkargs explicitly null-terminates the data (after first checking its length and copying it to a new page). - NFSv2 limits symlinks to 1k. The buffer holding the rpc request is always at least a page, and the link data (and previous fields) have maximum lengths that prevent the request from reaching the end of a page. In the NFSv4 case the CREATE op is potentially just one part of a long compound so can end up on the end of a page if you're unlucky. The minimal fix here is to copy and null-terminate in the NFSv4 case. The nfsd_symlink() interface here seems too fragile, though. It should really either do the copy itself every time or just require a null-terminated string. Reported-by: Jeff Layton <jlayton@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-27 16:10:46 -04:00
Jan Kara	a93cd4cf86	ext4: Fix hole punching for files with indirect blocks Hole punching code for files with indirect blocks wrongly computed number of blocks which need to be cleared when traversing the indirect block tree. That could result in punching more blocks than actually requested and thus effectively cause a data loss. For example: fallocate -n -p 10240000 4096 will punch the range 10240000 - 12632064 instead of the range 1024000 - 10244096. Fix the calculation. CC: stable@vger.kernel.org Fixes: `8bad6fc813` Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2014-06-26 12:30:54 -04:00
Jan Kara	77ea2a4ba6	ext4: Fix block zeroing when punching holes in indirect block files free_holes_block() passed local variable as a block pointer to ext4_clear_blocks(). Thus ext4_clear_blocks() zeroed out this local variable instead of proper place in inode / indirect block. We later zero out proper place in inode / indirect block but don't dirty the inode / buffer again which can lead to subtle issues (some changes e.g. to inode can be lost). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2014-06-26 12:28:57 -04:00
Namjae Jeon	e43bb4e612	ext4: decrement free clusters/inodes counters when block group declared bad We should decrement free clusters counter when block bitmap is marked as corrupt and free inodes counter when the allocation bitmap is marked as corrupt to avoid misunderstanding due to incorrect available size in statfs result. User can get immediately ENOSPC error from write begin without reaching for the writepages. Cc: Darrick J. Wong<darrick.wong@oracle.com> Reported-by: Amit Sahrawat <amit.sahrawat83@gmail.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>	2014-06-26 10:11:53 -04:00
Linus Torvalds	d7933ab727	Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 Pull CIFS fixes from Steve French: "Small set of misc cifs/smb3 fixes" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: [CIFS] fix mount failure with broken pathnames when smb3 mount with mapchars option cifs: revalidate mapping prior to satisfying read_iter request with cache=loose fs/cifs: fix regression in cifs_create_mf_symlink()	2014-06-25 21:47:28 -07:00
Linus Torvalds	ec71feae06	NFS client fixes for Linux 3.16 Highlights include: - Stable fix for a data corruption case due to incorrect cache validation - Fix a couple of false positive cache invalidations - Fix NFSv4 security negotiation issues -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJTq1GIAAoJEGcL54qWCgDyCfUP/3S7Py5Gocdqvb7FBPpCWtsb PJlv1RjC4ngT+BJpBeDSOFEcZerfeQAwguL5kEgIjdyKmsAjVjIF7ThagNQK/0yr qpeKh2EtbAipjjXVmul7saG3Ucuv/PggEhqGl9iJK0QyPdmnr30cHGHHt3kCIPGE e4AkaCN4ZuXBdDOO4YpKzIl6wQPb0Gjwps1boW4INCvnBvK6Yno26Q6ilDf92gJE hisEn0l8l09C6t2jZKP7daCyGForTYYlMxIbmjmQhsMEwnh1kmfpr/xuAQP2bflr 14OFrNbrZg3p4ucp8g7EzgS1Z5m/Ism0xNKfO4LgNwUobSgbvvvScAC3/LP2HIIk RXuRhgb8u6pbWQRqq4XznB+csh6DGR/ui2PhonK4lJDaJxcU3bnFlhTgoC0GSyCa Wbbdv+nhXhw5Xi9jsma6PW/CnHJH6sk/8KviRPOpC+RsCg+X41vTHzC4XvWbentw aZGkNuWAnBKMyswu08E4+ScFQxToSB6ju4RjOsTTMleC0ewWXD3Y6FL+B5p4crPO L05KCLkP+SeRxpakOM3e/x/bkVOa+DBna7foXUZ9snWybYoOmuxOkJgJT7bxrYaA /3N0e/WUUgPR/bhdydMJSRo6DchKj+5GRSpx8FB9eMqqp8mNE+I61/Kq0dFEbtPQ 1IQCFT4w1PEegDpwjb0L =o+QR -----END PGP SIGNATURE----- Merge tag 'nfs-for-3.16-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client fixes from Trond Myklebust: "Highlights include: - Stable fix for a data corruption case due to incorrect cache validation - Fix a couple of false positive cache invalidations - Fix NFSv4 security negotiation issues" * tag 'nfs-for-3.16-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4: test SECINFO RPC_AUTH_GSS pseudoflavors for support NFS Return -EPERM if no supported or matching SECINFO flavor NFS check the return of nfs4_negotiate_security in nfs4_submount NFS: Don't mark the data cache as invalid if it has been flushed NFS: Clear NFS_INO_REVAL_PAGECACHE when we update the file size nfs: Fix cache_validity check in nfs_write_pageuptodate()	2014-06-25 20:06:06 -07:00
T Makphaibulchoke	ec7756ae15	fs/mbcache: replace __builtin_log2() with ilog2() Fix compiler error with some gcc version(s) that do not support __builtin_log2() by replacing __builtin_log2() with ilog2(). Signed-off-by: T. Makphaibulchoke <tmac@hp.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Maciej W. Rozycki <macro@linux-mips.org>	2014-06-25 22:08:29 -04:00
Andy Adamson	66b0686049	NFSv4: test SECINFO RPC_AUTH_GSS pseudoflavors for support Fix nfs4_negotiate_security to create an rpc_clnt used to test each SECINFO returned pseudoflavor. Check credential creation (and gss_context creation) which is important for RPC_AUTH_GSS pseudoflavors which can fail for multiple reasons including mis-configuration. Don't call nfs4_negotiate in nfs4_submount as it was just called by nfs4_proc_lookup_mountpoint (nfs4_proc_lookup_common) Signed-off-by: Andy Adamson <andros@netapp.com> [Trond: fix corrupt return value from nfs_find_best_sec()] Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-06-24 18:46:58 -04:00
Andy Adamson	8445cd3528	NFS Return -EPERM if no supported or matching SECINFO flavor Do not return RPC_AUTH_UNIX if SEINFO reply tests fail. This prevents an infinite loop of NFS4ERR_WRONGSEC for non RPC_AUTH_UNIX mounts. Without this patch, a mount with no sec= option to a server that does not include RPC_AUTH_UNIX in the SECINFO return can be presented with an attemtp to use RPC_AUTH_UNIX which will result in an NFS4ERR_WRONG_SEC which will prompt the SECINFO call which will again try RPC_AUTH_UNIX.... Signed-off-by: Andy Adamson <andros@netapp.com> Tested-By: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-06-24 18:46:58 -04:00
Andy Adamson	57bbe3d7c1	NFS check the return of nfs4_negotiate_security in nfs4_submount Signed-off-by: Andy Adamson <andros@netapp.com> Tested-By: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-06-24 18:46:57 -04:00
Trond Myklebust	6edf96097b	NFS: Don't mark the data cache as invalid if it has been flushed Now that we have functions such as nfs_write_pageuptodate() that use the cache_validity flags to check if the data cache is valid or not, it is a little more important to keep the flags in sync with the state of the data cache. In particular, we'd like to ensure that if the data cache is empty, we don't start marking it as needing revalidation. Reported-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-06-24 18:46:57 -04:00
Trond Myklebust	f2467b6f64	NFS: Clear NFS_INO_REVAL_PAGECACHE when we update the file size In nfs_update_inode(), if the change attribute is seen to change on the server, then we set NFS_INO_REVAL_PAGECACHE in order to make sure that we check the file size. However, if we also update the file size in the same function, we don't need to check it again. So make sure that we clear the NFS_INO_REVAL_PAGECACHE that was set earlier. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-06-24 18:46:57 -04:00
Scott Mayhew	18dd78c427	nfs: Fix cache_validity check in nfs_write_pageuptodate() NFS_INO_INVALID_DATA cannot be ignored, even if we have a delegation. We're still having some problems with data corruption when multiple clients are appending to a file and those clients are being granted write delegations on open. To reproduce: Client A: vi /mnt/`hostname -s` while :; do echo "XXXXXXXXXXXXXXX" >>/mnt/file; sleep $(( $RANDOM % 5 )); done Client B: vi /mnt/`hostname -s` while :; do echo "YYYYYYYYYYYYYYY" >>/mnt/file; sleep $(( $RANDOM % 5 )); done What's happening is that in nfs_update_inode() we're recognizing that the file size has changed and we're setting NFS_INO_INVALID_DATA accordingly, but then we ignore the cache_validity flags in nfs_write_pageuptodate() because we have a delegation. As a result, in nfs_updatepage() we're extending the write to cover the full page even though we've not read in the data to begin with. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Cc: <stable@vger.kernel.org> # v3.11+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-06-24 18:46:56 -04:00
Benjamin LaHaise	edfbbf388f	aio: fix kernel memory disclosure in io_getevents() introduced in v3.10 A kernel memory disclosure was introduced in aio_read_events_ring() in v3.10 by commit `a31ad380be`. The changes made to aio_read_events_ring() failed to correctly limit the index into ctx->ring_pages[], allowing an attacked to cause the subsequent kmap() of an arbitrary page with a copy_to_user() to copy the contents into userspace. This vulnerability has been assigned CVE-2014-0206. Thanks to Mateusz and Petr for disclosing this issue. This patch applies to v3.12+. A separate backport is needed for 3.10/3.11. Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Petr Matousek <pmatouse@redhat.com> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: stable@vger.kernel.org	2014-06-24 13:46:01 -04:00
Benjamin LaHaise	f8567a3845	aio: fix aio request leak when events are reaped by userspace The aio cleanups and optimizations by kmo that were merged into the 3.10 tree added a regression for userspace event reaping. Specifically, the reference counts are not decremented if the event is reaped in userspace, leading to the application being unable to submit further aio requests. This patch applies to 3.12+. A separate backport is required for 3.10/3.11. This issue was uncovered as part of CVE-2014-0206. Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Cc: stable@vger.kernel.org Cc: Kent Overstreet <kmo@daterainc.com> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Petr Matousek <pmatouse@redhat.com>	2014-06-24 13:32:27 -04:00
Steve French	ce36d9ab3b	[CIFS] fix mount failure with broken pathnames when smb3 mount with mapchars option When we SMB3 mounted with mapchars (to allow reserved characters : \ / > < * ? via the Unicode Windows to POSIX remap range) empty paths (eg when we open "" to query the root of the SMB3 directory on mount) were not null terminated so we sent garbarge as a path name on empty paths which caused SMB2/SMB2.1/SMB3 mounts to fail when mapchars was specified. mapchars is particularly important since Unix Extensions for SMB3 are not supported (yet) Signed-off-by: Steve French <smfrench@gmail.com> Cc: <stable@vger.kernel.org> Reviewed-by: David Disseldorp <ddiss@suse.de>	2014-06-24 08:10:24 -05:00
Xue jiufei	ac4fef4d23	ocfs2/dlm: do not purge lockres that is queued for assert master When workqueue is delayed, it may occur that a lockres is purged while it is still queued for master assert. it may trigger BUG() as follows. N1 N2 dlm_get_lockres() ->dlm_do_master_requery is the master of lockres, so queue assert_master work dlm_thread() start running and purge the lockres dlm_assert_master_worker() send assert master message to other nodes receiving the assert_master message, set master to N2 dlmlock_remote() send create_lock message to N2, but receive DLM_IVLOCKID, if it is RECOVERY lockres, it triggers the BUG(). Another BUG() is triggered when N3 become the new master and send assert_master to N1, N1 will trigger the BUG() because owner doesn't match. So we should not purge lockres when it is queued for assert master. Signed-off-by: joyce.xue <xuejiufei@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-23 16:47:45 -07:00
jiangyiwen	b9aaac5a6b	ocfs2: do not return DLM_MIGRATE_RESPONSE_MASTERY_REF to avoid endless,loop during umount The following case may lead to endless loop during umount. node A node B node C node D umount volume, migrate lockres1 to B want to lock lockres1, send MASTER_REQUEST_MSG to C init block mle send MIGRATE_REQUEST_MSG to C find a block mle, and then return DLM_MIGRATE_RESPONSE_MASTERY_REF to B set C in refmap umount successfully try to umount, endless loop occurs when migrate lockres1 since C is in refmap So we can fix this endless loop case by only returning DLM_MIGRATE_RESPONSE_MASTERY_REF if it has a mastery mle when receiving MIGRATE_REQUEST_MSG. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: jiangyiwen <jiangyiwen@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Xue jiufei <xuejiufei@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-23 16:47:45 -07:00
jiangyiwen	595297a8f9	ocfs2: manually do the iput once ocfs2_add_entry failed in ocfs2_symlink and ocfs2_mknod When the call to ocfs2_add_entry() failed in ocfs2_symlink() and ocfs2_mknod(), iput() will not be called during dput(dentry) because no d_instantiate(), and this will lead to umount hung. Signed-off-by: jiangyiwen <jiangyiwen@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-23 16:47:45 -07:00
Yiwen Jiang	f7a14f32e7	ocfs2: fix a tiny race when running dirop_fileop_racer When running dirop_fileop_racer we found a dead lock case. 2 nodes, say Node A and Node B, mount the same ocfs2 volume. Create /race/16/1 in the filesystem, and let the inode number of dir 16 is less than the inode number of dir race. Node A Node B mv /race/16/1 /race/ right after Node A has got the EX mode of /race/16/, and tries to get EX mode of /race ls /race/16/ In this case, Node A has got the EX mode of /race/16/, and wants to get EX mode of /race/. Node B has got the PR mode of /race/, and wants to get the PR mode of /race/16/. Since EX and PR are mutually exclusive, dead lock happens. This patch fixes this case by locking in ancestor order before trying inode number order. Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-23 16:47:45 -07:00
Xue jiufei	a270c6d3c0	ocfs2/dlm: fix misuse of list_move_tail() in dlm_run_purge_list() When a lockres in purge list but is still in use, it should be moved to the tail of purge list. dlm_thread will continue to check next lockres in purge list. However, code list_move_tail(&dlm->purge_list, &lockres->purge) will do no movements, so dlm_thread will purge the same lockres in this loop again and again. If it is in use for a long time, other lockres will not be processed. Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Signed-off-by: joyce.xue <xuejiufei@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-23 16:47:45 -07:00
Wengang Wang	8a8ad1c2f6	ocfs2: refcount: take rw_lock in ocfs2_reflink This patch tries to fix this crash: #5 [ffff88003c1cd690] do_invalid_op at ffffffff810166d5 #6 [ffff88003c1cd730] invalid_op at ffffffff8159b2de [exception RIP: ocfs2_direct_IO_get_blocks+359] RIP: ffffffffa05dfa27 RSP: ffff88003c1cd7e8 RFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff88003c1cdaa8 RCX: 0000000000000000 RDX: 000000000000000c RSI: ffff880027a95000 RDI: ffff88003c79b540 RBP: ffff88003c1cd858 R8: 0000000000000000 R9: ffffffff815f6ba0 R10: 00000000000001c9 R11: 00000000000001c9 R12: ffff88002d271500 R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000001000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffff88003c1cd860] do_direct_IO at ffffffff811cd31b #8 [ffff88003c1cd950] direct_IO_iovec at ffffffff811cde9c #9 [ffff88003c1cd9b0] do_blockdev_direct_IO at ffffffff811ce764 #10 [ffff88003c1cdb80] __blockdev_direct_IO at ffffffff811ce7cc #11 [ffff88003c1cdbb0] ocfs2_direct_IO at ffffffffa05df756 [ocfs2] #12 [ffff88003c1cdbe0] generic_file_direct_write_iter at ffffffff8112f935 #13 [ffff88003c1cdc40] ocfs2_file_write_iter at ffffffffa0600ccc [ocfs2] #14 [ffff88003c1cdd50] do_aio_write at ffffffff8119126c #15 [ffff88003c1cddc0] aio_rw_vect_retry at ffffffff811d9bb4 #16 [ffff88003c1cddf0] aio_run_iocb at ffffffff811db880 #17 [ffff88003c1cde30] io_submit_one at ffffffff811dc238 #18 [ffff88003c1cde80] do_io_submit at ffffffff811dc437 #19 [ffff88003c1cdf70] sys_io_submit at ffffffff811dc530 #20 [ffff88003c1cdf80] system_call_fastpath at ffffffff8159a159 It crashes at BUG_ON(create && (ext_flags & OCFS2_EXT_REFCOUNTED)); in ocfs2_direct_IO_get_blocks. ocfs2_direct_IO_get_blocks is expecting the OCFS2_EXT_REFCOUNTED be removed in ocfs2_prepare_inode_for_write() if it was there. But no cluster lock is taken during the time before (or inside) ocfs2_prepare_inode_for_write() and after ocfs2_direct_IO_get_blocks(). It can happen in this case: Node A(which crashes) Node B ------------------------ --------------------------- ocfs2_file_aio_write ocfs2_prepare_inode_for_write ocfs2_inode_lock ... ocfs2_inode_unlock #no refcount found .... ocfs2_reflink ocfs2_inode_lock ... ocfs2_inode_unlock #now, refcount flag set on extent ... flush change to disk ocfs2_direct_IO_get_blocks ocfs2_get_clusters #extent map miss #buffer_head miss read extents from disk found refcount flag on extent crash.. Fix: Take rw_lock in ocfs2_reflink path Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-23 16:47:45 -07:00
Xue jiufei	b253bfd878	ocfs2: revert "ocfs2: fix NULL pointer dereference when dismount and ocfs2rec simultaneously" `75f82eaa50` ("ocfs2: fix NULL pointer dereference when dismount and ocfs2rec simultaneously") may cause umount hang while shutting down truncate log. The situation is as followes: ocfs2_dismout_volume -> ocfs2_recovery_exit -> free osb->recovery_map -> ocfs2_truncate_shutdown -> lock global bitmap inode -> ocfs2_wait_for_recovery -> check whether osb->recovery_map->rm_used is zero Because osb->recovery_map is already freed, rm_used can be any other values, so it may yield umount hang. Signed-off-by: joyce.xue <xuejiufei@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-23 16:47:45 -07:00
Tariq Saeed	27bf6305cf	ocfs2: fix deadlock when two nodes are converting same lock from PR to EX and idletimeout closes conn Orabug: 18639535 Two node cluster and both nodes hold a lock at PR level and both want to convert to EX at the same time. Master node 1 has sent BAST and then closes the connection due to idletime out. Node 0 receives BAST, sends unlock req with cancel flag but gets error -ENOTCONN. The problem is this error is ignored in dlm_send_remote_unlock_request() on the incorrect assumption that the master is dead. See NOTE in comment why it returns DLM_NORMAL. Upon getting DLM_NORMAL, node 0 proceeds to sends convert (without cancel flg) which fails with -ENOTCONN. waits 5 sec and resends. This time gets DLM_IVLOCKID from the master since lock not found in grant, it had been moved to converting queue in response to conv PR->EX req. No way out. Node 1 (master) Node 0 ============== ====== lock mode PR PR convert PR -> EX mv grant -> convert and que BAST ... <-------- convert PR -> EX convert que looks like this: ((node 1, PR -> EX) (node 0, PR -> EX)) ... BAST (want PR -> NL) ------------------> ... idle timout, conn closed ... In response to BAST, sends unlock with cancel convert flag gets -ENOTCONN. Ignores and sends remote convert request gets -ENOTCONN, waits 5 Sec, retries ... reconnects <----------------- convert req goes through on next try does not find lock on grant que status DLM_IVLOCKID ------------------> ... No way out. Fix is to keep retrying unlock with cancel flag until it succeeds or the master dies. Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-23 16:47:45 -07:00
alex chen	5fb1beb069	ocfs2: should add inode into orphan dir after updating entry in ocfs2_rename() There are two files a and b in dir /mnt/ocfs2. node A node B mv a b In ocfs2_rename(), after calling ocfs2_orphan_add(), the inode of file b will be added into orphan dir. If ocfs2_update_entry() fails, ocfs2_rename return error and mv operation fails. But file b still exists in the parent dir. ocfs2_queue_orphan_scan -> ocfs2_queue_recovery_completion -> ocfs2_complete_recovery -> ocfs2_recover_orphans The inode of the file b will be put with iput(). ocfs2_evict_inode -> ocfs2_delete_inode -> ocfs2_wipe_inode -> ocfs2_remove_inode OCFS2_VALID_FL in the inode i_flags will be cleared. The file b still can be accessed on node B. ls /mnt/ocfs2 When first read the file b with ocfs2_read_inode_block(). It will validate the inode using ocfs2_validate_inode_block(). Because OCFS2_VALID_FL not set in the inode i_flags, so the file system will be readonly. So we should add inode into orphan dir after updating entry in ocfs2_rename(). Signed-off-by: alex.chen <alex.chen@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-23 16:47:45 -07:00
Jaegeuk Kim	98397ff3cd	f2fs: fix not to allocate unnecessary blocks during fallocate This patch fixes the fallocate bug like below. (See xfstests/255) In fallocate(fd, 0, 20480), expand_inode_data processes for (index = pg_start; index <= pg_end; index++) { f2fs_reserve_block(); ... } So, even though fallocate requests 20480, 5 blocks, f2fs allocates 6 blocks including pg_end. So, this patch adds one condition to avoid block allocation. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-06-23 10:05:08 +09:00
Jaegeuk Kim	ead432756a	f2fs: recover fallocated data and its i_size together This patch arranges the f2fs_locks to cover the fallocated data and its i_size. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-06-23 10:05:08 +09:00
Jaegeuk Kim	ccfb30001f	f2fs: fix to report newly allocate region as extent Previous get_block in f2fs didn't report the newly allocated region which has NEW_ADDR. For reader, it should not report, but fiemap needs this. So, this patch introduces two get_block sharing core function. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-06-23 10:05:07 +09:00
Linus Torvalds	2dfded8210	File locking related bugfixes for v3.16 (pile #2 ) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJTpiwZAAoJEAAOaEEZVoIVplsP/383a9q3eXonbsi+Ea8CGbRl tdjjVhM1OY4NZYFAoulILDt3HqPTC6MBnqKlHz+BuMziwd/1+3w8S4E7IEwm/KtM ghNYX8ct5Bf1nc5QEdDmwf4PX48QbRwTuT1uIcXaJ+KtxTzI9qN7mnjRN91TUtq4 WRGvOl0AsGCVq8YqxjztgD3TYbu7AG/72Em+DE9f81PTArAPTo2ySc3gxPuJJAsg G1x46Gx46sfqFX2FY4SPsXen+J/67Og67y6eBawxnT2Bp6ZGDuW+jyPRmkhf0yth pWAtkUi3XmEe6kk6GHiICsS0Yn0RG4jbz39+Ja+X7jibQVJ8Iz6b+Optw9RNQwYt jDWHKFS2AaL/CDejHYOQ1shHcozpRojtIbDLIZ9vTNTQ2r5cdaBvkXMmQzdoktmN wQtQ9AzBl8fHOFOQeCAwd/ZfCLIotvLoLds3K/CSqmpsyK2+9IyriQLKKZ2xm6Iu 8+UUspGQcNVwcMP6YWtI6G+u58/mVanmK6dtpiyXrncZLAfU4H7ETL2IPu8jJTbv kTFCOJtXQzNZa5Xqur1hIewOG9/RlvAZAnii0Ghc3nXTWCCeNeI8re3jw4g9KdRv 33t4sYfJld8LQ1NSMqIDyAs+fvytmGurYt+uhVpb58G/4CqBLNpmmIGIQ6LFLMfs 75FQnbAezrD0H/JyAHUk =AJD3 -----END PGP SIGNATURE----- Merge tag 'locks-v3.16-2' of git://git.samba.org/jlayton/linux Pull file locking fixes from Jeff Layton: "File locking related bugfixes Nothing too earth-shattering here. A fix for a potential regression due to a patch in pile #1, and the addition of a memory barrier to prevent a race condition between break_deleg and generic_add_lease" * tag 'locks-v3.16-2' of git://git.samba.org/jlayton/linux: locks: set fl_owner for leases back to current->files locks: add missing memory barrier in break_deleg	2014-06-21 16:40:30 -10:00
Linus Torvalds	e13d100beb	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "This fixes some lockups in btrfs reported with rc1. It probably has some performance impact because it is backing off our spinning locks more often and switching to a blocking lock. I'll be able to nail that down next week, but for now I want to get the lockups taken care of. Otherwise some more stack reduction and assorted fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: fix wrong error handle when the device is missing or is not writeable Btrfs: fix deadlock when mounting a degraded fs Btrfs: use bio_endio_nodec instead of open code Btrfs: fix NULL pointer crash when running balance and scrub concurrently btrfs: Skip scrubbing removed chunks to avoid -ENOENT. Btrfs: fix broken free space cache after the system crashed Btrfs: make free space cache write out functions more readable Btrfs: remove unused wait queue in struct extent_buffer Btrfs: fix deadlocks with trylock on tree nodes	2014-06-21 14:21:43 -10:00
Linus Torvalds	147f1404db	Merge branch 'for-3.16' of git://linux-nfs.org/~bfields/linux Pull nfsd bugfixes from Bruce Fields: "Fixes for a new regression from the xdr encoding rewrite, and a delegation problem we've had for a while (made somewhat more annoying by the vfs delegation support added in 3.13)" * 'for-3.16' of git://linux-nfs.org/~bfields/linux: NFSD: fix bug for readdir of pseudofs NFSD: Don't hand out delegations for 30 seconds after recalling them.	2014-06-21 14:20:38 -10:00
Miao Xie	8408c716d7	Btrfs: fix wrong error handle when the device is missing or is not writeable The original bio might be submitted, so we shoud increase bi_remaining to account for it when we deal with the error that the device is missing or is not writeable, or we would skip the endio handle. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-19 14:20:56 -07:00
Miao Xie	c55f139640	Btrfs: fix deadlock when mounting a degraded fs The deadlock happened when we mount degraded filesystem, the reproduced steps are following: # mkfs.btrfs -f -m raid1 -d raid1 <dev0> <dev1> # echo 1 > /sys/block/`basename <dev0>`/device/delete # mount -o degraded <dev1> <mnt> The reason was that the counter -- bi_remaining was wrong. If the missing or unwriteable device was the last device in the mapping array, we would not submit the original bio, so we shouldn't increase bi_remaining of it in btrfs_end_bio(), or we would skip the final endio handle. Fix this problem by adding a flag into btrfs bio structure. If we submit the original bio, we will set the flag, and we increase bi_remaining counter, or we don't. Though there is another way to fix it -- decrease bi_remaining counter of the original bio when we make sure the original bio is not submitted, this method need add more check and is easy to make mistake. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-19 14:20:56 -07:00
Miao Xie	e990f16763	Btrfs: use bio_endio_nodec instead of open code Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-19 14:20:55 -07:00
Wang Shilong	298a8f9cf1	Btrfs: fix NULL pointer crash when running balance and scrub concurrently While running balance, scrub, fsstress concurrently we hit the following kernel crash: [56561.448845] BTRFS info (device sde): relocating block group 11005853696 flags 132 [56561.524077] BUG: unable to handle kernel NULL pointer dereference at 0000000000000078 [56561.524237] IP: [<ffffffffa038956d>] scrub_chunk.isra.12+0xdd/0x130 [btrfs] [56561.524297] PGD 9be28067 PUD 7f3dd067 PMD 0 [56561.524325] Oops: 0000 [#1] SMP [....] [56561.527237] Call Trace: [56561.527309] [<ffffffffa038980e>] scrub_enumerate_chunks+0x24e/0x490 [btrfs] [56561.527392] [<ffffffff810abe00>] ? abort_exclusive_wait+0x50/0xb0 [56561.527476] [<ffffffffa038add4>] btrfs_scrub_dev+0x1a4/0x530 [btrfs] [56561.527561] [<ffffffffa0368107>] btrfs_ioctl+0x13f7/0x2a90 [btrfs] [56561.527639] [<ffffffff811c82f0>] do_vfs_ioctl+0x2e0/0x4c0 [56561.527712] [<ffffffff8109c384>] ? vtime_account_user+0x54/0x60 [56561.527788] [<ffffffff810f768c>] ? __audit_syscall_entry+0x9c/0xf0 [56561.527870] [<ffffffff811c8551>] SyS_ioctl+0x81/0xa0 [56561.527941] [<ffffffff815707f7>] tracesys+0xdd/0xe2 [...] [56561.528304] RIP [<ffffffffa038956d>] scrub_chunk.isra.12+0xdd/0x130 [btrfs] [56561.528395] RSP <ffff88004c0f5be8> [56561.528454] CR2: 0000000000000078 This is because in btrfs_relocate_chunk(), we will free @bdev directly while scrub may still hold extent mapping, and may access freed memory. Fix this problem by wrapping freeing @bdev work into free_extent_map() which is based on reference count. Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-19 14:20:55 -07:00
Qu Wenruo	ced96edc48	btrfs: Skip scrubbing removed chunks to avoid -ENOENT. When run scrub with balance, sometimes -ENOENT will be returned, since in scrub_enumerate_chunks() will search dev_extent in COMMIT_ROOT, but btrfs_lookup_block_group() will search block group in MEMORY, so if a chunk is removed but not committed, -ENOENT will be returned. However, there is no need to stop scrubbing since other chunks may be scrubbed without problem. So this patch changes the behavior to skip removed chunks and continue to scrub the rest. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-19 14:20:54 -07:00
Miao Xie	e570fd27f2	Btrfs: fix broken free space cache after the system crashed When we mounted the filesystem after the crash, we got the following message: BTRFS error (device xxx): block group xxxx has wrong amount of free space BTRFS error (device xxx): failed to load free space cache for block group xxx It is because we didn't update the metadata of the allocated space (in extent tree) until the file data was written into the disk. During this time, there was no information about the allocated spaces in either the extent tree nor the free space cache. when we wrote out the free space cache at this time (commit transaction), those spaces were lost. In fact, only the free space that is used to store the file data had this problem, the others didn't because the metadata of them is updated in the same transaction context. There are many methods which can fix the above problem - track the allocated space, and write it out when we write out the free space cache - account the size of the allocated space that is used to store the file data, if the size is not zero, don't write out the free space cache. The first one is complex and may make the performance drop down. This patch chose the second method, we use a per-block-group variant to account the size of that allocated space. Besides that, we also introduce a per-block-group read-write semaphore to avoid the race between the allocation and the free space cache write out. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-19 14:20:54 -07:00
Miao Xie	5349d6c3ff	Btrfs: make free space cache write out functions more readable This patch makes the free space cache write out functions more readable, and beisdes that, it also reduces the stack space that the function -- __btrfs_write_out_cache uses from 194bytes to 144bytes. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-19 14:20:54 -07:00
Filipe Manana	46fefe41b5	Btrfs: remove unused wait queue in struct extent_buffer The lock_wq wait queue is not used anywhere, therefore just remove it. On a x86_64 system, this reduced sizeof(struct extent_buffer) from 320 bytes down to 296 bytes, which means a 4Kb page can now be used for 13 extent buffers instead of 12. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-19 14:20:28 -07:00
Chris Mason	ea4ebde02e	Btrfs: fix deadlocks with trylock on tree nodes The Btrfs tree trylock function is poorly named. It always takes the spinlock and backs off if the blocking lock is held. This can lead to surprising lockups because people expect it to really be a trylock. This commit makes it a pure trylock, both for the spinlock and the blocking lock. It also reworks the nested lock handling slightly to avoid taking the read lock while a spinning write lock might be held. Signed-off-by: Chris Mason <clm@fb.com>	2014-06-19 14:19:55 -07:00
Jeff Layton	08bc03539d	cifs: revalidate mapping prior to satisfying read_iter request with cache=loose Before satisfying a read with cache=loose, we should always check that the pagecache is valid before allowing a read to be satisfied out of it. Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: Steve French <smfrench@gmail.com>	2014-06-19 13:34:04 -05:00
Kinglong Mee	f41c5ad2ff	NFSD: fix bug for readdir of pseudofs Commit `561f0ed498` (nfsd4: allow large readdirs) introduces a bug about readdir the root of pseudofs. Call xdr_truncate_encode() revert encoded name when skipping. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-17 16:42:48 -04:00
NeilBrown	6282cd5655	NFSD: Don't hand out delegations for 30 seconds after recalling them. If nfsd needs to recall a delegation for some reason it implies that there is contention on the file, so further delegations should not be handed out. The current code fails to do so, and the result is effectively a live-lock under some workloads: a client attempting a conflicting operation on a read-delegated file receives NFS4ERR_DELAY and retries the operation, but by the time it retries the server may already have given out another delegation. We could simply avoid delegations for (say) 30 seconds after any recall, but this is probably too heavy handed. We could keep a list of inodes (or inode numbers or filehandles) for recalled delegations, but that requires memory allocation and searching. The approach taken here is to use a bloom filter to record the filehandles which are currently blocked from delegation, and to accept the cost of a few false positives. We have 2 bloom filters, each of which is valid for 30 seconds. When a delegation is recalled the filehandle is added to one filter and will remain disabled for between 30 and 60 seconds. We keep a count of the number of filehandles that have been added, so when that count is zero we can bypass all other tests. The bloom filters have 256 bits and 3 hash functions. This should allow a couple of dozen blocked filehandles with minimal false positives. If many more filehandles are all blocked at once, behaviour will degrade towards rejecting all delegations for between 30 and 60 seconds, then resetting and allowing new delegations. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-17 16:42:47 -04:00
Konstantin Khlebnikov	ebe06187bf	epoll: fix use-after-free in eventpoll_release_file This fixes use-after-free of epi->fllink.next inside list loop macro. This loop actually releases elements in the body. The list is rcu-protected but here we cannot hold rcu_read_lock because we need to lock mutex inside. The obvious solution is to use list_for_each_entry_safe(). RCU-ness isn't essential because nobody can change this list under us, it's final fput for this file. The bug was introduced by `ae10b2b4eb` ("epoll: optimize EPOLL_CTL_DEL using rcu") Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Reported-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Stable <stable@vger.kernel.org> # 3.13+ Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Jason Baron <jbaron@akamai.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-16 17:21:59 -10:00
Björn Baumbach	a1d0b84c30	fs/cifs: fix regression in cifs_create_mf_symlink() commit `d81b8a40e2` ("CIFS: Cleanup cifs open codepath") changed disposition to FILE_OPEN. Signed-off-by: Björn Baumbach <bb@sernet.de> Signed-off-by: Stefan Metzmacher <metze@samba.org> Reviewed-by: Stefan Metzmacher <metze@samba.org> Cc: <stable@vger.kernel.org> # v3.14+ Cc: Pavel Shilovsky <piastry@etersoft.ru> Cc: Steve French <sfrench@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-06-16 13:50:11 -05:00
Jan Kara	c5c7b8ddfb	ext4: Fix buffer double free in ext4_alloc_branch() Error recovery in ext4_alloc_branch() calls ext4_forget() even for buffer corresponding to indirect block it did not allocate. This leads to brelse() being called twice for that buffer (once from ext4_forget() and once from cleanup in ext4_ind_map_blocks()) leading to buffer use count misaccounting. Eventually (but often much later because there are other users of the buffer) we will see messages like: VFS: brelse: Trying to free free buffer Another manifestation of this problem is an error: JBD2 unexpected failure: jbd2_journal_revoke: !buffer_revoked(bh); inconsistent data on disk The fix is easy - don't forget buffer we did not allocate. Also add an explanatory comment because the indexing at ext4_alloc_branch() is somewhat subtle. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2014-06-15 23:46:28 -04:00
Linus Torvalds	16d52ef7c0	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull more btrfs updates from Chris Mason: "This has a few fixes since our last pull and a new ioctl for doing btree searches from userland. It's very similar to the existing ioctl, but lets us return larger items back down to the app" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: fix error handling in create_pending_snapshot btrfs: fix use of uninit "ret" in end_extent_writepage() btrfs: free ulist in qgroup_shared_accounting() error path Btrfs: fix qgroups sanity test crash or hang btrfs: prevent RCU warning when dereferencing radix tree slot Btrfs: fix unfinished readahead thread for raid5/6 degraded mounting btrfs: new ioctl TREE_SEARCH_V2 btrfs: tree_search, search_ioctl: direct copy to userspace btrfs: new function read_extent_buffer_to_user btrfs: tree_search, copy_to_sk: return needed size on EOVERFLOW btrfs: tree_search, copy_to_sk: return EOVERFLOW for too small buffer btrfs: tree_search, search_ioctl: accept varying buffer btrfs: tree_search: eliminate redundant nr_items check	2014-06-14 19:48:43 -05:00
Linus Torvalds	a311c48038	Merge git://git.kvack.org/~bcrl/aio-next Pull aio fix and cleanups from Ben LaHaise: "This consists of a couple of code cleanups plus a minor bug fix" * git://git.kvack.org/~bcrl/aio-next: aio: cleanup: flatten kill_ioctx() aio: report error from io_destroy() when threads race in io_destroy() fs/aio.c: Remove ctx parameter in kiocb_cancel	2014-06-14 19:43:27 -05:00
Eric Sandeen	47a306a748	btrfs: fix error handling in create_pending_snapshot `fcebe456` cut and pasted some code to a later point in create_pending_snapshot(), but didn't switch to the appropriate error handling for this stage of the function. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-13 09:52:30 -07:00
Eric Sandeen	3e2426bd0e	btrfs: fix use of uninit "ret" in end_extent_writepage() If this condition in end_extent_writepage() is false: if (tree->ops && tree->ops->writepage_end_io_hook) we will then test an uninitialized "ret" at: ret = ret < 0 ? ret : -EIO; The test for ret is for the case where ->writepage_end_io_hook failed, and we'd choose that ret as the error; but if there is no ->writepage_end_io_hook, nothing sets ret. Initializing ret to 0 should be sufficient; if writepage_end_io_hook wasn't set, (!uptodate) means non-zero err was passed in, so we choose -EIO in that case. Signed-of-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-13 09:52:28 -07:00
Eric Sandeen	d737278091	btrfs: free ulist in qgroup_shared_accounting() error path If tmp = ulist_alloc(GFP_NOFS) fails, we return without freeing the previously allocated qgroups = ulist_alloc(GFP_NOFS) and cause a memory leak. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-13 09:52:26 -07:00
Filipe Manana	b050f9f6dd	Btrfs: fix qgroups sanity test crash or hang Often when running the qgroups sanity test, a crash or a hang happened. This is because the extent buffer the test uses for the root node doesn't have an header level explicitly set, making it have a random level value. This is a problem when it's not zero for the btrfs_search_slot() calls the test ends up doing, resulting in crashes or hangs such as the following: [ 6454.127192] Btrfs loaded, debug=on, assert=on, integrity-checker=on (...) [ 6454.127760] BTRFS: selftest: Running qgroup tests [ 6454.127964] BTRFS: selftest: Running test_test_no_shared_qgroup [ 6454.127966] BTRFS: selftest: Qgroup basic add [ 6480.152005] BUG: soft lockup - CPU#0 stuck for 23s! [modprobe:5383] [ 6480.152005] Modules linked in: btrfs(+) xor raid6_pq binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc i2c_piix4 i2c_core pcspkr evbug psmouse serio_raw e1000 [last unloaded: btrfs] [ 6480.152005] irq event stamp: 188448 [ 6480.152005] hardirqs last enabled at (188447): [<ffffffff8168ef5c>] restore_args+0x0/0x30 [ 6480.152005] hardirqs last disabled at (188448): [<ffffffff81698e6a>] apic_timer_interrupt+0x6a/0x80 [ 6480.152005] softirqs last enabled at (188446): [<ffffffff810516cf>] __do_softirq+0x1cf/0x450 [ 6480.152005] softirqs last disabled at (188441): [<ffffffff81051c25>] irq_exit+0xb5/0xc0 [ 6480.152005] CPU: 0 PID: 5383 Comm: modprobe Not tainted 3.15.0-rc8-fdm-btrfs-next-33+ #4 [ 6480.152005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 6480.152005] task: ffff8802146125a0 ti: ffff8800d0d00000 task.ti: ffff8800d0d00000 [ 6480.152005] RIP: 0010:[<ffffffff81349a63>] [<ffffffff81349a63>] __write_lock_failed+0x13/0x20 [ 6480.152005] RSP: 0018:ffff8800d0d038e8 EFLAGS: 00000287 [ 6480.152005] RAX: 0000000000000000 RBX: ffffffff8168ef5c RCX: 000005deb8525852 [ 6480.152005] RDX: 0000000000000000 RSI: 0000000000001d45 RDI: ffff8802105000b8 [ 6480.152005] RBP: ffff8800d0d038e8 R08: fffffe12710f63db R09: ffffffffa03196fb [ 6480.152005] R10: ffff8802146125a0 R11: ffff880214612e28 R12: ffff8800d0d03858 [ 6480.152005] R13: 0000000000000000 R14: ffff8800d0d00000 R15: ffff8802146125a0 [ 6480.152005] FS: 00007f14ff804700(0000) GS:ffff880215e00000(0000) knlGS:0000000000000000 [ 6480.152005] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 6480.152005] CR2: 00007fff4df0dac8 CR3: 00000000d1796000 CR4: 00000000000006f0 [ 6480.152005] Stack: [ 6480.152005] ffff8800d0d03908 ffffffff810ae967 0000000000000001 ffff8802105000b8 [ 6480.152005] ffff8800d0d03938 ffffffff8168e57e ffffffffa0319c16 0000000000000007 [ 6480.152005] ffff880210500000 ffff880210500100 ffff8800d0d039b8 ffffffffa0319c16 [ 6480.152005] Call Trace: [ 6480.152005] [<ffffffff810ae967>] do_raw_write_lock+0x47/0xa0 [ 6480.152005] [<ffffffff8168e57e>] _raw_write_lock+0x5e/0x80 [ 6480.152005] [<ffffffffa0319c16>] ? btrfs_tree_lock+0x116/0x270 [btrfs] [ 6480.152005] [<ffffffffa0319c16>] btrfs_tree_lock+0x116/0x270 [btrfs] [ 6480.152005] [<ffffffffa02b2acb>] btrfs_lock_root_node+0x3b/0x50 [btrfs] [ 6480.152005] [<ffffffffa02b81a6>] btrfs_search_slot+0x916/0xa20 [btrfs] [ 6480.152005] [<ffffffff811a727f>] ? create_object+0x23f/0x300 [ 6480.152005] [<ffffffffa02b9958>] btrfs_insert_empty_items+0x78/0xd0 [btrfs] [ 6480.152005] [<ffffffffa036041a>] insert_normal_tree_ref.constprop.4+0xa2/0x19a [btrfs] [ 6480.152005] [<ffffffffa03605c3>] test_no_shared_qgroup+0xb1/0x1ca [btrfs] [ 6480.152005] [<ffffffff8108cad6>] ? local_clock+0x16/0x30 [ 6480.152005] [<ffffffffa035ef8e>] btrfs_test_qgroups+0x1ae/0x1d7 [btrfs] [ 6480.152005] [<ffffffffa03a69d2>] ? ftrace_define_fields_btrfs_space_reservation+0xfd/0xfd [btrfs] [ 6480.152005] [<ffffffffa03a6a86>] init_btrfs_fs+0xb4/0x153 [btrfs] [ 6480.152005] [<ffffffff81000352>] do_one_initcall+0x102/0x150 [ 6480.152005] [<ffffffff8103d223>] ? set_memory_nx+0x43/0x50 [ 6480.152005] [<ffffffff81682668>] ? set_section_ro_nx+0x6d/0x74 [ 6480.152005] [<ffffffff810d91cc>] load_module+0x1cdc/0x2630 (...) Therefore initialize the extent buffer as an empty leaf (level 0). Issue easy to reproduce when btrfs is built as a module via: $ for ((i = 1; i <= 1000000; i++)); do rmmod btrfs; modprobe btrfs; done Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-13 09:52:24 -07:00
Sasha Levin	f1e3c28949	btrfs: prevent RCU warning when dereferencing radix tree slot Mark the dereference as protected by lock. Not doing so triggers an RCU warning since the radix tree assumed that RCU is in use. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-13 09:52:22 -07:00
Wang Shilong	5fbc7c59fd	Btrfs: fix unfinished readahead thread for raid5/6 degraded mounting Steps to reproduce: # mkfs.btrfs -f /dev/sd[b-f] -m raid5 -d raid5 # mkfs.ext4 /dev/sdc --->corrupt one of btrfs device # mount /dev/sdb /mnt -o degraded # btrfs scrub start -BRd /mnt This is because readahead would skip missing device, this is not true for RAID5/6, because REQ_GET_READ_MIRRORS return 1 for RAID5/6 block mapping. If expected data locates in missing device, readahead thread would not call __readahead_hook() which makes event @rc->elems=0 wait forever. Fix this problem by checking return value of btrfs_map_block(),we can only skip missing device safely if there are several mirrors. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-13 09:52:21 -07:00
Gerhard Heift	cc68a8a5a4	btrfs: new ioctl TREE_SEARCH_V2 This new ioctl call allows the user to supply a buffer of varying size in which a tree search can store its results. This is much more flexible if you want to receive items which are larger than the current fixed buffer of 3992 bytes or if you want to fetch more items at once. Items larger than this buffer are for example some of the type EXTENT_CSUM. Signed-off-by: Gerhard Heift <Gerhard@Heift.Name> Signed-off-by: Chris Mason <clm@fb.com> Acked-by: David Sterba <dsterba@suse.cz>	2014-06-13 09:52:19 -07:00
Linus Torvalds	4bdeb31208	dlm for 3.16 This set includes one small fix related to resending SCTP messages. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJTmwixAAoJEDgbc8f8gGmqlT0P/Akhh1P264NVAXSwVhQf1HKs SZFlCIKolyM8ih4TkPM1QDcB3i2FBJiLkaLQT3yd8jcxW6fpMnH6wmRkU3T3uSlB cV88eXUdS18gnPeMx1vMZrip2VOa5xpCLWyp7ClC6U7E3xI5xkc3eIbzilhhsr0b sp72ahBujPsbltUYpg2giJg0DtoQtc9Tw0PMbF2smv/p1m+gKE8IE0v9JiVoK148 B3vRxiuqSfScOYLhNjyLmgMeCxalbNUwdd+v16HfgUCjVQR/ji9EO29AGDGwvR8P UCqHPS+WA9gUBff0/kHFNNweDWdAhvfVSDmxLUWxmEN7zyCZA/SYsq6COnh+zYz6 S1rT4nCvlchPNxK7OthZh/iSJy2xqfX2mPy/rKf7SFHoE/AZidY4OIKAE4Y92/mp cq3RCYvn+SGFNLt7Y4VceWmH5IUFak9KhK6d+13oNFneY5QRCqMH5AcOSoCV0V7m sGRiobgMY9PW06/WMSJYN1H1QQ1YbEY2Jn44xEIeyDcegRUIi8joOTuM3iRx/ZuZ uKIe5lMSajeIsQvJOrWkrr+sD+odhxMs3NlsTjLoopXAM9oihMdnrWAapP6G3z2G 0ZSHAtuy9KJxi/ycKwwHRcXrY6OFtWpBUcmDeTz7wQ3q4idf3q64tyf0VRfkMTGj jqP6JMfHKKcn67bjEcbn =HEIE -----END PGP SIGNATURE----- Merge tag 'dlm-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm fix from David Teigland: "This contains one small fix related to resending SCTP messages" * tag 'dlm-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: keep listening connection alive with sctp mode	2014-06-13 07:41:57 -07:00
Linus Torvalds	6d87c225f5	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates from Sage Weil: "This has a mix of bug fixes and cleanups. Alex's patch fixes a rare race in RBD. Ilya's patches fix an ENOENT check when a second rbd image is mapped and a couple memory leaks. Zheng fixes several issues with fragmented directories and multiple MDSs. Josh fixes a spin/sleep issue, and Josh and Guangliang's patches fix setting and unsetting RBD images read-only. Naturally there are several other cleanups mixed in for good measure" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (23 commits) rbd: only set disk to read-only once rbd: move calls that may sleep out of spin lock range rbd: add ioctl for rbd ceph: use truncate_pagecache() instead of truncate_inode_pages() ceph: include time stamp in every MDS request rbd: fix ida/idr memory leak rbd: use reference counts for image requests rbd: fix osd_request memory leak in __rbd_dev_header_watch_sync() rbd: make sure we have latest osdmap on 'rbd map' libceph: add ceph_monc_wait_osdmap() libceph: mon_get_version request infrastructure libceph: recognize poolop requests in debugfs ceph: refactor readpage_nounlock() to make the logic clearer mds: check cap ID when handling cap export message ceph: remember subtree root dirfrag's auth MDS ceph: introduce ceph_fill_fragtree() ceph: handle cap import atomically ceph: pre-allocate ceph_cap struct for ceph_add_cap() ceph: update inode fields according to issued caps rbd: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO ...	2014-06-12 23:06:23 -07:00
Linus Torvalds	3737a12761	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more perf updates from Ingo Molnar: "A second round of perf updates: - wide reaching kprobes sanitization and robustization, with the hope of fixing all 'probe this function crashes the kernel' bugs, by Masami Hiramatsu. - uprobes updates from Oleg Nesterov: tmpfs support, corner case fixes and robustization work. - perf tooling updates and fixes from Jiri Olsa, Namhyung Ki, Arnaldo et al: * Add support to accumulate hist periods (Namhyung Kim) * various fixes, refactorings and enhancements" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (101 commits) perf: Differentiate exec() and non-exec() comm events perf: Fix perf_event_comm() vs. exec() assumption uprobes/x86: Rename arch_uprobe->def to ->defparam, minor comment updates perf/documentation: Add description for conditional branch filter perf/x86: Add conditional branch filtering support perf/tool: Add conditional branch filter 'cond' to perf record perf: Add new conditional branch filter 'PERF_SAMPLE_BRANCH_COND' uprobes: Teach copy_insn() to support tmpfs uprobes: Shift ->readpage check from __copy_insn() to uprobe_register() perf/x86: Use common PMU interrupt disabled code perf/ARM: Use common PMU interrupt disabled code perf: Disable sampled events if no PMU interrupt perf: Fix use after free in perf_remove_from_context() perf tools: Fix 'make help' message error perf record: Fix poll return value propagation perf tools: Move elide bool into perf_hpp_fmt struct perf tools: Remove elide setup for SORT_MODE__MEMORY mode perf tools: Fix "==" into "=" in ui_browser__warning assignment perf tools: Allow overriding sysfs and proc finding with env var perf tools: Consider header files outside perf directory in tags target ...	2014-06-12 19:18:49 -07:00
Gerhard Heift	ba346b357d	btrfs: tree_search, search_ioctl: direct copy to userspace By copying each found item seperatly to userspace, we do not need extra buffer in the kernel. Signed-off-by: Gerhard Heift <Gerhard@Heift.Name> Signed-off-by: Chris Mason <clm@fb.com> Acked-by: David Sterba <dsterba@suse.cz>	2014-06-12 18:22:05 -07:00
Gerhard Heift	550ac1d85e	btrfs: new function read_extent_buffer_to_user This new function reads the content of an extent directly to user memory. Signed-off-by: Gerhard Heift <Gerhard@Heift.Name> Signed-off-by: Chris Mason <clm@fb.com> Acked-by: David Sterba <dsterba@suse.cz>	2014-06-12 18:21:56 -07:00
Gerhard Heift	9b6e817d02	btrfs: tree_search, copy_to_sk: return needed size on EOVERFLOW If an item in tree_search is too large to be stored in the given buffer, return the needed size (including the header). Signed-off-by: Gerhard Heift <Gerhard@Heift.Name> Signed-off-by: Chris Mason <clm@fb.com> Acked-by: David Sterba <dsterba@suse.cz>	2014-06-12 18:21:47 -07:00
Gerhard Heift	8f5f6178f3	btrfs: tree_search, copy_to_sk: return EOVERFLOW for too small buffer In copy_to_sk, if an item is too large for the given buffer, it now returns -EOVERFLOW instead of copying a search_header with len = 0. For backward compatibility for the first item it still copies such a header to the buffer, but not any other following items, which could have fitted. tree_search changes -EOVERFLOW back to 0 to behave similiar to the way it behaved before this patch. Signed-off-by: Gerhard Heift <Gerhard@Heift.Name> Signed-off-by: Chris Mason <clm@fb.com> Acked-by: David Sterba <dsterba@suse.cz>	2014-06-12 18:21:39 -07:00
Gerhard Heift	1254444288	btrfs: tree_search, search_ioctl: accept varying buffer rewrite search_ioctl to accept a buffer with varying size Signed-off-by: Gerhard Heift <Gerhard@Heift.Name> Signed-off-by: Chris Mason <clm@fb.com> Acked-by: David Sterba <dsterba@suse.cz>	2014-06-12 18:21:26 -07:00
Gerhard Heift	25c9bc2e2b	btrfs: tree_search: eliminate redundant nr_items check If the amount of items reached the given limit of nr_items, we can leave copy_to_sk without updating the key. Also by returning 1 we leave the loop in search_ioctl without rechecking if we reached the given limit. Signed-off-by: Gerhard Heift <Gerhard@Heift.Name> Signed-off-by: Chris Mason <clm@fb.com> Acked-by: David Sterba <dsterba@suse.cz>	2014-06-12 18:20:39 -07:00
Linus Torvalds	16b9057804	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "This the bunch that sat in -next + lock_parent() fix. This is the minimal set; there's more pending stuff. In particular, I really hope to get acct.c fixes merged this cycle - we need that to deal sanely with delayed-mntput stuff. In the next pile, hopefully - that series is fairly short and localized (kernel/acct.c, fs/super.c and fs/namespace.c). In this pile: more iov_iter work. Most of prereqs for ->splice_write with sane locking order are there and Kent's dio rewrite would also fit nicely on top of this pile" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (70 commits) lock_parent: don't step on stale ->d_parent of all-but-freed one kill generic_file_splice_write() ceph: switch to iter_file_splice_write() shmem: switch to iter_file_splice_write() nfs: switch to iter_splice_write_file() fs/splice.c: remove unneeded exports ocfs2: switch to iter_file_splice_write() ->splice_write() via ->write_iter() bio_vec-backed iov_iter optimize copy_page_{to,from}_iter() bury generic_file_aio_{read,write} lustre: get rid of messing with iovecs ceph: switch to ->write_iter() ceph_sync_direct_write: stop poking into iov_iter guts ceph_sync_read: stop poking into iov_iter guts new helper: copy_page_from_iter() fuse: switch to ->write_iter() btrfs: switch to ->write_iter() ocfs2: switch to ->write_iter() xfs: switch to ->write_iter() ...	2014-06-12 10:30:18 -07:00
Lidong Zhong	883854c545	dlm: keep listening connection alive with sctp mode The connection struct with nodeid 0 is the listening socket, not a connection to another node. The sctp resend function was not checking that the nodeid was valid (non-zero), so it would mistakenly get and resend on the listening connection when nodeid was zero. Signed-off-by: Lidong Zhong <lzhong@suse.com> Signed-off-by: David Teigland <teigland@redhat.com>	2014-06-12 10:26:14 -05:00
Al Viro	c2338f2dc7	lock_parent: don't step on stale ->d_parent of all-but-freed one Dentry that had been through (or into) __dentry_kill() might be seen by shrink_dentry_list(); that's normal, it'll be taken off the shrink list and freed if __dentry_kill() has already finished. The problem is, its ->d_parent might be pointing to already freed dentry, so lock_parent() needs to be careful. We need to check that dentry hasn't already gone into __dentry_kill() and grab rcu_read_lock() before dropping ->d_lock - the latter makes sure that whatever we see in ->d_parent after dropping ->d_lock it won't be freed until we drop rcu_read_lock(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-06-12 00:29:13 -04:00
Al Viro	9c1d5284c7	Merge commit '9f12600fe425bc28f0ccba034a77783c09c15af4' into for-linus Backmerge of dcache.c changes from mainline. It's that, or complete rebase... Conflicts: fs/splice.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-06-12 00:28:09 -04:00
Al Viro	5f07385060	kill generic_file_splice_write() no callers left Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-06-12 00:21:13 -04:00
Al Viro	3551dd79ac	ceph: switch to iter_file_splice_write() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-06-12 00:21:12 -04:00
Al Viro	4da54c218d	nfs: switch to iter_splice_write_file() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-06-12 00:21:11 -04:00
Al Viro	96f9bc8fbc	fs/splice.c: remove unneeded exports ocfs2 was using a bunch of splice.c guts... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-06-12 00:21:11 -04:00
Al Viro	6dc8bc0fb3	ocfs2: switch to iter_file_splice_write() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-06-12 00:21:10 -04:00
Al Viro	8d0207652c	->splice_write() via ->write_iter() iter_file_splice_write() - a ->splice_write() instance that gathers the pipe buffers, builds a bio_vec-based iov_iter covering those and feeds it to ->write_iter(). A bunch of simple cases coverted to that... [AV: fixed the braino spotted by Cyrill] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-06-12 00:18:51 -04:00
Linus Torvalds	2840c566e9	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull reiserfs and ext3 changes from Jan Kara: "Big reiserfs cleanup from Jeff, an ext3 deadlock fix, and some small cleanups" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (34 commits) reiserfs: Fix compilation breakage with CONFIG_REISERFS_CHECK ext3: Fix deadlock in data=journal mode when fs is frozen reiserfs: call truncate_setsize under tailpack mutex fs/jbd/revoke.c: replace shift loop by ilog2 reiserfs: remove obsolete __constant_cpu_to_le32 reiserfs: balance_leaf refactor, split up balance_leaf_when_delete reiserfs: balance_leaf refactor, format balance_leaf_finish_node reiserfs: balance_leaf refactor, format balance_leaf_new_nodes_paste reiserfs: balance_leaf refactor, format balance_leaf_paste_right reiserfs: balance_leaf refactor, format balance_leaf_insert_right reiserfs: balance_leaf refactor, format balance_leaf_paste_left reiserfs: balance_leaf refactor, format balance_leaf_insert_left reiserfs: balance_leaf refactor, pull out balance_leaf{left, right, new_nodes, finish_node} reiserfs: balance_leaf refactor, pull out balance_leaf_finish_node_paste reiserfs: balance_leaf refactor pull out balance_leaf_finish_node_insert reiserfs: balance_leaf refactor, pull out balance_leaf_new_nodes_paste reiserfs: balance_leaf refactor, pull out balance_leaf_new_nodes_insert reiserfs: balance_leaf refactor, pull out balance_leaf_paste_right reiserfs: balance_leaf refactor, pull out balance_leaf_insert_right reiserfs: balance_leaf refactor, pull out balance_leaf_paste_left ...	2014-06-11 10:45:14 -07:00
Linus Torvalds	859862ddd2	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs updates from Chris Mason: "The biggest change here is Josef's rework of the btrfs quota accounting, which improves the in-memory tracking of delayed extent operations. I had been working on Btrfs stack usage for a while, mostly because it had become impossible to do long stress runs with slab, lockdep and pagealloc debugging turned on without blowing the stack. Even though you upgraded us to a nice king sized stack, I kept most of the patches. We also have some very hard to find corruption fixes, an awesome sysfs use after free, and the usual assortment of optimizations, cleanups and other fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (80 commits) Btrfs: convert smp_mb__{before,after}_clear_bit Btrfs: fix scrub_print_warning to handle skinny metadata extents Btrfs: make fsync work after cloning into a file Btrfs: use right type to get real comparison Btrfs: don't check nodes for extent items Btrfs: don't release invalid page in btrfs_page_exists_in_range() Btrfs: make sure we retry if page is a retriable exception Btrfs: make sure we retry if we couldn't get the page btrfs: replace EINVAL with EOPNOTSUPP for dev_replace raid56 trivial: fs/btrfs/ioctl.c: fix typo s/substract/subtract/ Btrfs: fix leaf corruption after __btrfs_drop_extents Btrfs: ensure btrfs_prev_leaf doesn't miss 1 item Btrfs: fix clone to deal with holes when NO_HOLES feature is enabled btrfs: free delayed node outside of root->inode_lock btrfs: replace EINVAL with ERANGE for resize when ULLONG_MAX Btrfs: fix transaction leak during fsync call btrfs: Avoid trucating page or punching hole in a already existed hole. Btrfs: update commit root on snapshot creation after orphan cleanup Btrfs: ioctl, don't re-lock extent range when not necessary Btrfs: avoid visiting all extent items when cloning a range ...	2014-06-11 09:22:21 -07:00
Linus Torvalds	412dd3a6da	xfs: update for 3.16-rc1 This update contains: o cleanup removing unused function args o rework of the filestreams allocator to use dentry cache parent lookups o new on-disk free inode btree and optimised inode allocator o various bug fixes o rework of internal attribute API o cleanup of superblock feature bit support to remove historic cruft o more fixes and minor cleanups o added a new directory/attribute geometry abstraction o yet more fixes and minor cleanups. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJTl+YfAAoJEK3oKUf0dfod/T8QALmvR28JZTL3vtlD5rppXp9+ DXOMrwgJ9V+GOI39tpUgw1u/C5DuaFPRPtmCjnb9Do4DJMrHj+zD8ADvoVd6asa0 FHH4TuulQqOJVu67SZ3ng15yjyy+wPfymQZIQPQY/IwVMUUEpWnSFnKha1GAsL8Y RY/WNU50wMu4wxi0++ENooHJC2EoxXpzB80cHddN81zFEFZobw0cm5Aa5xBZEZ4i P+GpEuUpWHKvVaWRLuIMgVC0NuOt5KtLfS97ong+tRgWCw//QVl28Rxhrj1ZHsF3 VAskVsSFVIIPHP7qKjQyCGk71iqBfrfAgRqqJHFZgSmtSzyK3hVvJlRRDdCT5hi6 00aHg9vz9815I7zrQwyMuy872N3DTislOxJZGD7PKgLpgfeHs4qk+cQ1xCi2gdFn xnh2p4mLolZHzanUsoxYpSh7f7o+NT3xgET3yS63uuO/I57o74JJDfRDjWNX6I9F LLtIGb1cwVFUYbXcHGfP1wxQ1BS6rYYYwKpSJqqwJXApL9MqoxH2B8Hoo0BaG43/ 3UlNi+yljvhBNiJnx2pAIdU+WaIL1ZQj9XzuU1sFSa8lnFNb2x+wkgjHzJ0Hdotm zZqirCo1jyyNkyTwGfwJwGzNgZemQDMQ7cr2MYzG1mhFMLEZZJeFmWVzfuzJ3yoR jke/Hy/qiWVK0en43MdR =qnz2 -----END PGP SIGNATURE----- Merge tag 'xfs-for-linus-3.16-rc1' of git://oss.sgi.com/xfs/xfs Pull xfs updates from Dave Chinner: "This update contains: - cleanup removing unused function args - rework of the filestreams allocator to use dentry cache parent lookups - new on-disk free inode btree and optimised inode allocator - various bug fixes - rework of internal attribute API - cleanup of superblock feature bit support to remove historic cruft - more fixes and minor cleanups - added a new directory/attribute geometry abstraction - yet more fixes and minor cleanups" * tag 'xfs-for-linus-3.16-rc1' of git://oss.sgi.com/xfs/xfs: (86 commits) xfs: fix xfs_da_args sparse warning in xfs_readdir xfs: Fix rounding in xfs_alloc_fix_len() xfs: tone down writepage/releasepage WARN_ONs xfs: small cleanup in xfs_lowbit64() xfs: kill xfs_buf_geterror() xfs: xfs_readsb needs to check for magic numbers xfs: block allocation work needs to be kswapd aware xfs: remove redundant geometry information from xfs_da_state xfs: replace attr LBSIZE with xfs_da_geometry xfs: pass xfs_da_args to xfs_attr_leaf_newentsize xfs: use xfs_da_geometry for block size in attr code xfs: remove mp->m_dir_geo from directory logging xfs: reduce direct usage of mp->m_dir_geo xfs: move node entry counts to xfs_da_geometry xfs: convert dir/attr btree threshold to xfs_da_geometry xfs: convert m_dirblksize to xfs_da_geometry xfs: convert m_dirblkfsbs to xfs_da_geometry xfs: convert directory segment limits to xfs_da_geometry xfs: convert directory db conversion to xfs_da_geometry xfs: convert directory dablk conversion to xfs_da_geometry ...	2014-06-11 09:03:47 -07:00
Jan Kara	19ef1229bc	reiserfs: Fix compilation breakage with CONFIG_REISERFS_CHECK There was a bug in debug printout when CONFIG_REISERFS_CHECK was enabled so one of the assertions in do_balan.c didn't compile. Fix it. Fixes: `0080e9f9d3` Signed-off-by: Jan Kara <jack@suse.cz>	2014-06-11 17:32:10 +02:00
Linus Torvalds	9ee4d7a653	Merge branch 'akpm' (patches from Andrew Morton) Merge leftovers from Andrew Morton: "A few leftovers: ocfs2, gcov, RTC" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: rtc: s5m: consolidate two device type switch statements rtc: s5m: add support for S2MPS14 RTC rtc: s5m: support different register layout rtc: s5m: use shorter time of register update rtc: s5m: remove undocumented time init on first boot mfd/rtc: sec/s5m: rename SEC* symbols to S5M gcov: add support for GCC 4.9 ocfs2/o2net: incorrect to terminate accepting connections loop upon rejecting an invalid one	2014-06-10 15:34:55 -07:00
Tariq Saeed	79deb3c148	ocfs2/o2net: incorrect to terminate accepting connections loop upon rejecting an invalid one When o2net-accept-one() rejects an illegal connection, it terminates the loop picking up the remaining queued connections. This fix will continue accepting connections till the queue is emtpy. Addresses Orabug 17489469. Signed-off-by: Tariq Saseed <tariq.x.saeed@oracle.com> Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-10 15:34:46 -07:00
Linus Torvalds	d1e1cda862	NFS client updates for Linux 3.16 Highlights include: - Massive cleanup of the NFS read/write code by Anna and Dros - Support multiple NFS read/write requests per page in order to deal with non-page aligned pNFS striping. Also cleans up the r/wsize < page size code nicely. - stable fix for ensuring inode is declared uptodate only after all the attributes have been checked. - stable fix for a kernel Oops when remounting - NFS over RDMA client fixes - move the pNFS files layout driver into its own subdirectory -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJTl3pmAAoJEGcL54qWCgDyraIP/08ZbbDowVTP9572bxl+VR2i zNbrflBtl1R05D4Imi/IEySK0w6xj1CLsncNpXAT2bxTlyKPW70tpiiPlRKMPuO8 JW+iPiepR2t0mol6MEd46yuV8btXVk8I+7IYjPXANiMJG8O5dJzNQ8NiCQOERBNt FQ7rzTCFO0ESGXnT6vYrT4I0bwqYVklBiJRTT4PQVzhhhDq9qUdq21BlQjQJFXP4 9aBLurxKptlHBvE6A2Quja6ObEC0s31CxcijqHIJ+Ue4GbKcFbMG1tgjY7ESE/AD rqzDeF0jvWHT+frmvFEUUXWqzF1ReZ4x9pfDoOgeG6T9/K6DT91O0yMOgG8jvlbF 8DSATNYGDX5sSjpvaG5JokGG+cGCk9srVDx+itn7HlwzalRwn0PjKtIYwOJ7TJIr o/j20nOsPrRGF0OqLf9phyocgRrlbMKOzj1IXldHHfAbNkRcISTK08lxvsz96Ddn zRyDmbsbY6QFXdB3AVSeQmg5R0OOLtzNIcsFPmNdvy5eiy67qU0lsGg8UGNnoz8k PHN1pcGejkctLhQ32ee3w/W6zkrgpJZcNC9JSoG8Dc3SeXus0c3IgumRknFCmiep ssN+1jEITAGeS5a2aBxwLQLVI2JAr2lxs5e+R4D5EsQlFkCl6Mrgtzh/aToWTuFl Qt7l2zI3r3VieKT9u7Bh =OyXR -----END PGP SIGNATURE----- Merge tag 'nfs-for-3.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "Highlights include: - massive cleanup of the NFS read/write code by Anna and Dros - support multiple NFS read/write requests per page in order to deal with non-page aligned pNFS striping. Also cleans up the r/wsize < page size code nicely. - stable fix for ensuring inode is declared uptodate only after all the attributes have been checked. - stable fix for a kernel Oops when remounting - NFS over RDMA client fixes - move the pNFS files layout driver into its own subdirectory" * tag 'nfs-for-3.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (79 commits) NFS: populate ->net in mount data when remounting pnfs: fix lockup caused by pnfs_generic_pg_test NFSv4.1: Fix typo in dprintk NFSv4.1: Comment is now wrong and redundant to code NFS: Use raw_write_seqcount_begin/end int nfs4_reclaim_open_state xprtrdma: Disconnect on registration failure xprtrdma: Remove BUG_ON() call sites xprtrdma: Avoid deadlock when credit window is reset SUNRPC: Move congestion window constants to header file xprtrdma: Reset connection timeout after successful reconnect xprtrdma: Use macros for reconnection timeout constants xprtrdma: Allocate missing pagelist xprtrdma: Remove Tavor MTU setting xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting xprtrdma: Reduce the number of hardway buffer allocations xprtrdma: Limit work done by completion handler xprtrmda: Reduce calls to ib_poll_cq() in completion handlers xprtrmda: Reduce lock contention in completion handlers xprtrdma: Split the completion queue xprtrdma: Make rpcrdma_ep_destroy() return void ...	2014-06-10 15:02:42 -07:00
Andy Lutomirski	23adbe12ef	fs,userns: Change inode_capable to capable_wrt_inode_uidgid The kernel has no concept of capabilities with respect to inodes; inodes exist independently of namespaces. For example, inode_capable(inode, CAP_LINUX_IMMUTABLE) would be nonsense. This patch changes inode_capable to check for uid and gid mappings and renames it to capable_wrt_inode_uidgid, which should make it more obvious what it does. Fixes CVE-2014-4014. Cc: Theodore Ts'o <tytso@mit.edu> Cc: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dave Chinner <david@fromorbit.com> Cc: stable@vger.kernel.org Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-10 13:57:22 -07:00
Chris Mason	c7548af69d	Btrfs: convert smp_mb__{before,after}_clear_bit The new call is smp_mb__{before,after}_atomic. The __ gives us extra protection from the atomic rays. Signed-off-by: Chris Mason <clm@fb.com>	2014-06-10 13:10:47 -07:00
Linus Torvalds	5b174fd647	Merge branch 'for-3.16' of git://linux-nfs.org/~bfields/linux Pull nfsd updates from Bruce Fields: "The largest piece is a long-overdue rewrite of the xdr code to remove some annoying limitations: for example, there was no way to return ACLs larger than 4K, and readdir results were returned only in 4k chunks, limiting performance on large directories. Also: - part of Neil Brown's work to make NFS work reliably over the loopback interface (so client and server can run on the same machine without deadlocks). The rest of it is coming through other trees. - cleanup and bugfixes for some of the server RDMA code, from Steve Wise. - Various cleanup of NFSv4 state code in preparation for an overhaul of the locking, from Jeff, Trond, and Benny. - smaller bugfixes and cleanup from Christoph Hellwig and Kinglong Mee. Thanks to everyone! This summer looks likely to be busier than usual for knfsd. Hopefully we won't break it too badly; testing definitely welcomed" * 'for-3.16' of git://linux-nfs.org/~bfields/linux: (100 commits) nfsd4: fix FREE_STATEID lockowner leak svcrdma: Fence LOCAL_INV work requests svcrdma: refactor marshalling logic nfsd: don't halt scanning the DRC LRU list when there's an RC_INPROG entry nfs4: remove unused CHANGE_SECURITY_LABEL nfsd4: kill READ64 nfsd4: kill READ32 nfsd4: simplify server xdr->next_page use nfsd4: hash deleg stateid only on successful nfs4_set_delegation nfsd4: rename recall_lock to state_lock nfsd: remove unneeded zeroing of fields in nfsd4_proc_compound nfsd: fix setting of NFS4_OO_CONFIRMED in nfsd4_open nfsd4: use recall_lock for delegation hashing nfsd: fix laundromat next-run-time calculation nfsd: make nfsd4_encode_fattr static SUNRPC/NFSD: Remove using of dprintk with KERN_WARNING nfsd: remove unused function nfsd_read_file nfsd: getattr for FATTR4_WORD0_FILES_AVAIL needs the statfs buffer NFSD: Error out when getting more than one fsloc/secinfo/uuid NFSD: Using type of uint32_t for ex_nflavors instead of int ...	2014-06-10 11:50:57 -07:00
Jeff Layton	0c27362998	locks: set fl_owner for leases back to current->files This fixes a regression due to commit `130d1f956a` (locks: ensure that fl_owner is always initialized properly in flock and lease codepaths). I had mistakenly thought that the fl_owner wasn't used in the lease code, but I missed the place in __break_lease that does use it. The i_have_this_lease check in generic_add_lease uses it. While I'm not sure that check is terribly helpful [1], reset it back to using current->files in order to ensure that there's no behavior change here. [1]: leases are owned by the file description. It's possible that this is a threaded program, and the lease breaker and the task that would handle the signal are different, even if they have the same file table. So, there is the potential for false positives with this check. Fixes: `130d1f956a` (locks: ensure that fl_owner is always initialized properly in flock and lease codepaths) Signed-off-by: Jeff Layton <jlayton@primarydata.com>	2014-06-10 12:29:05 -04:00
Linus Torvalds	d53b47c08d	This pull request contains several UBIFS fixes. One of them fixes a race condition between the mmap page fault path and fsync. Another just removes a bogus assertion from the UBIFS memory shrinker. UBIFS also started honoring the MS_SILENT mount flag, so now it won't print many I/O errors when user-space just tries to probe for the FS. Rest of the changes are rather minor UBI/UBIFS fixes, improvements, and clean-ups. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJTlqqmAAoJECmIfjd9wqK0cYAP+QGoB8+DOGv4+rTtUegNaWaG QAeAkvOBxDa72NrTYMtDFBKiPPYcz0BUnPgsCyvYIlYknRuZ4xMsu1BucLF3y8E3 P8aHpnNas0dA3el/M+M0qwpEG0pb1lQvFT1dJVP6/D4q4VexLlgt7PlQjP+98Dua jp9jBDMu3sEDsqA9NiO2hfuFuIqF6DFnBpglkNcGcAyOtzQ/Ps49nivhb1mFKuzI 2kSm2kWmZCTnLlGG1uj9+diCpTKWZoES2Jmo6gcZjQIjlejSzQw4Fo8LqdmhfwOg 0ba1D9kJuwPHmBeM0UW4fLtrJHkvs2F66YaRcbA7GUo5lNw6+yxTqi3jkGevsPOD 7eQB8FVCco14PFKu8Vx4lbkS2ekZN6aQuYYw/EeY2f+w8MpQTfcWINP5OVNHHGVA QDeRiu9mRMbm3JARRe9NeL0+sryGN402jCHtESAhONUDsCmZKX9k8HUMY1iaPAIp kGJWOjbyzYRMJiOfQiklv4N6rusmnECiRWGCliKDCU6TER8U6m9ZCUHUKmtX+56c 2yRtd6y6LVequ/p3wC3x66jF64wjh2PlTM5jrWSOIg42ihIGSMAbmB6MKA+4RxIv EejZ4lhCxUg6Xp7zd/qgdu3aJ/P2OM8yFL+BngGKFac54EnTDEUcc7d8c2DZZv8s 7YYjpiKK1NQGnih0FABA =CP7Z -----END PGP SIGNATURE----- Merge tag 'upstream-3.16-rc1-v2' of git://git.infradead.org/linux-ubifs Pull UBIFS updates from Artem Bityutskiy: "This contains several UBIFS fixes. One of them fixes a race condition between the mmap page fault path and fsync. Another just removes a bogus assertion from the UBIFS memory shrinker. UBIFS also started honoring the MS_SILENT mount flag, so now it won't print many I/O errors when user-space just tries to probe for the FS. Rest of the changes are rather minor UBI/UBIFS fixes, improvements, and clean-ups" * tag 'upstream-3.16-rc1-v2' of git://git.infradead.org/linux-ubifs: UBIFS: Add an assertion for clean_zn_cnt UBIFS: respect MS_SILENT mount flag UBIFS: Remove incorrect assertion in shrink_tnc() UBIFS: fix debugging check UBIFS: add missing ui pointer in debugging code UBI: block: Fix error path on alloc_workqueue failure UBIFS: Fix dump messages in ubifs_dump_lprops UBI: fix rb_tree node comparison in add_map UBIFS: Remove unused variables in ubifs_budget_space UBI: weaken the 'exclusive' constraint when opening volumes to rename UBIFS: fix an mmap and fsync race condition	2014-06-10 08:55:46 -07:00
Mateusz Guzik	a914722f33	NFS: populate ->net in mount data when remounting Otherwise the kernel oopses when remounting with IPv6 server because net is dereferenced in dev_get_by_name. Use net ns of current thread so that dev_get_by_name does not operate on foreign ns. Changing the address is prohibited anyway so this should not affect anything. Signed-off-by: Mateusz Guzik <mguzik@redhat.com> Cc: linux-nfs@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org # 3.4+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-06-10 11:10:29 -04:00
Weston Andros Adamson	c5e20cb700	pnfs: fix lockup caused by pnfs_generic_pg_test end_offset and req_offset both return u64 - avoid casting to u32 until it's needed, when it's less than the (u32) size returned by nfs_generic_pg_test. Also, fix the comments in pnfs_generic_pg_test. Running the cthon04 special tests caused this lockup in the "write/read at 2GB, 4GB edges" test when running against a file layout server: BUG: soft lockup - CPU#0 stuck for 22s! [bigfile2:823] Modules linked in: nfs_layout_nfsv41_files rpcsec_gss_krb5 nfsv4 nfs fscache ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_mangle ip6table_filter ip6_tables iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ppdev crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd serio_raw e1000 shpchp i2c_piix4 i2c_core parport_pc parport nfsd auth_rpcgss oid_registry exportfs nfs_acl lockd sunrpc btrfs xor zlib_deflate raid6_pq mptspi scsi_transport_spi mptscsih mptbase ata_generic floppy autofs4 irq event stamp: 205958 hardirqs last enabled at (205957): [<ffffffff814a62dc>] restore_args+0x0/0x30 hardirqs last disabled at (205958): [<ffffffff814ad96a>] apic_timer_interrupt+0x6a/0x80 softirqs last enabled at (205956): [<ffffffff8103ffb2>] __do_softirq+0x1ea/0x2ab softirqs last disabled at (205951): [<ffffffff8104026d>] irq_exit+0x44/0x9a CPU: 0 PID: 823 Comm: bigfile2 Not tainted 3.15.0-rc1-branch-pgio_plus+ #3 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 task: ffff8800792ec480 ti: ffff880078c4e000 task.ti: ffff880078c4e000 RIP: 0010:[<ffffffffa02ce51f>] [<ffffffffa02ce51f>] nfs_page_group_unlock+0x3e/0x4b [nfs] RSP: 0018:ffff880078c4fab0 EFLAGS: 00000202 RAX: 0000000000000fff RBX: ffff88006bf83300 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88006bf83300 RBP: ffff880078c4fab8 R08: 0000000000000001 R09: 0000000000000000 R10: ffffffff8249840c R11: 0000000000000000 R12: 0000000000000035 R13: ffff88007ffc72d8 R14: 0000000000000001 R15: 0000000000000000 FS: 00007f45f11b7740(0000) GS:ffff88007f200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f3a8cb632d0 CR3: 000000007931c000 CR4: 00000000001407f0 Stack: ffff88006bf832c0 ffff880078c4fb00 ffffffffa02cec22 ffff880078c4fad8 00000fff810f9d99 ffff880078c4fca0 ffff88006bf832c0 ffff88006bf832c0 ffff880078c4fca0 ffff880078c4fd60 ffff880078c4fb28 ffffffffa02cee34 Call Trace: [<ffffffffa02cec22>] __nfs_pageio_add_request+0x298/0x34f [nfs] [<ffffffffa02cee34>] nfs_pageio_add_request+0x1f/0x42 [nfs] [<ffffffffa02d1722>] nfs_do_writepage+0x1b5/0x1e4 [nfs] [<ffffffffa02d1764>] nfs_writepages_callback+0x13/0x25 [nfs] [<ffffffffa02d1751>] ? nfs_do_writepage+0x1e4/0x1e4 [nfs] [<ffffffff810eb32d>] write_cache_pages+0x254/0x37f [<ffffffffa02d1751>] ? nfs_do_writepage+0x1e4/0x1e4 [nfs] [<ffffffff8149cf9e>] ? printk+0x54/0x56 [<ffffffff810eacca>] ? __set_page_dirty_nobuffers+0x22/0xe9 [<ffffffffa016d864>] ? put_rpccred+0x38/0x101 [sunrpc] [<ffffffffa02d1ae1>] nfs_writepages+0xb4/0xf8 [nfs] [<ffffffff810ec59c>] do_writepages+0x21/0x2f [<ffffffff810e36e8>] __filemap_fdatawrite_range+0x55/0x57 [<ffffffff810e374a>] filemap_write_and_wait_range+0x2d/0x5b [<ffffffffa030ba0a>] nfs4_file_fsync+0x3a/0x98 [nfsv4] [<ffffffff8114ee3c>] vfs_fsync_range+0x18/0x20 [<ffffffff810e40c2>] generic_file_aio_write+0xa7/0xbd [<ffffffffa02c5c6b>] nfs_file_write+0xf0/0x170 [nfs] [<ffffffff81129215>] do_sync_write+0x59/0x78 [<ffffffff8112956c>] vfs_write+0xab/0x107 [<ffffffff81129c8b>] SyS_write+0x49/0x7f [<ffffffff814acd12>] system_call_fastpath+0x16/0x1b Reported-by: Anna Schumaker <Anna.Schumaker@netapp.com> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-06-10 11:07:56 -04:00
Linus Torvalds	64b2d1fbbf	f2fs updates for v3.16 This patch-set includes the following major enhancement patches. o enhance wait_on_page_writeback o support SEEK_DATA and SEEK_HOLE o enhance readahead flows o enhance IO flushes o support fiemap o add some tracepoints The other bug fixes are as follows. o fix to support a large volume > 2TB correctly o recovery bug fix wrt fallocated space o fix recursive lock on xattr operations o fix some cases on the remount flow And, there are a bunch of cleanups. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJTleLYAAoJEEAUqH6CSFDSdhEP/iY5hTZ02jH4ZiFPP/Nee4hS l0BHeZvrMjoccaWUDu0ZvIPC8BiJ7kLOgK+/VWS7LAfY1PY11ALEtYQOrW+RM47+ jMfULegTod/F8WS2B6dk31QMhAZldtnsYvA5PS1VV3S0rht+qbOz+PDejZFgSMc3 VVQ7OZzq30gMmtsw7oh3FHfeTu4xe/bxygdMRXgljQQD2MvorJiOb4MA+ovEDd9z CZMMTQvRKjc0d8LPf0gOkZEvG63GR6klCgFMuiappUsua0O8IPIEhCyEGFrE66vS fObVKpqNAsR2ABhS2grgn6Q7UUvn4xrF6jOwDH5tuw2yzmkQiMAWINwBdgnbEy1c D5S89PQ8TkQK9KBSoU0v8WKWC4HzWZF4ZEd6eq9VxVTS8iT0w8DtNHXK518FVC34 82iqrLc0EhrcGNiW/7Nrc6WzHrWqorylCFN7atB3ruhVqeMh3MZsDU4Gq0WgB2oh pF9XVBEpJQpV35DYSAPzLkm+2+mwHVNqwdY3HcvUs7IP90+wZlrWSRG2FEfFt/e8 6nwvbORrHMTI5VfdYlEPgpjuesFmYPZqEvRGdaDa1YrHqhvvgStEPT9qiq2qVn9+ tr0HjpNRDD/etkaE6ujYOYqdxuk3mm6RY68h+KSbNcY1/VTvt2bN2telwWuDsxIq 8yOsxV2x3JB/euDAJsSU =xqsO -----END PGP SIGNATURE----- Merge tag 'for-f2fs-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, there is no special interesting feature, but we've investigated a couple of tuning points with respect to the I/O flow. Several major bug fixes and a bunch of clean-ups also have been made. This patch-set includes the following major enhancement patches: - enhance wait_on_page_writeback - support SEEK_DATA and SEEK_HOLE - enhance readahead flows - enhance IO flushes - support fiemap - add some tracepoints The other bug fixes are as follows: - fix to support a large volume > 2TB correctly - recovery bug fix wrt fallocated space - fix recursive lock on xattr operations - fix some cases on the remount flow And, there are a bunch of cleanups" * tag 'for-f2fs-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (52 commits) f2fs: support f2fs_fiemap f2fs: avoid not to call remove_dirty_inode f2fs: recover fallocated space f2fs: fix to recover data written by dio f2fs: large volume support f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pages f2fs: avoid overflow when large directory feathure is enabled f2fs: fix recursive lock by f2fs_setxattr MAINTAINERS: add a co-maintainer from samsung for F2FS MAINTAINERS: change the email address for f2fs f2fs: use inode_init_owner() to simplify codes f2fs: avoid to use slab memory in f2fs_issue_flush for efficiency f2fs: add a tracepoint for f2fs_read_data_page f2fs: add a tracepoint for f2fs_write_{meta,node,data}_pages f2fs: add a tracepoint for f2fs_write_{meta,node,data}_page f2fs: add a tracepoint for f2fs_write_end f2fs: add a tracepoint for f2fs_write_begin f2fs: fix checkpatch warning f2fs: deactivate inode page if the inode is evicted f2fs: decrease the lock granularity during write_begin ...	2014-06-09 19:11:44 -07:00
Linus Torvalds	b1cce8032f	Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 Pull CIFS fixes from Steve French. * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: CIFS: Fix memory leaks in SMB2_open cifs: ensure that vol->username is not NULL before running strlen on it Clarify SMB2/SMB3 create context and add missing ones Do not send ClientGUID on SMB2.02 dialect cifs: Set client guid on per connection basis fs/cifs/netmisc.c: convert printk to pr_foo() fs/cifs/cifs.c: replace seq_printf by seq_puts Update cifs version number to 2.03 fs: cifs: new helper: file_inode(file) cifs: fix potential races in cifs_revalidate_mapping cifs: new helper function: cifs_revalidate_mapping cifs: convert booleans in cifsInodeInfo to a flags field cifs: fix cifs_uniqueid_to_ino_t not to ever return 0	2014-06-09 19:08:43 -07:00
Liu Bo	6eda71d0c0	Btrfs: fix scrub_print_warning to handle skinny metadata extents The skinny extents are intepreted incorrectly in scrub_print_warning(), and end up hitting the BUG() in btrfs_extent_inline_ref_size. Reported-by: Konstantinos Skarlatos <k.skarlatos@gmail.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:17 -07:00
Filipe Manana	7ffbb598a0	Btrfs: make fsync work after cloning into a file When cloning into a file, we were correctly replacing the extent items in the target range and removing the extent maps. However we weren't replacing the extent maps with new ones that point to the new extents - as a consequence, an incremental fsync (when the inode doesn't have the full sync flag) was a NOOP, since it relies on the existence of extent maps in the modified list of the inode's extent map tree, which was empty. Therefore add new extent maps to reflect the target clone range. A test case for xfstests follows. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:16 -07:00
Liu Bo	cd857dd6bc	Btrfs: use right type to get real comparison We want to make sure the point is still within the extent item, not to verify the memory it's pointing to. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:15 -07:00
Josef Bacik	8a56457f5f	Btrfs: don't check nodes for extent items The backref code was looking at nodes as well as leaves when we tried to populate extent item entries. This is not good, and although we go away with it for the most part because we'd skip where disk_bytenr != random_memory, sometimes random_memory would match and suddenly boom. This fixes that problem. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:14 -07:00
Filipe Manana	6fdef6d43c	Btrfs: don't release invalid page in btrfs_page_exists_in_range() In inode.c:btrfs_page_exists_in_range(), if the page we got from the radix tree is an exception entry, which can't be retried, we exit the loop with a non-NULL page and then call page_cache_release against it, which is not ok since it's not a valid page. This could also make us return true when we shouldn't. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:14 -07:00
Filipe Manana	809f901625	Btrfs: make sure we retry if page is a retriable exception In inode.c:btrfs_page_exists_in_range(), if the page we get from the radix tree is an exception which should make us retry, set page to NULL in order to really retry, because otherwise we don't get another loop iteration executed (page != NULL makes the while loop exit). This also was making us call page_cache_release after exiting the loop, which isn't correct because page doesn't point to a valid page, and possibly return true from the function when we shouldn't. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:13 -07:00
Filipe Manana	91405151eb	Btrfs: make sure we retry if we couldn't get the page In inode.c:btrfs_page_exists_in_range(), if we can't get the page we need to retry. However we weren't retrying because we weren't setting page to NULL, which makes the while loop exit immediately and will make us call page_cache_release after exiting the loop which is incorrect because our page get didn't succeed. This could also make us return true when we shouldn't. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:12 -07:00
Gui Hecheng	c81d57679e	btrfs: replace EINVAL with EOPNOTSUPP for dev_replace raid56 To return EOPNOTSUPP is more user friendly than to return EINVAL, and then user-space tool will show that the dev_replace operation for raid56 is not currently supported rather than showing that there is an invalid argument. Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:12 -07:00
Antonio Ospite	9391558411	trivial: fs/btrfs/ioctl.c: fix typo s/substract/subtract/ Signed-off-by: Antonio Ospite <ao2@ao2.it> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fb.com> Cc: linux-btrfs@vger.kernel.org Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:11 -07:00
Liu Bo	0b43e04f70	Btrfs: fix leaf corruption after __btrfs_drop_extents Several reports about leaf corruption has been floating on the list, one of them points to __btrfs_drop_extents(), and we find that the leaf becomes corrupted after __btrfs_drop_extents(), it's really a rare case but it does exist. The problem turns out to be btrfs_next_leaf() called in __btrfs_drop_extents(). So in btrfs_next_leaf(), we release the current path to re-search the last key of the leaf for locating next leaf, and we've taken it into account that there might be balance operations between leafs during this 'unlock and re-lock' dance, so we check the path again and advance it if there are now more items available. But things are a bit different if that last key happens to be removed and balance gets a bigger key as the last one, and btrfs_search_slot will return it with ret > 0, IOW, nothing change in this leaf except the new last key, then we think we're okay because there is no more item balanced in, fine, we thinks we can go to the next leaf. However, we should return that bigger key, otherwise we deserve leaf corruption, for example, in endio, skipping that key means that __btrfs_drop_extents() thinks it has dropped all extent matched the required range and finish_ordered_io can safely insert a new extent, but it actually doesn't and ends up a leaf corruption. One may be asking that why our locking on extent io tree doesn't work as expected, ie. it should avoid this kind of race situation. But in __btrfs_drop_extents(), we don't always find extents which are included within our locking range, IOW, extents can start before our searching start, in this case locking on extent io tree doesn't protect us from the race. This takes the special case into account. Reviewed-by: Filipe Manana <fdmanana@gmail.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:10 -07:00
Filipe Manana	337c6f6830	Btrfs: ensure btrfs_prev_leaf doesn't miss 1 item We might have had an item with the previous key in the tree right before we released our path. And after we released our path, that item might have been pushed to the first slot (0) of the leaf we were holding due to a tree balance. Alternatively, an item with the previous key can exist as the only element of a leaf (big fat item). Therefore account for these 2 cases, so that our callers (like btrfs_previous_item) don't miss an existing item with a key matching the previous key we computed above. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:09 -07:00
Filipe Manana	f82a9901b0	Btrfs: fix clone to deal with holes when NO_HOLES feature is enabled If the NO_HOLES feature is enabled holes don't have file extent items in the btree that represent them anymore. This made the clone operation ignore the gaps that exist between consecutive file extent items and therefore not create the holes at the destination. When not using the NO_HOLES feature, the holes were created at the destination. A test case for xfstests follows. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:09 -07:00
Jeff Mahoney	964930312a	btrfs: free delayed node outside of root->inode_lock On heavy workloads, we're seeing soft lockup warnings on root->inode_lock in __btrfs_release_delayed_node. The low hanging fruit is to reduce the size of the critical section. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:08 -07:00
Gui Hecheng	902c68a4da	btrfs: replace EINVAL with ERANGE for resize when ULLONG_MAX To be accurate about the error case, if the new size is beyond ULLONG_MAX, return ERANGE instead of EINVAL. Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:07 -07:00
Filipe Manana	b05fd8742f	Btrfs: fix transaction leak during fsync call If btrfs_log_dentry_safe() returns an error, we set ret to 1 and fall through with the goal of committing the transaction. However, in the case where the inode doesn't need a full sync, we would call btrfs_wait_ordered_range() against the target range for our inode, and if it returned an error, we would return without commiting or ending the transaction. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:06 -07:00
Qu Wenruo	d77815461f	btrfs: Avoid trucating page or punching hole in a already existed hole. btrfs_punch_hole() will truncate unaligned pages or punch hole on a already existed hole. This will cause unneeded zero page or holes splitting the original huge hole. This patch will skip already existed holes before any page truncating or hole punching. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:06 -07:00
Filipe Manana	3821f34888	Btrfs: update commit root on snapshot creation after orphan cleanup On snapshot creation (either writable or read-only), we do orphan cleanup against the root of the snapshot. If the cleanup did remove any orphans, then the current root node will be different from the commit root node until the next transaction commit happens. A send operation always uses the commit root of a snapshot - this means it will see the orphans if it starts computing the send stream before the next transaction commit happens (triggered by a timer or sync() for .e.g), which is when the commit root gets assigned a reference to current root, where the orphans are not visible anymore. The consequence of send seeing the orphans is explained below. For example: mkfs.btrfs -f /dev/sdd mount -o commit=999 /dev/sdd /mnt # open a file with O_TMPFILE and leave it open # write some data to the file btrfs subvolume snapshot -r /mnt /mnt/snap1 btrfs send /mnt/snap1 -f /tmp/send.data The send operation will fail with the following error: ERROR: send ioctl failed with -116: Stale file handle What happens here is that our snapshot has an orphan inode still visible through the commit root, that corresponds to the tmpfile. However send will attempt to call inode.c:btrfs_iget(), with the goal of reading the file's data, which will return -ESTALE because it will use the current root (and not the commit root) of the snapshot. Of course, there are other cases where we can get orphans, but this example using a tmpfile makes it much easier to reproduce the issue. Therefore on snapshot creation, after calling btrfs_orphan_cleanup, if the commit root is different from the current root, just commit the transaction associated with the snapshot's root (if it exists), so that a send will not see any orphans that don't exist anymore. This also guarantees a send will always see the same content regardless of whether a transaction commit happened already before the send was requested and after the orphan cleanup (meaning the commit root and current roots are the same) or it hasn't happened yet (commit and current roots are different). Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:05 -07:00
Filipe Manana	ff5df9b884	Btrfs: ioctl, don't re-lock extent range when not necessary In ioctl.c:lock_extent_range(), after locking our target range, the ordered extent that btrfs_lookup_first_ordered_extent() returns us may not overlap our target range at all. In this case we would just unlock our target range, wait for any new ordered extents that overlap the range to complete, lock again the range and repeat all these steps until we don't get any ordered extent and the delalloc flag isn't set in the io tree for our target range. Therefore just stop if we get an ordered extent that doesn't overlap our target range and the dealalloc flag isn't set for the range in the inode's io tree. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:04 -07:00
Filipe Manana	2c463823cb	Btrfs: avoid visiting all extent items when cloning a range When cloning a range of a file, we were visiting all the extent items in the btree that belong to our source inode. We don't need to visit those extent items that don't overlap the range we are cloning, as doing so only makes us waste time and do unnecessary btree navigations (btrfs_next_leaf) for inodes that have a large number of file extent items in the btree. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:04 -07:00
Filipe Manana	c55bfa67e9	Btrfs: set dead flag on the right root when destroying snapshot We were setting the BTRFS_ROOT_SUBVOL_DEAD flag on the root of the parent of our target snapshot, instead of setting it in the target snapshot's root. This is easy to observe by running the following scenario: mkfs.btrfs -f /dev/sdd mount /dev/sdd /mnt btrfs subvolume create /mnt/first_subvol btrfs subvolume snapshot -r /mnt /mnt/mysnap1 btrfs subvolume delete /mnt/first_subvol btrfs subvolume snapshot -r /mnt /mnt/mysnap2 btrfs send -p /mnt/mysnap1 /mnt/mysnap2 -f /tmp/send.data The send command failed because the send ioctl returned -EPERM. A test case for xfstests follows. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:03 -07:00
Filipe Manana	c125b8bff1	Btrfs: ensure readers see new data after a clone operation We were cleaning the clone target file range from the page cache before we did replace the file extent items in the fs tree. This was racy, as right after cleaning the relevant range from the page cache and before replacing the file extent items, a read against that range could be performed by another task and populate again the page cache with stale data (stale after the cloning finishes). This would result in reads after the clone operation successfully finishes to get old data (and potentially for a very long time). Therefore evict the pages after replacing the file extent items, so that subsequent reads will always get the new data. Similarly, we were prone to races while cloning the file extent items because we weren't locking the target range and wait for any existing ordered extents against that range to complete. It was possible that after cloning the extent items, a write operation that was performed before the clone operation and overlaps the same range, would end up undoing all or part of the work the clone operation did (a worker task running inode.c:btrfs_finish_ordered_io). Therefore lock the target range in the io tree, wait for all pending ordered extents against that range to finish and then safely perform the cloning. The issue of reading stale data after the clone operation is easy to reproduce by running the following C program in a loop until it exits with return value 1. #include <unistd.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <errno.h> #include <pthread.h> #include <fcntl.h> #include <assert.h> #include <asm/types.h> #include <linux/ioctl.h> #include <sys/stat.h> #include <sys/types.h> #include <sys/ioctl.h> #define SRC_FILE "/mnt/sdd/foo" #define DST_FILE "/mnt/sdd/bar" #define FILE_SIZE (16 * 1024) #define PATTERN_SRC 'X' #define PATTERN_DST 'Y' struct btrfs_ioctl_clone_range_args { __s64 src_fd; __u64 src_offset, src_length; __u64 dest_offset; }; #define BTRFS_IOCTL_MAGIC 0x94 #define BTRFS_IOC_CLONE_RANGE _IOW(BTRFS_IOCTL_MAGIC, 13, \ struct btrfs_ioctl_clone_range_args) static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; static int clone_done = 0; static int reader_ready = 0; static int stale_data = 0; static void reader_loop(void arg) { char buf[4096], want_buf[4096]; memset(want_buf, PATTERN_SRC, 4096); pthread_mutex_lock(&mutex); reader_ready = 1; pthread_mutex_unlock(&mutex); while (1) { int done, fd, ret; fd = open(DST_FILE, O_RDONLY); assert(fd != -1); pthread_mutex_lock(&mutex); done = clone_done; pthread_mutex_unlock(&mutex); ret = read(fd, buf, 4096); assert(ret == 4096); close(fd); if (done) { ret = memcmp(buf, want_buf, 4096); if (ret == 0) { printf("Found new content\n"); } else { printf("Found old content\n"); pthread_mutex_lock(&mutex); stale_data = 1; pthread_mutex_unlock(&mutex); } break; } } return NULL; } int main(int argc, char *argv[]) { pthread_t reader; int ret, i, fd; struct btrfs_ioctl_clone_range_args clone_args; int fd1, fd2; ret = remove(SRC_FILE); if (ret == -1 && errno != ENOENT) { fprintf(stderr, "Error deleting src file: %s\n", strerror(errno)); return 1; } ret = remove(DST_FILE); if (ret == -1 && errno != ENOENT) { fprintf(stderr, "Error deleting dst file: %s\n", strerror(errno)); return 1; } fd = open(SRC_FILE, O_CREAT \| O_WRONLY \| O_TRUNC, S_IRWXU); assert(fd != -1); for (i = 0; i < FILE_SIZE; i++) { char c = PATTERN_SRC; ret = write(fd, &c, 1); assert(ret == 1); } close(fd); fd = open(DST_FILE, O_CREAT \| O_WRONLY \| O_TRUNC, S_IRWXU); assert(fd != -1); for (i = 0; i < FILE_SIZE; i++) { char c = PATTERN_DST; ret = write(fd, &c, 1); assert(ret == 1); } close(fd); sync(); ret = pthread_create(&reader, NULL, reader_loop, NULL); assert(ret == 0); while (1) { int r; pthread_mutex_lock(&mutex); r = reader_ready; pthread_mutex_unlock(&mutex); if (r) break; } fd1 = open(SRC_FILE, O_RDONLY); if (fd1 < 0) { fprintf(stderr, "Error open src file: %s\n", strerror(errno)); return 1; } fd2 = open(DST_FILE, O_RDWR); if (fd2 < 0) { fprintf(stderr, "Error open dst file: %s\n", strerror(errno)); return 1; } clone_args.src_fd = fd1; clone_args.src_offset = 0; clone_args.src_length = 4096; clone_args.dest_offset = 0; ret = ioctl(fd2, BTRFS_IOC_CLONE_RANGE, &clone_args); assert(ret == 0); close(fd1); close(fd2); pthread_mutex_lock(&mutex); clone_done = 1; pthread_mutex_unlock(&mutex); ret = pthread_join(reader, NULL); assert(ret == 0); pthread_mutex_lock(&mutex); ret = stale_data ? 1 : 0; pthread_mutex_unlock(&mutex); return ret; } Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:02 -07:00
Rickard Strandqvist	8321cf2596	fs: btrfs: volumes.c: Fix for possible null pointer dereference There is otherwise a risk of a possible null pointer dereference. Was largely found by using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:01 -07:00
Jeff Mahoney	c1895442be	btrfs: allocate raid type kobjects dynamically We are currently allocating space_info objects in an array when we allocate space_info. When a user does something like: # btrfs balance start -mconvert=raid1 -dconvert=raid1 /mnt # btrfs balance start -mconvert=single -dconvert=single /mnt -f # btrfs balance start -mconvert=raid1 -dconvert=raid1 / We can end up with memory corruption since the kobject hasn't been reinitialized properly and the name pointer was left set. The rationale behind allocating them statically was to avoid creating a separate kobject container that just contained the raid type. It used the index in the array to determine the index. Ultimately, though, this wastes more memory than it saves in all but the most complex scenarios and introduces kobject lifetime questions. This patch allocates the kobjects dynamically instead. Note that we also remove the kobject_get/put of the parent kobject since kobject_add and kobject_del do that internally. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:01 -07:00
Filipe Manana	7e3ae33efa	Btrfs: send, use the right limits for xattr names and values We were limiting the sum of the xattr name and value lengths to PATH_MAX, which is not correct, specially on filesystems created with btrfs-progs v3.12 or higher, where the default leaf size is max(16384, PAGE_SIZE), or systems with page sizes larger than 4096 bytes. Xattrs have their own specific maximum name and value lengths, which depend on the leaf size, therefore use these limits to be able to send xattrs with sizes larger than PATH_MAX. A test case for xfstests follows. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:21:00 -07:00
Filipe Manana	1af56070e3	Btrfs: send, don't error in the presence of subvols/snapshots If we are doing an incremental send and the base snapshot has a directory with name X that doesn't exist anymore in the second snapshot and a new subvolume/snapshot exists in the second snapshot that has the same name as the directory (name X), the incremental send would fail with -ENOENT error. This is because it attempts to lookup for an inode with a number matching the objectid of a root, which doesn't exist. Steps to reproduce: mkfs.btrfs -f /dev/sdd mount /dev/sdd /mnt mkdir /mnt/testdir btrfs subvolume snapshot -r /mnt /mnt/mysnap1 rmdir /mnt/testdir btrfs subvolume create /mnt/testdir btrfs subvolume snapshot -r /mnt /mnt/mysnap2 btrfs send -p /mnt/mysnap1 /mnt/mysnap2 -f /tmp/send.data A test case for xfstests follows. Reported-by: Robert White <rwhite@pobox.com> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:59 -07:00
Chris Mason	a79b7d4b3e	Btrfs: async delayed refs Delayed extent operations are triggered during transaction commits. The goal is to queue up a healthly batch of changes to the extent allocation tree and run through them in bulk. This farms them off to async helper threads. The goal is to have the bulk of the delayed operations being done in the background, but this is also important to limit our stack footprint. Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:58 -07:00
Chris Mason	40f765805f	Btrfs: split up __extent_writepage to lower stack usage __extent_writepage has two unrelated parts. First it does the delayed allocation dance and second it does the mapping and IO for the page we're actually writing. This splits it up into those two parts so the stack from one doesn't impact the stack from the other. Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:58 -07:00
Alex Gartrell	fc4adbff82	btrfs: Drop EXTENT_UPTODATE check in hole punching and direct locking In these instances, we are trying to determine if a page has been accessed since we began the operation for the sake of retry. This is easily accomplished by doing a gang lookup in the page mapping radix tree, and it saves us the dependency on the flag (so that we might eventually delete it). btrfs_page_exists_in_range borrows heavily from find_get_page, replacing the radix tree look up with a gang lookup of 1, so that we can find the next highest page >= index and see if it falls into our lock range. Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Alex Gartrell <agartrell@fb.com>	2014-06-09 17:20:57 -07:00
Chris Mason	0e378df15c	Btrfs: cut down stack usage in btree_write_cache_pages This adds noinline_for_stack to two helpers used by btree_write_cache_pages. It shaves us down from 424 bytes on the stack to 280. Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:56 -07:00
Chris Mason	d4452bc526	Btrfs: break up __btrfs_write_out_cache to cut down stack usage __btrfs_write_out_cache was one of our stack pigs. This breaks it up into helper functions and slims it down to 194 bytes. Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:55 -07:00
Josef Bacik	2a10840945	Btrfs: free tmp ulist for qgroup rescan Memory leaks are bad mmkay? Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:55 -07:00
Anand Jain	402a0f4759	btrfs: usage error should not be logged into system log I have an opinion that system logs /var/log/messages are valuable info to investigate the real system issues at the data center. People handling data center issues do spend a lot time and efforts analyzing messages files. Having usage error logged into /var/log/messages is something we should avoid. Signed-off-by: Anand Jain <Anand.Jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:54 -07:00
David Sterba	67a77eb147	btrfs: remove newline from inode cache kthread name Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:53 -07:00
David Sterba	351fd35321	btrfs: remove stale newlines from log messages I've noticed an extra line after "use no compression", but search revealed much more in messages of more critical levels and rare errors. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:53 -07:00
Chris Mason	7d78874273	Btrfs: fix double free in find_lock_delalloc_range We need to NULL the cached_state after freeing it, otherwise we might free it again if find_delalloc_range doesn't find anything. Signed-off-by: Chris Mason <clm@fb.com> cc: stable@vger.kernel.org	2014-06-09 17:20:52 -07:00
ZhangZhen	58dfae6365	btrfs: replace simple_strtoull() with kstrtoull() use the newer and more pleasant kstrtoull() to replace simple_strtoull(), because simple_strtoull() is marked for obsoletion. Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:51 -07:00
Wang Shilong	298658414a	Btrfs: set right total device count for seeding support Seeding device support allows us to create a new filesystem based on existed filesystem. However newly created filesystem's @total_devices should include seed devices. This patch fix the following problem: # mkfs.btrfs -f /dev/sdb # btrfstune -S 1 /dev/sdb # mount /dev/sdb /mnt # btrfs device add -f /dev/sdc /mnt --->fs_devices->total_devices = 1 # umount /mnt # mount /dev/sdc /mnt --->fs_devices->total_devices = 2 This is because we record right @total_devices in superblock, but @fs_devices->total_devices is reset to be 0 in btrfs_prepare_sprout(). Fix this problem by not resetting @fs_devices->total_devices. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:50 -07:00
Guangliang Zhao	45ff35d6b9	Btrfs: remove OPT_acl parse when acl disabled Even CONFIG_BTRFS_FS_POSIX_ACL is not defined, the acl still could been enabled using a mount option, and now fs/btrfs/acl.o is not built, so the mount options will appear to be supported but will be silently ignored. Signed-off-by: Guangliang Zhao <lucienchao@gmail.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:50 -07:00
Josef Bacik	faa2dbf004	Btrfs: add sanity tests for new qgroup accounting code This exercises the various parts of the new qgroup accounting code. We do some basic stuff and do some things with the shared refs to make sure all that code works. I had to add a bunch of infrastructure because I needed to be able to insert items into a fake tree without having to do all the hard work myself, hopefully this will be usefull in the future. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:49 -07:00
Josef Bacik	fcebe4562d	Btrfs: rework qgroup accounting Currently qgroups account for space by intercepting delayed ref updates to fs trees. It does this by adding sequence numbers to delayed ref updates so that it can figure out how the tree looked before the update so we can adjust the counters properly. The problem with this is that it does not allow delayed refs to be merged, so if you say are defragging an extent with 5k snapshots pointing to it we will thrash the delayed ref lock because we need to go back and manually merge these things together. Instead we want to process quota changes when we know they are going to happen, like when we first allocate an extent, we free a reference for an extent, we add new references etc. This patch accomplishes this by only adding qgroup operations for real ref changes. We only modify the sequence number when we need to lookup roots for bytenrs, this reduces the amount of churn on the sequence number and allows us to merge delayed refs as we add them most of the time. This patch encompasses a bunch of architectural changes 1) qgroup ref operations: instead of tracking qgroup operations through the delayed refs we simply add new ref operations whenever we notice that we need to when we've modified the refs themselves. 2) tree mod seq: we no longer have this separation of major/minor counters. this makes the sequence number stuff much more sane and we can remove some locking that was needed to protect the counter. 3) delayed ref seq: we now read the tree mod seq number and use that as our sequence. This means each new delayed ref doesn't have it's own unique sequence number, rather whenever we go to lookup backrefs we inc the sequence number so we can make sure to keep any new operations from screwing up our world view at that given point. This allows us to merge delayed refs during runtime. With all of these changes the delayed ref stuff is a little saner and the qgroup accounting stuff no longer goes negative in some cases like it was before. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:48 -07:00
Liu Bo	5dca6eea91	Btrfs: mark mapping with error flag to report errors to userspace According to commit `865ffef379` (fs: fix fsync() error reporting), it's not stable to just check error pages because pages can be truncated or invalidated, we should also mark mapping with error flag so that a later fsync can catch the error. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:47 -07:00
Liu Bo	29cc83f69c	Btrfs: fix NULL pointer crash of deleting a seed device Same as normal devices, seed devices should be initialized with fs_info->dev_root as well, otherwise we'll get a NULL pointer crash. Cc: Chris Murphy <lists@colorremedies.com> Reported-by: Chris Murphy <lists@colorremedies.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:47 -07:00
Wang Shilong	f017f15f7c	Btrfs: fix joining same transaction handle more than twice We hit something like the following function call flows: \|->run_delalloc_range() \|->btrfs_join_transaction() \|->cow_file_range() \|->btrfs_join_transaction() \|->find_free_extent() \|->btrfs_join_transaction() Trace infomation can be seen as: [ 7411.127040] ------------[ cut here ]------------ [ 7411.127060] WARNING: CPU: 0 PID: 11557 at fs/btrfs/transaction.c:383 start_transaction+0x561/0x580 [btrfs]() [ 7411.127079] CPU: 0 PID: 11557 Comm: kworker/u8:9 Tainted: G O 3.13.0+ #4 [ 7411.127080] Hardware name: LENOVO QiTianM4350/ , BIOS F1KT52AUS 05/24/2013 [ 7411.127085] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-5) [ 7411.127092] Call Trace: [ 7411.127097] [<ffffffff815b87b0>] dump_stack+0x45/0x56 [ 7411.127101] [<ffffffff81051ffd>] warn_slowpath_common+0x7d/0xa0 [ 7411.127102] [<ffffffff810520da>] warn_slowpath_null+0x1a/0x20 [ 7411.127109] [<ffffffffa0444fb1>] start_transaction+0x561/0x580 [btrfs] [ 7411.127115] [<ffffffffa0445027>] btrfs_join_transaction+0x17/0x20 [btrfs] [ 7411.127120] [<ffffffffa0431c91>] find_free_extent+0xa21/0xb50 [btrfs] [ 7411.127126] [<ffffffffa0431f68>] btrfs_reserve_extent+0xa8/0x1a0 [btrfs] [ 7411.127131] [<ffffffffa04322ce>] btrfs_alloc_free_block+0xee/0x440 [btrfs] [ 7411.127137] [<ffffffffa043bd6e>] ? btree_set_page_dirty+0xe/0x10 [btrfs] [ 7411.127142] [<ffffffffa041da51>] __btrfs_cow_block+0x121/0x530 [btrfs] [ 7411.127146] [<ffffffffa041dfff>] btrfs_cow_block+0x11f/0x1c0 [btrfs] [ 7411.127151] [<ffffffffa0421b74>] btrfs_search_slot+0x1d4/0x9c0 [btrfs] [ 7411.127157] [<ffffffffa0438567>] btrfs_lookup_file_extent+0x37/0x40 [btrfs] [ 7411.127163] [<ffffffffa0456bfc>] __btrfs_drop_extents+0x16c/0xd90 [btrfs] [ 7411.127169] [<ffffffffa0444ae3>] ? start_transaction+0x93/0x580 [btrfs] [ 7411.127171] [<ffffffff811663e2>] ? kmem_cache_alloc+0x132/0x140 [ 7411.127176] [<ffffffffa041cd9a>] ? btrfs_alloc_path+0x1a/0x20 [btrfs] [ 7411.127182] [<ffffffffa044aa61>] cow_file_range_inline+0x181/0x2e0 [btrfs] [ 7411.127187] [<ffffffffa044aead>] cow_file_range+0x2ed/0x440 [btrfs] [ 7411.127194] [<ffffffffa0464d7f>] ? free_extent_buffer+0x4f/0xb0 [btrfs] [ 7411.127200] [<ffffffffa044b38f>] run_delalloc_nocow+0x38f/0xa60 [btrfs] [ 7411.127207] [<ffffffffa0461600>] ? test_range_bit+0x30/0x180 [btrfs] [ 7411.127212] [<ffffffffa044bd48>] run_delalloc_range+0x2e8/0x350 [btrfs] [ 7411.127219] [<ffffffffa04618f9>] ? find_lock_delalloc_range+0x1a9/0x1e0 [btrfs] [ 7411.127222] [<ffffffff812a1e71>] ? blk_queue_bio+0x2c1/0x330 [ 7411.127228] [<ffffffffa0462ad4>] __extent_writepage+0x2f4/0x760 [btrfs] Here we fix it by avoiding joining transaction again if we have held a transaction handle when allocating chunk in find_free_extent(). Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:46 -07:00
Miao Xie	995946dd29	Btrfs: use helpers for last_trans_log_full_commit instead of opencode Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:45 -07:00
Filipe Manana	1f21ef0a34	Btrfs: check if items are ordered when a leaf is marked dirty To ease finding bugs during development related to modifying btree leaves in such a way that it makes its items not sorted by key anymore. Since this is an expensive check, it's only enabled if CONFIG_BTRFS_FS_CHECK_INTEGRITY is set, which isn't meant to be enabled for regular users. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:45 -07:00
Filipe Manana	35045bf2fd	Btrfs: don't access non-existent key when csum tree is empty When the csum tree is empty, our leaf (path->nodes[0]) has a number of items equal to 0 and since btrfs_header_nritems() returns an unsigned integer (and so is our local nritems variable) the following comparison always evaluates to false: if (path->slots[0] >= nritems - 1) { As the casting rules lead to: if ((u32)0 >= (u32)4294967295) { This makes us access key at slot paths->slots[0] + 1 (1) of the empty leaf some lines below: btrfs_item_key_to_cpu(path->nodes[0], &found_key, slot); if (found_key.objectid != BTRFS_EXTENT_CSUM_OBJECTID \|\| found_key.type != BTRFS_EXTENT_CSUM_KEY) { found_next = 1; goto insert; } So just don't access such non-existent slot and don't set found_next to 1 when the tree is empty. It's very unlikely we'll get a random key with the objectid and type values above, which is where we could go into trouble. If nritems is 0, just set found_next to 1 anyway as it will make us insert a csum item covering our whole extent (or the whole leaf) when the tree is empty. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:44 -07:00
Wang Shilong	de348ee022	Btrfs: make sure there are not any read requests before stopping workers In close_ctree(), after we have stopped all workers,there maybe still some read requests(for example readahead) to submit and this maybe trigger an oops that user reported before: kernel BUG at fs/btrfs/async-thread.c:619! By hacking codes, i can reproduce this problem with one cpu available. We fix this potential problem by invalidating all btree inode pages before stopping all workers. Thanks to Miao for pointing out this problem. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:43 -07:00
Tsutomu Itoh	59885b3930	Btrfs: fix possible memory leak in btrfs_create_tree() In btrfs_create_tree(), if btrfs_insert_root() fails, we should free root->commit_root. Reported-by: Alex Lyakas <alex@zadarastorage.com> Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:42 -07:00
ZhangZhen	776e4aae55	btrfs: remove useless ACL check posix_acl_xattr_set() already does the check, and it's the only way to feed in an ACL from userspace. So the check here is useless, remove it. Signed-off-by: zhang zhen <zhenzhang.zhang@huawei.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:42 -07:00
Anand Jain	4d90d28b1c	btrfs: btrfs_rm_device() should zero mirror SB as well This fix will ensure all SB copies on the disk is zeroed when the disk is intentionally removed. This helps to better manage disks in the user land. This version of patch also merges the Zach patch as below. btrfs: don't double brelse on device rm Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Zach Brown <zab@redhat.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:41 -07:00
Miao Xie	27cdeb7096	Btrfs: use bitfield instead of integer data type for the some variants in btrfs_root Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:40 -07:00
Filipe Manana	f959492fc1	Btrfs: send, fix more issues related to directory renames This is a continuation of the previous changes titled: Btrfs: fix incremental send's decision to delay a dir move/rename Btrfs: part 2, fix incremental send's decision to delay a dir move/rename There's a few more cases where a directory rename/move must be delayed which was previously overlooked. If our immediate ancestor has a lower inode number than ours and it doesn't have a delayed rename/move operation associated to it, it doesn't mean there isn't any non-direct ancestor of our current inode that needs to be renamed/moved before our current inode (i.e. with a higher inode number than ours). So we can't stop the search if our immediate ancestor has a lower inode number than ours, we need to navigate the directory hierarchy upwards until we hit the root or: 1) find an ancestor with an higher inode number that was renamed/moved in the send root too (or already has a pending rename/move registered); 2) find an ancestor that is a new directory (higher inode number than ours and exists only in the send root). Reproducer for case 1) $ mkfs.btrfs -f /dev/sdd $ mount /dev/sdd /mnt $ mkdir -p /mnt/a/b $ mkdir -p /mnt/a/c/d $ mkdir /mnt/a/b/e $ mkdir /mnt/a/c/d/f $ mv /mnt/a/b /mnt/a/c/d/2b $ mkdir /mnt/a/x $ mkdir /mnt/a/y $ btrfs subvolume snapshot -r /mnt /mnt/snap1 $ btrfs send /mnt/snap1 -f /tmp/base.send $ mv /mnt/a/x /mnt/a/y $ mv /mnt/a/c/d/2b/e /mnt/a/c/d/2b/2e $ mv /mnt/a/c/d /mnt/a/h/2d $ mv /mnt/a/c /mnt/a/h/2d/2b/2c $ btrfs subvolume snapshot -r /mnt /mnt/snap2 $ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/incremental.send Simple reproducer for case 2) $ mkfs.btrfs -f /dev/sdd $ mount /dev/sdd /mnt $ mkdir -p /mnt/a/b $ mkdir /mnt/a/c $ mv /mnt/a/b /mnt/a/c/b2 $ mkdir /mnt/a/e $ btrfs subvolume snapshot -r /mnt /mnt/snap1 $ btrfs send /mnt/snap1 -f /tmp/base.send $ mv /mnt/a/c/b2 /mnt/a/e/b3 $ mkdir /mnt/a/e/b3/f $ mkdir /mnt/a/h $ mv /mnt/a/c /mnt/a/e/b3/f/c2 $ mv /mnt/a/e /mnt/a/h/e2 $ btrfs subvolume snapshot -r /mnt /mnt/snap2 $ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/incremental.send Another simple reproducer for case 2) $ mkfs.btrfs -f /dev/sdd $ mount /dev/sdd /mnt $ mkdir -p /mnt/a/b $ mkdir /mnt/a/c $ mkdir /mnt/a/b/d $ mkdir /mnt/a/c/e $ btrfs subvolume snapshot -r /mnt /mnt/snap1 $ btrfs send /mnt/snap1 -f /tmp/base.send $ mkdir /mnt/a/b/d/f $ mkdir /mnt/a/b/g $ mv /mnt/a/c/e /mnt/a/b/g/e2 $ mv /mnt/a/c /mnt/a/b/d/f/c2 $ mv /mnt/a/b/d/f /mnt/a/b/g/e2/f2 $ btrfs subvolume snapshot -r /mnt /mnt/snap2 $ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/incremental.send More complex reproducer for case 2) $ mkfs.btrfs -f /dev/sdd $ mount /dev/sdd /mnt $ mkdir -p /mnt/a/b $ mkdir -p /mnt/a/c/d $ mkdir /mnt/a/b/e $ mkdir /mnt/a/c/d/f $ mv /mnt/a/b /mnt/a/c/d/2b $ mkdir /mnt/a/x $ mkdir /mnt/a/y $ btrfs subvolume snapshot -r /mnt /mnt/snap1 $ btrfs send /mnt/snap1 -f /tmp/base.send $ mv /mnt/a/x /mnt/a/y $ mv /mnt/a/c/d/2b/e /mnt/a/c/d/2b/2e $ mv /mnt/a/c/d /mnt/a/h/2d $ mv /mnt/a/c /mnt/a/h/2d/2b/2c $ btrfs subvolume snapshot -r /mnt /mnt/snap2 $ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/incremental.send For both cases the incremental send would enter an infinite loop when building path strings. While solving these cases, this change also re-implements the code to detect when directory moves/renames should be delayed. Instead of dealing with several specific cases separately, it's now more generic handling all cases with a simple detection algorithm and if when applying a delayed move/rename there's a path loop detected, it further delays the move/rename registering a new ancestor inode as the dependency inode (so our rename happens after that ancestor is renamed). Tests for these cases is being added to xfstests too. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:40 -07:00
Filipe Manana	a10c40766c	Btrfs: send, remove dead code from __get_cur_name_and_parent Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:39 -07:00

1 2 3 4 5 ...

36882 Commits