49154 Commits

Author SHA1 Message Date
Jan Kara
5052b069ac jbd2: fix dbench4 performance regression for 'nobarrier' mounts
Commit b685d3d65ac7 "block: treat REQ_FUA and REQ_PREFLUSH as
synchronous" removed REQ_SYNC flag from WRITE_FUA implementation. Since
JBD2 strips REQ_FUA and REQ_FLUSH flags from submitted IO when the
filesystem is mounted with nobarrier mount option, journal superblock
writes ended up being async writes after this patch and that caused
heavy performance regression for dbench4 benchmark with high number of
processes. In my test setup with HP RAID array with non-volatile write
cache and 32 GB ram, dbench4 runs with 8 processes regressed by ~25%.

Fix the problem by making sure journal superblock writes are always
treated as synchronous since they generally block progress of the
journalling machinery and thus the whole filesystem.

Fixes: b685d3d65ac791406e0dfd8779cc9b3707fea5a3
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-04-29 21:07:30 -04:00
Jan Kara
c52c47e4b4 jbd2: Fix lockdep splat with generic/270 test
I've hit a lockdep splat with generic/270 test complaining that:

3216.fsstress.b/3533 is trying to acquire lock:
 (jbd2_handle){++++..}, at: [<ffffffff813152e0>] jbd2_log_wait_commit+0x0/0x150

but task is already holding lock:
 (jbd2_handle){++++..}, at: [<ffffffff8130bd3b>] start_this_handle+0x35b/0x850

The underlying problem is that jbd2_journal_force_commit_nested()
(called from ext4_should_retry_alloc()) may get called while a
transaction handle is started. In such case it takes care to not wait
for commit of the running transaction (which would deadlock) but only
for a commit of a transaction that is already committing (which is safe
as that doesn't wait for any filesystem locks).

In fact there are also other callers of jbd2_log_wait_commit() that take
care to pass tid of a transaction that is already committing and for
those cases, the lockdep instrumentation is too restrictive and leading
to false positive reports. Fix the problem by calling
jbd2_might_wait_for_commit() from jbd2_log_wait_commit() only if the
transaction isn't already committing.

Fixes: 1eaa566d368b214d99cbb973647c1b0b8102a9ae
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-04-29 20:12:16 -04:00
Mark Charlebois
9280cdd6fe fs: compat: Remove warning from COMPATIBLE_IOCTL
cmd in COMPATIBLE_IOCTL is always a u32, so cast it so there isn't a
warning about an overflow in XFORM.

From: Mark Charlebois <charlebm@gmail.com>
Signed-off-by: Mark Charlebois <charlebm@gmail.com>
Signed-off-by: Behan Webster <behanw@converseincode.com>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-29 17:47:19 -04:00
Al Viro
0b33540f9d remove pointless extern of atime_need_update_rcu()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-29 17:42:25 -04:00
Trond Myklebust
1f18b82c34 pNFS: Ensure we commit the layout if it has been invalidated
If the layout is being invalidated on the server, then we must
invoke nfs_commit_inode() to ensure any commits to the DS get
cleared out.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-29 11:29:30 -04:00
Trond Myklebust
722f0b8911 pNFS: Don't send COMMITs to the DSes if the server invalidated our layout
If the layout was invalidated, then assume we should requeue all the
pending writes for the DS in question.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-29 11:29:24 -04:00
Trond Myklebust
37f8aa16da pNFS/flexfiles: Fix up the ff_layout_write_pagelist failure path
If the attempt to write through pNFS fails, we need to use the same
failure semantics as for the read path: If the FF_FLAGS_NO_IO_THRU_MDS
flag is set or we have sufficient valid DSes, then we must retry through
pNFS

Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-29 00:02:37 -04:00
Takashi Iwai
d66bb1607e proc: Fix unbalanced hard link numbers
proc_create_mount_point() forgot to increase the parent's nlink, and
it resulted in unbalanced hard link numbers, e.g. /proc/fs shows one
less than expected.

Fixes: eb6d38d5427b ("proc: Allow creating permanently empty directories...")
Cc: stable@vger.kernel.org
Reported-by: Tristan Ye <tristan.ye@suse.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2017-04-28 21:05:26 -05:00
Linus Torvalds
28b2013587 Merge branch 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fix from Chris Mason:
 "We have one more fix for btrfs.

  This gets rid of a new WARN_ON from rc1 that ended up making more
  noise than we really want. The larger fix for the underflow got
  delayed a bit and it's better for now to put it under
  CONFIG_BTRFS_DEBUG"

* 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: qgroup: move noisy underflow warning to debugging build
2017-04-28 10:13:17 -07:00
Trond Myklebust
bdebfccd0e pNFS: Ensure we check layout validity before marking it for return
pnfs_error_mark_layout_for_return needs to check that the layout is
valid before calling pnfs_set_plh_return_info().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-28 13:07:01 -04:00
Olga Kornievskaia
88bd4f8629 NFS4.1 handle interrupted slot reuse from ERR_DELAY
If the RPC slot was interrupted and server replied to the next
operation on the "reused" slot with ERR_DELAY, don't clear out
the "interrupted" flag until we properly recover.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-28 13:07:00 -04:00
Pan Bian
4edabfd7d0 NFSv4: check return value of xdr_inline_decode
Function xdr_inline_decode() will return a NULL pointer if the input
buffer does not have long enough buffer to decode nbytes of data.
However, in function decode_op_map(), the return value of
xdr_inline_decode() is not validated before it is used. This patch adds
a check to the return value of xdr_inline_decode().

Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-28 13:06:59 -04:00
Artem Savkov
209aa23083 nfs/filelayout: fix NULL pointer dereference in fl_pnfs_update_layout()
Calling pnfs_put_lset on an IS_ERR pointer results in a NULL pointer
dereference like the one below. At the same time the check of retvalue
of filelayout_check_deviceid() sets lseg to error, but does not free it
before that.

[ 3000.636161] BUG: unable to handle kernel NULL pointer dereference at 000000000000003c
[ 3000.636970] IP: pnfs_put_lseg+0x29/0x100 [nfsv4]
[ 3000.637420] PGD 4f23b067
[ 3000.637421] PUD 4a0f4067
[ 3000.637679] PMD 0
[ 3000.637937]
[ 3000.638287] Oops: 0000 [#1] SMP
[ 3000.638591] Modules linked in: nfs_layout_nfsv41_files nfsv3 nfnetlink_queue nfnetlink_log nfnetlink bluetooth rfkill rpcsec_gss_krb5 nfsv4 nfs fscache binfmt_misc arc4 md4 nls_utf8 cifs ccm dns_resolver rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr virtio_balloon ppdev virtio_rng parport_pc i2c_piix4 parport acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c ata_generic pata_acpi virtio_blk virtio_net cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops crc32c_intel ata_piix ttm libata drm serio_raw
[ 3000.645245]  i2c_core virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: xt_u32]
[ 3000.646360] CPU: 1 PID: 26402 Comm: date Not tainted 4.11.0-rc7.1.el7.test.x86_64 #1
[ 3000.647092] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 3000.647638] task: ffff8800415ada00 task.stack: ffffc90000ff0000
[ 3000.648207] RIP: 0010:pnfs_put_lseg+0x29/0x100 [nfsv4]
[ 3000.648696] RSP: 0018:ffffc90000ff39b8 EFLAGS: 00010246
[ 3000.649193] RAX: 0000000000000000 RBX: fffffffffffffff4 RCX: 00000000000d43be
[ 3000.649859] RDX: 00000000000d43bd RSI: 0000000000000000 RDI: fffffffffffffff4
[ 3000.650530] RBP: ffffc90000ff39d8 R08: 000000000001e320 R09: ffffffffa05c35ce
[ 3000.651203] R10: ffff88007fd1e320 R11: ffffea0001283d80 R12: 0000000001400040
[ 3000.651875] R13: ffff88004f77d9f0 R14: ffffc90000ff3cd8 R15: ffff8800417ade00
[ 3000.652546] FS:  00007fac4d5cd740(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
[ 3000.653304] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3000.653849] CR2: 000000000000003c CR3: 000000004f080000 CR4: 00000000000406e0
[ 3000.654527] Call Trace:
[ 3000.654771]  fl_pnfs_update_layout.constprop.20+0x10c/0x150 [nfs_layout_nfsv41_files]
[ 3000.655505]  filelayout_pg_init_write+0x21d/0x270 [nfs_layout_nfsv41_files]
[ 3000.656195]  __nfs_pageio_add_request+0x11c/0x490 [nfs]
[ 3000.656698]  nfs_pageio_add_request+0xac/0x260 [nfs]
[ 3000.657180]  nfs_do_writepage+0x109/0x2e0 [nfs]
[ 3000.657616]  nfs_writepages_callback+0x16/0x30 [nfs]
[ 3000.658096]  write_cache_pages+0x26f/0x510
[ 3000.658495]  ? nfs_do_writepage+0x2e0/0x2e0 [nfs]
[ 3000.658946]  ? _raw_spin_unlock_bh+0x1e/0x20
[ 3000.659357]  ? wb_wakeup_delayed+0x5f/0x70
[ 3000.659748]  ? __mark_inode_dirty+0x2eb/0x360
[ 3000.660170]  nfs_writepages+0x84/0xd0 [nfs]
[ 3000.660575]  ? nfs_updatepage+0x571/0xb70 [nfs]
[ 3000.661012]  do_writepages+0x1e/0x30
[ 3000.661358]  __filemap_fdatawrite_range+0xc6/0x100
[ 3000.661819]  filemap_write_and_wait_range+0x41/0x90
[ 3000.662292]  nfs_file_fsync+0x34/0x1f0 [nfs]
[ 3000.662704]  vfs_fsync_range+0x3d/0xb0
[ 3000.663065]  vfs_fsync+0x1c/0x20
[ 3000.663385]  nfs4_file_flush+0x57/0x80 [nfsv4]
[ 3000.663813]  filp_close+0x2f/0x70
[ 3000.664132]  __close_fd+0x9a/0xc0
[ 3000.664453]  SyS_close+0x23/0x50
[ 3000.664785]  do_syscall_64+0x67/0x180
[ 3000.665162]  entry_SYSCALL64_slow_path+0x25/0x25
[ 3000.665600] RIP: 0033:0x7fac4d0e1e90
[ 3000.665946] RSP: 002b:00007ffd54e90c88 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[ 3000.666679] RAX: ffffffffffffffda RBX: 00007fac4d3b5400 RCX: 00007fac4d0e1e90
[ 3000.667349] RDX: 0000000000000000 RSI: 00007fac4d5d9000 RDI: 0000000000000001
[ 3000.668031] RBP: 0000000000000000 R08: 00007fac4d3b6a00 R09: 00007fac4d5cd740
[ 3000.668709] R10: 00007ffd54e909e0 R11: 0000000000000246 R12: 0000000000000000
[ 3000.669385] R13: 00007fac4d3b5e80 R14: 0000000000000000 R15: 0000000000000000
[ 3000.670061] Code: 00 00 66 66 66 66 90 55 48 85 ff 48 89 e5 41 56 41 55 41 54 53 48 89 fb 0f 84 97 00 00 00 f6 05 16 8f bc ff 10 0f 85 a6 00 00 00 <4c> 8b 63 48 48 8d 7b 38 49 8b 84 24 90 00 00 00 4c 8d a8 88 00
[ 3000.671831] RIP: pnfs_put_lseg+0x29/0x100 [nfsv4] RSP: ffffc90000ff39b8
[ 3000.672462] CR2: 000000000000003c

Signed-off-by: Artem Savkov <asavkov@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-28 13:06:59 -04:00
Brian Foster
e20c8a517f xfs: wait on new inodes during quotaoff dquot release
The quotaoff operation has a race with inode allocation that results
in a livelock. An inode allocation that occurs before the quota
status flags are updated acquires the appropriate dquots for the
inode via xfs_qm_vop_dqalloc(). It then inserts the XFS_INEW inode
into the perag radix tree, sometime later attaches the dquots to the
inode and finally clears the XFS_INEW flag. Quotaoff expects to
release the dquots from all inodes in the filesystem via
xfs_qm_dqrele_all_inodes(). This invokes the AG inode iterator,
which skips inodes in the XFS_INEW state because they are not fully
constructed. If the scan occurs after dquots have been attached to
an inode, but before XFS_INEW is cleared, the newly allocated inode
will continue to hold a reference to the applicable dquots. When
quotaoff invokes xfs_qm_dqpurge_all(), the reference count of those
dquot(s) remain elevated and the dqpurge scan spins indefinitely.

To address this problem, update the xfs_qm_dqrele_all_inodes() scan
to wait on inodes marked on the XFS_INEW state. We wait on the
inodes explicitly rather than skip and retry to avoid continuous
retry loops due to a parallel inode allocation workload. Since
quotaoff updates the quota state flags and uses a synchronous
transaction before the dqrele scan, and dquots are attached to
inodes after radix tree insertion iff quota is enabled, one INEW
waiting pass through the AG guarantees that the scan has processed
all inodes that could possibly hold dquot references.

Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-28 08:11:08 -07:00
Brian Foster
ae2c4ac2dd xfs: update ag iterator to support wait on new inodes
The AG inode iterator currently skips new inodes as such inodes are
inserted into the inode radix tree before they are fully
constructed. Certain contexts require the ability to wait on the
construction of new inodes, however. The fs-wide dquot release from
the quotaoff sequence is an example of this.

Update the AG inode iterator to support the ability to wait on
inodes flagged with XFS_INEW upon request. Create a new
xfs_inode_ag_iterator_flags() interface and support a set of
iteration flags to modify the iteration behavior. When the
XFS_AGITER_INEW_WAIT flag is set, include XFS_INEW flags in the
radix tree inode lookup and wait on them before the callback is
executed.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-28 08:11:08 -07:00
Brian Foster
756baca27f xfs: support ability to wait on new inodes
Inodes that are inserted into the perag tree but still under
construction are flagged with the XFS_INEW bit. Most contexts either
skip such inodes when they are encountered or have the ability to
handle them.

The runtime quotaoff sequence introduces a context that must wait
for construction of such inodes to correctly ensure that all dquots
in the fs are released. In anticipation of this, support the ability
to wait on new inodes. Wake the appropriate bit when XFS_INEW is
cleared.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-28 08:11:08 -07:00
Amir Goldstein
8f720d9f89 xfs: publish UUID in struct super_block
Copy the uuid of the filesystem to struct super_block s_uuid field,
as several other filesystems already do.  Copy regardless of the nouuid
mount option, because other filesystems also do not guaranty uniqueness
of the s_uuid field in super_block struct.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-28 08:10:53 -07:00
NeilBrown
a6f74e80f2 cifs: don't check for failure from mempool_alloc()
mempool_alloc() cannot fail if the gfp flags allow it to
sleep, and both GFP_FS allows for sleeping.

So these tests of the return value from mempool_alloc()
cannot be needed.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2017-04-28 07:56:33 -05:00
Sachin Prabhu
7d0c234fd2 Do not return number of bytes written for ioctl CIFS_IOC_COPYCHUNK_FILE
commit 620d8745b35d ("Introduce cifs_copy_file_range()") changes the
behaviour of the cifs ioctl call CIFS_IOC_COPYCHUNK_FILE. In case of
successful writes, it now returns the number of bytes written. This
return value is treated as an error by the xfstest cifs/001. Depending
on the errno set at that time, this may or may not result in the test
failing.

The patch fixes this by setting the return value to 0 in case of
successful writes.

Fixes: commit 620d8745b35d ("Introduce cifs_copy_file_range()")
Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <smfrench@gmail.com>
2017-04-28 07:56:33 -05:00
Sachin Prabhu
cd8c42968e Fix match_prepath()
Incorrect return value for shares not using the prefix path means that
we will never match superblocks for these shares.

Fixes: commit c1d8b24d1819 ("Compare prepaths when comparing superblocks")
Cc: stable@vger.kernel.org
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2017-04-28 07:54:54 -05:00
Kees Cook
3a7d2fd16c pstore: Solve lockdep warning by moving inode locks
Lockdep complains about a possible deadlock between mount and unlink
(which is technically impossible), but fixing this improves possible
future multiple-backend support, and keeps locking in the right order.

The lockdep warning could be triggered by unlinking a file in the
pstore filesystem:

  -> #1 (&sb->s_type->i_mutex_key#14){++++++}:
         lock_acquire+0xc9/0x220
         down_write+0x3f/0x70
         pstore_mkfile+0x1f4/0x460
         pstore_get_records+0x17a/0x320
         pstore_fill_super+0xa4/0xc0
         mount_single+0x89/0xb0
         pstore_mount+0x13/0x20
         mount_fs+0xf/0x90
         vfs_kern_mount+0x66/0x170
         do_mount+0x190/0xd50
         SyS_mount+0x90/0xd0
         entry_SYSCALL_64_fastpath+0x1c/0xb1

  -> #0 (&psinfo->read_mutex){+.+.+.}:
         __lock_acquire+0x1ac0/0x1bb0
         lock_acquire+0xc9/0x220
         __mutex_lock+0x6e/0x990
         mutex_lock_nested+0x16/0x20
         pstore_unlink+0x3f/0xa0
         vfs_unlink+0xb5/0x190
         do_unlinkat+0x24c/0x2a0
         SyS_unlinkat+0x16/0x30
         entry_SYSCALL_64_fastpath+0x1c/0xb1

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&sb->s_type->i_mutex_key#14);
                                lock(&psinfo->read_mutex);
                                lock(&sb->s_type->i_mutex_key#14);
   lock(&psinfo->read_mutex);

Reported-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
2017-04-27 20:35:34 -07:00
Trond Myklebust
ed6473ddc7 NFSv4: Fix callback server shutdown
We want to use kthread_stop() in order to ensure the threads are
shut down before we tear down the nfs_callback_info in nfs_callback_down.

Tested-and-reviewed-by: Kinglong Mee <kinglongmee@gmail.com>
Reported-by: Kinglong Mee <kinglongmee@gmail.com>
Fixes: bb6aeba736ba9 ("NFSv4.x: Switch to using svc_set_num_threads()...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-27 18:00:16 -04:00
Kinglong Mee
df807fffaa NFSv4.x/callback: Create the callback service through svc_create_pooled
As the comments for svc_set_num_threads() said,
" Destroying threads relies on the service threads filling in
rqstp->rq_task, which only the nfs ones do.  Assumes the serv
has been created using svc_create_pooled()."

If creating service through svc_create(), the svc_pool_map_put()
will be called in svc_destroy(), but the pool map isn't used.
So that, the reference of pool map will be drop, the next using
of pool map will get a zero npools.

[  137.992130] divide error: 0000 [#1] SMP
[  137.992148] Modules linked in: nfsd(E) nfsv4 nfs fscache fuse tun bridge stp llc ip_set nfnetlink vmw_vsock_vmci_transport vsock snd_seq_midi snd_seq_midi_event vmw_balloon coretemp crct10dif_pclmul crc32_pclmul ppdev ghash_clmulni_intel intel_rapl_perf joydev snd_ens1371 gameport snd_ac97_codec ac97_bus snd_seq snd_pcm snd_rawmidi snd_timer snd_seq_device snd soundcore parport_pc parport nfit acpi_cpufreq tpm_tis tpm_tis_core tpm vmw_vmci i2c_piix4 shpchp auth_rpcgss nfs_acl lockd(E) grace sunrpc(E) xfs libcrc32c vmwgfx drm_kms_helper ttm crc32c_intel drm e1000 mptspi scsi_transport_spi serio_raw mptscsih mptbase ata_generic pata_acpi [last unloaded: nfsd]
[  137.992336] CPU: 0 PID: 4514 Comm: rpc.nfsd Tainted: G            E   4.11.0-rc8+ #536
[  137.992777] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[  137.993757] task: ffff955984101d00 task.stack: ffff9873c2604000
[  137.994231] RIP: 0010:svc_pool_for_cpu+0x2b/0x80 [sunrpc]
[  137.994768] RSP: 0018:ffff9873c2607c18 EFLAGS: 00010246
[  137.995227] RAX: 0000000000000000 RBX: ffff95598376f000 RCX: 0000000000000002
[  137.995673] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9559944aec00
[  137.996156] RBP: ffff9873c2607c18 R08: ffff9559944aec28 R09: 0000000000000000
[  137.996609] R10: 0000000001080002 R11: 0000000000000000 R12: ffff95598376f010
[  137.997063] R13: ffff95598376f018 R14: ffff9559944aec28 R15: ffff9559944aec00
[  137.997584] FS:  00007f755529eb40(0000) GS:ffff9559bb600000(0000) knlGS:0000000000000000
[  137.998048] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  137.998548] CR2: 000055f3aecd9660 CR3: 0000000084290000 CR4: 00000000001406f0
[  137.999052] Call Trace:
[  137.999517]  svc_xprt_do_enqueue+0xef/0x260 [sunrpc]
[  138.000028]  svc_xprt_received+0x47/0x90 [sunrpc]
[  138.000487]  svc_add_new_perm_xprt+0x76/0x90 [sunrpc]
[  138.000981]  svc_addsock+0x14b/0x200 [sunrpc]
[  138.001424]  ? recalc_sigpending+0x1b/0x50
[  138.001860]  ? __getnstimeofday64+0x41/0xd0
[  138.002346]  ? do_gettimeofday+0x29/0x90
[  138.002779]  write_ports+0x255/0x2c0 [nfsd]
[  138.003202]  ? _copy_from_user+0x4e/0x80
[  138.003676]  ? write_recoverydir+0x100/0x100 [nfsd]
[  138.004098]  nfsctl_transaction_write+0x48/0x80 [nfsd]
[  138.004544]  __vfs_write+0x37/0x160
[  138.004982]  ? selinux_file_permission+0xd7/0x110
[  138.005401]  ? security_file_permission+0x3b/0xc0
[  138.005865]  vfs_write+0xb5/0x1a0
[  138.006267]  SyS_write+0x55/0xc0
[  138.006654]  entry_SYSCALL_64_fastpath+0x1a/0xa9
[  138.007071] RIP: 0033:0x7f7554b9dc30
[  138.007437] RSP: 002b:00007ffc9f92c788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  138.007807] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f7554b9dc30
[  138.008168] RDX: 0000000000000002 RSI: 00005640cd536640 RDI: 0000000000000003
[  138.008573] RBP: 00007ffc9f92c780 R08: 0000000000000001 R09: 0000000000000002
[  138.008918] R10: 0000000000000064 R11: 0000000000000246 R12: 0000000000000004
[  138.009254] R13: 00005640cdbf77a0 R14: 00005640cdbf7720 R15: 00007ffc9f92c238
[  138.009610] Code: 0f 1f 44 00 00 48 8b 87 98 00 00 00 55 48 89 e5 48 83 78 08 00 74 10 8b 05 07 42 02 00 83 f8 01 74 40 83 f8 02 74 19 31 c0 31 d2 <f7> b7 88 00 00 00 5d 89 d0 48 c1 e0 07 48 03 87 90 00 00 00 c3
[  138.010664] RIP: svc_pool_for_cpu+0x2b/0x80 [sunrpc] RSP: ffff9873c2607c18
[  138.011061] ---[ end trace b3468224cafa7d11 ]---

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-27 17:59:00 -04:00
Geliang Tang
3509d048c8 pstore: Remove unused vmalloc.h in pmsg
Since the vmalloc code has been removed from write_pmsg() in the commit
"5bf6d1b pstore/pmsg: drop bounce buffer", remove the unused header
vmalloc.h.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
2017-04-27 14:48:59 -07:00
Chris Mason
bce19f9d23 Merge branch 'for-chris-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.12 2017-04-27 14:13:09 -07:00
Linus Torvalds
8b5d11e4b0 Thanks to Ari Kauppi and Tuomas Haanpää at Synopsis for spotting bugs in
our NFSv2/v3 xdr code that could crash the server or leak memory.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZAlLrAAoJECebzXlCjuG+lb8P/idTu9rLGaU1VYPInrdoXru0
 iPY+p5inmGSYW2MfCGlS7disaCACgzPBKVqKjeNB1hfHn2JZCrfeBd0XvBYlc7TH
 JYmlHKSBjN/ZfnYMl0WrlVUCZstt4JVmxWjO3sZTCL3nImEbGA7d13yBDagxISWh
 gM9wOOiJwR5lzT/W3MezNCYj4n27/vVhODMP+Qhy0rpK08dBORIH0Bi7hwtVgdD9
 cMVIXMbujmJrCz0Uhbo/DhoItcePRBrCLTXdY6WEeMmHxnXlyaA2XuA0EjjZHuMs
 +BsbAOsNy1BxY1c8Z2ignYgRymUvUBHeiJeIGZLbyKKM2OJMtE4BoqlSWjx08P3Y
 hTwTuaw8u7uxfAsvxepukZoonWj/uVY5tP2Hyq+K6CIKKJpB7vj+c3QGYD3dNnUu
 zDl/LG3ayZgpPXqfjKRnSzZE+St1/IwDnvaM2WN2B1mkuerVr8qDGq6xd4kB0QKX
 BcEKiwcb2ewfPaLlVnSXz6Wbuh2pB42BObJjC3qbOgMvQ7SBUM0UcZmFpJJ20uCR
 BX20aFzB/GHcd6fTRHpDrAxB4XGdVG/8Da5Ki2WhRnmmaeSXushhiKWImujY9B6i
 s4mZHu4gGGJdLzOT5u2HZl93STevriL70SA9nZPhPLQycnJMUAO0buU9UcNI7wXR
 GVu2F3IHd+anxDtDwOv4
 =Glhl
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.11-3' of git://linux-nfs.org/~bfields/linux

Pull nfsd fixes from Bruce Fields:
 "Thanks to Ari Kauppi and Tuomas Haanpää at Synopsis for spotting bugs
  in our NFSv2/v3 xdr code that could crash the server or leak memory"

* tag 'nfsd-4.11-3' of git://linux-nfs.org/~bfields/linux:
  nfsd: stricter decoding of write-like NFSv2/v3 ops
  nfsd4: minor NFSv2/v3 write decoding cleanup
  nfsd: check for oversized NFSv2/v3 arguments
2017-04-27 13:39:19 -07:00
Linus Torvalds
19ac447420 A fix for a kernel stack overflow bug in ceph setattr code, marked for
stable.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJZAhJLAAoJEEp/3jgCEfOLw4gH/ia+bMzmsnkYtjMfxQfCh0ia
 MHi7JS/YcAej/o71c/tvWlTU7mRbmvUCVSAcishRNytEBNGL8YzkP12vMOp/5Vdx
 kKk6yDWn9z0mR5/YdBKaE8ziM5Umdy+zLqeL4yuxyhtbxKFGUPG4txJKS5WD80yU
 Ld/toF2fL3y/JEs+s1pd5G+DPhEhEm2hFf56/VI6N7y08CHJgTqHB3GJ3ZnuUbnU
 UhSvNR9skdVirObI8jt3oWIix8uAGq5+6MjVeTqXo75Qng5sdBGZ8S2agxXbM3j7
 Hu8h/1bhKyPCUzAXnOyGcZeR+5DQolKmlKLhogbT4I9X4YC2ie4Djg0bmFHscWI=
 =8aUa
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-4.11-rc9' of git://github.com/ceph/ceph-client

Pull ceph fix from Ilya Dryomov:
 "A fix for a kernel stack overflow bug in ceph setattr code, marked for
  stable"

* tag 'ceph-for-4.11-rc9' of git://github.com/ceph/ceph-client:
  ceph: fix recursion between ceph_set_acl() and __ceph_setattr()
2017-04-27 11:38:05 -07:00
Linus Torvalds
f56fc7bdaa Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:

 - fix orangefs handling of faults on write() - I'd missed that one back
   when orangefs was going through review.

 - readdir counterpart of "9p: cope with bogus responses from server in
   p9_client_{read,write}" - server might be lying or broken, and we'd
   better not overrun the kmalloc'ed buffer we are copying the results
   into.

 - NFS O_DIRECT read/write can leave iov_iter advanced by too much;
   that's what had been causing iov_iter_pipe() warnings davej had been
   seeing.

 - statx_timestamp.tv_nsec type fix (s32 -> u32). That one really should
   go in before 4.11.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  uapi: change the type of struct statx_timestamp.tv_nsec to unsigned
  fix nfs O_DIRECT advancing iov_iter too much
  p9_client_readdir() fix
  orangefs_bufmap_copy_from_iovec(): fix EFAULT handling
2017-04-27 11:09:37 -07:00
Lukas Czerner
3c3781951c xfs: Allow user to kill fstrim process
fstrim can take really long time on big, slow device or on file system
with a lots of allocation groups. Currently there is no way for the user
to cancell the operation. This patch makes it possible for the user to
kill fstrim pocess by adding the check for fatal_signal_pending() in
xfs_trim_extents().

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-27 10:45:34 -07:00
Michael Kerrisk (man-pages)
59372bbf3a statx: correct error handling of NULL pathname
The change in commit 1e2f82d1e9d1 ("statx: Kill fd-with-NULL-path
support in favour of AT_EMPTY_PATH") to error on a NULL pathname to
statx() is inconsistent.

It results in the error EINVAL for a NULL pathname.  Other system calls
with similar APIs (fchownat(), fstatat(), linkat()), return EFAULT.

The solution is simply to remove the EINVAL check.  As I already pointed
out in [1], user_path_at*() and filename_lookup() will handle the NULL
pathname as per the other APIs, to correctly produce the error EFAULT.

[1] https://lkml.org/lkml/2017/4/26/561

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-27 10:45:09 -07:00
Christoph Hellwig
629e014bb8 fs: completely ignore unknown open flags
Currently we just stash anything we got into file->f_flags, and the
report it in fcntl(F_GETFD).  This patch just clears out all unknown
flags so that we don't pass them to the fs or report them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-27 05:13:04 -04:00
Christoph Hellwig
80f18379a7 fs: add a VALID_OPEN_FLAGS
Add a central define for all valid open flags, and use it in the uniqueness
check.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-27 05:13:04 -04:00
Eric Biggers
020c2833db fs: remove _submit_bh()
_submit_bh() allowed submitting a buffer_head for I/O using custom
bio_flags.  It used to be used by jbd to set BIO_SNAP_STABLE, introduced
by commit 713685111774 ("mm: make snapshotting pages for stable writes a
per-bio operation").  However, the code and flag has since been removed
and no _submit_bh() users remain.

These days, bio_flags are mostly used internally by the block layer to
track the state of bio's.  As such, it doesn't really make sense for
filesystems to use them instead of op_flags when wanting special
behavior for block requests.

Therefore, remove _submit_bh() and trim the bio_flags argument from
submit_bh_wbc().

Cc: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-26 23:54:06 -04:00
Eric Biggers
cda37124f4 fs: constify tree_descr arrays passed to simple_fill_super()
simple_fill_super() is passed an array of tree_descr structures which
describe the files to create in the filesystem's root directory.  Since
these arrays are never modified intentionally, they should be 'const' so
that they are placed in .rodata and benefit from memory protection.
This patch updates the function signature and all users, and also
constifies tree_descr.name.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-26 23:54:06 -04:00
Fabian Frederick
a80f2d2224 fs/affs: bugfix: Write files greater than page size on OFS
Previous AFFS patch fixed OFS write operations but unveiled
another bug: files greater than 4KB are being created with a wrong
size resulting in errors like the following:

dd if=/dev/zero of=file bs=4097 count=1
cp file /mnt/affs/
cp: error writing '/mnt/affs/file': Bad address

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-26 23:54:06 -04:00
Fabian Frederick
077e073e8f fs/affs: bugfix: enable writes on OFS disks
We called unconditionally affs_bread_ino() with create 0 resulting in
"error (device ...): get_block(): strange block request 0"
when trying to write on AFFS OFS format.

This patch adds create parameter to that function.
0 for affs_readpage_ofs()
1 for affs_write_begin_ofs()

Bug was found here:
https://bugzilla.kernel.org/show_bug.cgi?id=114961

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-26 23:54:06 -04:00
Fabian Frederick
a9d6cfb70f fs/affs: remove node generation check
node generation has to be stored on disk.
AFAICS we won't be able to manage it on AFFS.
This patch removes relevant check in affs_nfs_get_inode()

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-26 23:54:05 -04:00
Fabian Frederick
d2d58e0e0d fs/affs: import amigaffs.h
Have that file in global include/linux is not needed.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-26 23:54:05 -04:00
Fabian Frederick
f1bf90724d fs/affs: bugfix: make symbolic links work again
AFFS symbolic links were broken since kernel 2.6.29

Problem was bisected to the following

commit ebd09abbd969 ("vfs: ensure page symlinks are NUL-terminated")
commit 035146851cfa ("vfs: introduce helper function to safely
NUL-terminate symlinks")

AFFS wasn't setting inode size when reading symbolic link from disk or
creating a new one. Result was zero allocation in pagecache.

ln -s file symlink

ls -lrt

file
symlink ->

This patch adds inode isize information on inode get and symbolic link
addition.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-26 23:54:05 -04:00
David S. Miller
b1513c3531 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26 22:39:08 -04:00
David Howells
1e2f82d1e9 statx: Kill fd-with-NULL-path support in favour of AT_EMPTY_PATH
With the new statx() syscall, the following both allow the attributes of
the file attached to a file descriptor to be retrieved:

	statx(dfd, NULL, 0, ...);

and:

	statx(dfd, "", AT_EMPTY_PATH, ...);

Change the code to reject the first option, though this means copying
the path and engaging pathwalk for the fstat() equivalent.  dfd can be a
non-directory provided path is "".

[ The timing of this isn't wonderful, but applying this now before we
  have statx() in any released kernel, before anybody starts using the
  NULL special case.    - Linus ]

Fixes: a528d35e8bfc ("statx: Add a system call to make enhanced file info available")
Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric Sandeen <sandeen@sandeen.net>
cc: fstests@vger.kernel.org
cc: linux-api@vger.kernel.org
cc: linux-man@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-26 15:05:47 -07:00
Dan Carpenter
907bfcd8d8 orangefs: handle zero size write in debugfs
If we write zero bytes to this debugfs file, then it will cause an
underflow when we do copy_from_user(buf, ubuf, count - 1).  Debugfs can
normally only be written to by root so the impact of this is low.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:01 -04:00
Martin Brandenburg
b5a9d61eeb orangefs: do not wait for timeout if umounting
When the computer is turned off, all the processes are killed and then
all the filesystems are umounted.  OrangeFS should not wait for the
userspace daemon to come back in that case.

This only works for plain umount(2).  To actually take advantage of this
interactively, `umount -f' is needed; otherwise umount will issue a
statfs first, which will wait for the userspace daemon to come back.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:01 -04:00
Martin Brandenburg
b7a57ccab8 orangefs: return from orangefs_devreq_read quickly if possible
It is not necessary to take the lock and search through the request list
if the list is empty.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:00 -04:00
Martin Brandenburg
9d286b0d82 orangefs: ensure the userspace component is unmounted if mount fails
If the mount is aborted after userspace has been asked to mount,
userspace must be told to unmount.

Ordinarily orangefs_kill_sb does the unmount.  However it cannot be
called if the superblock has not been set up.  This is a very narrow
window.

The NULL fs_id is not unmounted.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:00 -04:00
Martin Brandenburg
53950ef541 orangefs: do not check possibly stale size on truncate
Let the server figure this out because our size might be out of date or
not present.

The bug was that

	xfs_io -f -t -c "pread -v 0 100" /mnt/foo
	echo "Test" > /mnt/foo
	xfs_io -f -t -c "pread -v 0 100" /mnt/foo

fails because the second truncate did not happen if nothing had
requested the size after the write in echo.  Thus i_size was zero (not
present) and the orangefs_setattr though i_size was zero and there was
nothing to do.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:00 -04:00
Martin Brandenburg
68a24a6cc4 orangefs: implement statx
Fortunately OrangeFS has had a getattr request mask for a long time.

The server basically has two difficulty levels for attributes.  Fetching
any attribute except size requires communicating with the metadata
server for that handle.  Since all the attributes are right there, it
makes sense to return them all.  Fetching the size requires
communicating with every I/O server (that the file is distributed
across).  Therefore if asked for anything except size, get everything
except size, and if asked for size, get everything.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:00 -04:00
Martin Brandenburg
7b796ae370 orangefs: remove ORANGEFS_READDIR macros
They are clones of the ORANGEFS_ITERATE macros in use elsewhere.  Delete
ORANGEFS_ITERATE_NEXT which is a hack previously used by readdir.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:00 -04:00
Martin Brandenburg
480e3e532e orangefs: support very large directories
This works by maintaining a linked list of pages which the directory
has been read into rather than one giant fixed-size buffer.

This replaces code which limits the total directory size to the total
amount that could be returned in one server request.  Since filenames
are usually considerably shorter than the maximum, the old code could
usually handle several server requests before running out of space.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:00 -04:00
Martin Brandenburg
72f66b8329 orangefs: support llseek on directories
This and the previous commit fix xfstests generic/257.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:00 -04:00