Commit Graph

4534 Commits

Author SHA1 Message Date
Linus Torvalds
78a45c6f06 Merge branch 'akpm' (second patch-bomb from Andrew)
Merge second patchbomb from Andrew Morton:
 - the rest of MM
 - misc fs fixes
 - add execveat() syscall
 - new ratelimit feature for fault-injection
 - decompressor updates
 - ipc/ updates
 - fallocate feature creep
 - fsnotify cleanups
 - a few other misc things

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (99 commits)
  cgroups: Documentation: fix trivial typos and wrong paragraph numberings
  parisc: percpu: update comments referring to __get_cpu_var
  percpu: update local_ops.txt to reflect this_cpu operations
  percpu: remove __get_cpu_var and __raw_get_cpu_var macros
  fsnotify: remove destroy_list from fsnotify_mark
  fsnotify: unify inode and mount marks handling
  fallocate: create FAN_MODIFY and IN_MODIFY events
  mm/cma: make kmemleak ignore CMA regions
  slub: fix cpuset check in get_any_partial
  slab: fix cpuset check in fallback_alloc
  shmdt: use i_size_read() instead of ->i_size
  ipc/shm.c: fix overly aggressive shmdt() when calls span multiple segments
  ipc/msg: increase MSGMNI, remove scaling
  ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM
  ipc/sem.c: change memory barrier in sem_lock() to smp_rmb()
  lib/decompress.c: consistency of compress formats for kernel image
  decompress_bunzip2: off by one in get_next_block()
  usr/Kconfig: make initrd compression algorithm selection not expert
  fault-inject: add ratelimit option
  ratelimit: add initialization macro
  ...
2014-12-13 13:00:36 -08:00
Ganesh Mahendran
083914eab9 zram: use DEVICE_ATTR_[RW|RO|WO] to define zram sys device attribute
In current zram, we use DEVICE_ATTR() to define sys device attributes.
SO, we need to set (S_IRUGO | S_IWUSR) permission and other arguments
manually.  Linux already provids the macro DEVICE_ATTR_[RW|RO|WO] to
define sys device attribute.  It is simple and readable.

This patch uses kernel defined macro DEVICE_ATTR_[RW|RO|WO] to define
zram device attribute.

Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:50 -08:00
Mahendran Ganesh
d49b1c254c mm/zram: correct ZRAM_ZERO flag bit position
In struct zram_table_entry, the element *value* contains obj size and obj
zram flags.  Bit 0 to bit (ZRAM_FLAG_SHIFT - 1) represent obj size, and
bit ZRAM_FLAG_SHIFT to the highest bit of unsigned long represent obj
zram_flags.  So the first zram flag(ZRAM_ZERO) should be from
ZRAM_FLAG_SHIFT instead of (ZRAM_FLAG_SHIFT + 1).

This patch fixes this cosmetic issue.

Also fix a typo, "page in now accessed" -> "page is now accessed"

Signed-off-by: Mahendran Ganesh <opensource.ganesh@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Weijie Yang <weijie.yang@samsung.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:50 -08:00
karam.lee
8c7f01025f zram: implement rw_page operation of zram
This patch implements rw_page operation for zram block device.

I implemented the feature in zram and tested it.  Test bed was the G2, LG
electronic mobile device, whtich has msm8974 processor and 2GB memory.

With a memory allocation test program consuming memory, the system
generates swap.

Operating time of swap_write_page() was measured.

--------------------------------------------------
|             |   operating time   | improvement |
|             |  (20 runs average) |             |
--------------------------------------------------
|with patch   |    1061.15 us      |    +2.4%    |
--------------------------------------------------
|without patch|    1087.35 us      |             |
--------------------------------------------------

Each test(with paged_io,with BIO) result set shows normal distribution and
has equal variance.  I mean the two values are valid result to compare.  I
can say operation with paged I/O(without BIO) is faster 2.4% with
confidence level 95%.

[minchan@kernel.org: make rw_page opeartion return 0]
[minchan@kernel.org: rely on the bi_end_io for zram_rw_page fails]
[sergey.senozhatsky@gmail.com: code cleanup]
[minchan@kernel.org: add comment]
Signed-off-by: karam.lee <karam.lee@lge.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: <seungho1.park@lge.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:50 -08:00
karam.lee
54850e73e8 zram: change parameter from vaild_io_request()
This patch changes parameter of valid_io_request for common usage.  The
purpose of valid_io_request() is to determine if bio request is valid or
not.

This patch use I/O start address and size instead of a BIO parameter for
common usage.

Signed-off-by: karam.lee <karam.lee@lge.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: <seungho1.park@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:50 -08:00
karam.lee
b627cff3d3 zram: remove bio parameter from zram_bvec_rw()
Recently rw_page block device operation has been added.  This patchset
implements rw_page operation for zram block device and does some clean-up.

This patch (of 3):

Remove an unnecessary parameter(bio) from zram_bvec_rw() and
zram_bvec_read().  zram_bvec_read() doesn't use a bio parameter, so remove
it.  zram_bvec_rw() calls a read/write operation not using bio, so a rw
parameter replaces a bio parameter.

Signed-off-by: karam.lee <karam.lee@lge.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: <seungho1.park@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:49 -08:00
Dwight Engen
76e74bbe0a sunvdc: reconnect ldc after vds service domain restarts
This change enables the sunvdc driver to reconnect and recover if a vds
service domain is disconnected or bounced.

By default, it will wait indefinitely for the service domain to become
available again, but will honor a non-zero vdc-timout md property if one
is set. If a timeout is reached, any in-progress I/O's are completed
with -EIO.

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Reviewed-by: Chris Hyser <chris.hyser@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 18:52:45 -08:00
Dwight Engen
fe47c3c262 vio: create routines for inc,dec vio dring indexes
Both sunvdc and sunvnet implemented distinct functionality for incrementing
and decrementing dring indexes. Create common functions for use by both
from the sunvnet versions, which were chosen since they will still work
correctly in case a non power of two ring size is used.

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 18:52:45 -08:00
Dwight Engen
31f4888f51 sunvdc: fix module unload/reload
Free resources allocated during port/disk probing so that the module may be
successfully reloaded after unloading.

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 18:51:57 -08:00
Linus Torvalds
6b9e2cea42 virtio: virtio 1.0 support, misc patches
This adds a lot of infrastructure for virtio 1.0 support.
 Notable missing pieces: virtio pci, virtio balloon (needs spec extension),
 vhost scsi.
 
 Plus, there are some minor fixes in a couple of places.
 
 Cc: David Miller <davem@davemloft.net>
 Cc: Rusty Russell <rusty@rustcorp.com.au>
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJUh1CVAAoJECgfDbjSjVRpWZcH/2+EGPyng7Lca820UHA0cU1U
 u4D8CAAwOGaVdnUUo8ox1eon3LNB2UgRtgsl3rBDR3YTgFfNPrfuYdnHO0dYIDc1
 lS26NuPrVrTX0lA+OBPe2nlKrsrOkn8aw1kxG9Y0gKtNg/+HAGNW5e2eE7R/LrA5
 94XbWZ8g9Yf4GPG1iFmih9vQvvN0E68zcUlojfCnllySgaIEYr8nTiGQBWpRgJat
 fCqFAp1HMDZzGJQO+m1/Vw0OftTRVybyfai59e6uUTa8x1djvzPb/1MvREqQjegM
 ylSuofIVyj7JPu++FbAjd9mikkb53GSc8ql3YmWNZLdr69rnkzP0GdzQvrdheAo=
 =RtrR
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:
 "virtio: virtio 1.0 support, misc patches

  This adds a lot of infrastructure for virtio 1.0 support.  Notable
  missing pieces: virtio pci, virtio balloon (needs spec extension),
  vhost scsi.

  Plus, there are some minor fixes in a couple of places.

  Note: some net drivers are affected by these patches.  David said he's
  fine with merging these patches through my tree.

  Rusty's on vacation, he acked using my tree for these, too"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (70 commits)
  virtio_ccw: finalize_features error handling
  virtio_ccw: future-proof finalize_features
  virtio_pci: rename virtio_pci -> virtio_pci_common
  virtio_pci: update file descriptions and copyright
  virtio_pci: split out legacy device support
  virtio_pci: setup config vector indirectly
  virtio_pci: setup vqs indirectly
  virtio_pci: delete vqs indirectly
  virtio_pci: use priv for vq notification
  virtio_pci: free up vq->priv
  virtio_pci: fix coding style for structs
  virtio_pci: add isr field
  virtio: drop legacy_only driver flag
  virtio_balloon: drop legacy_only driver flag
  virtio_ccw: rev 1 devices set VIRTIO_F_VERSION_1
  virtio: allow finalize_features to fail
  virtio_ccw: legacy: don't negotiate rev 1/features
  virtio: add API to detect legacy devices
  virtio_console: fix sparse warnings
  vhost: remove unnecessary forward declarations in vhost.h
  ...
2014-12-11 12:20:31 -08:00
Linus Torvalds
cbfe0de303 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS changes from Al Viro:
 "First pile out of several (there _definitely_ will be more).  Stuff in
  this one:

   - unification of d_splice_alias()/d_materialize_unique()

   - iov_iter rewrite

   - killing a bunch of ->f_path.dentry users (and f_dentry macro).

     Getting that completed will make life much simpler for
     unionmount/overlayfs, since then we'll be able to limit the places
     sensitive to file _dentry_ to reasonably few.  Which allows to have
     file_inode(file) pointing to inode in a covered layer, with dentry
     pointing to (negative) dentry in union one.

     Still not complete, but much closer now.

   - crapectomy in lustre (dead code removal, mostly)

   - "let's make seq_printf return nothing" preparations

   - assorted cleanups and fixes

  There _definitely_ will be more piles"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  copy_from_iter_nocache()
  new helper: iov_iter_kvec()
  csum_and_copy_..._iter()
  iov_iter.c: handle ITER_KVEC directly
  iov_iter.c: convert copy_to_iter() to iterate_and_advance
  iov_iter.c: convert copy_from_iter() to iterate_and_advance
  iov_iter.c: get rid of bvec_copy_page_{to,from}_iter()
  iov_iter.c: convert iov_iter_zero() to iterate_and_advance
  iov_iter.c: convert iov_iter_get_pages_alloc() to iterate_all_kinds
  iov_iter.c: convert iov_iter_get_pages() to iterate_all_kinds
  iov_iter.c: convert iov_iter_npages() to iterate_all_kinds
  iov_iter.c: iterate_and_advance
  iov_iter.c: macros for iterating over iov_iter
  kill f_dentry macro
  dcache: fix kmemcheck warning in switch_names
  new helper: audit_file()
  nfsd_vfs_write(): use file_inode()
  ncpfs: use file_inode()
  kill f_dentry uses
  lockd: get rid of ->f_path.dentry->d_sb
  ...
2014-12-10 16:10:49 -08:00
Michael S. Tsirkin
51cdc3815f virtio: drop VIRTIO_F_VERSION_1 from drivers
Core activates this bit automatically now,
drop it from drivers that set it explicitly.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-09 12:06:32 +02:00
Michael S. Tsirkin
38f37b578f virtio_blk: fix race at module removal
If a device appears while module is being removed,
driver will get a callback after we've given up
on the major number.

In theory this means this major number can get reused
by something else, resulting in a conflict.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2014-12-09 12:05:27 +02:00
Michael S. Tsirkin
393c525b5b virtio_blk: make serial attribute static
It's never declared so no need to make it extern.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2014-12-09 12:05:27 +02:00
Michael S. Tsirkin
19c1c5a64c virtio_blk: v1.0 support
Based on patch by Cornelia Huck.

Note: for consistency, and to avoid sparse errors,
      convert all fields, even those no longer in use
      for virtio v1.0.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-09 12:05:26 +02:00
Al Viro
ba00410b81 Merge branch 'iov_iter' into for-next 2014-12-08 20:39:29 -05:00
James Bottomley
dc843ef00e Merge remote-tracking branch 'scsi-queue/core-for-3.19' into for-linus 2014-12-08 07:40:20 -08:00
Hannes Reinecke
eb846d9f14 scsi: rename SERVICE_ACTION_IN to SERVICE_ACTION_IN_16
SPC-3 defines SERVICE ACTION IN(12) and SERVICE ACTION IN(16).
So rename SERVICE_ACTION_IN to SERVICE_ACTION_IN_16 to be
consistent with SPC and to allow for better distinction.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Tested-by: Robert Elliott <elliott@hp.com>
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-24 20:01:40 +01:00
Al Viro
b583043e99 kill f_dentry uses
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-19 13:01:25 -05:00
Weijie Yang
c406515239 zram: avoid kunmap_atomic() of a NULL pointer
zram could kunmap_atomic() a NULL pointer in a rare situation: a zram
page becomes a full-zeroed page after a partial write io.  The current
code doesn't handle this case and performs kunmap_atomic() on a NULL
pointer, which panics the kernel.

This patch fixes this issue.

Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Weijie Yang <weijie.yang.kh@gmail.com>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-11-13 16:17:05 -08:00
Linus Torvalds
ce1928da84 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph fixes from Sage Weil:
 "There is a GFP flag fix from Mike Christie, an error code fix from
  Jan, and fixes for two unnecessary allocations (kmalloc and workqueue)
  from Ilya.  All are well tested.

  Ilya has one other fix on the way but it didn't get tested in time"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: eliminate unnecessary allocation in process_one_ticket()
  rbd: Fix error recovery in rbd_obj_read_sync()
  libceph: use memalloc flags for net IO
  rbd: use a single workqueue for all devices
2014-11-03 15:04:26 -08:00
Linus Torvalds
53429290a0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Pull sparc update from David Miller:
 "Two changes:

  1) It makes no sense to execute a VTOC partition table request in the
     Sun virtual block device driver and fail to load if it doesn't
     succeed because a) we don't use the result at all and b) it won't
     succeed if there is an EFI partition on the disk, for example.

     We read the partition table via the normal means in the block layer
     anyways, so this is really completely useless, so just remove it.

     From Dwight Engen.

  2) Hook up new bpf system call"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sunvdc: don't call VD_OP_GET_VTOC
  sparc: Hook up bpf system call.
2014-10-31 15:00:48 -07:00
Dwight Engen
85b0c6e62c sunvdc: don't call VD_OP_GET_VTOC
The VD_OP_GET_VTOC operation will succeed only if the vdisk backend has a
VTOC label, otherwise it will fail. In particular, it will return error
48 (ENOTSUP) if the disk has an EFI label. VTOC disk labels are already
handled by directly reading the disk in block/partitions/sun.c (enabled by
CONFIG_SUN_PARTITION which defaults to y on SPARC). Since port->label is
unused in the driver, remove the call and the field.

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31 15:49:45 -04:00
Jan Kara
a8d4205623 rbd: Fix error recovery in rbd_obj_read_sync()
When we fail to allocate page vector in rbd_obj_read_sync() we just
basically ignore the problem and continue which will result in an oops
later. Fix the problem by returning proper error.

CC: Yehuda Sadeh <yehuda@inktank.com>
CC: Sage Weil <sage@inktank.com>
CC: ceph-devel@vger.kernel.org
CC: stable@vger.kernel.org
Coverity-id: 1226882
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2014-10-30 13:11:51 +03:00
Ilya Dryomov
f5ee37bd31 rbd: use a single workqueue for all devices
Using one queue per device doesn't make much sense given that our
workfn processes "devices" and not "requests".  Switch to a single
workqueue for all devices.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2014-10-30 13:11:25 +03:00
Linus Torvalds
a7ca10f263 Merge branch 'akpm' (incoming from Andrew Morton)
Merge misc fixes from Andrew Morton:
 "21 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (21 commits)
  mm/balloon_compaction: fix deflation when compaction is disabled
  sh: fix sh770x SCIF memory regions
  zram: avoid NULL pointer access in concurrent situation
  mm/slab_common: don't check for duplicate cache names
  ocfs2: fix d_splice_alias() return code checking
  mm: rmap: split out page_remove_file_rmap()
  mm: memcontrol: fix missed end-writeback page accounting
  mm: page-writeback: inline account_page_dirtied() into single caller
  lib/bitmap.c: fix undefined shift in __bitmap_shift_{left|right}()
  drivers/rtc/rtc-bq32k.c: fix register value
  memory-hotplug: clear pgdat which is allocated by bootmem in try_offline_node()
  drivers/rtc/rtc-s3c.c: fix initialization failure without rtc source clock
  kernel/kmod: fix use-after-free of the sub_info structure
  drivers/rtc/rtc-pm8xxx.c: rework to support pm8941 rtc
  mm, thp: fix collapsing of hugepages on madvise
  drivers: of: add return value to of_reserved_mem_device_init()
  mm: free compound page with correct order
  gcov: add ARM64 to GCOV_PROFILE_ALL
  fsnotify: next_i is freed during fsnotify_unmount_inodes.
  mm/compaction.c: avoid premature range skip in isolate_migratepages_range
  ...
2014-10-29 16:38:48 -07:00
Weijie Yang
5a99e95b8d zram: avoid NULL pointer access in concurrent situation
There is a rare NULL pointer bug in mem_used_total_show() and
mem_used_max_store() in concurrent situation, like this:

zram is not initialized, process A is a mem_used_total reader which runs
periodically, while process B try to init zram.

	process A 				process B
  access meta, get a NULL value
						init zram, done
  init_done() is true
  access meta->mem_pool, get a NULL pointer BUG

This patch fixes this issue.

Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-29 16:33:15 -07:00
Jan Kara
31f9690e6e null_blk: Cleanup error recovery in null_add_dev()
When creation of queues fails in init_driver_queues(), we free the
queues. But null_add_dev() doesn't test for this failure and continues
with the setup leading to strange consequences, likely oops. Fix the
problem by testing whether init_driver_queues() failed and do proper
error cleanup.

Coverity-id: 1148005
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-10-22 07:59:25 -06:00
Linus Torvalds
e75437fb93 Merge branch 'for-3.18/drivers' of git://git.kernel.dk/linux-block
Pull block layer driver update from Jens Axboe:
 "This is the block driver pull request for 3.18.  Not a lot in there
  this round, and nothing earth shattering.

   - A round of drbd fixes from the linbit team, and an improvement in
     asender performance.

   - Removal of deprecated (and unused) IRQF_DISABLED flag in rsxx and
     hd from Michael Opdenacker.

   - Disable entropy collection from flash devices by default, from Mike
     Snitzer.

   - A small collection of xen blkfront/back fixes from Roger Pau Monné
     and Vitaly Kuznetsov"

* 'for-3.18/drivers' of git://git.kernel.dk/linux-block:
  block: disable entropy contributions for nonrot devices
  xen, blkfront: factor out flush-related checks from do_blkif_request()
  xen-blkback: fix leak on grant map error path
  xen/blkback: unmap all persistent grants when frontend gets disconnected
  rsxx: Remove deprecated IRQF_DISABLED
  block: hd: remove deprecated IRQF_DISABLED
  drbd: use RB_DECLARE_CALLBACKS() to define augment callbacks
  drbd: compute the end before rb_insert_augmented()
  drbd: Add missing newline in resync progress display in /proc/drbd
  drbd: reduce lock contention in drbd_worker
  drbd: Improve asender performance
  drbd: Get rid of the WORK_PENDING macro
  drbd: Get rid of the __no_warn and __cond_lock macros
  drbd: Avoid inconsistent locking warning
  drbd: Remove superfluous newline from "resync_extents" debugfs entry.
  drbd: Use consistent names for all the bi_end_io callbacks
  drbd: Use better variable names
2014-10-18 12:12:45 -07:00
Linus Torvalds
d3dc366bba Merge branch 'for-3.18/core' of git://git.kernel.dk/linux-block
Pull core block layer changes from Jens Axboe:
 "This is the core block IO pull request for 3.18.  Apart from the new
  and improved flush machinery for blk-mq, this is all mostly bug fixes
  and cleanups.

   - blk-mq timeout updates and fixes from Christoph.

   - Removal of REQ_END, also from Christoph.  We pass it through the
     ->queue_rq() hook for blk-mq instead, freeing up one of the request
     bits.  The space was overly tight on 32-bit, so Martin also killed
     REQ_KERNEL since it's no longer used.

   - blk integrity updates and fixes from Martin and Gu Zheng.

   - Update to the flush machinery for blk-mq from Ming Lei.  Now we
     have a per hardware context flush request, which both cleans up the
     code should scale better for flush intensive workloads on blk-mq.

   - Improve the error printing, from Rob Elliott.

   - Backing device improvements and cleanups from Tejun.

   - Fixup of a misplaced rq_complete() tracepoint from Hannes.

   - Make blk_get_request() return error pointers, fixing up issues
     where we NULL deref when a device goes bad or missing.  From Joe
     Lawrence.

   - Prep work for drastically reducing the memory consumption of dm
     devices from Junichi Nomura.  This allows creating clone bio sets
     without preallocating a lot of memory.

   - Fix a blk-mq hang on certain combinations of queue depths and
     hardware queues from me.

   - Limit memory consumption for blk-mq devices for crash dump
     scenarios and drivers that use crazy high depths (certain SCSI
     shared tag setups).  We now just use a single queue and limited
     depth for that"

* 'for-3.18/core' of git://git.kernel.dk/linux-block: (58 commits)
  block: Remove REQ_KERNEL
  blk-mq: allocate cpumask on the home node
  bio-integrity: remove the needless fail handle of bip_slab creating
  block: include func name in __get_request prints
  block: make blk_update_request print prefix match ratelimited prefix
  blk-merge: don't compute bi_phys_segments from bi_vcnt for cloned bio
  block: fix alignment_offset math that assumes io_min is a power-of-2
  blk-mq: Make bt_clear_tag() easier to read
  blk-mq: fix potential hang if rolling wakeup depth is too high
  block: add bioset_create_nobvec()
  block: use bio_clone_fast() in blk_rq_prep_clone()
  block: misplaced rq_complete tracepoint
  sd: Honor block layer integrity handling flags
  block: Replace strnicmp with strncasecmp
  block: Add T10 Protection Information functions
  block: Don't merge requests if integrity flags differ
  block: Integrity checksum flag
  block: Relocate bio integrity flags
  block: Add a disk flag to block integrity profile
  block: Add prefix to block integrity profile flags
  ...
2014-10-18 11:53:51 -07:00
Linus Torvalds
0e6e58f941 One cc: stable commit, the rest are a series of minor cleanups which have
been sitting in MST's tree during my vacation.  I changed a function name
 and made one trivial change, then they spent two days in linux-next.
 
 Thanks,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUQFBQAAoJENkgDmzRrbjxJRIP/1yCQRElQewxURSmJelyqCdU
 0mHYB0R9Mf3tfre1xnofqs2lWeSMc/4ptKHsVR6pupoztSwnz7HsLHfEFvFJh4mj
 KsaqYElxkNxTcfyHwLjyJS0/J6tG1tYypXGiimTBS0bvFHL3XZdimVgJ6WvX+gO7
 YSaDEX8/EqCERafslS5+gKJlz3drDOnCZCe9y4BDSmsvl2k7bkpSxIn8vsR6jIC0
 c5JpUy6QVF+3XA/J932M7yRs+xpqxNoUWiyY3ar9o3CtQAaQB0ZAetSxY6hTfvVc
 GlNFzCifdsaQwsl2SVsE2h6tWaRhtMtcGWQuhHThIPyIf8XxhYyBRY2FLo70LMz1
 eqtwy6F/Bg/nzUsdee4PZBMeoKHlAEL12RpsEKgfUoLzj16Aqa8ll+Agbglbkw8G
 f3d2FwzKAlpY5NwHETC1wYy52PJ3efqksRWuhokmYpxNSbHJS/lsiJOE7272/4Qr
 MtXuvRmo22tf34XFd5y7zqWjgZ58eeFOqQWi/K+6ZgpqVOvikjrXXKEuiVdjO0ZD
 kTVR/sQKiR+79rzENk80XBhWaMveECNXF1TiZ/3MmURkmEOBRQMxRQ20BX3exvna
 AJ/WVA5DcfXZc1yyqknE1NLGrvSBMJENH13x2QPwrqNWAryOOKuF1VKKIwWlDw5j
 vtx5nXiJa8YYdxI2TJCN
 =JK6x
 -----END PGP SIGNATURE-----

Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull virtio updates from Rusty Russell:
 "One cc: stable commit, the rest are a series of minor cleanups which
  have been sitting in MST's tree during my vacation.  I changed a
  function name and made one trivial change, then they spent two days in
  linux-next"

* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (25 commits)
  virtio-rng: refactor probe error handling
  virtio_scsi: drop scan callback
  virtio_balloon: enable VQs early on restore
  virtio_scsi: fix race on device removal
  virito_scsi: use freezable WQ for events
  virtio_net: enable VQs early on restore
  virtio_console: enable VQs early on restore
  virtio_scsi: enable VQs early on restore
  virtio_blk: enable VQs early on restore
  virtio_scsi: move kick event out from virtscsi_init
  virtio_net: fix use after free on allocation failure
  9p/trans_virtio: enable VQs early
  virtio_console: enable VQs early
  virtio_blk: enable VQs early
  virtio_net: enable VQs early
  virtio: add API to enable VQs early
  virtio_net: minor cleanup
  virtio-net: drop config_mutex
  virtio_net: drop config_enable
  virtio-blk: drop config_mutex
  ...
2014-10-18 10:25:09 -07:00
Linus Torvalds
6b04908166 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph updates from Sage Weil:
 "There is the long-awaited discard support for RBD (Guangliang Zhao,
  Josh Durgin), a pile of RBD bug fixes that didn't belong in late -rc's
  (Ilya Dryomov, Li RongQing), a pile of fs/ceph bug fixes and
  performance and debugging improvements (Yan, Zheng, John Spray), and a
  smattering of cleanups (Chao Yu, Fabian Frederick, Joe Perches)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (40 commits)
  ceph: fix divide-by-zero in __validate_layout()
  rbd: rbd workqueues need a resque worker
  libceph: ceph-msgr workqueue needs a resque worker
  ceph: fix bool assignments
  libceph: separate multiple ops with commas in debugfs output
  libceph: sync osd op definitions in rados.h
  libceph: remove redundant declaration
  ceph: additional debugfs output
  ceph: export ceph_session_state_name function
  ceph: include the initial ACL in create/mkdir/mknod MDS requests
  ceph: use pagelist to present MDS request data
  libceph: reference counting pagelist
  ceph: fix llistxattr on symlink
  ceph: send client metadata to MDS
  ceph: remove redundant code for max file size verification
  ceph: remove redundant io_iter_advance()
  ceph: move ceph_find_inode() outside the s_mutex
  ceph: request xattrs if xattr_version is zero
  rbd: set the remaining discard properties to enable support
  rbd: use helpers to handle discard for layered images correctly
  ...
2014-10-15 06:46:01 +02:00
Michael S. Tsirkin
6d62c37f19 virtio_blk: enable VQs early on restore
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after restore returns, virtio block violated
this rule on restore by restarting queues, which might in theory
cause the VQ to be used directly within restore.

To fix, call virtio_device_ready before using starting queues.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-10-15 10:25:07 +10:30
Michael S. Tsirkin
7a11370e5e virtio_blk: enable VQs early
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after probe returns, virtio block violated this
rule by calling add_disk, which causes the VQ to be used directly within
probe.

To fix, call virtio_device_ready before using VQs.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-10-15 10:25:02 +10:30
Michael S. Tsirkin
1f54b0c055 virtio-blk: drop config_mutex
config_mutex served two purposes: prevent multiple concurrent config
change handlers, and synchronize access to config_enable flag.

Since commit dbf2576e37
    workqueue: make all workqueues non-reentrant
all workqueues are non-reentrant, and config_enable
is now gone.

Get rid of the unnecessary lock.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-10-15 10:24:57 +10:30
Michael S. Tsirkin
cc74f71934 virtio_blk: drop config_enable
Now that virtio core ensures config changes don't
arrive during probing, drop config_enable flag
in virtio blk.
On removal, flush is now sufficient to guarantee that
no change work is queued.

This help simplify the driver, and will allow
setting DRIVER_OK earlier without losing config
change notifications.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-10-15 10:24:56 +10:30
Ilya Dryomov
792c3a9149 rbd: rbd workqueues need a resque worker
Need to use WQ_MEM_RECLAIM for our workqueues to prevent I/O lockups
under memory pressure - we sit on the memory reclaim path.

Cc: stable@vger.kernel.org # 3.17, needs backporting for 3.16
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Tested-by: Micha Krause <micha@krausam.de>
Reviewed-by: Sage Weil <sage@redhat.com>
2014-10-14 12:57:05 -07:00
Josh Durgin
b76f82398c rbd: set the remaining discard properties to enable support
max_discard_sectors must be set for the queue to support discard.
Operations implementing discard for rbd zero data, so report that.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-10-14 21:03:37 +04:00
Josh Durgin
d3246fb0da rbd: use helpers to handle discard for layered images correctly
Only allocate two osd ops for discard requests, since the
preallocation hint is only added for regular writes.  Use
rbd_img_obj_request_fill() to recreate the original write or discard
osd operations, isolating that logic to one place, and change the
assert in rbd_osd_req_create_copyup() to accept discard requests as
well.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-10-14 21:03:36 +04:00
Josh Durgin
3b434a2aff rbd: extract a method for adding object operations
rbd_img_request_fill() creates a ceph_osd_request and has logic for
adding the appropriate osd ops to it based on the request type and
image properties.

For layered images, the original rbd_obj_request is resent with a
copyup operation in front, using a new ceph_osd_request. The logic for
adding the original operations should be the same as when first
sending them, so move it to a helper function.

op_type only needs to be checked once, so create a helper for that as
well and call it outside the loop in rbd_img_request_fill().

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-10-14 21:03:35 +04:00
Josh Durgin
1c220881e3 rbd: make discard trigger copy-on-write
Discard requests are a form of write, so they should go through the
same process as plain write requests and trigger copy-on-write for
layered images.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-10-14 21:03:34 +04:00
Josh Durgin
d0265de7c3 rbd: tolerate -ENOENT for discard operations
Discard may try to delete an object from a non-layered image that does not exist.
If this occurs, the image already has no data in that range, so change the
result to success.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-10-14 21:03:33 +04:00
Josh Durgin
bef95455a4 rbd: fix snapshot context reference count for discards
Discards take a reference to the snapshot context of an image when
they are created.  This reference needs to be cleaned up when the
request is done just as it is for regular writes.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-10-14 21:03:32 +04:00
Josh Durgin
3c5df89367 rbd: read image size for discard check safely
In rbd_img_request_fill() the image size is only checked to determine
whether we can truncate an object instead of zeroing it for discard
requests. Take rbd_dev->header_rwsem while reading the image size, and
move this read into the discard check, so that non-discard ops don't
need to take the semaphore in this function.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-10-14 21:03:32 +04:00
Guangliang Zhao
90e98c5229 rbd: initial discard bits from Guangliang Zhao
This patch add the discard support for rbd driver.

There are three types operation in the driver:
1. The objects would be removed if they completely contained
   within the discard range.
2. The objects would be truncated if they partly contained within
   the discard range, and align with their boundary.
3. Others would be zeroed.

A discard request from blkdev_issue_discard() is defined which
REQ_WRITE and REQ_DISCARD both marked and no data, so we must
check the REQ_DISCARD first when getting the request type.

This resolve:
	http://tracker.ceph.com/issues/190

[ Ilya Dryomov: This is incomplete and somewhat buggy, see follow up
  commits by Josh Durgin for refinements and fixes which weren't
  folded in to preserve authorship. ]

Signed-off-by: Guangliang Zhao <lucienchao@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-10-14 21:03:31 +04:00
Guangliang Zhao
6d2940c881 rbd: extend the operation type
It could only handle the read and write operations now,
extend it for the coming discard support.

Signed-off-by: Guangliang Zhao <lucienchao@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-10-14 21:03:30 +04:00
Guangliang Zhao
c622d22615 rbd: skip the copyup when an entire object writing
It need to copyup the parent's content when layered writing,
but an entire object write would overwrite it, so skip it.

Signed-off-by: Guangliang Zhao <lucienchao@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-10-14 21:03:29 +04:00
Ilya Dryomov
70d045f660 rbd: add img_obj_request_simple() helper
To clarify the conditions and make it easier to add new ones.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-10-14 21:03:28 +04:00
Josh Durgin
4e752f0ab0 rbd: access snapshot context and mapping size safely
These fields may both change while the image is mapped if a snapshot
is created or deleted or the image is resized.  They are guarded by
rbd_dev->header_rwsem, so hold that while reading them, and store a
local copy to refer to outside of the critical section. The local copy
will stay consistent since the snapshot context is reference counted,
and the mapping size is just a u64. This prevents torn loads from
giving us inconsistent values.

Move reading header.snapc into the caller of rbd_img_request_create()
so that we only need to take the semaphore once. The read-only caller,
rbd_parent_request_create() can just pass NULL for snapc, since the
snapshot context is only relevant for writes.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-10-14 21:03:27 +04:00
Ilya Dryomov
7dd440c9e0 rbd: do not return -ERANGE on auth failures
Trying to map an image out of a pool for which we don't have an 'x'
permission bit fails with -ERANGE from ceph_extract_encoded_string()
due to an unsigned vs signed bug.  Fix it and get rid of the -EINVAL
sink, thus propagating rbd::get_id cls method errors.  (I've seen
a bunch of unexplained -ERANGE reports, I bet this is it).

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-10-14 21:03:26 +04:00