linux

mirror of https://github.com/FEX-Emu/linux.git synced 2024-12-23 01:40:30 +00:00

Author	SHA1	Message	Date
Alex Elder	ed63f4fd9a	rbd: drop rbd_header_from_disk() gfp_flags parameter The function rbd_header_from_disk() is only called in one spot, and it passes GFP_KERNEL as its value for the gfp_flags parameter. Just drop that parameter and substitute GFP_KERNEL everywhere within that function it had been used. (If we find we need the parameter again in the future it's easy enough to add back again.) Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:50 -07:00
Alex Elder	9a5d690b08	rbd: snapc is unused in rbd_req_sync_read() The "snapc" parameter to in rbd_req_sync_read() is not used, so get rid of it. Reported-by: Josh Durgin <josh.durgin@inktank.com> Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:49 -07:00
Alex Elder	de71a2970d	rbd: rename rbd_device->id The "id" field of an rbd device structure represents the unique client-local device id mapped to the underlying rbd image. Each rbd image will have another id--the image id--and each snapshot has its own id as well. The simple name "id" no longer conveys the information one might like to have. Rename the device "id" field in struct rbd_dev to be "dev_id" to make it a little more obvious what we're dealing with without having to think more about context. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:49 -07:00
Alex Elder	8e94af8e2b	rbd: encapsulate header validity test If an rbd image header is read and it doesn't begin with the expected magic information, a warning is displayed. This is a fairly simple test, but it could be extended at some point. Fix the comparison so it actually looks at the "text" field rather than the front of the structure. In any case, encapsulate the validity test in its own function. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:48 -07:00
Alex Elder	bd919d45aa	rbd: clean up a few dout() calls There was a dout() call in rbd_do_request() that was reporting the reporting the offset as the length and vice versa. While fixing that I did a quick scan of other dout() calls and fixed a couple of other minor things. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:46 -07:00
Alex Elder	a059329056	rbd: simplify __rbd_remove_all_snaps() This just replaces a while loop with list_for_each_entry_safe() in __rbd_remove_all_snaps(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:45 -07:00
Alex Elder	a66f8c97a3	rbd: drop extra header_rwsem init In commit `c666601a` there was inadvertently added an extra initialization of rbd_dev->header_rwsem. This gets rid of the duplicate. Reported-by: Guangliang Zhao <gzhao@suse.com> Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:45 -07:00
Alex Elder	9e15dc735a	rbd: kill rbd_image_header->snap_seq The snap_seq field in an rbd_image_header structure held the value from the rbd image header when it was last refreshed. We now maintain this value in the snapc->seq field. So get rid of the other one. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:44 -07:00
Alex Elder	505cbb9bed	rbd: set snapc->seq only when refreshing header In rbd_header_add_snap() there is code to set snapc->seq to the just-added snapshot id. This is the only remnant left of the use of that field for recording which snapshot an rbd_dev was associated with. That functionality is no longer supported, so get rid of that final bit of code. Doing so means we never actually set snapc->seq any more. On the server, the snapshot context's sequence value represents the highest snapshot id ever issued for a particular rbd image. So we'll make it have that meaning here as well. To do so, set this value whenever the rbd header is (re-)read. That way it will always be consistent with the rest of the snapshot context we maintain. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:43 -07:00
Alex Elder	78dc447d3c	rbd: preserve snapc->seq in rbd_header_set_snap() In rbd_header_set_snap(), there is logic to make the snap context's seq field get set to a particular snapshot id, or 0 if there is no snapshot for the rbd image. This seems to be an artifact of how the current snapshot id for an rbd_dev was recorded before the rbd_dev->snap_id field began to be used for that purpose. There's no need to update the value of snapc->seq here any more, so stop doing it. Tidy up a few local variables in that function while we're at it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:42 -07:00
Alex Elder	75fe9e1981	rbd: don't use snapc->seq that way In what appears to be an artifact of a different way of encoding whether an rbd image maps a snapshot, __rbd_refresh_header() has code that arranges to update the seq value in an rbd image's snapshot context to point to the first entry in its snapshot array if that's where it was pointing initially. We now use rbd_dev->snap_id to record the snapshot id--using the special value CEPH_NOSNAP to indicate the rbd_dev is not mapping a snapshot at all. There is therefore no need to check for this case, nor to update the seq value, in __rbd_refresh_header(). Just preserve the seq value that rbd_read_header() provides (which, at the moment, is nothing). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:41 -07:00
Josh Durgin	a71b891bc7	rbd: send header version when notifying Previously the original header version was sent. Now, we update it when the header changes. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-07-30 18:15:40 -07:00
Josh Durgin	d1d2564654	rbd: use reference counting for the snap context This prevents a race between requests with a given snap context and header updates that free it. The osd client was already expecting the snap context to be reference counted, since it get()s it in ceph_osdc_build_request and put()s it when the request completes. Also remove the second down_read()/up_read() on header_rwsem in rbd_do_request, which wasn't actually preventing this race or protecting any other data. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-07-30 18:15:40 -07:00
Josh Durgin	93a24e084d	rbd: set image size when header is updated The image may have been resized. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-07-30 18:15:39 -07:00
Josh Durgin	a51aa0c042	rbd: expose the correct size of the device in sysfs If an image was mapped to a snapshot, the size of the head version would be shown. Protect capacity with header_rwsem, since it may change. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-07-30 18:15:38 -07:00
Josh Durgin	474ef7ce83	rbd: only reset capacity when pointing to head Snapshots cannot be resized, and the new capacity of head should not be reflected by the snapshot. Signed-off-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-07-30 18:15:37 -07:00
Josh Durgin	e88a36ec96	rbd: return errors for mapped but deleted snapshot When a snapshot is deleted, the OSD will return ENOENT when reading from it. This is normally interpreted as a hole by rbd, which will return zeroes. To minimize the time in which this can happen, stop requests early when we are notified that our snapshot no longer exists. [elder@inktank.com: updated __rbd_init_snaps_header() logic] Signed-off-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-07-30 18:15:36 -07:00
Alex Elder	d1f57ea663	rbd: kill num_reply parameters Several functions include a num_reply parameter, but it is never used. Just get rid of it everywhere--it seems to be something that never got fully implemented. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:09 -07:00
Alex Elder	43ae470112	rbd: option symbol renames Use the name "ceph_opts" consistently (rather than just "opt") for pointers to a ceph_options structure. Change the few spots that don't use "rbd_opts" for a rbd_options pointer to match the rest. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:08 -07:00
Alex Elder	aded07ea9f	rbd: more symbol renames Rename variables named "obj" which represent object names so they're consistently named "object_name". Rename the "cls" and "method" parameters in rbd_req_sync_exec() to be "class_name" and "method_name", and make similar changes to the names of local variables in that function representing the lengths of those names. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:07 -07:00
Alex Elder	0bed54dc9a	rbd: rename some fields in struct rbd_dev An rbd image is not a single object, but a logical construct made up of an aggregation of objects. Rename some fields in struct rbd_dev, in hopes of reinforcing this. obj --> image_name obj_len --> image_name_len obj_md_name --> header_name Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:06 -07:00
Alex Elder	0ce1a79413	rbd: use rbd_dev consistently Most variables that represent a struct rbd_device are named "rbd_dev", but in some cases "dev" is used instead. Change all the "dev" references so they use "rbd_dev" consistently, to make it clear from the name that we're working with an RBD device (as opposed to, for example, a struct device). Similarly, change the name of the "dev" field in struct rbd_notify_info to be "rbd_dev". Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:05 -07:00
Alex Elder	820a5f3e94	rbd: dynamically allocate snapshot name There is no need to impose a small limit the length of the snapshot name recorded for an rbd image in a struct rbd_dev. Remove the limitation by allocating space for the snapshot name dynamically. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:04 -07:00
Alex Elder	bf3e5ae112	rbd: dynamically allocate image name There is no need to impose a small limit the length of the rbd image name recorded in a struct rbd_dev. Remove the limitation by allocating space for the image name dynamically. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:04 -07:00
Alex Elder	cb8627c76d	rbd: dynamically allocate image header name There is no need to impose a small limit the length of the header name recorded for an rbd image in a struct rbd_dev. Remove the limitation by allocating space for the header name dynamically. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:03 -07:00
Alex Elder	849b4260d4	rbd: dynamically allocate object prefix There is no need to impose a small limit the length of the object prefix recorded for an rbd image in a struct rbd_image_header. Remove the limitation by allocating space for the object prefix dynamically. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:02 -07:00
Alex Elder	d22f76e703	rbd: dynamically allocate pool name There is no need to impose a small limit the length of the pool name recorded for an rbd image in a struct rbd_device. Remove the limitation by allocating space for the pool name ynamically. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:01 -07:00
Alex Elder	9bb2f334b9	rbd: create pool_id device attribute Add an entry under /sys/bus/rbd/devices/<N>/ named "pool_id" that provides the id for the pool the rbd image is assocatied with. This is in addition to the pool name already provided. Rename the "poolid" field in struct rbd_device to be "pool_id". Update the documentation to reflect the addition of this new entry. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:00 -07:00
Alex Elder	ca1e49a6af	rbd: rename rbd_dev->block_name Each rbd image has a name that forms the basis of all data objects backing the device. Old (format 1) images refer to this name as the "block name," while new (format 2) images use the term "object prefix" for this. Change the field name in the in-core rbd image header structure to reflect the more modern usage. We intentionally keep the the name "block_name" in the on-disk definition for format 1 image headers. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:29:59 -07:00
Alex Elder	ea3352f4aa	rbd: define dup_token() Define a new function dup_token(), to be used during argument parsing for making dynamically-allocated copies of tokens being parsed. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:29:58 -07:00
Alex Elder	ad4f232f28	rbd: drop a useless local variable In rbd_req_sync_notify_ack(), a local variable was needlessly being used to hold a null pointer. Just pass NULL instead. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:29:56 -07:00
Jens Axboe	72ea1f74fc	Merge branch 'for-jens' of git://git.drbd.org/linux-drbd into for-3.6/drivers	2012-07-30 09:03:10 +02:00
Paolo Bonzini	cd5d503862	virtio-blk: allow toggling host cache between writeback and writethrough This patch adds support for the new VIRTIO_BLK_F_CONFIG_WCE feature, which exposes the cache mode in the configuration space and lets the driver modify it. The cache mode is exposed via sysfs. Even if the host does not support the new feature, the cache mode is visible (thanks to the existing VIRTIO_BLK_F_WCE), but not modifiable. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-07-30 13:30:52 +09:30
Asias He	2c95a32909	virtio-blk: Use block layer provided spinlock Block layer will allocate a spinlock for the queue if the driver does not provide one in blk_init_queue(). The reason to use the internal spinlock is that blk_cleanup_queue() will switch to use the internal spinlock in the cleanup code path. if (q->queue_lock != &q->__queue_lock) q->queue_lock = &q->__queue_lock; However, processes which are in D state might have taken the driver provided spinlock, when the processes wake up, they would release the block provided spinlock. ===================================== [ BUG: bad unlock balance detected! ] 3.4.0-rc7+ #238 Not tainted ------------------------------------- fio/3587 is trying to release lock (&(&q->__queue_lock)->rlock) at: [<ffffffff813274d2>] blk_queue_bio+0x2a2/0x380 but there are no more locks to release! other info that might help us debug this: 1 lock held by fio/3587: #0: (&(&vblk->lock)->rlock){......}, at: [<ffffffff8132661a>] get_request_wait+0x19a/0x250 Other drivers use block layer provided spinlock as well, e.g. SCSI. Switching to the block layer provided spinlock saves a bit of memory and does not increase lock contention. Performance test shows no real difference is observed before and after this patch. Changes in v2: Improve commit log as Michael suggested. Cc: virtualization@lists.linux-foundation.org Cc: kvm@vger.kernel.org Cc: stable@kernel.org Signed-off-by: Asias He <asias@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-07-30 13:30:51 +09:30
Asias He	483001c765	virtio-blk: Reset device after blk_cleanup_queue() blk_cleanup_queue() will call blk_drian_queue() to drain all the requests before queue DEAD marking. If we reset the device before blk_cleanup_queue() the drain would fail. 1) if the queue is stopped in do_virtblk_request() because device is full, the q->request_fn() will not be called. blk_drain_queue() { while(true) { ... if (!list_empty(&q->queue_head)) __blk_run_queue(q) { if (queue is not stoped) q->request_fn() } ... } } Do no reset the device before blk_cleanup_queue() gives the chance to start the queue in interrupt handler blk_done(). 2) In commit `b79d866c8b`, We abort requests dispatched to driver before blk_cleanup_queue(). There is a race if requests are dispatched to driver after the abort and before the queue DEAD mark. To fix this, instead of aborting the requests explicitly, we can just reset the device after after blk_cleanup_queue so that the device can complete all the requests before queue DEAD marking in the drain process. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: virtualization@lists.linux-foundation.org Cc: kvm@vger.kernel.org Cc: stable@kernel.org Signed-off-by: Asias He <asias@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-07-30 13:30:51 +09:30
Asias He	02e2b12494	virtio-blk: Call del_gendisk() before disable guest kick del_gendisk() might not return due to failing to remove the /sys/block/vda/serial sysfs entry when another thread (udev) is trying to read it. virtblk_remove() vdev->config->reset() : guest will not kick us through interrupt del_gendisk() device_del() kobject_del(): got stuck, sysfs entry ref count non zero sysfs_open_file(): user space process read /sys/block/vda/serial sysfs_get_active() : got sysfs entry ref count dev_attr_show() virtblk_serial_show() blk_execute_rq() : got stuck, interrupt is disabled request cannot be finished This patch fixes it by calling del_gendisk() before we disable guest's interrupt so that the request sent in virtblk_serial_show() will be finished and del_gendisk() will success. This fixes another race in hot-unplug process. It is save to call del_gendisk(vblk->disk) before flush_work(&vblk->config_work) which might access vblk->disk, because vblk->disk is not freed until put_disk(vblk->disk). Cc: virtualization@lists.linux-foundation.org Cc: kvm@vger.kernel.org Cc: stable@kernel.org Signed-off-by: Asias He <asias@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-07-30 13:30:51 +09:30
Keith Busch	c7d36ab8fa	NVMe: Fix uninitialized iod compiler warning Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-07-27 14:25:22 -04:00
Keith Busch	a0cadb85b8	NVMe: Do not set IO queue depth beyond device max Set the depth for IO queues to the device's maximum supported queue entries if the requested depth exceeds the device's capabilities. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-07-27 13:57:23 -04:00
Keith Busch	8fc23e032d	NVMe: Set block queue max sectors Set the max hw sectors in a namespace's request queue if the nvme device has a max data transfer size. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-07-26 13:43:20 -04:00
Keith Busch	a42ceccef0	NVMe: use namespace id for nvme_get_features The specification does not provide a use for command dword11 in the NVMe Get Features command, but does use the NSID for some features. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-07-26 12:46:36 -04:00
Keith Busch	50af8baec4	NVMe: replace nvme_ns with nvme_dev for user admin The function nvme_user_admin_command does not require a namespace to proceed. Replace with the nvme_dev structure so that it can be called from contexts that do not have a namespace. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-07-26 12:24:36 -04:00
Keith Busch	5c42ea1643	NVMe: Fix nvme module init when nvme_major is set register_blkdev returns 0 when given a valid major number. Reported-by:Ross Zwisler <ross.zwisler@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-07-26 12:23:38 -04:00
Keith Busch	e9ef46369f	NVMe: Set request queue logical block size Sets the request queue logical block size with the block size of the namespace. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-07-25 14:52:49 -04:00
Lars Ellenberg	a73ff3231d	drbd: announce FLUSH/FUA capability to upper layers Unconditionally announce FLUSH/FUA to upper layers. If the lower layers on either node do not actually support this, generic_make_request() will deal with it. If this causes performance regressions on your setup, make sure there are no volatile caches involved, and mount -o nobarrier or equivalent. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-07-24 15:14:28 +02:00
Lars Ellenberg	db141b2f42	drbd: fix max_bio_size to be unsigned We capped our max_bio_size respectively max_hw_sectors with min_t(int, lower level limit, our limit); unfortunately, some drivers, e.g. the kvm virtio block driver, initialize their limits to "-1U", and that is of course a smaller "int" value than our limit. Impact: we started to request 16 MB resync requests, which lead to protocol error and a reconnect loop. Fix all relevant constants and parameters to be unsigned int. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-07-24 15:14:00 +02:00
Lars Ellenberg	7ee1fb93f3	drbd: flush drbd work queue before invalidate/invalidate remote If you do back to back wait-sync/invalidate on a Primary in a tight loop, during application IO load, you could trigger a race: kernel: block drbd6: FIXME going to queue 'set_n_write from StartingSync' but 'write from resync_finished' still pending? Fix this by changing the order of the drbd_queue_work() and the wake_up() in dec_ap_pending(), and adding the additional drbd_flush_workqueue() before requesting the full sync. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-07-24 14:15:58 +02:00
Lars Ellenberg	c12e9c8964	drbd: fix potential access after free Occasionally, if we disconnect, we triggered this assert: block drbd7: ASSERT FAILED tl_hash[27] == c30b0f04, expected NULL hlist_del() happens only on master bio completion. We used to wait for pending IO to complete before freeing tl_hash on disconnect. We no longer do so, since we learned to "freeze" IO on disconnect. If the local disk is too slow, we may reach C_STANDALONE early, and there are still some requests pending locally when we call drbd_free_tl_hash(). If we now free the tl_hash, and later the local IO completion completes the master bio, which then does hlist_del() and clobbers freed memory. Do hlist_del_init() and hlist_add_fake() before kfree(tl_hash), so the hlist_del() on master bio completion is harmless. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-07-24 14:15:16 +02:00
Lars Ellenberg	63a6d0bb3d	drbd: call local-io-error handler early In case we want to hard-reset from the local-io-error handler, we need to call it before notifying the peer or aborting local IO. Otherwise the peer will advance its data generation UUIDs even if secondary. This way, local io error looks like a "regular" node crash, which reduces the number of different failure cases. This may be useful in a bigger picture where crashed or otherwise "misbehaving" nodes are automatically re-deployed. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-07-24 14:10:41 +02:00
Lars Ellenberg	0029d62434	drbd: do not reset rs_pending_cnt too early Fix asserts like block drbd0: in got_BlockAck:4634: rs_pending_cnt = -35 < 0 ! We reset the resync lru cache and related information (rs_pending_cnt), once we successfully finished a resync or online verify, or if the replication connection is lost. We also need to reset it if a resync or online verify is aborted because a lower level disk failed. In that case the replication link is still established, and we may still have packets queued in the network buffers which want to touch rs_pending_cnt. We do not have any synchronization mechanism to know for sure when all such pending resync related packets have been drained. To avoid this counter to go negative (and violate the ASSERT that it will always be >= 0), just do not reset it when we lose a disk. It is good enough to make sure it is re-initialized before the next resync can start: reset it when we re-attach a disk. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-07-24 14:09:53 +02:00
Lars Ellenberg	88437879fb	drbd: reset congestion information before reporting it in /proc/drbd We cache the congestion status in mdev->congestion_reason whenever drbd_congested() was called. Reset this cached info before reporting it when reading /proc/drbd. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-07-24 14:07:48 +02:00
Lars Ellenberg	c2ba686f35	drbd: report congestion if we are waiting for some userland callback If the drbd worker thread is synchronously waiting for some userland callback, we don't want some casual pageout to block on us. Have drbd_congested() report congestion in that case. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-07-24 14:07:18 +02:00
Lars Ellenberg	383606e0de	drbd: differentiate between normal and forced detach Aborting local requests (not waiting for completion from the lower level disk) is dangerous: if the master bio has been completed to upper layers, data pages may be re-used for other things already. If local IO is still pending and later completes, this may cause crashes or corrupt unrelated data. Only abort local IO if explicitly requested. Intended use case is a lower level device that turned into a tarpit, not completing io requests, not even doing error completion. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-07-24 14:06:18 +02:00
Lars Ellenberg	d264580145	drbd: cleanup, remove two unused global flags Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-07-24 14:02:41 +02:00
Jens Axboe	b1af9be5ef	Merge branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/floppy into for-3.6/drivers	2012-07-24 13:15:08 +02:00
Linus Torvalds	7100e505b7	Power management updates for 3.6 * ACPI conversion to PM handling based on struct dev_pm_ops. * Conversion of a number of platform drivers to PM handling based on struct dev_pm_ops and removal of empty legacy PM callbacks from a couple of PCI drivers. * Suspend-to-both for in-kernel hibernation from Bojan Smojver. * cpuidle fixes and cleanups from ShuoX Liu, Daniel Lezcano and Preeti U Murthy. * cpufreq bug fixes from Jonghwa Lee and Stephen Boyd. * Suspend and hibernate fixes from Srivatsa S. Bhat and Colin Cross. * Generic PM domains framework updates. * RTC CMOS wakeup signaling update from Paul Fox. * sparse warnings fixes from Sachin Kamat. * Build warnings fixes for the generic PM domains framework and PM sysfs code. * sysfs switch for printing device suspend times from Sameer Nanda. * Documentation fix from Oskar Schirmer. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iQIcBAABAgAGBQJQDF5eAAoJEKhOf7ml8uNsEaAP/2wg4faoOGob5A0/7tLqG3Cw xnTmGsfL7wG07Q8ykCL1BSlBb1VeJz8L6LTmUpaABI4M//oIBlcYQKyCE0Tat1AO 9bJXFzK7qcHMhkTz6d6LDqtVzR3NGM3ypjZqj8aEXBov07LMR1AXvgNwXXhv25zM 0unwrh1XNinBN3n+oaktpWk1YHUjsa5IMU+2tQJrocuHXcgK30vGXZVrZ4g9w1c2 eS+ED1oKUqOYtFzIUX+aCtaDDheGaPlugk/GOtIB7Sae0s0vMlxH/T5ncB4SxRC+ v3s4OykqQc5Dc8+0bNlBH7ykSVNB0PoQiyKDY67CxtH+q1xQSc9/f3XJqnGMaVDE 17eZUZsL4qSyzRuCbPCGAgwBHmx3qNCMu1i1BcmnSxU+ikPUeCR7mYOP0mRThwPH OSfs+c/vZ+Ow6CwVE4UFrbm9Jve7ADnCrlZzT2m6XjhHGyjKP7SJlzP9TPsZ0LRk oxgQDYHmxbo50t9tBCz5L4ZTMKkDp28e78x84/CteP85srcW3GqDxrPyp2uzJu5O tvIEBvVlc4ucq8sG83RkugQwrG/2cQwG2HO9ERAwq01HHA1BYsuU3A961Jqf5CZo nFRSnByvVj/imPf47OWpDPAbVEs7jxufJuLEbPwGj1MkttTGDBIRu3zldXt2S6kP Q4qYU6fDaQQHFc90pqxQ =vC4/ -----END PGP SIGNATURE----- Merge tag 'pm-for-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: - ACPI conversion to PM handling based on struct dev_pm_ops. - Conversion of a number of platform drivers to PM handling based on struct dev_pm_ops and removal of empty legacy PM callbacks from a couple of PCI drivers. - Suspend-to-both for in-kernel hibernation from Bojan Smojver. - cpuidle fixes and cleanups from ShuoX Liu, Daniel Lezcano and Preeti Murthy. - cpufreq bug fixes from Jonghwa Lee and Stephen Boyd. - Suspend and hibernate fixes from Srivatsa Bhat and Colin Cross. - Generic PM domains framework updates. - RTC CMOS wakeup signaling update from Paul Fox. - sparse warnings fixes from Sachin Kamat. - Build warnings fixes for the generic PM domains framework and PM sysfs code. - sysfs switch for printing device suspend times from Sameer Nanda. - Documentation fix from Oskar Schirmer. * tag 'pm-for-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (70 commits) cpufreq: Fix sysfs deadlock with concurrent hotplug/frequency switch EXYNOS: bugfix on retrieving old_index from freqs.old PM / Sleep: call early resume handlers when suspend_noirq fails PM / QoS: Use NULL pointer instead of plain integer in qos.c PM / QoS: Use NULL pointer instead of plain integer in pm_qos.h PM / Sleep: Require CAP_BLOCK_SUSPEND to use wake_lock/wake_unlock PM / Sleep: Add missing static storage class specifiers in main.c cpuilde / ACPI: remove time from acpi_processor_cx structure cpuidle / ACPI: remove usage from acpi_processor_cx structure cpuidle / ACPI : remove latency_ticks from acpi_processor_cx structure rtc-cmos: report wakeups from interrupt handler PM / Sleep: Fix build warning in sysfs.c for CONFIG_PM_SLEEP unset PM / Domains: Fix build warning for CONFIG_PM_RUNTIME unset olpc-xo15-sci: Use struct dev_pm_ops for power management PM / Domains: Replace plain integer with NULL pointer in domain.c file PM / Domains: Add missing static storage class specifier in domain.c file PM / crypto / ux500: Use struct dev_pm_ops for power management PM / IPMI: Remove empty legacy PCI PM callbacks tpm_nsc: Use struct dev_pm_ops for power management tpm_tis: Use struct dev_pm_ops for power management ...	2012-07-22 13:36:52 -07:00
Theodore Ts'o	89c30f161c	xen-blkfront: remove IRQF_SAMPLE_RANDOM which is now a no-op With the changes in the random tree, IRQF_SAMPLE_RANDOM is now a no-op; interrupt randomness is now collected unconditionally in a very low-overhead fashion; see commit `775f4b297b`. The IRQF_SAMPLE_RANDOM flag was scheduled to be removed in 2009 on the feature-removal-schedule, so this patch is preparation for the final removal of this flag. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org>	2012-07-19 10:39:25 -04:00
Rafael J. Wysocki	bfaa07bc32	Merge branch 'pm-drivers' * pm-drivers: rtc-cmos: report wakeups from interrupt handler PM / crypto / ux500: Use struct dev_pm_ops for power management PM / IPMI: Remove empty legacy PCI PM callbacks tpm_nsc: Use struct dev_pm_ops for power management tpm_tis: Use struct dev_pm_ops for power management tpm_atmel: Use struct dev_pm_ops for power management PM / TPM: Drop unused pm_message_t argument from tpm_pm_suspend() omap-rng: Use struct dev_pm_ops for power management mg_disk: Use struct dev_pm_ops for power management msi-laptop: Use struct dev_pm_ops for power management hdaps: Use struct dev_pm_ops for power management sonypi: Use struct dev_pm_ops for power management intel_mid_thermal: Use struct dev_pm_ops for power management acer-wmi: Use struct dev_pm_ops for power management intel_ips: Remove empty legacy PM callbacks thinkpad_acpi: Use struct dev_pm_ops instead of legacy PM routines thinkpad_acpi: Drop pm_message_t arguments from suspend routines	2012-07-19 00:03:42 +02:00
Dan Carpenter	6a3ca4f188	rbd: endian bug in rbd_req_cb() Sparse complains about this because: drivers/block/rbd.c:996:20: warning: cast to restricted __le32 drivers/block/rbd.c:996:20: warning: cast from restricted __le16 These are set in osd_req_encode_op() and they are le16. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Alex Elder <elder@inktank.com> (cherry picked from commit `895cfcc810`)	2012-07-17 21:30:31 -07:00
Yan, Zheng	236df3755d	rbd: Fix ceph_snap_context size calculation ceph_snap_context->snaps is an u64 array Signed-off-by: Zheng Yan <zheng.z.yan@intel.com> Reviewed-by: Alex Elder <elder@inktank.com> (cherry picked from commit `f9f9a19044`)	2012-07-17 21:30:19 -07:00
Silva Paulo	68d740d79c	blk: fix wrong idr_pre_get() error check in loop.c The idr_pre_get() function never returns a value < 0. It returns 0 (no memory) or 1 (OK). Reported-by: Silva Paulo <psdasilva@yahoo.com> [ Rewrote Silva's patch, but attributing it to Silva anyway - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-14 15:39:58 -07:00
Rafael J. Wysocki	156ffcb42a	mg_disk: Use struct dev_pm_ops for power management Make the mg_disk driver define its PM callbacks through a struct dev_pm_ops object rather than by using legacy PM hooks in struct platform_driver. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2012-07-06 19:07:00 +02:00
Andi Kleen	0cc15d03bc	floppy: Run floppy initialization asynchronous floppy_init is quite slow, 3s on my test system to determine that there is no floppy. Run it asynchronous to the other init calls to improve boot time. [jkosina@suse.cz: fix modular build] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2012-07-04 14:01:40 +02:00
Linus Torvalds	dab058fd5f	floppy: cancel any pending fd_timeouts before adding a new one In commit `070ad7e793` ("floppy: convert to delayed work and single-thread wq") the 'fd_timeout' timer was converted to a delayed work. However, the "del_timer(&fd_timeout)" was lost in the process, and any previous pending timeouts would stay active when we then re-queued the timeout. This resulted in the floppy probe sequence having a (stale) 20s timeout rather than the intended 3s timeout, and thus made booting with the floppy driver (but no actual floppy controller) take much longer than it should. Of course, there's little reason for most people to compile the floppy driver into the kernel at all, which is why most people never noticed. Canceling the delayed work where we used to do the del_timer() fixes the issue, and makes the floppy probing use the proper new timeout instead. The three second timeout is still very wasteful, but better than the 20s one. Reported-and-tested-by: Andi Kleen <ak@linux.intel.com> Reported-and-tested-by: Calvin Walton <calvin.walton@kepstin.ca> Cc: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-03 15:51:22 -07:00
Sage Weil	9a64e8e0ac	Linux 3.5-rc1 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iQEcBAABAgAGBQJPyr4LAAoJEHm+PkMAQRiGhvMH/1uXaDmJiiyAtMhC9kQbLclK 5RpUOV+ukRrPXBJhwWGEZvC9G/DiWAfZ/19Ee6qTGZbA46yxkgZklqO+bw7fuOLH dPf4MNXdhgOgbs0KkVAk6aXIYzIU836pcYg+LcapG8E8SZp3SWbJzrVbUPFwPM+m Sv11ZcpJfM2HH9wFRdKErUOiZHsMY+LZHcw0nx+BObytjgzBbzHNkpF57F714TUO QplYpIToO3XtGhIM1yRDxww+2zFlVNsCZ8IC57EDbLb8BMZWuyZoFgWZqLAnrU0u vy7CHLledMSvs855juJ9JxGo/EDnfwJpCnjmcp8BY+h4b5T/k5mGK6d9aeXYRf4= =CcWn -----END PGP SIGNATURE----- Merge tag 'v3.5-rc1' Linux 3.5-rc1 Conflicts: net/ceph/messenger.c	2012-06-15 12:32:04 -07:00
Jens Axboe	6d407cfaf5	Merge branch 'for-jens' of git://git.drbd.org/linux-drbd into for-linus	2012-06-13 21:19:42 +02:00
Jens Axboe	987751719b	Merge branch 'stable/for-jens-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linus	2012-06-13 21:19:06 +02:00
Tao Guo	32587371ad	umem: fix up unplugging Fix a regression introduced by `7eaceaccab` ("block: remove per-queue plugging"). In that patch, Jens removed the whole mm_unplug_device() function, which used to be the trigger to make umem start to work. We need to implement unplugging to make umem start to work, or I/O will never be triggered. Signed-off-by: Tao Guo <Tao.Guo@emc.com> Cc: Neil Brown <neilb@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Shaohua Li <shli@kernel.org> Cc: <stable@vger.kernel.org> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-06-13 21:17:21 +02:00
Lars Ellenberg	0d5934e3c2	drbd: fix null pointer dereference with on-congestion policy when diskless We must not look at mdev->actlog, unless we have a get_ldev() reference. It also does not make much sense to try to disconnect or pull-ahead of the peer, if we don't have good local data. Only even consider congestion policies, if our local disk is D_UP_TO_DATE. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-06-12 14:35:19 +02:00
Lars Ellenberg	1ed25b269e	drbd: fix list corruption by failing but already aborted reads If a read is aborted due to force-detach of a supposedly unresponsive local backing device, and retried on the peer, it can happen that the local request later still completes (hopefully with an error). As it may already have been completed to upper layers meanwhile, it must not be retried again now. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-06-12 14:34:51 +02:00
Lars Ellenberg	4eccc57979	drbd: fix access of unallocated pages and kernel panic BUG: unable to handle kernel NULL pointer dereference at (null) ... [<d1e17561>] ? _drbd_bm_set_bits+0x151/0x240 [drbd] [<d1e236f8>] ? receive_bitmap+0x4f8/0xbc0 [drbd] This fixes an off-by-one error in the receive_bitmap() path, if run-length encoded bitmap transfer is enabled. If the bitmap is an exact multiple of PAGE_SIZE, which means the visible capacity of the drbd device is an exact multiple of 128 MiB (for 4k page size), and bitmap compression (use-rle) is enabled (which became default with 8.4), and the very last bit is dirty and reported in an rle comressed bitmap packet, we ended up trying to kmap_atomic a page pointer that does not exist (bitmap->bm_pages[last index + 1]). bug introduced by: Date: Fri Jul 24 15:33:24 2009 +0200 set bits: optimize for complete last word, fix off-by-one-word corner case made effective by: Date: Thu Dec 16 00:32:38 2010 +0100 drbd: get rid of unused debug code Long time ago, we had paranoia code in the bitmap that allocated one extra word, assigned a magic value, and checked on every occasion that the magic value was still unchanged. That debug code is unused, the extra long word complicates code a bit. Get rid of it. No-one triggered this bug in the last few years, because a large subset of our userbase is unaffected: * typically the last few blocks of a device are not modified frequently, and remain unset * use-rle was disabled by default in drbd < 8.4 * those with slightly "odd" device sizes, or * drbd internal meta data (which will skew the device size slightly, thus makes it harder to have a bug relevant device size) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-06-12 14:32:48 +02:00
Konrad Rzeszutek Wilk	6878c32e5c	xen/blkfront: Add WARN to deal with misbehaving backends. Part of the ring structure is the 'id' field which is under control of the frontend. The frontend stamps it with "some" value (this some in this implementation being a value less than BLK_RING_SIZE), and when it gets a response expects said value to be in the response structure. We have a check for the id field when spolling new requests but not when de-spolling responses. We also add an extra check in add_id_to_freelist to make sure that the 'struct request' was not NULL - as we cannot pass a NULL to __blk_end_request_all, otherwise that crashes (and all the operations that the response is dealing with end up with __blk_end_request_all). Lastly we also print the name of the operation that failed. [v1: s/BUG/WARN/ suggested by Stefano] [v2: Add extra check in add_id_to_freelist] [v3: Redid op_name per Jan's suggestion] [v4: add const * and add WARN on failure returns] Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-06-12 08:29:04 -04:00
Dan Carpenter	895cfcc810	rbd: endian bug in rbd_req_cb() Sparse complains about this because: drivers/block/rbd.c:996:20: warning: cast to restricted __le32 drivers/block/rbd.c:996:20: warning: cast from restricted __le16 These are set in osd_req_encode_op() and they are le16. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-06-06 09:23:54 -05:00
Yan, Zheng	f9f9a19044	rbd: Fix ceph_snap_context size calculation ceph_snap_context->snaps is an u64 array Signed-off-by: Zheng Yan <zheng.z.yan@intel.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-06-06 09:23:53 -05:00
Asai Thambi S P	7b421d24ea	mtip32xx: Create debugfs entries for troubleshooting On module load, creates a debugfs parent 'rssd' in debugfs root. Then for each device, create a new node with corresponding disk name. Under the new node, two entries 'registers' and 'flags' are created. NOTE: These entries were removed from sysfs in the previous patch Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-06-05 09:13:49 +02:00
Asai Thambi S P	7412ff139d	mtip32xx: Remove 'registers' and 'flags' from sysfs This patch removes entries 'registers' and 'flags' from sysfs. Updated ABI file to reflect this change. Reported-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-06-05 09:13:48 +02:00
Sachin Kamat	87c9ea76a2	mtip32xx: Remove version.h header file inclusion version.h header file inclusion is no longer required. Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>	2012-06-04 10:00:32 +02:00
Asai Thambi S P	b77874c969	mtip32xx: Changes to sysfs entries * Formatted the output of 'registers' entry * Added "Commands in Q' to output of 'registers' entry * Added a new entry 'flags' Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-05-31 08:46:50 +02:00
Asai Thambi S P	8ce800935d	mtip32xx: Convert macro definitions for flag bits to enum Convert macro definitions for flags bits to enum Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-05-31 08:46:50 +02:00
Asai Thambi S P	377b8fc6d7	mtip32xx: minor performance tweak When checking for command completions if the register value is zero, proceed to next register. Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-05-31 08:46:50 +02:00
Asai Thambi S P	e602878fd8	mtip32xx: Fix to support more than one sector in exec_drive_command() Fix to support more than one sector in exec_drive_command(). Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-05-31 08:46:50 +02:00
Asai Thambi S P	0a07ab224a	mtip32xx: Use plain spinlock for 'cmd_issue_lock' 'cmd_issue_lock' is for only acquiring a free slot, and it is not used in interrupt context. So replaced irq version with non-irq version of spinlock. Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-05-31 08:46:50 +02:00
Asai Thambi S P	6c8ab69818	mtip32xx: Set block queue boundary variables Set the following block queue boundary variables * max_hw_sectors * max_segment_size Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Removed setting of q->nr_requests. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-05-31 08:46:50 +02:00
Asai Thambi S P	d02e1f0ad0	mtip32xx: Fix to handle TFE for PIO(IOCTL/internal) commands If a PIO (IOCTL/internal) command resulted in TFE, signal the wait event or break out of polling. Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-05-31 08:36:55 +02:00
Asai Thambi S P	971890f258	mtip32xx: Change HDIO_GET_IDENTITY to return stored data For the ioctl command HDIO_GET_IDENTITY, return the stored copy of IDENTIFY DATA instead of sending the command to the device - similar to libata. Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-05-31 08:36:55 +02:00
Asai Thambi S P	2df7aa96e7	mtip32xx: Set custom timeouts for PIO commands This change sets custom timeouts depending on PIO command. Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-05-31 08:36:55 +02:00
Asai Thambi S P	6bb688c048	mtip32xx: fix clearing an incorrect register in mtip_init_port Fix clearing an incorrect register in mtip_init_port Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-05-31 08:36:55 +02:00
Konrad Rzeszutek Wilk	8c9ce606a6	xen/blkback: Copy id field when doing BLKIF_DISCARD. We weren't copying the id field so when we sent the response back to the frontend (especially with a 64-bit host and 32-bit guest), we ended up using a random value. This lead to the frontend crashing as it would try to pass to __blk_end_request_all a NULL 'struct request' (b/c it would use the 'id' to find the proper 'struct request' in its shadow array) and end up crashing: BUG: unable to handle kernel NULL pointer dereference at 000000e4 IP: [<c0646d4c>] __blk_end_request_all+0xc/0x40 .. snip.. EIP is at __blk_end_request_all+0xc/0x40 .. snip.. [<ed95db72>] blkif_interrupt+0x172/0x330 [xen_blkfront] This fixes the bug by passing in the proper id for the response. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=824641 CC: stable@kernel.org Tested-by: William Dauchy <wdauchy@gmail.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-05-30 17:20:04 -04:00
Linus Torvalds	af56e0aa35	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph updates from Sage Weil: "There are some updates and cleanups to the CRUSH placement code, a bug fix with incremental maps, several cleanups and fixes from Josh Durgin in the RBD block device code, a series of cleanups and bug fixes from Alex Elder in the messenger code, and some miscellaneous bounds checking and gfp cleanups/fixes." Fix up trivial conflicts in net/ceph/{messenger.c,osdmap.c} due to the networking people preferring "unsigned int" over just "unsigned". * git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (45 commits) libceph: fix pg_temp updates libceph: avoid unregistering osd request when not registered ceph: add auth buf in prepare_write_connect() ceph: rename prepare_connect_authorizer() ceph: return pointer from prepare_connect_authorizer() ceph: use info returned by get_authorizer ceph: have get_authorizer methods return pointers ceph: ensure auth ops are defined before use ceph: messenger: reduce args to create_authorizer ceph: define ceph_auth_handshake type ceph: messenger: check return from get_authorizer ceph: messenger: rework prepare_connect_authorizer() ceph: messenger: check prepare_write_connect() result ceph: don't set WRITE_PENDING too early ceph: drop msgr argument from prepare_write_connect() ceph: messenger: send banner in process_connect() ceph: messenger: reset connection kvec caller libceph: don't reset kvec in prepare_write_banner() ceph: ignore preferred_osd field ceph: fully initialize new layout ...	2012-05-30 11:17:19 -07:00
Linus Torvalds	a70f35af4e	Merge branch 'for-3.5/drivers' of git://git.kernel.dk/linux-block Pull block driver updates from Jens Axboe: "Here are the driver related changes for 3.5. It contains: - The floppy changes from Jiri. Jiri is now also marked as the maintainer of floppy.c, I shall be publically branding his forehead with red hot iron at the next opportune moment. - A batch of drbd updates and fixes from the linbit crew, as well as fixes from others. - Two small fixes for xen-blkfront courtesy of Jan." * 'for-3.5/drivers' of git://git.kernel.dk/linux-block: (70 commits) floppy: take over maintainership floppy: remove floppy-specific O_EXCL handling floppy: convert to delayed work and single-thread wq xen-blkfront: module exit handling adjustments xen-blkfront: properly name all devices drbd: grammar fix in log message drbd: check MODULE for THIS_MODULE drbd: Restore the request restart logic drbd: introduce a bio_set to allocate housekeeping bios from drbd: remove unused define drbd: bm_page_async_io: properly initialize page->private drbd: use the newly introduced page pool for bitmap IO drbd: add page pool to be used for meta data IO drbd: allow bitmap to change during writeout from resync_finished drbd: fix race between drbdadm invalidate/verify and finishing resync drbd: fix resend/resubmit of frozen IO drbd: Ensure that data_size is not 0 before using data_size-1 as index drbd: Delay/reject other state changes while establishing a connection drbd: move put_ldev from __req_mod() to the endio callback drbd: fix WRITE_ACKED_BY_PEER_AND_SIS to not set RQ_NET_DONE ...	2012-05-30 09:05:47 -07:00
Linus Torvalds	99262a3daf	Autogenerated GPG tag for Rusty D1ADB8F1: 15EE 8D6C AB0E 7F0C F999 BFCB D920 0E6C D1AD B8F1 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJPuv35AAoJENkgDmzRrbjxUx4P/0uc+0oNnZv11vYQsqHuhURa zMlsVdlXGVkvPqQiLY0QkrK5LcO6KiSnSk8vEnOYFIPjL4wNqL/4RRRLnTAJwmE+ lsrL9DblI8Ira/EZRv7d2L12QrP+F2ZGKOZr67uVxSaxH71fUqtiJ0jqA/I8AYH7 /V7+DgdIB1DD28Ya/JEFEUi41F08A6MU10hpaQWy9kXv09gCc9apgvH7/S3s9DaQ G640YWkoKZAx/OFBb8XFvpu9LqZcVl02Nl8goMZOKnMctC4iU3km7HeVjfwCgLjO AdA5spLMhDkS/xrpI0mSQ/wT0k0+sSYW5vEdW9N4XLZza0NgH9GfU4RtEuK85Slj 7bPviZOcpjtt0sGi4wXCaVjZyHROX6tyRvTMUAIj3D0oJglb5T9D3MCvQnadILb0 I0+7gk3d9rHqkO6CmjNaZG9IwR9NpFkbuolcFQuEaZoUMoKd2pYNQyxpbFGl+jCl 7ViFHAy+fydNqDoETKincld4A43KWxOV7jyEJd7hloKcCixsqI7ZdPS7X8amec72 a0hfNgMJzarZkTgo61Hair/d+vKGRJPgEdF1Yq76SDhYKD1TeWeDjmboctsiMjqe f5M4C6IdNJj9cDIlCxMk+3bX250oy7KG77v7Ux0/7nvtSWVa3yEMowD57hnn1But 0gNC8bjXDHRsho90rDRN =Kj9v -----END PGP SIGNATURE----- Merge tag 'virtio-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus Pull virtio updates from Rusty Russell. * tag 'virtio-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: virtio: fix typo in comment virtio-mmio: Devices parameter parsing virtio_blk: Drop unused request tracking list virtio-blk: Fix hot-unplug race in remove method virtio: Use ida to allocate virtio index virtio: balloon: separate out common code between remove and freeze functions virtio: balloon: drop restore_common() 9p: disconnect channel when PCI device is removed virtio: update documentation to v0.9.5 of spec	2012-05-21 20:20:23 -07:00
Asias He	f65ca1dc6a	virtio_blk: Drop unused request tracking list Benchmark shows small performance improvement on fusion io device. Before: seq-read : io=1,024MB, bw=19,982KB/s, iops=39,964, runt= 52475msec seq-write: io=1,024MB, bw=20,321KB/s, iops=40,641, runt= 51601msec rnd-read : io=1,024MB, bw=15,404KB/s, iops=30,808, runt= 68070msec rnd-write: io=1,024MB, bw=14,776KB/s, iops=29,552, runt= 70963msec After: seq-read : io=1,024MB, bw=20,343KB/s, iops=40,685, runt= 51546msec seq-write: io=1,024MB, bw=20,803KB/s, iops=41,606, runt= 50404msec rnd-read : io=1,024MB, bw=16,221KB/s, iops=32,442, runt= 64642msec rnd-write: io=1,024MB, bw=15,199KB/s, iops=30,397, runt= 68991msec Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-05-22 12:16:14 +09:30
Asias He	b79d866c8b	virtio-blk: Fix hot-unplug race in remove method If we reset the virtio-blk device before the requests already dispatched to the virtio-blk driver from the block layer are finised, we will stuck in blk_cleanup_queue() and the remove will fail. blk_cleanup_queue() calls blk_drain_queue() to drain all requests queued before DEAD marking. However it will never success if the device is already stopped. We'll have q->in_flight[] > 0, so the drain will not finish. How to reproduce the race: 1. hot-plug a virtio-blk device 2. keep reading/writing the device in guest 3. hot-unplug while the device is busy serving I/O Test: ~1000 rounds of hot-plug/hot-unplug test passed with this patch. Changes in v3: - Drop blk_abort_queue and blk_abort_request - Use __blk_end_request_all to complete request dispatched to driver Changes in v2: - Drop req_in_flight - Use virtqueue_detach_unused_buf to get request dispatched to driver Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-05-22 12:16:13 +09:30
David S. Miller	17eea0df5f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2012-05-20 21:53:04 -04:00
Linus Torvalds	14e931a264	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull block layer fixes from Jens Axboe: "A few small, but important fixes. Most of them are marked for stable as well - Fix failure to release a semaphore on error path in mtip32xx. - Fix crashable condition in bio_get_nr_vecs(). - Don't mark end-of-disk buffers as mapped, limit it to i_size. - Fix for build problem with CONFIG_BLOCK=n on arm at least. - Fix for a buffer overlow on UUID partition printing. - Trivial removal of unused variables in dac960." * 'for-linus' of git://git.kernel.dk/linux-block: block: fix buffer overflow when printing partition UUIDs Fix blkdev.h build errors when BLOCK=n bio allocation failure due to bio_get_nr_vecs() block: don't mark buffers beyond end of disk as mapped mtip32xx: release the semaphore on an error path dac960: Remove unused variables from DAC960_CreateProcEntries()	2012-05-19 10:12:17 -07:00
Jens Axboe	4fd1ffaa12	Merge branch 'for-jens' of git://git.drbd.org/linux-drbd into for-3.5/drivers Philipp writes: This are the updates we have in the drbd-8.3 tree. They are intended for your "for-3.5/drivers" drivers branch. These changes include one new feature: * Allow detach from frozen backing devices with the new --force option; configurable timeout for backing devices by the new disk-timeout option And huge number of bug fixes: * Fixed a write ordering problem on SyncTarget nodes for a write to a block that gets resynced at the same time. The bug can only be triggered with a device that has a firmware that actually reorders writes to the same block * Fixed a race between disconnect and receive_state, that could cause a IO lockup * Fixed resend/resubmit for requests with disk or network timeout * Make sure that hard state changed do not disturb the connection establishing process (I.e. detach due to an IO error). When the bug was triggered it caused a retry in the connect process * Postpone soft state changes to no disturb the connection establishing process (I.e. becoming primary). When the bug was triggered it could cause both nodes going into SyncSource state * Fixed a refcount leak that could cause failures when trying to unload a protocol family modules, that was used by DRBD * Dedicated page pool for meta data IOs * Deny normal detach (as opposed to --forced) if the user tries to detach from the last UpToDate disk in the resource * Fixed a possible protocol error that could be caused by "unusual" BIOs. * Enforce the disk-timeout option also on meta-data IO operations * Implemented stable bitmap pages when we do a full write out of the bitmap * Fixed a rare compatibility issue with DRBD's older than 8.3.7 when negotiating the bio_size * Fixed a rare race condition where an empty resync could stall with if pause/unpause events happen in parallel * Made the re-establishing of connections quicker, if it got a broken pipe once. Previously there was a bug in the code caused it to waste the first successful established connection after a broken pipe event. PS: I am postponing the drbd-8.4 for mainline for one or two kernel development cycles more (the ~400 patchets set).	2012-05-18 16:20:06 +02:00
Jens Axboe	13828dec45	Merge branch 'stable/for-jens-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-3.5/drivers Konrad writes: Please git pull the following branch: git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/for-jens-3.5 in your for-3.5/drivers branch. The changes in it are rather simple - cleaning up some code and adding proper mechanism to unload without leaking memory.	2012-05-18 16:17:41 +02:00
Jiri Kosina	bfa10b8c98	floppy: remove floppy-specific O_EXCL handling Block layer now handles O_EXCL in a generic way for block devices. The semantics is however different for floppy and all other block devices, as floppy driver contains its own O_EXCL handling. The semantics for all-but-floppy bdevs is "there can be at most one O_EXCL open of this file", while for floppy bdev the semantics is "if someone has the bdev open with O_EXCL, noone else can open it". There is actual userspace-observable change in behavior because of this since commit `e525fd89d3` ("block: make blkdev_get/put() handle exclusive access") -- on kernels containing this commit, mount of /dev/fd0 causes the fd0 block device be claimed with _EXCL, preventing subsequent open(/dev/fd0). Bring things back into shape, i.e. make it possible, analogically to other block devices, to mount the floppy and open() it afterwards -- remove the floppy-specific handling and let the generic bdev code O_EXCL handling take over. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2012-05-18 15:19:11 +02:00
Jiri Kosina	070ad7e793	floppy: convert to delayed work and single-thread wq There are several races in floppy driver between bottom half (scheduled_work) and timers (fd_timeout, fd_timer). Due to slowness of the actual floppy devices, those races are never (at least to my knowledge) triggered on a bare floppy metal. However on virtualized (emulated) floppy drives, which are of course magnitudes faster than the real ones, these races trigger reliably. They usually exhibit themselves as NULL pointer dereferences during DMA setup, such as BUG: unable to handle kernel NULL pointer dereference at 0000000a [ ... snip ... ] EIP: 0060:[<c02053d5>] EFLAGS: 00010293 CPU: 0 EAX: ffffe000 EBX: 0000000a ECX: 00000000 EDX: 0000000a ESI: c05d2718 EDI: 00000000 EBP: 00000000 ESP: f540fe44 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 Process swapper (pid: 0, ti=f540e000 task=c082d5a0 task.ti=c0826000) Stack: ffffe000 00001ffc 00000000 00000000 00000000 c05d2718 c0708b40 f540fe80 c020470f c05d2718 c0708b40 00000000 f540fe80 0000000a f540fee4 00000000 c0708b40 f540fee4 00000000 00000000 c020526b 00000000 c05d2718 c0708b40 Call Trace: [<c020470f>] dump_trace+0xaf/0x110 [<c020526b>] show_trace_log_lvl+0x4b/0x60 [<c0205298>] show_trace+0x18/0x20 [<c05c5811>] dump_stack+0x6d/0x72 [<c0248527>] warn_slowpath_common+0x77/0xb0 [<c02485f3>] warn_slowpath_fmt+0x33/0x40 [<f7ec593c>] setup_DMA+0x14c/0x210 [floppy] [<f7ecaa95>] setup_rw_floppy+0x105/0x190 [floppy] [<c0256d08>] run_timer_softirq+0x168/0x2a0 [<c024e762>] __do_softirq+0xc2/0x1c0 [<c02042ed>] do_softirq+0x7d/0xb0 [<f54d8a00>] 0xf54d89ff but other instances can be easily seen as well. This can be observed at least under VMWare, VirtualBox and KVM. This patch converts all the timers and bottom halfs to be processed in a single workqueue. This aproach has been already discussed back in 2010 if I remember correctly, and Acked by Linus [1], but it then never made it to the tree. This all is based on original idea and code of Stephen Hemminger. I have ported original Stepen's code to the current state of the floppy driver, and performed quite some testing (on real hardware), which didn't reveal any issues (this includes not only writing and reading data, but also formatting (unfortunately I didn't find any Double-Density disks any more)). Ability to handle errors properly (supplying known bad floppies) has also been verified. [1] http://kerneltrap.org/mailarchive/linux-kernel/2010/6/11/4582092 Based-on-patch-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2012-05-18 15:19:10 +02:00
David S. Miller	028940342a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2012-05-16 22:17:37 -04:00
Josh Durgin	263c6ca007	rbd: rename __rbd_update_snaps to __rbd_refresh_header This function rereads the entire header and handles any changes in it, not just changes in snapshots. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2012-05-14 12:13:09 -05:00
Josh Durgin	3591538fb2	rbd: fix snapshot size type Snapshot sizes should be the same type as regular image sizes. This only affects their displayed size in sysfs, not the reported size of an actual block device sizes. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2012-05-14 12:13:03 -05:00
Josh Durgin	b06e6a6be7	rbd: remove conditional snapid parameters The snapid parameters passed to rbd_do_op() and rbd_req_sync_op() are now always either a valid snapid or an explicit CEPH_NOSNAP. [elder@dreamhost.com: Rephrased the description] Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2012-05-14 12:12:58 -05:00
Josh Durgin	77dfe99fe3	rbd: store snapshot id instead of index When a device was open at a snapshot, and snapshots were deleted or added, data from the wrong snapshot could be read. Instead of assuming the snap context is constant, store the actual snap id when the device is initialized, and rely on the OSDs to signal an error if we try reading from a snapshot that was deleted. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2012-05-14 12:12:52 -05:00
Josh Durgin	403f24d3d5	rbd: protect read of snapshot sequence number This is updated whenever a snapshot is added or deleted, and the snapc pointer is changed with every refresh of the header. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2012-05-14 12:12:46 -05:00
Xi Wang	50f7c4c967	rbd: fix integer overflow in rbd_header_from_disk() ondisk->snap_count is read from disk via rbd_req_sync_read() and thus needs validation. Otherwise, a bogus `snap_count' could overflow the kmalloc() size, leading to memory corruption. Also use `u32' consistently for `snap_count'. [elder@dreamhost.com: changed to use UINT_MAX rather than ULONG_MAX] Signed-off-by: Xi Wang <xi.wang@gmail.com> Reviewed-by: Alex Elder <elder@dreamhost.com>	2012-05-14 12:12:41 -05:00
Dan Carpenter	f8ad495a8a	rbd: use gfp_flags parameter in rbd_header_from_disk() We should use the gfp_flags that the caller specified instead of GFP_KERNEL here. There is only one caller and it uses GFP_KERNEL, so this change is just a cleanup and doesn't change how the code works. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Alex Elder <elder@dreamhost.com>	2012-05-14 12:12:35 -05:00
Jan Beulich	8605067fb9	xen-blkfront: module exit handling adjustments The blkdev major must be released upon exit, or else the module can't attach to devices using the same majors upon being loaded again. Also avoid leaking the minor tracking bitmap. Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-05-11 16:11:54 -04:00
Jan Beulich	e77c78c022	xen-blkfront: properly name all devices - devices beyond xvdzz didn't get proper names assigned at all - extended devices with minors not representable within the kernel's major/minor bit split spilled into foreign majors Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-05-11 16:11:52 -04:00
Asai Thambi S P	a09ba13eef	mtip32xx: release the semaphore on an error path Release the semaphore in an error path in mtip_hw_get_scatterlist(). This fixes the smatch warning inconsistent returns. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-05-11 16:42:14 +02:00
Jesper Juhl	d88a440edd	dac960: Remove unused variables from DAC960_CreateProcEntries() The variables 'StatusProcEntry' and 'UserCommandProcEntry' are assigned to once and then never used. This patch gets rid of the variables. While I was there I also fixed the indentation of the function to use tabs rather than spaces for the lines that did not already do so. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-05-11 16:42:14 +02:00
Eric W. Biederman	38bf195398	connector/userns: replace netlink uses of cap_raised() with capable() In 2009 Philip Reiser notied that a few users of netlink connector interface needed a capability check and added the idiom cap_raised(nsp->eff_cap, CAP_SYS_ADMIN) to a few of them, on the premise that netlink was asynchronous. In 2011 Patrick McHardy noticed we were being silly because netlink is synchronous and removed eff_cap from the netlink_skb_params and changed the idiom to cap_raised(current_cap(), CAP_SYS_ADMIN). Looking at those spots with a fresh eye we should be calling capable(CAP_SYS_ADMIN). The only reason I can see for not calling capable is that it once appeared we were not in the same task as the caller which would have made calling capable() impossible. In the initial user_namespace the only difference between between cap_raised(current_cap(), CAP_SYS_ADMIN) and capable(CAP_SYS_ADMIN) are a few sanity checks and the fact that capable(CAP_SYS_ADMIN) sets PF_SUPERPRIV if we use the capability. Since we are going to be using root privilege setting PF_SUPERPRIV seems the right thing to do. The motivation for this that patch is that in a child user namespace cap_raised(current_cap(),...) tests your capabilities with respect to that child user namespace not capabilities in the initial user namespace and thus will allow processes that should be unprivielged to use the kernel services that are only protected with cap_raised(current_cap(),..). To fix possible user_namespace issues and to just clean up the code replace cap_raised(current_cap(), CAP_SYS_ADMIN) with capable(CAP_SYS_ADMIN). Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Philipp Reisner <philipp.reisner@linbit.com> Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com> Acked-by: Andrew G. Morgan <morgan@kernel.org> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <james.l.morris@oracle.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-10 23:21:39 -04:00
Lars Ellenberg	92b4ca291f	drbd: grammar fix in log message Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-10 12:00:56 +02:00
Cong Wang	bc4854bc91	drbd: check MODULE for THIS_MODULE THIS_MODULE is NULL only when drbd is compiled as built-in, so the #ifdef CONFIG_MODULES should be #ifdef MODULE instead. This fixes the warning: drivers/block/drbd/drbd_main.c: In function ‘drbd_buildtag’: drivers/block/drbd/drbd_main.c:4187:24: warning: the comparison will always evaluate as ‘true’ for the address of ‘__this_module’ will never be NULL [-Waddress] Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-10 12:00:54 +02:00
Philipp Reisner	f6d0a8dbfd	drbd: Restore the request restart logic It got lost with the commit `5a7bbad27a` "block: remove support for bio remapping from ->make_request" Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 17:20:59 +02:00
Lars Ellenberg	9476f39d66	drbd: introduce a bio_set to allocate housekeeping bios from Don't rely on availability of bios from the global fs_bio_set, we should use our own bio_set for meta data IO. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:17:07 +02:00
Lars Ellenberg	3c2f7a856f	drbd: remove unused define Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:17:06 +02:00
Arne Redlich	0c7db27920	drbd: bm_page_async_io: properly initialize page->private If bm_page_async_io is advised to use a new page for I/O (BM_AIO_COPY_PAGES is set), it will get it from a mempool. Once the mempool has to dip into its reserves the page is not reinitialized, i.e. page->private contains garbage, which will lead to various problems once the I/O completes (dereferences of NULL pointers, the submitting thread getting stuck in D-state, ...). Signed-off-by: Arne Redlich <arne.redlich@googlemail.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>	2012-05-09 15:17:04 +02:00
Lars Ellenberg	4d95a10f97	drbd: use the newly introduced page pool for bitmap IO Conflicts: drbd/drbd_bitmap.c Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:17:03 +02:00
Lars Ellenberg	4281808fb3	drbd: add page pool to be used for meta data IO Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:17:02 +02:00
Lars Ellenberg	0e8488ade2	drbd: allow bitmap to change during writeout from resync_finished Symptom: messages similar to "FIXME asender in bm_change_bits_to, bitmap locked for 'write from resync_finished' by worker" If a resync or verify is finished (or aborted), a full bitmap writeout is triggered. If we have ongoing local IO, the bitmap may still change during that writeout, pending and not yet processed acks may cause bits to be cleared, while new writes may cause bits to be to be set. To fix this, introduce the drbd_bm_write_copy_pages() variant. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:17:00 +02:00
Lars Ellenberg	a574daf5d7	drbd: fix race between drbdadm invalidate/verify and finishing resync When a resync or online verify is finished or aborted, drbd does a bulk write-out of changed bitmap pages. If in that very moment a new verify or resync is triggered, this can race: ASSERT( !test_bit(BITMAP_IO, &mdev->flags) ) in drbd_main.c FIXME going to queue 'set_n_write from StartingSync' but 'write from resync_finished' still pending? and similar. This can be observed with e.g. tight invalidate loops in test scripts, and probably has no real-life implication. Still, that race can be solved by first quiescen the device, before starting a new resync or verify. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:59 +02:00
Lars Ellenberg	ba280c092e	drbd: fix resend/resubmit of frozen IO DRBD can freeze IO, due to fencing policy (fencing resource-and-stonith), or because we lost access to data (on-no-data-accessible suspend-io). Resuming from there (re-connect, or re-attach, or explicit admin intervention) should "just work". Unfortunately, if the re-attach/re-connect did not happen within the timeout, since the commit drbd: Implemented real timeout checking for request processing time if so configured, the request_timer_fn() would timeout and detach/disconnect virtually immediately. This change tracks the most recent attach and connect, and does not timeout within <configured timeout interval> after attach/connect. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:58 +02:00
Philipp Reisner	5de738272e	drbd: Ensure that data_size is not 0 before using data_size-1 as index This could be exploited by a peer which runs modified code. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:56 +02:00
Philipp Reisner	197296ffed	drbd: Delay/reject other state changes while establishing a connection Changes to the role and disk state should be delayed or rejected while we establish a connection. This is necessary, since the peer will base its resync decision on the UUIDs and the state we sent in the drbd_connect() function. The most prominent example for this race is becoming primary after sending state and UUIDs and before the state changes to C_WF_CONNECTION. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:55 +02:00
Lars Ellenberg	46385c84ac	drbd: move put_ldev from __req_mod() to the endio callback One invocation in the endio handler is good enough, we don't need mention it for each of the different ways it calls __req_mod(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:51 +02:00
Lars Ellenberg	d64957c9a9	drbd: fix WRITE_ACKED_BY_PEER_AND_SIS to not set RQ_NET_DONE Just because this request happened during a resync does not mean it may pretend to have been barrier-acked. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:50 +02:00
Lars Ellenberg	41c4a0035b	drbd: fix READ_RETRY_REMOTE_CANCELED to not complete if device is suspended READ_RETRY_REMOTE_CANCELED needs to be grouped with the other _CANCELED cases, not with CONNECTION_LOST_WHILE_PENDING, as that would complete (fail) the bio even if the device became suspended. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:48 +02:00
Lars Ellenberg	6d49e101fd	drbd: make OOS_HANDED_TO_NETWORK its own case OOS_HANDED_TO_NETWORK should not be grouped with the various _CANCELED/_FAILED cases. Also, not only clear the RQ_NET_QUEUED flag, but also mark it RQ_NET_DONE, so it can be distinguished from a local-only request even after that. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:47 +02:00
Lars Ellenberg	c088b2d904	drbd: don't pretend that barrier_nr == 0 was special We used to have a barrier implementation where barrier_nr 0 was reserved. That is long gone. Just use the full sequence space. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:46 +02:00
Lars Ellenberg	7ffcaa7194	drbd: remove unused static helper function Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:44 +02:00
Lars Ellenberg	a5d214f621	drbd: remove some very outdated comments Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:43 +02:00
Lars Ellenberg	1abc2af205	drbd: missing wakeup after drbd_rs_del_all Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:42 +02:00
Lars Ellenberg	671a74e749	drbd: remove now unused seq_num member from struct drbd_request Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:40 +02:00
Lars Ellenberg	001a88687a	drbd: fix potential data corruption and protocol error We assumed only bios with bi_idx == 0 would end up in drbd_make_request(). That is wrong. At least device mapper, in __clone_and_map(), may submit clones only covering a partial bio, but sharing the original bvec, by adjusting bi_idx and relevant other bio members of the clone. We used __bio_for_each_segment() in various places, even though that is documented as * drivers should not use the __ version unless they _really_ want to * run through the entire bio and not just pending pieces Impact: we would send the full bio bvec, even for the clone with bi_idx > 0, which will cause data corruption on the peer (because we submit wrong data at the clone offset), and will cause a DRBD protocol error, disconnect/reconnect and resync (thus fixing the corruption), because the next package header would be expected right in the middle of the sent data, causing DRBD magic mismatch. Fix: drop the assert, and use bio_for_each_segment() instead of the __ version. Conflicts: drbd/drbd_tracing.c Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:39 +02:00
Philipp Reisner	b6a370ba07	drbd: Fix a potential write ordering issue on SyncTarget nodes If a SyncTarget node gets a P_RS_DATA_REPLY before a P_DATA packet for the same sector, it simply submits these two IO requests. This is be possible because on the SyncSource node, the data of the P_RS_DATA_REPLY packet was read from disk. Immediately after that a write request from upper layers came in. The disk scheduler or even the "hardware" queues on the disk drive might reorder these writes. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:38 +02:00
Philipp Reisner	fc28845bc0	drbd: Fix a potential race that could case data inconsistency When we have a write request and a state change C_WF_BITMAP_S -> C_SYNC_SOURCE at the same time, and it happens that the line remote = remote && drbd_should_do_remote(s); stills sees C_WF_BITMAP_S, and send_oos = rw == WRITE && drbd_should_send_oos(s); already sees C_SYNC_SOURCE both are 0. This causes the write to not be mirrored, but marked as out-of-sync on the Sync_Source node. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:34 +02:00
Lars Ellenberg	031a7c173f	drbd: add missing part_round_stats to _drbd_start_io_acct Without this, iostat frequently sees bogus svctime and >= 100% "utilization". Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:33 +02:00
Lars Ellenberg	47a4f1c1bb	drbd: Fix module refcount leak in drbd_accept() drbd_accept was modelled after kernel_accept with drbd commit 53eb779 in July 2008. Only, kernel_accept was then broken, and only fixed later with kernel commit `1b08534e` in Dec 2008: net: Fix module refcount leak in kernel_accept() Impact: protocol families provided as modules, e.g. ipv6 or ib_sdp, would soon have their reference count become negative, preventing them from being unloaded (likely), or worse, hit zero without actually being unused, allowing them to be unloaded while still in use (unlikely, but if triggered, causing a kernel crash). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:32 +02:00
Philipp Reisner	7caacb69ac	drbd: Consider the disk-timeout also for meta-data IO operations If the backing device is already frozen during attach, we failed to recognize that. The current disk-timeout code works on top of the drbd_request objects. During attach we do not allow IO and therefore never generate a drbd_request object but block before that in drbd_make_request(). This patch adds the timeout to all drbd_md_sync_page_io(). Before this patch we used to go from D_ATTACHING directly to D_DISKLESS if IO failed during attach. We can no longer do this since we have to stay in D_FAILED until all IO ops issued to the backing device returned. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:30 +02:00
Philipp Reisner	4afc433cf8	drbd: Do not send state packets while lower than C_CONNECTED cstate I.e. in C_WF_REPORT_PARAMS or in C_WF_CONNECTION. Sending may already work in these cstates, but the peer still expects the HandShake / ConnectionFeatures packet. Actually triggered by the Testuite on kugel. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:29 +02:00
Lars Ellenberg	545752d5d8	drbd: fix race between disconnect and receive_state If the asender thread, or request_timer_fn(), or some other part of the code, decided to drop the connection (because of timeout or other), but the receiver just now was processing a P_STATE packet, there was a chance that receive_state() would do a hard state change "re-establishing" an already failed connection without additional handshake. Log excerpt: Remote failed to finish a request within ko-count * timeout peer( Secondary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown ) asender terminated ... peer( Unknown -> Secondary ) conn( Timeout -> Connected ) pdsk( DUnknown -> UpToDate ) peer_isp( 0 -> 1 ) ... Connection closed peer( Secondary -> Unknown ) conn( Connected -> Unconnected ) pdsk( UpToDate -> DUnknown ) peer_isp( 1 -> 0 ) receiver terminated Impact: while the connection state is erroneously "Connected", requests may be queued and even sent, which would never be acknowledged, and may have been missed by the cleanup. These requests would never be completed. The next drbd_suspend_io() will then lock up, waiting forever for these requests to complete. Fixed in several code paths: Make sure the connection state is NetworkFailure or worse before starting the cleanup in drbd_disconnect(). This should make sure the cleanup won't miss any requests. Disallow receive_state() to "upgrade" the connection state from an error state. This will make sure the "illegal" state transition won't happen. For all connection failure states, relax the safe-guard in sanitize_state() again to silently mask out those state changes (e.g. Timeout -> Connected becomes Timeout -> Timeout). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:01 +02:00
Lars Ellenberg	763eb63625	drbd: fix potential spinlock deadlock drbd_try_clear_on_disk_bm() has a sanity check for the number of blocks left to be resynced (rs_left) in the current resync extent. If it detects a mismatch, it complains, and forces a disconnect using drbd_force_state(mdev, NS(conn, C_DISCONNECTING)); Unfortunately, this may be called while holding the req_lock, and drbd_force_state() want's to aquire that lock itself. Deadlock. Don't force a disconnect, but fix up rs_left by recounting and reassigning the number of dirty blocks in that extent. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:58 +02:00
Philipp Reisner	e89868a092	drbd: Fixed an obvious copy-n-paste mistake This bug might have caused troubles if disk-barriers and the ahead-behind more are enabled at the same time. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:57 +02:00
Lars Ellenberg	f479ea0661	drbd: send intermediate state change results to the peer DRBD state changes schedule after_state_ch() actions to a worker thread, which decides on the old and new states of that change, whether to send an informational state update packet (P_STATE) to the peer. If it decides to drbd_send_state(), it would however always send the _curent_ state, which, if a second state change happens before the after_state_ch() of the first ran, may "fast-forward" the peer's view about this node. In most cases that is harmless, but sometimes this can confuse DRBD, for example into not actually starting a necessary resync if you do a very tight detach/attach loop on a Connected Secondary. Fix this by always sending the "new" state of the respective state transition which scheduled this after_state_ch() work. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:56 +02:00
Lars Ellenberg	a2e9138197	drbd: fix spurious meta data IO "error" When detaching, even cleanly detaching due to administrator request, we always go through D_FAILED before we become D_DISKLESS. Don't let that state change race with an in-flight meta data IO, or that one might think it actually experienced an IO error. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:54 +02:00
Philipp Reisner	aaae506d54	drbd: Fixed a race condition between detach and start of resync drbd_state_lock() is only there to serialize cluster wide state changes. Testing the local disk state needs to happen while holding the global_state_lock. Otherwise you might see something like this (Oct 6 on kugel) 14:20:24 drbd0: conn( WFSyncUUID -> Connected ) disk( Inconsistent -> Failed ) 14:20:24 drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0) 14:20:24 drbd0: conn( Connected -> SyncTarget ) disk( Failed -> Inconsistent ) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:53 +02:00
Lars Ellenberg	6a9a92f4ef	drbd: fix harmless race to not trigger an ASSERT We have one pre-allocated page to do certain synchronous meta data IO with, using it is serialized like so: drbd_md_get_buffer(); drbd_md_sync_page_io(); drbd_md_sync_page_io(); ... drbd_md_put_buffer(); In drbd_md_sync_page_io() there is an ASSERT(atomic_read(&mdev->md_io_in_use) == 1); We want to be able to timeout on unresponsive lower level devices, so we can "detach" in that case. Inside drbd_md_sync_page_io() we grab an extra reference, to not have a dangling pointer in case a delayed IO eventually does still complete, even after we "detached" already. We need to put the extra reference before we signal completion from the completion handler, or the second drbd_md_sync_page_io() above may trigger the assert (reference count still 2). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:52 +02:00
Philipp Reisner	5ba3dac521	drbd: Derive sync-UUIDs only from the bitmap-uuid if it is non-zero Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:50 +02:00
Andreas Gruenbacher	7b4e4d3126	drbd: drbd_nl_resize(): Fix missing put_ldev() on error path Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:49 +02:00
Lars Ellenberg	40424e4a24	drbd: fix "stalled" empty resync With sync-after dependencies, given "lucky" timing of pause/unpause events, and the end of an empty (0 bits set) resync was sometimes not detected on the SyncTarget, leading to a "stalled" SyncSource state. Fixed this by expecting not only "Inconsistent -> UpToDate" but also "Consistent -> UpToDate" transitions for the peer disk state to end a resync. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:47 +02:00
Philipp Reisner	1e86ac48af	drbd: Bugfix for the connection behavior If we get into the C_BROKEN_PIPE cstate once, the state engine set the thi->t_state of the receiver thread to restarting. But with the while loop in drbdd_init() a new connection gets established. After the call into drbdd() returns immediately since the thi->t_state is not RUNNING. The restart of drbd_init() then resets thi->t_state to RUNNING. I.e. after entering C_BROKEN_PIPE once, the next successful established connection gets wasted. The two parts of the fix: * Do not cause the thread to restart if we detect the issue with the sockets while we are in C_WF_CONNECTION. * Make sure that all actions that would have set us to C_BROKEN_PIPE happen before the state change to C_WF_REPORT_PARAMS. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:46 +02:00
Philipp Reisner	80f9fd55a6	drbd: Cleanup all epoch objects upon connection loss Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:44 +02:00
Philipp Reisner	fd2491f4a4	drbd: detach must not try to abort non-local requests from drbd-8.4 Cherry picked form 8.4 Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:43 +02:00
Philipp Reisner	79f16f5dbc	drbd: Consider that the no-data-condition could be in connected state ...when the peer has inconsistent data. In that case we failed to clear the susp_nod flag. When the local disk was attached again Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:42 +02:00
Philipp Reisner	bca482e90b	drbd: Fixed current UUID generation Now, the new edition of the clause only fires if a diskless peer gets promoted. This is a fixup for "drbd: Delayed creation of current-UUID". Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:10:50 +02:00
Lars Ellenberg	22f46ce2ef	drbd: change some GFP_KERNEL to GFP_NOIO Bitmap IO may happend in the context of an application write, in the generic block IO path. We need to use GFP_NOIO. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:10:47 +02:00
Philipp Reisner	dfa8bedbfe	drbd: Implemented the disk-timeout option When the disk-timeout is active, and it expires for a single request, we consider the local disk as D_FAILED. Note: With this change, I made both timeout based state transitions HARD state transitions. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:10:45 +02:00
Philipp Reisner	02ee8f95fa	drbd: Force flag for the detach operation Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:10:38 +02:00
Philipp Reisner	5ca1de0384	drbd: Allow new IOs while the local disk in in FAILED state The last bunch of commits prepared the 'detach from tar pit' feature. With that we can be for long time in disk state FAILED. We need to accept new IO requests during that time. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:10:34 +02:00
Philipp Reisner	9e58c4dad7	drbd: Bitmap IO functions can now return prematurely if the disk breaks Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:10:33 +02:00
Philipp Reisner	d1f3779bbe	drbd: Added a kref to bm_aio_ctx Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:37:19 +02:00
Philipp Reisner	b2057629ea	drbd: Hold a reference to ldev while doing meta-data IO Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:31:11 +02:00
Philipp Reisner	4a2fe568b5	drbd: Keep a reference to the bio until the completion handler finished Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:28:51 +02:00
Philipp Reisner	0c46442515	drbd: Implemented wait_until_done_or_disk_failure() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:26:51 +02:00
Philipp Reisner	e17117310b	drbd: Replaced md_io_mutex by an atomic: md_io_in_use The new function drbd_md_get_buffer() aborts waiting for the buffer in case the disk failes in the meantime. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:22:31 +02:00
Philipp Reisner	cc94c65015	drbd: moved md_io into mdev Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:17:24 +02:00
Philipp Reisner	2b4dd36fba	drbd: Immediately allow completion of IOs, that wait for IO completions on a failed disk Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:16:04 +02:00
Philipp Reisner	6d7e32f568	drbd: Keep a reference to barrier acked requests Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:15:28 +02:00
Philipp Reisner	6809384c71	drbd: Improve compatibility with drbd's older than 8.3.7 Regression introduced with 8.3.11 commit: drbd: Take a more conservative approach when deciding max_bio_size Never ever tell an older drbd, that we support more than 32KiB in a single data request (packet). Never believe an older drbd, that is supports more than 32KiB in a single data request (packet) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:08:57 +02:00
Philipp Reisner	77e8fdfc18	drbd: Only print sanitize state's warnings, if the state change happens The reason for this change is that, with when doing 'drbdadm invalidate' on a disconnected resource caused an "implicitly set pdsk from UpToDate to DUnknown" message, which was missleading. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:08:22 +02:00
Lars Ellenberg	07667347c8	drbd: downgraded error printk to info Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:05:25 +02:00
David Howells	5f138ce01a	DRBD: Fix comparison always false warning due to long/long long compare Fix warnings of the following nature in the drbd header: In file included from drivers/block/drbd/drbd_bitmap.c:32: drivers/block/drbd/drbd_int.h: In function 'drbd_get_syncer_progress': drivers/block/drbd/drbd_int.h:2234: warning: comparison is always false due to limited range of data where mdev->rs_total (an unsigned long) is being compared to 1ULL << 32, which is always false on a 32-bit machine. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>	2012-05-09 10:03:19 +02:00
Lars Ellenberg	7948bcdc38	drbd: spelling fix: too small It is not "to small", but "too small". Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:02:22 +02:00
Lars Ellenberg	1381e9a496	drbd: cosmetic: fix accidental division instead of modulo when pretty printing For large resync rates, seq_printf_with_thousands_grouping() accidentally only produced Y,000,00Y, instead of the real numbers. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:01:39 +02:00
Philipp Reisner	ebd2b0cde5	drbd: Lower log priority for an event that is definitely not an error Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 09:59:29 +02:00
Sage Weil	3469ac1aa3	ceph: drop support for preferred_osd pgs This was an ill-conceived feature that has been removed from Ceph. Do this gracefully: - reject attempts to specify a preferred_osd via the ioctl - stop exposing this information via virtual xattrs - always fill in -1 for requests, in case we talk to an older server - don't calculate preferred_osd placements/pgids Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Sage Weil <sage@inktank.com>	2012-05-07 15:33:36 -07:00
David S. Miller	f24001941c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Fix merge between commit `3adadc08cc` ("net ax25: Reorder ax25_exit to remove races") and commit `0ca7a4c87d` ("net ax25: Simplify and cleanup the ax25 sysctl handling") The former moved around the sysctl register/unregister calls, the later simply removed them. With help from Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-23 23:15:17 -04:00
Pavel Emelyanov	4a17fd5229	sock: Introduce named constants for sk_reuse Name them in a "backward compatible" manner, i.e. reuse or not are still 1 and 0 respectively. The reuse value of 2 means that the socket with it will forcibly reuse everyone else's port. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 15:52:25 -04:00
Linus Torvalds	c1acb0ba33	Fixes in various components: * mechanism to work with misconfigured backends (where they are advertised but in reality don't exist). * two tiny compile warning fixes. * proper error handling in gnttab_resume * Not using VM_PFNMAP anymore to allow backends in the same domain. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQEcBAABAgAGBQJPkYeqAAoJEFjIrFwIi8fJEXIIAI+PYLNMcHTc4bxa6pErpKaS rq5eCXL9+EaZOwTUqHRJjfrjnlAc+BWO8lN0H41oRQWFYh14hgfUVJ+ziEujb1kw N1eTMVHnH/XRJV6rIFX+TiBasnyoMmNfWEAb45UL1nEUTMPL1Jv7AiRY/GxUlHyg M+uFG52KP3ytXxcIiGW6pYEqJd6UgWrqnclaeg5TR5zvDlWfJbUIBEMQ/PyV0WSS 4e7biiwi4XPWT2f1qewOmI+3r68CltU3GAs1XxjcSX+bYYuh00UtY39AsBWo2N8I 1VORuq0QPs+GB22r3e47IqBcjXkBGRIf6w1e/5a6WLiq7TqVq4bYgCGUmWT8V7o= =olCU -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.4-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull xen fixes from Konrad Rzeszutek Wilk: - mechanism to work with misconfigured backends (where they are advertised but in reality don't exist). - two tiny compile warning fixes. - proper error handling in gnttab_resume - Not using VM_PFNMAP anymore to allow backends in the same domain. * tag 'stable/for-linus-3.4-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: Revert "xen/p2m: m2p_find_override: use list_for_each_entry_safe" xen/resume: Fix compile warnings. xen/xenbus: Add quirk to deal with misconfigured backends. xen/blkback: Fix warning error. xen/p2m: m2p_find_override: use list_for_each_entry_safe xen/gntdev: do not set VM_PFNMAP xen/grant-table: add error-handling code on failure of gnttab_resume	2012-04-20 11:31:00 -07:00
Konrad Rzeszutek Wilk	a71e23d992	xen/blkback: Fix warning error. drivers/block/xen-blkback/xenbus.c: In function 'xen_blkbk_discard': drivers/block/xen-blkback/xenbus.c:419:4: warning: passing argument 1 of 'dev_warn' makes pointer from integer without a cast +[enabled by default] include/linux/device.h:894:5: note: expected 'const struct device *' but argument is of type 'long int' It is unclear how that mistake made it in. It surely is wrong. Acked-by: Jens Axboe <axboe@kernel.dk> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-04-18 15:54:08 -04:00
Linus Torvalds	cdd5983063	virtio: fixes on top of 3.4-rc2 Here are some virtio fixes for 3.4: a test build fix, a patch by Ren fixing naming for systems with a massive number of virtio blk devices, and balloon fixes for powerpc by David Gibson. There was some discussion about Ren's patch for virtio disc naming: some people wanted to move the legacy name mangling function to the block core. But there's no concensus on that yet, and we can always deduplicate later. Added comments in the hope that this will stop people from copying this legacy naming scheme into future drivers. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQEcBAABAgAGBQJPio1GAAoJECgfDbjSjVRpGDAH/3C/bXm9mriuNauRHwktHgJe gmh2BfUgnxly6vheuz0Fv61lTe6V8kekHVolbUYwAUgXeWEKK1C59xehrMGRIPDG 1XUiti50U3P+skhIfrbkS5nZ7L+5Hk0ToQ6dd9v0BM2GxDOvgwidlY1bZe+wJEZf Lvl6w/djBCr1e3k4qfRnpTcdJJ4FnOjGbikLQhSTGfUXeNo6uWS1hljYWnAhzFkd 1xU8h5PP0TDR0nYb80CeB+9Lxw0w4qyNPJIBhNN6ucB/1U6R+55HpEpmrLUkn910 sEFEFsc0cRVWr8FiOTlmzxLHnwTc8AY/Bsp9TMSmnTRu3ZQcoQMTQQCczRj04xI= =VmpJ -----END PGP SIGNATURE----- Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio fixes from Michael S. Tsirkin: "Here are some virtio fixes for 3.4: a test build fix, a patch by Ren fixing naming for systems with a massive number of virtio blk devices, and balloon fixes for powerpc by David Gibson. There was some discussion about Ren's patch for virtio disc naming: some people wanted to move the legacy name mangling function to the block core. But there's no concensus on that yet, and we can always deduplicate later. Added comments in the hope that this will stop people from copying this legacy naming scheme into future drivers." * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_balloon: fix handling of PAGE_SIZE != 4k virtio_balloon: Fix endian bug virtio_blk: helper function to format disk names tools/virtio: fix up vhost/test module build	2012-04-16 18:34:12 -07:00
Linus Torvalds	c104f1fa1e	Merge branch 'for-3.4/drivers' of git://git.kernel.dk/linux-block Pull block driver bits from Jens Axboe: - A series of fixes for mtip32xx. Most from Asai at Micron, but also one from Greg, getting rid of the dependency on PCIE_HOTPLUG. - A few bug fixes for xen-blkfront, and blkback. - A virtio-blk fix for Vivek, making resize actually work. - Two fixes from Stephen, making larger transfers possible on cciss. This is needed for tape drive support. * 'for-3.4/drivers' of git://git.kernel.dk/linux-block: block: mtip32xx: remove HOTPLUG_PCI_PCIE dependancy mtip32xx: dump tagmap on failure mtip32xx: fix handling of commands in various scenarios mtip32xx: Shorten macro names mtip32xx: misc changes mtip32xx: Add new sysfs entry 'status' mtip32xx: make setting comp_time as common mtip32xx: Add new bitwise flag 'dd_flag' mtip32xx: fix error handling in mtip_init() virtio-blk: Call revalidate_disk() upon online disk resize xen/blkback: Make optional features be really optional. xen/blkback: Squash the discard support for 'file' and 'phy' type. mtip32xx: fix incorrect value set for drv_cleanup_done, and re-initialize and start port in mtip_restart_port() cciss: Fix scsi tape io with more than 255 scatter gather elements cciss: Initialize scsi host max_sectors for tape drive support xen-blkfront: make blkif_io_lock spinlock per-device xen/blkfront: don't put bdev right after getting it xen-blkfront: use bitmap_set() and bitmap_clear() xen/blkback: Enable blkback on HVM guests xen/blkback: use grant-table.c hypercall wrappers	2012-04-13 18:45:13 -07:00
Ren Mingxin	c0aa3e0916	virtio_blk: helper function to format disk names The current virtio block's naming algorithm just supports 18278 (26^3 + 26^2 + 26) disks. If there are more virtio blocks, there will be disks with the same name. Based on commit `3e1a7ff8a0`, add a function "virtblk_name_format()" for virtio block to support mass of disks naming. Notes: - Our naming scheme is ugly. We are stuck with it for virtio but don't use it for any new driver: new drivers should name their devices PREFIX%d where the sequence number can be allocated by ida - sd_format_disk_name has exactly the same logic. Moving it to a central place was deferred over worries that this will make people keep using the legacy naming in new drivers. We kept code idential in case someone wants to deduplicate later. Signed-off-by: Ren Mingxin <renmx@cn.fujitsu.com> Acked-by: Asias He <asias@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2012-04-12 10:37:05 +03:00
Greg Kroah-Hartman	6363480651	block: mtip32xx: remove HOTPLUG_PCI_PCIE dependancy This removes the HOTPLUG_PCI_PCIE dependency on the driver and makes it depend on PCI. Cc: Sam Bradshaw <sbradshaw@micron.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-04-12 08:47:05 +02:00
Asai Thambi S P	95fea2f1d9	mtip32xx: dump tagmap on failure Dump tagmap on failure, instead of individual tags. Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-04-09 08:35:39 +02:00
Asai Thambi S P	c74b0f586f	mtip32xx: fix handling of commands in various scenarios * If a ncq command time out and a non-ncq command is active, skip restart port * Queue(pause) ncq commands during operations spanning more than one non-ncq commands - secure erase, download microcode * When a non-ncq command is active, allow incoming non-ncq commands to wait instead of failing back * Changed timeout for download microcode and smart commands * If the device in write protect mode, fail all writes (do not send to device) * Set maximum retries to 2 Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-04-09 08:35:39 +02:00
Asai Thambi S P	8a857a880b	mtip32xx: Shorten macro names Shortened macros used to represent mtip_port->flags and dd->dd_flag Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-04-09 08:35:38 +02:00
Asai Thambi S P	8182b49528	mtip32xx: misc changes * Handle the interrupt completion of polled internal commands * Do not check remove pending flag for standby command * On rebuild failure, - set corresponding bit dd_flag - do not send standby command * Free ida index in remove path Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-04-09 08:35:38 +02:00
Asai Thambi S P	f65872177d	mtip32xx: Add new sysfs entry 'status' * Add support for detecting the following device status - write protect - over temp (thermal shutdown) * Add new sysfs entry 'status', possible values - online, write_protect, thermal_shutdown * Add new file 'sysfs-block-rssd' to document ABI (Reported-by: Greg Kroah-Hartman) Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-04-09 08:35:38 +02:00
Asai Thambi S P	dad40f16ff	mtip32xx: make setting comp_time as common Moved setting completion time into mtip_issue_ncq_command() Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-04-09 08:35:38 +02:00
Asai Thambi S P	45038367c2	mtip32xx: Add new bitwise flag 'dd_flag' * Merged the following flags into one variable 'dd_flag': * drv_cleanup_done * resumeflag * Added the following flags into 'dd_flag' * remove pending * init done * Removed 'ftlrebuildflag' (similar flag is already part of mti_port->flags) Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-04-09 08:35:38 +02:00
Linus Torvalds	9479f0f801	Two fixes for regressions: * one is a workaround that will be removed in v3.5 with proper fix in the tip/x86 tree, * the other is to fix drivers to load on PV (a previous patch made them only load in PVonHVM mode). The rest are just minor fixes in the various drivers and some cleanup in the core code. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQEcBAABAgAGBQJPfyVUAAoJEFjIrFwIi8fJUjUH/jbY5JavRqSlNELZW2A4Ta76 8p00LqLHw/C56iHZcWKke8mqtWNb+ZfcQt7ZYcxDIYa4QWBL28x0OLAO2tOBIt37 ZjYESWSdFJaJvmpADluWtFyGyZ9TYJllDTBm/jWj1ZtKSZvR1YkhuMXCS0f4AmGQ xFzSWJZUDdiOAqpN+VQD8wP00gfR8knQLg16XE2fvFdQo4XwpCtqLfHV/5pMMGdy Cs/ep6rq/7cdv/nshKOcBnw7RW8l3Xoi/28ht8k3DvAQ2VtFq1Tugv2G9pcCHwQG DIBkB3SOU6/v6P5at5+egKS5xR1fJetCWlkMd8kkbcdz2NPI4UDMkvOW6Q8yQls= =6Ve+ -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull xen fixes from Konrad Rzeszutek Wilk: "Two fixes for regressions: * one is a workaround that will be removed in v3.5 with proper fix in the tip/x86 tree, * the other is to fix drivers to load on PV (a previous patch made them only load in PVonHVM mode). The rest are just minor fixes in the various drivers and some cleanup in the core code." * tag 'stable/for-linus-3.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/pcifront: avoid pci_frontend_enable_msix() falsely returning success xen/pciback: fix XEN_PCI_OP_enable_msix result xen/smp: Remove unnecessary call to smp_processor_id() xen/x86: Workaround 'x86/ioapic: Add register level checks to detect bogus io-apic entries' xen: only check xen_platform_pci_unplug if hvm	2012-04-06 17:54:53 -07:00
Igor Mammedov	e95ae5a493	xen: only check xen_platform_pci_unplug if hvm commit b9136d207f08 xen: initialize platform-pci even if xen_emul_unplug=never breaks blkfront/netfront by not loading them because of xen_platform_pci_unplug=0 and it is never set for PV guest. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-04-06 12:12:52 -04:00
Alex Elder	cd9d9f5df6	rbd: don't hold spinlock during messenger flush A recent change made changes to the rbd_client_list be protected by a spinlock. Unfortunately in rbd_put_client(), the lock is taken before possibly dropping the last reference to an rbd_client, and on the last reference that eventually calls flush_workqueue() which can sleep. The problem was flagged by a debug spinlock warning: BUG: spinlock wrong CPU on CPU#3, rbd/27814 The solution is to move the spinlock acquisition and release inside rbd_client_release(), which is the spot where it's really needed for protecting the removal of the rbd_client from the client list. Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net>	2012-04-05 15:43:58 -05:00
Ryosuke Saito	6d27f09a63	mtip32xx: fix error handling in mtip_init() Ensure that block device is properly unregistered, if pci_register_driver() fails. Signed-off-by: Ryosuke Saito <raitosyo@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-04-05 08:09:34 -06:00
Len Brown	f6365201d8	x86: Remove the ancient and deprecated disable_hlt() and enable_hlt() facility The X86_32-only disable_hlt/enable_hlt mechanism was used by the 32-bit floppy driver. Its effect was to replace the use of the HLT instruction inside default_idle() with cpu_relax() - essentially it turned off the use of HLT. This workaround was commented in the code as: "disable hlt during certain critical i/o operations" "This halt magic was a workaround for ancient floppy DMA wreckage. It should be safe to remove." H. Peter Anvin additionally adds: "To the best of my knowledge, no-hlt only existed because of flaky power distributions on 386/486 systems which were sold to run DOS. Since DOS did no power management of any kind, including HLT, the power draw was fairly uniform; when exposed to the much hhigher noise levels you got when Linux used HLT caused some of these systems to fail. They were by far in the minority even back then." Alan Cox further says: "Also for the Cyrix 5510 which tended to go castors up if a HLT occurred during a DMA cycle and on a few other boxes HLT during DMA tended to go astray. Do we care ? I doubt it. The 5510 was pretty obscure, the 5520 fixed it, the 5530 is probably the oldest still in any kind of use." So, let's finally drop this. Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Josh Boyer <jwboyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Stephen Hemminger <shemminger@vyatta.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@kernel.org> Link: http://lkml.kernel.org/n/tip-3rhk9bzf0x9rljkv488tloib@git.kernel.org [ If anyone cares then alternative instruction patching could be used to replace HLT with a one-byte NOP instruction. Much simpler. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-03-30 08:50:27 +02:00
Vivek Goyal	e9986f303d	virtio-blk: Call revalidate_disk() upon online disk resize If a virtio disk is open in guest and a disk resize operation is done, (virsh blockresize), new size is not visible to tools like "fdisk -l". This seems to be happening as we update only part->nr_sects and not bdev->bd_inode size. Call revalidate_disk() which should take care of it. I tested growing disk size of already open disk and it works for me. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-03-29 10:09:44 +02:00
Linus Torvalds	532bfc851a	Merge branch 'akpm' (Andrew's patch-bomb) Merge third batch of patches from Andrew Morton: - Some MM stragglers - core SMP library cleanups (on_each_cpu_mask) - Some IPI optimisations - kexec - kdump - IPMI - the radix-tree iterator work - various other misc bits. "That'll do for -rc1. I still have ~10 patches for 3.4, will send those along when they've baked a little more." * emailed from Andrew Morton <akpm@linux-foundation.org>: (35 commits) backlight: fix typo in tosa_lcd.c crc32: add help text for the algorithm select option mm: move hugepage test examples to tools/testing/selftests/vm mm: move slabinfo.c to tools/vm mm: move page-types.c from Documentation to tools/vm selftests/Makefile: make `run_tests' depend on `all' selftests: launch individual selftests from the main Makefile radix-tree: use iterators in find_get_pages* functions radix-tree: rewrite gang lookup using iterator radix-tree: introduce bit-optimized iterator fs/proc/namespaces.c: prevent crash when ns_entries[] is empty nbd: rename the nbd_device variable from lo to nbd pidns: add reboot_pid_ns() to handle the reboot syscall sysctl: use bitmap library functions ipmi: use locks on watchdog timeout set on reboot ipmi: simplify locking ipmi: fix message handling during panics ipmi: use a tasklet for handling received messages ipmi: increase KCS timeouts ipmi: decrease the IPMI message transaction time in interrupt mode ...	2012-03-28 17:19:28 -07:00
Wanlong Gao	f4507164e7	nbd: rename the nbd_device variable from lo to nbd rename the nbd_device variable from "lo" to "nbd", since "lo" is just a name copied from loop.c. Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Cc: Paul Clements <paul.clements@steeleye.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-03-28 17:14:37 -07:00
Linus Torvalds	0195c00244	Disintegrate and delete asm/system.h -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIVAwUAT3NKzROxKuMESys7AQKElw/+JyDxJSlj+g+nymkx8IVVuU8CsEwNLgRk 8KEnRfLhGtkXFLSJYWO6jzGo16F8Uqli1PdMFte/wagSv0285/HZaKlkkBVHdJ/m u40oSjgT013bBh6MQ0Oaf8pFezFUiQB5zPOA9QGaLVGDLXCmgqUgd7exaD5wRIwB ZmyItjZeAVnDfk1R+ZiNYytHAi8A5wSB+eFDCIQYgyulA1Igd1UnRtx+dRKbvc/m rWQ6KWbZHIdvP1ksd8wHHkrlUD2pEeJ8glJLsZUhMm/5oMf/8RmOCvmo8rvE/qwl eDQ1h4cGYlfjobxXZMHqAN9m7Jg2bI946HZjdb7/7oCeO6VW3FwPZ/Ic75p+wp45 HXJTItufERYk6QxShiOKvA+QexnYwY0IT5oRP4DrhdVB/X9cl2MoaZHC+RbYLQy+ /5VNZKi38iK4F9AbFamS7kd0i5QszA/ZzEzKZ6VMuOp3W/fagpn4ZJT1LIA3m4A9 Q0cj24mqeyCfjysu0TMbPtaN+Yjeu1o1OFRvM8XffbZsp5bNzuTDEvviJ2NXw4vK 4qUHulhYSEWcu9YgAZXvEWDEM78FXCkg2v/CrZXH5tyc95kUkMPcgG+QZBB5wElR FaOKpiC/BuNIGEf02IZQ4nfDxE90QwnDeoYeV+FvNj9UEOopJ5z5bMPoTHxm4cCD NypQthI85pc= =G9mT -----END PGP SIGNATURE----- Merge tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system Pull "Disintegrate and delete asm/system.h" from David Howells: "Here are a bunch of patches to disintegrate asm/system.h into a set of separate bits to relieve the problem of circular inclusion dependencies. I've built all the working defconfigs from all the arches that I can and made sure that they don't break. The reason for these patches is that I recently encountered a circular dependency problem that came about when I produced some patches to optimise get_order() by rewriting it to use ilog2(). This uses bitops - and on the SH arch asm/bitops.h drags in asm-generic/get_order.h by a circuituous route involving asm/system.h. The main difficulty seems to be asm/system.h. It holds a number of low level bits with no/few dependencies that are commonly used (eg. memory barriers) and a number of bits with more dependencies that aren't used in many places (eg. switch_to()). These patches break asm/system.h up into the following core pieces: (1) asm/barrier.h Move memory barriers here. This already done for MIPS and Alpha. (2) asm/switch_to.h Move switch_to() and related stuff here. (3) asm/exec.h Move arch_align_stack() here. Other process execution related bits could perhaps go here from asm/processor.h. (4) asm/cmpxchg.h Move xchg() and cmpxchg() here as they're full word atomic ops and frequently used by atomic_xchg() and atomic_cmpxchg(). (5) asm/bug.h Move die() and related bits. (6) asm/auxvec.h Move AT_VECTOR_SIZE_ARCH here. Other arch headers are created as needed on a per-arch basis." Fixed up some conflicts from other header file cleanups and moving code around that has happened in the meantime, so David's testing is somewhat weakened by that. We'll find out anything that got broken and fix it.. * tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits) Delete all instances of asm/system.h Remove all #inclusions of asm/system.h Add #includes needed to permit the removal of asm/system.h Move all declarations of free_initmem() to linux/mm.h Disintegrate asm/system.h for OpenRISC Split arch_align_stack() out from asm-generic/system.h Split the switch_to() wrapper out of asm-generic/system.h Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h Create asm-generic/barrier.h Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h Disintegrate asm/system.h for Xtensa Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt] Disintegrate asm/system.h for Tile Disintegrate asm/system.h for Sparc Disintegrate asm/system.h for SH Disintegrate asm/system.h for Score Disintegrate asm/system.h for S390 Disintegrate asm/system.h for PowerPC Disintegrate asm/system.h for PA-RISC Disintegrate asm/system.h for MN10300 ...	2012-03-28 15:58:21 -07:00
Linus Torvalds	47b816ff7d	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull a few more things for powerpc by Benjamin Herrenschmidt: - Anton's did some recent improvements to EPOW event reporting on pSeries (power supply failures and such). The patches are self contained enough and replace really nasty code so I felt it should still go in - I did the vio driver registration change Greg requested, I don't see the point of leaving that til the next merge window - The remaining EEH changes I said were still pending to get rid of the EEH references from the generic struct device_node - A few more iSeries removal bits - A perf bug fix on 970 * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: powerpc/perf: Fix instruction address sampling on 970 and Power4 powerpc+sparc/vio: Modernize driver registration powerpc: Random little legacy iSeries removal tidy ups powerpc: Remove NO_IRQ_IGNORE powerpc/pseries: Cut down on enthusiastic use of defines in RAS code powerpc/pseries: Clean up ras_error_interrupt code powerpc/pseries: Remove RTAS_POWERMGM_EVENTS powerpc/pseries: Use rtas_get_sensor in RAS code powerpc/pseries: Parse and handle EPOW interrupts powerpc: Make function that parses RTAS error logs global powerpc/eeh: Retrieve PHB from global list powerpc/eeh: Remove eeh information from pci_dn powerpc/eeh: Remove eeh device from OF node	2012-03-28 14:41:36 -07:00
David Howells	9ffc93f203	Remove all #inclusions of asm/system.h Remove all #inclusions of asm/system.h preparatory to splitting and killing it. Performed with the following command: perl -p -i -e 's!^#\sinclude\s<asm/system[.]h>.\n!!' `grep -Irl '^#\sinclude\s<asm/system[.]h>' ` Signed-off-by: David Howells <dhowells@redhat.com>	2012-03-28 18:30:03 +01:00
Linus Torvalds	56b59b429b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates for 3.4-rc1 from Sage Weil: "Alex has been busy. There are a range of rbd and libceph cleanups, especially surrounding device setup and teardown, and a few critical fixes in that code. There are more cleanups in the messenger code, virtual xattrs, a fix for CRC calculation/checks, and lots of other miscellaneous stuff. There's a patch from Amon Ott to make inos behave a bit better on 32-bit boxes, some decode check fixes from Xi Wang, and network throttling fix from Jim Schutt, and a couple RBD fixes from Josh Durgin. No new functionality, just a lot of cleanup and bug fixing." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (65 commits) rbd: move snap_rwsem to the device, rename to header_rwsem ceph: fix three bugs, two in ceph_vxattrcb_file_layout() libceph: isolate kmap() call in write_partial_msg_pages() libceph: rename "page_shift" variable to something sensible libceph: get rid of zero_page_address libceph: only call kernel_sendpage() via helper libceph: use kernel_sendpage() for sending zeroes libceph: fix inverted crc option logic libceph: some simple changes libceph: small refactor in write_partial_kvec() libceph: do crc calculations outside loop libceph: separate CRC calculation from byte swapping libceph: use "do" in CRC-related Boolean variables ceph: ensure Boolean options support both senses libceph: a few small changes libceph: make ceph_tcp_connect() return int libceph: encapsulate some messenger cleanup code libceph: make ceph_msgr_wq private libceph: encapsulate connection kvec operations libceph: move prepare_write_banner() ...	2012-03-28 10:01:29 -07:00
Benjamin Herrenschmidt	cb52d8970e	powerpc+sparc/vio: Modernize driver registration This makes vio_register_driver() get the module owner & name at compile time like PCI drivers do, and adds a name pointer directly in struct vio_driver to avoid having to explicitly initialize the embedded struct device. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: David S. Miller <davem@davemloft.net>	2012-03-28 11:33:24 +11:00
Jens Axboe	6674fb79ca	Merge branch 'stable/for-jens-3.4-bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-3.4/drivers Konrad writes: I've two small fixes for the xen-blkback - and I think one more will show up eventually (a partial revert), but not sure when. So in the spirit of keeping the patches flowing, please git pull the following branch.	2012-03-26 09:13:14 +02:00
Linus Torvalds	e22057c859	One tiny feature that accidentally got lost in the initial git pull: * Add fast-EOI acking of interrupts (clear a bit instead of hypercall) And bug-fixes: * Fix CPU bring-up code missing a call to notify other subsystems. * Fix reading /sys/hypervisor even if PVonHVM drivers are not loaded. * In Xen ACPI processor driver: remove too verbose WARN messages, fix up the Kconfig dependency to be a module by default, and add dependency on CPU_FREQ. * Disable CPU frequency drivers from loading when booting under Xen (as we want the Xen ACPI processor to be used instead). * Cleanups in tmem code. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQEcBAABAgAGBQJPbc3DAAoJEFjIrFwIi8fJTQkIAMnH2fPhcHAb4mNaz+3gdmsZ Flo6V1gMBcO8xKZlUkFgKKPYoOm7lLmvoceXLVSH5oOKSnSJo1zSinzKmcdJQo/D kPo4/EguNwtzcAcQh2dmT6/IM9O3ihMKUli7Oajif9PLCFFFqTaG3Y3YNBo/rxTY D3HAnJrIfmIyG0NpLnaFCWhCzUvcB4M7ysutECqcF8l5gnbHxRVeCKD0blM+n9GH Wyum00dQCwo6h6wTduhPOAxHAM4rncyR3heOB2vDxq9YJHSUhhcva5QCgQ+tdUVt 6U2TQT1L2Px8iXXzr2w9YBpepOVajZReoKhajLjJ5VbkpBZFz5dVNfJ8LpF8RV8= =z8IB -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.4-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull more xen updates from Konrad Rzeszutek Wilk: "One tiny feature that accidentally got lost in the initial git pull: * Add fast-EOI acking of interrupts (clear a bit instead of hypercall) And bug-fixes: * Fix CPU bring-up code missing a call to notify other subsystems. * Fix reading /sys/hypervisor even if PVonHVM drivers are not loaded. * In Xen ACPI processor driver: remove too verbose WARN messages, fix up the Kconfig dependency to be a module by default, and add dependency on CPU_FREQ. * Disable CPU frequency drivers from loading when booting under Xen (as we want the Xen ACPI processor to be used instead). * Cleanups in tmem code." * tag 'stable/for-linus-3.4-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/acpi: Fix Kconfig dependency on CPU_FREQ xen: initialize platform-pci even if xen_emul_unplug=never xen/smp: Fix bringup bug in AP code. xen/acpi: Remove the WARN's as they just create noise. xen/tmem: cleanup xen: support pirq_eoi_map xen/acpi-processor: Do not depend on CPU frequency scaling drivers. xen/cpufreq: Disable the cpu frequency scaling drivers from loading. provide disable_cpufreq() function to disable the API.	2012-03-24 12:20:25 -07:00
Konrad Rzeszutek Wilk	3389bb8bf7	xen/blkback: Make optional features be really optional. They were using the xenbus_dev_fatal() function which would change the state of the connection immediately. Which is not what we want when we advertise optional features. So make 'feature-discard','feature-barrier','feature-flush-cache' optional. Suggested-by: Jan Beulich <JBeulich@suse.com> [v1: Made the discard function void and static] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-03-24 10:04:36 -04:00
Konrad Rzeszutek Wilk	4dae76705f	xen/blkback: Squash the discard support for 'file' and 'phy' type. The only reason for the distinction was for the special case of 'file' (which is assumed to be loopback device), was to reach inside the loopback device, find the underlaying file, and call fallocate on it. Fortunately "xen-blkback: convert hole punching to discard request on loop devices" removes that use-case and we now based the discard support based on blk_queue_discard(q) and extract all appropriate parameters from the 'struct request_queue'. CC: Li Dongyang <lidongyang@novell.com> Acked-by: Jan Beulich <JBeulich@suse.com> [v1: Dropping pointless initializer and keeping blank line] [v2: Remove the kfree as it is not used anymore] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-03-24 10:04:35 -04:00
Oleg Nesterov	70834d3070	usermodehelper: use UMH_WAIT_PROC consistently A few call_usermodehelper() callers use the hardcoded constant instead of the proper UMH_WAIT_PROC, fix them. Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michal Januszewski <spock@gentoo.org> Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-03-23 16:58:41 -07:00
Asai Thambi S P	22be2e6e13	mtip32xx: fix incorrect value set for drv_cleanup_done, and re-initialize and start port in mtip_restart_port() This patch includes two changes: * fix incorrect value set for drv_cleanup_done * re-initialize and start port in mtip_restart_port() Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Sam Bradshaw <sbradshaw@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-03-23 12:33:03 +01:00
Stephen M. Cameron	bc67f63650	cciss: Fix scsi tape io with more than 255 scatter gather elements The total number of scatter gather elements in the CISS command used by the scsi tape code was being cast to a u8, which can hold at most 255 scatter gather elements. It should have been cast to a u16. Without this patch the command gets rejected by the controller since the total scatter gather count did not add up to the right value resulting in an i/o error. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-03-22 21:40:09 +01:00
Stephen M. Cameron	395d287526	cciss: Initialize scsi host max_sectors for tape drive support The default is too small (1024 blocks), use h->cciss_max_sectors (8192 blocks) Without this change, if you try to set the block size of a tape drive above 512*1024, via "mt -f /dev/st0 setblk nnn" where nnn is greater than 524288, it won't work right. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-03-22 21:40:08 +01:00
Josh Durgin	c666601a93	rbd: move snap_rwsem to the device, rename to header_rwsem A new temporary header is allocated each time the header changes, but only the changed properties are copied over. We don't need a new semaphore for each header update. This addresses http://tracker.newdream.net/issues/2174 Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:52 -05:00
Alex Elder	32eec68d2f	rbd: don't drop the rbd_id too early Currently an rbd device's id is released when it is removed, but it is done before the code is run to clean up sysfs-related files (such as /sys/bus/rbd/devices/1). It's possible that an rbd is still in use after the rbd_remove() call has been made. It's essentially the same as an active inode that stays around after it has been removed--until its final close operation. This means that the id shows up as free for reuse at a time it should not be. The effect of this was seen by Jens Rehpoehler, who: - had a filesystem mounted on an rbd device - unmapped that filesystem (without unmounting) - found that the mount still worked properly - but hit a panic when he attempted to re-map a new rbd device This re-map attempt found the previously-unmapped id available. The subsequent attempt to reuse it was met with a panic while attempting to (re-)install the sysfs entry for the new mapped device. Fix this by holding off "putting" the rbd id, until the rbd_device release function is called--when the last reference is finally dropped. Note: This fixes: http://tracker.newdream.net/issues/1907 Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:50 -05:00
Alex Elder	593a9e7b34	rbd: small changes Here is another set of small code tidy-ups: - Define SECTOR_SHIFT and SECTOR_SIZE, and use these symbolic names throughout. Tell the blk_queue system our physical block size, in the (unlikely) event we want to use something other than the default. - Delete the definition of struct rbd_info, which is never used. - Move the definition of dev_to_rbd() down in its source file, just above where it gets first used, and change its name to dev_to_rbd_dev(). - Replace an open-coded operation in rbd_dev_release() to use dev_to_rbd_dev() instead. - Calculate the segment size for a given rbd_device just once in rbd_init_disk(). - Use the '%zd' conversion specifier in rbd_snap_size_show(), since the value formatted is a size_t. - Switch to the '%llu' conversion specifier in rbd_snap_id_show(). since the value formatted is unsigned. Signed-off-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:50 -05:00
Alex Elder	00f1f36ffa	rbd: do some refactoring A few blocks of code are rearranged a bit here: - In rbd_header_from_disk(): - Don't bother computing snap_count until we're sure the on-disk header starts with a good signature. - Move a few independent lines of code so they are after a check for a failed memory allocation. - Get rid of unnecessary local variable "ret". - Make a few other changes in rbd_read_header(), similar to the above--just moving things around a bit while preserving the functionality. - In rbd_rq_fn(), just assign rq in the while loop's controlling expression rather than duplicating it before and at the end of the loop body. This allows the use of "continue" rather than "goto next" in a number of spots. - Rearrange the logic in snap_by_name(). End result is the same. Signed-off-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:50 -05:00
Alex Elder	fed4c143ba	rbd: fix module sysfs setup/teardown code Once rbd_bus_type is registered, it allows an "add" operation via the /sys/bus/rbd/add bus attribute, and adding a new rbd device that way establishes a connection between the device and rbd_root_dev. But rbd_root_dev is not registered until after the rbd_bus_type registration is complete. This could (in principle anyway) result in an invalid state. Since rbd_root_dev has no tie to rbd_bus_type we can reorder these two initializations and never be faced with this scenario. In addition, unregister the device in the event the bus registration fails at module init time. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:50 -05:00
Alex Elder	7ef3214af2	rbd: don't allocate mon_addrs buffer in rbd_add() The mon_addrs buffer in rbd_add is used to hold a copy of the monitor IP addresses supplied via /sys/bus/rbd/add. That is passed to rbd_get_client(), which never modifies it (nor do any of the functions it gets passed to thereafter)--the mon_addr parameter to rbd_get_client() is a pointer to constant data, so it can't be modifed. Furthermore, rbd_get_client() has the length of the mon_addrs buffer and that is used to ensure nothing goes beyond its end. Based on all this, there is no reason that a buffer needs to be used to hold a copy of the mon_addrs provided via /sys/bus/rbd/add. Instead, the location within that passed-in buffer can be provided, along with the length of the "token" therein which represents the monitor IP's. A small change to rbd_add_parse_args() allows the address within the buffer to be passed back, and the length is already returned. This now means that, at least from the perspective of this interface, there is no such thing as a list of monitor addresses that is too long. Signed-off-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:50 -05:00
Alex Elder	5214ecc45c	rbd: have rbd_parse_args() report found mon_addrs size The argument parsing routine already computes the size of the mon_addrs buffer it extracts from the "command." Pass it to the caller so it can use it to provide the length to rbd_get_client(). Signed-off-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:49 -05:00
Alex Elder	81a8979378	rbd: do a few checks at build time This is a bit gratuitous, but there are a few things that can be verified at build time rather than run time, so do that. Signed-off-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:49 -05:00
Alex Elder	e28fff268e	rbd: don't use sscanf() in rbd_add_parse_args() Make use of a few simple helper routines to parse the arguments rather than sscanf(). This will treat both missing and too-long arguments as invalid input (rather than silently truncating the input in the too-long case). In time this can also be used by rbd_add() to use the passed-in buffer in place, rather than copying its contents into new buffers. It appears to me that the sscanf() previously used would not correctly handle a supplied snapshot--the two final "%s" conversion specifications were not separated by a space, and I'm not sure how sscanf() handles that situation. It may not be well-defined. So that may be a bug this change fixes (but I didn't verify that). The sizes of the mon_addrs and options buffers are now passed to rbd_add_parse_args(), so they can be supplied to copy_token(). Signed-off-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:49 -05:00
Alex Elder	a725f65e52	rbd: encapsulate argument parsing for rbd_add() Move the code that parses the arguments provided to rbd_add() (which are supplied via /sys/bus/rbd/add) into a separate function. Also rename the "mon_dev_name" variable in rbd_add() to be "mon_addrs". The variable represents a list of one or more comma-separated monitor IP addresses, each with an optional port number. I think "mon_addrs" captures that notion a little better. Signed-off-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:48 -05:00
Alex Elder	27cc25943f	rbd: simplify error handling in rbd_add() If a couple pointers are initialized to NULL then a single "out_nomem" label can be used for all of the memory allocation failure cases in rbd_add(). Also, get rid of the "irc" local variable there. There is no real need for "rc" to be type ssize_t, and it can be used in the spot "irc" was. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:48 -05:00
Alex Elder	60571c7d55	rbd: reduce memory used for rbd_dev fields The length of the string containing the monitor address specification(s) will never exceed the length of the string passed in to rbd_add(). The same holds true for the ceph + rbd options string. So reduce the amount of memory allocated for these to that length rather than the maximum (1024 bytes). Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:48 -05:00
Alex Elder	d720bcb0a8	rbd: have rbd_get_client() return a rbd_client Since rbd_get_client() currently returns an error code. It assigns the rbd_client field of the rbd_device structure it is passed if successful. Instead, have it return the created rbd_client structure and return a pointer-coded error if there is an error. This makes the assignment of the client pointer more obvious at the call site. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:48 -05:00
Alex Elder	f0f8cef5a3	rbd: a few simple changes Here are a few very simple cleanups: - Add a "RBD_" prefix to the two driver name string definitions. - Move the definition of struct rbd_request below struct rbd_req_coll to avoid the need for an empty declaration of the latter. - Move and group the definitions of rbd_root_dev_release() and rbd_root_dev, as well as rbd_bus_type and rbd_bus_attrs[], close to the top of the file. Arrange the latter so rbd_bus_type.bus_attrs can be initialized statically. - Get rid of an unnecessary local variable in rbd_open(). - Rework some hokey logic in rbd_bus_add_dev(), so the value of "ret" at the end is either 0 or -ENOENT to avoid the need for the code duplication that was there. - Rename a goto target in rbd_add(). Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:48 -05:00
Alex Elder	432b858749	rbd: rename "node_lock" The spinlock used to protect rbd_client_list is named "node_lock". Rename it to "rbd_client_list_lock" to make it more obvious what it's for. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:48 -05:00
Alex Elder	bc534d86be	rbd: move ctl_mutex lock inside rbd_client_create() Since rbd_client_create() is only called in one place, move the acquisition of the mutex around that call inside that function. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	d97081b0c7	rbd: move ctl_mutex lock inside rbd_get_client() Since rbd_get_client() is only called in one place, move the acquisition of the mutex around that call inside that function. Furthermore, within rbd_get_client(), it appears the mutex only needs to be held while calling rbd_client_create(). (Moving the lock inside that function will wait for the next patch.) Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	e6994d3dde	rbd: release client list lock sooner In rbd_get_client(), if a client is reused, a number of things get done while still holding the list lock unnecessarily. This just moves a few things that need no lock protection outside the lock. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	d184f6bfde	rbd: restore previous rbd id sequence behavior It used to be that selecting a new unique identifier for an added rbd device required searching all existing ones to find the highest id is used. A recent change made that unnecessary, but made it so that id's used were monotonically non-decreasing. It's a bit more pleasant to have smaller rbd id's though, and this change makes ids get allocated as they were before--each new id is one more than the maximum currently in use. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	499afd5b8e	rbd: tie rbd_dev_list changes to rbd_id operations The only time entries are added to or removed from the global rbd_dev_list is exactly when a "put" or "get" operation is being performed on a rbd_dev's id. So just move the list management code into get/put routines. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	e124a82f3c	rbd: protect the rbd_dev_list with a spinlock The rbd_dev_list is just a simple list of all the current rbd_devices. Using the ctl_mutex as a concurrency guard is overkill. Instead, use a spinlock for that specific purpose. This also reduces the window that the ctl_mutex needs to be held in rbd_add(). Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	1ddbe94eda	rbd: rework calculation of new rbd id's In order to select a new unique identifier for an added rbd device, the list of all existing ones is searched and a value one greater than the highest id is used. The list search can be avoided by using an atomic variable that keeps track of the current highest id. Using a get/put model for id's we can limit the boundless growth of id numbers a bit by arranging to reuse the current highest id once it gets released. Add these calls to "put" the id when an rbd is getting removed. Note that this changes the pattern of device id's used--new values will never be below the highest one seen so far (even if there exists an unused lower one). I assert this is OK because the key property of an rbd id is its uniqueness, not its magnitude. Regardless, a follow-on patch will restore the old way of doing things, I just think this commit just makes the incremental change to atomics a little easier to understand. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	b7f23c361b	rbd: encapsulate new rbd id selection Move the loop that finds a new unique rbd id to use into its own helper function. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Josh Durgin	cc9d734c3d	rbd: use a single value of snap_name to mean no snap There's already a constant for this anyway. Since rbd_header_set_snap() is only used to set the rbd device snap_name field, just do that within that function rather than having it take the snap_name as an argument. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net> v2: Changed interface rbd_header_set_snap() so it explicitly updates the snap_name in the rbd_device. Also added a BUILD_BUG_ON() to verify the size of the snap_name field is sufficient for SNAP_HEAD_NAME.	2012-03-22 10:47:47 -05:00
Alex Elder	1dbb439913	rbd: do not duplicate ceph_client pointer in rbd_device The rbd_device structure maintains a duplicate copy of the ceph_client pointer maintained in its rbd_client structure. There appears to be no good reason for this, and its presence presents a risk of them getting out of synch or otherwise misused. So kill it off, and use the rbd_client copy only. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	ee57741c52	rbd: make ceph_parse_options() return a pointer ceph_parse_options() takes the address of a pointer as an argument and uses it to return the address of an allocated structure if successful. With this interface is not evident at call sites that the pointer is always initialized. Change the interface to return the address instead (or a pointer-coded error code) to make the validity of the returned pointer obvious. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	2107978668	rbd: a few small cleanups Some minor cleanups in "drivers/block/rbd.c: - Use the more meaningful "RBD_MAX_OBJ_NAME_LEN" in place if "96" in the definition of RBD_MAX_MD_NAME_LEN. - Use DEFINE_SPINLOCK() to define and initialize node_lock. - Drop a needless (char *) cast in parse_rbd_opts_token(). - Make a few minor formatting changes. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:46 -05:00
Igor Mammedov	b9136d207f	xen: initialize platform-pci even if xen_emul_unplug=never When xen_emul_unplug=never is specified on kernel command line reading files from /sys/hypervisor is broken (returns -EBUSY). It is caused by xen_bus dependency on platform-pci and platform-pci isn't initialized when xen_emul_unplug=never is specified. Fix it by allowing platform-pci to ignore xen_emul_unplug=never, and do not intialize xen_[blk\|net]front instead. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-03-22 11:37:11 -04:00
Linus Torvalds	5375871d43	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc merge from Benjamin Herrenschmidt: "Here's the powerpc batch for this merge window. It is going to be a bit more nasty than usual as in touching things outside of arch/powerpc mostly due to the big iSeriesectomy :-) We finally got rid of the bugger (legacy iSeries support) which was a PITA to maintain and that nobody really used anymore. Here are some of the highlights: - Legacy iSeries is gone. Thanks Stephen ! There's still some bits and pieces remaining if you do a grep -ir series arch/powerpc but they are harmless and will be removed in the next few weeks hopefully. - The 'fadump' functionality (Firmware Assisted Dump) replaces the previous (equivalent) "pHyp assisted dump"... it's a rewrite of a mechanism to get the hypervisor to do crash dumps on pSeries, the new implementation hopefully being much more reliable. Thanks Mahesh Salgaonkar. - The "EEH" code (pSeries PCI error handling & recovery) got a big spring cleaning, motivated by the need to be able to implement a new backend for it on top of some new different type of firwmare. The work isn't complete yet, but a good chunk of the cleanups is there. Note that this adds a field to struct device_node which is not very nice and which Grant objects to. I will have a patch soon that moves that to a powerpc private data structure (hopefully before rc1) and we'll improve things further later on (hopefully getting rid of the need for that pointer completely). Thanks Gavin Shan. - I dug into our exception & interrupt handling code to improve the way we do lazy interrupt handling (and make it work properly with "edge" triggered interrupt sources), and while at it found & fixed a wagon of issues in those areas, including adding support for page fault retry & fatal signals on page faults. - Your usual random batch of small fixes & updates, including a bunch of new embedded boards, both Freescale and APM based ones, etc..." I fixed up some conflicts with the generalized irq-domain changes from Grant Likely, hopefully correctly. * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (141 commits) powerpc/ps3: Do not adjust the wrapper load address powerpc: Remove the rest of the legacy iSeries include files powerpc: Remove the remaining CONFIG_PPC_ISERIES pieces init: Remove CONFIG_PPC_ISERIES powerpc: Remove FW_FEATURE ISERIES from arch code tty/hvc_vio: FW_FEATURE_ISERIES is no longer selectable powerpc/spufs: Fix double unlocks powerpc/5200: convert mpc5200 to use of_platform_populate() powerpc/mpc5200: add options to mpc5200_defconfig powerpc/mpc52xx: add a4m072 board support powerpc/mpc5200: update mpc5200_defconfig to fit for charon board Documentation/powerpc/mpc52xx.txt: Checkpatch cleanup powerpc/44x: Add additional device support for APM821xx SoC and Bluestone board powerpc/44x: Add support PCI-E for APM821xx SoC and Bluestone board MAINTAINERS: Update PowerPC 4xx tree powerpc/44x: The bug fixed support for APM821xx SoC and Bluestone board powerpc: document the FSL MPIC message register binding powerpc: add support for MPIC message register API powerpc/fsl: Added aliased MSIIR register address to MSI node in dts powerpc/85xx: mpc8548cds - add 36-bit dts ...	2012-03-21 18:55:10 -07:00
Linus Torvalds	9f3938346a	Merge branch 'kmap_atomic' of git://github.com/congwang/linux Pull kmap_atomic cleanup from Cong Wang. It's been in -next for a long time, and it gets rid of the (no longer used) second argument to k[un]map_atomic(). Fix up a few trivial conflicts in various drivers, and do an "evil merge" to catch some new uses that have come in since Cong's tree. * 'kmap_atomic' of git://github.com/congwang/linux: (59 commits) feature-removal-schedule.txt: schedule the deprecated form of kmap_atomic() for removal highmem: kill all __kmap_atomic() [swarren@nvidia.com: highmem: Fix ARM build break due to __kmap_atomic rename] drbd: remove the second argument of k[un]map_atomic() zcache: remove the second argument of k[un]map_atomic() gma500: remove the second argument of k[un]map_atomic() dm: remove the second argument of k[un]map_atomic() tomoyo: remove the second argument of k[un]map_atomic() sunrpc: remove the second argument of k[un]map_atomic() rds: remove the second argument of k[un]map_atomic() net: remove the second argument of k[un]map_atomic() mm: remove the second argument of k[un]map_atomic() lib: remove the second argument of k[un]map_atomic() power: remove the second argument of k[un]map_atomic() kdb: remove the second argument of k[un]map_atomic() udf: remove the second argument of k[un]map_atomic() ubifs: remove the second argument of k[un]map_atomic() squashfs: remove the second argument of k[un]map_atomic() reiserfs: remove the second argument of k[un]map_atomic() ocfs2: remove the second argument of k[un]map_atomic() ntfs: remove the second argument of k[un]map_atomic() ...	2012-03-21 09:40:26 -07:00
Linus Torvalds	69a7aebcf0	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree from Jiri Kosina: "It's indeed trivial -- mostly documentation updates and a bunch of typo fixes from Masanari. There are also several linux/version.h include removals from Jesper." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (101 commits) kcore: fix spelling in read_kcore() comment constify struct pci_dev * in obvious cases Revert "char: Fix typo in viotape.c" init: fix wording error in mm_init comment usb: gadget: Kconfig: fix typo for 'different' Revert "power, max8998: Include linux/module.h just once in drivers/power/max8998_charger.c" writeback: fix fn name in writeback_inodes_sb_nr_if_idle() comment header writeback: fix typo in the writeback_control comment Documentation: Fix multiple typo in Documentation tpm_tis: fix tis_lock with respect to RCU Revert "media: Fix typo in mixer_drv.c and hdmi_drv.c" Doc: Update numastat.txt qla4xxx: Add missing spaces to error messages compiler.h: Fix typo security: struct security_operations kerneldoc fix Documentation: broken URL in libata.tmpl Documentation: broken URL in filesystems.tmpl mtd: simplify return logic in do_map_probe() mm: fix comment typo of truncate_inode_pages_range power: bq27x00: Fix typos in comment ...	2012-03-20 21:12:50 -07:00
Linus Torvalds	ed378a52da	USB merge for 3.4-rc1 Here's the big USB merge for the 3.4-rc1 merge window. Lots of gadget driver reworks here, driver updates, xhci changes, some new drivers added, usb-serial core reworking to fix some bugs, and other various minor things. There are some patches touching arch code, but they have all been acked by the various arch maintainers. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iEYEABECAAYFAk9njL8ACgkQMUfUDdst+ylQ9wCfbBOnIT01lGOorkaE9pom0hhk HfMAoKq1xzCR2B+OS3UMyUQffk+Ri9Ri =KIQ2 -----END PGP SIGNATURE----- Merge tag 'usb-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB merge for 3.4-rc1 from Greg KH: "Here's the big USB merge for the 3.4-rc1 merge window. Lots of gadget driver reworks here, driver updates, xhci changes, some new drivers added, usb-serial core reworking to fix some bugs, and other various minor things. There are some patches touching arch code, but they have all been acked by the various arch maintainers." * tag 'usb-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (302 commits) net: qmi_wwan: add support for ZTE MF820D USB: option: add ZTE MF820D usb: gadget: f_fs: Remove lock is held before freeing checks USB: option: make interface blacklist work again usb/ub: deprecate & schedule for removal the "Low Performance USB Block" driver USB: ohci-pxa27x: add clk_prepare/clk_unprepare calls USB: use generic platform driver on ath79 USB: EHCI: Add a generic platform device driver USB: OHCI: Add a generic platform device driver USB: ftdi_sio: new PID: LUMEL PD12 USB: ftdi_sio: add support for FT-X series devices USB: serial: mos7840: Fixed MCS7820 device attach problem usb: Don't make USB_ARCH_HAS_{XHCI,OHCI,EHCI} depend on USB_SUPPORT. usb gadget: fix a section mismatch when compiling g_ffs with CONFIG_USB_FUNCTIONFS_ETH USB: ohci-nxp: Remove i2c_write(), use smbus USB: ohci-nxp: Support for LPC32xx USB: ohci-nxp: Rename symbols from pnx4008 to nxp USB: OHCI-HCD: Rename ohci-pnx4008 to ohci-nxp usb: gadget: Kconfig: fix typo for 'different' usb: dwc3: pci: fix another failure path in dwc3_pci_probe() ...	2012-03-20 11:26:30 -07:00
Cong Wang	589973a704	drbd: remove the second argument of k[un]map_atomic() Signed-off-by: Cong Wang <amwang@redhat.com>	2012-03-20 21:48:29 +08:00
Cong Wang	cfd8005c99	block: remove the second argument of k[un]map_atomic() Signed-off-by: Cong Wang <amwang@redhat.com>	2012-03-20 21:48:16 +08:00
Steven Noonan	3467811e26	xen-blkfront: make blkif_io_lock spinlock per-device This patch moves the global blkif_io_lock to the per-device structure. The spinlock seems to exists for two reasons: to disable IRQs when in the interrupt handlers for blkfront, and to protect the blkfront VBDs when a detachment is requested. Having a global blkif_io_lock doesn't make sense given the use case, and it drastically hinders performance due to contention. All VBDs with pending IOs have to take the lock in order to get work done, which serializes everything pretty badly. Signed-off-by: Steven Noonan <snoonan@amazon.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-03-20 12:52:41 +01:00
Andrew Jones	dad5cf659b	xen/blkfront: don't put bdev right after getting it We should hang onto bdev until we're done with it. Signed-off-by: Andrew Jones <drjones@redhat.com> [v1: Fixed up git commit description] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-03-20 12:52:41 +01:00
Akinobu Mita	34ae2e47d9	xen-blkfront: use bitmap_set() and bitmap_clear() Use bitmap_set and bitmap_clear rather than modifying individual bits in a memory region. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: xen-devel@lists.xensource.com Cc: virtualization@lists.linux-foundation.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-03-20 12:52:41 +01:00
Daniel De Graaf	b2167ba6dd	xen/blkback: Enable blkback on HVM guests Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-03-20 12:52:41 +01:00

... 3 4 5 6 7 ...

2715 Commits