`|' has a higher precedence than `?' so since MSG_WAITALL is
defined 0x100, MSG_MORE was always written to the msg_flag.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: (41 commits)
Revert "Seperate read and write statistics of in_flight requests"
cfq-iosched: don't delay async queue if it hasn't dispatched at all
block: Topology ioctls
cfq-iosched: use assigned slice sync value, not default
cfq-iosched: rename 'desktop' sysfs entry to 'low_latency'
cfq-iosched: implement slower async initiate and queue ramp up
cfq-iosched: delay async IO dispatch, if sync IO was just done
cfq-iosched: add a knob for desktop interactiveness
Add a tracepoint for block request remapping
block: allow large discard requests
block: use normal I/O path for discard requests
swapfile: avoid NULL pointer dereference in swapon when s_bdev is NULL
fs/bio.c: move EXPORT* macros to line after function
Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs
cciss: fix build when !PROC_FS
block: Do not clamp max_hw_sectors for stacking devices
block: Set max_sectors correctly for stacking devices
cciss: cciss_host_attr_groups should be const
cciss: Dynamically allocate the drive_info_struct for each logical drive.
cciss: Add usage_count attribute to each logical drive in /sys
...
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Acked-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
prepare_discard_fn() was being called in a place where memory allocation
was effectively impossible. This makes it inappropriate for all but
the most trivial translations of Linux's DISCARD operation to the block
command set. Additionally adding a payload there makes the ownership
of the bio backing unclear as it's now allocated by the device driver
and not the submitter as usual.
It is replaced with QUEUE_FLAG_DISCARD which is used to indicate whether
the queue supports discard operations or not. blkdev_issue_discard now
allocates a one-page, sector-length payload which is the right thing
for the common ATA and SCSI implementations.
The mtd implementation of prepare_discard_fn() is replaced with simply
checking for the request being a discard.
Largely based on a previous patch from Matthew Wilcox <matthew@wil.cx>
which did the prepare_discard_fn but not the different payload allocation
yet.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
dst_state_alloc returns an ERR_PTR value in an error case instead of NULL.
A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@match exists@
expression x, E;
statement S1, S2;
@@
x = dst_state_alloc(...)
... when != x = E
(
* if (x == NULL || ...) S1 else S2
|
* if (x == NULL && ...) S1 else S2
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'for-2.6.32' of git://git.kernel.dk/linux-2.6-block: (29 commits)
block: use blkdev_issue_discard in blk_ioctl_discard
Make DISCARD_BARRIER and DISCARD_NOBARRIER writes instead of reads
block: don't assume device has a request list backing in nr_requests store
block: Optimal I/O limit wrapper
cfq: choose a new next_req when a request is dispatched
Seperate read and write statistics of in_flight requests
aoe: end barrier bios with EOPNOTSUPP
block: trace bio queueing trial only when it occurs
block: enable rq CPU completion affinity by default
cfq: fix the log message after dispatched a request
block: use printk_once
cciss: memory leak in cciss_init_one()
splice: update mtime and atime on files
block: make blk_iopoll_prep_sched() follow normal 0/1 return convention
cfq-iosched: get rid of must_alloc flag
block: use interrupts disabled version of raise_softirq_irqoff()
block: fix comment in blk-iopoll.c
block: adjust default budget for blk-iopoll
block: fix long lines in block/blk-iopoll.c
block: add blk-iopoll, a NAPI like approach for block devices
...
Get rid of any functions that test for these bits and make callers
use bio_rw_flagged() directly. Then it is at least directly apparent
what variable and flag they check.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The connector documentation states that the argument to the callback
function is always a pointer to a struct cn_msg, but rather than encode it
in the API itself, it uses a void pointer everywhere. This doesn't make
much sense to encode the pointer in documentation as it prevents proper C
type checking from occurring and can easily allow people to use the wrong
pointer type. So convert the argument type to an explicit struct cn_msg
pointer.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
On Mon, Jan 19, 2009 at 10:12:24AM -0800, Randy Dunlap (randy.dunlap@oracle.com) wrote:
>
> DST build fails when CONFIG_BLOCK=n:
DST should depend on block and block device, in the original patch its
kconfig entry was in the BLK_DEV menu, so this dependency was satisfied
automatically. Should attached patch be pushed into drivers/staging?
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Do not allow empty barriers or generic_make_request() -> scsi_setup_fs_cmnd()
will explode
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Added thread pool exit condition into the thread_pool_del_worker(). If
called in parallel another thread can steal
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Use bio prepend feature as suggested by Jens Axboe.
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
bus_id is going away, use the dev_set_name() function instead.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
DST may fully encrypt the data channel in case of untrusted channel and implement
strong checksum of the transferred data. It is possible to configure algorithms
and crypto keys, they should match on both sides of the network channel.
Crypto processing does not introduce noticeble performance overhead, since DST
uses configurable pool of threads to perform crypto processing.
This patch introduces crypto processing helpers and crypto engine initialization:
glueing with the crypto layer, allocation and initialization of the crypto
processing thread pool, allocation of the cached pages, which are used to temporary
encrypt data into, since it is forbidden to encrypt data in-place, since pages
are used by the higher layers.
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
DST uses transaction model, when each store has to be explicitly acked
from the remote node to be considered as successfully written. There
may be lots of in-flight transactions. When remote host does not ack
the transaction it will be resent predefined number of times with specified
timeouts between them. All those parameters are configurable. Transactions
are marked as failed after all resends completed unsuccessfully, having
long enough resend timeout and/or large number of resends allows not to
return error to the higher (FS usually) layer in case of short network
problems or remote node outages. In case of network RAID setup this means
that storage will not degrade until transactions are marked as failed, and
thus will not force checksum recalculation and data rebuild. In case of
connection failure DST will try to reconnect to the remote node automatically.
DST sends ping commands at idle time to detect if remote node is alive.
Because of transactional model it is possible to use zero-copy sending
without worry of data corruption (which in turn could be detected by the
strong checksums though).
Transactions are handled in this patch: allocation/freeing/completion,
scanning for stall and to-be-resent transactions and overall management
of the storing tree.
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Kernel currently does not allow to queue work into some entity which
will perform it in the process context and have simple way to extend
number of worker and work with them not as separate objects, but with
pool as a whole. So thread pool model was implemented in the DST.
Thread pool abstraction allows to schedule a work to be performed
on behalf of kernel thread. One does not operate with threads itself,
instead user provides setup and cleanup callbacks for thread pool itself,
and action and cleanup callbacks for each submitted work.
Each worker has private data initialized at creation time and data,
provided by user at scheduling time.
When action is being performed, thread can not be used by other users,
instead they will sleep until there is free thread to pick their work.
Thread pool is used for crypto processing of incoming and outgoing IO
requests to reduce the overall overhead.
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch introduces remote (export) node machinery: initialization
address/port (and other socket parameters), export block device (can be
another DST storage for example or local device like /dev/sda1), local
IO processing engine (BIO state machines, receiving/submitting logic).
Network management for the export node like accepting new client, scheduling
its command processing thread, receiving/sending IO requests, all are placed
here.
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Each DST device contains of two nodes: local and remote (called also as export node).
This patch contains local node processing engine: network state storage,
socket processing loops and state machine, socket polling machinery, reconnection
logic, send/receive basic helpers, related IO commands and so on.
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch contains DST core files, which introduce
block layer, connector and sysfs registration glue and main headers.
Connector is used for the configuration of the node (its type, address,
device name and so on). Sysfs provides bits of information about running
devices in the following format:
+/*
+ * DST sysfs tree for device called 'storage':
+ *
+ * /sys/bus/dst/devices/storage/
+ * /sys/bus/dst/devices/storage/type : 192.168.4.80:1025
+ * /sys/bus/dst/devices/storage/size : 800
+ * /sys/bus/dst/devices/storage/name : storage
+ */
DST header contains structure definitions and protocol command description.
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>