linux/net/ceph
Alex Elder e02493c07c libceph: requeue only sent requests when kicking
The osd expects incoming requests for a given object from a given
client to arrive in order, with the tid for each request being
greater than the tid for requests that have already arrived.  This
patch fixes two places the osd client might not maintain that
ordering.

For the osd client, the connection fault method is osd_reset().
That function calls __reset_osd() to close and re-open the
connection, then calls __kick_osd_requests() to cause all
outstanding requests for the affected osd to be re-sent after
the connection has been re-established.

When an osd is reset, any in-flight messages will need to be
re-sent.  An osd client maintains distinct lists for unsent and
in-flight messages.  Meanwhile, an osd maintains a single list of
all its requests (both sent and un-sent).  (Each message is linked
into two lists--one for the osd client and one list for the osd.)

To process an osd "kick" operation, the request list for the *osd*
is traversed, and each request is moved off whichever osd *client*
list it was on (unsent or sent) and placed onto the osd client's
unsent list.  (It remains where it is on the osd's request list.)

When that is done, osd_reset() calls __send_queued() to cause each
of the osd client's unsent messages to be sent.

OK, with that background...

As the osd request list is traversed, each request is prepended to
the osd client's unsent list in the order it is seen.  The effect
of this is to reverse the order of these requests as they are put
(back) onto the unsent list.
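
The reversal described above can be illustrated with a minimal
sketch.  This is not the libceph code: struct req, push_head() and
requeue_by_prepending() are hypothetical names, and a plain singly
linked list stands in for the kernel's list_head.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an osd request; only the tid matters here. */
struct req {
	unsigned long tid;
	struct req *next;
};

/* Put one request at the head of a list and return the new head. */
static struct req *push_head(struct req *head, struct req *r)
{
	assert(r != NULL);
	r->next = head;
	return r;
}

/*
 * Walk reqs[0..n-1] in order and prepend each one to the unsent list,
 * the way the requeueing is described as working before this patch.
 */
static struct req *requeue_by_prepending(struct req *unsent,
					 struct req *reqs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		unsent = push_head(unsent, &reqs[i]);
	return unsent;
}
```

Feeding requests with tids 1, 2, 3 through requeue_by_prepending()
leaves them on the list as 3, 2, 1, which is the ordering the osd
does not expect.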

Instead, build up a list of only the requests for an osd that have
already been sent (by checking their r_sent flag values).  Once an
unsent request is found, stop examining requests and prepend the
requests that need re-sending to the osd client's unsent list.

Preserve the original order of requests in the process (previously
re-queued requests were reversed in this process).  Because they
have already been sent, they will have lower tids than any request
already present on the unsent list.
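
A minimal sketch of the approach described above, under stated
assumptions: struct req, its fields and kick_sent_requests() are
hypothetical, the osd's request list is assumed to be in tid order,
and singly linked lists stand in for the kernel's list_head.  Only
the "sent" flag check mirrors the r_sent test named in the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an osd request. */
struct req {
	unsigned long tid;
	bool sent;            /* plays the role of the r_sent check */
	struct req *osd_next; /* link on the osd's request list */
	struct req *next;     /* link on the client's unsent list */
};

/*
 * Collect the leading run of already-sent requests from the osd's
 * list, keeping their order, then put that run in front of the
 * existing unsent list.  Returns the new head of the unsent list.
 */
static struct req *kick_sent_requests(struct req *osd_list,
				      struct req *unsent)
{
	struct req *resend = NULL;
	struct req **tailp = &resend;
	struct req *r;

	for (r = osd_list; r != NULL; r = r->osd_next) {
		if (!r->sent)
			break;          /* first unsent request: stop */
		*tailp = r;             /* append, so order is preserved */
		tailp = &r->next;
	}
	*tailp = unsent;                /* resend run goes before unsent */
	return resend;
}

/* Debug check: tids on the unsent list must be strictly increasing. */
static void assert_tid_order(const struct req *list)
{
	for (; list != NULL && list->next != NULL; list = list->next)
		assert(list->tid < list->next->tid);
}
```

With sent requests 1 and 2 on the osd's list and requests 3 and 4
already waiting on the unsent list, the result is 1, 2, 3, 4, so the
tids stay in increasing order.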

Just below that, traverse the linger list in forward order as
before, but add the requests to the *tail* of the list rather than
the head.  These requests get re-registered, and in the process are
given a new (higher) tid, so they should go at the end.
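
The linger handling can be sketched the same way.  The names here
(struct client, last_tid, requeue_linger()) are hypothetical, and
assigning the new tid by incrementing a per-client counter is an
assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an osd request. */
struct req {
	unsigned long tid;
	struct req *next;
};

/* Hypothetical client state: tid counter plus the unsent list. */
struct client {
	unsigned long last_tid;   /* assumed source of new tids */
	struct req *unsent_head;
	struct req *unsent_tail;
};

/*
 * Re-register a linger request: give it a new, higher tid and add
 * it to the tail of the unsent list so tid order is kept.
 */
static void requeue_linger(struct client *c, struct req *r)
{
	assert(c != NULL && r != NULL);
	r->tid = ++c->last_tid;
	r->next = NULL;
	if (c->unsent_tail != NULL)
		c->unsent_tail->next = r;
	else
		c->unsent_head = r;
	c->unsent_tail = r;
}
```

Because each re-registered request gets a tid above every tid
handed out so far, adding it at the head would put it in front of
lower-numbered requests; the tail is the only position that keeps
the list sorted.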

This partially resolves:
    http://tracker.ceph.com/issues/4392

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-05-01 21:17:18 -07:00
..
crush crush: avoid recursion if we have already collided 2013-01-17 12:42:39 -06:00
armor.c libceph: Fix base64-decoding when input ends in newline. 2011-03-15 09:14:02 -07:00
auth_none.c ceph: messenger: reduce args to create_authorizer 2012-05-17 08:18:12 -05:00
auth_none.h
auth_x_protocol.h
auth_x.c libceph: wrap auth ops in wrapper functions 2013-05-01 21:17:14 -07:00
auth_x.h libceph: add update_authorizer auth method 2013-05-01 21:17:13 -07:00
auth.c libceph: wrap auth methods in a mutex 2013-05-01 21:17:15 -07:00
buffer.c net: allow GFP_HIGHMEM in __vmalloc() 2010-11-21 10:04:04 -08:00
ceph_common.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client 2013-02-28 17:43:09 -08:00
ceph_fs.c ceph: fix file mode calculation 2011-07-19 11:25:04 -07:00
ceph_hash.c net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
ceph_strings.c libceph: update ceph_osd_op_name() 2013-02-18 12:20:18 -06:00
crypto.c libceph: eliminate sparse warnings 2013-02-25 15:37:18 -06:00
crypto.h libceph: fix crypto key null deref, memory leak 2012-08-02 09:19:20 -07:00
debugfs.c libceph: update osd request/reply encoding 2013-02-26 15:02:50 -08:00
Kconfig net/ceph: remove depends on CONFIG_EXPERIMENTAL 2013-01-11 11:39:33 -08:00
Makefile Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-12-08 13:47:38 -08:00
messenger.c libceph: clear messenger auth_retry flag when we authenticate 2013-05-01 21:17:11 -07:00
mon_client.c libceph: wrap auth ops in wrapper functions 2013-05-01 21:17:14 -07:00
msgpool.c libceph: initialize msgpool message types 2012-07-30 09:29:50 -07:00
osd_client.c libceph: requeue only sent requests when kicking 2013-05-01 21:17:18 -07:00
osdmap.c libceph: rename ceph_calc_object_layout() 2013-05-01 21:16:17 -07:00
pagelist.c ceph: use list_move_tail instead of list_del/list_add_tail 2012-10-01 14:30:49 -05:00
pagevec.c libceph: drop return value from page vector copy routines 2013-02-19 19:14:05 -06:00