2006-01-02 19:04:38 +01:00
/*
 * net/tipc/core.c: TIPC module code
 *
tipc: remove 'links' list from tipc_bearer struct
In our ongoing effort to simplify the TIPC locking structure,
we see a need to remove the linked list for tipc_links
in the bearer. This can be explained as follows.
Currently, we have three different ways to access a link,
via three different lists/tables:
1: Via a node hash table:
Used by the time-critical outgoing/incoming data paths.
(e.g. link_send_sections_fast() and tipc_recv_msg() ):
grab net_lock(read)
find node from node hash table
grab node_lock
select link
grab bearer_lock
send_msg()
release bearer_lock
release node lock
release net_lock
2: Via a global linked list for nodes:
Used by configuration commands (link_cmd_set_value())
grab net_lock(read)
find node and link from global node list (using link name)
grab node_lock
update link
release node lock
release net_lock
(Same locking order as above. No problem.)
3: Via the bearer's linked link list:
Used by notifications from interface (e.g. tipc_disable_bearer() )
grab net_lock(write)
grab bearer_lock
get link ptr from bearer's link list
get node from link
grab node_lock
delete link
release node lock
release bearer_lock
release net_lock
(Different order from above, but works because we grab the
outer net_lock in write mode first, excluding all other access.)
The first major goal in our simplification effort is to get rid
of the "big" net_lock, replacing it with rcu-locks when accessing
the node list and node hash array. This will come in a later patch
series.
But to get there we first need to rewrite access methods #2 and #3,
since removal of net_lock would introduce three major problems:
a) In access method #2, we access the link before taking the
protecting node_lock. This will not work once net_lock is gone,
so we will have to change the access order. We will deal with
this in a later commit in this series, "tipc: add node lock
protection to link found by link_find_link()".
b) When the outer protection from net_lock is gone, taking
bearer_lock and node_lock in the opposite order of methods #1 and #2
will become an obvious deadlock hazard. This is fixed in the
commit ("tipc: remove bearer_lock from tipc_bearer struct")
later in this series.
c) Similar to what is described in problem a), access method #3
starts by using a link pointer that is unprotected by node_lock
to find the correct node struct and lock it. Before we remove
net_lock, this access order must be altered. This is what we do
with this commit.
We can avoid introducing problem c) by using the global node list here
as well to find the node before accessing its links. When we loop
through the node list, we use the bearer's own identity as the search
criterion, thus easily finding the links associated with the
resetting/disabling bearer. Although this method is somewhat slower
than the current list traversal, it is in no way time critical: it
only concerns resetting or deleting links, which must be considered
relatively infrequent events.
As a bonus, we can get rid of the mutual pointers between links and
bearers. After this commit, pointer dependencies go in one direction
only: from the link to the bearer.
This commit pre-empts introduction of problem c) as described above.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
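A minimal sketch of the lookup order the commit message above describes: walk the node list first, then match each node's links by bearer identity, instead of following link pointers kept by the bearer. The struct layouts, MAX_LINKS, and reset_links_on_bearer() are hypothetical names made up for illustration and are not the kernel's definitions; locking is only indicated in comments.

```c
#include <stddef.h>

#define MAX_LINKS 2	/* hypothetical per-node link limit for this sketch */

struct sketch_link {
	int in_use;	/* slot holds a link */
	int bearer_id;	/* link -> bearer is the only reference direction */
	int up;		/* 1 while the link is working */
};

struct sketch_node {
	struct sketch_node *next;	/* stands in for the global node list */
	struct sketch_link links[MAX_LINKS];
};

/*
 * Reset every link that runs over bearer_id. The node is found first
 * (via the node list) and only then are its links touched, so a real
 * implementation could hold the node lock around the inner loop.
 * Returns the number of links that were reset.
 */
int reset_links_on_bearer(struct sketch_node *head, int bearer_id)
{
	int count = 0;

	for (struct sketch_node *n = head; n != NULL; n = n->next) {
		/* the node lock would be taken here */
		for (size_t i = 0; i < MAX_LINKS; i++) {
			struct sketch_link *l = &n->links[i];

			if (l->in_use && l->bearer_id == bearer_id && l->up) {
				l->up = 0;
				count++;
			}
		}
		/* the node lock would be released here */
	}
	return count;
}
```

The traversal is linear in the number of nodes, which matches the commit's remark that this path is slower than the old per-bearer list but not time critical.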
2014-02-13 17:29:09 -05:00
 * Copyright (c) 2003-2006, 2013, Ericsson AB
tipc: convert topology server to use new server facility
As the new TIPC server infrastructure has been introduced, we can
now convert the TIPC topology server to it. We get two benefits
from doing this:
1) It simplifies the topology server locking policy. In the
original locking policy, we placed one spin lock pointer in the
tipc_subscriber structure to reuse the lock of the subscriber's
server port, controlling access to members of the tipc_subscriber
instance. That is, we used only one lock to ensure that both
tipc_port and tipc_subscriber members were safely accessed.
Now we introduce a separate spin lock in the tipc_subscriber
structure that protects only its own members, giving a finer-grained
locking policy. Moreover, the change makes the topology
server code more readable and maintainable.
2) It fixes a bug where sent subscription events may be lost when
the topology port is congested. Using the new service, the
topology server now queues sent events into an outgoing buffer,
and then wakes up a sender process which has been blocked in
workqueue context. The process will keep picking events from the
buffer and send them to their respective subscribers, using the
kernel socket interface, until the buffer is empty. Even if the
socket is congested during transmission there is no risk that
events may be dropped, since the sender process may block when
needed.
Some minor reordering of initialization is done, since we now
have a scenario where the topology server must be started after
socket initialization has taken place, as the former depends
on the latter. And overall, we see a simplification of the
TIPC subscriber code in making this changeover.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17 10:54:40 -04:00
 * Copyright (c) 2005-2006, 2010-2013, Wind River Systems
2006-01-02 19:04:38 +01:00
 * All rights reserved.
 *
2006-01-11 13:30:43 +01:00
 * Redistribution and use in source and binary forms, with or without
2006-01-02 19:04:38 +01:00
 * modification, are permitted provided that the following conditions are met:
 *
2006-01-11 13:30:43 +01:00
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. Neither the names of the copyright holders nor the names of its
 *    contributors may be used to endorse or promote products derived from
 *    this software without specific prior written permission.
2006-01-02 19:04:38 +01:00
 *
2006-01-11 13:30:43 +01:00
 * Alternatively, this software may be distributed under the terms of the
 * GNU General Public License ("GPL") version 2 as published by the Free
 * Software Foundation.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
2006-01-02 19:04:38 +01:00
 * POSSIBILITY OF SUCH DAMAGE.
 */

tipc: convert tipc reference table to use generic rhashtable
As the tipc reference table is statically allocated, the memory it
requests at stack initialization is quite big even though the
maximum port number is currently restricted to 8191; at the same time,
that number is already insufficient in practice. But if the maximum
number of ports were raised to its theoretical value of 2^32, the
memory consumed would reach an unacceptable size. Apart from
this, heavy tipc users spend a considerable amount of time in
tipc_sk_get() due to the read-lock on ref_table_lock.
Converting the tipc reference table to the generic rhashtable
resolves both disadvantages: the new resizable hash table avoids
locking on lookup, and less memory is required initially; for
example, 256 hash bucket slots are requested at the beginning instead
of allocating all 8191 slots as in the old mode. The hash table will
grow if the number of entries exceeds 75% of the table size, up to a
total table size of 1M, and it will automatically shrink if usage
falls below 30%, but never below the minimum table size of 256.
This commit also converts ref_table_lock to a separate mutex that
protects hash table mutations on the write side. Lastly, it defers the
release of the socket reference using call_rcu(), which allows an RCU
read-side protected call to rhashtable_lookup().
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Erik Hugne <erik.hugne@ericsson.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
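The resize thresholds quoted in the commit message above can be sketched as a small decision function. This only illustrates the stated policy (grow above 75%, shrink below 30%, bounded by 256 and 1M); it is not rhashtable's actual resize code. The names TBL_MIN_SIZE, TBL_MAX_SIZE, and next_table_size() are hypothetical, and doubling or halving per step is an assumption.

```c
#include <stddef.h>

#define TBL_MIN_SIZE 256u		/* initial and minimum bucket count */
#define TBL_MAX_SIZE (1u << 20)		/* 1M upper bound */

/*
 * Return the bucket count the table should have next, given its
 * current size and number of entries:
 *   grow (double)  when entries > 75% of size, capped at TBL_MAX_SIZE
 *   shrink (halve) when entries < 30% of size, floored at TBL_MIN_SIZE
 */
size_t next_table_size(size_t size, size_t entries)
{
	if (entries * 4 > size * 3 && size < TBL_MAX_SIZE)
		return size * 2;
	if (entries * 10 < size * 3 && size > TBL_MIN_SIZE)
		return size / 2;
	return size;
}
```

Under these assumptions, a 256-slot table stays at 256 slots with 192 entries and grows to 512 slots once it holds 193.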
2015-01-07 13:41:58 +08:00
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

2006-01-02 19:04:38 +01:00
#include "core.h"
#include "name_table.h"
#include "subscr.h"
2015-02-09 09:50:18 +01:00
#include "bearer.h"
#include "net.h"
2014-08-22 18:09:18 -04:00
#include "socket.h"
2015-10-22 08:51:35 -04:00
#include "bcast.h"
2006-01-02 19:04:38 +01:00

2012-06-29 00:16:37 -04:00
#include <linux/module.h>
2006-01-02 19:04:38 +01:00

/* configurable TIPC parameters */
2012-08-16 12:09:12 +00:00
int tipc_net_id __read_mostly;
2013-06-17 10:54:37 -04:00
int sysctl_tipc_rmem[3] __read_mostly;	/* min/default/max */
2006-01-02 19:04:38 +01:00

2015-01-09 15:27:04 +08:00
static int __net_init tipc_init_net(struct net *net)
{
	struct tipc_net *tn = net_generic(net, tipc_net_id);
2015-01-09 15:27:08 +08:00
	int err;
2015-01-09 15:27:04 +08:00

	tn->net_id = 4711;
2015-01-09 15:27:10 +08:00
	tn->own_addr = 0;
2015-01-09 15:27:12 +08:00
	get_random_bytes(&tn->random, sizeof(int));
2015-01-09 15:27:05 +08:00
	INIT_LIST_HEAD(&tn->node_list);
	spin_lock_init(&tn->node_list_lock);
2015-01-09 15:27:04 +08:00

2015-01-09 15:27:08 +08:00
	err = tipc_sk_rht_init(net);
2015-01-09 15:27:09 +08:00
	if (err)
		goto out_sk_rht;

	err = tipc_nametbl_init(net);
	if (err)
		goto out_nametbl;
2015-01-09 15:27:11 +08:00

2016-04-07 10:40:43 -04:00
	INIT_LIST_HEAD(&tn->dist_queue);
2015-05-04 10:36:44 +08:00
	err = tipc_topsrv_start(net);
2015-01-09 15:27:11 +08:00
	if (err)
		goto out_subscr;
2015-10-22 08:51:35 -04:00

	err = tipc_bcast_init(net);
	if (err)
		goto out_bclink;

2015-01-09 15:27:09 +08:00
	return 0;

2015-10-22 08:51:35 -04:00
out_bclink:
	tipc_topsrv_stop(net);	/* bcast init failed: undo tipc_topsrv_start() */
2015-01-09 15:27:11 +08:00
out_subscr:
	tipc_nametbl_stop(net);
2015-01-09 15:27:09 +08:00
out_nametbl:
	tipc_sk_rht_destroy(net);
out_sk_rht:
2015-01-09 15:27:08 +08:00
	return err;
2015-01-09 15:27:04 +08:00
}

static void __net_exit tipc_exit_net(struct net *net)
{
2015-05-04 10:36:44 +08:00
	tipc_topsrv_stop(net);
2015-01-09 15:27:05 +08:00
	tipc_net_stop(net);
2015-10-22 08:51:35 -04:00
	tipc_bcast_stop(net);
2015-01-09 15:27:09 +08:00
	tipc_nametbl_stop(net);
2015-01-09 15:27:08 +08:00
	tipc_sk_rht_destroy(net);
2015-01-09 15:27:04 +08:00
}

static struct pernet_operations tipc_net_ops = {
	.init = tipc_init_net,
	.exit = tipc_exit_net,
	.id = &tipc_net_id,
	.size = sizeof(struct tipc_net),
};

2015-01-09 15:26:59 +08:00
static int __init tipc_init(void)
2006-01-02 19:04:38 +01:00
{
2014-02-20 11:32:49 +08:00
	int err;
2006-01-02 19:04:38 +01:00

2015-01-09 15:26:59 +08:00
	pr_info("Activated (version " TIPC_MOD_VER ")\n");

tipc: redesign connection-level flow control
There are two flow control mechanisms in TIPC; one at link level that
handles network congestion, burst control, and retransmission, and one
at connection level whose only remaining task is to prevent overflow
in the receiving socket buffer. In TIPC, the latter task has to be
solved end-to-end because messages can not be thrown away once they
have been accepted and delivered upwards from the link layer, i.e., we
can never permit the receive buffer to overflow.
Currently, this algorithm is message based. A counter in the receiving
socket keeps track of number of consumed messages, and sends a dedicated
acknowledge message back to the sender for every 256 consumed messages.
A counter at the sending end keeps track of the sent, not yet
acknowledged messages, and blocks the sender if this number ever reaches
512 unacknowledged messages. When the missing acknowledge arrives, the
socket is then woken up for renewed transmission. This works well for
keeping the message flow running, as it almost never happens that a
sender socket is blocked this way.
A problem with the current mechanism is that it is potentially very
memory consuming. Since we don't distinguish between small and large
messages, we have to dimension the socket receive buffer according
to a worst-case of both. I.e., the window size must be chosen large
enough to sustain a reasonable throughput even for the smallest
messages, while we must still consider a scenario where all messages
are of maximum size. Hence, the current fixed window size of 512 messages
and a maximum message size of 66k results in a receive buffer of 66 MB
when truesize(66k) = 131k is taken into account. It is possible to do
much better.
This commit introduces an algorithm where we instead use 1024-byte
blocks as base unit. This unit, always rounded upwards from the
actual message size, is used when we advertise windows as well as when
we count and acknowledge transmitted data. The advertised window is
based on the configured receive buffer size in such a way that even
the worst-case truesize/msgsize ratio always is covered. Since the
smallest possible message size (from a flow control viewpoint) now is
1024 bytes, we can safely assume this ratio to be less than four, which
is the value we are now using.
This way, we have been able to reduce the default receive buffer size
from 66 MB to 2 MB with maintained performance.
In order to keep this solution backwards compatible, we introduce a
new capability bit in the discovery protocol, and use this throughout
the message sending/reception path to always select the right unit.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
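The block-based accounting described in the commit message above can be sketched as two small helpers: one that charges a message in 1024-byte blocks, rounded upwards, and one that derives the advertised window from the receive buffer size using the worst-case ratio of four. The names BLK_SZ, TRUESIZE_RATIO, msg_blocks(), and adv_window_blocks() are hypothetical, and the kernel's own helpers may compute these values differently.

```c
#include <stdint.h>

#define BLK_SZ         1024u	/* flow-control base unit from the commit text */
#define TRUESIZE_RATIO    4u	/* assumed worst-case truesize/msgsize ratio */

/* Number of blocks charged for one message, rounded upwards. */
uint32_t msg_blocks(uint32_t msglen)
{
	return (msglen + BLK_SZ - 1) / BLK_SZ;
}

/* Window, in blocks, that a receiver with rcvbuf bytes can advertise. */
uint32_t adv_window_blocks(uint32_t rcvbuf)
{
	return rcvbuf / BLK_SZ / TRUESIZE_RATIO;
}
```

With these definitions, a receive buffer of 2 * 1024 * 1024 bytes advertises a window of 512 blocks, and a 1025-byte message is charged two blocks.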
2016-05-02 11:58:47 -04:00
	sysctl_tipc_rmem[0] = RCVBUF_MIN;
	sysctl_tipc_rmem[1] = RCVBUF_DEF;
	sysctl_tipc_rmem[2] = RCVBUF_MAX;
2015-01-09 15:26:59 +08:00

2014-02-20 11:32:49 +08:00
	err = tipc_netlink_start();
	if (err)
		goto out_netlink;

2015-02-09 09:50:03 +01:00
	err = tipc_netlink_compat_start();
	if (err)
		goto out_netlink_compat;

2014-02-20 11:32:49 +08:00
	err = tipc_socket_init();
	if (err)
		goto out_socket;

	err = tipc_register_sysctl();
	if (err)
		goto out_sysctl;

2015-01-09 15:27:11 +08:00
	err = register_pernet_subsys(&tipc_net_ops);
2014-02-20 11:32:49 +08:00
	if (err)
2015-01-09 15:27:11 +08:00
		goto out_pernet;
2014-02-20 11:32:49 +08:00

2014-02-20 11:32:50 +08:00
	err = tipc_bearer_setup();
	if (err)
		goto out_bearer;

2015-01-09 15:26:59 +08:00
	pr_info("Started in single node mode\n");
2014-02-20 11:32:49 +08:00
	return 0;
2014-02-20 11:32:50 +08:00
out_bearer:
2015-01-09 15:27:11 +08:00
	unregister_pernet_subsys(&tipc_net_ops);
out_pernet:
2014-02-20 11:32:49 +08:00
	tipc_unregister_sysctl();
out_sysctl:
	tipc_socket_stop();
out_socket:
2015-02-09 09:50:03 +01:00
	tipc_netlink_compat_stop();
out_netlink_compat:
2014-02-20 11:32:49 +08:00
	tipc_netlink_stop();
out_netlink:
2015-01-09 15:26:59 +08:00
	pr_err("Unable to start in single node mode\n");
2014-02-20 11:32:49 +08:00
	return err;
2006-01-02 19:04:38 +01:00
}

static void __exit tipc_exit(void)
{
2015-01-09 15:26:59 +08:00
	tipc_bearer_cleanup();
2015-04-01 09:42:50 +08:00
	unregister_pernet_subsys(&tipc_net_ops);
2015-01-09 15:26:59 +08:00
	tipc_netlink_stop();
2015-02-09 09:50:03 +01:00
	tipc_netlink_compat_stop();
2015-01-09 15:26:59 +08:00
	tipc_socket_stop();
	tipc_unregister_sysctl();

2012-06-29 00:16:37 -04:00
	pr_info("Deactivated\n");
2006-01-02 19:04:38 +01:00
}

module_init(tipc_init);
module_exit(tipc_exit);

MODULE_DESCRIPTION("TIPC: Transparent Inter Process Communication");
MODULE_LICENSE("Dual BSD/GPL");
2006-06-25 23:42:47 -07:00
MODULE_VERSION(TIPC_MOD_VER);