This patchs adds support of modification of the used logger via sysctl.
It can be used to change the logger to module that can not use the bind
operation (ipt_LOG and ipt_ULOG). For this purpose, it creates a
directory /proc/sys/net/netfilter/nf_log which contains a file
per-protocol. The content of the file is the name current logger (NONE if
not set) and a logger can be setup by simply echoing its name to the file.
By echoing "NONE" to a /proc/sys/net/netfilter/nf_log/PROTO file, the
logger corresponding to this PROTO is set to NULL.
Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Discard incoming packets whose ack field iincludes data not yet sent.
This is consistent with RFC 793 Section 3.9.
Change tcp_ack() to distinguish between too-small and too-large ack
field values. Keep segments with too-large ack fields out of the fast
path, and change slow path to discard them.
Reported-by: Oliver Zheng <mailinglists+netdev@oliverzheng.com>
Signed-off-by: John Dykstra <john.dykstra1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that most network device drivers in (all but one in x86_64 allmodconfig)
support net_device_ops. Expose it as a configuration parameter. Still
need to address even older 32 bit drivers, and other arch before
compatiablity can be scheduled for removal in some future release.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Need to reference net_device_ops not old pointer.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This converts the mpc device to using new netdevice_ops.
Compile tested only, needs more than usual review since
device was swaping pointers around etc.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
The initial version of the DSA driver only supported a single switch
chip per network interface, while DSA-capable switch chips can be
interconnected to form a tree of switch chips. This patch adds support
for multiple switch chips on a network interface.
An example topology for a 16-port device with an embedded CPU is as
follows:
+-----+ +--------+ +--------+
| |eth0 10| switch |9 10| switch |
| CPU +----------+ +-------+ |
| | | chip 0 | | chip 1 |
+-----+ +---++---+ +---++---+
|| ||
|| ||
||1000baseT ||1000baseT
||ports 1-8 ||ports 9-16
This requires a couple of interdependent changes in the DSA layer:
- The dsa platform driver data needs to be extended: there is still
only one netdevice per DSA driver instance (eth0 in the example
above), but each of the switch chips in the tree needs its own
mii_bus device pointer, MII management bus address, and port name
array. (include/net/dsa.h) The existing in-tree dsa users need
some small changes to deal with this. (arch/arm)
- The DSA and Ethertype DSA tagging modules need to be extended to
use the DSA device ID field on receive and demultiplex the packet
accordingly, and fill in the DSA device ID field on transmit
according to which switch chip the packet is heading to.
(net/dsa/tag_{dsa,edsa}.c)
- The concept of "CPU port", which is the switch chip port that the
CPU is connected to (port 10 on switch chip 0 in the example), needs
to be extended with the concept of "upstream port", which is the
port on the switch chip that will bring us one hop closer to the CPU
(port 10 for both switch chips in the example above).
- The dsa platform data needs to specify which ports on which switch
chips are links to other switch chips, so that we can enable DSA
tagging mode on them. (For inter-switch links, we always use
non-EtherType DSA tagging, since it has lower overhead. The CPU
link uses dsa or edsa tagging depending on what the 'root' switch
chip supports.) This is done by specifying "dsa" for the given
port in the port array.
- The dsa platform data needs to be extended with information on via
which port to reach any given switch chip from any given switch chip.
This info is specified via the per-switch chip data struct ->rtable[]
array, which gives the nexthop ports for each of the other switches
in the tree.
For the example topology above, the dsa platform data would look
something like this:
static struct dsa_chip_data sw[2] = {
{
.mii_bus = &foo,
.sw_addr = 1,
.port_names[0] = "p1",
.port_names[1] = "p2",
.port_names[2] = "p3",
.port_names[3] = "p4",
.port_names[4] = "p5",
.port_names[5] = "p6",
.port_names[6] = "p7",
.port_names[7] = "p8",
.port_names[9] = "dsa",
.port_names[10] = "cpu",
.rtable = (s8 []){ -1, 9, },
}, {
.mii_bus = &foo,
.sw_addr = 2,
.port_names[0] = "p9",
.port_names[1] = "p10",
.port_names[2] = "p11",
.port_names[3] = "p12",
.port_names[4] = "p13",
.port_names[5] = "p14",
.port_names[6] = "p15",
.port_names[7] = "p16",
.port_names[10] = "dsa",
.rtable = (s8 []){ 10, -1, },
},
},
static struct dsa_platform_data pd = {
.netdev = &foo,
.nr_switches = 2,
.sw = sw,
};
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Tested-by: Gary Thomas <gary@mlbassoc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for the Marvell 88E6095/6095F switch chips. These
chips are similar to the 88e6131, so we can add the support to
mv88e6131.c easily.
Thanks to Gary Thomas <gary@mlbassoc.com> and Jesper Dangaard
Brouer <hawk@diku.dk> for testing various patches.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Tested-by: Gary Thomas <gary@mlbassoc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
..so that we can parse the DSA topology from 'ip link' output:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
4: lan1@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
5: lan2@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
6: lan3@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
7: lan4@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix compiler warning about non-const format string.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Protocols should be able to use constant value for the descriptor.
Minor whitespace cleanup as well
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no gain using prefetch() in dev_hard_start_xmit(), since
we already had to read ops->ndo_select_queue pointer in dev_pick_tx(),
and both pointers are probably located in the same cache line.
This prefetch call slows down fast path because of a stall in address
computation.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove 2 TEST_FRAME hacks that are no longer needed. These allowed
sctp regression tests to compile before, but are no longer needed.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some minor changes to queue hashing:
1. Use const on accessor functions
2. Export skb_tx_hash for use in drivers (see ixgbe)
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rather than calling device pointer directly (which is incorrect with
net_device_ops), use the standard dev_change_mtu. Compile tested only.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_sack_swap seems unnecessary so I pushed swap to the caller.
Also removed comment that seemed then pointless, and added include
when not already there. Compile tested.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
A zero length frame filter was recently introduced in ROSE protocole.
Previous commit makes the same at AX25 protocole level.
This patch has the same purpose for NetRom protocole.
The reason is that empty frames have no meaning in NetRom protocole.
Signed-off-by: Bernard Pidoux <f6bvp@amsat.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In previous commit 244f46ae6e9e18f6fc0be7d1f49febde4762c34b
was introduced a zero length frame filter for ROSE protocole.
This patch has the same purpose at AX25 frame level for the same
reason. Empty frames have no meaning in AX25 protocole.
Signed-off-by: Bernard Pidoux <f6bvp@amsat.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
NL80211_CMD_GET_MESH_PARAMS and NL80211_CMD_SET_MESH_PARAMS handlers
did not verify whether a function pointer is NULL (not supported by
the driver) before trying to call the function. The former nl80211
command is available for unprivileged users, too, so this can
potentially allow normal users to kill networking (or worse..) if
mac80211 is built without CONFIG_MAC80211_MESH=y.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
printk formats in prior commit were reversed/incorrect.
Compiled without warning on x86 and x86_64, but detected on ppc.
Signed-off-by: Tom Talpey <tmtalpey@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
As long as one task is holding the socket lock, then calls to
xprt_force_disconnect(xprt) will not succeed in shutting down the socket.
In particular, this would mean that a server initiated shutdown will not
succeed until the lock is relinquished.
In order to avoid the deadlock, we should ensure that xs_tcp_send_request()
closes the socket on EPIPE errors too.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This fixes a regression against FreeBSD servers as reported by Tomas
Kasparek. Apparently when using RPC over a TCP socket, the FreeBSD servers
don't ever react to the client closing the socket, and so commit
e06799f958bf7f9f8fae15f0c6f519953fb0257c (SUNRPC: Use shutdown() instead of
close() when disconnecting a TCP socket) causes the setup to hang forever
whenever the client attempts to close and then reconnect.
We break the deadlock by adding a 'linger2' style timeout to the socket,
after which, the client will abort the connection using a TCP 'RST'.
The default timeout is set to 15 seconds. A subsequent patch will put it
under user control by means of a systctl.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
dev can be NULL in ip[6]_frag_reasm for skb's coming from RAW sockets.
Quagga's OSPFD sends fragmented packets on a RAW socket, when netfilter
conntrack reassembles them on the OUTPUT path you hit this code path.
You can test it with something like "hping2 -0 -d 2000 -f AA.BB.CC.DD"
With help from Jarek Poplawski.
Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct sk_buff pointers should be freed with kfree_skb.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
broken by commit 5e739d1752aca4e8f3e794d431503bfca3162df4; AFAICS should
be -stable fodder as well...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Aced-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix warnings from current gcc about using non-const strings as printf
args in TIPC. Compile tested only (not a TIPC user).
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes the regressions cause by
commit 1326c3d5a4b792a2b15877feb7fb691f8945d203
(v2.6.28-rc6-461-g23a12b1) broke the display of local and remote
addresses of an SIT tunnel in iproute2.
nt->parms is used by ipip6_tunnel_init() and therefore need to be
initialized first.
Tracked as http://bugzilla.kernel.org/show_bug.cgi?id=12868
Reported-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes an unused parameter (addr_len) from tcp_recv_urg()
method in net/ipv4/tcp.c.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the behavior of allowing both sysctl and addrconf_dad_failure()
to set the disable_ipv6 parameter without any bad side-effects.
If DAD fails and accept_dad > 1, we will still set disable_ipv6=1,
but then instead of allowing an RA to add an address then
immediately fail DAD, we simply don't allow the address to be
added in the first place. This also lets the user set this flag
and disable all IPv6 addresses on the interface, or on the entire
system.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow the NFSv4 server to make use of TCP autotuning behaviour, which
was previously disabled by setting the sk_userlocks variable.
Set the receive buffers to be big enough to receive the whole RPC
request, and set this for the listening socket, not the accept socket.
Remove the code that readjusts the receive/send buffer sizes for the
accepted socket. Previously this code was used to influence the TCP
window management behaviour, which is no longer needed when autotuning
is enabled.
This can improve IO bandwidth on networks with high bandwidth-delay
products, where a large tcp window is required. It also simplifies
performance tuning, since getting adequate tcp buffers previously
required increasing the number of nfsd threads.
Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
Cc: Jim Rees <rees@umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Add /proc/fs/nfsd/pool_stats to export to userspace various
statistics about the operation of rpc server thread pools.
This patch is based on a forward-ported version of
knfsd-add-pool-thread-stats which has been shipping in the SGI
"Enhanced NFS" product since 2006 and which was previously
posted:
http://article.gmane.org/gmane.linux.nfs/10375
It has also been updated thus:
* moved EXPORT_SYMBOL() to near the function it exports
* made the new struct struct seq_operations const
* used SEQ_START_TOKEN instead of ((void *)1)
* merged fix from SGI PV 990526 "sunrpc: use dprintk instead of
printk in svc_pool_stats_*()" by Harshula Jayasuriya.
* merged fix from SGI PV 964001 "Crash reading pool_stats before
nfsds are started".
Signed-off-by: Greg Banks <gnb@sgi.com>
Signed-off-by: Harshula Jayasuriya <harshula@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Avoid overloading the CPU scheduler with enormous load averages
when handling high call-rate NFS loads. When the knfsd bottom half
is made aware of an incoming call by the socket layer, it tries to
choose an nfsd thread and wake it up. As long as there are idle
threads, one will be woken up.
If there are lot of nfsd threads (a sensible configuration when
the server is disk-bound or is running an HSM), there will be many
more nfsd threads than CPUs to run them. Under a high call-rate
low service-time workload, the result is that almost every nfsd is
runnable, but only a handful are actually able to run. This situation
causes two significant problems:
1. The CPU scheduler takes over 10% of each CPU, which is robbing
the nfsd threads of valuable CPU time.
2. At a high enough load, the nfsd threads starve userspace threads
of CPU time, to the point where daemons like portmap and rpc.mountd
do not schedule for tens of seconds at a time. Clients attempting
to mount an NFS filesystem timeout at the very first step (opening
a TCP connection to portmap) because portmap cannot wake up from
select() and call accept() in time.
Disclaimer: these effects were observed on a SLES9 kernel, modern
kernels' schedulers may behave more gracefully.
The solution is simple: keep in each svc_pool a counter of the number
of threads which have been woken but have not yet run, and do not wake
any more if that count reaches an arbitrary small threshold.
Testing was on a 4 CPU 4 NIC Altix using 4 IRIX clients, each with 16
synthetic client threads simulating an rsync (i.e. recursive directory
listing) workload reading from an i386 RH9 install image (161480
regular files in 10841 directories) on the server. That tree is small
enough to fill in the server's RAM so no disk traffic was involved.
This setup gives a sustained call rate in excess of 60000 calls/sec
before being CPU-bound on the server. The server was running 128 nfsds.
Profiling showed schedule() taking 6.7% of every CPU, and __wake_up()
taking 5.2%. This patch drops those contributions to 3.0% and 2.2%.
Load average was over 120 before the patch, and 20.9 after.
This patch is a forward-ported version of knfsd-avoid-nfsd-overload
which has been shipping in the SGI "Enhanced NFS" product since 2006.
It has been posted before:
http://article.gmane.org/gmane.linux.nfs/10374
Signed-off-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Convert the remaining refcount users.
As pointed out by Patrick McHardy, the protocols can be accessed safely using RCU.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
On the legacy netif_rx path, I incorrectly tried to optimise
the napi_complete call by using __napi_complete before we reenable
IRQs. This simply doesn't work since we need to flush the held
GRO packets first.
This patch fixes it by doing the obvious thing of reenabling
IRQs first and then calling napi_complete.
Reported-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarek Poplawski pointed out that my previous fix is broken for
VLAN+netpoll as if netpoll is enabled we'd end up in the normal
receive path instead of the VLAN receive path.
This patch fixes it by calling the VLAN receive hook.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows us to send to userspace "regulatory" events.
For now we just send an event when we change regulatory domains.
We also notify userspace when devices are using their own custom
world roaming regulatory domains.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
We do this so we can later inform userspace who set the
regulatory domain and provide details of the request.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This is not used as we can always just assume the first
regulatory domain set will _always_ be a static regulatory
domain. REGDOM_SET_BY_CORE will be the first request from
cfg80211 for a regdomain and that then populates the first
regulatory request.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Even after commit "mac80211: deauth when interface is marked down"
(e327b847 on Linus tree), userspace still isn't notified when interface
goes down. There isn't a problem with this commit, but because of other
code changes it doesn't work on kernels >= 2.6.28 (works if same/similar
change applied on 2.6.27 for example).
The issue is as follows: after commit "mac80211: restructure disassoc/deauth
flows" in 2.6.28, the call to ieee80211_sta_deauthenticate added by
commit e327b847 will not work: because we do sta_info_flush(local, sdata)
inside ieee80211_stop (iface.c), all stations in interface are cleared, so
when calling ieee80211_sta_deauthenticate->ieee80211_set_disassoc (mlme.c),
inside ieee80211_set_disassoc we have this in the beginning:
sta = sta_info_get(local, ifsta->bssid);
if (!sta) {
The !sta check triggers, thus the function returns early and
ieee80211_sta_send_apinfo(sdata, ifsta) later isn't called, so
wpa_supplicant/userspace isn't notified with SIOCGIWAP.
This commit moves deauthentication to before flushing STA info
(sta_info_flush), thus the above can't happen and userspace is really
notified when interface goes down.
Signed-off-by: Herton Ronaldo Krzesinski <herton@mandriva.com.br>
Signed-off-by: John W. Linville <linville@tuxdriver.com>