Commit Graph

20159 Commits

Author SHA1 Message Date
Changli Gao
1ff1986fc9 net: rps: support 802.1Q
For the 802.1Q packets, if the NIC doesn't support hw-accel-vlan-rx, RPS
won't inspect the internal 4 tuples to generate skb->rxhash, so this kind
of traffic can't get any benefit from RPS.

This patch adds the support for 802.1Q to RPS.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-18 22:07:54 -07:00
Jiri Pirko
b81693d914 net: remove ndo_set_multicast_list callback
Remove no longer used operation.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-17 20:22:03 -07:00
Jiri Pirko
afc4b13df1 net: remove use of ndo_set_multicast_list in drivers
replace it by ndo_set_rx_mode

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-17 20:22:03 -07:00
Jiri Pirko
01789349ee net: introduce IFF_UNICAST_FLT private flag
Use IFF_UNICAST_FTL to find out if driver handles unicast address
filtering. In case it does not, promisc mode is entered.

Patch also fixes following drivers:
stmmac, niu: support uc filtering and yet it propagated
	ndo_set_multicast_list
bna, benet, pxa168_eth, ks8851, ks8851_mll, ksz884x : has set
	ndo_set_rx_mode but do not support uc filtering

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-17 20:21:27 -07:00
Tom Herbert
c6865cb3cc rps: Inspect GRE encapsulated packets to get flow hash
Crack open GRE packets in __skb_get_rxhash to compute 4-tuple hash on
in encapsulated packet.  Note that this is used only when the
__skb_get_rxhash is taken, in particular only when the device does
not compute provide the rxhash (ie. feature is disabled).

This was tested by creating a single GRE tunnel between two 16 core
AMD machines.  200 netperf TCP_RR streams were ran with 1 byte
request and response size.

Without patch: 157497 tps, 50/90/99% latencies 1250/1292/1364 usecs
With patch: 325896 tps, 50/90/99% latencies 603/848/1169

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-17 20:06:03 -07:00
Tom Herbert
e971b7225b rps: Infrastructure in __skb_get_rxhash for deep inspection
Basics for looking for ports in encapsulated packets in tunnels.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-17 20:06:03 -07:00
Tom Herbert
bdeab99191 rps: Add flag to skb to indicate rxhash is based on L4 tuple
The l4_rxhash flag was added to the skb structure to indicate
that the rxhash value was computed over the 4 tuple for the
packet which includes the port information in the encapsulated
transport packet.  This is used by the stack to preserve the
rxhash value in __skb_rx_tunnel.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-17 20:06:03 -07:00
Tom Herbert
792df22cd0 rps: Some minor cleanup in get_rps_cpus
Use some variables for clarity and extensibility.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-17 20:06:03 -07:00
Ursula Braun
3881ac441f af_iucv: add HiperSockets transport
The current transport mechanism for af_iucv is the z/VM offered
communications facility IUCV. To provide equivalent support when
running Linux in an LPAR, HiperSockets transport is added to the
AF_IUCV address family. It requires explicit binding of an AF_IUCV
socket to a HiperSockets device. A new packet_type ETH_P_AF_IUCV
is announced. An af_iucv specific transport header is defined
preceding the skb data. A small protocol is implemented for
connecting and for flow control/congestion management.

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-13 01:10:16 -07:00
Ursula Braun
493d3971a6 af_iucv: cleanup - use iucv_sk(sk) early
Code cleanup making make use of local variable for struct iucv_sock.

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-13 01:10:16 -07:00
Frank Blaschka
6fcd61f7bf af_iucv: use loadable iucv interface
For future af_iucv extensions the module should be able to run in LPAR
mode too. For this we use the new dynamic loading iucv interface.

Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-13 01:10:16 -07:00
Ursula Braun
c69748d1c9 iucv: kernel option for z/VM IUCV and HiperSockets
When adding HiperSockets transport to AF_IUCV Sockets, af_iucv either
depends on IUCV or QETH_L3 (or both). This patch introduces the
necessary changes for kernel configuration.

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-13 01:10:16 -07:00
Frank Blaschka
96d042a68b iucv: introduce loadable iucv interface
This patch adds a symbol to dynamically load iucv functions.

Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-13 01:10:15 -07:00
Eric Dumazet
33d480ce6d net: cleanup some rcu_dereference_raw
RCU api had been completed and rcu_access_pointer() or
rcu_dereference_protected() are better than generic
rcu_dereference_raw()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-12 02:55:28 -07:00
Eric Dumazet
cd28ca0a3d neigh: reduce arp latency
Remove the artificial HZ latency on arp resolution.

Instead of firing a timer in one jiffy (up to 10 ms if HZ=100), lets
send the ARP message immediately.

Before patch :

# arp -d 192.168.20.108 ; ping -c 3 192.168.20.108
PING 192.168.20.108 (192.168.20.108) 56(84) bytes of data.
64 bytes from 192.168.20.108: icmp_seq=1 ttl=64 time=9.91 ms
64 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.065 ms
64 bytes from 192.168.20.108: icmp_seq=3 ttl=64 time=0.061 ms

After patch :

$ arp -d 192.168.20.108 ; ping -c 3 192.168.20.108
PING 192.168.20.108 (192.168.20.108) 56(84) bytes of data.
64 bytes from 192.168.20.108: icmp_seq=1 ttl=64 time=0.152 ms
64 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.064 ms
64 bytes from 192.168.20.108: icmp_seq=3 ttl=64 time=0.074 ms

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-12 02:55:28 -07:00
Jiri Pirko
e7c379d2a0 rtnetlink: remove initialization of dev->real_num_tx_queues
dev->real_num_tx_queues is correctly set already in alloc_netdev_mqs.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-11 07:44:38 -07:00
Eric Dumazet
9de79c127c net: fix potential neighbour race in dst_ifdown()
Followup of commit f2c31e32b3 (fix NULL dereferences in
check_peer_redir()).

We need to make sure dst neighbour doesnt change in dst_ifdown().

Fix some sparse errors.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-09 21:47:14 -07:00
David S. Miller
19fd61785a Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net 2011-08-07 23:20:26 -07:00
Julian Anastasov
d52fbfc9e5 ipv4: use dst with ref during bcast/mcast loopback
Make sure skb dst has reference when moving to
another context. Currently, I don't see protocols that can
hit it when sending broadcasts/multicasts to loopback using
noref dsts, so it is just a precaution.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-07 22:52:32 -07:00
Julian Anastasov
47670b767b ipv4: route non-local sources for raw socket
The raw sockets can provide source address for
routing but their privileges are not considered. We
can provide non-local source address, make sure the
FLOWI_FLAG_ANYSRC flag is set if socket has privileges
for this, i.e. based on hdrincl (IP_HDRINCL) and
transparent flags.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-07 22:52:32 -07:00
Julian Anastasov
797fd3913a netfilter: TCP and raw fix for ip_route_me_harder
TCP in some cases uses different global (raw) socket
to send RST and ACK. The transparent flag is not set there.
Currently, it is a problem for rerouting after the previous
change.

	Fix it by simplifying the checks in ip_route_me_harder
and use FLOWI_FLAG_ANYSRC even for sockets. It looks safe
because the initial routing allowed this source address to
be used and now we just have to make sure the packet is rerouted.

	As a side effect this also allows rerouting for normal
raw sockets that use spoofed source addresses which was not possible
even before we eliminated the ip_route_input call.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-07 22:52:32 -07:00
Daniel Baluta
dd23198e58 ipv4: Fix ip_getsockopt for IP_PKTOPTIONS
IP_PKTOPTIONS is broken for 32-bit applications running
in COMPAT mode on 64-bit kernels.

This happens because msghdr's msg_flags field is always
set to zero. When running in COMPAT mode this should be
set to MSG_CMSG_COMPAT instead.

Signed-off-by: Tiberiu Szocs-Mihai <tszocs@ixiacom.com>
Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-07 22:31:07 -07:00
Julian Anastasov
d547f727df ipv4: fix the reusing of routing cache entries
compare_keys and ip_route_input_common rely on
rt_oif for distinguishing of input and output routes
with same keys values. But sometimes the input route has
also same hash chain (keyed by iif != 0) with the output
routes (keyed by orig_oif=0). Problem visible if running
with small number of rhash_entries.

	Fix them to use rt_route_iif instead. By this way
input route can not be returned to users that request
output route.

	The patch fixes the ip_rt_bug errors that were
reported in ip_local_out context, mostly for 255.255.255.255
destinations.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-07 22:20:20 -07:00
Julian Anastasov
fad5444043 netfilter: avoid double free in nf_reinject
NF_STOLEN means skb was already freed

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-07 22:11:15 -07:00
David S. Miller
6e5714eaf7 net: Compute protocol sequence numbers and fragment IDs using MD5.
Computers have become a lot faster since we compromised on the
partial MD4 hash which we use currently for performance reasons.

MD5 is a much safer choice, and is inline with both RFC1948 and
other ISS generators (OpenBSD, Solaris, etc.)

Furthermore, only having 24-bits of the sequence number be truly
unpredictable is a very serious limitation.  So the periodic
regeneration and 8-bit counter have been removed.  We compute and
use a full 32-bit sequence number.

For ipv6, DCCP was found to use a 32-bit truncated initial sequence
number (it needs 43-bits) and that is fixed here as well.

Reported-by: Dan Kaminsky <dan@doxpara.com>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-06 18:33:19 -07:00
Max Matveev
c15fea2d8c ipv6: check for IPv4 mapped addresses when connecting IPv6 sockets
When support for binding to 'mapped INADDR_ANY (::ffff.0.0.0.0)' was added
in 0f8d3c7ac3 the rest of the code
wasn't told so now it's possible to bind IPv6 datagram socket to
::ffff.0.0.0.0, connect it to another IPv4 address and it will all
work except for getsockhame() which does not return the local address
as expected.

To give getsockname() something to work with check for 'mapped INADDR_ANY'
when connecting and update the in-core source addresses appropriately.

Signed-off-by: Max Matveev <makc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-05 03:56:30 -07:00
Tetsuo Handa
c71d8ebe7a net: Fix security_socket_sendmsg() bypass problem.
The sendmmsg() introduced by commit 228e548e "net: Add sendmmsg socket system
call" is capable of sending to multiple different destination addresses.

SMACK is using destination's address for checking sendmsg() permission.
However, security_socket_sendmsg() is called for only once even if multiple
different destination addresses are passed to sendmmsg().

Therefore, we need to call security_socket_sendmsg() for each destination
address rather than only the first destination address.

Since calling security_socket_sendmsg() every time when only single destination
address was passed to sendmmsg() is a waste of time, omit calling
security_socket_sendmsg() unless destination address of previous datagram and
that of current datagram differs.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Anton Blanchard <anton@samba.org>
Cc: stable <stable@kernel.org> [3.0+]
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-05 03:31:03 -07:00
Anton Blanchard
98382f419f net: Cap number of elements for sendmmsg
To limit the amount of time we can spend in sendmmsg, cap the
number of elements to UIO_MAXIOV (currently 1024).

For error handling an application using sendmmsg needs to retry at
the first unsent message, so capping is simpler and requires less
application logic than returning EINVAL.

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable <stable@kernel.org> [3.0+]
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-05 03:31:03 -07:00
Anton Blanchard
728ffb86f1 net: sendmmsg should only return an error if no messages were sent
sendmmsg uses a similar error return strategy as recvmmsg but it
turns out to be a confusing way to communicate errors.

The current code stores the error code away and returns it on the next
sendmmsg call. This means a call with completely valid arguments could
get an error from a previous call.

Change things so we only return an error if no datagrams could be sent.
If less than the requested number of messages were sent, the application
must retry starting at the first failed one and if the problem is
persistent the error will be returned.

This matches the behaviour of other syscalls like read/write - it
is not an error if less than the requested number of elements are sent.

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable <stable@kernel.org> [3.0+]
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-05 03:31:02 -07:00
David S. Miller
5be1334062 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless 2011-08-03 16:36:41 -07:00
John W. Linville
a5d5a91477 Merge git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2011-08-03 09:18:21 -04:00
Eric Dumazet
f2c31e32b3 net: fix NULL dereferences in check_peer_redir()
Gergely Kalman reported crashes in check_peer_redir().

It appears commit f39925dbde (ipv4: Cache learned redirect
information in inetpeer.) added a race, leading to possible NULL ptr
dereference.

Since we can now change dst neighbour, we should make sure a reader can
safely use a neighbour.

Add RCU protection to dst neighbour, and make sure check_peer_redir()
can be called safely by different cpus in parallel.

As neighbours are already freed after one RCU grace period, this patch
should not add typical RCU penalty (cache cold effects)

Many thanks to Gergely for providing a pretty report pointing to the
bug.

Reported-by: Gergely Kalman <synapse@hippy.csoma.elte.hu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-03 03:34:12 -07:00
Stephen Hemminger
a9b3cd7f32 rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTER
When assigning a NULL value to an RCU protected pointer, no barrier
is needed. The rcu_assign_pointer, used to handle that but will soon
change to not handle the special case.

Convert all rcu_assign_pointer of NULL value.

//smpl
@@ expression P; @@

- rcu_assign_pointer(P, NULL)
+ RCU_INIT_POINTER(P, NULL)

// </smpl>

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-02 04:29:23 -07:00
Lorenzo Colitti
76f793e3a4 ipv6: updates to privacy addresses per RFC 4941.
Update the code to handle some of the differences between
RFC 3041 and RFC 4941, which obsoletes it. Also a couple
of janitorial fixes.

- Allow router advertisements to increase the lifetime of
  temporary addresses. This was not allowed by RFC 3041,
  but is specified by RFC 4941. It is useful when RA
  lifetimes are lower than TEMP_{VALID,PREFERRED}_LIFETIME:
  in this case, the previous code would delete or deprecate
  addresses prematurely.

- Change the default of MAX_RETRY to 3 per RFC 4941.

- Add a comment to clarify that the preferred and valid
  lifetimes in inet6_ifaddr are relative to the timestamp.

- Shorten lines to 80 characters in a couple of places.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-01 18:05:00 -07:00
Eric Dumazet
22019b1782 net: add kerneldoc to skb_copy_bits()
Since skb_copy_bits() is called from assembly, add a fat comment to make
clear we should think twice before changing its prototype.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-01 18:03:06 -07:00
Paul Moore
82c21bfab4 doc: Update the email address for Paul Moore in various source files
My @hp.com will no longer be valid starting August 5, 2011 so an update is
necessary.  My new email address is employer independent so we don't have
to worry about doing this again any time soon.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-01 17:58:33 -07:00
Chas Williams
a08af810cd atm: br2864: sent packets truncated in VC routed mode
Reported-by: Pascal Hambourg <pascal@plouf.fr.eu.org>
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-01 17:56:14 -07:00
David S. Miller
ebdcc94b4b Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/net-next-2.6 2011-08-01 17:37:22 -07:00
Dan Carpenter
84404623da cfg80211: off by one in nl80211_trigger_scan()
The test is off by one so we'd read past the end of the
wiphy->bands[] array on the next line.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-08-01 13:46:46 -04:00
Samuel Jero
d96a9e8dd0 dccp ccid-2: check Ack Ratio when reducing cwnd
This patch causes CCID-2 to check the Ack Ratio after reducing the congestion
window. If the Ack Ratio is greater than the congestion window, it is
reduced. This prevents timeouts caused by an Ack Ratio larger than the
congestion window.

In this situation, we choose to set the Ack Ratio to half the congestion window
(or one if that's zero) so that if we loose one ack we don't trigger a timeout.

Signed-off-by: Samuel Jero <sj323707@ohio.edu> 
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2011-08-01 07:52:36 -06:00
Samuel Jero
0ce95dc792 dccp ccid-2: increment cwnd correctly
This patch fixes an issue where CCID-2 will not increase the congestion
window for numerous RTTs after an idle period, application-limited period,
or a loss once the algorithm is in Congestion Avoidance.

What happens is that, when CCID-2 is in Congestion Avoidance mode, it will
increase hc->tx_packets_acked by one for every packet and will increment cwnd
every cwnd packets. However, if there is now an idle period in the connection,
cwnd will be reduced, possibly below the slow start threshold. This will
cause the connection to go into Slow Start. However, in Slow Start CCID-2
performs this test to increment cwnd every second ack:

	++hc->tx_packets_acked == 2

Unfortunately, this will be incorrect, if cwnd previous to the idle period
was larger than 2 and if tx_packets_acked was close to cwnd. For example:
	cwnd=50  and  tx_packets_acked=45.

In this case, the current code, will increment tx_packets_acked until it
equals two, which will only be once tx_packets_acked (an unsigned 32-bit
integer) overflows.

My fix is simply to change that test for tx_packets_acked greater than or
equal to two in slow start.

Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2011-08-01 07:52:36 -06:00
Samuel Jero
d346d886a4 dccp ccid-2: prevent cwnd > Sequence Window
Add a check to prevent CCID-2 from increasing the cwnd greater than the
Sequence Window.

When the congestion window becomes bigger than the Sequence Window, CCID-2
will attempt to keep more data in the network than the DCCP Sequence Window
code considers possible. This results in the Sequence Window code issuing
a Sync, thereby inducing needless overhead. Further, if this occurs at the
sender, CCID-2 will never detect the problem because the Acks it receives
will indicate no losses. I have seen this cause a drop of 1/3rd in throughput
for a connection.

Also add code to adjust the Sequence Window to be about 5 times the number of
packets in the network (RFC 4340, 7.5.2) and to adjust the Ack Ratio so that
the remote Sequence Window will hold about 5 times the number of packets in
the network. This allows the congestion window to increase correctly without
being limited by the Sequence Window.

Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2011-08-01 07:52:35 -06:00
Gerrit Renker
31daf0393f dccp ccid-2: use feature-negotiation to report Ack Ratio changes
This uses the new feature-negotiation framework to signal Ack Ratio changes,
as required by RFC 4341, sec. 6.1.2.

That raises some problems with CCID-2, which at the moment can not cope
gracefully with Ack Ratios > 1. Since these issues are not directly related
to feature negotiation, they are marked by a FIXME.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.uk>
2011-08-01 07:52:35 -06:00
Samuel Jero
a6444f4237 dccp: send Confirm options only once
If a connection is in the OPEN state, remove feature negotiation Confirm
options from the list of options after sending them once; as such options
are NOT supposed to be retransmitted and are ONLY supposed to be sent in
response to a Change option (RFC 4340 6.2).

Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2011-08-01 07:52:35 -06:00
Gerrit Renker
44e6fd9e67 dccp: support for exchanging of NN options in established state 2/2
This patch adds the receiver side and the (fast-path) activation part for
dynamic changes of non-negotiable (NN) parameters in (PART)OPEN state.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.uk>
2011-08-01 07:52:34 -06:00
Gerrit Renker
d6916f87ca dccp: support for the exchange of NN options in established state 1/2
In contrast to static feature negotiation at the begin of a connection, this
patch introduces support for exchange of dynamically changing options.

Such an update/exchange is necessary in at least two cases:
 * CCID-2's Ack Ratio (RFC 4341, 6.1.2) which changes during the connection;
 * Sequence Window values that, as per RFC 4340, 7.5.2, should be sent "as
   the connection progresses".

Both are non-negotiable (NN) features, which means that no new capabilities
are negotiated, but rather that changes in known parameters are brought
up-to-date at either end.

Thse characteristics are reflected by the implementation:
 * only NN options can be exchanged after connection setup;
 * an ack is scheduled directly after activation to speed up the update;
 * CCIDs may request changes to an NN feature even if a negotiation for that
   feature is already underway: this is required by CCID-2, where changes in
   cwnd necessitate Ack Ratio changes, such that the previous Ack Ratio (which
   is still being negotiated) would cause irrecoverable RTO timeouts (thanks
   to work by Samuel Jero).	   

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.uk>
2011-08-01 07:52:34 -06:00
Eric Dumazet
e1738bd9ce sch_sfq: fix sfq_enqueue()
commit 8efa885406 (sch_sfq: avoid giving spurious NET_XMIT_CN signals)
forgot to call qdisc_tree_decrease_qlen() to signal upper levels that a
packet (from another flow) was dropped, leading to various problems.

With help from Michal Soltys and Michal Pokrywka, who did a bisection.

Bugzilla ref: https://bugzilla.kernel.org/show_bug.cgi?id=39372
Debian ref: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=631945

Reported-by: Lucas Bocchi <lucas.bocchi@gmail.com>
Reported-and-bisected-by: Michal Pokrywka <wolfmoon@o2.pl>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Michal Soltys <soltys@ziu.info>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-01 02:27:21 -07:00
Julia Lawall
a1889c0d20 net: adjust array index
Convert array index from the loop bound to the loop index.

A simplified version of the semantic patch that fixes this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression e1,e2,ar;
@@

for(e1 = 0; e1 < e2; e1++) { <...
  ar[
- e2
+ e1
  ]
  ...> }
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-01 02:27:21 -07:00
Eric Dumazet
89b0212697 ip6tnl: avoid touching dst refcount in ip6_tnl_xmit2()
Even using percpu stats, we still hit tunnel dst_entry refcount in
ip6_tnl_xmit2()

Since we are in RCU locked section, we can use skb_dst_set_noref() and
avoid these atomic operations, leaving dst shared on cpus.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-01 00:12:00 -07:00
Eric Dumazet
897dc80b95 ipv6: avoid a dst_entry refcount change in ipv6_destopt_rcv()
ipv6_destopt_rcv() runs with rcu_read_lock(), so there is no need to
take a temporay reference on dst_entry, even if skb is freed by
ip6_parse_tlv()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-01 00:12:00 -07:00