This patch fixes 2 issues in lowpan_skb_deliver function:
1. Check for return status of skb_copy call;
2. Use skb_copy with proper GFP flag, drop check for non-interrupt
context.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since these checks and initialization are done in
dev_ethtool_get_settings called later on, remove this redundancy.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a time-lag of IFF_RUNNING flag consistency between vlan and
real devices when the real devices are in problem such as link or cable
broken.
This leads to a degradation of Availability such as a delay of failover
in HA systems using vlan since the detection of the problem at real
device is delayed.
We can avoid the linkwatch delay (~1 sec) for devices linked to another
ones, since delay is already done for the realdev.
Based on a previous patch from Mitsuo Hayasaka
Reported-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Patrick McHardy <kaber@trash.net>
Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl>
Cc: Tom Herbert <therbert@google.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Jesse Gross <jesse@nicira.com>
Tested-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The functionality of xlist and llist is almost same. This patch
replace xlist with llist to avoid code duplication.
Known issues: don't know how to test this, need special hardware?
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Andy Grover <andy.grover@oracle.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
We should release the dev_hold() on error before returning here.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we kfree(entry) that causes a use-after-free bug so we have to
use list_for_each_entry_safe() safe here.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d88733150 introduced the IFF_SKB_TX_SHARING flag, which I unilaterally set in
ether_setup. In doing this I didn't realize that other flags (such as
IFF_XMIT_DST_RELEASE) might be set prior to calling the ether_setup routine.
This patch changes ether_setup to or in SKB_TX_SHARING so as not to
inadvertently clear other existing flags. Thanks to Pekka Riikonen for pointing
out my error
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Pekka Riikonen <priikone@iki.fi>
CC: "David S. Miller" <davem@davemloft.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_forward_skb loops an skb back into host networking
stack which might hang on the memory indefinitely.
In particular, this can happen in macvtap in bridged mode.
Copy the userspace fragments to avoid blocking the
sender in that case.
As this patch makes skb_copy_ubufs extern now,
I also added some documentation and made it clear
the SKBTX_DEV_ZEROCOPY flag automatically instead
of doing it in all callers. This can be made into a separate
patch if people feel it's worth it.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
flow_cache_lookup will return a cached object (or null pointer) that the
resolver (i.e. xfrm_policy_lookup) previously found for another namespace
using the same key/family/dir. Instead, make the namespace part of what
identifies entries in the cache.
As before, flow_entry_valid will return 0 for entries where the namespace
has been deleted, and they will be removed from the cache the next time
flow_cache_gc_task is run.
Reported-by: Andrew Dickinson <whydna@whydna.net>
Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is important for SMP platform to check if timer function is
executing on other CPU with deleting the timer.
Signed-off-by: Rajan Aggarwal <Rajan Aggarwal rajan.aggarwal85@gmail.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
"Possible SYN flooding on port xxxx " messages can fill logs on servers.
Change logic to log the message only once per listener, and add two new
SNMP counters to track :
TCPReqQFullDoCookies : number of times a SYNCOOKIE was replied to client
TCPReqQFullDrop : number of times a SYN request was dropped because
syncookies were not enabled.
Based on a prior patch from Tom Herbert, and suggestions from David.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
File cls_rsvp.h in /net/sched was outdated. I'm sending you patch for this
file.
[ tb[] array should be indexed by X not X-1 -DaveM ]
Signed-off-by: Igor Maravić <igorm@etf.rs>
Signed-off-by: David S. Miller <davem@davemloft.net>
The checks for HCI_INQUIRY and HCI_MGMT were in the wrong order,
so that second scans always failed.
Signed-off-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
The hw.conf.channel value is not updated properly for drivers that
support HW channel switch. Since the switch is done entirely by the
driver and we don't call ieee80211_hw_config(), this value remains
untouched. This patch fixes that by setting the new channel directly in
ieee80211_chswitch_work().
Signed-off-by: Shahar Levi <shahar_levi@ti.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Per sec 7.1.3.5 of draft 12.0 of 802.11s, mesh frames indicate the
presence of the mesh control header in their QoS header.
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
In order to support QoS in mesh, we need to assign queue mapping only
after the next hop has been resolved, both for forwarded and locally
originated frames. Also, now that this is fixed, remove the XXX comment
in ieee80211_select_queue().
Also, V-Shy Ho reported that the queue mapping was not being applied to
the forwarded frame (fwd_skb instead of skb). Fixed that as well.
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The format is intended to be like the subfields
in the QoS Info field, verify that is the case.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Juuso optimised the timer to not run all the
time in commit 3393a608c4.
However, after that it will still run once
more even if all frames just expired. Fixing
that also makes the function return value a
little clearer in the process.
Also, while at it, change the return value
to bool (instead of int).
Cc: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Set SSID information from nl80211 beacon parameters. Advertise changes
in SSID to low level drivers.
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Minstrel HT tries very hard to establish a BA session with
each peer once there's some data on the way. However the
stack does not inform minstrel if an aggregation session
is already in place, so it keeps trying and wastes good
cycles in the tx status path.
[ 8149.946393] Open BA session requested for $AP tid 0
[ 8150.048765] Open BA session requested for $AP tid 0
[ 8150.174509] Open BA session requested for $AP tid 0
[ 8150.274376] Open BA session requested for $AP tid 0
...
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Acked-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The assumption is that during the hw config, transmission was
already stopped by mac80211. Sometimes the AP can be switching
b/w the ht modes due to intolerant or etc where STA is in
the middle of transmission. In such scenario, buffer overflow
was observed at driver side. And also before updating the rate
control, the frames are continued to xmited with older rates.
This patch ensures that the frames are always xmitted with
updated rates and avoid buffer overflow.
Signed-off-by: Rajkumar Manoharan <rmanohar@qca.qualcomm.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Tx flow control for non-mesh modes of operation only needs to act on the
net device queues: when the hardware queues are full we stop accepting
traffic from the net device. In mesh, however, we also need to stop
forwarding traffic. This patch checks the hardware queues before
attempting to forward a mesh frame.
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
To check whether a frame is in the RMC, we need access to the mesh
header. This header is encrypted in encrypted data frames, so make this
check after the frame has been decrypted.
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
To properly maintain the peer's block ack window, the driver needs to be
able to control the new starting sequence number that is sent along with
the BlockAckReq frame.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Reorder functions to remove the need for a forward declaration
introduced by the last commit.
Signed-off-by: Sven Neumann <s.neumann@raumfeld.com>
Cc: John W. Linville <linville@tuxdriver.com>
Cc: Luis R. Rodriguez <mcgrof@gmail.com>
Cc: Daniel Mack <daniel@zonque.org>
Cc: linux-wireless@vger.kernel.org
Acked-by: Luis R. Rodriguez <mcgrof@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The function wiphy_update_regulatory() uses the static variable
last_request and thus needs to be called with reg_mutex held.
This is the case for all users in reg.c, but the function was
exported for use by wiphy_register(), from where it is called
without the lock being held.
Fix this by making wiphy_update_regulatory() private and introducing
regulatory_update() as a wrapper that acquires and holds the lock.
Signed-off-by: Sven Neumann <s.neumann@raumfeld.com>
Cc: John W. Linville <linville@tuxdriver.com>
Cc: Luis R. Rodriguez <mcgrof@gmail.com>
Cc: Daniel Mack <daniel@zonque.org>
Cc: linux-wireless@vger.kernel.org
Acked-by: Luis R. Rodriguez <mcgrof@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Two spaces and the second "KHz" suggest that the code author meant to
print the bandwidth but forgot it. The code appears in commit e702d3cf
already with two spaces and "KHz" in place of the bandwidth.
Signed-off-by: Pavel Roskin <proski@gnu.org>
Acked-by: Luis R. Rodriguez <mcgrof@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Introduce filtering for scheduled scans to reduce the number of
unnecessary results (which cause useless wake-ups).
Add a new nested attribute where sets of parameters to be matched can
be passed when starting a scheduled scan. Only scan results that
match any of the sets will be returned.
At this point, the set consists of a single parameter, an SSID. This
can be easily extended in the future to support more complex matches.
Signed-off-by: Luciano Coelho <coelho@ti.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
add WIPHY_FLAG_AP_UAPSD flag to indicate uapsd support on
AP mode.
Advertise it to userspace by including a new
NL80211_ATTR_SUPPORT_AP_UAPSD attribute.
Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The calls to kzalloc() weren't checked here and it upsets the static
checkers. Obviously they're not super likely to fail, but we might
as well add some error handling.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When this flag is set, Tx A-MPDU sessions will not be started by
mac80211. This flag is required for devices that support Tx A-MPDU setup
in hardware.
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Removing unnecessary messages saves code and text.
Site specific OOM messages are duplications of a generic MM
out of memory message and aren't really useful, so just
delete them.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Removing unnecessary messages saves code and text.
Site specific OOM messages are duplications of a generic MM
out of memory message and aren't really useful, so just
delete them.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Removing unnecessary messages saves code and text.
Site specific OOM messages are duplications of a generic MM
out of memory message and aren't really useful, so just
delete them.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
No need to take the mpath state lock when an mpath is removed.
Also, no need checking the lock when reading mpath flags.
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When an interface is removed, the mesh paths associated with it should
also be removed.
This fixes a bug we observed when reloading a device driver module
without reloading mac80211s.
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When the rssi of the current AP drops, both wpa_supplicant and the
firmware may do a background scan to find a better AP and try to
associate. Since firmware based roaming is faster, inform
wpa_supplicant to avoid roaming and let the firmware decide to
roam if necessary.
For fullmac drivers like ath6kl, it is just enough to provide the
ESSID and the firmware will decide on the BSSID. Since it is not
possible to do pre-auth during roaming for fullmac drivers, the
wpa_supplicant needs to completely disconnect with the old AP and
reconnect with the new AP. This consumes lot of time and it is
better to leave the roaming decision to the firmware.
Signed-off-by: Vivek Natarajan <nataraja@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Unfortunately failed BAR tx attempts happen more frequently than I
expected, and the resulting aggregation teardowns cause performance
issues, as the aggregation session does not always get re-established
properly.
Instead of tearing down the entire aggr session, we can simply store the
SSN of the last failed BAR tx attempt, wait for the first successful
tx status event, and then send another BAR with the same SSN.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Cc: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Under failure conditions, the mesh stack sends PERR messages to the
previous sender of the failed frame. This happens in the tx feedback
path, in which the transmission queue lock may be taken. Avoid a
deadlock by sending the path error via the pending queue.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Since my commit 34e895075e
("mac80211: allow station add/remove to sleep") there is
a race in mac80211 when it clears the TIM bit because a
sleeping station disconnected, the spinlock isn't held
around the relevant code any more. Use the right API to
acquire the spinlock correctly.
Cc: stable@kernel.org [2.6.34+]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Removing unnecessary messages saves code and text.
Site specific OOM messages are duplications of a generic MM
out of memory message and aren't really useful, so just
delete them.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
In the process the batman iv OGM aggregation code could be merged
into the batman iv code base which makes the separate aggregation
files superfluous.
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
In preparation of the upcoming improved routing algorithm the code based has
to be re-organized to allow choosing the routing algorithm at compile time.
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
The follow-up routing code changes are going to introduce additional
routing packet types which make this distinction necessary.
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
With msize equal to 512K (PAGE_SIZE * VIRTQUEUE_NUM), we hit multiple
crashes. This patch fix those.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Modifies code that disables a bearer to ensure that all of its links
are deleted, not just its uncongested links. Similarly, modifies code
that blocks a bearer to ensure that all of its links are reset, not
just its uncongested links.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Saves a socket's TIPC_CONN_TIMEOUT socket option value in its original
form (milliseconds), rather than jiffies. This ensures that the exact
value set using setsockopt() is always returned by getsockopt(), without
being subject to rounding issues introduced by a ms->jiffies->ms
conversion sequence.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Eliminates code in tipc_send_buf_fast() that handles messages
sent to a destination on the current node, since the only caller
of the routine only passes in messages destined for other nodes.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Eliminates obsolete code that handles broadcast bearer congestion when
the broadast link sends a NACK message, since the broadcast pseudo-bearer
never becomes blocked.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Modifies TIPC's incoming broadcast packet handler to discard messages
that cannot legally be sent over the broadcast link, including:
- broadcast protocol messages that do no contain state information
- payload messages that are not named multicast messages
- any other form of message except for bundled messages, fragmented
messages, and name distribution messages.
These checks are needed to prevent TIPC from handing an unexpected
message to a routine that isn't prepared to handle it, which could
lead to incorrect processing (up to and including invalid memory
references caused by attempts to access message fields that aren't
present in the message).
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Modifies TIPC's incoming broadcast packet handler so that it no longer
pre-reads information about the deferred packet queue, since the cached
value is unreliable once the associated node lock has been released.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Modifies TIPC's incoming broadcast packet handler to ensure that the
node lock associated with the sender of the packet is held whenever
node-related data structure fields are accessed. The routine is also
restructured with a single exit point, making it easier to ensure
the node lock is properly released and the incoming packet is properly
disposed of.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Ensure that broadcast link messages that have not been acknowledged
by a newly failed node do not get an implied acknowledgement until the
failed node is removed from the broadcast link's map of reachable nodes.
Previously, a race condition allowed a new broadcast link message to be
sent after the implicit acknowledgement processing was completed, but
before the map of reachable nodes was updated, resulting in the message
having an expected acknowledgement count that required the failed node
to explicitly acknowledge the message. Since this would never occur
the new message would remain in the broadcast link's transmit queue
forever, eventually causing the link to become congested and "stall".
Delaying the implicit acknowledgement processing until after the update
of the map of reachable nodes eliminates this race condition and prevents
stalling.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Enhances cleanup of broadcast link-related information when contact
with a node is lost.
1) All broadcast link-related cleanup now occurs only if the lost node
was capable of communicating over the broadcast link.
2) Following cleanup, the lost node is marked as no longer supporting
the broadcast link, ensuring that any remaining broadcast messages
received from that node prior to the re-establishment of a normal
communication link are ignored.
Thanks to Surya [Suryanarayana.Garlapati@emerson.com] for contributing
a prototype version of this patch.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Eliminates code associated with the sending of unsent broadcast link
traffic when the broadcast pseudo-bearer becomes unblocked following a
temporary congestion situation. This code is non-executable because the
broadcast pseudo-bearer never becomes blocked [see tipc_bcbearer_send()].
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Updates the comments in the broadcast bearer send routine to more
accurately describe the processing done by the routine. Also replaces
the improper use of a TIPC payload message error status symbol (in a place
that has nothing to do with such errors) with its numeric equivalent.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Updates TIPC's broadcast link in a couple of places that were missed
during the transition from its former name ("multicast-link") to its
current name ("broadcast-link"). These changes are essentially cosmetic
and do not affect the overall operation of TIPC.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Ensure TIPC ignores an out-dated link reset message whose session
number predates the current session number. (Previously, TIPC only
ignored an out-date reset message whose session number was equal
to the current link session number.)
Out-dated link reset messages should not occur under normal circumstances;
however, they can be generated if a link endpoint is unable to send a
link reset message right away and queues it for later delivery, but the
queued message is not sent until after the link is established.
Thanks to Laser [gotolaser@gmail.com] for diagnosing the problem and
contributing a prototype patch.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Initializes the peer session number field of a newly created link
endpoint to an invalid value. This eliminates the remote possibility
that it will accidentally match the session number used by the peer
the first time the link is activated, and cause the link to ignore
a valid RESET message.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Sets the peer interface portion of the name of a newly created link
endpoint to "unknown". This ensures that state and statistics information
can be properly displayed during the time between the link endpoint's
creation and the time handshaking with its peer is completed.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Removes a test that ensures unicast link endpoints discard an incoming
message if it will not be consumed by the node itself and cannot be
forwarded to another node, since the preceding test already ensures that
the message is destined for this node and single-cluster TIPC no longer
performs message forwarding.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Eliminates code that increments and validates the re-route count field
of payload messages, since the elimination of multi-cluster support
means that it is no longer necessary for TIPC to forward incoming messages
to another node. (The obsolete code was incorrect anyway, since it
incorrectly incremented the re-route count field of messages that
originated on the node that forwarded the message.)
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
commit d0733d2e29 (Check for mistakenly passed in non-IPv4 address)
added regression on legacy apps that use bind() with AF_UNSPEC family.
Relax the check, but make sure the bind() is done on INADDR_ANY
addresses, as AF_UNSPEC has probably no sane meaning for other
addresses.
Bugzilla reference : https://bugzilla.kernel.org/show_bug.cgi?id=42012
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-and-bisected-by: Rene Meier <r_meier@freenet.de>
CC: Marcus Meissner <meissner@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow transparent sockets to be less restrictive about
the source ip of ipv6 udp packets being sent.
Google-Bug-Id: 5018138
Signed-off-by: Maciej Żenczykowski <maze@google.com>
CC: "Erik Kline" <ek@google.com>
CC: "Lorenzo Colitti" <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The wrong multiplication of TCPOLEN_TSTAMP_ALIGNED by 4 skips the fast path
for the timestamp-only option. Bug reported by Michael M. Builov (netfilter
bugzilla #738).
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Michael M. Builov reported that in the tcp_options and tcp_sack functions
of netfilter TCP conntrack the incorrect handling of invalid TCP option
with too big opsize may lead to read access beyond tcp-packet or buffer
allocated on stack (netfilter bugzilla #738). The fix is to stop parsing
the options at detecting the broken option.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
When both the server and the client are NATed, the set-link-info control
packet containing the peer's call-id field is not properly translated.
I have verified that it was working in 2.6.16.13 kernel previously but
due to rewrite, this scenario stopped working (Not knowing exact version
when it stopped working).
Signed-off-by: Sanket Shah <sanket.shah@elitecore.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
A userspace listener may send (bogus) NF_STOLEN verdict, which causes skb leak.
This problem was previously fixed via
64507fdbc2 (netfilter:
nf_queue: fix NF_STOLEN skb leak) but this had to be reverted because
NF_STOLEN can also be returned by a netfilter hook when iterating the
rules in nf_reinject.
Reject userspace NF_STOLEN verdict, as suggested by Michal Miroslaw.
This is complementary to commit fad5444043
(netfilter: avoid double free in nf_reinject).
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch replaces the code for getting an number from a
userspace buffer by a simple call to kstrou8_from_user.
This makes it easier to read and less error prone.
Since the old buffer was only 10 bytes long and the value is masked by a
nibble-mask anyway, we don't need to use kstrtoul but rather kstrtou8.
Kernel Version: v3.0-rc2
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Remove per site OOM messages because they duplicate
the generic mm subsystem OOM message.
Use kzalloc instead of kmalloc/memset
when next to the OOM message removals.
Reduces object size (allyesconfig ~2%)
$ size -t drivers/net/caif/built-in.o.old net/caif/built-in.o.old
text data bss dec hex filename
32297 700 8224 41221 a105 drivers/net/caif/built-in.o.old
72159 1317 20552 94028 16f4c net/caif/built-in.o.old
104456 2017 28776 135249 21051 (TOTALS)
$ size -t drivers/net/caif/built-in.o.new net/caif/built-in.o.new
text data bss dec hex filename
31975 700 8184 40859 9f9b drivers/net/caif/built-in.o.new
70748 1317 20152 92217 16839 net/caif/built-in.o.new
102723 2017 28336 133076 207d4 (TOTALS)
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case SFB queue is full (hard limit reached), there is no point
spending time to compute hash and maximum qlen/p_mark.
We instead just early drop packet.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__netpoll_rx() doesnt properly handle skbs with small header
pskb_may_pull() or pskb_trim_rcsum() can change skb->data, we must
reload it.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
structs introduced in tpacket_v3 implementation are prefixed with 'tpacket'
to avoid namespace collision.
Compile tested.
Signed-off-by: Chetan Loke <loke.chetan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add uapsd_queues and max_sp fields to ieee80211_sta.
These fields might be needed by low-level drivers in
order to configure the AP.
Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add new NL80211_ATTR_STA_WME nested attribute that contains
wme params needed by the low-level driver (uapsd_queues and
max_sp).
Add these params to the station_parameters struct as well.
Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When operating as a P2P GO, we receive some P2P action frames where the
BSSID is set to the peer MAC address. Specifically, this occurs for
invitation responses. These are valid action frames and they should be
passed up.
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When associating to an AP, the station might miss the first EAP
packet that the AP sends due to a race condition between the association
success procedure and the rx flow in mac80211.
In such cases, the packet might fall in ieee80211_rx_h_check due to
the fact that the relevant rx->sta wasn't allocated yet.
Allocation of the relevant station info struct before actually
sending the association request and setting it with a new
dummy_sta flag solve this problem.
The station will accept only EAP packets from the AP while it
is in the pre-association/dummy state.
This dummy station entry is not seen by normal sta_info_get()
calls, only by sta_info_get_bss_rx().
The driver is not notified for the first insertion of the
dummy station. The driver is notified only after the association
is complete and the dummy flag is removed from the station entry.
That way, all the rest of the code flow should be untouched by
this change.
Signed-off-by: Guy Eilam <guy@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Divided the sta_info_insert_rcu function to 3 mini-functions:
sta_info_insert_check - the initial checks done when inserting
a new station
sta_info_insert_ibss - the function that handles the station
addition for IBSS interfaces
sta_info_insert_non_ibss - the function that handles the station
addition in other cases
The outer API was not changed.
The refactoring was done for better usage of the different
stages in the station addition in new scenarios added
in the next commit.
Signed-off-by: Guy Eilam <guy@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Since a v1 of the mesh gate series was accidentally applied, this patch
contains the changes in v2.
These are:
- automatically make mesh gate a root node.
- use TU_TO_EXP_TIME macro.
- initialize timer instead of checking for NULL timer function.
- cleanups.
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Dereferencing a user pointer directly from kernel-space without going
through the copy_from_user family of functions is a bad idea. Two of
such usages can be found in the sendmsg code path called from sendmmsg,
added by
commit c71d8ebe7a upstream.
commit 5b47b8038f183b44d2d8ff1c7d11a5c1be706b34 in the 3.0-stable tree.
Usages are performed through memcmp() and memcpy() directly. Fix those
by using the already copied msg_sys structure instead of the __user *msg
structure. Note that msg_sys can be set to NULL by verify_compat_iovec()
or verify_iovec(), which requires additional NULL pointer checks.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@ev0ke.net>
CC: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
CC: Anton Blanchard <anton@samba.org>
CC: David S. Miller <davem@davemloft.net>
CC: stable <stable@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patch series 109f6e39..7361c36c back in 2.6.36 added functionality to
allow credentials to work across pid namespaces for packets sent via
UNIX sockets. However, the atomic reference counts on pid and
credentials caused plenty of cache bouncing when there are numerous
threads of the same pid sharing a UNIX socket. This patch mitigates the
problem by eliminating extraneous reference counts on pid and
credentials on both send and receive path of UNIX sockets. I found a 2x
improvement in hackbench's threaded case.
On the receive path in unix_dgram_recvmsg, currently there is an
increment of reference count on pid and credentials in scm_set_cred.
Then there are two decrement of the reference counts. Once in scm_recv
and once when skb_free_datagram call skb->destructor function
unix_destruct_scm. One pair of increment and decrement of ref count on
pid and credentials can be eliminated from the receive path. Until we
destroy the skb, we already set a reference when we created the skb on
the send side.
On the send path, there are two increments of ref count on pid and
credentials, once in scm_send and once in unix_scm_to_skb. Then there
is a decrement of the reference counts in scm_destroy's call to
scm_destroy_cred at the end of unix_dgram_sendmsg functions. One pair
of increment and decrement of the reference counts can be removed so we
only need to increment the ref counts once.
By incorporating these changes, for hackbench running on a 4 socket
NHM-EX machine with 40 cores, the execution of hackbench on
50 groups of 20 threads sped up by factor of 2.
Hackbench command used for testing:
./hackbench 50 thread 2000
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With this patch a HEARTBEAT chunk is bundled into the ASCONF-ACK
for ADD IP ADDRESS, confirming the new destination as quickly as
possible.
Signed-off-by: Michio Honda <micchie@sfc.wide.ad.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes BUG that the ASCONF receiver transmits DATA chunks
to the newly added UNCONFIRMED destination.
Signed-off-by: Michio Honda <micchie@sfc.wide.ad.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements Proportional Rate Reduction (PRR) for TCP.
PRR is an algorithm that determines TCP's sending rate in fast
recovery. PRR avoids excessive window reductions and aims for
the actual congestion window size at the end of recovery to be as
close as possible to the window determined by the congestion control
algorithm. PRR also improves accuracy of the amount of data sent
during loss recovery.
The patch implements the recommended flavor of PRR called PRR-SSRB
(Proportional rate reduction with slow start reduction bound) and
replaces the existing rate halving algorithm. PRR improves upon the
existing Linux fast recovery under a number of conditions including:
1) burst losses where the losses implicitly reduce the amount of
outstanding data (pipe) below the ssthresh value selected by the
congestion control algorithm and,
2) losses near the end of short flows where application runs out of
data to send.
As an example, with the existing rate halving implementation a single
loss event can cause a connection carrying short Web transactions to
go into the slow start mode after the recovery. This is because during
recovery Linux pulls the congestion window down to packets_in_flight+1
on every ACK. A short Web response often runs out of new data to send
and its pipe reduces to zero by the end of recovery when all its packets
are drained from the network. Subsequent HTTP responses using the same
connection will have to slow start to raise cwnd to ssthresh. PRR on
the other hand aims for the cwnd to be as close as possible to ssthresh
by the end of recovery.
A description of PRR and a discussion of its performance can be found at
the following links:
- IETF Draft:
http://tools.ietf.org/html/draft-mathis-tcpm-proportional-rate-reduction-01
- IETF Slides:
http://www.ietf.org/proceedings/80/slides/tcpm-6.pdfhttp://tools.ietf.org/agenda/81/slides/tcpm-2.pdf
- Paper to appear in Internet Measurements Conference (IMC) 2011:
Improving TCP Loss Recovery
Nandita Dukkipati, Matt Mathis, Yuchung Cheng
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1) Blocks can be configured with non-static frame-size.
2) Read/poll is at a block-level(as opposed to packet-level).
3) Added poll timeout to avoid indefinite user-space wait on idle links.
4) Added user-configurable knobs:
4.1) block::timeout.
4.2) tpkt_hdr::sk_rxhash.
Changes:
C1) tpacket_rcv()
C1.1) packet_current_frame() is replaced by packet_current_rx_frame()
The bulk of the processing is then moved in the following chain:
packet_current_rx_frame()
__packet_lookup_frame_in_block
fill_curr_block()
or
retire_current_block
dispatch_next_block
or
return NULL(queue is plugged/paused)
Signed-off-by: Chetan Loke <loke.chetan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch provides base support for transmission of IPv6 packets as
well as the formation of IPv6 link-local addresses and statelessly
autoconfigured addresses on top of IEEE 802.15.4 networks.
For more information please look at the RFC4944 "Compression Format
for IPv6 Datagrams in Low Power and Losst Networks (6LoWPAN).
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
br_multicast_ipv6_rcv() can call pskb_trim_rcsum() and therefore skb
head can be reallocated.
Cache icmp6_type field instead of dereferencing twice the struct
icmp6hdr pointer.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Checksum of ICMPv6 is not properly computed because the pseudo header is not used.
Thus, the MLD packet gets dropped by the bridge.
Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Reported-by: Ang Way Chuang <wcang@sfc.wide.ad.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Should check use count of include mode filter instead of total number
of include mode filters.
Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Skip IPIP header to get proper layer-4 information.
Like GRE tunnels, this only works if rxhash is not already provided by
the device itself (ethtool -K ethX rxhash off), to allow kernel compute
a software rxhash.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can have the NFC core layer allocating the tx head and tail
room for the drivers and avoid 1 or more SKBs copy on write on
the Tx path.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Allow userspace to set NL80211_MESHCONF_GATE_ANNOUNCEMENTS attribute,
which will advertise this mesh node as being a mesh gate.
NL80211_HWMP_ROOTMODE must be set or this will do nothing.
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Allow userspace to set Root Announcement Interval for our mesh
interface. Also, RANN interval is now in proper units of TUs.
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This fix allows userspace to mark a meshif as a root node.
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
In this implementation, a mesh gate is a root node with a certain bit
set in its RANN flags. The mpath to this root node is marked as a path
to a gate, and added to our list of known gates for this if_mesh. Once a
path discovery process fails, we forward the unresolved frames to a
known gate. Thanks to Luis Rodriguez for refactoring and bug fix help.
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Previously, mpaths were never flushed since the mpath is not active once
we call this function.
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
mesh_queue_preq is invoked invoked from both user (work queue) and
softirq (timer) context, so the _bh version of spinlock needs to be
used. Also, the mpath->state_lock should be softirq safe as well.
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
If we have an mpath whose timer has not been initialized, don't try to
delete it.
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
make hwmp_dbg print the relevant sdata->name by default and improve
formatting. Also add mpath_dbg macro for debugging of mesh path
operations.
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Jan Beulich reported a possible net_device leak in bridge code after
commit bb900b27a2 (bridge: allow creating bridge devices with netlink)
Reported-by: Jan Beulich <JBeulich@novell.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now, when vlan tag on untagged in non-accelerated path is stripped from
skb, headers are reset right away. Benefit from that and avoid calling
__netif_receive_skb recursivelly and just use another_round.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make mesh path selection frames Mesh Action category, remove outdated
Mesh Path Selection category and defines, use updated reason codes, add
mesh_action_is_path_sel for readability, and update/correct path
selection IEs.
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch updates the mesh peering frames to the format specified in
the recently ratified 802.11s standard. Several changes took place to
make this happen:
- Change RX path to handle new self-protected frames
- Add new Peering management IE
- Remove old Peer Link IE
- Remove old plink_action field in ieee80211_mgmt header
These changes by themselves would either break peering, or work by
coincidence, so squash them all into this patch.
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Have the mesh peering frames use the self-protected action and reason codes
specified in 802.11s and defined in ieee80211.h. Remove the local enums.
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Correct ordering of IEs in the mesh beacon while removing unneeded IEs
from mesh peering frames. Set privacy bit in capability info if security
is enabled. Add utility functions to aid in construction
of IEs and reduce code duplication.
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
As described at [1] some STAs (i.e. Intel 5100 Windows) can end up
correctly BlockAcking incoming frames without delivering them to user
space if a AMPDU subframe got lost and we don't flush the receipients
reorder buffer with a BlockAckReq. This in turn results in stuck
connections.
According to 802.11n-2009 it is not necessary to send a BAR to flush
the recepients RX reorder buffer but we still do that to be polite.
However, assume the following frame exchange:
AP -> STA, AMPDU (failed)
AP -> STA, BAR (failed)
The client in question then ends up in the same situation and won't
deliver frames to userspace anymore since we weren't able to flush
its reorder buffer.
This is not a hypothetical situation but I was able to observe this
exact behavior during a stress test between a rt2800pci AP and a Intel
5100 Windows client.
In order to work around this issue just tear down the BA session as
soon as a BAR failed to be TX'ed.
[1] http://comments.gmane.org/gmane.linux.kernel.wireless.general/66867
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
While at it also fix the indention of the other IEEE80211_BAR_CTRL_ defines.
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Since clients can have several flags on or off, this patches make them
appear in the local/global transtable output so that they can be checked
for debugging purposes.
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
If a node has to send a packet issued by a WIFI client to another WIFI client,
the packet is dropped.
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
When a node receives a unicast packet it checks if the source and the
destination client can communicate or not due to the AP isolation
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Clients connected through a 802.11 device are now marked with the
TT_CLIENT_WIFI flag. This flag is also advertised with the tt
announcement.
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Several typos have been corrected and some sentences have been rephrased
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
hash_add() returns 0 on success while returns -1 either on error and on
entry already present. The caller could use such information to select
its behaviour. For this reason it is useful that hash_add() returns -1
in case on error and returns 1 in case of entry already present.
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
This oops have been already fixed with commit
27141666b6
atm: [br2684] Fix oops due to skb->dev being NULL
It happens that if a packet arrives in a VC between the call to open it on
the hardware and the call to change the backend to br2684, br2684_regvcc
processes the packet and oopses dereferencing skb->dev because it is
NULL before the call to br2684_push().
but have been introduced again with commit
b6211ae7f2
atm: Use SKB queue and list helpers instead of doing it by-hand.
Signed-off-by: Daniel Schwierzeck <daniel.schwierzeck@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPV6_2292PKTOPTIONS is broken for 32-bit applications running
in COMPAT mode on 64-bit kernels.
The same problem was fixed for IPv4 with the patch:
ipv4: Fix ip_getsockopt for IP_PKTOPTIONS,
commit dd23198e58
Signed-off-by: Sorin Dumitru <sdumitru@ixiacom.com>
Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Inspect the payload of PPPOE session messages for the 4 tuples to generate
skb->rxhash.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the 802.1Q packets, if the NIC doesn't support hw-accel-vlan-rx, RPS
won't inspect the internal 4 tuples to generate skb->rxhash, so this kind
of traffic can't get any benefit from RPS.
This patch adds the support for 802.1Q to RPS.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's after all necessary to do reset headers here. The reason is we
cannot depend that it gets reseted in __netif_receive_skb once skb is
reinjected. For incoming vlanids without vlan_dev, vlan_do_receive()
returns false with skb != NULL and __netif_reveive_skb continues, skb is
not reinjected.
This might be good material for 3.0-stable as well
Reported-by: Mike Auty <mike.auty@gmail.com>
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use IFF_UNICAST_FTL to find out if driver handles unicast address
filtering. In case it does not, promisc mode is entered.
Patch also fixes following drivers:
stmmac, niu: support uc filtering and yet it propagated
ndo_set_multicast_list
bna, benet, pxa168_eth, ks8851, ks8851_mll, ksz884x : has set
ndo_set_rx_mode but do not support uc filtering
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a redirected or mirrored packet is dropped by the target
device we need to record statistics.
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Crack open GRE packets in __skb_get_rxhash to compute 4-tuple hash on
in encapsulated packet. Note that this is used only when the
__skb_get_rxhash is taken, in particular only when the device does
not compute provide the rxhash (ie. feature is disabled).
This was tested by creating a single GRE tunnel between two 16 core
AMD machines. 200 netperf TCP_RR streams were ran with 1 byte
request and response size.
Without patch: 157497 tps, 50/90/99% latencies 1250/1292/1364 usecs
With patch: 325896 tps, 50/90/99% latencies 603/848/1169
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Basics for looking for ports in encapsulated packets in tunnels.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The l4_rxhash flag was added to the skb structure to indicate
that the rxhash value was computed over the 4 tuple for the
packet which includes the port information in the encapsulated
transport packet. This is used by the stack to preserve the
rxhash value in __skb_rx_tunnel.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use some variables for clarity and extensibility.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sit tunnels (IPv6 tunnel over IPv4) do not implement the "tos inherit"
case to copy the IPv6 transport class byte from the inner packet to
the IPv4 type of service byte in the outer packet. By contrast, ipip
tunnels and GRE tunnels do.
This patch, adapted from the similar code in net/ipv4/ipip.c and
net/ipv4/ip_gre.c, implements that.
This patch applies to 3.0.1, and has been tested on that version.
Signed-off-by: Lionel Elie Mamane <lionel@mamane.lu>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current transport mechanism for af_iucv is the z/VM offered
communications facility IUCV. To provide equivalent support when
running Linux in an LPAR, HiperSockets transport is added to the
AF_IUCV address family. It requires explicit binding of an AF_IUCV
socket to a HiperSockets device. A new packet_type ETH_P_AF_IUCV
is announced. An af_iucv specific transport header is defined
preceding the skb data. A small protocol is implemented for
connecting and for flow control/congestion management.
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Code cleanup making make use of local variable for struct iucv_sock.
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For future af_iucv extensions the module should be able to run in LPAR
mode too. For this we use the new dynamic loading iucv interface.
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When adding HiperSockets transport to AF_IUCV Sockets, af_iucv either
depends on IUCV or QETH_L3 (or both). This patch introduces the
necessary changes for kernel configuration.
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a symbol to dynamically load iucv functions.
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The NL80211_CMD_NEW_BEACON command is, in practice, requesting AP mode
operations to be started. Add new attributes to provide extra IEs
(e.g., WPS IE, P2P IE) for drivers that build Beacon, Probe Response,
and (Re)Association Response frames internally (likely in firmware).
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This removes need from drivers to parse the beacon tail/head data
to figure out what crypto settings are to be used in AP mode in case
the Beacon and Probe Response frames are fully constructed in the
driver/firmware.
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This makes it easier for drivers that generate Beacon and Probe Response
frames internally (in firmware most likely) in AP mode.
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Moving the parsing logic for retrieving the information elements
stored in management frames, e.g. beacons or probe responses,
and making it available to other cfg80211 drivers.
Signed-off-by: Yogesh Ashok Powar <yogeshp@marvell.com>
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
RCU api had been completed and rcu_access_pointer() or
rcu_dereference_protected() are better than generic
rcu_dereference_raw()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the artificial HZ latency on arp resolution.
Instead of firing a timer in one jiffy (up to 10 ms if HZ=100), lets
send the ARP message immediately.
Before patch :
# arp -d 192.168.20.108 ; ping -c 3 192.168.20.108
PING 192.168.20.108 (192.168.20.108) 56(84) bytes of data.
64 bytes from 192.168.20.108: icmp_seq=1 ttl=64 time=9.91 ms
64 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.065 ms
64 bytes from 192.168.20.108: icmp_seq=3 ttl=64 time=0.061 ms
After patch :
$ arp -d 192.168.20.108 ; ping -c 3 192.168.20.108
PING 192.168.20.108 (192.168.20.108) 56(84) bytes of data.
64 bytes from 192.168.20.108: icmp_seq=1 ttl=64 time=0.152 ms
64 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.064 ms
64 bytes from 192.168.20.108: icmp_seq=3 ttl=64 time=0.074 ms
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Once the session thread is running, cleanup must be handled
by the session thread only.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
When an hidp connection is added for a boot protocol input
device, don't release a device reference that was never
acquired. The device reference is acquired when the session
is linked to the session list (which hasn't happened yet when
hidp_setup_input is called).
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
When an hidp connection is added for a boot protocol input
device, only free the allocated device if device registration fails.
Subsequent failures should only unregister the device (the input
device api documents that unregister will also free the allocated
device).
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Free the cached HID report descriptor on thread terminate.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Once the session thread is running, cleanup must be
handled by the session thread only.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Commit fada4ac339 introduced the usage of kthread API.
kthread_stop is a blocking function which returns only when
the thread exits. In this case, the thread can't exit because it's
waiting for the write lock, which is being held by cmtp_del_connection()
which is waiting for the thread to exit -- deadlock.
Revert cmtp_reset_ctr to its original behavior: non-blocking signalling
for the session to terminate.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Commit f4d7cd4a4c introduced the usage of kthread API.
kthread_stop is a blocking function which returns only when
the thread exits. In this case, the thread can't exit because it's
waiting for the write lock, which is being held by bnep_del_connection()
which is waiting for the thread to exit -- deadlock.
Use atomic_t/wake_up_process instead to signal to the thread to exit.
Signed-off-by: Jaikumar Ganesh <jaikumar@google.com>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
No command should be send before Command Complete event for HCI
reset is received. This fix regression introduced by commit
6bd32326cda(Bluetooth: Use proper timer for hci command timout)
for chips whose reset command takes longer to complete (e.g. CSR)
resulting in next command being send before HCI reset completed.
Signed-off-by: Szymon Janc <szymon@janc.net.pl>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
L2CAP connection timeout needs to be assigned as miliseconds
and not as jiffies.
Signed-off-by: Chen Ganir <chen.ganir@ti.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Fix race condition which can result in missing wakeup during
l2cap socket shutdown.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Fix race condition which can result in missing the wakeup intended
to stop the session thread.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Fix race condition which can result in missing the wakeup intended
to stop the session thread.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Fix race conditions which can cause lost wakeups (or missed signals)
while waiting to accept a sco socket connection.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Fix race conditions which can cause lost wakeups (or misssed signals)
while waiting to accept an l2cap socket connection.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Fix race conditions which can cause lost wakeups while waiting
for sock state to change.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Fix race conditions which can cause lost wakeups (or missed
signals) while waiting to accept an rfcomm socket connection.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Removed superfluous event handling which was used to signal
that the rfcomm kthread had been woken. This appears to have been
used to prevent lost wakeups. Correctly ordering when the task
state is set to TASK_INTERRUPTIBLE is sufficient to prevent lost wakeups.
To prevent wakeups which occurred prior to initially setting
TASK_INTERRUPTIBLE from being lost, the main work of the thread loop -
rfcomm_process_sessions() - is performed prior to sleeping.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
There was a small typo here so we never actually hit the goto which
would call hci_dev_unlock_bh().
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Add a comment pointing out the use of enum station_info_flags for
all new struct station_info fields. In addition, memset the sinfo
buffer to zero before use on all paths in the current tree to avoid
leaving uninitialized pointers in the data.
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
mac80211 leaves sinfo->assoc_req_ies uninitialized, causing a random
pointer memory access in nl80211_send_station.
Instead of checking if the pointer is null, use sinfo->filled, like
the rest of the fields.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
dev->real_num_tx_queues is correctly set already in alloc_netdev_mqs.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As rt_iif represents input device even for packets
coming from loopback with output route, it is not an unique
key specific to input routes. Now rt_route_iif has such role,
it was fl.iif in 2.6.38, so better to change the checks at
some places to save CPU cycles and to restore 2.6.38 semantics.
compare_keys:
- input routes: only rt_route_iif matters, rt_iif is same
- output routes: only rt_oif matters, rt_iif is not
used for matching in __ip_route_output_key
- now we are back to 2.6.38 state
ip_route_input_common:
- matching rt_route_iif implies input route
- compared to 2.6.38 we eliminated one rth->fl.oif check
because it was not needed even for 2.6.38
compare_hash_inputs:
Only the change here is not an optimization, it has
effect only for output routes. I assume I'm restoring
the original intention to ignore oif, it was using fl.iif
- now we are back to 2.6.38 state
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Free the locally allocated table and newinfo as done in adjacent error
handling code.
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Call cipso_v4_doi_putdef in the case of the failure of the allocation of
entry. Reverse the order of the error handling code at the end of the
function and insert more labels in order to reduce the number of
unnecessary calls to kfree.
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch corrects an erroneous update of credential's gid with uid
introduced in commit 257b5358b3 since 2.6.36.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using a gcc 4.4.3, warnings are emitted for a possibly uninitialized use
of ecn_ok.
This can happen if cookie_check_timestamp() returns due to not having
seen a timestamp. Defaulting to ecn off seems like a reasonable thing
to do in this case, so initialized ecn_ok to false.
Signed-off-by: Mike Waychison <mikew@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a PREQ or PREP is received from an intermediate node, it contains
useful information for path selection but it doesn't include the
originator's sequence number. Therefore, when updating the mesh path
to that intermediate node, we should not set the MESH_PATH_SN_VALID
flag. BUT, if the flag is set, it should not be unset as we might have
received a valid sequence number for that intermediate node in the past.
This issue was reported, fixed and tested by Ya Bo (游波) and Pedro
Larbig (ASPj).
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
drivers might assume sta.drv_priv is clear while
the sta is added, so clear it on reconfinguration.
Signed-off-by: Eliad Peller <eliad@wizery.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When user space SME/MLME (e.g., hostapd) is not used in AP mode, the
IEs from the (Re)Association Request frame that was processed in
firmware need to be made available for user space (e.g., RSN IE for
hostapd). Allow this to be done with cfg80211_new_sta().
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Drivers that support frame transmission with mgmt_tx() may not support
driver-based offchannel TX. Use mgmt_tx_cancel_wait instead of mgmt_tx
when figuring out whether to indicate support for this with
NL80211_ATTR_OFFCHANNEL_TX_OK.
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
commit 07bd8df5df
(sch_sfq: fix peek() implementation) changed sfq to use generic
peek helper.
This makes HFSC complain about a non-work-conserving child qdisc, if
prio with sfq child is used within hfsc:
hfsc peeks into prio qdisc, which will then peek into sfq.
returned skb is stashed in sch->gso_skb.
Next, hfsc tries to dequeue from prio, but prio will call sfq dequeue
directly, which may return NULL instead of previously peeked-at skb.
Have prio call qdisc_dequeue_peeked, so sfq->dequeue() is
not called in this case.
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Followup of commit f2c31e32b3 (fix NULL dereferences in
check_peer_redir()).
We need to make sure dst neighbour doesnt change in dst_ifdown().
Fix some sparse errors.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This ensures the neighbor entries associated with the bridge
dev are flushed, also invalidating the associated cached L2 headers.
This means we br_add_if/br_del_if ports to implement hand-over and
not wind up with bridge packets going out with stale MAC.
This means we can also change MAC of port device and also not wind
up with bridge packets going out with stale MAC.
This builds on Stephen Hemminger's patch, also handling the br_del_if
case and the port MAC change case.
Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Andrei Warkentin <andreiw@motorola.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There were several problems here:
1- we weren't tagging allocations with the pool, so they were never
returned to the pool.
2- msgpool_put didn't add back to the mempool, even it were called.
3- msgpool_release didn't clear the pool pointer, so it would have looped
had #1 not been broken.
These may or may not have been responsible for #1136 or #1381 (BUG due to
non-empty mempool on umount). I can't seem to trigger the crash now using
the method I was using before.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Acked-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Some drivers (ath9k for example) are using skb->protocol to treat EAPOL
frames somehow special (disallow aggregation for example).
When running in AP mode hostapd injects the EAPOL frames through a
monitor interface and thus skb->protocol isn't set at all. Hence, if the
injected frame is a data frame and carries a rfc1042 headaer update the
skb->protocol field accordingly.
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Several uses were missing terminating newlines.
Typo fix and macro neatening.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Make mesh_preq_queue_lock locking consistent with mesh_queue_preq() using
spin_lock_bh().
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
According to 802.11-2007, 7.3.1.14 it is compliant to use a buf_size of
0 in ADDBA requests. But some devices (AVM Fritz Stick N) arn't able to
handle that correctly and will reply with an ADDBA reponse with a
buf_size of 0 which in turn will disallow BA sessions for these
devices.
To work around this problem, initialize hw.max_tx_aggregation_subframes
to the maximum AMPDU buffer size 0x40.
Using 0 as default for the bufsize was introduced in commit
5dd36bc933 (mac80211: allow advertising
correct maximum aggregate size).
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
If we receive an ADDBA response with status code 0 and a buf_size of 0
we should stop the TX BA session as otherwise we'll end up queuing
frames in ieee80211_tx_prep_agg forever instead of sending them out as
non AMPDUs.
This fixes a problem with AVM Fritz Stick N wireless devices where
frames to this device are not transmitted anymore by mac80211.
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
For iwlwifi, I decided not to use this API since
it just increased the complexity for little gain.
Since nobody else intends to use it, let's kill
it again. If anybody later needs to have it, we
can always revive it then.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
A lot of code is dedicated to giving drivers the
ability to use cfg80211's wext handlers without
completely converting. However, only orinoco is
currently using this, and it is only partially
using it.
We reduce the size of both the source and binary
by removing those that nobody needs. If a driver
shows up that needs it during conversion, we can
add back those that are needed.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
A lot of drivers erroneously use wext constants
and don't notice since cfg80211.h includes them.
Make this more split up so drivers needing wext
compatibility from cfg80211 need to explicitly
include that from cfg80211-wext.h.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Make sure skb dst has reference when moving to
another context. Currently, I don't see protocols that can
hit it when sending broadcasts/multicasts to loopback using
noref dsts, so it is just a precaution.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
The raw sockets can provide source address for
routing but their privileges are not considered. We
can provide non-local source address, make sure the
FLOWI_FLAG_ANYSRC flag is set if socket has privileges
for this, i.e. based on hdrincl (IP_HDRINCL) and
transparent flags.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP in some cases uses different global (raw) socket
to send RST and ACK. The transparent flag is not set there.
Currently, it is a problem for rerouting after the previous
change.
Fix it by simplifying the checks in ip_route_me_harder
and use FLOWI_FLAG_ANYSRC even for sockets. It looks safe
because the initial routing allowed this source address to
be used and now we just have to make sure the packet is rerouted.
As a side effect this also allows rerouting for normal
raw sockets that use spoofed source addresses which was not possible
even before we eliminated the ip_route_input call.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
IP_PKTOPTIONS is broken for 32-bit applications running
in COMPAT mode on 64-bit kernels.
This happens because msghdr's msg_flags field is always
set to zero. When running in COMPAT mode this should be
set to MSG_CMSG_COMPAT instead.
Signed-off-by: Tiberiu Szocs-Mihai <tszocs@ixiacom.com>
Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
compare_keys and ip_route_input_common rely on
rt_oif for distinguishing of input and output routes
with same keys values. But sometimes the input route has
also same hash chain (keyed by iif != 0) with the output
routes (keyed by orig_oif=0). Problem visible if running
with small number of rhash_entries.
Fix them to use rt_route_iif instead. By this way
input route can not be returned to users that request
output route.
The patch fixes the ip_rt_bug errors that were
reported in ip_local_out context, mostly for 255.255.255.255
destinations.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Computers have become a lot faster since we compromised on the
partial MD4 hash which we use currently for performance reasons.
MD5 is a much safer choice, and is inline with both RFC1948 and
other ISS generators (OpenBSD, Solaris, etc.)
Furthermore, only having 24-bits of the sequence number be truly
unpredictable is a very serious limitation. So the periodic
regeneration and 8-bit counter have been removed. We compute and
use a full 32-bit sequence number.
For ipv6, DCCP was found to use a 32-bit truncated initial sequence
number (it needs 43-bits) and that is fixed here as well.
Reported-by: Dan Kaminsky <dan@doxpara.com>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
When support for binding to 'mapped INADDR_ANY (::ffff.0.0.0.0)' was added
in 0f8d3c7ac3 the rest of the code
wasn't told so now it's possible to bind IPv6 datagram socket to
::ffff.0.0.0.0, connect it to another IPv4 address and it will all
work except for getsockhame() which does not return the local address
as expected.
To give getsockname() something to work with check for 'mapped INADDR_ANY'
when connecting and update the in-core source addresses appropriately.
Signed-off-by: Max Matveev <makc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sendmmsg() introduced by commit 228e548e "net: Add sendmmsg socket system
call" is capable of sending to multiple different destination addresses.
SMACK is using destination's address for checking sendmsg() permission.
However, security_socket_sendmsg() is called for only once even if multiple
different destination addresses are passed to sendmmsg().
Therefore, we need to call security_socket_sendmsg() for each destination
address rather than only the first destination address.
Since calling security_socket_sendmsg() every time when only single destination
address was passed to sendmmsg() is a waste of time, omit calling
security_socket_sendmsg() unless destination address of previous datagram and
that of current datagram differs.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Anton Blanchard <anton@samba.org>
Cc: stable <stable@kernel.org> [3.0+]
Signed-off-by: David S. Miller <davem@davemloft.net>
To limit the amount of time we can spend in sendmmsg, cap the
number of elements to UIO_MAXIOV (currently 1024).
For error handling an application using sendmmsg needs to retry at
the first unsent message, so capping is simpler and requires less
application logic than returning EINVAL.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable <stable@kernel.org> [3.0+]
Signed-off-by: David S. Miller <davem@davemloft.net>
sendmmsg uses a similar error return strategy as recvmmsg but it
turns out to be a confusing way to communicate errors.
The current code stores the error code away and returns it on the next
sendmmsg call. This means a call with completely valid arguments could
get an error from a previous call.
Change things so we only return an error if no datagrams could be sent.
If less than the requested number of messages were sent, the application
must retry starting at the first failed one and if the problem is
persistent the error will be returned.
This matches the behaviour of other syscalls like read/write - it
is not an error if less than the requested number of elements are sent.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable <stable@kernel.org> [3.0+]
Signed-off-by: David S. Miller <davem@davemloft.net>
Gergely Kalman reported crashes in check_peer_redir().
It appears commit f39925dbde (ipv4: Cache learned redirect
information in inetpeer.) added a race, leading to possible NULL ptr
dereference.
Since we can now change dst neighbour, we should make sure a reader can
safely use a neighbour.
Add RCU protection to dst neighbour, and make sure check_peer_redir()
can be called safely by different cpus in parallel.
As neighbours are already freed after one RCU grace period, this patch
should not add typical RCU penalty (cache cold effects)
Many thanks to Gergely for providing a pretty report pointing to the
bug.
Reported-by: Gergely Kalman <synapse@hippy.csoma.elte.hu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When assigning a NULL value to an RCU protected pointer, no barrier
is needed. The rcu_assign_pointer, used to handle that but will soon
change to not handle the special case.
Convert all rcu_assign_pointer of NULL value.
//smpl
@@ expression P; @@
- rcu_assign_pointer(P, NULL)
+ RCU_INIT_POINTER(P, NULL)
// </smpl>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update the code to handle some of the differences between
RFC 3041 and RFC 4941, which obsoletes it. Also a couple
of janitorial fixes.
- Allow router advertisements to increase the lifetime of
temporary addresses. This was not allowed by RFC 3041,
but is specified by RFC 4941. It is useful when RA
lifetimes are lower than TEMP_{VALID,PREFERRED}_LIFETIME:
in this case, the previous code would delete or deprecate
addresses prematurely.
- Change the default of MAX_RETRY to 3 per RFC 4941.
- Add a comment to clarify that the preferred and valid
lifetimes in inet6_ifaddr are relative to the timestamp.
- Shorten lines to 80 characters in a couple of places.
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since skb_copy_bits() is called from assembly, add a fat comment to make
clear we should think twice before changing its prototype.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
My @hp.com will no longer be valid starting August 5, 2011 so an update is
necessary. My new email address is employer independent so we don't have
to worry about doing this again any time soon.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reported-by: Pascal Hambourg <pascal@plouf.fr.eu.org>
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
The test is off by one so we'd read past the end of the
wiphy->bands[] array on the next line.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch causes CCID-2 to check the Ack Ratio after reducing the congestion
window. If the Ack Ratio is greater than the congestion window, it is
reduced. This prevents timeouts caused by an Ack Ratio larger than the
congestion window.
In this situation, we choose to set the Ack Ratio to half the congestion window
(or one if that's zero) so that if we loose one ack we don't trigger a timeout.
Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
This patch fixes an issue where CCID-2 will not increase the congestion
window for numerous RTTs after an idle period, application-limited period,
or a loss once the algorithm is in Congestion Avoidance.
What happens is that, when CCID-2 is in Congestion Avoidance mode, it will
increase hc->tx_packets_acked by one for every packet and will increment cwnd
every cwnd packets. However, if there is now an idle period in the connection,
cwnd will be reduced, possibly below the slow start threshold. This will
cause the connection to go into Slow Start. However, in Slow Start CCID-2
performs this test to increment cwnd every second ack:
++hc->tx_packets_acked == 2
Unfortunately, this will be incorrect, if cwnd previous to the idle period
was larger than 2 and if tx_packets_acked was close to cwnd. For example:
cwnd=50 and tx_packets_acked=45.
In this case, the current code, will increment tx_packets_acked until it
equals two, which will only be once tx_packets_acked (an unsigned 32-bit
integer) overflows.
My fix is simply to change that test for tx_packets_acked greater than or
equal to two in slow start.
Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Add a check to prevent CCID-2 from increasing the cwnd greater than the
Sequence Window.
When the congestion window becomes bigger than the Sequence Window, CCID-2
will attempt to keep more data in the network than the DCCP Sequence Window
code considers possible. This results in the Sequence Window code issuing
a Sync, thereby inducing needless overhead. Further, if this occurs at the
sender, CCID-2 will never detect the problem because the Acks it receives
will indicate no losses. I have seen this cause a drop of 1/3rd in throughput
for a connection.
Also add code to adjust the Sequence Window to be about 5 times the number of
packets in the network (RFC 4340, 7.5.2) and to adjust the Ack Ratio so that
the remote Sequence Window will hold about 5 times the number of packets in
the network. This allows the congestion window to increase correctly without
being limited by the Sequence Window.
Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
This uses the new feature-negotiation framework to signal Ack Ratio changes,
as required by RFC 4341, sec. 6.1.2.
That raises some problems with CCID-2, which at the moment can not cope
gracefully with Ack Ratios > 1. Since these issues are not directly related
to feature negotiation, they are marked by a FIXME.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.uk>
If a connection is in the OPEN state, remove feature negotiation Confirm
options from the list of options after sending them once; as such options
are NOT supposed to be retransmitted and are ONLY supposed to be sent in
response to a Change option (RFC 4340 6.2).
Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
This patch adds the receiver side and the (fast-path) activation part for
dynamic changes of non-negotiable (NN) parameters in (PART)OPEN state.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.uk>
In contrast to static feature negotiation at the begin of a connection, this
patch introduces support for exchange of dynamically changing options.
Such an update/exchange is necessary in at least two cases:
* CCID-2's Ack Ratio (RFC 4341, 6.1.2) which changes during the connection;
* Sequence Window values that, as per RFC 4340, 7.5.2, should be sent "as
the connection progresses".
Both are non-negotiable (NN) features, which means that no new capabilities
are negotiated, but rather that changes in known parameters are brought
up-to-date at either end.
Thse characteristics are reflected by the implementation:
* only NN options can be exchanged after connection setup;
* an ack is scheduled directly after activation to speed up the update;
* CCIDs may request changes to an NN feature even if a negotiation for that
feature is already underway: this is required by CCID-2, where changes in
cwnd necessitate Ack Ratio changes, such that the previous Ack Ratio (which
is still being negotiated) would cause irrecoverable RTO timeouts (thanks
to work by Samuel Jero).
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.uk>
commit 8efa885406 (sch_sfq: avoid giving spurious NET_XMIT_CN signals)
forgot to call qdisc_tree_decrease_qlen() to signal upper levels that a
packet (from another flow) was dropped, leading to various problems.
With help from Michal Soltys and Michal Pokrywka, who did a bisection.
Bugzilla ref: https://bugzilla.kernel.org/show_bug.cgi?id=39372
Debian ref: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=631945
Reported-by: Lucas Bocchi <lucas.bocchi@gmail.com>
Reported-and-bisected-by: Michal Pokrywka <wolfmoon@o2.pl>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Michal Soltys <soltys@ziu.info>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert array index from the loop bound to the loop index.
A simplified version of the semantic patch that fixes this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@@
expression e1,e2,ar;
@@
for(e1 = 0; e1 < e2; e1++) { <...
ar[
- e2
+ e1
]
...> }
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Even using percpu stats, we still hit tunnel dst_entry refcount in
ip6_tnl_xmit2()
Since we are in RCU locked section, we can use skb_dst_set_noref() and
avoid these atomic operations, leaving dst shared on cpus.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6_destopt_rcv() runs with rcu_read_lock(), so there is no need to
take a temporay reference on dst_entry, even if skb is freed by
ip6_parse_tlv()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use RCU to avoid changing dst_entry refcount in fast path.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ICMP and ND are not fast path, but still we can avoid changing idev
refcount, using RCU.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The configuration of ebtables shouldn't depend on
CONFIG_BRIDGE_NETFILTER, only on CONFIG_NETFILTER.
Reported-by: Sbastien Laveze <slaveze@gmail.com>
Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Patrick McHardy <kaber@trash.net>
ipq_build_packet_message() in net/ipv4/netfilter/ip_queue.c and
net/ipv6/netfilter/ip6_queue.c contain a small potential mem leak as
far as I can tell.
We allocate memory for 'skb' with alloc_skb() annd then call
nlh = NLMSG_PUT(skb, 0, 0, IPQM_PACKET, size - sizeof(*nlh));
NLMSG_PUT is a macro
NLMSG_PUT(skb, pid, seq, type, len) \
NLMSG_NEW(skb, pid, seq, type, len, 0)
that expands to NLMSG_NEW, which is also a macro which expands to:
NLMSG_NEW(skb, pid, seq, type, len, flags) \
({ if (unlikely(skb_tailroom(skb) < (int)NLMSG_SPACE(len))) \
goto nlmsg_failure; \
__nlmsg_put(skb, pid, seq, type, len, flags); })
If we take the true branch of the 'if' statement and 'goto
nlmsg_failure', then we'll, at that point, return from
ipq_build_packet_message() without having assigned 'skb' to anything
and we'll leak the memory we allocated for it when it goes out of
scope.
Fix this by placing a 'kfree(skb)' at 'nlmsg_failure'.
I admit that I do not know how likely this to actually happen or even
if there's something that guarantees that it will never happen - I'm
not that familiar with this code, but if that is so, I've not been
able to spot it.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
commit 4a5a5c73b7 (slightly better error reporting) added some
useless code in xt_rateest_mt_checkentry().
Fix this so that different error codes can really be returned.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>