On 2015/11/06, Dmitry Vyukov reported a deadlock involving the splice
system call and AF_UNIX sockets,
http://lists.openwall.net/netdev/2015/11/06/24
The situation was analyzed as
(a while ago) A: socketpair()
B: splice() from a pipe to /mnt/regular_file
does sb_start_write() on /mnt
C: try to freeze /mnt
wait for B to finish with /mnt
A: bind() try to bind our socket to /mnt/new_socket_name
lock our socket, see it not bound yet
decide that it needs to create something in /mnt
try to do sb_start_write() on /mnt, block (it's
waiting for C).
D: splice() from the same pipe to our socket
lock the pipe, see that socket is connected
try to lock the socket, block waiting for A
B: get around to actually feeding a chunk from
pipe to file, try to lock the pipe. Deadlock.
on 2015/11/10 by Al Viro,
http://lists.openwall.net/netdev/2015/11/10/4
The patch fixes this by removing the kern_path_create related code from
unix_mknod and executing it as part of unix_bind prior acquiring the
readlock of the socket in question. This means that A (as used above)
will sb_start_write on /mnt before it acquires the readlock, hence, it
won't indirectly block B which first did a sb_start_write and then
waited for a thread trying to acquire the readlock. Consequently, A
being blocked by C waiting for B won't cause a deadlock anymore
(effectively, both A and B acquire two locks in opposite order in the
situation described above).
Dmitry Vyukov(<dvyukov@google.com>) tested the original patch.
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commands run in a vrf context are not failing as expected on a route lookup:
root@kenny:~# ip ro ls table vrf-red
unreachable default
root@kenny:~# ping -I vrf-red -c1 -w1 10.100.1.254
ping: Warning: source address might be selected on device other than vrf-red.
PING 10.100.1.254 (10.100.1.254) from 0.0.0.0 vrf-red: 56(84) bytes of data.
--- 10.100.1.254 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms
Since the vrf table does not have a route for 10.100.1.254 the ping
should have failed. The saddr lookup causes a full VRF table lookup.
Propogating a lookup failure to the user allows the command to fail as
expected:
root@kenny:~# ping -I vrf-red -c1 -w1 10.100.1.254
connect: No route to host
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Expose socket options for setting a classic or extended BPF program
for use when selecting sockets in an SO_REUSEPORT group. These options
can be used on the first socket to belong to a group before bind or
on any socket in the group after bind.
This change includes refactoring of the existing sk_filter code to
allow reuse of the existing BPF filter validation checks.
Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Include a struct sock_reuseport instance when a UDP socket binds to
a specific address for the first time with the reuseport flag set.
When selecting a socket for an incoming UDP packet, use the information
available in sock_reuseport if present.
This required adding an additional field to the UDP source address
equality function to differentiate between exact and wildcard matches.
The original use case allowed wildcard matches when checking for
existing port uses during bind. The new use case of adding a socket
to a reuseport group requires exact address matching.
Performance test (using a machine with 2 CPU sockets and a total of
48 cores): Create reuseport groups of varying size. Use one socket
from this group per user thread (pinning each thread to a different
core) calling recvmmsg in a tight loop. Record number of messages
received per second while saturating a 10G link.
10 sockets: 18% increase (~2.8M -> 3.3M pkts/s)
20 sockets: 14% increase (~2.9M -> 3.3M pkts/s)
40 sockets: 13% increase (~3.0M -> 3.4M pkts/s)
This work is based off a similar implementation written by
Ying Cai <ycai@google.com> for implementing policy-based reuseport
selection.
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct sock_reuseport is an optional shared structure referenced by each
socket belonging to a reuseport group. When a socket is bound to an
address/port not yet in use and the reuseport flag has been set, the
structure will be allocated and attached to the newly bound socket.
When subsequent calls to bind are made for the same address/port, the
shared structure will be updated to include the new socket and the
newly bound socket will reference the group structure.
Usually, when an incoming packet was destined for a reuseport group,
all sockets in the same group needed to be considered before a
dispatching decision was made. With this structure, an appropriate
socket can be found after looking up just one socket in the group.
This shared structure will also allow for more complicated decisions to
be made when selecting a socket (eg a BPF filter).
This work is based off a similar implementation written by
Ying Cai <ycai@google.com> for implementing policy-based reuseport
selection.
Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the first NFC pull request for 4.5 and it brings:
- A new driver for the STMicroelectronics ST95HF NFC chipset.
The ST95HF is an NFC digital transceiver with an embedded analog
front-end and as such relies on the Linux NFC digital
implementation. This is the 3rd user of the NFC digital stack.
- ACPI support for the ST st-nci and st21nfca drivers.
- A small improvement for the nfcsim driver, as we can now tune
the Rx delay through sysfs.
- A bunch of minor cleanups and small fixes from Christophe Ricard,
for a few drivers and the NFC core code.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWgsv0AAoJEIqAPN1PVmxKvukP/3eJwA+chAUF89/fqwqFTaJN
fffLoOxx2OBIbTXD2VV36yw4bAo9tbKDAYBZiot3Ig7Kg0SeCJ5oPA9xCVnWPrEY
hxFAldvl+lWQs8nrUOgUZItiFBeUPdfW9YX/yKhUZVc+602nUG/e/+6x8B5MhIce
SAfgCyd0c16DApltP7sw1muyZMvsO6Ow6dyNzDUVYZuabvEhe3SLSj9KFJi7Thsp
h41Iv+bwPLhwF4RXGA6rei/gdEDSMRohprdj3uTDiTarGW+OpcAO0zWACS5m4eR0
zF19+HjGPUk/LpFRaU31xX0ZQQjOTmmfsOt4FBb3P7oJx47egycsadLYPexeG7nj
ruyS6ezlRX1I/tZsnyLNJK92mK5TXYLz2uJ8r2ii/BgPNE+AErB3zKCC+EjXzWhh
AvClGu5b88WJLxoq3I3l5evPwGhebGZ8N/1uiFsHOxvzKVLgxwOmNLRGN4XXxB2i
UbIHgBb6smsu/l+3q9R83kfoMaoWnr+OUIi2QPQVDt/K7t1LfsCuIhzcGSgo1VuW
fGlA1iu+CNDknofeCl4JDo2UXAETO4gdKWw87GXeUcbbraLUczZeO7FFLZqxbMYc
OCaPYshmVFeZRypYdRWDHw67ivj0/h+9iq4PP1XOROkRFH746dD/p4yamJwVi20B
samZ8VPwzgH3/ohQJyX3
=VFUH
-----END PGP SIGNATURE-----
Merge tag 'nfc-next-4.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next
Samuel Ortiz says:
====================
NFC 4.5 pull request
This is the first NFC pull request for 4.5 and it brings:
- A new driver for the STMicroelectronics ST95HF NFC chipset.
The ST95HF is an NFC digital transceiver with an embedded analog
front-end and as such relies on the Linux NFC digital
implementation. This is the 3rd user of the NFC digital stack.
- ACPI support for the ST st-nci and st21nfca drivers.
- A small improvement for the nfcsim driver, as we can now tune
the Rx delay through sysfs.
- A bunch of minor cleanups and small fixes from Christophe Ricard,
for a few drivers and the NFC core code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Backport of this upstream commit into stable kernels :
89c22d8c3b27 ("net: Fix skb csum races when peeking")
exposed a bug in udp stack vs MSG_PEEK support, when user provides
a buffer smaller than skb payload.
In this case,
skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr),
msg->msg_iov);
returns -EFAULT.
This bug does not happen in upstream kernels since Al Viro did a great
job to replace this into :
skb_copy_and_csum_datagram_msg(skb, sizeof(struct udphdr), msg);
This variant is safe vs short buffers.
For the time being, instead reverting Herbert Xu patch and add back
skb->ip_summed invalid changes, simply store the result of
udp_lib_checksum_complete() so that we avoid computing the checksum a
second time, and avoid the problematic
skb_copy_and_csum_datagram_iovec() call.
This patch can be applied on recent kernels as it avoids a double
checksumming, then backported to stable kernels as a bug fix.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since 79c441ae505c ("ppp: implement x-netns support"), the PPP layer
calls skb_scrub_packet() whenever the skb is received on the PPP
device. Manually resetting packet meta-data in the L2TP layer is thus
redundant.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ieee802154_llsec_ops structure is never modified, so declare it as
const.
Done with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The IDs should all be for Broadcom BCM43241 module, and
hci_bcm is now the proper driver for them. This removes one
of two different ways of handling PM with the module.
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
You can use this to forward packets from ingress to the egress path of
the specified interface. This provides a fast path to bounce packets
from one interface to another specific destination interface.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
A _lot_ of ->write() instances were open-coding it; some are
converted to memdup_user_nul(), a lot more remain...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
You can use this to duplicate packets and inject them at the egress path
of the specified interface. This duplication allows you to inspect
traffic from the dummy or any other interface dedicated to this purpose.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch allows you to invert the ratelimit matching criteria, so you
can match packets over the ratelimit. This is required to support what
hashlimit does.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Johan Hedberg says:
====================
pull request: bluetooth-next 2015-12-31
Here's (probably) the last bluetooth-next pull request for the 4.5
kernel:
- Add support for BCM2E65 ACPI ID
- Minor fixes/cleanups in the bcm203x & bfusb drivers
- Minor debugfs related fix in 6lowpan code
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ethernet PHYs can maintain statistics, for example errors while idle
and receive errors. Add an ethtool mechanism to retrieve these
statistics, using the same model as MAC statistics.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The missing break means that we always return EAFNOSUPPORT when
faced with a request for an IPv6 loopback address.
Reported-by: coverity (CID 401987)
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
In sctp_close, sctp_make_abort_user may return NULL because of memory
allocation failure. If this happens, it will bypass any state change
and never free the assoc. The assoc has no chance to be freed and it
will be kept in memory with the state it had even after the socket is
closed by sctp_close().
So if sctp_make_abort_user fails to allocate memory, we should abort
the asoc via sctp_primitive_ABORT as well. Just like the annotation in
sctp_sf_cookie_wait_prm_abort and sctp_sf_do_9_1_prm_abort said,
"Even if we can't send the ABORT due to low memory delete the TCB.
This is a departure from our typical NOMEM handling".
But then the chunk is NULL (low memory) and the SCTP_CMD_REPLY cmd would
dereference the chunk pointer, and system crash. So we should add
SCTP_CMD_REPLY cmd only when the chunk is not NULL, just like other
places where it adds SCTP_CMD_REPLY cmd.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit ceb5d58b2170 ("net: fix sock_wake_async() rcu protection") from
the current 4.4 release cycle introduced a new flags member in
struct socket_wq and moved SOCKWQ_ASYNC_NOSPACE and SOCKWQ_ASYNC_WAITDATA
from struct socket's flags member into that new place.
Unfortunately, the new flags field is never initialized properly, at least
not for the struct socket_wq instance created in sock_alloc_inode().
One particular issue I encountered because of this is that my GNU Emacs
failed to draw anything on my desktop -- i.e. what I got is a transparent
window, including the title bar. Bisection lead to the commit mentioned
above and further investigation by means of strace told me that Emacs
is indeed speaking to my Xorg through an O_ASYNC AF_UNIX socket. This is
reproducible 100% of times and the fact that properly initializing the
struct socket_wq ->flags fixes the issue leads me to the conclusion that
somehow SOCKWQ_ASYNC_WAITDATA got set in the uninitialized ->flags,
preventing my Emacs from receiving any SIGIO's due to data becoming
available and it got stuck.
Make sock_alloc_inode() set the newly created struct socket_wq's ->flags
member to zero.
Fixes: ceb5d58b2170 ("net: fix sock_wake_async() rcu protection")
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 5b48bb8506c5 ("openvswitch: Fix helper reference leak") fixed a
reference leak on helper objects, but inadvertently introduced a leak on
the ct template.
Previously, ct_info.ct->general.use was initialized to 0 by
nf_ct_tmpl_alloc() and only incremented when ovs_ct_copy_action()
returned successful. If an error occurred while adding the helper or
adding the action to the actions buffer, the __ovs_ct_free_action()
cleanup would use nf_ct_put() to free the entry; However, this relies on
atomic_dec_and_test(ct_info.ct->general.use). This reference must be
incremented first, or nf_ct_put() will never free it.
Fix the issue by acquiring a reference to the template immediately after
allocation.
Fixes: cae3a2627520 ("openvswitch: Allow attaching helpers to ct action")
Fixes: 5b48bb8506c5 ("openvswitch: Fix helper reference leak")
Signed-off-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
I've moved the check for "number_destination_params" forward
a few lines to avoid leaking "cmd".
Fixes: caa575a86ec1 ('NFC: nci: fix possible crash in nci_core_conn_create')
Acked-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Add support for missing HCI event EVT_CONNECTIVITY and forward
it to userspace.
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
net/nfc/nci/hci.c: In function nci_hci_connect_gate :
net/nfc/nci/hci.c:679: warning: comparison is always false due to limited range of data type
In case of error, nci_hci_create_pipe() returns NCI_HCI_INVALID_PIPE,
and not a negative error code.
Correct the check to fix this.
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The definition of DIGITAL_PROTO_NFCA_RF_TECH is modified to support
ISO14443 Type4A tags. Without this change it is not possible to start
polling for ISO14443 Type4A tags from the initiator side.
Signed-off-by: Shikha Singh <shikha.singh@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
These patches mostly fix send queue ordering issues inside the NFSoRDMA
client, but there are also two patches from Dan Carpenter fixing up smatch
warnings.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJWdG7tAAoJENfLVL+wpUDrqWEP/j+K2BVkanx0/Yi1WjFHPuTW
sKc02qeuhajVcS4QdLygD+4fcnVHsqfmStOg1dHLkQpG62mot8R59aLctISjndfo
Q8O0JlUPYJSUZ1A5e++8BqHi6IiN+hBw+4rxuyqseEvsgcWI1bzFl/Ajm823fjMB
HjcpLPokKmZvgIRpasBtA4O6Lg01rDb80v9a/DFPOa8cwSVmervgHa1ZEGU/KyZH
5taRYUiIIkankEJE4VqQQ86bWIGKbuGlsHH+7BcDxtqNISQhq8uZsC2EMOxtdjnf
IKHt6B6IVZoMfERSaRHOCkK30sEDUi0u3zYkgodFq7xNTW7MBwmCs/xEUqhLtBoe
KV2imTNOuwzl6v3Vnjm9yKa+0PF8ejg3VYNZOrZWSeBaGPSmyiDlQ05zdcH+F5i7
OQeo1ebIb2Wu6PJgZwA3QbNjWkddlwF7WrA58PbHgxEyYJeMHJ697g2cQiUdCBxl
mbnkHyZqcG4ko2hbt5kNM+1TWQoY+HdGs8BJAV6W+UBO6ZYNL+ciM23uFSb5XiJC
bWUPgUEu6vepW0JaCMt0clpFlWtLHFzclwkgOiWfiNKOtmHeENzE0L4JmQX6QNa6
xW/Vn6ctxVx6AzTNdKZO8VK+XUgkH/7Y/x3H/c+zAeq25M2CJs36zdgkDGc9v4h9
eVEBs2tHgI23aAk8brgY
=kwhB
-----END PGP SIGNATURE-----
Merge tag 'nfs-rdma-4.5' of git://git.linux-nfs.org/projects/anna/nfs-rdma
NFS: NFSoRDMA Client Side Changes
These patches mostly fix send queue ordering issues inside the NFSoRDMA
client, but there are also two patches from Dan Carpenter fixing up smatch
warnings.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* tag 'nfs-rdma-4.5' of git://git.linux-nfs.org/projects/anna/nfs-rdma:
xprtrdma: Revert commit e7104a2a9606 ('xprtrdma: Cap req_cqinit').
xprtrdma: Invalidate in the RPC reply handler
xprtrdma: Add ro_unmap_sync method for all-physical registration
xprtrdma: Add ro_unmap_sync method for FMR
xprtrdma: Add ro_unmap_sync method for FRWR
xprtrdma: Introduce ro_unmap_sync method
xprtrdma: Move struct ib_send_wr off the stack
xprtrdma: Disable RPC/RDMA backchannel debugging messages
xprtrdma: xprt_rdma_free() must not release backchannel reqs
xprtrdma: Fix additional uses of spin_lock_irqsave(rb_lock)
xprtrdma: checking for NULL instead of IS_ERR()
xprtrdma: clean up some curly braces
The following sequence inside a batch, although not very useful, is
valid:
add table foo
...
delete table foo
This may be generated by some robot while applying some incremental
upgrade, so remove the defensive checks against this.
This patch keeps the check on the get/dump path by now, we have to
replace the inactive flag by introducing object generations.
Reported-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If the netdevice is destroyed, the resources that are attached should
be released too as they belong to the device that is now gone.
Suggested-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We have to release the existing objects on netns removal otherwise we
leak them. Chains are unregistered in first place to make sure no
packets are walking on our rules and sets anymore.
The object release happens by when we unregister the family via
nft_release_afinfo() which is called from nft_unregister_afinfo() from
the corresponding __net_exit path in every family.
Reported-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
xs_reclassify_socket4() and friends used to be called directly.
xs_reclassify_socket() is called instead nowadays.
The xs_reclassify_socketX() helper functions are empty when
CONFIG_DEBUG_LOCK_ALLOC is not defined. Drop them since they have no
callers.
Note that AF_LOCAL still calls xs_reclassify_socketu() directly but is
easily converted to generic xs_reclassify_socket().
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Accepted or peeled off sockets were missing a security label (e.g.
SELinux) which means that socket was in "unlabeled" state.
This patch clones the sock's label from the parent sock and resolves the
issue (similar to AF_BLUETOOTH protocol family).
Cc: Paul Moore <pmoore@redhat.com>
Cc: David Teigland <teigland@redhat.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit cacc06215271 ("sctp: use GFP_USER for user-controlled kmalloc")
missed two other spots.
For connectx, as it's more likely to be used by kernel users of the API,
it detects if GFP_USER should be used or not.
Fixes: cacc06215271 ("sctp: use GFP_USER for user-controlled kmalloc")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
By moving stats update into iptunnel_xmit(), we can simplify
iptunnel_xmit() usage. With this change there is no need to
call another function (iptunnel_xmit_stats()) to update stats
in tunnel xmit code path.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
kobj_to_dev has been defined in linux/device.h, so I replace to_dev
with it.
Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marc Haber reported we don't honor interface indexes when we receive link
local router addresses in router advertisements. Luckily the non-strict
version of ipv6_chk_addr already does the correct job here, so we can
simply use it to lighten the checks and use those addresses by default
without any configuration change.
Link: <http://permalink.gmane.org/gmane.linux.network/391348>
Reported-by: Marc Haber <mh+netdev@zugschlus.de>
Cc: Marc Haber <mh+netdev@zugschlus.de>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a function svc_age_temp_xprts_now() to close temporary transports
whose xpt_local matches the address passed in server_addr immediately
instead of waiting for them to be closed by the timer function.
The function is intended to be used by notifier_blocks that will be
added to nfsd and lockd that will run when an ip address is deleted.
This will eliminate the ACK storms and client hangs that occur in
HA-NFS configurations where nfsd & lockd is left running on the cluster
nodes all the time and the NFS 'service' is migrated back and forth
within a short timeframe.
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Hannes points out that when we generate tcp reset for timewait sockets we
pretend we found no socket and pass NULL sk to tcp_vX_send_reset().
Make it cope with inet tw sockets and then provide tw sk.
This makes RSTs appear on correct interface when SO_BINDTODEVICE is used.
Packetdrill test case:
// want default route to be used, we rely on BINDTODEVICE
`ip route del 192.0.2.0/24 via 192.168.0.2 dev tun0`
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
// test case still works due to BINDTODEVICE
0.001 setsockopt(3, SOL_SOCKET, SO_BINDTODEVICE, "tun0", 4) = 0
0.100...0.200 connect(3, ..., ...) = 0
0.100 > S 0:0(0) <mss 1460,sackOK,nop,nop>
0.200 < S. 0:0(0) ack 1 win 32792 <mss 1460,sackOK,nop,nop>
0.200 > . 1:1(0) ack 1
0.210 close(3) = 0
0.210 > F. 1:1(0) ack 1 win 29200
0.300 < . 1:1(0) ack 2 win 46
// more data while in FIN_WAIT2, expect RST
1.300 < P. 1:1001(1000) ack 1 win 46
// fails without this change -- default route is used
1.301 > R 1:1(0) win 0
Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_md5_do_lookup requires a full socket, so once we extend
_send_reset() to also accept timewait socket we would have to change
if (!sk && hash_location)
to something like
if ((!sk || !sk_fullsock(sk)) && hash_location) {
...
} else {
(sk && sk_fullsock(sk)) tcp_md5_do_lookup()
}
Switch the two branches: check if we have a socket first, then
fall back to a listener lookup if we saw a md5 option (hash_location).
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When sysctl performs restrict writes, it allows to write from
a middle position of a sysctl file, which requires us to initialize
the table data before calling proc_dostring() for the write case.
Fixes: 3d1bec99320d ("ipv6: introduce secret_stable to ipv6_devconf")
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net): ipsec 2015-12-22
Just one patch to fix dst_entries_init with multiple namespaces.
From Dan Streetman.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When closing a listen socket, tcp_abort currently calls
tcp_done without clearing the request queue. If the socket has a
child socket that is established but not yet accepted, the child
socket is then left without a parent, causing a leak.
Fix this by setting the socket state to TCP_CLOSE and calling
inet_csk_listen_stop with the socket lock held, like tcp_close
does.
Tested using net_test. With this patch, calling SOCK_DESTROY on a
listen socket that has an established but not yet accepted child
socket results in the parent and the child being closed, such
that they no longer appear in sock_diag dumps.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip6addrlbl_get() has never worked. If ip6addrlbl_hold() succeeded,
ip6addrlbl_get() will exit with '-ESRCH'. If ip6addrlbl_hold() failed,
ip6addrlbl_get() will use about to be free ip6addrlbl_entry pointer.
Fix this by inverting ip6addrlbl_hold() check.
Fixes: 2a8cc6c89039 ("[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Cong Wang <cwang@twopensource.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bridge's ageing time is offloaded to hardware when:
1) A port joins a bridge
2) The ageing time of the bridge is changed
In the first case the ageing time is offloaded as jiffies, but in the
second case it's offloaded as clock_t, which is what existing switchdev
drivers expect to receive.
Fixes: 6ac311ae8bfb ("Adding switchdev ageing notification on port bridged")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It looks like an attempt to use CPU notifier here which was never
completed. Nobody tried to wire it up completely since 2k9. So I unwind
this code and get rid of everything not required. Oh look! 19 lines were
removed while code still does the same thing.
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Tested-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>