Stephen Hemminger says:
====================
hv_netvsc: minor fixes
These are improvements to netvsc driver. They aren't functionality
changes so not targeting net-next; and they are not show stopper
bugs that need to go to stable either.
v2
- drop the irq flags patch, defer it to net-next
- split the multicast filter flag patch out
- change propogate rx mode patch to handle startup of vf
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The netvsc device should propagate filters to the SR-IOV VF
device (if present). The flags also need to be propagated to the
VF device as well. This only really matters on local Hyper-V
since Azure does not support multiple addresses.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The netvsc driver was always enabling all multicast and broadcast
even if netdevice flag had not enabled it.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When VF is used for accelerated networking it will likely have
more queues (and different policy) than the synthetic NIC.
This patch defers the queue policy to the VF so that all the
queues can be used. This impacts workloads like local generate UDP.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the netvsc_channel_cb is already called in interrupt
context from vmbus, there is no need to do irqsave/restore.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a race between napi_reschedule and re-enabling interrupts
which could lead to missed host interrrupts. This occurs when
interrupts are re-enabled (hv_end_read) and vmbus irq callback
(netvsc_channel_cb) has already scheduled NAPI.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Block setup of multiple channels earlier in the teardown
process. This avoids possible races between halt and subchannel
initialization.
Suggested-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Need to delete NAPI association if vmbus_open fails.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't wake transmit queues if link is not up yet.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the initialization order so that the device is ready to transmit
(ie connect vsp is completed) before setting the internal reference
to the device with RCU.
This avoids any races on initialization and prevents retry issues
on shutdown.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
XDP_REDIRECT support for mergeable buffer was removed since commit
7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable()
case"). This is because we don't reserve enough tailroom for struct
skb_shared_info which breaks XDP assumption. So this patch fixes this
by reserving enough tailroom and using fixed size of rx buffer.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PPP units don't hold any reference on the channels connected to it.
It is the channel's responsibility to ensure that it disconnects from
its unit before being destroyed.
In practice, this is ensured by ppp_unregister_channel() disconnecting
the channel from the unit before dropping a reference on the channel.
However, it is possible for an unregistered channel to connect to a PPP
unit: register a channel with ppp_register_net_channel(), attach a
/dev/ppp file to it with ioctl(PPPIOCATTCHAN), unregister the channel
with ppp_unregister_channel() and finally connect the /dev/ppp file to
a PPP unit with ioctl(PPPIOCCONNECT).
Once in this situation, the channel is only held by the /dev/ppp file,
which can be released at anytime and free the channel without letting
the parent PPP unit know. Then the ppp structure ends up with dangling
pointers in its ->channels list.
Prevent this scenario by forbidding unregistered channels from
connecting to PPP units. This maintains the code logic by keeping
ppp_unregister_channel() responsible from disconnecting the channel if
necessary and avoids modification on the reference counting mechanism.
This issue seems to predate git history (successfully reproduced on
Linux 2.6.26 and earlier PPP commits are unrelated).
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
iproute2 print_skbmod() prints the configured ethertype using format 0x%X:
therefore, test 9aa8 systematically fails, because it configures action #4
using ethertype 0x0031, and expects 0x0031 when it reads it back. Changing
the expected value to 0x31 lets the test result 'not ok' become 'ok'.
tested with:
# ./tdc.py -e 9aa8
Test 9aa8: Get a single skbmod action from a list
All test results:
1..1
ok 1 9aa8 Get a single skbmod action from a list
Fixes: cf797ac49b94 ("tc-testing: Add test cases for police and skbmod")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Until now, we assumed that in case of error when adding FDB entries, the
write operation will fail, but this is not the case. Instead, we need to
check that the number of entries reported in the response is equal to
the number of entries specified in the request.
Fixes: 56ade8fe3fe1 ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Reported-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Axtens says:
====================
GSO_BY_FRAGS correctness improvements
As requested [1], I went through and had a look at users of gso_size to
see if there were things that need to be fixed to consider
GSO_BY_FRAGS, and I have tried to improve our helper functions to deal
with this case.
I found a few. This fixes bugs relating to the use of
skb_gso_*_seglen() where GSO_BY_FRAGS is not considered.
Patch 1 renames skb_gso_validate_mtu to skb_gso_validate_network_len.
This is follow-up to my earlier patch 2b16f048729b ("net: create
skb_gso_validate_mac_len()"), and just makes everything a bit clearer.
Patches 2 and 3 replace the final users of skb_gso_network_seglen() -
which doesn't consider GSO_BY_FRAGS - with
skb_gso_validate_network_len(), which does. This allows me to make the
skb_gso_*_seglen functions private in patch 4 - now future users won't
accidentally do the wrong comparison.
Two things remain. One is qdisc_pkt_len_init, which is discussed at
[2] - it's caught up in the GSO_DODGY mess. I don't have any expertise
in GSO_DODGY, and it looks like a good clean fix will involve
unpicking the whole validation mess, so I have left it for now.
Secondly, there are 3 eBPF opcodes that change the gso_size of an SKB
and don't consider GSO_BY_FRAGS. This is going through the bpf tree.
Regards,
Daniel
[1] https://patchwork.ozlabs.org/comment/1852414/
[2] https://www.spinics.net/lists/netdev/msg482397.html
PS: This is all in the core networking stack. For a driver to be
affected by this it would need to support NETIF_F_GSO_SCTP /
NETIF_F_GSO_SOFTWARE and then either use gso_size or not be a purely
virtual device. (Many drivers look at gso_size, but do not support
SCTP segmentation, so the core network will segment an SCTP gso before
it hits them.) Based on that, the only driver that may be affected is
sunvnet, but I have no way of testing it, so I haven't looked at it.
v2: split out bpf stuff
fix review comments from Dave Miller
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
They're very hard to use properly as they do not consider the
GSO_BY_FRAGS case. Code should use skb_gso_validate_network_len
and skb_gso_validate_mac_len as they do consider this case.
Make the seglen functions static, which stops people using them
outside of skbuff.c
Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace skb_gso_network_seglen() with
skb_gso_validate_network_len(), as it considers the GSO_BY_FRAGS
case.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tbf_enqueue() checks the size of a packet before enqueuing it.
However, the GSO size check does not consider the GSO_BY_FRAGS
case, and so will drop GSO SCTP packets, causing a massive drop
in throughput.
Use skb_gso_validate_mac_len() instead, as it does consider that
case.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If you take a GSO skb, and split it into packets, will the network
length (L3 headers + L4 headers + payload) of those packets be small
enough to fit within a given MTU?
skb_gso_validate_mtu gives you the answer to that question. However,
we recently added to add a way to validate the MAC length of a split GSO
skb (L2+L3+L4+payload), and the names get confusing, so rename
skb_gso_validate_mtu to skb_gso_validate_network_len
Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- fix skb checksum issues, by Matthias Schiffer (2 patches)
- fix exception handling when dumping data objects through netlink,
by Sven Eckelmann (4 patches)
- fix handling of interface indices, by Sven Eckelmann
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAlqZjskWHHN3QHNpbW9u
d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoZgAEADcXmuvwW+QVU0L6N8ZDV7Titdp
oIK+w2XDPLa2DcIkJ8ZAGJSK4ZZBrye/jAaRL/1CpBQiGDgP6pGodSihDPNUqCjs
aqw8Zrwfg6BAvIeEFp3ThkCB89MADk6tMRXM4wZ3daNJvfCgHEUZeSl/1/PWWQmq
yBn+DMeXZPOPJZgTTPMQUQ11kpf7UxtT4T9N8lq0KH6ISTbaZChPFlpPPIivpG+9
JLjH40v8BErpIAsZHT2LJ/JPDRPXXul61p3y2sdnVnqUuSUnsc/ElJGLARXNId8P
tkKtOONlZEwnt9ZrXOLqeqxzN+KFs97JSNIuap/hRfpgbG+Qqn4pePiIQVXNHMUC
eBUfbGSWNtRDX/JXcwk6O4nMkTX79UDq97npcDv5sTkWLcIeznacqpXJCA8+2E05
3F34XTDr0HY4QTxDQSEOfQ6qwo7qCxhc8u2Mr8RYNtzSiyRYS3Sy8jdFMMK/RgqT
ggR8mIdR9MSRYhN0VFVHWWL7NL/OImke9ALvb4KioDbvDTFMFhsa/Pp1+K/NE3fK
/E5mwicFFUB0MUmRnZiuN9gXPOy06fWOShouMgqF86Ofx4QpmT/Ph/gGK/HgFFKq
SEfU6UunC3r6BISE4Zu5O7xTEH+AQhEZgyeSTTrTr+HLfNkvyqwGIlhG1HXzyCTx
KWA50ikvjiKGeOCoMA==
=tcl9
-----END PGP SIGNATURE-----
Merge tag 'batadv-net-for-davem-20180302' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
Here are some batman-adv bugfixes:
- fix skb checksum issues, by Matthias Schiffer (2 patches)
- fix exception handling when dumping data objects through netlink,
by Sven Eckelmann (4 patches)
- fix handling of interface indices, by Sven Eckelmann
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter fixes for your net tree,
they are:
1) Put back reference on CLUSTERIP configuration structure from the
error path, patch from Florian Westphal.
2) Put reference on CLUSTERIP configuration instead of freeing it,
another cpu may still be walking over it, also from Florian.
3) Refetch pointer to IPv6 header from nf_nat_ipv6_manip_pkt() given
packet manipulation may reallocation the skbuff header, from Florian.
4) Missing match size sanity checks in ebt_among, from Florian.
5) Convert BUG_ON to WARN_ON in ebtables, from Florian.
6) Sanity check userspace offsets from ebtables kernel, from Florian.
7) Missing checksum replace call in flowtable IPv4 DNAT, from Felix
Fietkau.
8) Bump the right stats on checksum error from bridge netfilter,
from Taehee Yoo.
9) Unset interface flag in IPv6 fib lookups otherwise we get
misleading routing lookup results, from Florian.
10) Missing sk_to_full_sk() in ip6_route_me_harder() from Eric Dumazet.
11) Don't allow devices to be part of multiple flowtables at the same
time, this may break setups.
12) Missing netlink attribute validation in flowtable deletion.
13) Wrong array index in nf_unregister_net_hook() call from error path
in flowtable addition path.
14) Fix FTP IPVS helper when NAT mangling is in place, patch from
Julian Anastasov.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
* fix for a regression in 4-addr mode with fast-RX
* fix for a Kconfig problem with the new regdb
* fix for the long-standing TCP performance issue in
wifi using the new sk_pacing_shift_update()
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAlqZEcoACgkQB8qZga/f
l8SfMw//USsQzokuZOPNW4kI8ntVmp3lPRcIKFOP/CCbC+vuR9nwDE1hNPbGAiBY
SN+ALcF9mv96vVFBvpVp1eZMLBz07+tBqWHCBkbAhHeZ+cEo0VfNoOvyY7mqQtsS
OsKfj0xG0M9+aG/bPLiSZJNvvcPZxbtVwE31zVM0Xl6ogJ6/98t5z+wlbFFvLbOF
e8O/FiO4x2MAY27ITdZepaa7OnS2Jprcoae7csERBH5siqX9jFoH6PDLQObscQZ/
XBHEvOy2NCZrw0B9hDKexIGvwzLgfnyePjweI8B59p+unBSFOXNNoIo3yux4aFcH
WmNQJ11PYpVffk8BS97HmeQmIta4TaFNP0OAUc+aTDZiDRxCjMzQF9dMehxXwR2N
nqeE0wa0LeM+HIH5HFf7iUhv5ORy64eI5mt1HRTzTB4cLRbpD5OPwkdwN94ck4fs
AY8uN2sgZotYOFtrOt4X0HNoU5RqwI7c6a++n75HRafbeoa1uSQm9NlJcPyLD9J9
eAZjVQRC70HBP3PSuUgqzVZyS/6TrT2D0Gan2S555Bao0iA/1Vq59VDn+pfftJsN
8T6d0BM3clv/VYYkqdzP3pFPrzJT/XqWGkRcpfnUBpmTxTkZgkWOpbpTVBCunje7
Xy1zP8itH3bm4/A+zbNUO3H4I/WrUsOLTyEqIJZatIOfnDrhne0=
=d3er
-----END PGP SIGNATURE-----
Merge tag 'mac80211-for-davem-2018-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Three more patches:
* fix for a regression in 4-addr mode with fast-RX
* fix for a Kconfig problem with the new regdb
* fix for the long-standing TCP performance issue in
wifi using the new sk_pacing_shift_update()
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 0933a578cd55 ("rds: tcp: use sock_create_lite() to create the
accept socket") has a reference counting issue in TCP socket creation
when accepting a new connection. The code uses sock_create_lite() to
create a kernel socket. But it does not do __module_get() on the
socket owner. When the connection is shutdown and sock_release() is
called to free the socket, the owner's reference count is decremented
and becomes incorrect. Note that this bug only shows up when the socket
owner is configured as a kernel module.
v2: Update comments
Fixes: 0933a578cd55 ("rds: tcp: use sock_create_lite() to create the accept socket")
Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2018-02-28
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Add schedule points and reduce the number of loop iterations
the test_bpf kernel module is performing in order to not hog
the CPU for too long, from Eric.
2) Fix an out of bounds access in tail calls in the ppc64 BPF
JIT compiler, from Daniel.
3) Fix a crash on arm64 on unaligned BPF xadd operations that
could be triggered via interpreter and JIT, from Daniel.
Please not that once you merge net into net-next at some point, there
is a minor merge conflict in test_verifier.c since test cases had
been added at the end in both trees. Resolution is trivial: keep all
the test cases from both trees.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If ethtool_ops->get_fecparam returns an error, pass that error on to the
user, rather than ignoring it.
Fixes: 1a5f3da20bd9 ("net: ethtool: add support for forward error correction modes")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When ip_error() is called the device is the l3mdev master instead of the
original device. So the forwarding check should be on the original one.
Changes from v2:
- Handle the original device disappearing (per David Ahern)
- Minimize the change in code order
Changes from v1:
- Only need to reset the device on which __in_dev_get_rcu() is done (per
David Ahern).
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Setting an interface into a VRF fails with 'RTNETLINK answers: File
exists' if one of its VLAN interfaces is already in the same VRF.
As the VRF is an upper device of the VLAN interface, it is also showing
up as an upper device of the interface itself. The solution is to
restrict this check to devices other than master. As only one master
device can be linked to a device, the check in this case is that the
upper device (VRF) being linked to is not the same as the master device
instead of it not being any one of the upper devices.
The following example shows an interface ens12 (with a VLAN interface
ens12.10) being set into VRF green, which behaves as expected:
# ip link add link ens12 ens12.10 type vlan id 10
# ip link set dev ens12 master vrfgreen
# ip link show dev ens12
3: ens12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
master vrfgreen state UP mode DEFAULT group default qlen 1000
link/ether 52:54:00:4c:a0:45 brd ff:ff:ff:ff:ff:ff
But if the VLAN interface has previously been set into the same VRF,
then setting the interface into the VRF fails:
# ip link set dev ens12 nomaster
# ip link set dev ens12.10 master vrfgreen
# ip link show dev ens12.10
39: ens12.10@ens12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
qdisc noqueue master vrfgreen state UP mode DEFAULT group default
qlen 1000 link/ether 52:54:00:4c:a0:45 brd ff:ff:ff:ff:ff:ff
# ip link set dev ens12 master vrfgreen
RTNETLINK answers: File exists
The workaround is to move the VLAN interface back into the default VRF
beforehand, but it has to be shut first so as to avoid the risk of
traffic leaking from the VRF. This fix avoids needing this workaround.
Signed-off-by: Mike Manning <mmanning@att.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The newly introudced ip_min_valid_pmtu variable is only used when
CONFIG_SYSCTL is set:
net/ipv4/route.c:135:12: error: 'ip_min_valid_pmtu' defined but not used [-Werror=unused-variable]
This moves it to the other variables like it, to avoid the harmless
warning.
Fixes: c7272c2f1229 ("net: ipv4: don't allow setting net.ipv4.route.min_pmtu below 68")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IPS_NAT_MASK check in 4.12 replaced previous check for nfct_nat()
which was needed to fix a crash in 2.6.36-rc, see
commit 7bcbf81a2296 ("ipvs: avoid oops for passive FTP").
But as IPVS does not set the IPS_SRC_NAT and IPS_DST_NAT bits,
checking for IPS_NAT_MASK prevents PASV response to be properly
mangled and blocks the transfer. Remove the check as it is not
needed after 3.12 commit 41d73ec053d2 ("netfilter: nf_conntrack:
make sequence number adjustments usuable without NAT") which
changes nfct_nat() with nfct_seqadj() and especially after 3.13
commit b25adce16064 ("ipvs: correct usage/allocation of seqadj
ext in ipvs").
Thanks to Li Shuang and Florian Westphal for reporting the problem!
Reported-by: Li Shuang <shuali@redhat.com>
Fixes: be7be6e161a2 ("netfilter: ipvs: fix incorrect conflict resolution")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jiri Pirko says:
====================
mlxsw: couple of fixes
Couple of unrelated fixes for mlxsw.
---
v1->v2:
-patch 2:
- rebase on top of current -net tree
- removed forgotten empty line
-patch 3:
- new patch
-patch 4:
- new patch
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the basic construct in the device is a port-VLAN pair, which can
be bound to a FID or a RIF in order to direct packets to the bridge or
the router, respectively.
Since not all the netdevs are configured with a VLAN (e.g., sw1p1 vs.
sw1p1.10), VID 1 is used to represent these and thus this VID can be
used by both upper devices of mlxsw ports and by the driver itself.
However, this VID is not reference counted and therefore might be freed
prematurely, which can result in various WARNINGs. For example:
$ ip link add name br0 type bridge vlan_filtering 1
$ teamd -t team0 -d -c '{"runner": {"name": "lacp"}}'
$ ip link set dev team0 master br0
$ ip link set dev enp1s0np1 master team0
$ ip address add 192.0.2.1/24 dev enp1s0np1
The enslavement to team0 will fail because team0 already has an upper
and thus vlan_vids_del_by_dev() will be executed as part of team's error
path which will delete VID 1 from enp1s0np1 (added by br0 as PVID). The
WARNING will be generated when the driver will realize it can't find VID
1 on the port and bind it to a RIF.
Fix this by adding a reference count to the VLAN entries on the port, in
a similar fashion to the reference counting used by the corresponding
'vlan_vid_info' structure in the 8021q driver.
Fixes: c57529e1d5d8 ("mlxsw: spectrum: Replace vPorts with Port-VLAN")
Reported-by: Tal Bar <talb@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Tal Bar <talb@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When multicast snooping is enabled, the Linux bridge resorts to flooding
unregistered multicast packets to all ports only in case it did not
detect a querier in the network.
The above condition is not reflected to underlying drivers, which is
especially problematic in IPv6 environments, as multicast snooping is
enabled by default and since neighbour solicitation packets might be
treated as unregistered multicast packets in case there is no
corresponding MDB entry.
Until the Linux bridge reflects its querier state to underlying drivers,
simply treat unregistered multicast packets as broadcast and allow them
to reach their destination.
Fixes: 9df552ef3e21 ("mlxsw: spectrum: Improve IPv6 unregistered multicast flooding")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current code uses global variables, adjusts them and passes pointer down
to devlink. With every other mlxsw_core instance, the previously passed
pointer values are rewritten. Fix this by de-globalize the variables and
also memcpy size_params during devlink resource registration.
Also, introduce a convenient size_param_init helper.
Fixes: ef3116e5403e ("mlxsw: spectrum: Register KVD resources with devlink")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IP_TTL, IP_ECN and IP_DSCP are using the same offset within the
scratchpad as L4 ports. Fix this by shifting all up.
Fixes: 5f57e0909136 ("mlxsw: acl: Add ip ttl acl element")
Fixes: i80d0fe4710c ("mlxsw: acl: Add ip tos acl element")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun says:
====================
net/smc: fixes 2018-02-28
here are 3 smc bug fixes for the net-tree. Karsten's first patch is
the reworked version of last week's
"[PATCH net-next 2/5] net/smc: fix structure size"
patch, now solved without using __packed, and now targetted for net
instead of net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The CONFIRM LINK reply message must contain the link_id sent
by the server. And set the link_id explicitly when
initializing the link.
Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sizeof(struct smc_cdc_msg) evaluates to 48 bytes instead of the
required 44 bytes. We need to use the constant value of
SMC_WR_TX_SIZE to set and check the control message length.
Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We try to disable NAPI to prevent a single XDP TX queue being used by
multiple cpus. But we don't check if device is up (NAPI is enabled),
this could result stall because of infinite wait in
napi_disable(). Fixing this by checking device state through
netif_running() before.
Fixes: 4941d472bf95b ("virtio-net: do not reset during XDP set")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The link to the pdf containing the algorithm description is now a
dead link; it seems http://www.ifp.illinois.edu/~srikant/ has been
moved to https://sites.google.com/a/illinois.edu/srikant/ and none of
the original papers can be found there...
I have replaced it with the only working copy I was able to find.
n.b. there is also a copy available at:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.296.6350&rep=rep1&type=pdf
However, this seems to only be a *cached* version, so I am unsure
exactly how reliable that link can be expected to remain over time
and have decided against using that one.
Signed-off-by: Joey Pabalinas <joeypabalinas@gmail.com>
1 file changed, 1 insertion(+), 1 deletion(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
For tests that are using the maximal number of BPF instruction, each
run takes 20 usec. Looping 10,000 times on them totals 200 ms, which
is bad when the loop is not preemptible.
test_bpf: #264 BPF_MAXINSNS: Call heavy transformations jited:1 19248
18548 PASS
test_bpf: #269 BPF_MAXINSNS: ld_abs+get_processor_id jited:1 20896 PASS
Lets divide by ten the number of iterations, so that max latency is
20ms. We could use need_resched() to break the loop earlier if we
believe 20 ms is too much.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
When the connection is reset, there is no point in
keeping the packets on the write queue until the connection
is closed.
RFC 793 (page 70) and RFC 793-bis (page 64) both suggest
purging the write queue upon RST:
https://tools.ietf.org/html/draft-ietf-tcpm-rfc793bis-07
Moreover, this is essential for a correct MSG_ZEROCOPY
implementation, because userspace cannot call close(fd)
before receiving zerocopy signals even when the connection
is reset.
Fixes: f214f915e7db ("tcp: enable MSG_ZEROCOPY")
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng says:
====================
tcp: revert a F-RTO extension due to broken middle-boxes
This patch series reverts a (non-standard) TCP F-RTO extension that aimed
to detect more spurious timeouts. Unfortunately it could result in poor
performance due to broken middle-boxes that modify TCP packets. E.g.
https://www.spinics.net/lists/netdev/msg484154.html
We believe the best and simplest solution is to just revert the change.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 89fe18e44f7ee5ab1c90d0dff5835acee7751427.
While the patch could detect more spurious timeouts, it could cause
poor TCP performance on broken middle-boxes that modifies TCP packets
(e.g. receive window, SACK options). Since the performance gain is
much smaller compared to the potential loss. The best solution is
to fully revert the change.
Fixes: 89fe18e44f7e ("tcp: extend F-RTO to catch more spurious timeouts")
Reported-by: Teodor Milkov <tm@del.bg>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit cc663f4d4c97b7297fb45135ab23cfd508b35a77. While fixing
some broken middle-boxes that modifies receive window fields, it does not
address middle-boxes that strip off SACK options. The best solution is
to fully revert this patch and the root F-RTO enhancement.
Fixes: cc663f4d4c97 ("tcp: restrict F-RTO to work-around broken middle-boxes")
Reported-by: Teodor Milkov <tm@del.bg>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann says:
====================
s390/qeth: fixes 2018-02-27
please apply some more qeth patches for -net and stable.
One patch fixes a performance bug in the TSO path. Then there's several
more fixes for IP management on L3 devices - including a revert, so that
the subsequent fix cleanly applies to earlier kernels.
The final patch takes care of a race in the control IO code that causes
qeth to miss the cmd response, and subsequently trigger device recovery.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If multiple IPA commands are build & sent out concurrently,
fill_ipacmd_header() may assign a seqno value to a command that's
different from what send_control_data() later assigns to this command's
reply.
This is due to other commands passing through send_control_data(),
and incrementing card->seqno.ipa along the way.
So one IPA command has no reply that's waiting for its seqno, while some
other IPA command has multiple reply objects waiting for it.
Only one of those waiting replies wins, and the other(s) times out and
triggers a recovery via send_ipa_cmd().
Fix this by making sure that the same seqno value is assigned to
a command and its reply object.
Do so immediately before submitting the command & while holding the
irq_pending "lock", to produce nicely ascending seqnos.
As a side effect, *all* IPA commands now use a reply object that's
waiting for its actual seqno. Previously, early IPA commands that were
submitted while the card was still DOWN used the "catch-all" IDX seqno.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current code ("qeth_l3_ip_from_hash()") matches a queried address object
against objects in the IP table by IP address, Mask/Prefix Length and
MAC address ("qeth_l3_ipaddrs_is_equal()"). But what callers actually
require is either
a) "is this IP address registered" (ie. match by IP address only),
before adding a new address.
b) or "is this address object registered" (ie. match all relevant
attributes), before deleting an address.
Right now
1. the ADD path is too strict in its lookup, and eg. doesn't detect
conflicts between an existing NORMAL address and a new VIPA address
(because the NORMAL address will have mask != 0, while VIPA has
a mask == 0),
2. the DELETE path is not strict enough, and eg. allows del_rxip() to
delete a VIPA address as long as the IP address matches.
Fix all this by adding helpers (_addr_match_ip() and _addr_match_all())
that do the appropriate checking.
Note that the ADD path for NORMAL addresses is special, as qeth keeps
track of how many times such an address is in use (and there is no
immediate way of returning errors to the caller). So when a requested
NORMAL address _fully_ matches an existing one, it's not considered a
conflict and we merely increment the refcount.
Fixes: 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback")
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit cb816192d986f7596009dedcf2201fe2e5bc2aa7.
The issue this attempted to fix never actually occurs.
l3_add_rxip() checks (via l3_ip_from_hash()) if the requested address
was previously added to the card. If so, it returns -EEXIST and doesn't
call l3_add_ip().
As a result, the "address exists" path in l3_add_ip() is never taken
for rxip addresses, and this patch had no effect.
Fixes: cb816192d986 ("s390/qeth: fix using of ref counter for rxip addresses")
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Registering an IPv4 address with the HW takes quite a while, so we
temporarily drop the ip_htable lock. Any concurrent add/remove of the
same IP adjusts the IP's use count, and (on remove) is then blocked by
addr->in_progress.
After the register call has completed, we check the use count for
concurrently attempted add/remove calls - and possibly straight-away
deregister the IP again. This happens via l3_delete_ip(), which
1) looks up the queried IP in the htable (getting a reference to the
*same* queried object),
2) deregisters the IP from the HW, and
3) frees the IP object.
The caller in l3_add_ip() then does a second free on the same object.
For this case, skip all the extra checks and lookups in l3_delete_ip()
and just deregister & free the IP object ourselves.
Fixes: 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback")
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>