hrtimer_init_on_stack() needs a matching call to
destroy_hrtimer_on_stack(), so both need to be exported.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
l2tp_ip6 tunnel and session lookups were still using init_net, although
the l2tp core infrastructure already supports lookups keyed by 'net'.
As a result, l2tp_ip6_recv discarded packets for tunnels/sessions
created in namespaces other than the init_net.
Fix, by using dev_net(skb->dev) or sock_net(sk) where appropriate.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clarify how secure_redirects works. Mention that RFC1122 always applies.
Signed-off-by: Eric Garver <e@erig.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a logic error to avoid potential null pointer dereference.
Signed-off-by: Baozeng Ding <sploving1@gmail.com>
Reviewed-by: Stefan Schmidt<stefan@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since NAPI works by shutting down event interrupts when theres
work and turning them on when theres none, the net driver must
make sure that interrupts are disabled when it reschedules polling.
By calling napi_reschedule, the driver switches to polling mode,
therefor there should be no interrupt interference.
Any received packets will be handled in nps_enet_poll by polling the HW
indication of received packet until all packets are handled.
Signed-off-by: Elad Kanfi <eladkan@mellanox.com>
Acked-by: Noam Camus <noamca@mellanox.com>
Tested-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use %*ph specifier to dump small buffers in hex format instead doing this
byte-by-byte.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we pass ERR_PTR(-EFAULT) to kfree() then it's going to oops.
Fixes: 2ece068e1b ('ptp: use memdup_user().')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A previous patch added the fou6.ko module, but that failed to link
in a couple of configurations:
net/built-in.o: In function `ip6_tnl_encap_add_fou_ops':
net/ipv6/fou6.c:88: undefined reference to `ip6_tnl_encap_add_ops'
net/ipv6/fou6.c:94: undefined reference to `ip6_tnl_encap_add_ops'
net/ipv6/fou6.c:97: undefined reference to `ip6_tnl_encap_del_ops'
net/built-in.o: In function `ip6_tnl_encap_del_fou_ops':
net/ipv6/fou6.c:106: undefined reference to `ip6_tnl_encap_del_ops'
net/ipv6/fou6.c:107: undefined reference to `ip6_tnl_encap_del_ops'
If CONFIG_IPV6=m, ip6_tnl_encap_add_ops/ip6_tnl_encap_del_ops
are in a module, but fou6.c can still be built-in, and that
obviously fails to link.
Also, if CONFIG_IPV6=y, but CONFIG_IPV6_TUNNEL=m or
CONFIG_IPV6_TUNNEL=n, the same problem happens for a different
reason.
This adds two new silent Kconfig symbols to work around both
problems:
- CONFIG_IPV6_FOU is now always set to 'm' if either CONFIG_NET_FOU=m
or CONFIG_IPV6=m
- CONFIG_IPV6_FOU_TUNNEL is set implicitly when IPV6_FOU is enabled
and NET_FOU_IP_TUNNELS is also turned out, and it will ensure
that CONFIG_IPV6_TUNNEL is also available.
The options could be made user-visible as well, to give additional
room for configuration, but it seems easier not to bother users
with more choice here.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: aa3463d65e ("fou: Add encap ops for IPv6 tunnels")
Signed-off-by: David S. Miller <davem@davemloft.net>
A recent cleanup moved MAX_IPTUN_ENCAP_OPS along with some other
definitions, but it is now invisible when CONFIG_INET is
not defined, but still referenced from ip6_tunnel.h:
In file included from net/xfrm/xfrm_input.c:17:0:
include/net/ip6_tunnel.h:67:17: error: 'MAX_IPTUN_ENCAP_OPS' undeclared here (not in a function)
ip6tun_encaps[MAX_IPTUN_ENCAP_OPS];
^~~~~~~~~~~~~~~~~~~
This hides the ip6_encap_hlen and ip6_tnl_encap functions inside
of CONFIG_INET so we don't run into the the problem.
Alternatively we could move the macro out of the #ifdef again to
restore the previous behavior
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 55c2bc1432 ("net: Cleanup encap items in ip_tunnels.h")
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz says:
====================
qed*: Bug fixes
This series contain several small fixes, most of which deal with
either 100g support, sriov or bandwidth configurations.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently 100g devices don't support minimum/maximum BW configurations,
yet link flaps might cause the driver to attempt to do such a
configuration. Prevent this just as we do for the maximum BW.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adapter can support 100g in both MSIx and INTa, but not in MSI.
Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some of the HW configurations are currently missing for 100g devices.
This can cause various classification issues, as well as prevent device
from fully reaching line-rate.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When DCBx re-negotiation is occurring, the PF's configurations for
maximum and minimum bandwidth guarantees are currently lost.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 39651abd28 ("qed: add support for dcbx") is re-configuring
the QM hw-block as part of its sequence. This is done in attention
handling context which is non-sleepable, yet memory is allocated in
this flow using GFP_KERNEL.
Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PFs and VFs differ in their registered ethtool operations,
but they're using a common function for get_sset_count().
As a result, `ethtool -i' for a VF would indicate it supports
selftest, although that's not the case.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since driver is using a FW-based GRO implementation, this has some
effects on its ability to cope with GRO enablement/disablement.
As a result, driver must perform an inner-reload as a result of a state
change in the offload configuration of said feature.
[Failure to do so means network stack would continue to receive
aggregated packets even though user requested the feature to be disabled].
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
VF is currently ignoring the minimum provided by the API,
mistakenly using the maximum for minimum as well.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet says:
====================
net/mlx4_en: fix stats
mlx4 has various bugs in its ndo_get_stats() and related functions.
This patch series address the obvious issues.
Remaining ones will be discussed later.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We simply can use the standard net_device stats.
We do not need to clear fields that are already 0.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mlx4 uses a private struct net_device_stats in a vain attempt
to avoid races.
This is buggy because multiple cpus could call mlx4_en_get_stats()
at the same time, so ret_stats can not guarantee stable results.
To fix this, we need to switch to ndo_get_stats64() as this
method provides per-thread storage.
This allows to reduce mlx4_en_priv bloat.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mlx4_en_clear_stats() clears about everything but few TX ring
fields are missing :
- queue_stopped, wake_queue, tso_packets, xmit_more
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1) mlx4_en_xmit() can increment priv->stats.tx_dropped, but this variable
is overwritten in mlx4_en_DUMP_ETH_STATS().
2) This increment was not SMP safe, as a port might have many TX queues.
Add a per TX ring tx_dropped to fix these issues.
This is u32 as mlx4_en_DUMP_ETH_STATS() will add a 32bit field.
So lets avoid bugs with SNMP agents having to cope with partial
overwraps. (One of these agents being bond_fold_stats())
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Willem de Bruijn <willemb@google.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have this situation: that EP hash table, contains only the EPs
that are listening, while the transports one, has the opposite.
We have to traverse both to dump all.
But when we traverse the transports one we will also get EPs that are
in the EP hash if they are listening. In this case, the EP is dumped
twice.
We will fix it by checking if the endpoint that is in the endpoint
hash table contains any ep->asoc in there, as it means we will also
find it via transport hash, and thus we can/should skip it, depending
on the filters used, like 'ss -l'.
Still, we should NOT skip it if the user is listing only listening
endpoints, because then we are not traversing the transport hash.
so we have to check idiag_states there also.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a typo in the driver, replace comma with a semicolon at the end
of statement. While using comma is a legal C here and probably does
not even generate compiler warning, it was unlikely the intention.
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Caesar Wang <wxt@rock-chips.com>
Cc: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The memcpy() currently copies mdio_bus_data into new_bus->irq, which
makes no sense, since the mdio_bus_data structure contains more than
just irqs. The code was likely supposed to copy mdio_bus_data->irqs
into the new_bus->irq instead, so fix this.
Fixes: e7f4dc3536 ("mdio: Move allocation of interrupts into core")
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch follows Eric Dumazet's commit 7b70176421 for Atheros
atl1c driver to fix one exactly same bug in alx driver, that the
network link will be lost in 1-5 minutes after the device is up.
My laptop Lenovo Y580 with Atheros AR8161 ethernet device hit the
same problem with kernel 4.4, and it will be cured by Jarod Wilson's
commit c406700c for alx driver which get merged in 4.5. But there
are still some alx devices can't function well even with Jarod's
patch, while this patch could make them work fine. More details on
https://bugzilla.kernel.org/show_bug.cgi?id=70761
The debug shows the issue is very likely to be related with the RX
DMA address, specifically 0x...f80, if RX buffer get 0x...f80 several
times, their will be RX overflow error and device will stop working.
For kernel 4.5.0 with Jarod's patch which works fine with my
AR8161/Lennov Y580, if I made some change to the
__netdev_alloc_skb
--> __alloc_page_frag()
to make the allocated buffer can get an address with 0x...f80,
then the same error happens. If I make it to 0x...f40 or 0x....fc0,
everything will be still fine. So I tend to believe that the
0x..f80 address cause the silicon to behave abnormally.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=70761
Cc: Eric Dumazet <edumazet@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Tested-by: Ole Lukoie <olelukoie@mail.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
The team_device_event() notifier calls team_compute_features() to fix
vlan_features under team->lock to protect team->port_list. The problem is
that subsequent __team_compute_features() calls netdev_change_features()
to propagate vlan_features to upper vlan devices while team->lock is still
taken. This can lead to deadlock when NETIF_F_LRO is modified on lower
devices or team device itself.
Example:
The team0 as active backup with eth0 and eth1 NICs. Both eth0 & eth1 are
LRO capable and LRO is enabled. Thus LRO is also enabled on team0.
The command 'ethtool -K team0 lro off' now hangs due to this deadlock:
dev_ethtool()
-> ethtool_set_features()
-> __netdev_update_features(team)
-> netdev_sync_lower_features()
-> netdev_update_features(lower_1)
-> __netdev_update_features(lower_1)
-> netdev_features_change(lower_1)
-> call_netdevice_notifiers(...)
-> team_device_event(lower_1)
-> team_compute_features(team) [TAKES team->lock]
-> netdev_change_features(team)
-> __netdev_update_features(team)
-> netdev_sync_lower_features()
-> netdev_update_features(lower_2)
-> __netdev_update_features(lower_2)
-> netdev_features_change(lower_2)
-> call_netdevice_notifiers(...)
-> team_device_event(lower_2)
-> team_compute_features(team) [DEADLOCK]
The bug is present in team from the beginning but it appeared after the commit
fd867d5 (net/core: generic support for disabling netdev features down stack)
that adds synchronization of features with lower devices.
Fixes: fd867d5 (net/core: generic support for disabling netdev features down stack)
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli says:
====================
Documentation: dsa: misc fixes
Here are some miscelaneous documentation fixes for DSA, I targeted "net"
because these are not functional code changes, but still documentation fixes
per-se.
Changes in v2:
- reword what the port_vlan_filtering is about based on feedback from Vivien and Ido
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Described what the port_vlan_filtering function is supposed to
accomplish.
Fixes: fb2dabad69 ("net: dsa: support VLAN filtering switchdev attr")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We no longer have a priv_size structure member since 5feebd0a8a ("net:
dsa: Remove allocation of driver private memory")
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function has been removed in 4baee937b8 ("net: dsa: remove DSA
link polling") in favor of using the PHYLIB polling mechanism.
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise, if we fail to allocate new PIO buffers, our TXQs will try to
use the old ones, which aren't there any more.
Fixes: 183233bec8 "sfc: Allocate and link PIO buffers; map them with write-combining"
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gregory CLEMENT says:
====================
Fix spinlock usage in HWBM
these two patches fix spinlock related issues introduced in v4.6. They
have been reported by Russell King and Jean-Jacques Hiblot.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When hwbm_pool_add exited in error the spinlock was not released. This
patch fixes this issue.
Fixes: 8cb2d8bf57 ("net: add a hardware buffer management helper API")
Reported-by: Jean-Jacques Hiblot <jjhiblot@traphandler.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The spinlock used by the hwbm functions must be initialized by the
network driver. This commit fixes this lack and the following erros when
lockdep is enabled:
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
[<c010ff80>] (unwind_backtrace) from [<c010bd08>] (show_stack+0x10/0x14)
[<c010bd08>] (show_stack) from [<c032913c>] (dump_stack+0xb4/0xe0)
[<c032913c>] (dump_stack) from [<c01670e4>] (__lock_acquire+0x1f58/0x2060)
[<c01670e4>] (__lock_acquire) from [<c0167dec>] (lock_acquire+0xa4/0xd0)
[<c0167dec>] (lock_acquire) from [<c06f6650>] (_raw_spin_lock_irqsave+0x54/0x68)
[<c06f6650>] (_raw_spin_lock_irqsave) from [<c058e830>] (hwbm_pool_add+0x1c/0xdc)
[<c058e830>] (hwbm_pool_add) from [<c043f4e8>] (mvneta_bm_pool_use+0x338/0x490)
[<c043f4e8>] (mvneta_bm_pool_use) from [<c0443198>] (mvneta_probe+0x654/0x1284)
[<c0443198>] (mvneta_probe) from [<c03b894c>] (platform_drv_probe+0x4c/0xb0)
[<c03b894c>] (platform_drv_probe) from [<c03b7158>] (driver_probe_device+0x214/0x2c0)
[<c03b7158>] (driver_probe_device) from [<c03b72c4>] (__driver_attach+0xc0/0xc4)
[<c03b72c4>] (__driver_attach) from [<c03b5440>] (bus_for_each_dev+0x68/0x9c)
[<c03b5440>] (bus_for_each_dev) from [<c03b65b8>] (bus_add_driver+0x1a0/0x218)
[<c03b65b8>] (bus_add_driver) from [<c03b79cc>] (driver_register+0x78/0xf8)
[<c03b79cc>] (driver_register) from [<c01018f4>] (do_one_initcall+0x90/0x1dc)
[<c01018f4>] (do_one_initcall) from [<c0900de4>] (kernel_init_freeable+0x15c/0x1fc)
[<c0900de4>] (kernel_init_freeable) from [<c06eed90>] (kernel_init+0x8/0x114)
[<c06eed90>] (kernel_init) from [<c0107910>] (ret_from_fork+0x14/0x24)
Fixes: baa11ebc0c ("net: mvneta: Use the new hwbm framework")
Reported-by: Russell King <rmk+kernel@armlinux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before calling the nla_parse_nested function, make sure the pointer to the
attribute is not null. This patch fixes several potential null pointer
dereference vulnerabilities in the tipc netlink functions.
Signed-off-by: Baozeng Ding <sploving1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the coding error in determining the enable flag for
the application/protocol. The enable flag should be set for all protocols
but the eth.
Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch enhances ethtool link mode bitmap to include
25G/50G/100G speed along with interface modes
Signed-off-by: Vidya Sagar Ravipati <vidya@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Policer was not dumping or updating timestamps
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I found a serious performance bug in packet schedulers using hrtimers.
sch_htb and sch_fq are definitely impacted by this problem.
We constantly rearm high resolution timers if some packets are throttled
in one (or more) class, and other packets are flying through qdisc on
another (non throttled) class.
hrtimer_start() does not have the mod_timer() trick of doing nothing if
expires value does not change :
if (timer_pending(timer) &&
timer->expires == expires)
return 1;
This issue is particularly visible when multiple cpus can queue/dequeue
packets on the same qdisc, as hrtimer code has to lock a remote base.
I used following fix :
1) Change htb to use qdisc_watchdog_schedule_ns() instead of open-coding
it.
2) Cache watchdog prior expiration. hrtimer might provide this, but I
prefer to not rely on some hrtimer internal.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One timer, whose handler keeps reading on MMIO register for EEH
core to detect error in time, is started when the PCI device driver
is loaded. MMIO register can't be accessed during PE reset in EEH
recovery. Otherwise, the unexpected recursive error is triggered.
The timer isn't closed that time if the interface isn't brought
up. So the unexpected recursive error is seen during EEH recovery
when the interface is down.
This avoids the unexpected recursive EEH error by closing the timer
in qlge_io_error_detected() before EEH PE reset unconditionally. The
timer is started unconditionally after EEH PE reset in qlge_io_resume().
Also, the timer should be closed unconditionally when the device is
removed from the system permanently in qlge_io_error_detected().
Reported-by: Shriya R. Kulkarni <shriyakul@in.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In gre6 xmit path, we are sending a GRE packet, so set fl6 proto
to IPPROTO_GRE properly.
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When creat an ip6gretap interface with an unreachable route,
the MTU is about 14 bytes larger than what was needed.
If the remote address is reachable:
ping6 2001:0:130::1 -c 2
PING 2001:0:130::1(2001:0:130::1) 56 data bytes
64 bytes from 2001:0:130::1: icmp_seq=1 ttl=64 time=1.46 ms
64 bytes from 2001:0:130::1: icmp_seq=2 ttl=64 time=81.1 ms
Signed-off-by: David S. Miller <davem@davemloft.net>
"priority" needs to be signed for the error handling to work.
Fixes: 39651abd28 ('qed: add support for dcbx.')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Follow-up to commit e27f4a942a ("bpf: Use mount_nodev not mount_ns
to mount the bpf filesystem"), which removes the FS_USERNS_MOUNT flag.
The original idea was to have a per mountns instance instead of a
single global fs instance, but that didn't work out and we had to
switch to mount_nodev() model. The intent of that middle ground was
that we avoid users who don't play nice to create endless instances
of bpf fs which are difficult to control and discover from an admin
point of view, but at the same time it would have allowed us to be
more flexible with regard to namespaces.
Therefore, since we now did the switch to mount_nodev() as a fix
where individual instances are created, we also need to remove userns
mount flag along with it to avoid running into mentioned situation.
I don't expect any breakage at this early point in time with removing
the flag and we can revisit this later should the requirement for
this come up with future users. This and commit e27f4a942a have
been split to facilitate tracking should any of them run into the
unlikely case of causing a regression.
Fixes: b2197755b2 ("bpf: add support for persistent maps/progs")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Submitters of device tree binding documentation may forget to CC
the subsystem maintainer if this is missing.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit fa50d974d1 ("ipv4: Namespaceify ip_default_ttl sysctl knob")
moves the default TTL assignment, and as side-effect IPv4 TTL now
has a default value only if sysctl support is enabled (CONFIG_SYSCTL=y).
The sysctl_ip_default_ttl is fundamental for IP to work properly,
as it provides the TTL to be used as default. The defautl TTL may be
used in ip_selected_ttl, through the following flow:
ip_select_ttl
ip4_dst_hoplimit
net->ipv4.sysctl_ip_default_ttl
This commit fixes the issue by assigning net->ipv4.sysctl_ip_default_ttl
in net_init_net, called during ipv4's initialization.
Without this commit, a kernel built without sysctl support will send
all IP packets with zero TTL (unless a TTL is explicitly set, e.g.
with setsockopt).
Given a similar issue might appear on the other knobs that were
namespaceify, this commit also moves them.
Fixes: fa50d974d1 ("ipv4: Namespaceify ip_default_ttl sysctl knob")
Signed-off-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use memdup_user to duplicate a memory region from user-space to
kernel-space, instead of open coding using kmalloc & copy_from_user.
Signed-off-by: Muhammad Falak R Wani <falakreyaz@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>