Currently macvtap uses rcu_bh functions in its
user facing fuction macvtap_get_user() and macvtap_put_user().
However, its packet handlers use normal rcu as the rcu_read_lock()
is taken in netif_receive_skb(). We can safely discontinue
the usage or rcu with bh disabled.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Macvtap uses a private lock to protect the relationship between
macvtap_queue and macvlan_dev. The private lock is not needed
since the relationship is managed by user via open(), release(),
and dellink() calls. dellink() already happens under rtnl, so
we can safely convert open() and release(), and use it in ioctl()
as well.
Suggested by Eric Dumazet.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
select/poll busy-poll support.
Split sysctl value into two separate ones, one for read and one for poll.
updated Documentation/sysctl/net.txt
Add a new poll flag POLL_LL. When this flag is set, sock_poll will call
sk_poll_ll if possible. sock_poll sets this flag in its return value
to indicate to select/poll when a socket that can busy poll is found.
When poll/select have nothing to report, call the low-level
sock_poll again until we are out of time or we find something.
Once the system call finds something, it stops setting POLL_LL, so it can
return the result to the user ASAP.
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to have an extra ret variable when we directly can return
the value of sctp_get_port_local().
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rather instead of having the endpoint clean the garbage from the
socket, use a sk_destruct handler sctp_destruct_sock(), that does
the job for that when there are no more references on the socket.
At least do this for our crypto transform through crypto_free_hash()
that is allocated when in listening state.
Also, perform sctp_put_port() only when sk is valid. At a later
point in time we can still determine if there's an option of
placing this into sk_prot->unhash() or sctp_endpoint_free() without
any races. For now, leave it in sctp_endpoint_destroy() though.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A trailing newline has been forgotten to add into the WARN().
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, SCTP code defines its own timeval functions (since timeval
is rarely used inside the kernel by others), namely tv_lt() and
TIMEVAL_ADD() macros, that operate on SCTP cookie expiration.
We might as well remove all those, and operate directly on ktime
structures for a couple of reasons: ktime is available on all archs;
complexity of ktime calculations depending on the arch is less than
(reduces to a simple arithmetic operations on archs with
BITS_PER_LONG == 64 or CONFIG_KTIME_SCALAR) or equal to timeval
functions (other archs); code becomes more readable; macros can be
thrown out.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add two ktime helper functions that i) convert a given msec value to
a ktime structure and ii) that adds a msec value to a ktime structure.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We do neither ship a test_frame.h, nor will this be compatible with
the 2.5 out-of-tree lksctp kernel test suite anyway. So remove this
artefact.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Should not allow negative num_vfs
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.com>
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Warning prints when there are command timeout to help debugging future
failures.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is not safe to use sscanf.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.com>
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since this variable is now part of a structure and not allocated dynamically,
this test is irrelevant now.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Print a warning when a TX timeout is detected
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RX rings were cleaned while there was still possible RX traffic completion
handling.
Change the sequance of events so that the port is closed and the QPs are being
stopped before RX cleanup.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The port vlan table size is 126 (used for IBoE) so after 126 we will
not have space and the user need to see it only in debug print and not
error.
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To avoid a race between the open function and everything that happens after
register_netdev() move it to be the last operation called.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are no counters allocated to the eth device when the port is down, so
this query is meaningless at that time.
It also leads to querying incorrect counters (since the counter_index is not
valid when the device port is down).
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wrong condition was used when calling iounmap.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the tokenized ip address is re-set on an interface we depend on the
arrival of a new router advertisment to call addrconf_verify to clean
up the old address (which valid_lft is now set to 0). Old addresses can
linger around for a longer time if e.g. the source of router advertisments
vanishes.
So, call addrconf_verify immediately after setting the new tokenized
address to get rid of the old tokenized addresses.
Cc: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The reason behind this change is that as soon as we delete
the last ipv6 address of an interface we also lose the
/proc/sys/net/ipv6/conf/<interface> directory. This seems to be a
usability problem for me.
I don't see any reason why we should shutdown ipv6 on that interface in
such cases.
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch splits the timers for duplicate address detection and router
solicitations apart. The router solicitations timer goes into inet6_dev
and the dad timer stays in inet6_ifaddr.
The reason behind this patch is to reduce the number of unneeded router
solicitations send out by the host if additional link-local addresses
are created. Currently we send out RS for every link-local address on
an interface.
If the RS timer fires we pick a source address with ipv6_get_lladdr. This
change could hurt people adding additional link-local addresses and
specifying these addresses in the radvd clients section because we
no longer guarantee that we use every ll address as source address in
router solicitations.
Cc: Flavio Leitner <fleitner@redhat.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reviewed-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
mlx4 exclusively uses order-2 allocations in RX path, which are
likely to fail under memory pressure.
We therefore drop frames more than needed.
This patch tries order-3, order-2, order-1 and finally order-0
allocations to keep good performance, yet allow allocations if/when
memory gets fragmented.
By using larger pages, and avoiding unnecessary get_page()/put_page()
on compound pages, this patch improves performance as well, lowering
false sharing on struct page.
Also use GFP_KERNEL allocations in initialization path, as allocating 12
MB (390 order-3 pages) can easily fail with GFP_ATOMIC.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Amir Vadai <amirv@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings says:
====================
1. Make EEH recovery work when using legacy interrupts, from Alexandre
Rames.
2. Enable accelerated RFS for VLAN-tagged flows, from Andy Lutomirski.
3. Improve performance for non-TCP (and particularly UDP) traffic, which
regressed in 3.10 when we switched to always allocating paged RX
buffers. Partly by Jon Cooper.
4. Some minor bug fixes to IOMMU detection, timestamping capabilities,
and IRQ cleanup on the probe failure path.
I've dropped the RX skb cache, which improved some benchmarks but
perhaps needs some reworking to be more generally useful.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 82dc3c63c6
"net: introduce NAPI_POLL_WEIGHT" network drivers receive a warning
when they use napi weight higher than NAPI_POLL_WEIGHT. This patch
reduces QETH_NAPI_WEIGHT from 128 to 64 (NAPI_POLL_WEIGHT).
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the initial MTU size is changed prior to any activity on the device
(e.g. by attaching a z/VM vNIC already configured in Linux to a guestLAN),
we call dev_kfree_skb_irq(NULL) which results in a kernel panic.
Adding a proper check for NULL pointers to address this issue.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Reviewed-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
blkt settings (or LAN idle settings) for an OSA Express card
determine when and how often an OSA Express card tells the
operating system about new incoming packets. The semantic of
these settings has changed starting with OSA Express3. Currently
the qeth standard settings apply to OSA Express2 and older
generations of OSA Express cards, while new generations of OSA
Express cards require extra coding of their reasonable default.
To cover future OSA Express generations the qeth default standard
blkt setting is now the desired setting for OSA generations
starting with OSA Express3, while the fixed set of older OSA
Express cards receives its blkt settings explicitly.
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Reviewed-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Increase the default MTU for real OSA devices in layer 2 mode
to 1500 Bytes for increased compatibility.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Reviewed-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If someone is interested to dump something they may consider to use
print_hex_dump() or print_hex_dump_bytes() kernel helpers.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch solves several sparse issues as well as an unneeded semicolon
found via coccinelle.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit f88c91ddba ("ipv6: statically link
register_inet6addr_notifier()" added following sparse warnings :
net/ipv6/addrconf_core.c:83:5: warning: symbol
'register_inet6addr_notifier' was not declared. Should it be static?
net/ipv6/addrconf_core.c:89:5: warning: symbol
'unregister_inet6addr_notifier' was not declared. Should it be static?
net/ipv6/addrconf_core.c:95:5: warning: symbol
'inet6addr_notifier_call_chain' was not declared. Should it be static?
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct tcp_fastopen_context has a field named tfm, which is a pointer
to a crypto_cipher structure.
It currently has a __rcu annotation, which is not needed at all.
tcp_fastopen_ctx is the pointer fetched by rcu_dereference(), but once
we have a pointer to current tcp_fastopen_context, we do not use/need
rcu_dereference() to access tfm.
This fixes a lot of sparse errors like the following :
net/ipv4/tcp_fastopen.c:21:31: warning: incorrect type in argument 1 (different address spaces)
net/ipv4/tcp_fastopen.c:21:31: expected struct crypto_cipher *tfm
net/ipv4/tcp_fastopen.c:21:31: got struct crypto_cipher [noderef] <asn:4>*tfm
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, there is no good possibility to debug netlink traffic that
is being exchanged between kernel and user space. Therefore, this patch
implements a netlink virtual device, so that netlink messages will be
made visible to PF_PACKET sockets. Once there was an approach with a
similar idea [1], but it got forgotten somehow.
I think it makes most sense to accept the "overhead" of an extra netlink
net device over implementing the same functionality from PF_PACKET
sockets once again into netlink sockets. We have BPF filters that can
already be easily applied which even have netlink extensions, we have
RX_RING zero-copy between kernel- and user space that can be reused,
and much more features. So instead of re-implementing all of this, we
simply pass the skb to a given PF_PACKET socket for further analysis.
Another nice benefit that comes from that is that no code needs to be
changed in user space packet analyzers (maybe adding a dissector, but
not more), thus out of the box, we can already capture pcap files of
netlink traffic to debug/troubleshoot netlink problems.
Also thanks goes to Thomas Graf, Flavio Leitner, Jesper Dangaard Brouer.
[1] http://marc.info/?l=linux-netdev&m=113813401516110
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similarly to the networking receive path with ptype_all taps, we add
the possibility to register netdevices that are for ARPHRD_NETLINK to
the netlink subsystem, so that those can be used for netlink analyzers
resp. debuggers. We do not offer a direct callback function as out-of-tree
modules could do crap with it. Instead, a netdevice must be registered
properly and only receives a clone, managed by the netlink layer. Symbols
are exported as GPL-only.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This small patch adds the definition of ARPHRD_NETLINK which can for
example be used by netlink monitoring devices as device type. So that
sockaddr_ll can pick it up and based on that choose the correct packet
dissector.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This restores commits:
c573972c111a5904342cda2e2c2149
which initially accidently went into 'net', were
reverted there, and then properly placed into 'net-next'.
But the next net --> net-next merge accidently wiped them
out again.
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The device::iommu_group field may be set even if no IOMMU is in use.
iommu_present() is still a better indicator, although it doesn't tell
us whether *our* device is affected.
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The lifetime of an irq_cpu_rmap is odd: we have to allocate it before
installing IRQ handlers and free it before removing the IRQ handlers.
As a result of this asymmetry, it was omitted from some failure paths.
On another failure path, we could try to remove IRQ handlers we
had not yet installed.
Move the irq_cpu_rmap allocation and freeing alongside IRQ handler
installation and removal, in efx_nic_{init,fini}_interrupts().
Count the number of IRQ handlers successfully installed and only
remove those on the failure path.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
GRO can handle non-TCP packets and pass them up without coalescing,
but it has to do some extra work to parse the packet which we can
bypass using the hardware parse result. (This condition yields a
false negative for TCP/IPv6 packets received by Falcon, but its
performance is already poor in that case due to lack of checksum
offload.)
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
As far as I know, the hardware doesn't support matching on both IP
fields and vlan tag, but it can at least match on the IP fields.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The kernel can generate software receive timestamps and we should
report those for all ports regardless of hardware capabilities.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
PCI legacy interrupts are level-triggered, and we cannot mask them up
on an isolated device. Instead, disable the IRQ at the controller
until we have recovered.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
This fixes an issue caused by submit 78c3bcc5d1
`bnx2x: Improve PF behaviour toward VF', which made the bnx2x driver fail
compilation when PCI_IOV is not set.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Callers of skb_seq_read() are currently forced to call skb_abort_seq_read()
even when consuming all the data because the last call to skb_seq_read (the
one that returns 0 to indicate the end) fails to unmap the last fragment page.
With this patch callers will be allowed to traverse the SKB data by calling
skb_prepare_seq_read() once and repeatedly calling skb_seq_read() as originally
intended (and documented in the original commit 677e90eda), that is, only call
skb_abort_seq_read() if the sequential read is actually aborted.
Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
I would guess that this is the last big wireless pull request before
the 3.11 merge window...
Regarding the mac80211 bits, Johannes says:
"I have a number of mesh fixes and improvements from Colleen, Jacob,
Ashok and Thomas, powersave fixes in mac80211 from Alex, improved
management-TX from Antonio, and a few various things, including locking
fixes, from others and myself. Overall though, nothing really stands
out."
As for the iwlwifi bits, Johannes says:
"Emmanuel contributed two AP mode fixes, removed an unused field, fixed a
comment and added a warning for something that shouldn't happen in
practice, and I removed the declaration of a function that doesn't even
exist and cleaned up a small include."
"This time I have a number of cleanups, a small fix from Emmanuel and two
performance improvements that combined reduce our driver's CPU
utilisation as much as 75% in high TX-throughput scenarios."
"These two patches fix two issues with using rfkill randomly during
traffic, which would then cause our driver to stop working and not be
able to recover at all."
Regarding the ath6kl bits, Kalle says:
"Here are few simple patches for ath6kl. We have a suspend crash fix for
USB from Shafi, use of mac_pton(), a compiler warning fix and a fix for
module initialisation error path."
Kalle also sends the biggest single item of note, the new ath10k
driver for Qualcomm Atheros 802.11ac CQA98xx devices.
Included is an NFC pull, of which Samuel says:
"These are the pending NFC patches for the 3.11 merge window.
It contains the pending fixes that were on nfc-fixes (nfc-fixes-3.10-2),
along with a few more for the pn544 and pn533 drivers, the LLCP
disconnection path and an LLCP memory leak.
Highlights for this one are:
- An initial secure element API. NFC chipsets can carry an embedded
secure element or get access to the SIM one. In both cases they
control the secure elements and this API provides a way to discover,
enable and disable the available SEs. It also exports that to
userspace in order for SE focused middleware to actually do something
with them (e.g. payments).
- NCI over SPI support. SPI is the most complex NCI specified transport
layer and we now have support for it in the kernel. The next step will
be to implement drivers for NCI chipsets using this transport like
e.g. bcm2079x.
- NFC p2p hardware simulation driver. We now have an nfcsim driver that
is mostly a loopback device between 2 NFC interfaces. It also
implements the rest of the NFC core API like polling and target
detection. This driver, with neard running on top of it, allows us to
completely test the LLCP, SNEP and Handover implementation without
physical hardware.
- A Firmware update netlink API. Most (All ?) HCI chipsets have a
special firmware update mode where applications can push a new
firmware that will be flashed. We now have a netlink API for providing
that mode to e.g. nfctool."
On top of all that, there are a variety of updates to brcmfmac,
iwlegacy, rtlwifi, wil6210, and the TI wl12xx drivers. As usual,
the bcma and ssb busses get a little love as well, as do a handful
of others here and there.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a typo here, "i" vs "j", so we would crash on module_exit().
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>