78057 Commits

Author SHA1 Message Date
Jiri Pirko
4d429c5ddc switchdev: introduce possibility to defer obj_add/del
Similar to the attr usecase, the caller knows if he is holding RTNL and is
in atomic section. So let the called to decide the correct call variant.

This allows drivers to sleep inside their ops and wait for hw to get the
operation status. Then the status is propagated into switchdev core.
This avoids silent errors in drivers.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15 06:09:49 -07:00
Jiri Pirko
850d0cbc91 switchdev: remove pointers from switchdev objects
When object is used in deferred work, we cannot use pointers in
switchdev object structures because the memory they point at may be already
used by someone else. So rather do local copy of the value.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15 06:09:49 -07:00
Jiri Pirko
0bc05d585d switchdev: allow caller to explicitly request attr_set as deferred
Caller should know if he can call attr_set directly (when holding RTNL)
or if he has to defer the att_set processing for later.

This also allows drivers to sleep inside attr_set and report operation
status back to switchdev core. Switchdev core then warns if status is
not ok, instead of silent errors happening in drivers.

Benefit from newly introduced switchdev deferred ops infrastructure.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15 06:09:48 -07:00
Jiri Pirko
f7fadf3047 switchdev: make struct switchdev_attr parameter const for attr_set calls
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15 06:09:46 -07:00
Jiri Pirko
793f40147e switchdev: introduce switchdev deferred ops infrastructure
Introduce infrastructure which will be used internally to defer ops.
Note that the deferred ops are queued up and either are processed by
scheduled work or explicitly by user calling deferred_process function.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15 06:09:46 -07:00
Jack Morgenstein
2b3ddf27f4 net/mlx4_core: Replace VF zero mac with random mac in mlx4_core
By design, when no default MAC addresses are set in the Hypervisor for VFs,
the VFs are passed zero-macs. When such a MAC is received by the VF, it
generates a random MAC address and registers that MAC address
with the Hypervisor.

This random mac generation is currently done in the mlx4_en module.
There is a problem, though, if the mlx4_ib module is loaded by a VF before
the mlx4_en module. In this case, for RoCE, mlx4_ib will see the un-replaced
zero-mac and register that zero-mac as part of QP1 initialization.

Having a zero-mac in the port's MAC table creates problems for a
Baseboard Management Console. The BMC occasionally sends packets with a
zero-mac destination MAC. If there is a zero-mac present in the port's
MAC table, the FW will send such BMC packets to the host driver rather than
to the wire, and BMC will stop working.

To address this problem, we move the replacement of zero-mac addresses
with random-mac addresses to procedure mlx4_slave_cap(), which is part of the
driver startup for VFs, and is before activation of mlx4_ib and mlx4_en.
As a result, zero-mac addresses will never be registered in the port MAC table
by the driver.

In addition, when mlx4_en does initialize the net device, it needs to set
the NET_ADDR_RANDOM flag in the netdev structure if the address was
randomly generated. This is done so that udev on the VM does not create
a new device name after each VF probe (VM boot and such). To accomplish this,
we add a per-port flag in mlx4_dev which gets set whenever mlx4_core replaces
a zero-mac with a randomly-generated mac. This flag is examined when mlx4_en
initializes the net-device.

Fix was suggested by Matan Barak <matanb@mellanox.com>

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-14 19:14:44 -07:00
Eli Cohen
e3297246c2 net/mlx5_core: Wait for FW readiness on startup
On device initialization, wait till firmware indicates that that it is done
with initialization before proceeding to initialize the device.

Also update initialization segment layout to match driver/firmware
interface definitions.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-14 19:14:43 -07:00
Majd Dibbiny
89d44f0a6c net/mlx5_core: Add pci error handlers to mlx5_core driver
This patch implement the pci_error_handlers for mlx5_core which allow the
driver to recover from PCI error.

Once an error is detected in the PCI, the mlx5_pci_err_detected is called
and it:
1) Marks the device to be in 'Internal Error' state.
2) Dispatches an event to the mlx5_ib to flush all the outstanding cqes
with error.
3) Returns all the on going commands with error.
4) Unloads the driver.

Afterwards, the FW is reset and mlx5_pci_slot_reset is called and it
enables the device and restore it's pci state.

If the later succeeds, mlx5_pci_resume is called, and it loads the SW
stack.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-14 19:14:42 -07:00
Eli Cohen
fd76ee4da5 net/mlx5_core: Fix internal error detection conditions
The detection of a fatal condition has been updated to take into account
the state reported by the device or by detecting an all ones read of the
firmware version which indicates that the device is not accessible.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-14 19:14:41 -07:00
Eric Dumazet
f985c65c90 tcp: avoid spurious SYN flood detection at listen() time
At listen() time, there is a small window where listener is visible with
a zero backlog, triggering a spurious "Possible SYN flooding on port"
message.

Nothing prevents us from setting the correct backlog.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-14 19:06:32 -07:00
David S. Miller
c3503357fb linux-can-next-for-4.4-20151013
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABCgAGBQJWHSq8AAoJEP5prqPJtc/HCyMH/R4PeAJoSwlQQVTKxDsdC/x1
 Jxue9dMpEQhINoU1EpSQxaYIwg8zzFpQqcKVzoX9bgw5YdfAVgv/wrSW98Hwg/h5
 OX+QYBAvhK/1Gk0+b7fwPF323osdD/8hn4lbQorB3gEYmE4+3kKh6ivlxGNa1LfW
 VDfX23MhRF+iXFM64pnl7LR6BnflPQlGEKlWQgevR+cZDfEk+lDTRHjdAu/Hjokc
 Nwo1agptCOsS5mgE/hyLhqBc6UXSN8ytoi5acP+KtnfnLtmgw/YEt7/2QQgOOTkf
 T2zwCxFRQcePwoip7OXFwzkPsZkj3gn4XZCbTSErbqnQ28sFDbTQUTC1mB87f70=
 =0GcF
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-next-for-4.4-20151013' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2015-09-17

this is a pull request of 4 patches for net-next/master.

Two patches are by Gerhard Bertelsmann, fixing some problems in the
sun4i driver. The patch by Arnd Bergmann stops using timeval for the
CAN broadcast manager. The last patch by Alexandre Belloni removes the
otherwise unused struct at91_can_data from the driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-14 18:36:58 -07:00
Paolo Abeni
02a6d6136f Revert "ipv4/icmp: redirect messages can use the ingress daddr as source"
Revert the commit e2ca690b657f ("ipv4/icmp: redirect messages
can use the ingress daddr as source"), which tried to introduce a more
suitable behaviour for ICMP redirect messages generated by VRRP routers.
However RFC 5798 section 8.1.1 states:

    The IPv4 source address of an ICMP redirect should be the address
    that the end-host used when making its next-hop routing decision.

while said commit used the generating packet destination
address, which do not match the above and in most cases leads to
no redirect packets to be generated.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-14 06:01:07 -07:00
Alexandre Belloni
42160a041d can: at91: remove at91_can_data
struct at91_can_data was used to pass a callback to the driver, allowing it
to switch the transceiver on and off. As all at91 boards are now using DT,
this is not used anymore, remove that structure.

Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2015-10-13 17:42:35 +02:00
Arnd Bergmann
ba61a8d9d7 can: avoid using timeval for uapi
The can subsystem communicates with user space using a bcm_msg_head
header, which contains two timestamps. This is problematic for
multiple reasons:

a) The structure layout is currently incompatible between 64-bit
   user space and 32-bit user space, and cannot work in compat
   mode (other than x32).

b) The timeval structure layout will change in 32-bit user
   space when we fix the y2038 overflow problem by redefining
   time_t to 64-bit, making new 32-bit user space incompatible
   with the current kernel interface.
   Cars last a long time and often use old kernels, so the actual
   users of this code are the most likely ones to migrate to y2038
   safe user space.

This tries to work around part of the problem by changing the
publicly visible user interface in the header, but not the binary
interface. Fortunately, the values passed around in the structure
are relative times and do not actually suffer from the y2038
overflow, so 32-bit is enough here.

We replace the use of 'struct timeval' with a newly defined
'struct bcm_timeval' that uses the exact same binary layout
as before and that still suffers from problem a) but not problem
b).

The downside of this approach is that any user space program
that currently assigns a timeval structure to these members
rather than writing the tv_sec/tv_usec portions individually
will suffer a compile-time error when built with an updated
kernel header. Fixing this error makes it work fine with old
and new headers though.

We could address problem a) by using '__u32' or 'int' members
rather than 'long', but that would have a more significant
downside in also breaking support for all existing 64-bit user
binaries that might be using this interface, which is likely
not acceptable.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: linux-can@vger.kernel.org
Cc: linux-api@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2015-10-13 17:42:34 +02:00
David Ahern
ccf3c8c3fe net: Add IPv6 support to l3mdev
Add operations to retrieve cached IPv6 dst entry from l3mdev device
and lookup IPv6 source address.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-13 04:55:04 -07:00
Eric W. Biederman
b72775977c ipv6: Pass struct net into nf_ct_frag6_gather
The function nf_ct_frag6_gather is called on both the input and the
output paths of the networking stack.  In particular ipv6_defrag which
calls nf_ct_frag6_gather is called from both the the PRE_ROUTING chain
on input and the LOCAL_OUT chain on output.

The addition of a net parameter makes it explicit which network
namespace the packets are being reassembled in, and removes the need
for nf_ct_frag6_gather to guess.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:44:17 -07:00
Eric W. Biederman
19bcf9f203 ipv4: Pass struct net into ip_defrag and ip_check_defrag
The function ip_defrag is called on both the input and the output
paths of the networking stack.  In particular conntrack when it is
tracking outbound packets from the local machine calls ip_defrag.

So add a struct net parameter and stop making ip_defrag guess which
network namespace it needs to defragment packets in.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:44:16 -07:00
David S. Miller
9916596742 Major changes:
iwlwifi
 
 * some debugfs improvements
 * fix signedness in beacon statistics
 * deinline some functions to reduce size when device tracing is enabled
 * filter beacons out in AP mode when no stations are associated
 * deprecate firmwares version -12
 * fix a runtime PM vs. legacy suspend race
 * one-liner fix for a ToF bug
 * clean-ups in the rx code
 * small debugging improvement
 * fix WoWLAN with new firmware versions
 * more clean-ups towards multiple RX queues;
 * some rate scaling fixes and improvements;
 * some time-of-flight fixes;
 * other generic improvements and clean-ups;
 
 brcmfmac
 
 * rework code dealing with multiple interfaces
 * allow logging firmware console using debug level
 * support for BCM4350, BCM4365, and BCM4366 PCIE devices
 * fixed for legacy P2P and P2P device handling
 * correct set and get tx-power
 
 ath9k
 
 * add support for Outside Context of a BSS (OCB) mode
 
 mwifiex
 
 * add USB multichannel feature
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQEcBAABAgAGBQJWF9ciAAoJEG4XJFUm622bVaAH/3Fi4CaKrDF6L8lxSRWUZzft
 Ie2X0FC+d5knpS7dOd7iI02MuEuKCg3f6dmtDrCDFBqFohvfO5NkG4XU81jdIiWM
 Xkyxlgcy/1TuILNjQfNh/2nhjpvvHDCyptl+jimeT2VR2ITD/Vj3IOAMA5l4khyx
 OeWmgW7dT9xLwYYy20ql5QLGkbxwJlHawUw/d+3yiS+AHO+6dVGJL2OtpyrlPP/F
 0KpSj0lZY9UNRL+i6FbONDCBYeG+q/lA5G5nGXBF6zEeZ6BcuWNRcBBGr2n/6uMy
 gQMAunqBIunfYkfpEKYEPF5zoyO/wCmvPLxx56iS8okGSVw4KzQ2DtQ0leFbjBw=
 =1po3
 -----END PGP SIGNATURE-----

Merge tag 'wireless-drivers-next-for-davem-2015-10-09' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
Major changes:

iwlwifi

* some debugfs improvements
* fix signedness in beacon statistics
* deinline some functions to reduce size when device tracing is enabled
* filter beacons out in AP mode when no stations are associated
* deprecate firmwares version -12
* fix a runtime PM vs. legacy suspend race
* one-liner fix for a ToF bug
* clean-ups in the rx code
* small debugging improvement
* fix WoWLAN with new firmware versions
* more clean-ups towards multiple RX queues;
* some rate scaling fixes and improvements;
* some time-of-flight fixes;
* other generic improvements and clean-ups;

brcmfmac

* rework code dealing with multiple interfaces
* allow logging firmware console using debug level
* support for BCM4350, BCM4365, and BCM4366 PCIE devices
* fixed for legacy P2P and P2P device handling
* correct set and get tx-power

ath9k

* add support for Outside Context of a BSS (OCB) mode

mwifiex

* add USB multichannel feature
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:39:18 -07:00
Paolo Abeni
e2ca690b65 ipv4/icmp: redirect messages can use the ingress daddr as source
This patch allows configuring how the source address of ICMP
redirect messages is selected; by default the old behaviour is
retained, while setting icmp_redirects_use_orig_daddr force the
usage of the destination address of the packet that caused the
redirect.

The new behaviour fits closely the RFC 5798 section 8.1.1, and fix the
following scenario:

Two machines are set up with VRRP to act as routers out of a subnet,
they have IPs x.x.x.1/24 and x.x.x.2/24, with VRRP holding on to
x.x.x.254/24.

If a host in said subnet needs to get an ICMP redirect from the VRRP
router, i.e. to reach a destination behind a different gateway, the
source IP in the ICMP redirect is chosen as the primary IP on the
interface that the packet arrived at, i.e. x.x.x.1 or x.x.x.2.

The host will then ignore said redirect, due to RFC 1122 section 3.2.2.2,
and will continue to use the wrong next-op.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:38:02 -07:00
Eric Dumazet
d475f090bf tcp: shrink tcp_timewait_sock by 8 bytes
Reducing tcp_timewait_sock from 280 bytes to 272 bytes
allows SLAB to pack 15 objects per page instead of 14 (on x86)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:28:24 -07:00
Eric Dumazet
ed53d0ab76 net: shrink struct sock and request_sock by 8 bytes
One 32bit hole is following skc_refcnt, use it.
skc_incoming_cpu can also be an union for request_sock rcv_wnd.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:28:22 -07:00
Eric Dumazet
8e5eb54d30 net: align sk_refcnt on 128 bytes boundary
sk->sk_refcnt is dirtied for every TCP/UDP incoming packet.
This is a performance issue if multiple cpus hit a common socket,
or multiple sockets are chained due to SO_REUSEPORT.

By moving sk_refcnt 8 bytes further, first 128 bytes of sockets
are mostly read. As they contain the lookup keys, this has
a considerable performance impact, as cpus can cache them.

These 8 bytes are not wasted, we use them as a place holder
for various fields, depending on the socket type.

Tested:
 SYN flood hitting a 16 RX queues NIC.
 TCP listener using 16 sockets and SO_REUSEPORT
 and SO_INCOMING_CPU for proper siloing.

 Could process 6.0 Mpps SYN instead of 4.2 Mpps

 Kernel profile looked like :
    11.68%  [kernel]  [k] sha_transform
     6.51%  [kernel]  [k] __inet_lookup_listener
     5.07%  [kernel]  [k] __inet_lookup_established
     4.15%  [kernel]  [k] memcpy_erms
     3.46%  [kernel]  [k] ipt_do_table
     2.74%  [kernel]  [k] fib_table_lookup
     2.54%  [kernel]  [k] tcp_make_synack
     2.34%  [kernel]  [k] tcp_conn_request
     2.05%  [kernel]  [k] __netif_receive_skb_core
     2.03%  [kernel]  [k] kmem_cache_alloc

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:28:22 -07:00
Eric Dumazet
70da268b56 net: SO_INCOMING_CPU setsockopt() support
SO_INCOMING_CPU as added in commit 2c8c56e15df3 was a getsockopt() command
to fetch incoming cpu handling a particular TCP flow after accept()

This commits adds setsockopt() support and extends SO_REUSEPORT selection
logic : If a TCP listener or UDP socket has this option set, a packet is
delivered to this socket only if CPU handling the packet matches the specified
one.

This allows to build very efficient TCP servers, using one listener per
RX queue, as the associated TCP listener should only accept flows handled
in softirq by the same cpu.
This provides optimal NUMA behavior and keep cpu caches hot.

Note that __inet_lookup_listener() still has to iterate over the list of
all listeners. Following patch puts sk_refcnt in a different cache line
to let this iteration hit only shared and read mostly cache lines.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:28:20 -07:00
Edward Jee
f28ea365cd sock: support per-packet fwmark
It's useful to allow users to set fwmark for an individual packet,
without changing the socket state. The function this patch adds in
sock layer can be used by the protocols that need such a feature.

Signed-off-by: Edward Hyunkoo Jee <edjee@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:25:21 -07:00
Alexei Starovoitov
aaac3ba95e bpf: charge user for creation of BPF maps and programs
since eBPF programs and maps use kernel memory consider it 'locked' memory
from user accounting point of view and charge it against RLIMIT_MEMLOCK limit.
This limit is typically set to 64Kbytes by distros, so almost all
bpf+tracing programs would need to increase it, since they use maps,
but kernel charges maximum map size upfront.
For example the hash map of 1024 elements will be charged as 64Kbyte.
It's inconvenient for current users and changes current behavior for root,
but probably worth doing to be consistent root vs non-root.

Similar accounting logic is done by mmap of perf_event.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:13:36 -07:00
Alexei Starovoitov
1be7f75d16 bpf: enable non-root eBPF programs
In order to let unprivileged users load and execute eBPF programs
teach verifier to prevent pointer leaks.
Verifier will prevent
- any arithmetic on pointers
  (except R10+Imm which is used to compute stack addresses)
- comparison of pointers
  (except if (map_value_ptr == 0) ... )
- passing pointers to helper functions
- indirectly passing pointers in stack to helper functions
- returning pointer from bpf program
- storing pointers into ctx or maps

Spill/fill of pointers into stack is allowed, but mangling
of pointers stored in the stack or reading them byte by byte is not.

Within bpf programs the pointers do exist, since programs need to
be able to access maps, pass skb pointer to LD_ABS insns, etc
but programs cannot pass such pointer values to the outside
or obfuscate them.

Only allow BPF_PROG_TYPE_SOCKET_FILTER unprivileged programs,
so that socket filters (tcpdump), af_packet (quic acceleration)
and future kcm can use it.
tracing and tc cls/act program types still require root permissions,
since tracing actually needs to be able to see all kernel pointers
and tc is for root only.

For example, the following unprivileged socket filter program is allowed:
int bpf_prog1(struct __sk_buff *skb)
{
  u32 index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol));
  u64 *value = bpf_map_lookup_elem(&my_map, &index);

  if (value)
	*value += skb->len;
  return 0;
}

but the following program is not:
int bpf_prog1(struct __sk_buff *skb)
{
  u32 index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol));
  u64 *value = bpf_map_lookup_elem(&my_map, &index);

  if (value)
	*value += (u64) skb;
  return 0;
}
since it would leak the kernel address into the map.

Unprivileged socket filter bpf programs have access to the
following helper functions:
- map lookup/update/delete (but they cannot store kernel pointers into them)
- get_random (it's already exposed to unprivileged user space)
- get_smp_processor_id
- tail_call into another socket filter program
- ktime_get_ns

The feature is controlled by sysctl kernel.unprivileged_bpf_disabled.
This toggle defaults to off (0), but can be set true (1).  Once true,
bpf programs and maps cannot be accessed from unprivileged process,
and the toggle cannot be set back to false.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:13:35 -07:00
Scott Feldman
464314ea6c switchdev: skip over ports returning -EOPNOTSUPP when recursing ports
This allows us to recurse over all the ports, skipping over unsupporting
ports.  Without the change, the recursion would stop at first unsupported
port.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 05:20:20 -07:00
Scott Feldman
f55ac58ae6 switchdev: add bridge ageing_time attribute
Setting the stage to push bridge-level attributes down to port driver so
hardware can be programmed accordingly.  Bridge-level attribute example is
ageing_time.  This is a per-bridge attribute, not a per-bridge-port attr.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 05:20:19 -07:00
Vivien Didelot
8057b3e7a1 net: dsa: use switchdev obj in port_fdb_del
For consistency with the FDB add operation, propagate the
switchdev_obj_port_fdb structure in the DSA drivers.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-11 05:28:52 -07:00
Vivien Didelot
1f36faf269 net: dsa: push prepare phase in port_fdb_add
Now that the prepare phase is pushed down to the DSA drivers, propagate
it to the port_fdb_add function.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-11 05:28:50 -07:00
Vivien Didelot
146a32067b net: dsa: add port_fdb_prepare
Push the prepare phase for FDB operations down to the DSA drivers, with
a new port_fdb_prepare function. Currently only mv88e6xxx is affected.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-11 05:28:49 -07:00
David S. Miller
7bcfeead48 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2015-10-08

Here's another set of Bluetooth & 802.15.4 patches for the 4.4 kernel.

802.15.4:
 - Many improvements & fixes to the mrf24j40 driver
 - Fixes and cleanups to nl802154, mac802154 & ieee802154 code

Bluetooth:
 - New chipset support in btmrvl driver
 - Fixes & cleanups to btbcm, btmrvl, bpa10x & btintel drivers
 - Support for vendor specific diagnostic data through common API
 - Cleanups to the 6lowpan code
 - New events & message types for monitor channel

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-11 05:15:30 -07:00
Eric Dumazet
e446f9dfe1 net: synack packets can be attached to request sockets
selinux needs few changes to accommodate fact that SYNACK messages
can be attached to a request socket, lacking sk_security pointer

(Only syncookies are still attached to a TCP_LISTEN socket)

Adds a new sk_listener() helper, and use it in selinux and sch_fq

Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported by: kernel test robot <ying.huang@linux.intel.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Eric Paris <eparis@parisplace.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-11 05:05:06 -07:00
Alexei Starovoitov
ff936a04e5 bpf: fix cb access in socket filter programs
eBPF socket filter programs may see junk in 'u32 cb[5]' area,
since it could have been used by protocol layers earlier.

For socket filter programs used in af_packet we need to clean
20 bytes of skb->cb area if it could be used by the program.
For programs attached to TCP/UDP sockets we need to save/restore
these 20 bytes, since it's used by protocol layers.

Remove SK_RUN_FILTER macro, since it's no longer used.

Long term we may move this bpf cb area to per-cpu scratch, but that
requires addition of new 'per-cpu load/store' instructions,
so not suitable as a short term fix.

Fixes: d691f9e8d440 ("bpf: allow programs to write to certain skb fields")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-11 04:40:05 -07:00
Yaowei Bai
0cbf334376 net/core: lockdep_rtnl_is_held can be boolean
This patch makes lockdep_rtnl_is_held return bool due to this
particular function only using either one or zero as its return
value.

In another patch lockdep_is_held is also made return bool.

No functional change.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-09 07:49:06 -07:00
Yaowei Bai
f06cc7b284 net/inetdevice: bad_mask can be boolean
This patch makes bad_mask return bool due to this particular function
only using either one or zero as its return value.

No functional change.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-09 07:49:05 -07:00
Yaowei Bai
c3225164cf net/inetdevice: inet_ifa_match can be boolean
This patch makes inet_ifa_match return bool due to this
particular function only using either one or zero as its return
value.

No functional change.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-09 07:49:03 -07:00
Yaowei Bai
0c6119d99b net/dccp: dccp_list_has_service can be boolean
This patch makes dccp_list_has_service return bool due to this
particular function only using either one or zero as its return
value.

No functional change.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-09 07:49:02 -07:00
Yaowei Bai
d6fbaea5f6 net/can: can_dropped_invalid_skb can be boolean
This patch makes can_dropped_invalid_skb return bool due to this
particular function only using either one or zero as its return
value.

No functional change.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-09 07:49:01 -07:00
Yaowei Bai
875e082949 net/nfnetlink: lockdep_nfnl_is_held can be boolean
This patch makes lockdep_nfnl_is_held return bool to improve
readability due to this particular function only using either
one or zero as its return value.

No functional change.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-09 07:49:00 -07:00
Yaowei Bai
35498edc64 net/ieee80211: ieee80211_is_* can be boolean
This patch makes ieee80211_is_* return bool to improve
readability due to these particular functions only using either
one or zero as their return value.

No functional change.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-09 07:48:59 -07:00
Yaowei Bai
61d03535e4 net/netlink: lockdep_genl_is_held can be boolean
This patch makes lockdep_genl_is_held return bool to improve
readability due to this particular function only using either
one or zero as its return value.

No functional change.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-09 07:48:59 -07:00
Eli Cohen
ac6ea6e81a net/mlx5_core: Use private health thread for each device
Use a single threaded work queue for each device in the system instead of
using one thread for any device. This is required so we can concurrently
process system error handling for all the devices that need that.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-09 07:27:49 -07:00
Eli Cohen
020446e01e net/mlx5_core: Prepare cmd interface to system errors handling
In preparation to handling system errors at the mlx5_core level, change the
interface of cmd_work_handler to accept a 64 bit argument for the vector.

This allows to encode a flag that signifies when the handler is called
as a result of a driver logic that wishes to terminate commands that
the hardware may not be able to terminate. Such command completions
are detected at the handler and proper return status is encoded.

To be able to terminate page handler commands, we make sure to set
the corresponding bit in the bitmask.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-09 07:27:48 -07:00
Daniel Borkmann
3ad0040573 bpf: split state from prandom_u32() and consolidate {c, e}BPF prngs
While recently arguing on a seccomp discussion that raw prandom_u32()
access shouldn't be exposed to unpriviledged user space, I forgot the
fact that SKF_AD_RANDOM extension actually already does it for some time
in cBPF via commit 4cd3675ebf74 ("filter: added BPF random opcode").

Since prandom_u32() is being used in a lot of critical networking code,
lets be more conservative and split their states. Furthermore, consolidate
eBPF and cBPF prandom handlers to use the new internal PRNG. For eBPF,
bpf_get_prandom_u32() was only accessible for priviledged users, but
should that change one day, we also don't want to leak raw sequences
through things like eBPF maps.

One thought was also to have own per bpf_prog states, but due to ABI
reasons this is not easily possible, i.e. the program code currently
cannot access bpf_prog itself, and copying the rnd_state to/from the
stack scratch space whenever a program uses the prng seems not really
worth the trouble and seems too hacky. If needed, taus113 could in such
cases be implemented within eBPF using a map entry to keep the state
space, or get_random_bytes() could become a second helper in cases where
performance would not be critical.

Both sides can trigger a one-time late init via prandom_init_once() on
the shared state. Performance-wise, there should even be a tiny gain
as bpf_user_rnd_u32() saves one function call. The PRNG needs to live
inside the BPF core since kernels could have a NET-less config as well.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Chema Gonzalez <chema@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08 05:26:39 -07:00
Daniel Borkmann
897ece56e7 random32: add prandom_init_once helper for own rngs
Add a prandom_init_once() facility that works on the rnd_state, so that
users that are keeping their own state independent from prandom_u32() can
initialize their taus113 per cpu states.

The motivation here is similar to net_get_random_once(): initialize the
state as late as possible in the hope that enough entropy has been
collected for the seeding. prandom_init_once() makes use of the recently
introduced prandom_seed_full_state() helper and is generic enough so that
it could also be used on fast-paths due to the DO_ONCE().

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08 05:26:38 -07:00
Hannes Frederic Sowa
c90aeb9482 once: make helper generic for calling functions once
Make the get_random_once() helper generic enough, so that functions
in general would only be called once, where one user of this is then
net_get_random_once().

The only implementation specific call is to get_random_bytes(), all
the rest of this *_once() facility would be duplicated among different
subsystems otherwise. The new DO_ONCE() helper will be used by prandom()
later on, but might also be useful for other scenarios/subsystems as
well where a one-time initialization in often-called, possibly fast
path code could occur.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08 05:26:36 -07:00
Hannes Frederic Sowa
46234253b9 net: move net_get_random_once to lib
There's no good reason why users outside of networking should not
be using this facility, f.e. for initializing their seeds.

Therefore, make it accessible from there as get_random_once().

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08 05:26:35 -07:00
Alexander Aring
4d6a6aed22 6lowpan: move shared settings to lowpan_netdev_setup
This patch moves values for all lowpan interface to the shared
implementation of 6lowpan. This patch also quietly fixes the forgotten
IFF_NO_QUEUE flag for the bluetooth 6LoWPAN interface. An identically
commit is 4afbc0d ("net: 6lowpan: convert to using IFF_NO_QUEUE") which
wasn't changed for bluetooth 6lowpan.

All 6lowpan interfaces should be virtual with IFF_NO_QUEUE, using EUI64
address length, the mtu size is 1280 (IPV6_MIN_MTU) and the netdev type
is ARPHRD_6LOWPAN.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-08 14:25:34 +02:00
Arun Parameswaran
8e185d6997 net: phy: Broadcom Cygnus internal Etherent PHY driver
Add support for the Broadcom Cygnus SoCs internal PHY's.
The PHYs are 1000M/100M/10M capable with support for 'EEE'
and 'APD' (Auto Power Down).

This driver supports the following Broadcom Cygnus SoCs:
 - BCM583XX (BCM58300, BCM58302, BCM58303, BCM58305)
 - BCM113XX (BCM11300, BCM11320, BCM11350, BCM11360)

The PHY's on these SoC's require some workarounds for
stable operation, both during configuration time and
during suspend/resume. This driver handles the
application of the workarounds.

Signed-off-by: Arun Parameswaran <arunp@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08 04:45:52 -07:00