722470 Commits

Author SHA1 Message Date
Jason Wang
96f8406162 tun: add eBPF based queue selection method
This patch introduces an eBPF based queue selection method. With this,
the policy could be offloaded to userspace completely through a new
ioctl TUNSETSTEERINGEBPF.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05 12:01:49 -05:00
David S. Miller
f520957dc2 Merge branch 'hns3-reset-refactor'
Salil Mehta says:

====================
net: hns3: Refactors "reset" handling code in HCLGE layer of HNS3 driver

This patch refactors the code of the reset feature in HCLGE layer
of HNS3 PF driver. Prime motivation to do this change is:
1. To reduce the time for which common miscellaneous Vector 0
   interrupt is disabled because of the reset. Simplification
   of the common miscellaneous interrupt handler routine(for
   Vector 0) used to handle reset and other sources of Vector
   0 interrupt.
2. Separate the task for handling the reset
3. Simplification of reset request submission and pending reset
   logic.

To achieve above below few things have been done:
1. Interrupt is disabled while common miscellaneous interrupt
   handler is entered and re-enabled before it is exit. This
   reduces the interrupt handling latency as compared to older
   interrupt handling scheme where interrupt was being disabled
   in interrupt handler context and re-enabled in task context
   some time later. Made Miscellaneous interrupt handler more
   generic to handle all sources including reset interrupt source.
2. New reset service task has been introduced to service the
   reset handling.
3. Introduces new reset service task for honoring software reset
   requests like from network stack related to timeout and serving
   the pending reset request(to reset the driver and associated
   clients).

Change Log:
Patch V2: Addressed comment by Andrew Lunn
   Link: https://lkml.org/lkml/2017/12/1/366
Patch V1: Initial Submit
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05 11:45:18 -05:00
Salil Mehta
f2f432f2c3 net: hns3: Refactors the requested reset & pending reset handling code
In exisiting code, the way to detect if driver/client reset should
be executed or if hardware should be be soft resetted was overly
complex.

Existing code use to read the interrupt status register from task
context to figure out if the interrupt source event was reset and
then use clear the interrupt source for reset while waiting for the
hardware to finish the reset. This behaviour again was confusing
and overly complex in terms of the flow.

This patch simplifies the handling of the requested reset and the
pending reset(i.e. reset which have already been asserted by the
software and hardware has acknowledged back to driver that it is
processing the hardware reset through interrupt)

Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05 11:45:18 -05:00
Salil Mehta
cb1b9f77c4 net: hns3: Add reset service task for handling reset requests
Existing common service task was being used to service the reset
requests. This patch tries to make the handling of reset cleaner
by separating task to handle the reset requests. This might in
turn help in adapting similar handling approach for other
interrupt events like mailbox, sharing vector 0 interrupt.

Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05 11:45:17 -05:00
Salil Mehta
ca1d7669b7 net: hns3: Refactor of the reset interrupt handling logic
The reset interrupt event shares common miscellaneous interrupt
Vector 0. In the existing reset interrupt handling we disable
the Vector 0 interrupt in misc interrupt handler and re-enable
them later in context to common service task.

This also means other event sources like mailbox would also be
deferred or if the interrupt event was due to mailbox(which shall
be supported for VF soon), it could delay the reset handling.

This patch reorganizes the reset interrupt handling logic and
makes it more fair to other events.

Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05 11:45:17 -05:00
Julia Lawall
81da3bf6e3 net: macb: change GFP_KERNEL to GFP_ATOMIC
Function gem_add_flow_filter called on line 2958 inside lock on line 2949
but uses GFP_KERNEL

Generated by: scripts/coccinelle/locks/call_kern.cocci

Fixes: ae8223de3df5 ("net: macb: Added support for RX filtering")
CC: Rafal Ozieblo <rafalo@cadence.com>
Signed-off-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05 11:26:42 -05:00
David S. Miller
9b89f07f10 Merge branch 'SFP-phylink-updates'
Russell King says:

====================
SFP/phylink updates

This series, which follows on from the fixes posted earlier, improves
the phylink/sfp support.  Changes included here are:

- Merge 802.3z and SGMII modes into one "in-band" mode, using the
  PHY_INTERFACE_MODE_xxx definition to determine which should be used.
  This allows more flexibility as more interface modes become
  available.

- Allow 2500base-X and 10GBASE-KR to be requested from SFP.

- Remove unused and unnecessary phylink_init_eee()

- Restart 802.3z autonegotiation when starting the network device to
  ensure that the negotiated parameters are always correct.  It has
  been observed on mvneta that this is not always the case without
  this change.

- Add kerneldoc documentation for phylink and sfp upstream facing APIs
  and link it in to the networking documentation.

- Resolve a sparse warning in sfp-bus.c

- Convert phylink/sfp to use fwnode rather than DT so that other firmware
  systems can take advantage of this - I have received a request for it
  to be usable with ACPI.  The exception to this is our interactions with
  phylib, as phylib itself does not yet support fwnode.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05 11:16:20 -05:00
Russell King
8fa7b9b6af phylink: convert to fwnode
Convert phylink to fwnode, switching phylink_create() from taking a
device_node to taking a fwnode_handle. This will allow other firmware
systems to take advantage of sfp/phylink support.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05 11:16:19 -05:00
Russell King
c19bb00070 sfp: convert to fwnode
Convert sfp-bus to use fwnode rather than device_node internally, so
we can support more than just device tree firmware.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05 11:16:19 -05:00
Russell King
b6e67d6d46 sfp: fix sparse warning
drivers/net/phy/sfp-bus.c:298:13: warning: context imbalance in 'sfp_bus_release' - wrong count at exit

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05 11:16:19 -05:00
Russell King
0a6fcd3fc1 sfp: add documentation for kernel APIs
Add kernel-doc documentation for sfp kernel APIs, and link it into the
networking kapi documentation under "Network device support".

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05 11:16:19 -05:00
Russell King
8796c8923d phylink: add documentation for kernel APIs
Add kernel-doc documentation for phylink kernel APIs, and link it into
the networking kapi documentation under "Network device support".

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05 11:16:19 -05:00
Russell King
85b43945cf phylink: restart 802.3z negotiation when starting net device
Restart 802.3z negotiation when the net device is brought up to ensure
that the link partner has our current link modes.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05 11:16:18 -05:00
Russell King
939eae25d9 phylink: remove phylink_init_eee()
phylink_init_eee() serves no purpose, remove it.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05 11:16:18 -05:00
Russell King
4336c40113 phylink: add support for 2500baseX and 10GbaseKR
Add support for handling the faster 2.5G and 10G link modes when used
with SFP modules.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05 11:16:18 -05:00
Russell King
86a362c49f phylink: get rid of separate Cisco SGMII and 802.3z modes
Since the handling of SGMII and 802.3z is now the same, combine the
MLO_AN_xxx constants into a single MLO_AN_INBAND, and use the PHY
interface mode to distinguish between Cisco SGMII and 802.3z.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05 11:16:18 -05:00
Russell King
cf4f267534 phylink: merge SGMII and 802.3z handling
The code handling SGMII and 802.3z is essentially the same, except that
we assume 802.3z has no PHY.  Re-organise the code such that these cases
are merged, and exclude 802.3z mode from having a PHY attached.  This
results in the same link handling behaviour as before.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05 11:16:17 -05:00
Russell King
365c1e64ad phy: add phy_interface_mode_is_8023z() helper
Add and use phy_interface_mode_is_8023z() helper to identify the
interface modes that use 802.3z negotiation.  Use it in phylink's
phylink_mac_an_restart().

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05 11:16:17 -05:00
Florian Westphal
b0e9fe1ba7 rtnetlink: fix rtnl_link msghandler rcu annotations
Incorrect/missing annotations caused a few sparse warnings:

rtnetlink.c:155:15: incompatible types .. (different address spaces)
rtnetlink.c:157:23: incompatible types .. (different address spaces)
rtnetlink.c:185:15: incompatible types .. (different address spaces)
rtnetlink.c:285:15: incompatible types .. (different address spaces)
rtnetlink.c:317:9: incompatible types .. (different address spaces)
rtnetlink.c:3054:23: incompatible types .. (different address spaces)

no change in generated code.

Fixes: addf9b90de22f7 ("net: rtnetlink: use rcu to free rtnl message handlers")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05 11:14:44 -05:00
David S. Miller
7cda4cee13 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Small overlapping change conflict ('net' changed a line,
'net-next' added a line right afterwards) in flexcan.c

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05 10:44:19 -05:00
Linus Torvalds
2391f0b480 virtio,qemu: bugfixes
A couple of bugfixes that just became ready.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJaIW15AAoJECgfDbjSjVRprxIH/RzUfSAfgaxhIVm2j3CRarGS
 G0xJvKxuXpiDnEtfu5st9QRqNo8CMuPV02kIaOb0Aqq2GfdEb0sCFMTLbfnYuXul
 uECu3uquXMUbuRXffQCHnDN4qqDWKRim8STIDPUlxjnBQqkos+JFn+MQ3vF2KEI0
 SeA8u1YQs5Ji10ZPjZ7FEAirbxDFiLPb0Hb0tphBGSZPXBRV3qoA4qeKNOy45EkQ
 F4J/1YxX/nWca0HYjBl2NceIqc4U/HcNZpner5YcJLad6SQSU9mSfEnJJIhOrRmu
 /Ivn6Q1822/YVdePDcffLIrvFvM4HprT4BUYv0Cd/cTZZRFMCa16OXmDcPrWcLo=
 =Nvzh
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio fixes from Michael Tsirkin:
 "virtio and qemu bugfixes

  A couple of bugfixes that just became ready"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio_balloon: fix increment of vb->num_pfns in fill_balloon()
  virtio: release virtio index when fail to device_register
  fw_cfg: fix driver remove
2017-12-04 11:32:02 -08:00
Linus Torvalds
236fa078c6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Various TCP control block fixes, including one that crashes with
    SELinux, from David Ahern and Eric Dumazet.

 2) Fix ACK generation in rxrpc, from David Howells.

 3) ipvlan doesn't set the mark properly in the ipv4 route lookup key,
    from Gao Feng.

 4) SIT configuration doesn't take on the frag_off ipv4 field
    configuration properly, fix from Hangbin Liu.

 5) TSO can fail after device down/up on stmmac, fix from Lars Persson.

 6) Various bpftool fixes (mostly in JSON handling) from Quentin Monnet.

 7) Various SKB leak fixes in vhost/tun/tap (mostly observed as
    performance problems). From Wei Xu.

 8) mvpps's TX descriptors were not zero initialized, from Yan Markman.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (57 commits)
  tcp: use IPCB instead of TCP_SKB_CB in inet_exact_dif_match()
  tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()
  rxrpc: Fix the MAINTAINERS record
  rxrpc: Use correct netns source in rxrpc_release_sock()
  liquidio: fix incorrect indentation of assignment statement
  stmmac: reset last TSO segment size after device open
  ipvlan: Add the skb->mark as flow4's member to lookup route
  s390/qeth: build max size GSO skbs on L2 devices
  s390/qeth: fix GSO throughput regression
  s390/qeth: fix thinko in IPv4 multicast address tracking
  tap: free skb if flags error
  tun: free skb in early errors
  vhost: fix skb leak in handle_rx()
  bnxt_en: Fix a variable scoping in bnxt_hwrm_do_send_msg()
  bnxt_en: fix dst/src fid for vxlan encap/decap actions
  bnxt_en: wildcard smac while creating tunnel decap filter
  bnxt_en: Need to unconditionally shut down RoCE in bnxt_shutdown
  phylink: ensure we take the link down when phylink_stop() is called
  sfp: warn about modules requiring address change sequence
  sfp: improve RX_LOS handling
  ...
2017-12-04 11:14:46 -08:00
Chris Metcalf
8ee5ad1d4c arch/tile: mark as orphaned
The chip family of TILEPro and TILE-Gx was developed by Tilera, which
was eventually acquired by Mellanox.  The tile architecture was added to
the kernel in 2010 and first appeared in 2.6.36.

Now at Mellanox we are developing new chips based on the ARM64
architecture; our last TILE-Gx chip (the Gx72) was released in 2013, and
our customers using tile architecture products are not, as far as we
know, looking to upgrade to newer kernel releases.  In the absence of
someone in the community stepping up to take over maintainership, this
commit marks the architecture as orphaned.

Cc: Chris Metcalf <metcalf@alum.mit.edu>
Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-04 11:09:31 -08:00
Florian Westphal
a3fde2addd rtnetlink: ipv6: convert remaining users to rtnl_register_module
convert remaining users of rtnl_register to rtnl_register_module
and un-export rtnl_register.

Requested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-04 13:35:36 -05:00
David S. Miller
112d59c75c linux-can-next-for-4.16-20171201
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEE4bay/IylYqM/npjQHv7KIOw4HPYFAlohXEATHG1rbEBwZW5n
 dXRyb25peC5kZQAKCRAe/sog7Dgc9gWMCACMHrISR36Ht6NGRIQC0asPY3Z3aSSL
 W2D0z5QysnLfd4RPen0rwvkujezMIBbDjjcbo0hI+VNe1q1SNQSGJmgL4Mm+drLw
 6uqfvv8KIrhH9AEcz9AP8nKWeDFgYrjjRoYMfr0uc0Zeq3boTRWaO6bGw17rMZ1X
 WWaOYA2sFUsNaiZPkiNEADg1MQGH85kfCEaS2SnQLVQJNmm0pidsC3BbcepTQ5Ys
 Wx3P1jkHwkol33lHsgumSzDYDT00kNQA038k9VAVKo5lx8WH3rUWg0lb133U6IY6
 rN0jmTE67MAsKspcveoIRdxAo/PDOFbZBoiHJA03j3JkdA1NIpAgesi7
 =2Ddp
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-next-for-4.16-20171201' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2017-12-01

this is a pull request of 10 patches for net-next/master.

The first two patches are by Arnd Bergmann, they convert the peak_usb
from using "struct timeval" to "ktime_t". The error handling in the
vxcan driver is clean up by Markus Elfring's patch. Bhumika Goyal
contributes a patch for the c_can_pci driver to make the pci data const.
The six patches by Pankaj Bansal for the flexcan driver add LS1021A
support by making the endianness of the driver configurable by the
device tree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-04 13:18:54 -05:00
David S. Miller
d671965b54 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2017-12-03

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Addition of a software model for BPF offloads in order to ease
   testing code changes in that area and make semantics more clear.
   This is implemented in a new driver called netdevsim, which can
   later also be extended for other offloads. SR-IOV support is added
   as well to netdevsim. BPF kernel selftests for offloading are
   added so we can track basic functionality as well as exercising
   all corner cases around BPF offloading, from Jakub.

2) Today drivers have to drop the reference on BPF progs they hold
   due to XDP on device teardown themselves. Change this in order
   to make XDP handling inside the drivers less error prone, and
   move disabling XDP to the core instead, also from Jakub.

3) Misc set of BPF verifier improvements and cleanups as preparatory
   work for upcoming BPF-to-BPF calls. Among others, this set also
   improves liveness marking such that pruning can be slightly more
   effective. Register and stack liveness information is now included
   in the verifier log as well, from Alexei.

4) nfp JIT improvements in order to identify load/store sequences in
   the BPF prog e.g. coming from memcpy lowering and optimizing them
   through the NPU's command push pull (CPP) instruction, from Jiong.

5) Cleanups to test_cgrp2_attach2.c BPF sample code in oder to remove
   bpf_prog_attach() magic values and replacing them with actual proper
   attach flag instead, from David.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-04 12:07:10 -05:00
David S. Miller
f4d4c49b0c Merge branch 'rtnetlink-rework-handler-registration'
Florian Westphal says:

====================
rtnetlink: rework handler (un)registering

Peter Zijlstra reported (referring to commit 019a316992ee0d983,
"rtnetlink: add reference counting to prevent module unload while dump is in progress"):

 1) it not in fact a refcount, so using refcount_t is silly
 2) there is a distinct lack of memory barriers, so we can easily
    observe the decrement while the msg_handler is still in progress.
 3) waiting with a schedule()/yield() loop is complete crap and subject
    life-locks, imagine doing that rtnl_unregister_all() from a RT task.

In ancient times rtnetlink exposed a statically-sized table with
preset doit/dumpit handlers to be called for a protocol/type pair.

Later the rtnl_register interface was added and the table was allocated
on demand.  Eventually these were also used by modules.

Problem is that nothing prevents module unload while a netlink dump
is in progress.  netlink dumps can be span multiple recv calls and
netlink core saves the to-be-repeated dumper address for later invocation.

To prevent rmmod the netlink core expects callers to pass in the owning
module so a reference can be taken.

So far rtnetlink wasn't doing this, add new interface to pass THIS_MODULE.
Moreover, when converting parts of the rtnetlink handling to rcu this code
gained way too many READ_ONCE spots, remove them and the extra refcounting.

Take a module reference when running dumpit and doit callbacks
and never alter content of rtnl_link structures after they have been
published via rcu_assign_pointer.

Based partially on earlier patch from Peter.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-04 11:33:02 -05:00
Florian Westphal
16feebcf23 rtnetlink: remove __rtnl_register
This removes __rtnl_register and switches callers to either
rtnl_register or rtnl_register_module.

Also, rtnl_register() will now print an error if memory allocation
failed rather than panic the kernel.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-04 11:32:53 -05:00
Florian Westphal
c1c502b511 net: use rtnl_register_module where needed
all of these can be compiled as a module, so use new
_module version to make sure module can no longer be removed
while callback/dump is in use.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-04 11:32:39 -05:00
Florian Westphal
e420251148 rtnetlink: get reference on module before invoking handlers
Add yet another rtnl_register function.  It will be used by modules
that can be removed.

The passed module struct is used to prevent module unload while
a netlink dump is in progress or when a DOIT_UNLOCKED doit callback
is called.

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-04 11:32:31 -05:00
Florian Westphal
addf9b90de net: rtnetlink: use rcu to free rtnl message handlers
rtnetlink is littered with READ_ONCE() because we can have read accesses
while another cpu can write to the structure we're reading by
(un)registering doit or dumpit handlers.

This patch changes this so that (un)registering cpu allocates a new
structure and then publishes it via rcu_assign_pointer, i.e. once
another cpu can see such pointer no modifications will occur anymore.

based on initial patch from Peter Zijlstra.

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-04 11:32:22 -05:00
Heiner Kallweit
9753c21f55 net: phy: broadcom: re-add mistakenly removed config settings
Previous patch mistakenly removed three chip-specific config settings.
Add them again.

Fixes: 80274abafc60 "net: phy: remove generic settings for callbacks config_aneg and read_status from drivers"
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-04 11:17:08 -05:00
David S. Miller
90ffa72ff4 Merge branch 'ipv6-gre-collect_md'
William Tu says:

====================
add ip6 gre and gretap collect_md mode

Similar to gre, vxlan, geneve, ipip tunnels, allow ip6gretap tunnels to
operate in collect metadata mode.  The first patch adds the support to
ip6_gre.c. The second patch enables unsetting the csum for ipv6 tunnel,
when using bpf_skb_[gs]et_tunnel_key() helpers.  Finally, the last patch
adds the ip6 gre and gretap tunnel test cases to BPF sample code.

The corresponding iproute2 patch:
https://marc.info/?l=linux-netdev&m=151216943128087&w=2
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-04 11:04:20 -05:00
William Tu
56ddd30280 samples/bpf: extend test_tunnel_bpf.sh with ip6gre
Extend existing tests for vxlan, gre, geneve, ipip, erspan,
to include ip6 gre and gretap tunnel.

Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-04 11:04:19 -05:00
William Tu
b8da518c6e bpf: allow disabling tunnel csum for ipv6
Before the patch, BPF_F_ZERO_CSUM_TX can be used only for ipv4 tunnel.
With introduction of ip6gretap collect_md mode, the flag should be also
supported for ipv6.

Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-04 11:04:19 -05:00
William Tu
6712abc168 ip6_gre: add ip6 gre and gretap collect_md mode
Similar to gre, vxlan, geneve, ipip tunnels, allow ip6 gre and gretap
tunnels to operate in collect metadata mode.  bpf_skb_[gs]et_tunnel_key()
helpers can make use of it right away.  OVS can use it as well in the
future.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-04 11:04:19 -05:00
Heiner Kallweit
c34bc2b505 net: phy: core: don't disable device interrupts in phy_change
If state is not PHY_HALTED I see no need to temporarily disable
interrupts on the device. As long as the current interrupt isn't acked
on the device no new interrupt can happen anyway.

In addition remove a unneeded enabling of interrupts in the state
machine when handling state PHY_CHANGELINK.

Tested on a Odroid-C2 with RTL8211F phy in interrupt mode.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-04 10:23:42 -05:00
Heiner Kallweit
a6d1642dab net: phy: core: remove now uneeded disabling of interrupts
After commits c974bdbc3e "net: phy: Use threaded IRQ, to allow IRQ from
sleeping devices" and 664fcf123a30 "net: phy: Threaded interrupts allow
some simplification" all relevant code pieces run in process context
anyway and I don't think we need the disabling of interrupts any longer.

Interestingly enough, latter commit already removed the comment
explaining why interrupts need to be temporarily disabled.

On my system phy interrupt mode works fine with this patch.
However I may miss something, especially in the context of shared phy
interrupts, therefore I'd appreciate if more people could test this.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-04 10:23:41 -05:00
David S. Miller
c2eb6d07a6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2017-12-02

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix a compilation warning in xdp redirect tracepoint due to
   missing bpf.h include that pulls in struct bpf_map, from Xie.

2) Limit the maximum number of attachable BPF progs for a given
   perf event as long as uabi is not frozen yet. The hard upper
   limit is now 64 and therefore the same as with BPF multi-prog
   for cgroups. Also add related error checking for the sample
   BPF loader when enabling and attaching to the perf event, from
   Yonghong.

3) Specifically set the RLIMIT_MEMLOCK for the test_verifier_log
   case, so that the test case can always pass and not fail in
   some environments due to too low default limit, also from
   Yonghong.

4) Fix up a missing license header comment for kernel/bpf/offload.c,
   from Jakub.

5) Several fixes for bpftool, among others a crash on incorrect
   arguments when json output is used, error message handling
   fixes on unknown options and proper destruction of json writer
   for some exit cases, all from Quentin.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03 13:08:30 -05:00
David S. Miller
e4485c7484 Merge branch 'tcp-cb-selinux-corruption'
Eric Dumazet says:

====================
tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()

James Morris reported kernel stack corruption bug that
we tracked back to commit 971f10eca186 ("tcp: better TCP_SKB_CB
layout to reduce cache line misses")

First patch needs to be backported to kernels >= 3.18,
while second patch needs to be backported to kernels >= 4.9, since
this was the time when inet_exact_dif_match appeared.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03 12:39:15 -05:00
David Ahern
b4d1605a8e tcp: use IPCB instead of TCP_SKB_CB in inet_exact_dif_match()
After this fix : ("tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()"),
socket lookups happen while skb->cb[] has not been mangled yet by TCP.

Fixes: a04a480d4392 ("net: Require exact match for TCP socket lookups if dif is l3mdev")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03 12:39:15 -05:00
Eric Dumazet
eeea10b83a tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()
James Morris reported kernel stack corruption bug [1] while
running the SELinux testsuite, and bisected to a recent
commit bffa72cf7f9d ("net: sk_buff rbnode reorg")

We believe this commit is fine, but exposes an older bug.

SELinux code runs from tcp_filter() and might send an ICMP,
expecting IP options to be found in skb->cb[] using regular IPCB placement.

We need to defer TCP mangling of skb->cb[] after tcp_filter() calls.

This patch adds tcp_v4_fill_cb()/tcp_v4_restore_cb() in a very
similar way we added them for IPv6.

[1]
[  339.806024] SELinux: failure in selinux_parse_skb(), unable to parse packet
[  339.822505] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81745af5
[  339.822505]
[  339.852250] CPU: 4 PID: 3642 Comm: client Not tainted 4.15.0-rc1-test #15
[  339.868498] Hardware name: LENOVO 10FGS0VA1L/30BC, BIOS FWKT68A   01/19/2017
[  339.885060] Call Trace:
[  339.896875]  <IRQ>
[  339.908103]  dump_stack+0x63/0x87
[  339.920645]  panic+0xe8/0x248
[  339.932668]  ? ip_push_pending_frames+0x33/0x40
[  339.946328]  ? icmp_send+0x525/0x530
[  339.958861]  ? kfree_skbmem+0x60/0x70
[  339.971431]  __stack_chk_fail+0x1b/0x20
[  339.984049]  icmp_send+0x525/0x530
[  339.996205]  ? netlbl_skbuff_err+0x36/0x40
[  340.008997]  ? selinux_netlbl_err+0x11/0x20
[  340.021816]  ? selinux_socket_sock_rcv_skb+0x211/0x230
[  340.035529]  ? security_sock_rcv_skb+0x3b/0x50
[  340.048471]  ? sk_filter_trim_cap+0x44/0x1c0
[  340.061246]  ? tcp_v4_inbound_md5_hash+0x69/0x1b0
[  340.074562]  ? tcp_filter+0x2c/0x40
[  340.086400]  ? tcp_v4_rcv+0x820/0xa20
[  340.098329]  ? ip_local_deliver_finish+0x71/0x1a0
[  340.111279]  ? ip_local_deliver+0x6f/0xe0
[  340.123535]  ? ip_rcv_finish+0x3a0/0x3a0
[  340.135523]  ? ip_rcv_finish+0xdb/0x3a0
[  340.147442]  ? ip_rcv+0x27c/0x3c0
[  340.158668]  ? inet_del_offload+0x40/0x40
[  340.170580]  ? __netif_receive_skb_core+0x4ac/0x900
[  340.183285]  ? rcu_accelerate_cbs+0x5b/0x80
[  340.195282]  ? __netif_receive_skb+0x18/0x60
[  340.207288]  ? process_backlog+0x95/0x140
[  340.218948]  ? net_rx_action+0x26c/0x3b0
[  340.230416]  ? __do_softirq+0xc9/0x26a
[  340.241625]  ? do_softirq_own_stack+0x2a/0x40
[  340.253368]  </IRQ>
[  340.262673]  ? do_softirq+0x50/0x60
[  340.273450]  ? __local_bh_enable_ip+0x57/0x60
[  340.285045]  ? ip_finish_output2+0x175/0x350
[  340.296403]  ? ip_finish_output+0x127/0x1d0
[  340.307665]  ? nf_hook_slow+0x3c/0xb0
[  340.318230]  ? ip_output+0x72/0xe0
[  340.328524]  ? ip_fragment.constprop.54+0x80/0x80
[  340.340070]  ? ip_local_out+0x35/0x40
[  340.350497]  ? ip_queue_xmit+0x15c/0x3f0
[  340.361060]  ? __kmalloc_reserve.isra.40+0x31/0x90
[  340.372484]  ? __skb_clone+0x2e/0x130
[  340.382633]  ? tcp_transmit_skb+0x558/0xa10
[  340.393262]  ? tcp_connect+0x938/0xad0
[  340.403370]  ? ktime_get_with_offset+0x4c/0xb0
[  340.414206]  ? tcp_v4_connect+0x457/0x4e0
[  340.424471]  ? __inet_stream_connect+0xb3/0x300
[  340.435195]  ? inet_stream_connect+0x3b/0x60
[  340.445607]  ? SYSC_connect+0xd9/0x110
[  340.455455]  ? __audit_syscall_entry+0xaf/0x100
[  340.466112]  ? syscall_trace_enter+0x1d0/0x2b0
[  340.476636]  ? __audit_syscall_exit+0x209/0x290
[  340.487151]  ? SyS_connect+0xe/0x10
[  340.496453]  ? do_syscall_64+0x67/0x1b0
[  340.506078]  ? entry_SYSCALL64_slow_path+0x25/0x25

Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: James Morris <james.l.morris@oracle.com>
Tested-by: James Morris <james.l.morris@oracle.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03 12:39:14 -05:00
Linus Torvalds
ae64f9bd1d Linux 4.15-rc2 v4.15-rc2 2017-12-03 11:01:47 -05:00
Linus Torvalds
87fc5c686e Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fix from Russell King:
 "Just one fix this time around, for the late commit in the merge window
  that triggered a problem with qemu. Qemu is apparently also going to
  receive a fix for the discovered issue"

* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: avoid faulting on qemu
2017-12-03 10:51:08 -05:00
Linus Torvalds
ae4806a38b Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
 "Here are two bugfixes for I2C, fixing a memleak in the core and irq
  allocation for i801.

  Also three bugfixes for the at24 eeprom driver which Bartosz collected
  while taking over maintainership for this driver"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  eeprom: at24: check at24_read/write arguments
  eeprom: at24: fix reading from 24MAC402/24MAC602
  eeprom: at24: correctly set the size for at24mac402
  i2c: i2c-boardinfo: fix memory leaks on devinfo
  i2c: i801: Fix Failed to allocate irq -2147483648 error
2017-12-03 10:48:24 -05:00
Linus Torvalds
49a418d783 hwmon fixes for v4.15-rc2
Drop reference to obsolete maintainer tree
 Fix overflow bug in pmbus driver
 Fix SMBUS timeout problem in jc42 driver
 
 For the SMBUS timeout handling, we had a brief discussion if this should
 be considered a bug fix or a feature. Peter says "it fixes real problems
 where the application misbehave due to faulty content when reading from
 an eeprom", and he needs the patch in his company's v4.14 images. This is
 good enough for me and warrants backport to stable kernels.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJaItAeAAoJEMsfJm/On5mBYrQQAJz+Ukg8blLKg1bqb01nZxEY
 4vOxqZySpqR5icW6KP0zn/ws8eKyFjGQQsRxoRePo9Zfj2Y9RKRKVOoRrs7McZ6s
 mO12KJBf13cGiB+Msm8JsjJv81E15RnDwsWw39+SPms3ueKiBDAl4xaH6PSeGKTZ
 zB8teDk7MLLwCplRuNbB3qrc0BCj4AgeAu3omHO7PpKClOCHRieJPLaFE2Fpzsu/
 P4RxUn4CFY0urgWJ5b9g5A3FdH8lOz8nfkiWPPnEb/IF+8tR9M3GYzwOp5r2uVul
 uKszDMKKx3Q+Hi+67/Ou2uLhCDnaxYtFHiN+REB9dRi33BHSuIsc4riCqa1ZMz8c
 XWQLbQq3u0bS1XgiegD4nihF2iNrj0fMcy2dcnUVWJNFKrfHjGAIodY0cKZJsZKW
 RqzWlX/aUVIpCSKxtJm0xDNPJS5FqKXLCV0xsNMF2Mz2JosT8o4IARZNlY7Ap0Be
 kRiQMXA/3y2RyeOUi82YHM0MorMt4icmTT3ztRrJpVbM1MBiiX2SefZquai5RLgp
 o/qTOrJ0gD8XzBhwP8wQYP5BvpPX2UX3V0sjLcRDNakqaOaDuslAMtzZ0sNn37Ng
 R+sAJNuFkLZkzhsa8IZSidngWMLvGS3Zjh2N75v6HcEaLrVoK2p6rBJM82Dg8cmp
 0GwZkjd72bdXIXuRLpri
 =5rYP
 -----END PGP SIGNATURE-----

Merge tag 'hwmon-for-linus-v4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:
 "Fixes:

   - Drop reference to obsolete maintainer tree

   - Fix overflow bug in pmbus driver

   - Fix SMBUS timeout problem in jc42 driver

  For the SMBUS timeout handling, we had a brief discussion if this
  should be considered a bug fix or a feature. Peter says "it fixes real
  problems where the application misbehave due to faulty content when
  reading from an eeprom", and he needs the patch in his company's v4.14
  images. This is good enough for me and warrants backport to stable
  kernels"

* tag 'hwmon-for-linus-v4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (jc42) optionally try to disable the SMBUS timeout
  hwmon: (pmbus) Use 64bit math for DIRECT format values
  hwmon: Drop reference to Jean's tree
2017-12-03 10:46:16 -05:00
David S. Miller
c6d3c96f8d Merge branch 'tcp-2nd-listener-hash'
Martin KaFai Lau says:

====================
tcp: Add a 2nd listener hashtable (port+addr)

This patch set adds a 2nd listener hashtable.  It is to resolve
the performance issue when a process is listening at many IP
addresses with the same port (e.g. [IP1]:443, [IP2]:443... [IPN]:443)

v2:
- Move the new lhash2 and lhash2_mask before the existing
  listening_hash to avoid adding another cacheline
  to inet_hashinfo (Suggested by Eric Dumazet, Thanks!)
- I take this chance to plug an existing 4 bytes hole while
  adding 'unsigned int lhash2_mask'.
- Add some comments about lhash2 in inet_hashtables.h
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03 10:18:28 -05:00
Martin KaFai Lau
27da6d37e2 tcp: Enable 2nd listener hashtable in TCP
Enable the second listener hashtable in TCP.
The scale is the same as UDP which is one slot per 2MB.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03 10:18:28 -05:00
Martin KaFai Lau
61b7c691c7 inet: Add a 2nd listener hashtable (port+addr)
The current listener hashtable is hashed by port only.
When a process is listening at many IP addresses with the same port (e.g.
[IP1]:443, [IP2]:443... [IPN]:443), the inet[6]_lookup_listener()
performance is degraded to a link list.  It is prone to syn attack.

UDP had a similar issue and a second hashtable was added to resolve it.

This patch adds a second hashtable for the listener's sockets.
The second hashtable is hashed by port and address.

It cannot reuse the existing skc_portaddr_node which is shared
with skc_bind_node.  TCP listener needs to use skc_bind_node.
Instead, this patch adds a hlist_node 'icsk_listen_portaddr_node' to
the inet_connection_sock which the listener (like TCP) also belongs to.

The new portaddr hashtable may need two lookup (First by IP:PORT.
Second by INADDR_ANY:PORT if the IP:PORT is a not found).   Hence,
it implements a similar cut off as UDP such that it will only consult the
new portaddr hashtable if the current port-only hashtable has >10
sk in the link-list.

lhash2 and lhash2_mask are added to 'struct inet_hashinfo'.  I take
this chance to plug a 4 bytes hole.  It is done by first moving
the existing bind_bucket_cachep up and then add the new
(int lhash2_mask, *lhash2) after the existing bhash_size.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03 10:18:28 -05:00
Martin KaFai Lau
f0b1e64c13 udp: Move udp[46]_portaddr_hash() to net/ip[v6].h
This patch moves the udp[46]_portaddr_hash()
to net/ip[v6].h.  The function name is renamed to
ipv[46]_portaddr_hash().

It will be used by a later patch which adds a second listener
hashtable hashed by the address and port.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03 10:18:28 -05:00