IPv6 already exposes some address family data via netlink in the
IFLA_PROTINFO attribute if RTM_GETLINK request is sent with the
address family set to AF_INET6. We take over this format and
reuse all the code.
Signed-off-by: Thomas Graf <tgraf@infradead.org>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implements the AF_INET link address family exposing the per
device configuration settings via netlink using the attribute
IFLA_INET_CONF.
The format of IFLA_INET_CONF differs depending on the direction
the attribute is sent. The attribute sent by the kernel consists
of a u32 array, basically a 1:1 copy of in_device->cnf.data[].
The attribute expected by the kernel must consist of a sequence
of nested u32 attributes, each representing a change request,
e.g.
[IFLA_INET_CONF] = {
[IPV4_DEVCONF_FORWARDING] = 1,
[IPV4_DEVCONF_NOXFRM] = 0,
}
libnl userspace API documentation and example available from:
http://www.infradead.org/~tgr/libnl/doc-git/group__link__inet.html
Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define IPV4_DEVCONF_MAX to get rid of MAX - 1 notation.
Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each net_device contains address family specific data such as
per device settings and statistics. We already expose this data
via procfs/sysfs and partially netlink.
The netlink method requires the requester to send one RTM_GETLINK
request for each address family it wishes to receive data of
and then merge this data itself.
This patch implements a new API which combines all address family
specific link data in a new netlink attribute IFLA_AF_SPEC.
IFLA_AF_SPEC contains a sequence of nested attributes, one for each
address family which in turn defines the structure of its own
attribute. Example:
[IFLA_AF_SPEC] = {
[AF_INET] = {
[IFLA_INET_CONF] = ...,
},
[AF_INET6] = {
[IFLA_INET6_FLAGS] = ...,
[IFLA_INET6_CONF] = ...,
}
}
The API also allows for address families to implement a function
which parses the IFLA_AF_SPEC attribute sent by userspace to
implement address family specific link options.
Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch helps clarify documentation for
net.ipv4.igmp_max_memberships by providing a formula for
calculating the maximum number of multicast groups that can be
subscribed to, plus defining the theoretical limit.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Jeremy Eder <jeder@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add __attribute__((format... to several functins
Make formats and arguments match.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SELinux netfilter hooks just return NF_DROP if they drop a packet. We
want to signal that a drop in this hook is a permanant fatal error and is not
transient. If we do this the error will be passed back up the stack in some
places and applications will get a faster interaction that something went
wrong.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current tcp_connect code completely ignores errors from sending an skb.
This makes sense in many situations (like -ENOBUFFS) but I want to be able to
immediately fail connections if they are denied by the SELinux netfilter hook.
Netfilter does not normally return ECONNREFUSED when it drops a packet so we
respect that error code as a final and fatal error that can not be recovered.
Based-on-patch-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SELinux would like to pass certain fatal errors back up the stack. This patch
implements the generic netfilter support for this functionality.
Based-on-patch-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With enabling CONFIG_PCI_MSI, e1000e could work in MSI/MSI-X IRQ mode,
and netpoll controller didn't deal with those IRQ modes on e1000e.
This patch add the handling MSI/MSI-X IRQ modes to netpoll controller,
so that netconsole could work with those IRQ modes.
Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The driver can fail initializing the hardware when manageability firmware
is performing concurrent MDIO operations because the hardware semaphore
scheme to prevent concurrent operations between software and firmware is
incorrect for 82574/82583. Instead of using the SWSM register, the driver
should be using the EXTCNF_CTRL register. A software mutex is also added
to prevent simultaneous software threads from performing similar concurrent
accesses.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
SerDes Link detection on certain 82571 mezzanine cards can fail when the
link is forced, the link partner does not recognize forced link and the
link partner sends null code words. Detect the null code words and return
to auto-negotiation state which causes the link partner to begin responding
with valid code words. Within a reasonable interval the link will finally
settle as forced by both partners.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Update version string and copyright notice
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The driver is calling netif_carrier_off and netif_tx_stop_all_queues
before the netdevice is registered which causes an Oops. Move call
to netif_carrier_off after the netdevice is registered and remove
call to netif_tx_stop_all_queues because there aren't any TX
queues yet.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
I noticed ring variable was initialized before allocations, and that
memory node management was a bit ugly. We also leak memory in case of
ring allocations error.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds support for the x540 MAC which is the next MAC in the
82598/82599 line.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Adds the new x540.c file and Aquantia 1202 PHY for X540 support.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The new MAC type X540 shares much of the same functionality of
some silicon specific functions. To reduce duplicate code,
made these functions generic.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When invalidating the DDP context is invalidated, the HW may not be done
with the user buffer right away. In which case, we poll the FCBUFF register
to check if the buffer valid bit is cleared or not, if not, we wait for max
100us that is guaranteed by the HW.
Signed-off-by: Yi Zou <yi.zou@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The hw automatically invalidates the context if DDP is successful or there is
error detected. In case there is no error status available from the hw,
initializing the per context error status to be 1 allows the DDP context to be
still invalidated via the upper layer call to ddp_put().
Signed-off-by: Yi Zou <yi.zou@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
There is no point to allow incoming DDP requests from the upper layer stack if
the adapter is going down or being reset.
Signed-off-by: Yi Zou <yi.zou@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The Tx hang logic has been known to detect false hangs when
the device is receiving pause frames or has delayed processing
for some other reason.
This patch makes the logic more robust and resolves these
known issues. The old logic checked to see if the device
was paused by querying the HW then the hang logic was
aborted if the device was currently paused. This check was
racy because the device could have been in the pause state
any time up to this check. The other operation of the
hang logic is to verify the Tx ring is still advancing
the old logic checked the EOP timestamp. This is not
sufficient to determine the ring is not advancing but
only infers that it may be moving slowly.
Here we add logic to track the number of completed Tx
descriptors and use the adapter stats to check if any
pause frames have been received since the previous Tx
hang check. This way we avoid racing with the HW
register and do not detect false hangs if the ring is
advancing slowly.
This patch is primarily the work of Jesse Brandeburg. I
clean it up some and fixed the PFC checking.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change resolves some null function pointer accesses on 82598 when a
multi-speed fiber module is inserted into the adapter.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The q_vector back pointer was not being set in the rings so it would not
have been possible to determine the parent q_vector of the ring.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change cleans up some of the items in ixgbe_map_rings_to_vectors.
Specifically it merges the two for loops and drops the unnecessary vectors
parameter.
It also moves the vector names into the q_vectors themselves.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change is meant to improve the stack utilization and simplify the math
used in ixgbe_set_itr_msix.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
There are a number of places where we use the variable j to contain the
register index of the ring. Instead of using such a non-descriptive
variable name it is better that we name it reg_idx so that it is clear what
the variable contains.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change is just to cleanup some confusing logic in ixgbe_cache_ring_rss
which can be simplified by adding a conditional with return to the start of
the call.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change address a few whitespace issues in DCB #ifdefs, adds a comment
calling out the DCB specific registers, and nests an if statement inline
with a number of if statements related to flow control.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change adds support for certain 82599 based Mezzanine adapters.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change makes it so that we always disable SCTP regardless of mac type
since we shouldn't need to check mac type before disabling a feature that
isn't supported on a given piece of hardware.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change replaces a number of if/elseif/else statements with switch
statements to support the addition of future devices to the ixgbe driver.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change cleans up the use of rsc_count and changes it to a boolean since
the actual numerical value is used nowhere in the Rx cleanup path. I am
also moving the skb count into the RSC_CB path since it is much easier to
track it there than when it is passed as a parameter to various function
calls.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change cleans up the ixgbe_atr filter setup function so that it uses
fewer items from the stack. Since the code is only applicable to IPv4 w/
TCP it makes sense to just use the pointers based on the headers themselves
instead of copying them to temp variables and then writing those to the
filters.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The code for ixgbe_clean_rx_irq was much more tangled up than it needed to
be in terms of logic statements and unused variables. This change
untangles much of that and drops several unused variables such as cleaned
which was being returned but never checked.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This changes the numbering scheme slightly. Previously the ordering was
coming out like this:
Rx-2
Rx-1
Rx-0
TxRx-0
Which would drop two queues on CPU 0. This change makes it so that the
ordering is like this:
Rx-3
Rx-2
Rx-1
TxRx-0
This means that each CPU will have it's own Rx queue, and only CPU 0 will
have the Tx queue.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The code as it existed could re-arm the queues when it was requesting a HW
reset due to a TX hang. Instead of doing that this change makes it so that
we will just exit if the hardware is believed to be hung.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
RSC will flush its descriptors every time the interrupt throttle timer
expires. In addition there are known issues with RSC when the rx-usecs
value is set too low. As such we are forced to clear the RSC_ENABLED bit
and reset the adapter when the rx-usecs value is set too low.
However we do not need to clear the NETIF_F_LRO flag because it is used to
indicate that the user wants to leave the LRO feature enabled, and in fact
with this change we will now re-enable RSC as soon as the rx-usecs value is
increased and the flag is still set.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>