Commit Graph

377135 Commits

Author SHA1 Message Date
Li RongQing
8249152c47 xen-netfront: use skb_partial_csum_set() to simplify the codes
use skb_partial_csum_set() to simplify the codes

Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-11 02:34:36 -07:00
David S. Miller
2e42206948 Merge branch 'bridge_flags'
Vlad Yasevich says:

====================
The following series adds 2 new flags to bridge.  One flag allows
the user to control whether mac learning is performed on the interface
or not.  By default mac learning is on.
The other flag allows the user to control whether unicast traffic
is flooded (send without an fdb) to a given unicast port.  Default is
on.

Changes since v4:
 - Implemented Stephen's suggestions.

Changes since v2:
 - removed unused "unlock" tag.

Changes since v1:
 - Integrated suggestion from MST to not impact RTM_NEWNEIGH and to
   skip lookups when learning is disabled.

Vlad Yasevich (2):
  bridge: Add flag to control mac learning.
  bridge: Add a flag to control unicast packet flood.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-11 02:04:43 -07:00
Vlad Yasevich
867a59436f bridge: Add a flag to control unicast packet flood.
Add a flag to control flood of unicast traffic.  By default, flood is
on and the bridge will flood unicast traffic if it doesn't know
the destination.  When the flag is turned off, unicast traffic
without an FDB will not be forwarded to the specified port.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-11 02:04:32 -07:00
Vlad Yasevich
9ba18891f7 bridge: Add flag to control mac learning.
Allow user to control whether mac learning is enabled on the port.
By default, mac learning is enabled.  Disabling mac learning will
cause new dynamic FDB entries to not be created for a particular port.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-11 02:04:32 -07:00
Cong Wang
30f3a40f9a net: remove last caller of skb_tail_offset() and itself
Similar to the following commits:

commit 00f97da17a (netpoll: fix position of network header)
commit 525cebedb3 (pktgen: Fix position of ip and udp header)

using skb_tail_offset() seems not correct since the offset
is based on head pointer.

With the last caller removed, skb_tail_offset() can be killed
finally.

Cc: Thomas Graf <tgraf@suug.ch>
Cc: Daniel Borkmann <dborkmann@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-10 22:22:23 -07:00
David S. Miller
0a4db187a9 Merge branch 'll_poll'
Eliezer Tamir says:

====================
This patch set adds the ability for the socket layer code to
poll directly on an Ethernet device's RX queue.
This eliminates the cost of the interrupt and context switch
and with proper tuning allows us to get very close to the HW latency.

This is a follow up to Jesse Brandeburg's Kernel Plumbers talk from
last year
http://www.linuxplumbersconf.org/2012/wp-content/uploads/2012/09/2012-lpc-Low-Latency-Sockets-slides-brandeburg.pdf

Patch 1 adds a napi_id and a hashing mechanism to lookup a napi by id.
Patch 2 adds an ndo_ll_poll method and the code that supports it.
Patch 3 adds support for busy-polling on UDP sockets.
Patch 4 adds support for TCP.
Patch 5 adds the ixgbe driver code implementing ndo_ll_poll.
Patch 6 adds additional statistics to the ixgbe driver for ndo_ll_poll.

Performance numbers:
     setup                         TCP_RR           UDP_RR
kernel  Config     C3/6 rx-usecs tps cpu% S.dem  tps cpu% S.dem
patched optimized  on   100      87k 3.13 11.4   94K 3.17 10.7
patched optimized  on   0        71k 3.12 14.0   84k 3.19 12.0
patched optimized  on   adaptive 80k 3.13 12.5   90k 3.46 12.2
patched typical    on   100      72  3.13 14.0   79k 3.17 12.8
patched typical    on   0        60k 2.13 16.5   71k 3.18 14.0
patched typical    on   adaptive 67k 3.51 16.7   75k 3.36 14.5
3.9     optimized  on   adaptive 25k 1.0  12.7   28k 0.98 11.2
3.9     typical    off  0        48k 1.09  7.3   52k 1.11 4.18
3.9     typical    0ff  adaptive 35k 1.12 4.08   38k 0.65 5.49
3.9     optimized  off  adaptive 40k 0.82 4.83   43k 0.70 5.23
3.9     optimized  off  0        57k 1.17 4.08   62k 1.04 3.95

Test setup details:
Machines: each with two Intel Xeon 2680 CPUs and X520 (82599) optical
NICs
Tests: Netperf tcp_rr and udp_rr, 1 byte (round trips per second)
Kernel: unmodified 3.9 and patched 3.9
Config: typical is derived from RH6.2, optimized is a stripped down
config.
Interrupt coalescing (ethtool rx-usecs) settings: 0=off, 1=adaptive,
100 us
When C3/6 states were turned on (via BIOS) the performance governor
was used.

These performance numbers were measured with v2 of the patch set.
Performance of the optimized config with an rx-usecs setting of 100
(the first line in the table above) was tracked during the evolution
of the patches and has never varied by more than 1%.

Design:
A global hash table that allows us to look up a struct napi by a
unique id was added.

A napi_id field was added both to struct sk_buff and struct sk.
This is used to track which NAPI we need to poll for a specific
socket.

The device driver marks every incoming skb with this id.
This is propagated to the sk when the socket is looked up in the
protocol handler.

When the socket code does not find any more data on the socket queue,
it now may call ndo_ll_poll which will crank the device's rx queue and
feed incoming packets to the stack directly from the context of the
socket.

A sysctl value (net.core4.low_latency_poll) controls how many
microseconds we busy-wait before giving up. (setting to 0 globally
disables busy-polling)

Locking:

1. Locking between napi poll and ndo_ll_poll:
Since what needs to be locked between a device's NAPI poll and
ndo_ll_poll, is highly device / configuration dependent, we do this
inside the Ethernet driver.
For example, when packets for high priority connections are sent to
separate rx queues, you might not need locking between napi poll and
ndo_ll_poll at all.

For ixgbe we only lock the RX queue.
ndo_ll_poll does not touch the interrupt state or the TX queues.
(earlier versions of this patchset did touch them,
but this design is simpler and works better.)

If a queue is actively polled by a socket (on another CPU) napi poll
will not service it, but will wait until the queue can be locked
and cleaned before doing a napi_complete().
If a socket can't lock the queue because another CPU has it,
either from napi or from another socket polling on the queue,
the socket code can busy wait on the socket's skb queue.

Ndo_ll_poll does not have preferential treatment for the data from the
calling socket vs. data from others, so if another CPU is polling,
you will see your data on this socket's queue when it arrives.

Ndo_ll_poll is called with local BHs disabled, so it won't race on
the same CPU with net_rx_action, which calls the napi poll method.

2. Napi_hash
The napi hash mechanism uses RCU.
napi_by_id() must be called under rcu_read_lock().
After a call to napi_hash_del(), caller must take care to wait an rcu
grace period before freeing the memory containing the napi struct.
(Ixgbe already had this because the queue vector structure uses rcu to
protect the statistics counters in it.)

how to test:

1. The patchset should apply cleanly to net-next.
(don't forget to configure INET_LL_RX_POLL).

2. The ethtool -c setting for rx-usecs should be on the order of 100.

3. Use ethtool -K to disable GRO and LRO
(You are encouraged to try it both ways. If you find that your
workload
does better with GRO on do tell us.)

4. Sysctl value net.core.low_latency_poll controls how long
(in us) to busy-wait for more data, You are encouraged to play
with this and see what works for you. The default is now 0 so you need
to
set it to turn the feature on. I recommend a value around 50.

4. benchmark thread and IRQ should be bound to separate cores.
Both cores should be on the same CPU NUMA node as the NIC.
When the app and the IRQ run on the same CPU  you get a small penalty.
If interrupt coalescing is set to a low value this penalty can be very
large.

5. If you suspect that your machine is not configured properly,
use numademo to make sure that the CPU to memory BW is OK.
numademo 128m memcpy local copy numbers should be more than
8GB/s on a properly configured machine.

Change log:
v10
- removed select/poll support. (we will work on this some more and try again)
v9
- correct sysctl proc_handler, reported by Eric Dumazet and Amir Vadai.
- more int -> bool changes, reported by Eric Dumazet.
- better mask testing in sock_poll(), reported by Eric Dumazet.

v8
- split out udp and select/poll into separate patches.
  what used to be patch 2/5 is now three patches.
- type corrections from Amir Vadai and Cong Wang:
  one unsigned long that was left when changing to cycles_t
  int -> bool
- more detailed patch descriptions.

v7
- suggested by Ben Hutchings and Eric Dumazet:
  type fixes, static for globals in net/core.c,
  avoid napi_id collisions in napi_hash_add()

v6
- many small fixes suggested by Eric Dumazet:
  data locality, typos, documentation
  protect napi_hash insert/delete with a spinlock (napi_gen_id is no
  longer atomic_t since it's only accessed with the spinlock held.)
- added IPv6 TCP and UDP support (only minimally tested)

v5
- corrections suggested by Ben Hutchings:
  fixed typos, moved the config option and sysctl value from IPv4 to net
- moved sk_mark_ll() to the protocol handlers
- removed global id mechanism, replaced with a hashed napi_id.
  based on code sample from Eric Dumazet
  Note that ixgbe_free_q_vector() already waits an rcu grace period
  before freeing the q_vector, so nothing additional needs to be done
  when adding a call to napi_hash_del().
- simple poll/select support

v4
- removed separate config option for TCP as suggested Eric Dumazet.
- added linux mib counter for packets received through the low latency path,
  as suggested by Andi Kleen.
- re-allow module unloading, remove module param, use a global generation id
  instead to prevent the use of a stale napi pointer, as suggested
  by Eric Dumazet
- updated Documentation/networking/ip-sysctl.txt text

v3
- coding style changes suggested by Dave Miller

v2
- the sysctl knob is now in microseconds. The default value is now 0 (off).
- for now the code depends at configure time on CONFIG_I86_TSC
- the napi reference in struct skb is now a union with the dma cookie
  since the former is only used on RX and the latter on TX,
  as suggested by Eric Dumazet.
- we do a better job at honoring non-blocking operations.
- removed busy-polling support for tcp_read_sock()
- remove dynamic disabling of GRO
- coding style fixes
- disallow unloading the device module after the feature has been used

Credit:
Jesse Brandeburg, Arun Chekhov Ilango, Julie Cummings,
Alexander Duyck, Eric Geisler, Jason Neighbors, Yadong Li,
Mike Polehn, Anil Vasudevan, Don Wood
Special thanks for finding bugs in earlier versions:
Willem de Bruijn and Andi Kleen
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-10 21:23:57 -07:00
Eliezer Tamir
7e15b90ff9 ixgbe: add extra stats for ndo_ll_poll
Add additional statistics to the ixgbe driver for ndo_ll_poll
Defined under LL_EXTENDED_STATS

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-10 21:22:36 -07:00
Eliezer Tamir
5a85e737f3 ixgbe: add support for ndo_ll_poll
Add the ixgbe driver code implementing ndo_ll_poll.
Adds ndo_ll_poll method and locking between it and the napi poll.
When receiving a packet we use skb_mark_ll to record the napi it came from.
Add each napi to the napi_hash right after netif_napi_add().

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-10 21:22:36 -07:00
Eliezer Tamir
d30e383bb8 tcp: add low latency socket poll support.
Adds low latency socket poll support for TCP.
In tcp_v[46]_rcv() add a call to sk_mark_ll() to copy the napi_id
from the skb to the sk.
In tcp_recvmsg(), when there is no data in the socket we busy-poll.
This is a good example of how to add busy-poll support to more protocols.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Tested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-10 21:22:36 -07:00
Eliezer Tamir
a5b50476f7 udp: add low latency socket poll support
Add upport for busy-polling on UDP sockets.
In __udp[46]_lib_rcv add a call to sk_mark_ll() to copy the napi_id
from the skb into the sk.
This is done at the earliest possible moment, right after we identify
which socket this skb is for.
In __skb_recv_datagram When there is no data and the user
tries to read we busy poll.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Tested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-10 21:22:36 -07:00
Eliezer Tamir
0602129286 net: add low latency socket poll
Adds an ndo_ll_poll method and the code that supports it.
This method can be used by low latency applications to busy-poll
Ethernet device queues directly from the socket code.
sysctl_net_ll_poll controls how many microseconds to poll.
Default is zero (disabled).
Individual protocol support will be added by subsequent patches.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Tested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-10 21:22:35 -07:00
Eliezer Tamir
af12fa6e46 net: add napi_id and hash
Adds a napi_id and a hashing mechanism to lookup a napi by id.
This will be used by subsequent patches to implement low latency
Ethernet device polling.
Based on a code sample by Eric Dumazet.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-10 21:22:35 -07:00
Maxime Bizon
6f00a02296 bcm63xx_enet: add support for Broadcom BCM63xx integrated gigabit switch
Newer Broadcom BCM63xx SoCs: 6328, 6362 and 6368 have an integrated switch
which needs to be driven slightly differently from the traditional
external switches. This patch introduces changes in arch/mips/bcm63xx in order
to:

- register a bcm63xx_enetsw driver instead of bcm63xx_enet driver
- update DMA channels configuration & state RAM base addresses
- add a new platform data configuration knob to define the number of
  ports per switch/device and force link on some ports
- define the required switch registers

On the driver side, the following changes are required:

- the switch ports need to be polled to ensure the link is up and
  running and RX/TX can properly work
- basic switch configuration needs to be performed for the switch to
  forward packets to the CPU
- update the MIB counters since the integrated

Signed-off-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Jonas Gorski <jogo@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-10 14:28:28 -07:00
Maxime Bizon
0ae99b5fed bcm63xx_enet: split DMA channel register accesses
The current bcm63xx_enet driver always uses bcmenet_shared_base whenever
it needs to access DMA channel configuration space or access the DMA
channel state RAM. Split these register in 3 parts to be more accurate:

- global DMA configuration
- per DMA channel configuration space
- per DMA channel state RAM space

This is preliminary to support new chips where the global DMA
configuration remains the same, but there is a varying number of DMA
channels located at a different memory offset.

Signed-off-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Jonas Gorski <jogo@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-10 14:28:27 -07:00
Maxime Bizon
7260aac974 bcm63xx_enet: implement reset autoneg ethtool callback
Implement the rset_nway ethtool callback which uses libphy generic
autonegotiation restart function.

Signed-off-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-10 14:28:27 -07:00
Jason Wang
df09b36f22 macvtap: enable multiqueue flag
To notify the userspace about our capability of multiqueue.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07 23:49:09 -07:00
Jason Wang
815f236d62 macvtap: add TUNSETQUEUE ioctl
This patch adds TUNSETQUEUE ioctl to let userspace can temporarily disable or
enable a queue of macvtap. This is used to be compatible at API layer of tuntap
to simplify the userspace to manage the queues. This is done through introducing
a linked list to track all taps while using vlan->taps array to only track
active taps.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07 23:49:09 -07:00
Jason Wang
376b1aabe1 macvtap: eliminate linear search
Linear search were used in both get_slot() and macvtap_get_queue(), this is
because:

- macvtap didn't reshuffle the array of taps when create or destroy a queue, so
  when adding a new queue, macvtap must do linear search to find a location for
  the new queue. This will also complicate the TUNSETQUEUE implementation for
  multiqueue API.
- the queue itself didn't track the queue index, so the we must do a linear
  search in the array to find the location of a existed queue.

The solution is straightforward: reshuffle the array and introduce a queue_index
to macvtap_queue.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07 23:49:09 -07:00
Jason Wang
f0afce01aa macvlan: change the max number of queues to 16
Macvtap should be at least compatible with tap, so change the max number to 16.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07 23:49:09 -07:00
Jason Wang
8f475a318a macvtap: introduce macvtap_get_vlan()
Factor out the device holding logic to a macvtap_get_vlan(), this will be also
used by multiqueue API.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07 23:49:08 -07:00
Jason Wang
1d2f41ed23 macvlan: switch to use IS_ENABLED()
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07 23:49:08 -07:00
Jason Wang
89cee917de macvtap: do not add self to waitqueue if doing a nonblock read
There's no need to add self to waitqueue if doing a nonblock read. This could
help to avoid the spinlock contention.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07 23:49:08 -07:00
Jason Wang
ed0483fa06 macvtap: fix a possible race between queue selection and changing queues
Complier may generate codes that re-read the vlan->numvtaps during
macvtap_get_queue(). This may lead a race if vlan->numvtaps were changed in the
same time and which can lead unexpected result (e.g. very huge value).

We need prevent the compiler from generating such codes by adding an
ACCESS_ONCE() to make sure vlan->numvtaps were only read once.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07 23:49:08 -07:00
David S. Miller
e403d29581 sh_eth: Fix warnings on 64-bit.
Don't cast a plain integer to a pointer.

drivers/net/ethernet/renesas/sh_eth.c: In function ‘sh_eth_chip_reset_giga’:
drivers/net/ethernet/renesas/sh_eth.c:482:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
drivers/net/ethernet/renesas/sh_eth.c:483:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
drivers/net/ethernet/renesas/sh_eth.c:492:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
drivers/net/ethernet/renesas/sh_eth.c:493:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07 23:40:41 -07:00
Sergei Shtylyov
eb770bf430 sh_eth: remove dependencies from Kconfig
Since dependence on the certain SoCs is no  longer  necessary to compile the
driver, remove the dependency list from its Kconfig entry which is a popular
demand anyway...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07 23:38:24 -07:00
Sergei Shtylyov
589ebdef7e sh_eth: get R8A777x support out of #ifdef
Get the R-Car code/data in the driver out of #ifdef by adding "r8a777x-ether" to
the platfrom driver's  ID table; since it's the last #ifdef, we remove CARDNAME
from the  ID table and no longer check  the driver data before  assigning it to
'mdp->cd'...
Change the Ether platform device's name in the ARM platform code accordingly.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07 23:38:24 -07:00
Sergei Shtylyov
9c3beaabb9 sh_eth: get SH7724 support out of #ifdef
Get the SH7724 code/data in the driver out of #ifdef by adding "r8a7724-ether"
to the platform driver's ID table. Change the Ether platform device's name in
the SH platform code accordingly.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07 23:38:24 -07:00
Sergei Shtylyov
24549e2a0f sh_eth: get SH7757 support out of #ifdef
Get the SH7757 code/data in the driver out of #ifdef by adding "sh7757-ether"
and "sh7757-gether" to the platform driver's ID table.  Note that we can remove
SH_ETH_HAS_BOTH_MODULES and sh_eth_get_cpu_data().
Change the Ether/GEther platform devices' names in the SH platform code
accordingly.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07 23:38:24 -07:00
Sergei Shtylyov
f5d12767c8 sh_eth: get SH77{34|63} support out of #ifdef
Get the SH77{34|63} specific code/data in the driver out of #ifdef by adding
"sh7734-gether" and "sh7763-gether" to the platform driver's ID table.  Note
that we have to split the 'struct sh_eth_cpu_data' instance into two due to
#ifdef inside it; note that we can kill the duplicate sh_eth_set_rate_gether().
Change the GEther platform device's name in the SH platform code accordingly.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07 23:38:24 -07:00
Sergei Shtylyov
e5c9b4cd66 sh_eth: get R8A7740 support out of #ifdef
Get the R8A7740 code/data in the driver out of #ifdef by adding "r8a7740-gether"
to the platform driver's ID table.  Change the GEther platform device's name in
the ARM platform code accordingly.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07 23:38:24 -07:00
Sergei Shtylyov
c18a79abe3 sh_eth: get SH7619 support out of #ifdef
Get  the SH7619 data in the driver out of #ifdef by adding "sh7619-ether" to the
platform driver's ID table. Change the Ether platform device's name  in  the SH
platform code accordingly.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07 23:38:23 -07:00
Sergei Shtylyov
7bbe150d8c sh_eth: get SH771x support out of #ifdef
Get the SH771[02] data in the driver out of #ifdef by adding "sh771x-ether" to
the platform driver's ID table. Change the Ether platform device's name in the
SH platform code accordingly.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07 23:38:23 -07:00
Sergei Shtylyov
afe391ad4b sh_eth: create initial ID table
We are trying to get away from the current driver's scheme of identifying a SoC
based on #ifdef's and the platform device ID table matching seems to be a good
replacement -- we can use the 'driver_data' field of 'struct platform_device_id'
as a pointer to a 'struct sh_eth_cpu_data'. Start by creating the initial table
with driver's name as the only entry without the driver data. Check  the driver
data in the probe() method and if it's not NULL override 'mdp->cd' from it.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07 23:38:23 -07:00
Rami Rosen
e8b265e8ba doc:networking: Fix default value (icmp_ignore_bogus_error_responses).
This patch fixes icmp_ignore_bogus_error_responses default value to be 1 instead
of FALSE. It is initialized to 1 in icmp_sk_init().

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07 23:31:54 -07:00
Pablo Neira Ayuso
c05cdb1b86 netlink: allow large data transfers from user-space
I can hit ENOBUFS in the sendmsg() path with a large batch that is
composed of many netlink messages. Here that limit is 8 MBytes of
skbuff data area as kmalloc does not manage to get more than that.

While discussing atomic rule-set for nftables with Patrick McHardy,
we decided to put all rule-set updates that need to be applied
atomically in one single batch to simplify the existing approach.
However, as explained above, the existing netlink code limits us
to a maximum of ~20000 rules that fit in one single batch without
hitting ENOBUFS. iptables does not have such limitation as it is
using vmalloc.

This patch adds netlink_alloc_large_skb() which is only used in
the netlink_sendmsg() path. It uses alloc_skb if the memory
requested is <= one memory page, that should be the common case
for most subsystems, else vmalloc for higher memory allocations.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07 16:26:34 -07:00
Jay Vosburgh
1b5acd2923 bonding: disallow change of MAC if fail_over_mac enabled
Currently, if fail_over_mac is set to active, then attempts to
change the MAC of the bond itself silently fail.  However, if fail_over_mac
is set to follow, changes are permitted.

	Permitting the bond's MAC to change with fail_over_mac=follow
will disrupt the follow functionality, which normally controls the
assignment of MAC address to the bond and its slaves, and can cause
multiple ports to be assigned the same MAC address. which will interfere
with the functioning of the device (where the device here is a
virtualization-aware card for s390, qeth).

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07 15:05:51 -07:00
Jay Vosburgh
303d1cbf61 bonding: Convert hw addr handling to sync/unsync, support ucast addresses
This patch converts bonding to use the dev_uc/mc_sync and
dev_uc/mc_sync_multiple functions for updating the hardware addresses
of bonding slaves.

	The existing functions to add or remove addresses are removed,
and their functionality is replaced with calls to dev_mc_sync or
dev_mc_sync_multiple, depending upon the bonding mode.

	Calls to dev_uc_sync and dev_uc_sync_multiple are also added,
so that unicast addresses added to a bond will be properly synced with
its slaves.

	Various functions are renamed to better reflect the new
situation, and relevant comments are updated.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Cc: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07 15:05:51 -07:00
Fabio Estevam
ca162a82f5 fec: Only pass pdev in fec_ptp_init()
Passing pdev in fec_ptp_init() is enough, since we can get ndev locally.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07 14:41:07 -07:00
Daniel Borkmann
28850dc7c7 net: tcp: move GRO/GSO functions to tcp_offload
Would be good to make things explicit and move those functions to
a new file called tcp_offload.c, thus make this similar to tcpv6_offload.c.
While moving all related functions into tcp_offload.c, we can also
make some of them static, since they are only used there. Also, add
an explicit registration function.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07 14:39:05 -07:00
Daniel Borkmann
5ee9859157 net: minor: tcp: use tcp_skb_mss helper in tcp_tso_segment
We have the minimal inline helper tcp_skb_mss to access
skb_shinfo(skb)->gso_size, so also use it here to get mss.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07 14:39:05 -07:00
Daniel Borkmann
350e780570 net: vlan: minor: remove unused HAVE_VLAN_PUT_TAG
Remove the definition of HAVE_VLAN_PUT_TAG since it's not
used or exported anywhere.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07 14:39:05 -07:00
Daniel Borkmann
d70a3f887a doc: packet: simplify tpacket example code
This patch simplifies the tpacket_v3 example code a bit by getting rid
of unecessary macro wrappers, removing some debugging code so that it is
more to the point, and also adds a header comment. Now this example code
is the very minimum one needs to start from when dealing with tpacket_v3
and ~100 lines smaller than before.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07 14:39:05 -07:00
David S. Miller
93a306aef5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Merge 'net' into 'net-next' to get the MSG_CMSG_COMPAT
regression fix.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-06 23:39:26 -07:00
Linus Torvalds
1612e111e4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fix from David Miller:
 "This is a quick one commit pull request to cure the regression
  introduced by the MSG_CMSG_COMPAT change."

(Background: commit 1be374a051 completely broke 32-bit COMPAT handling
by not only disallowing MSG_CMSG_COMPAT from user APIs, but clearing it
in our own internal use too!)

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  net: Unbreak compat_sys_{send,recv}msg
2013-06-06 18:09:05 -07:00
Linus Torvalds
e2b02e25c5 Staging driver fixes for 3.10-rc5
Here are some staging and IIO driver fixes for the 3.10-rc5 release.
 
 All of them are tiny, and fix a number of reported issues (build and
 runtime.)
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlGwzF0ACgkQMUfUDdst+ymYtgCgg8fPe5FWUgi0Mu+dn/QZS0UC
 uv4AoI3nLDo/PdTgXe3xzTMN9H3AECIB
 =0fZC
 -----END PGP SIGNATURE-----

Merge tag 'staging-3.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg Kroah-Hartman:
 "Here are some staging and IIO driver fixes for the 3.10-rc5 release.

  All of them are tiny, and fix a number of reported issues (build and
  runtime)"

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

* tag 'staging-3.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  iio:inkern: Fix typo/bug in convert raw to processed.
  iio: frequency: ad4350: Fix bug / typo in mask
  inkern: iio_device_put after incorrect return/goto
  staging: alarm-dev: information leak in alarm_compat_ioctl()
  iio:callback buffer: free the scan_mask
  staging: alarm-dev: information leak in alarm_ioctl()
  drivers: staging: zcache: fix compile error
  staging: dwc2: fix value of dma_mask
2013-06-06 16:34:11 -07:00
Linus Torvalds
3b285cb2f7 TTY/Serial driver fixes for 3.10-rc4
Here are some small bugfixes, and one revert, of serial driver issues
 that have been reported.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlGwzMgACgkQMUfUDdst+ykFHgCgwaBKNKG1fYpfdasryDCluK+P
 YT8An2wg7ogl6/IaENORK3/D3j0sjpFw
 =pYSh
 -----END PGP SIGNATURE-----

Merge tag 'tty-3.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial driver fixes from Greg Kroah-Hartman:
 "Here are some small bugfixes, and one revert, of serial driver issues
  that have been reported"

* tag 'tty-3.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  Revert "serial: 8250: Make SERIAL_8250_RUNTIME_UARTS work correctly"
  serial: samsung: enable clock before clearing pending interrupts during init
  serial/imx: disable hardware flow control at startup
2013-06-06 16:33:35 -07:00
Linus Torvalds
c6d6b9d149 USB fixes for 3.10-rc4
Here are a number of USB bugfixes and new device ids for the 3.10-rc5 tree.
 
 Nothing major here, a number of new device ids (and movement from the
 option to the zte_ev driver of a number of ids that we had previously
 gotten wrong, some xhci bugfixes, some usb-serial driver fixes that were
 recently found, some host controller fixes / reverts, and a variety of
 smaller other things.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlGwzXMACgkQMUfUDdst+ynKMgCgsM5KsOowmq6Xit8kTa5FNv9P
 kF8An2svAr8CdzEH8i6USghYKA2B1NhC
 =FAus
 -----END PGP SIGNATURE-----

Merge tag 'usb-3.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg Kroah-Hartman:
 "Here are a number of USB bugfixes and new device ids for the 3.10-rc5
  tree.

  Nothing major here, a number of new device ids (and movement from the
  option to the zte_ev driver of a number of ids that we had previously
  gotten wrong, some xhci bugfixes, some usb-serial driver fixes that
  were recently found, some host controller fixes / reverts, and a
  variety of smaller other things"

* tag 'usb-3.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (29 commits)
  USB: option,zte_ev: move most ZTE CDMA devices to zte_ev
  USB: option: blacklist network interface on Huawei E1820
  USB: whiteheat: fix broken port configuration
  USB: serial: fix TIOCMIWAIT return value
  USB: mos7720: fix hardware flow control
  USB: keyspan: remove unused endpoint-array access
  USB: keyspan: fix bogus array index
  USB: zte_ev: fix broken open
  USB: serial: Add Option GTM681W to qcserial device table.
  USB: Serial: cypress_M8: Enable FRWD Dongle hidcom device
  USB: EHCI: fix regression related to qh_refresh()
  usbfs: Increase arbitrary limit for USB 3 isopkt length
  USB: zte_ev: fix control-message timeouts
  USB: mos7720: fix message timeouts
  USB: iuu_phoenix: fix bulk-message timeout
  USB: ark3116: fix control-message timeout
  USB: mos7840: fix DMA to stack
  USB: mos7720: fix DMA to stack
  USB: visor: fix initialisation of Treo/Kyocera devices
  USB: serial: fix Treo/Kyocera interrrupt-in urb context
  ...
2013-06-06 16:29:17 -07:00
Linus Torvalds
c51aa6db2a PCI update for v3.10:
PCI ROM from EFI
       x86/PCI: Map PCI setup data with ioremap() so it can be in highmem
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJRsMcBAAoJEFmIoMA60/r8EGMP/RiVo0oHoM1DlNfFonw8BjTl
 iClotD3rfQRgjDx8/563/KfO464mLJLbWdjanbpmWB215Xu4LJwv3wfWWEB1f3C+
 hbsz4aFLmoQFnCzg1BIsuNNu88cXymf4A+osfKl+pCxPY+zChkqNZUPBk0m7aW4/
 Y8JZz0NkxDUUb6ewtDhRDci9d5dHAzL/bmfcdUrP5X56znV72eVH555vY33hzNQ+
 opoB9bFOdjxq8EDMlSaGxHmClUH/1VJd8Gr9Q4oj8OUOMq7At/WbHbFYJPbn/09+
 xdh+78oGKZ6vI1rfIwlGSbJaizZJoE8l3EVEeY1rSgxF2zKNRAyN9IZOPaTigsjd
 5zHTwUp+itxstw3fxjeG0R+Gl8guf13XTWzGcR6sC1TTn/NXSxqI3inORs5PWmpP
 ldfPkOW5Y3pcP6UfmJcAn1z9W8toyCouovFPnSoit6ZzS/+aFY2vYfHBQZ8LdJHv
 Gq4sYmcxWTzfgBxybc6OczztXgTQ+grhs3nnP29/QrGDyJ6RA6E6ENKiqARkALCF
 1qFeOdWcoG4HaAlQhMRwuJTsUw7ofJWmYi6AdbHgI5pBQM4lLR1Kp+hGJnmeTBHm
 Svx98dv525HZ2tVUZwsd+ExsEfQ4IlPECVmOpxFkUf4sRILhvCTP5C6KWHBev7sG
 +F4rwomkmlS1hu82xMwl
 =dMNk
 -----END PGP SIGNATURE-----

Merge tag 'pci-v3.10-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:
 "This fixes a crash when booting a 32-bit kernel via the EFI boot stub.

  PCI ROM from EFI
      x86/PCI: Map PCI setup data with ioremap() so it can be in highmem"

* tag 'pci-v3.10-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  x86/PCI: Map PCI setup data with ioremap() so it can be in highmem
2013-06-06 16:28:15 -07:00
Nicolas Ferre
4df95131ea net/macb: change RX path for GEM
GEM is able to adapt its DMA buffer size, so change
the RX path to take advantage of this possibility and
remove all kind of memcpy in this path.
This modification introduces function pointers for managing
differences between MACB and GEM adapter type.

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-06 16:22:45 -07:00
Nicolas Ferre
1b44791ab4 net/macb: increase RX buffer size for GEM
Macb Ethernet controller requires a RX buffer of 128 bytes. It is
highly sub-optimal for Gigabit-capable GEM that is able to use
a bigger DMA buffer. Change this constant and associated macros
with data stored in the private structure.
RX DMA buffer size has to be multiple of 64 bytes as indicated in
DMA Configuration Register specification.

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-06 16:21:11 -07:00