The mv643xx_eth hardware ignores the lower three bits of the buffer
size field in receive descriptors, causing the reception of full-sized
packets to fail at some MTUs. Fix this by rounding the size of
allocated receive buffers up to a multiple of eight bytes.
While we are at it, add a bit of extra space to each receive buffer so
that we can handle multiple vlan tags on ingress.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
When we are low on memory, the assumption that every descriptor in the
receive ring will have an skbuff associated with it does not hold.
rxq_process() was assuming that if the receive descriptor it is working
on is not owned by the hardware, it can safely be processed and handed
to the networking stack. But a descriptor in the receive ring not being
owned by the hardware can also happen when we are low on memory and did
not manage to refill the receive ring fully.
This patch changes rxq_process()'s bailout condition from "the first
receive descriptor to be processed is owned by the hardware" to "the
first receive descriptor to be processed is owned by the hardware OR
the number of valid receive descriptors in the ring is zero".
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Nicolas Pitre noted that mv643xx_eth_poll was incorrectly using
non-IRQ-safe locks while checking whether to wake up the netdevice's
transmit queue. Convert the locking to *_irq() variants, since we
are running from softirq context where interrupts are enabled.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Commit 12e4ab79cd ("mv643xx_eth: be
more agressive about RX refill") changed the condition for the receive
out-of-memory timer to be scheduled from "the receive ring is empty"
to "the receive ring is not full".
This can lead to a situation where the receive out-of-memory timer is
pending because a previous rxq_refill() didn't manage to refill the
receive ring entirely as a result of being out of memory, and
rxq_refill() is then called again as a side effect of a packet receive
interrupt, and that rxq_refill() call then again does not succeed to
refill the entire receive ring with fresh empty skbuffs because we are
still out of memory, and then tries to call add_timer() on the already
scheduled out-of-memory timer.
This patch fixes this issue by changing the add_timer() call in
rxq_refill() to a mod_timer() call. If the OOM timer was not already
scheduled, this will behave as before, whereas if it was already
scheduled, this patch will push back its firing time a bit, which is
safe because we've (unsuccessfully) attempted to refill the receive
ring just before we do this.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
When a receive interrupt occurs, mv643xx_eth would first process the
receive descriptors and then ACK the receive interrupt, instead of the
other way round.
This would leave a small race window between processing the last
receive descriptor and clearing the receive interrupt status in which
a new packet could come in, which would then 'rot' in the receive
ring until the next receive interrupt would come in.
Fix this by ACKing (clearing) the receive interrupt condition before
processing the receive descriptors.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Although mv643xx_eth has no hardware support for inserting a vlan
tag by twiddling some bits in the TX descriptor, it does support
hardware TX checksumming on packets where the IP header starts {a
limited set of values other than 14} bytes into the packet.
This patch sets mv643xx_eth's ->vlan_features to NETIF_F_SG |
NETIF_F_IP_CSUM, which prevents the stack from checksumming vlan'ed
packets in software, and if vlan tags are present on a transmitted
packet, notifies the hardware of this fact by toggling the right
bits in the TX descriptor.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
When there is a link status change (link or phy status interrupt),
print a message notifying the user of the new link status.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
The mv643xx_eth hardware has a provision for polling the PHY's
MII management registers to obtain the (R)(G)MII interface speed
(10/100/1000) and duplex (half/full) and pause (off/symmetric)
settings to use to talk to the PHY.
The driver currently does not make use of this feature. Instead,
whenever there is a link status change event, it reads the current
link parameters from the PHY, and programs those parameters into
the mv643xx_eth MAC by hand.
This patch switches the mv643xx_eth driver to letting the MAC
auto-determine the (R)(G)MII link parameters by PHY polling, if there
is a PHY present. For PHYless ports (when e.g. the (R)(G)MII
interface is connected to a hardware switch), we keep hardcoding the
MII interface parameters.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Instead of hardcoding MII register addresses and values, use the
symbolic constants defined in linux/mii.h.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
The mv643xx_eth driver is limiting DMA bursts to 32 bytes, while
using the largest burst size (128 bytes) gives a couple percentage
points performance improvement in throughput tests, and the docs
say that the 128 byte default should not need to be changed, so
use 128 byte bursts instead.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
The recommended sequence for waiting for the transmit path to clear
after disabling all of the transmit queues is to wait for the
TX_FIFO_EMPTY bit in the Port Status register to become set as well
as the TX_IN_PROGRESS bit to clear.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
The maximum receive packet size field in the Port Serial Control
register controls at what size received packets are flagged
overlength in the receive descriptor, but it doesn't prevent
overlength packets from being DMAd to memory and signaled to the
host like other received packets.
mv643xx_eth does not support receiving jumbo frames in 10/100 mode,
but setting the packet threshold to larger than 1522 bytes in 10/100
mode won't cause breakage by itself.
If we really want to enforce maximum packet size on the receiving
end instead of on the sending end where it should be done, we can
always just add a length check to the software receive handler
instead of relying on the hardware to do the comparison for us.
What's more, changing the maximum packet size field requires
temporarily disabling the RX/TX paths. So once the link comes
up in 10/100 Mb/s mode or 1000 Mb/s mode, we'd have to disable it
again just to set the right maximum packet size field (1522 in
10/100 Mb/s mode or 9700 in 1000 Mb/s mode), just so that we can
offload one comparison operation to hardware that we might as well
do in software, assuming that we'd want to do it at all.
Contrary to what the documentation suggests, there is no harm in
just setting a 9700 byte maximum packet size in 10/100 mode, so use
the maximum maximum packet size for all modes.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
The mv643xx_eth driver allows doing transmit reclaim from within the
napi poll routine, but after doing reclaim, it would forget to check
the free transmit descriptor count and wake up the transmit queue if
the reclaim caused enough descriptors for a new packet to become
available. This would cause the netdev watchdog to occasionally kick
in during certain workloads with combined receive and transmit traffic.
Fix this by adding a wakeup check identical to the one in the
interrupt handler to the napi poll routine.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
When the ethernet link goes down while mv643xx_eth is transmitting
data, transmit DMA can stop before all queued transmit descriptors
have been processed. But even the descriptors that _have_ been
processed might not be properly marked as done before the transmit
DMA unit shuts down.
Then when the link comes up again, the hardware transmit pointer
might have advanced while not all previous packet descriptors have
been marked as transmitted, causing software transmit reclaim to
hang waiting for the hardware to finish transmitting a descriptor
that it has already skipped.
This patch forcibly reclaims all packets on the transmit ring on a
link down interrupt, and then resyncs the hardware transmit pointer to
what the software's idea of the first free descriptor is. Also, we
need to prevent re-waking the transmit queue if we get a 'transmit
done' interrupt at the same time as a 'link down' interrupt, which
this patch does as well.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
The previously merged TX hang erratum workaround ("mv643xx_eth:
work around TX hang hardware issue") assumes that TX_END interrupts
are delivered simultaneously with or after their corresponding TX
interrupts, but this is not always true in practise.
In particular, it appears that TX_END interrupts are issued as soon
as descriptor fetch returns an invalid descriptor, which may happen
before earlier descriptors have been fully transmitted and written
back to memory as being done.
This hardware behavior can lead to a situation where the current
driver code mistakenly assumes that the MAC has given up transmitting
before noticing the packets that it is in fact still currently working
on, causing the driver to re-kick the transmit queue, which will only
cause the MAC to re-fetch the invalid head descriptor, and generate
another TX_END interrupt, et cetera, until the packets in the pipe
finally finish transmitting and have their descriptors written back
to memory, which will then finally break the loop.
Fix this by having the erratum workaround not check the 'number of
unfinished descriptor', but instead, to compare the software's idea
of what the head descriptor pointer should be to the hardware's head
descriptor pointer (which is updated on the same conditions as the
TX_END interupt is generated on, i.e. possibly before all previous
descriptors have been transmitted and written back).
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Joseph Fannin <jfannin@gmail.com> and Takashi Iwai <tiwai@suse.de>
noticed that commit 073a345c04
("mv643xx_eth: clarify irq masking and unmasking") broke the
mv643xx_eth build when NETPOLL is enabled, due to it not renaming
one instance of INT_CAUSE_EXT in mv643xx_eth_netpoll(). This patch
takes care of that instance as well.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Cc: Dale Farnsworth <dale@farnsworth.org>
Cc: Joseph Fannin <jfannin@gmail.com>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
On some boards, the mv643xx_eth MAC isn't connected to a PHY but
directly (via the MII/GMII/RGMII interface) to another MAC-layer
device. This patch allows specifying ->phy_addr = -1 to skip all
PHY-related initialisation and run-time poking in that case.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
During OOM, instead of stopping RX refill when the rx desc ring is
not empty, keep trying to refill the ring as long as it is not full
instead.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Some SoCs have the TX bandwidth control registers in a slightly
different place. This patch detects that case at run time, and
re-directs accesses to those registers to the proper place at
run time if needed.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Newer hardware has a 16-bit instead of a 14-bit RX coalescing
count field in the SDMA_CONFIG register. This patch adds a run-time
check for which of the two we have, and adjusts further writes to the
rx coal count field accordingly.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Under some conditions, the TXQ ('TX queue being served') bit can clear
before all packets queued for that TX queue have been transmitted.
This patch enables TXend interrupts, and uses those to re-kick TX
queues that claim to be idle but still have queued descriptors from
the interrupt handler.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
As with the multiple RX queue support, allow the platform code to
specify that the hardware we are running on supports multiple TX
queues. This patch only uses the highest-numbered enabled queue
to send packets to for now, this can be extended later to enable
QoS and such.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Allow the platform code to specify that we are running on hardware
that is capable of supporting multiple RX queues. If this option
is used, initialise all of the given RX queues instead of just RX
queue zero.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Add an interface for the hardware's per-port and per-subqueue
TX rate control. In this stage, this is mainly so that we can
disable the bandwidth limits during initialisation of the port.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
General cleanup of the mv643xx_eth driver. Mainly fixes coding
style / indentation issues, get rid of some useless 'volatile's,
kill some more superfluous comments, and such.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Remove the write-only ->[rt]x_int_coal members from struct
mv643xx_eth_private. In the process, tweak the RX/TX interrupt
mitigation code so that it is compiled by default, and set the
default coalescing delays to 0 usec.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Split all TX queue related state into 'struct tx_queue', in
preparation for multiple TX queue support.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Split all RX queue related state into 'struct rx_queue', in
preparation for multiple RX queue support.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
The per-port mv643xx_eth_private struct had a private instance
of struct net_device_stats that was never ever written to, only
read (via the ethtool statistics interface). This patch gets
rid of the private instance, and tweaks the ethtool statistics
code in mv643xx_eth to use the statistics in struct net_device
instead.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Since they are no longer used, kill enum FUNC_RET_STATUS and
struct pkt_info (which were a rather roundabout way of communicating
RX/TX status within the same driver).
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
rx_return_buff() is also a remnant of the HAL layering that the
original mv643xx_eth driver used. Moving it into its caller kills
the last reference to FUNC_RET_STATUS/pkt_info.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
The port_receive() function is a remnant of the original mv643xx_eth
HAL split. This patch moves port_receive() into its caller, so that
the top and the bottom half of RX processing no longer communicate
via the HAL FUNC_RET_STATUS/pkt_info mechanism abstraction anymore.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Half of the functions in the mv643xx_eth driver are prefixed by
useless and baroque comment blocks on _what_ those functions do (which
is obvious from the code itself) rather than why, and there's no point
in keeping those comments around.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
A bunch of places in the mv643xx_eth driver use the 'mv643xx_'
prefix. Since the mv643xx is a chip that includes more than just
ethernet, this patch makes all those places use either no prefix
(for some internal-use-only functions), or the full 'mv643xx_eth_'
prefix.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
The fact that mv643xx_eth is an ethernet driver is pretty obvious,
and having a lot of internal-use-only functions and defines prefixed
with ETH_/ethernet_/eth_ prefixes is rather pointless. So, get rid
of most of those prefixes.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Remove the unused rx/tx descriptor field defines, and move the ones
that are actually used to the actual definitions of the rx/tx
descriptor format.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
All except one of the port serial status register bit defines are
unused -- kill the unused ones.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
The only user of the ETH_MIB_VERY_LONG_NAME_HERE defines is the
eth_update_mib_counters() function. Get rid of the defines by
open-coding the register offsets in the latter.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Get rid of RX_BUF_OFFSET (which is synonymous with ETH_HW_IP_ALIGN).
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Replace the nondescriptive names ETH_INT_UNMASK_ALL and
ETH_INT_UNMASK_ALL_EXT by names of the actual fields being masked
and unmasked in the various writes to the interrupt mask registers.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
None of the port status register bit defines are ever used in the
mv643xx_eth driver -- nuke them all.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Over half of the port serial control register bit defines are never
used, and the PORT_SERIAL_CONTROL_DEFAULT_VALUE define is never used
either. Keep only those defines that are actually used.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>