When a PCI device lives behind an IOMMU, it should use 'pci_dma_*' family of
functions when any transfer from/to guest memory is required while
'cpu_physical_memory_*' family of functions completely bypass any MMU/IOMMU in
the system.
vmxnet3 in some places was using 'cpu_physical_memory_*' family of functions
which works fine with the default QEMU setup where IOMMU is not enabled but
fails miserably when IOMMU is enabled. This commit converts all such instances
in favor of 'pci_dma_*'
Cc: Dmitry Fleytman <dmitry@daynix.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: qemu-devel@nongnu.org
Cc: Anthony Liguori <aliguori@amazon.com>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Acked-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Commit 9d29cdeaac (rtl8139: port
TallyCounters to vmstate) introduced in incompatibility in the v4
format as it omitted the RxOkMul counter.
There are presumably no users that were impacted by the v4 to v4'
breakage, so increase the save version to 5 and re-add the field,
keeping backward compatibility with v4'.
We can't have a field conditional on the section version in
vmstate_tally_counters since this version checked would not be the
section version (but the version defined in this structure). So, move
all the fields into the main state structure.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
When processing MIPSnet I/O port write operation, it uses a
transmit buffer tx_buffer[MAX_ETH_FRAME_SIZE=1514]. Two indices
's->tx_written' and 's->tx_count' are used to control data written
to this buffer. If the two were to be equal before writing, it'd
lead to an OOB write access beyond tx_buffer. Add check to avoid it.
Reported-by: Li Qiang <qiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Don't use *_to_cpup() to do byte-swapped loads; instead use
ld*_p() which correctly handle misaligned accesses.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Dmitry Fleytman <dmitry@daynix.com <mailto:dmitry@daynix.com>>
Message-id: 1466097446-981-6-git-send-email-peter.maydell@linaro.org
Don't use *_to_cpup() to do byte-swapped loads; instead use
ld*_p() which correctly handle misaligned accesses.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Dmitry Fleytman <dmitry@daynix.com <mailto:dmitry@daynix.com>>
Message-id: 1466097446-981-5-git-send-email-peter.maydell@linaro.org
Don't use *_to_cpup() to do byte-swapped loads; instead use
ld*_p() which correctly handle misaligned accesses.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Dmitry Fleytman <dmitry@daynix.com <mailto:dmitry@daynix.com>>
Message-id: 1466097446-981-4-git-send-email-peter.maydell@linaro.org
Don't use *_to_cpup() to do byte-swapped loads; instead use
ld*_p() which correctly handle misaligned accesses.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Dmitry Fleytman <dmitry@daynix.com <mailto:dmitry@daynix.com>>
Message-id: 1466097446-981-3-git-send-email-peter.maydell@linaro.org
Don't use cpu_to_*w() and *_to_cpup() to do byte-swapped loads
and stores; instead use ld*_p() and st*_p() which correctly handle
misaligned accesses.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Dmitry Fleytman <dmitry@daynix.com <mailto:dmitry@daynix.com>>
Message-id: 1466097446-981-2-git-send-email-peter.maydell@linaro.org
The Cadence GEM data sheet says:
"Wrap - marks last descriptor in transmit buffer descriptor list. This
can be set for any buffer within the frame."
which seems to imply that when the wrap bit is set so is the last bit.
Previously if the wrap bit is set, but the last is not then QEMU will
enter an infinite loop.
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Reported-by: Li Qiang <liqiang6-s@360.cn>
Reported-by: P J P <ppandit@redhat.com>
Message-id: eb23f15c67989ea6a53609dc66568399dadf52a7.1466539342.git.alistair.francis@xilinx.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
A guest can write zero to the DMACFG resulting in an infinite loop when
it reaches the while(bytes_to_copy) loop.
To avoid this issue enforce a minimum size for the RX buffer. Hardware
does not have this enforcement and relies on the guest to set a non-zero
value.
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Reported-by: Li Qiang <liqiang6-s@360.cn>
Reported-by: P J P <ppandit@redhat.com>
Message-id: 84bb1c391b833275da3f573d4972920cea34c188.1466539342.git.alistair.francis@xilinx.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Version: GnuPG v1
iQEcBAABAgAGBQJXaFInAAoJEJykq7OBq3PI6VsH/0Sfgbdo1RksYuQwb/y92sCW
EN+lxUZ+OLfgrc8PYgNZwfSM3rsfYhznL0MAXOeEe7Ahabi07w7DhGR8WvwfAOlI
G96FRuvrIPfv5u6U6fwS4CvG3TIHVLxfHKCsTpPUmH8U5CNx/x/tpjNiWN1dj6t+
sXybSjYHfZfiZy2tI9MFIFWCdxnF/pl0QAPhbRqc8Y/RQTDrPKRjLpz+nitN/u96
5TS7KlELyQuP91YMmLceYSmIkHbxW703h+iE2n4hov0uZCP8Jil+2Jsd3ziQSRlL
j6LqexQ2ViBGdDSfiZGYES2VPlsHOCwb4G+IgWBStfZg1ppaXENvcDzPrgrB+L4=
=eUnF
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' into staging
# gpg: Signature made Mon 20 Jun 2016 21:29:27 BST
# gpg: using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8
* remotes/stefanha/tags/tracing-pull-request: (42 commits)
trace: split out trace events for linux-user/ directory
trace: split out trace events for qom/ directory
trace: split out trace events for target-ppc/ directory
trace: split out trace events for target-s390x/ directory
trace: split out trace events for target-sparc/ directory
trace: split out trace events for net/ directory
trace: split out trace events for audio/ directory
trace: split out trace events for ui/ directory
trace: split out trace events for hw/alpha/ directory
trace: split out trace events for hw/arm/ directory
trace: split out trace events for hw/acpi/ directory
trace: split out trace events for hw/vfio/ directory
trace: split out trace events for hw/s390x/ directory
trace: split out trace events for hw/pci/ directory
trace: split out trace events for hw/ppc/ directory
trace: split out trace events for hw/9pfs/ directory
trace: split out trace events for hw/i386/ directory
trace: split out trace events for hw/isa/ directory
trace: split out trace events for hw/sd/ directory
trace: split out trace events for hw/sparc/ directory
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Move all trace-events for files in the hw/net/ directory to
their own file.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1466066426-16657-11-git-send-email-berrange@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
error_propagate() already ignores local_err==NULL, so there's no
need to check it before calling.
Coccinelle patch used to perform the changes added to
scripts/coccinelle/error_propagate_null.cocci.
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1465855078-19435-2-git-send-email-ehabkost@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Beginning of reconnect support for vhost-user.
Misc cleanups and fixes.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJXY0Q3AAoJECgfDbjSjVRpkVcH/2gTHRE9yUoWe6ROvPV67BKx
8Iy9GzJ3BMO3RolVZEA5KXIevn5TG+pV274BZEuXMD3AL/molv279p0o/gvBYoqq
V0jNH2MO+MV6D9OzhUXcgWSejvybF5W07ojPDU/hlgtFXPZFbJDyt95MWaLiilOg
cCtTuRqgrrRaypcnnk/CIDbC+Ek2kAYdgQHQbfj9ihle3TWO8R0bSXnFqSaqCIkM
4slMlv8y82fODeiO83nkpfAP1NCnfnRC8r8Gv7hbEUTlZQntavx5DuYdiIx6nsJE
W0g+Gpe1o0+jRuMnucGIUZvqzZ0e/I0wZuV16Nsfx+Rbd5+4CzTxZda5Qb05v7I=
=BHbJ
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
pc, pci, virtio: new features, cleanups, fixes
Beginning of reconnect support for vhost-user.
Misc cleanups and fixes.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Fri 17 Jun 2016 01:28:39 BST
# gpg: using RSA key 0x281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* remotes/mst/tags/for_upstream:
MAINTAINERS: add Marcel to PCI
msi_init: change return value to 0 on success
fix some coding style problems
pci core: assert ENOSPC when add capability
test: start vhost-user reconnect test
tests: append i386 tests
vhost-net: save & restore vring enable state
vhost-net: save & restore vhost-user acked features
vhost-net: do not crash if backend is not present
vhost-user: disconnect on start failure
qemu-char: add qemu_chr_disconnect to close a fd accepted by listen fd
tests/vhost-user-bridge: workaround stale vring base
tests/vhost-user-bridge: add client mode
vhost-user: add ability to know vhost-user backend disconnection
pci: fix pci_requester_id()
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Conflicts:
tests/Makefile.include
It has:
1. More newlines make the code block well separated.
2. Add more comments for msi_init.
3. Fix a indentation in vmxnet3.c.
4. ioh3420 & xio3130_downstream: put PCI Express capability init function
together, make it more readable.
cc: Michael S. Tsirkin <mst@redhat.com>
cc: Markus Armbruster <armbru@redhat.com>
cc: Marcel Apfelbaum <marcel@redhat.com>
cc: Dmitry Fleytman <dmitry@daynix.com>
cc: Jason Wang <jasowang@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
A driver may change the vring enable state at run time but vhost-user
backend may not be present (a contrived example is when the backend is
disconnected and the device is reconfigured after driver rebinding)
Restore the vring state when the vhost-user backend is started, so it
can process the ring.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The initial vhost-user connection sets the features to be negotiated
with the driver. Renegotiation isn't possible without device reset.
To handle reconnection of vhost-user backend, ensure the same set of
features are provided, and reuse already acked features.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Do not crash when backend is not present while enabling the ring. A
following patch will save the enabled state so it can be restored once
the backend is started.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
qemu/osdep.h checks whether MAP_ANONYMOUS is defined, but this check
is bogus without a previous inclusion of sys/mman.h. Include it in
sysemu/os-posix.h and remove it from everywhere else.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Replace (((n) + (d) - 1) /(d)) by DIV_ROUND_UP(n,d).
This patch is the result of coccinelle script
scripts/coccinelle/round.cocci
CC: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Since mit_delay can never be 0 this if statement is
superfluous.
Signed-off-by: Sameeh Jubran <sameeh@daynix.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
This patch fixes used-uninitialized false
positive while compiling with ust tracing
backend plus gcc 4.6.3:
hw/net/e1000e.c: In function ‘e1000e_io_write’:
hw/net/e1000e.c:170:39: error: ‘idx’ may be used uninitialized in this function [-Werror=uninitialized]
hw/net/e1000e.c: In function ‘e1000e_io_read’:
hw/net/e1000e.c:145:35: error: ‘idx’ may be used uninitialized in this function [-Werror=uninitialized]
cc1: all warnings being treated as errors
make: *** [hw/net/e1000e.o] Error 1
Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-id: 1465023763-10773-1-git-send-email-dmitry@daynix.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
ust trace backend has limitation of maximum 10
arguments per event. Traces with more arguments
cannot be compiled for this backend.
Trace e1000e_rx_rss_ip6 introduced by previous
commits has 11 arguments and fails to compile with
ust trace backend.
This patch fixes the problem by splitting this
tracepoint into two successive tracepoints with
smaller number of arguments.
For more information see comment regarding TP_ARGS
in lttng/tracepoint.h:
/*
* TP_ARGS takes tuples of type, argument separated by a comma.
* It can take up to 10 tuples (which means that less than 10 tuples is
* fine too).
* Each tuple is also separated by a comma.
*/
Build log generated by this problem:
In file included from ./trace/generated-tracers.h:9:0,
from /home/travis/build/qemu/qemu/include/trace.h:4,
from util/oslib-posix.c:36:
./trace/generated-ust-provider.h:16556:3: error: unknown type name ‘_TP_EXPROTO_Bool’
In file included from /home/travis/build/qemu/qemu/include/trace.h:4:0,
from util/oslib-posix.c:36:
./trace/generated-tracers.h: In function ‘trace_e1000e_rx_rss_ip6’:
./trace/generated-tracers.h:8379:431: error: expected string literal before ‘_SDT_ASM_OPERANDS_ipv6_enabled’
./trace/generated-tracers.h:8379:431: error: implicit declaration of function ‘__tracepoint_cb_qemu___e1000e_rx_rss_ip6’ [-Werror=implicit-function-declaration]
./trace/generated-tracers.h:8379:431: error: nested extern declaration of ‘__tracepoint_cb_qemu___e1000e_rx_rss_ip6’ [-Werror=nested-externs]
cc1: all warnings being treated as errors
make: *** [util/oslib-posix.o] Error 1
make: *** Waiting for unfinished jobs....
In file included from ./trace/generated-tracers.h:9:0,
from /home/travis/build/qemu/qemu/include/trace.h:4,
from util/hbitmap.c:16:
./trace/generated-ust-provider.h:16556:3: error: unknown type name ‘_TP_EXPROTO_Bool’
In file included from /home/travis/build/qemu/qemu/include/trace.h:4:0,
from util/hbitmap.c:16:
./trace/generated-tracers.h: In function ‘trace_e1000e_rx_rss_ip6’:
./trace/generated-tracers.h:8379:431: error: expected string literal before ‘_SDT_ASM_OPERANDS_ipv6_enabled’
./trace/generated-tracers.h:8379:431: error: implicit declaration of function ‘__tracepoint_cb_qemu___e1000e_rx_rss_ip6’ [-Werror=implicit-function-declaration]
./trace/generated-tracers.h:8379:431: error: nested extern declaration of ‘__tracepoint_cb_qemu___e1000e_rx_rss_ip6’ [-Werror=nested-externs]
cc1: all warnings being treated as errors
make: *** [util/hbitmap.o] Error 1
Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
Message-id: 1464894748-27803-1-git-send-email-dmitry@daynix.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Version: GnuPG v1
iQEcBAABAgAGBQJXT9DWAAoJEO8Ells5jWIRgFAH/1ZDXm8V523AMDOEvBAWgqur
Dj8ZaIwFkqJp7xtLdhS0yKF3xW+vtgx9k+Qftk0S8qEiFKPbThR8iB5VNuesErwd
AZhWo4bnVhKwtWyMw3BDRDK1N4huAWPMZEva1xovR/Cc9v5IG5mx57/K3Zz5C8ec
Jsn4DsLKN0q7W0D0dlnbEOkSjl6iKJchvfPCR6UfvrU7BxfXaCZ9Z7Sfh8ec6tfr
iMgcV9u3A3Zs72gTM9/jdKx8vOrWtdKJufJ8s2Bctc7CyfBNWwnV8PjndhEe3Xvs
vlYeJopdpDPsdMkMtYD6cevtEgvD5yhOBndJ7et807jjuCvUf837tMhodKkFk9M=
=SjIZ
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging
# gpg: Signature made Thu 02 Jun 2016 07:23:18 BST using RSA key ID 398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211
* remotes/jasowang/tags/net-pull-request: (31 commits)
Add ENET device to i.MX6 SOC.
Add ENET/Gbps Ethernet support to FEC device
i.MX: move FEC device to a register array structure.
i.MX: Rename i.MX FEC defines to ENET_XXX
i.MX: reset TX/RX descriptors when FEC is disabled.
i.MX: Fix FEC code for ECR register reset value.
i.MX: Fix FEC code for MDIO address selection
i.MX: Fix FEC code for MDIO operation selection
net: handle optional VLAN header in checksum computation.
net: improve UDP/TCP checksum computation.
e1000e: Introduce qtest for e1000e device
net: Introduce e1000e device emulation
e1000: Move out code that will be reused in e1000e
e1000_regs: Add definitions for Intel 82574-specific bits
vmxnet3: Use pci_dma_* API instead of cpu_physical_memory_*
net_pkt: Extend packet abstraction as required by e1000e functionality
rtl8139: Move more TCP definitions to common header
net_pkt: Name vmxnet3 packet abstractions more generic
vmxnet3: Use common MAC address tracing macros
net: Add macros for MAC address tracing
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The ENET device (present in i.MX6) is "derived" from FEC and backward
compatible with it.
This patch adds the necessary support of the added feature in the ENET
device to allow Linux to use it (on supported processors).
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This is to prepare for the ENET Gb device of the i.MX6.
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
According to the FEC chapter of i.MX25 reference manual
RX adn TX descriptors are reseted when the FEC device is disabled through ECR.
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
According to the FEC chapter of i.MX25 reference manual ECR register is
initialized at 0xf0000000 at reset time.
We fix the value.
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
According to the FEC chapter of i.MX25 reference manual
When writing to MMFR register, the MDIO device and adress are selected by
bit 27 to 23 and bit 22 to 18 respectively. This is a total of 10 bits
that need to be used by the Phy chip/address decoding function.
This patch fixes the number of bits used from 9 to 10.
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
According to the FEC chapter of i.MX25 reference manual
When writing the MMFR register, bit 29 and 28 select the requested operation.
* 10 means read operation with valid MII mgmt frame
* 11 means read operation with non compliant MII mgmt frame
* 01 means write operation with valid MII mgmt frame
* 00 means write operation with non compliant MII mgmt frame
So while bit 28 does change beween read/write for valid MII mgmt frame, the
mening is inverted for non compliant MII mgmt frame.
Bit 29 on the other hand means read/write whatever the type of mgmt frame
involved.
So this patch change the operation selection from bit 28 to bit 29 as it is
more generic.
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This patch introduces emulation for the Intel 82574 adapter, AKA e1000e.
This implementation is derived from the e1000 emulation code, and
utilizes the TX/RX packet abstractions that were initially developed for
the vmxnet3 device. Although some parts of the introduced code may be
shared with e1000, the differences are substantial enough so that the
only shared resources for the two devices are the definitions in
hw/net/e1000_regs.h.
Similarly to vmxnet3, the new device uses virtio headers for task
offloads (for backends that support virtio extensions). Usage of
virtio headers may be forcibly disabled via a boolean device property
"vnet" (which is enabled by default). In such case task offloads
will be performed in software, in the same way it is done on
backends that do not support virtio headers.
The device code is split into two parts:
1. hw/net/e1000e.c: QEMU-specific code for a network device;
2. hw/net/e1000e_core.[hc]: Device emulation according to the spec.
The new device name is e1000e.
Intel specifications for the 82574 controller are available at:
http://www.intel.com/content/dam/doc/datasheet/82574l-gbe-controller-datasheet.pdf
Throughput measurement results (iperf2):
Fedora 22 guest, TCP, RX
4 ++------------------------------------------+
| |
| X X X X X
3.5 ++ X X X X |
| X |
| |
3 ++ |
G | X |
b | |
/ 2.5 ++ |
s | |
| |
2 ++ |
| |
| |
1.5 X+ |
| |
+ + + + + + + + + + + +
1 ++--+---+---+---+---+---+---+---+---+---+---+
32 64 128 256 512 1 2 4 8 16 32 64
B B B B B KB KB KB KB KB KB KB
Buffer size
Fedora 22 guest, TCP, TX
18 ++-------------------------------------------+
| X |
16 ++ X X X X X
| X |
14 ++ |
| |
12 ++ |
G | X |
b 10 ++ |
/ | |
s 8 ++ |
| |
6 ++ X |
| |
4 ++ |
| X |
2 ++ X |
X + + + + + + + + + + +
0 ++--+---+---+---+---+----+---+---+---+---+---+
32 64 128 256 512 1 2 4 8 16 32 64
B B B B B KB KB KB KB KB KB KB
Buffer size
Fedora 22 guest, UDP, RX
3 ++------------------------------------------+
| X
| |
2.5 ++ |
| |
| |
2 ++ X |
G | |
b | |
/ 1.5 ++ |
s | X |
| |
1 ++ |
| |
| X |
0.5 ++ |
| X |
X + + + + +
0 ++-------+--------+-------+--------+--------+
32 64 128 256 512 1
B B B B B KB
Datagram size
Fedora 22 guest, UDP, TX
1 ++------------------------------------------+
| X
0.9 ++ |
| |
0.8 ++ |
0.7 ++ |
| |
G 0.6 ++ |
b | |
/ 0.5 ++ |
s | X |
0.4 ++ |
| |
0.3 ++ |
0.2 ++ X |
| |
0.1 ++ X |
X X + + + +
0 ++-------+--------+-------+--------+--------+
32 64 128 256 512 1
B B B B B KB
Datagram size
Windows 2012R2 guest, TCP, RX
3.2 ++------------------------------------------+
| X |
3 ++ |
| |
2.8 ++ |
| |
2.6 ++ X |
G | X X X X X
b 2.4 ++ X X |
/ | |
s 2.2 ++ |
| |
2 ++ |
| X X |
1.8 ++ |
| |
1.6 X+ |
+ + + + + + + + + + + +
1.4 ++--+---+---+---+---+---+---+---+---+---+---+
32 64 128 256 512 1 2 4 8 16 32 64
B B B B B KB KB KB KB KB KB KB
Buffer size
Windows 2012R2 guest, TCP, TX
14 ++-------------------------------------------+
| |
| X X
12 ++ |
| |
10 ++ |
| |
G | |
b 8 ++ |
/ | X |
s 6 ++ |
| |
| |
4 ++ X |
| |
2 ++ |
| X X X |
+ X X + + X X + + + + +
0 X+--+---+---+---+---+----+---+---+---+---+---+
32 64 128 256 512 1 2 4 8 16 32 64
B B B B B KB KB KB KB KB KB KB
Buffer size
Windows 2012R2 guest, UDP, RX
1.6 ++------------------------------------------X
| |
1.4 ++ |
| |
1.2 ++ |
| X |
| |
G 1 ++ |
b | |
/ 0.8 ++ |
s | |
0.6 ++ X |
| |
0.4 ++ |
| X |
| |
0.2 ++ X |
X + + + + +
0 ++-------+--------+-------+--------+--------+
32 64 128 256 512 1
B B B B B KB
Datagram size
Windows 2012R2 guest, UDP, TX
0.6 ++------------------------------------------+
| X
| |
0.5 ++ |
| |
| |
0.4 ++ |
G | |
b | |
/ 0.3 ++ X |
s | |
| |
0.2 ++ |
| |
| X |
0.1 ++ |
| X |
X X + + + +
0 ++-------+--------+-------+--------+--------+
32 64 128 256 512 1
B B B B B KB
Datagram size
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Code that will be shared moved to a separate files.
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
To make this device and network packets
abstractions ready for IOMMU.
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This patch extends the TX/RX packet abstractions with features that will
be used by the e1000e device implementation.
Changes are:
1. Support iovec lists for RX buffers
2. Deeper RX packets parsing
3. Loopback option for TX packets
4. Extended VLAN headers handling
5. RSS processing for RX packets
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This patch drops "vmx" prefix from packet abstractions names
to emphasize the fact they are generic and not tied to any
specific network device.
These abstractions will be reused by e1000e emulation implementation
introduced by following patches so their names need generalization.
This patch (except renamed files, adjusted comments and changes in MAINTAINTERS)
was produced by:
git grep -lz 'vmxnet_tx_pkt' | xargs -0 perl -i'' -pE "s/vmxnet_tx_pkt/net_tx_pkt/g"
git grep -lz 'vmxnet_rx_pkt' | xargs -0 perl -i'' -pE "s/vmxnet_rx_pkt/net_rx_pkt/g"
git grep -lz 'VmxnetTxPkt' | xargs -0 perl -i'' -pE "s/VmxnetTxPkt/NetTxPkt/g"
git grep -lz 'VMXNET_TX_PKT' | xargs -0 perl -i'' -pE "s/VMXNET_TX_PKT/NET_TX_PKT/g"
git grep -lz 'VmxnetRxPkt' | xargs -0 perl -i'' -pE "s/VmxnetRxPkt/NetRxPkt/g"
git grep -lz 'VMXNET_RX_PKT' | xargs -0 perl -i'' -pE "s/VMXNET_RX_PKT/NET_RX_PKT/g"
sed -ie 's/VMXNET_/NET_/g' hw/net/vmxnet_rx_pkt.c
sed -ie 's/VMXNET_/NET_/g' hw/net/vmxnet_tx_pkt.c
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
The last 8 bytes of the receive buffer list page (that has been supplied
by the guest with the H_REGISTER_LOGICAL_LAN call) contain a counter
for frames that have been dropped because there was no suitable receive
buffer available. This patch introduces code to use this field to
provide the information about dropped rx packets to the guest.
There it can be queried with "ethtool -S eth0 | grep rx_no_buffer".
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently, the spapr-vlan device is trying to flush the RX queue
after each RX buffer that has been added by the guest via the
H_ADD_LOGICAL_LAN_BUFFER hypercall. In case the receive buffer pool
was empty before, we only pass single packets to the guest this
way. This can cause very bad performance if a sender is trying
to stream fragmented UDP packets to the guest. For example when
using the UDP_STREAM test from netperf with UDP packets that are
much bigger than the MTU size, almost all UDP packets are dropped
in the guest since the chances are quite high that at least one of
the fragments got lost on the way.
When flushing the receive queue, it's much better if we'd have
a bunch of receive buffers available already, so that fragmented
packets can be passed to the guest in one go. To do this, the
spapr_vlan_receive() function should return 0 instead of -1 if there
are no more receive buffers available, so that receive_disabled = 1
gets temporarily set for the receive queue, and we have to delay
the queue flushing at the end of h_add_logical_lan_buffer() a little
bit by using a timer, so that the guest gets a chance to add multiple
RX buffers before we flush the queue again.
This improves the UDP_STREAM test with the spapr-vlan device a lot:
Running
netserver -p 44444 -L <guestip> -f -D -4
in the guest, and
netperf -p 44444 -L <hostip> -H <guestip> -t UDP_STREAM -l 60 -- -m 16384
in the host, I get the following values _without_ this patch:
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # 10^6bits/sec
229376 16384 60.00 1738970 0 3798.83
229376 60.00 23 0.05
That "0.05" means that almost all UDP packets got lost/discarded
at the receiving side.
With this patch applied, the value look much better:
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # 10^6bits/sec
229376 16384 60.00 1789104 0 3908.35
229376 60.00 22818 49.85
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When receiving packets over MIPSnet network device, it uses
receive buffer of size 1514 bytes. In case the controller
accepts large(MTU) packets, it could lead to memory corruption.
Add check to avoid it.
Reported by: Oleksandr Bazhaniuk <oleksandr.bazhaniuk@intel.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
open_eth_start_xmit has a huge stack usage of 65536 bytes approx.
Moving large arrays to heap to reduce stack usage.
Reduce size of a buffer allocated on stack to 0x600 bytes, which is the
maximal frame length when HUGEN bit is not set in MODER, only allocate
buffer on heap when that is too small. Thus heap is not used in typical
use case.
Signed-off-by: Zhou Jie <zhoujie2011@cn.fujitsu.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Drop local definitions of MII registers and use constants from mii.h for
registers and register bits. No functional changes.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Reserve this to CPU state serialization.
Luckily, they were only used by sPAPR devices and these are ppc64
only. So there is no change to migration format.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When receiving packets over Stellaris ethernet controller, it
uses receive buffer of size 2048 bytes. In case the controller
accepts large(MTU) packets, it could lead to memory corruption.
Add check to avoid it.
Reported-by: Oleksandr Bazhaniuk <oleksandr.bazhaniuk@intel.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-id: 1460095428-22698-1-git-send-email-ppandit@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>