virtio-net has code to flush the queue and notify the iothread
whenever new receive buffers are added by the guest. That is
fine, and indeed we need to do the same in all other drivers.
However, notifying the iothread should be work for the network
subsystem. And since we are at it we can add a little smartness:
if some of the queued packets already could not be delivered,
there is no need to notify the iothread.
Reported-by: Luigi Rizzo <rizzo@iet.unipi.it>
Cc: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: Jan Kiszka <jan.kiszka@siemens.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Amos Kong <akong@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Another step in moving the vlan feature out of net core. Users only
deal with NetClientState and therefore qemu_del_vlan_client() should be
named qemu_del_net_client().
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
The vlan feature is no longer part of net core. Rename VLANClientState
to NetClientState because net clients are not explicitly associated with
a vlan at all, instead they have a peer net client to which they are
connected.
This patch is a mechanical search-and-replace except for a few
whitespace fixups where changing VLANClientState to NetClientState
misaligned whitespace.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reorder arguments to be more natural, readable and
consistent with other iov_* functions, and change
argument names, from:
iov_from_buf(iov, iov_cnt, buf, iov_off, size)
to
iov_from_buf(iov, iov_cnt, offset, buf, bytes)
The result becomes natural English:
copy data to this `iov' vector with `iov_cnt'
elements starting at byte offset `offset'
from memory buffer `buf', processing `bytes'
bytes max.
(Try to read the original prototype this way).
Also change iov_clear() to more general iov_memset()
(it uses memset() internally anyway).
While at it, add comments to the header file
describing what the routines actually does.
The patch only renames argumens in the header, but
keeps old names in the implementation. The next
patch will touch actual code to match.
Now, it might look wrong to pay so much attention
to so small things. But we've so many badly designed
interfaces already so the whole thing becomes rather
confusing or error prone. One example of this is
previous commit and small discussion which emerged
from it, with an outcome that the utility functions
like these aren't well-understdandable, leading to
strange usage cases. That's why I paid quite some
attention to this set of functions and a few
others in subsequent patches.
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Right now, DeviceInfo acts as the class for qdev. In order to switch to a
proper ObjectClass derivative, we need to ween all of the callers off of
interacting directly with the info pointer.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
virtio_cleanup() will be changed by the following patch to remove the
VirtIONet struct that gets allocated via virtio_common_init(). Ensure
we don't dereference the structure after calling the cleanup function.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
iov_to_buf() has an 'offset' parameter, iov_from_buf() hasn't.
This patch adds the missing parameter to iov_from_buf().
It also renames the 'offset' parameter to 'iov_off' to
emphasize it's the offset into the iovec and not the buffer.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This was done with:
sed -i 's/qemu_get_clock\>/qemu_get_clock_ns/' \
$(git grep -l 'qemu_get_clock\>' )
sed -i 's/qemu_new_timer\>/qemu_new_timer_ns/' \
$(git grep -l 'qemu_new_timer\>' )
after checking that get_clock and new_timer never occur twice
on the same line. There were no missed occurrences; however, even
if there had been, they would have been caught by the compiler.
There was exactly one false positive in qemu_run_timers:
- current_time = qemu_get_clock (clock);
+ current_time = qemu_get_clock_ns (clock);
which is of course not in this patch.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A pointer to a size_t variable was passed as the void * pointer to
lduw_p() in virtio_net_receive(). Instead of acting on the 16-bit value
this caused failure on big-endian hosts.
Avoid this issue in the future by using stw_p() instead. In general we
should use ld*_p() for loading from target memory and st*_p() for
storing to target memory anyway, not the other way around.
Also tighten up a correct use of lduw_p() when stw_p() should be used
instead in virtio_net_get_config().
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
qemu makes it possible to disable link at tap which is not communicated
to the guest but causes all packets to be dropped.
When vhost-net is enabled, vhost needs to be aware of both the virtio
link_down and the peer link_down. we switch to userspace emulation when
either is down.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reported-by: pradeep <psuriset@linux.vnet.ibm.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
When MSI is off, each interrupt needs to be bounced through the io
thread when it's set/cleared, so vhost-net causes more context switches and
higher CPU utilization than userspace virtio which handles networking in
the same thread.
We'll need to fix this by adding level irq support in kvm irqfd,
for now disable vhost-net in these configurations.
Added a vhostforce flag to force vhost-net back on.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
virtio-net used to work on cross-endianness configurations, but doesn't
anymore with recent guest kernels, as the new features don't handle
endianness correctly.
This patch fixes wrong conversion, and add missing ones to make
virtio-net working. Tested on the following configurations:
- i386 guest on x86_64 host
- ppc guest on x86_64 host
- i386 guest on mips host
- ppc guest on mips host
Cc: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Move tracking vmstate change from virtio-net to virtio.c
as it is going to be used by virito-blk and virtio-pci
for the ioeventfd support.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
If bootindex is specified on command line a string that describes device
in firmware readable way is added into sorted list. Later this list will
be passed into firmware to control boot order.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Avoid sending out packets, and modifying
memory, when VM is stopped.
Add assert statements to verify this does not happen.
Avoid scheduling bh when vhost-net is started.
Stop bh when driver disabled bus mastering
(we must not access memory after this).
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Jason Wang <jasowang@redhat.com>
DMA into memory while VM is stopped makes it
hard to debug migration (consequitive saves
result in different files).
Fixing this completely is a large effort,
this patch does this for virtio-net.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Jason Wang <jasowang@redhat.com>
Move all of vhost-net start/stop logic to a single routine,
and call it from everywhere.
Additionally, start/stop vhost-net on link up/down:
we should not transmit anything if user asked us to
put the link down.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Based on a patch from Mark McLoughlin, this patch introduces a new
bottom half packet transmitter that avoids the latency imposed by
the tx_timer approach. Rather than scheduling a timer when a TX
packet comes in, schedule a bottom half to be run from the iothread.
The bottom half handler first attempts to flush the queue with
notification disabled (this is where we could race with a guest
without txburst). If we flush a full burst, reschedule immediately.
If we send short of a full burst, try to re-enable notification.
To avoid a race with TXs that may have occurred, we must then
flush again. If we find some packets to send, the guest it probably
active, so we can reschedule again.
tx_timer and tx_bh are mutually exclusive, so we can re-use the
tx_waiting flag to indicate one or the other needs to be setup.
This allows us to seamlessly migrate between timer and bh TX
handling.
The bottom half handler becomes the new default and we add a new
tx= option to virtio-net-pci. Usage:
-device virtio-net-pci,tx=timer # select timer mitigation vs "bh"
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
De-couple this from the timer since we might want to use
different backends to send the packet.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
If virtio_net_flush_tx() is called with notification disabled, we can
race with the guest, processing packets at the same rate as they
get produced. The trouble is that this means we have no guaranteed
exit condition from the function and can spend minutes in there.
Currently flush_tx is only called with notification on, which seems
to limit us to one pass through the queue per call. An upcoming
patch changes this.
Also add an option to set this value on the command line as different
workloads may wish to use different values. We can't necessarily
support any random value, so this is a developer option: x-txburst=
Usage:
-device virtio-net-pci,x-txburst=64 # 64 packets per tx flush
One pass through the queue (256) seems to be a good default value
for this, balancing latency with throughput. We use a signed int
for x-txburst because 2^31 packets in a burst would take many, many
minutes to process and it allows us to easily return a negative
value value from virtio_net_flush_tx() to indicate a back-off
or error condition.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add an option to make the TX mitigation timer adjustable as a device
option. The 150us hard coded default used currently is reasonable,
but may not be suitable for all workloads, this gives us a way to
adjust it using a single binary. We can't support any random option
though, so use the "x-" prefix to indicate this is a developer
option. Usage:
-device virtio-net-pci,x-txtimer=500000,... # .5ms timeout
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We were requesting too much when checking buffer
length: size already includes host header length.
Further, we should not exit if we get a packet that
is too long, since this might not be under control
of the guest. Just drop the packet.
Red Hat bz 591494
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stuff a pointer to the DeviceState into the VirtIONet structure so that
we can easily remove the vmstate entry later. Also, let vmstate track
the instance number (it should always be zero internally since the
device path should now be unique).
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
When available, we'd like to be able to access the DeviceState
when registering a savevm. For buses with a get_dev_path()
function, this will allow us to create more unique savevm
id strings.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
virtio net attempts to peek into virtio queue to
determine that we have enough space for the complete
packet to fit. However, it fails to account for space
consumed by virtio net header when it does this,
under stress this results in a failure
with the message 'truncating packet'.
redhat bz 591494.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vhost net currently keeps running after vmstop,
which causes trouble as qemy does not check
for dirty pages anymore.
The fix is to simply keep vm and vhost running/stopped
status in sync.
Tested-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
virtio-net has return with value in a void function.
No idea why does it compile with gcc,
but this isn't standard C.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The virtio-net code uses iov_fill() which fills an iov from a linear
buffer. The virtio-serial-bus code does something similar in an
open-coded function.
Create a new iov.c file that has iov_from_buf().
Convert virtio-net and virtio-serial-bus over to use this functionality.
virtio-net used ints to hold sizes, the new function is going to use
size_t types.
Later commits will add the opposite functionality -- going from an iov
to a linear buffer.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
vhost driver in qemu didn't ack features, and this happens
to work because we don't really require any features. However,
it's better not to rely on this. This patch passes features to
vhost as guest acks them.
Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
This connects virtio-net to vhost net backend.
The code is structured in a way analogous to what we have with vnet
header capability in tap.
We start/stop backend on driver start/stop as
well as on save and vm start (for migration).
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
error_report() terminates the message with a newline. Strip it it
from its arguments.
This fixes a few error messages lacking a newline:
net_handle_fd_param()'s "No file descriptor named %s found", and
tap_open()'s "vnet_hdr=1 requested, but no kernel support for
IFF_VNET_HDR available" (all three versions).
There's one place that passes arguments without newlines
intentionally: load_vmstate(). Fix it up.
Fix a race condition where qemu finds that there are not enough virtio
ring buffers available and the guest make more buffers available before
qemu can enable notifications.
Signed-off-by: Tom Lendacky <toml@us.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
clang-analyzer points out value assigned to 'len' is not used.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Mac feature bit isn't going to work as all network cards already have a
'mac' property to set the mac address. Remove it from mask and add in
get_features.
Reported-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Add feature bits as properties to virtio. This makes it possible to e.g. define
machine without indirect buffer support, which is required for 0.10
compatibility, or without hardware checksum support, which is required for 0.11
compatibility. Since default values for optional features are now set by qdev,
get_features callback has been modified: it sets non-optional bits, and clears
bits not supported by host.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Rename features->guest_features. This is
what they are, avoid confusion with
host features which we also need to keep around.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Add a NetClientInfo pointer to VLANClientState and use that
for the typecode and function pointers.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch is a followup to 8eca6b1bc7,
fixing crashes when guests with 2.6.25 virtio drivers have saturated
virtio network connections.
https://bugs.edge.launchpad.net/ubuntu/+source/qemu-kvm/+bug/458521
That patch should have been whitelisting *_HOST_* rather than the the
*_GUEST_* features.
I tested this by running an Ubuntu 8.04 Hardy guest (2.6.24 kernel +
2.6.25-virtio driver). I saturated both the incoming, and outgoing
network connection with nc, seeing sustained 6MB/s up and 6MB/s down
bitrates for ~20 minutes. Previously, this crashed immediately. Now,
the guest does not crash and maintains network connectivity throughout
the test.
Signed-off-by: Dustin Kirkland <kirkland@canonical.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
We should only return zero from receive() for a condition which we'll
get notification of when it changes. Currently, we're returning zero
if the guest driver is not ready, but we won't ever flush our queue
when that status changes.
Also, don't check buffer space in can_receive(), but instead just allow
receive() to return zero when this condition occurs and have the caller
handle queueing the packet.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This commit:
commit 97b15621
virtio: use qdev properties for configuration.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
makes a guest using virtio-net see an empty macaddr because we never
copy the macaddr into the location that virtio_net_get_config() uses.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>