When a new option is added that qemu does not know
about, the prudent thing is to use the default not
force it to "no".
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Windows will fail to start drivers for devices with an Endpoint type
PCIe capability attached to a Root Complex (code 10 - Device cannot
start). The proper type for such a device is Root Complex Integrated
Endpoint. Devices don't care which they are, so do this conversion
automatically.
This allows the Windows driver to load for nec-usb-xhci when attached
to pcie.0 of a q35 machine.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Convert q35, ioh3420, xio3130_upstream, and xio3130_downstream to
use the new TYPE_PCIE_BUS.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Move these so that we can reference them from a more common header
instead of including pci_bus.h everywhere.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This will allow us to differentiate Express and Legacy buses.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
find_ram_offset() does not handle size=0 gracefully. It hands out the
same RAMBlock offset multiple times, leading to obscure failures later
on.
Add an assert to warn early if something is incorrectly allocating a
zero size RAMBlock.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
A zero size ROM file is invalid and should produce a warning.
Attempting to use a zero size file ends up hitting an assertion
qemu_ram_set_idstr() because RAMBlocks with duplicate addresses are
allocated - due to zero size the allocator doesn't increment the next
available RAMBlock offset.
Also convert __FUNCTION__ to __func__ while we're touching this code.
There are no other __FUNCTION__ instances in pci.c anymore.
Reported-by: Milos Ivanovic <milosivanovic@orcon.net.nz>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
pci_bridge_dev_map_irq_fn() is identical to pci_swizzle_map_irq_fn(),
which is now the default for all PCI bridges. We can therefore remove
this function and the pci_bridge_map_irq() call that used it.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The PCI bridge spec defines a default swizzle for translating INTx
IRQs from secondary bus to primary. Use this by default for any
bridge that doesn't set a function.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
For some reason we recurse to fire the INTx routing notifier for each
child of a bus, for each possible device of a bus. That means that if
we add a root port, the notifier gets called for that bridge 256
times. If we add an upstream switch behind that root port, 256^2. But
of course we need a downstream switch, 256^3. This starts to be
noticeable. Stop the insanity.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We use the same formatting for all files, it
doesn't make sense to have formatting directives only
in pci bridge header.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reuse common code in pcie_port, override the hardwired-to-0
bits per PCI Express spec.
No functional change but makes the code easier to follow.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Each PCI Bridge has a set of implied VGA regions that are enabled when
the VGA bit is set in the bridge control register. This allows VGA
devices behind bridges. Unfortunately with VGA Enable, which we
formerly allowed but didn't back, comes along some required VGA
baggage. VGA Palette Snooping is required, along with VGA 16-bit
decoding. We don't yet have support for palette snooping.
We also don't have support for 10-bit VGA aliases, the default mode, but
we enable the register, even on root ports, to avoid confusing guests.
Fortunately there's likely nothing from this century that requires these
features, so the missing bits are noted with TODOs.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Allow devices to register VGA memory regions for handling PCI spec
defined VGA I/O port and MMIO areas. PCI will attach these to the
bus address spaces and enable them according to the device command
register value.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
non-irqfd setups are currently broken with vhost:
we start up masked and nothing unmasks the interrupts.
Fix by using mask notifiers, same as the irqfd path.
Sharing irqchip/non irqchip code is always a good thing,
in this case it will help non irqchip benefit
from backend masking optimization.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Following commit 921ac5d0f3 (virtio-net:
remove layout assumptions for ctrl vq), this patch makes multiqueue ctrl
handling not rely on the layout of descriptors.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add QOM path to device deleted event. It now becomes useful to report
it for devices which don't have an ID assigned.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
It seems more logical to have destruction flow start with the subclass
and move up to the base class. This ensures object has a valid
canonical path when destructor is called.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
libvirt has a long-standing bug: when removing the device,
it can request removal but does not know when the
removal completes. Add an event so we can fix this in a robust way.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Currently the Makefile creates TAGS for emacs with the command:
find "$(SRC_PATH)" -name '*.[hc]' -print0 | xargs -0 etags
That works only if xargs ends up invoking etags just once. If xargs runs
etags several times, as it will if there are enough files, then the later
invocations will overwrite the output from the earlier invocations. This
patch uses the etags --append option to fix the bug.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Message-id: 1363057048-21534-1-git-send-email-david@gibson.dropbear.id.au
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
# By Stefan Hajnoczi (14) and others
# Via Stefan Hajnoczi
* stefanha/block: (28 commits)
blockdev: Fix up copyright and permission notice
qemu-iotests: use -nographic in test case 007
qemu-iotests: add tests for rebasing zero clusters
dataplane: fix hang introduced by AioContext transition
coroutine: use AioContext for CoQueue BH
threadpool: drop global thread pool
block: add bdrv_get_aio_context()
aio: add a ThreadPool instance to AioContext
threadpool: add thread_pool_new() and thread_pool_free()
threadpool: move globals into struct ThreadPool
main-loop: add qemu_get_aio_context()
sheepdog: set io_flush handler in do_co_req
sheepdog: use non-blocking fd in coroutine context
qcow2: make is_allocated return true for zero clusters
qcow2: drop unnecessary flush in qcow2_update_snapshot_refcount()
qcow2: drop flush in update_cluster_refcount()
qcow2: flush in qcow2_update_snapshot_refcount()
qcow2: set L2 cache dependency in qcow2_alloc_bytes()
qcow2: flush refcount cache correctly in qcow2_write_snapshots()
qcow2: flush refcount cache correctly in alloc_refcount_block()
...
# By Christian Borntraeger (1) and Cornelia Huck (1)
# Via Cornelia Huck
* cohuck/virtio-ccw-upstr:
virtio-ccw: Wire up virtio-rng.
virtio-ccw: remove qdev_unparent in unplug routing
Screwed up in commit 666daa68. Thanks to Kevin Wolf for reminding me
to fix this.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
A comment explains that -nographic hangs test case 007. This is no
longer the case so add -nographic. This makes the test suite faster and
more pleasant to run since no windows pop up.
I am not sure exactly when -nographic starting working for this case but
there is no fundamental reason why graphics are needed here. Make sure
the serial port is not on stdio, it would conflict with the monitor.
Also remove unnecessary trailing whitespace on these lines.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Kevin Wolf <kwolf@redhat.com>
If zero clusters are erroneously treated as unallocated, "qemu-img rebase"
will copy the backing file's contents onto the cluster.
The bug existed also in image streaming, but since the root cause was in
qcow2's is_allocated implementation it is enough to test it with qemu-img.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The bug is that the EventNotifiers do have a NULL io_flush callback.
Because _none_ of the callbacks on the dataplane AioContext have such a
callback, aio_poll will simply do nothing. Fixed by adding the callbacks:
the ioeventfd will always be polled (this can change in the future to
pause/resume the processing during live snapshots or similar operations);
the ioqueue will be polled if there are outstanding requests.
I must admit I have screwed up my testing somehow, because commit
2c20e71 does not work even if cherry-picked on top of 1.4.0, and this
patch fixes it there as well.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
CoQueue uses a BH to awake coroutines that were made ready to run again
using qemu_co_queue_next() or qemu_co_queue_restart_all(). The BH
currently runs in the iothread AioContext and would break coroutines
that run in a different AioContext.
This is a slightly tricky problem because the lifetime of the BH exceeds
that of the CoQueue. This means coroutines can be awoken after CoQueue
itself has been freed. Also, there is no qemu_co_queue_destroy()
function which we could use to handle freeing resources.
Introducing qemu_co_queue_destroy() has a ripple effect of requiring us
to also add qemu_co_mutex_destroy() and qemu_co_rwlock_destroy(), as
well as updating all callers. Avoid doing that.
We also cannot switch from BH to GIdle function because aio_poll() does
not dispatch GIdle functions. (GIdle functions make memory management
slightly easier because they free themselves.)
Finally, I don't want to move unlock_queue and unlock_bh into
AioContext. That would break encapsulation - AioContext isn't supposed
to know about CoQueue.
This patch implements a different solution: each qemu_co_queue_next() or
qemu_co_queue_restart_all() call creates a new BH and list of coroutines
to wake up. Callers tend to invoke qemu_co_queue_next() and
qemu_co_queue_restart_all() occasionally after blocking I/O, so creating
a new BH for each call shouldn't be massively inefficient.
Note that this patch does not add an interface for specifying the
AioContext. That is left to future patches which will convert CoQueue,
CoMutex, and CoRwlock to expose AioContext.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Now that each AioContext has a ThreadPool and the main loop AioContext
can be fetched with bdrv_get_aio_context(), we can eliminate the concept
of a global thread pool from thread-pool.c.
The submit functions must take a ThreadPool* argument.
block/raw-posix.c and block/raw-win32.c use
aio_get_thread_pool(bdrv_get_aio_context(bs)) to fetch the main loop's
ThreadPool.
tests/test-thread-pool.c must be updated to reflect the new
thread_pool_submit() function prototypes.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
For now bdrv_get_aio_context() is just a stub that calls
qemu_aio_get_context() since the block layer is currently tied to the
main loop AioContext.
Add the stub now so that the block layer can begin accessing its
AioContext.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
This patch adds a ThreadPool to AioContext. It's possible that some
AioContext instances will never use the ThreadPool, so defer creation
until aio_get_thread_pool().
The reason why AioContext should have the ThreadPool is because the
ThreadPool is bound to a AioContext instance where the work item's
callback function is invoked. It doesn't make sense to keep the
ThreadPool pointer anywhere other than AioContext. For example,
block/raw-posix.c can get its AioContext's ThreadPool and submit work.
Special note about headers: I used struct ThreadPool in aio.h because
there is a circular dependency if aio.h includes thread-pool.h.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
ThreadPool is tied to an AioContext through its event notifier, which
dictates in which AioContext the work item's callback function will be
invoked.
In order to support multiple AioContexts we need to support multiple
ThreadPool instances.
This patch adds the new/free functions. The free function deserves
special attention because it quiesces remaining worker threads. This
requires a new condition variable and a "stopping" flag to let workers
know they should terminate once idle.
We never needed to do this before since the global threadpool was not
explicitly destroyed until process termination.
Also stash the AioContext pointer in ThreadPool so that we can call
aio_set_event_notifier() in thread_pool_free(). We didn't need to hold
onto AioContext previously since there was no free function.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Move global variables into a struct so multiple thread pools can be
supported in the future.
This patch does not change thread-pool.h interfaces. There is still a
global thread pool and it is not yet possible to create/destroy
individual thread pools. Moving the variables into a struct first makes
later patches easier to review.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
It is very useful to get the main loop AioContext, which is a static
variable in main-loop.c.
I'm not sure whether qemu_get_aio_context() will be necessary in the
future once devices focus on using their own AioContext instead of the
main loop AioContext, but for now it allows us to refactor code to
support multiple AioContext while actually passing the main loop
AioContext.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
If an io_flush handler is not set, qemu_aio_wait doesn't invoke
callbacks.
Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Using a blocking socket in the coroutine context reduces the chance of
switching to other work. This patch makes the sheepdog driver use a
non-blocking fd always.
Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Otherwise, live migration of the top layer will miss zero clusters and
let the backing file show through. This also matches what is done in qed.
QCOW2_CLUSTER_ZERO clusters are invalid in v2 image files. Check this
directly in qcow2_get_cluster_offset instead of replicating the test
everywhere.
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
We already flush when the function completes. There is no need to flush
after every compressed cluster.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The update_cluster_refcount() function increments/decrements a cluster's
refcount and then returns the new refcount value.
There is no need to flush since both update_cluster_refcount() callers
already take care of this:
1. qcow2_alloc_bytes() calls update_cluster_refcount() when compressed
sectors will be appended to an existing cluster with enough free
space. qcow2_alloc_bytes() already flushes so there is no need to do
so in update_cluster_refcount().
2. qcow2_update_snapshot_refcount() sets a cache dependency on refcounts
if it needs to update L2 entries. It also flushes before completing.
Removing this flush significantly speeds up qcow2 snapshot creation:
$ qemu-img create -f qcow2 test.qcow2 -o size=50G,preallocation=metadata
$ time qemu-img snapshot -c new test.qcow2
Time drops from more than 3 minutes to under 1 second.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Users of qcow2_update_snapshot_refcount() do not flush consistently.
qcow2_snapshot_create() flushes but qcow2_snapshot_goto() and
qcow2_snapshot_delete() do not.
Solve this by moving the bdrv_flush() into
qcow2_update_snapshot_refcount().
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Compressed writes use qcow2_alloc_bytes() to allocate space with byte
granularity. The affected clusters' refcounts will be incremented but
we do not need to flush yet.
Set a L2 cache dependency on the refcount block cache, so that the
refcounts get written out before the L2 updates.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Since qcow2 metadata is cached we need to flush the caches, not just the
underlying file. Use bdrv_flush(bs) instead of bdrv_flush(bs->file).
Also add the error return path when bdrv_flush() fails and move the
flush after checking for qcow2_alloc_clusters() failure so that the
qcow2_alloc_clusters() error return value takes precedence.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
update_refcount() affects the refcount cache, it does not write to disk.
Therefore bdrv_flush(bs->file) does nothing. We need to flush the
refcount cache in order to write out the refcount updates!
While we're here also add error returns when qcow2_cache_flush() fails.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
qcow2 images now accept a boolean lazy_refcounts options. Use it like
this:
-drive file=test.qcow2,lazy_refcounts=on
If the option is specified on the command line, it overrides the default
specified by the qcow2 header flags that were set when creating the
image.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Any non-default -drive options are now passed down to the block drivers.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Pointing to a QemuOpts element is surprising and can lead to subtle
use-after-free errors when the QemuOpts is freed after all options are
parsed.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This adds a function that adds all entries of a QDict to a QemuOpts if
the keys are known, and leaves only the rest in the QDict.
This way a single QDict of -drive options can be processed in multiple
places (generic block layer, block driver, backing file block driver,
etc.), where each part picks the options it knows. If at the end of the
process the QDict isn't empty, the user specified an invalid option.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The options are passed down to the block drivers, which are supposed to
remove all options they have processed. Anything that is left over in
the end is an unknown option and results in an error.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>