find_desc_by_name() in util/qemu-option.c relies on the .name field not being
NULL when it calls strcmp(). This check becomes unsafe when the list is not
NULL-terminated, which is the case for nbd_runtime_opts in block/nbd.c, and can
result in a segmentation fault when strcmp() tries to access invalid memory:
#0 0x00007fff8c75f7d4 in __strcmp_power9 () from /lib64/libc.so.6
#1 0x00000000102d3ec8 in find_desc_by_name (desc=0x1036d6f0, name=0x28e46670 "server.path") at util/qemu-option.c:166
#2 0x00000000102d93e0 in qemu_opts_absorb_qdict (opts=0x28e47a80, qdict=0x28e469a0, errp=0x7fffec247c98) at util/qemu-option.c:1026
#3 0x000000001012a2e4 in nbd_open (bs=0x28e42290, options=0x28e469a0, flags=24578, errp=0x7fffec247d80) at block/nbd.c:406
#4 0x00000000100144e8 in bdrv_open_driver (bs=0x28e42290, drv=0x1036e070 <bdrv_nbd_unix>, node_name=0x0, options=0x28e469a0, open_flags=24578, errp=0x7fffec247f50) at block.c:1135
#5 0x0000000010015b04 in bdrv_open_common (bs=0x28e42290, file=0x0, options=0x28e469a0, errp=0x7fffec247f50) at block.c:1395
From gdb, desc[i].name was not NULL and resulted in strcmp() accessing
invalid memory:
>>> p desc[5]
$8 = {
name = 0x1037f098 "R27A",
type = 1561964883,
help = 0xc0bbb23e <error: Cannot access memory at address 0xc0bbb23e>,
def_value_str = 0x2 <error: Cannot access memory at address 0x2>
}
>>> p desc[6]
$9 = {
name = 0x103dac78 <__gcov0.do_qemu_init_bdrv_nbd_init> "\001",
type = 272101528,
help = 0x29ec0b754403e31f <error: Cannot access memory at address 0x29ec0b754403e31f>,
def_value_str = 0x81f343b9 <error: Cannot access memory at address 0x81f343b9>
}
This patch fixes the segmentation fault in strcmp() by adding a NULL element at
the end of the nbd_runtime_opts.desc list, which is the common practice in most
other structs, e.g. runtime_opts in block/null.c. The desc[i].name != NULL
check thus becomes safe because it no longer evaluates to true once the .desc
list has reached its end.
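To illustrate the pattern with a minimal, self-contained sketch (simplified
stand-ins, not the actual QemuOptDesc or nbd_runtime_opts contents): iteration
of the find_desc_by_name() shape is only safe when a zeroed sentinel
terminates the array.

    #include <stdio.h>
    #include <string.h>

    /* Simplified stand-in for QemuOptDesc (the real struct has more fields). */
    typedef struct OptDesc {
        const char *name;
    } OptDesc;

    /* Mirrors the loop shape of find_desc_by_name(): without a NULL
     * sentinel, desc[i].name is read past the end of the array. */
    static const OptDesc *find_desc_by_name(const OptDesc *desc,
                                            const char *name)
    {
        for (int i = 0; desc[i].name != NULL; i++) {
            if (strcmp(desc[i].name, name) == 0) {
                return &desc[i];
            }
        }
        return NULL;
    }

    int main(void)
    {
        const OptDesc opts[] = {
            { "host" },
            { "port" },
            { NULL },   /* end of list -- the terminator this patch adds */
        };

        printf("%s\n", find_desc_by_name(opts, "port") ? "found" : "missing");
        return 0;
    }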
Reported-by: R. Nageswara Sastry <nasastry@in.ibm.com>
Buglink: https://bugs.launchpad.net/qemu/+bug/1727259
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.vnet.ibm.com>
Message-Id: <20180105133241.14141-2-muriloo@linux.vnet.ibm.com>
CC: qemu-stable@nongnu.org
Fixes: 7ccc44fd7d
Signed-off-by: Eric Blake <eblake@redhat.com>
If we are careful to handle 0-length read requests correctly,
we can optimize our sparse read to send the NBD_REPLY_FLAG_DONE
bit on our last OFFSET_DATA or OFFSET_HOLE chunk rather than
needing a separate chunk.
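The chunking logic can be sketched with a small self-contained loop (CHUNK_MAX
is a hypothetical bound; the real server derives chunk boundaries from block
status). Note how the do/while form makes a 0-length request still produce
exactly one reply carrying the DONE flag:

    #include <inttypes.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NBD_REPLY_FLAG_DONE (1 << 0)   /* per the NBD protocol */
    #define CHUNK_MAX 4096                 /* hypothetical chunk bound */

    /* Set DONE on the final OFFSET_DATA/OFFSET_HOLE chunk instead of
     * emitting a trailing NBD_REPLY_TYPE_NONE chunk. */
    static void send_chunks(uint64_t from, uint64_t len)
    {
        uint64_t offset = from;

        do {
            uint64_t remaining = from + len - offset;
            uint64_t pnum = remaining > CHUNK_MAX ? CHUNK_MAX : remaining;
            bool final = (offset + pnum == from + len);

            printf("chunk: offset=%" PRIu64 " len=%" PRIu64 " flags=0x%x\n",
                   offset, pnum, final ? NBD_REPLY_FLAG_DONE : 0);
            offset += pnum;
        } while (offset < from + len);
    }

    int main(void)
    {
        send_chunks(0, 10000);   /* three chunks, only the last flagged DONE */
        send_chunks(0, 0);       /* one empty chunk, flagged DONE */
        return 0;
    }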
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171107030912.23930-3-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
The reason that NBD added structured reply in the first place was
to allow for efficient reads of sparse files, by allowing the
reply to include chunks to quickly communicate holes to the client
without sending lots of zeroes over the wire. Time to implement
this in the server; our client can already read such data.
We can only skip holes insofar as the block layer can query them;
and only if the client is okay with a fragmented request (if a
client requests NBD_CMD_FLAG_DF and the entire read is a hole, we
could technically return a single NBD_REPLY_TYPE_OFFSET_HOLE, but
that's a fringe case not worth catering to here). Sadly, the
control flow is a bit wonkier than I would have preferred, but
it was minimally invasive to have a split in the action between
a fragmented read (handled directly where we recognize
NBD_CMD_READ with the right conditions, and sending multiple
chunks) vs. a single read (handled at the end of nbd_trip, for
both simple and structured replies, when we know there is only
one thing being read). Likewise, I didn't make any effort to
optimize the final chunk of a fragmented read to set the
NBD_REPLY_FLAG_DONE, but unconditionally send that as a separate
NBD_REPLY_TYPE_NONE.
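The heart of the server loop can be sketched as follows (a sketch only: it
assumes the byte-based bdrv_block_status_above() API of this period, and
elides error handling and the actual chunk-sending helpers):

    while (progress < size) {
        int64_t pnum;
        int status = bdrv_block_status_above(blk_bs(exp->blk), NULL,
                                             offset + progress,
                                             size - progress, &pnum,
                                             NULL, NULL);
        if (status & BDRV_BLOCK_ZERO) {
            /* Hole: send an NBD_REPLY_TYPE_OFFSET_HOLE chunk, which
             * carries only (offset, pnum) and no payload. */
        } else {
            /* Data: read pnum bytes and send them as an
             * NBD_REPLY_TYPE_OFFSET_DATA chunk. */
        }
        progress += pnum;
    }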
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171107030912.23930-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Merge remote-tracking branch 'remotes/stefanberger/tags/pull-tpm-2017-12-22-1' into staging
Merge tpm 2017/12/22 v1
# gpg: Signature made Fri 22 Dec 2017 20:03:37 GMT
# gpg: using RSA key 0x75AD65802A0B4211
# gpg: Good signature from "Stefan Berger <stefanb@linux.vnet.ibm.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: B818 B9CA DF90 89C2 D5CE C66B 75AD 6580 2A0B 4211
* remotes/stefanberger/tags/pull-tpm-2017-12-22-1:
acpi: Update TPM2 ACPI table to more recent specs
tpm: Implement tpm_sized_buffer_reset
tpm_tis: merge r/w_offset into rw_offset
tpm_tis: move r/w_offsets to TPMState
tpm_tis: merge read and write buffer into single buffer
tpm_tis: move buffers from localities into common location
tpm_tis: remove TPMSizeBuffer usage
tpm_tis: limit size of buffer from backend
tpm_tis: convert uint32_t to size_t
tpm_emulator: Add a caching layer for the TPM Established flag
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging
# gpg: Signature made Fri 22 Dec 2017 02:12:29 GMT
# gpg: using RSA key 0xEF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211
* remotes/jasowang/tags/net-pull-request:
qemu-doc: Update the deprecation information of -tftp, -bootp, -redir and -smb
qemu-doc: The "-net nic" option can be used with "netdev=...", too
net: Remove the legacy "-net channel" parameter
net: remove unused compute_mcast_idx() function
rtl8139: use inline net_crc32() and bitshift instead of compute_mcast_idx()
ne2000: use inline net_crc32() and bitshift instead of compute_mcast_idx()
ftgmac100: use inline net_crc32() and bitshift instead of compute_mcast_idx()
lan9118: use inline net_crc32() and bitshift instead of compute_mcast_idx()
opencores_eth: use inline net_crc32() and bitshift instead of compute_mcast_idx()
eepro100: use inline net_crc32() and bitshift instead of compute_mcast_idx()
sungem: fix multicast filter CRC calculation
sunhme: switch sunhme over to use net_crc32_le()
eepro100: switch eepro100 e100_compute_mcast_idx() over to use net_crc32()
pcnet: switch pcnet over to use net_crc32_le()
net: introduce net_crc32_le() function
net: move CRC32 calculation from compute_mcast_idx() into its own net_crc32() function
e1000: Separate TSO and non-TSO contexts, fixing UDP TX corruption
e1000, e1000e: Move per-packet TX offload flags out of context state
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
More recent specs of the TPM2 ACPI table add fields for the log area
start address and the log area minimum size, which we already use
for the TCPA table.
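For reference, a sketch of the resulting table structure, with field names
following the TCG ACPI specification (the exact QEMU definition may differ
slightly):

    struct Acpi20TPM2 {
        ACPI_TABLE_HEADER_DEF              /* standard ACPI table header */
        uint16_t platform_class;
        uint16_t reserved;
        uint64_t control_area_address;
        uint32_t start_method;
        uint8_t  start_method_params[12];
        uint32_t log_area_minimum_length;  /* LAML, newly appended */
        uint64_t log_area_start_address;   /* LASA, newly appended */
    } QEMU_PACKED;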
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
The bdrv_reopen*() implementation doesn't like it if the graph is
changed between queuing nodes for reopen and actually reopening them
(one of the reasons is that queuing can be recursive).
So instead of draining the device only in bdrv_reopen_multiple(),
require that callers already drained all affected nodes, and assert this
in bdrv_reopen_queue().
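A sketch of the resulting calling convention (assuming the
bdrv_reopen_queue()/bdrv_reopen_multiple() signatures of this period):

    /* Callers drain first, then queue and reopen. */
    bdrv_subtree_drained_begin(bs);

    queue = bdrv_reopen_queue(NULL, bs, options, bdrv_flags);
    ret = bdrv_reopen_multiple(bdrv_get_aio_context(bs), queue, &local_err);

    bdrv_subtree_drained_end(bs);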
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Since commit bde70715, base is the only node that is reopened in
commit_start(). This means that the code, which still involves an
explicit BlockReopenQueue, can now be simplified by using bdrv_reopen().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
We need to remember how many of the drain sections a node is in were recursive
(i.e. subtree drains rather than single-node drains), so that they can be
correctly applied when children are added or removed during the drained
section.
With this change, it is safe to modify the graph even inside a
bdrv_subtree_drained_begin/end() section.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If bdrv_do_drained_begin/end() are called in coroutine context, they
first use a BH to get out of the coroutine context. Call some existing
tests again from a coroutine to cover this code path.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_drained_begin() waits for the completion of requests in the whole
subtree, but it only actually keeps its immediate bs parameter quiesced
until bdrv_drained_end().
Add a version that keeps the whole subtree drained. As of this commit,
graph changes cannot be allowed during a subtree drained section, but
this will be fixed soon.
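Usage mirrors the single-node variant; a minimal sketch:

    /* Quiesce bs and, recursively, all of its children. */
    bdrv_subtree_drained_begin(bs);

    /* ... no new requests can be submitted anywhere in the subtree ... */

    bdrv_subtree_drained_end(bs);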
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is in preparation for subtree drains, i.e. drained sections that
affect not only a single node, but recursively all child nodes, too.
Calling the parent callbacks for drain is pointless when we just came from that
parent node recursively, and it leads to multiple increases of
bs->quiesce_counter in a single drain call. Don't do it.
In order for this to work correctly, the parent callback must be called
for every bdrv_drain_begin/end() call, not only for the outermost one:
If we have a node N with two parents A and B, recursive draining of A
should cause the quiesce_counter of B to increase because its child N is
drained independently of B. If now B is recursively drained, too, A must
increase its quiesce_counter because N is drained independently of A
only now, even if N is going from quiesce_counter 1 to 2.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_do_drained_begin() restricts the call of parent callbacks and
aio_disable_external() to the outermost drain section, but the block
driver callbacks are always called. bdrv_do_drained_end() must match
this behaviour, otherwise nodes stay drained even if begin/end calls
were balanced.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Block jobs are already paused using the BdrvChildRole drain callbacks,
so we don't need an additional block_job_pause_all() call.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Block jobs already paused themselves when their main BlockBackend
entered a drained section. This is not good enough: We also want to
pause a block job and may not submit new requests if, for example, the
mirror target node should be drained.
This implements .drained_begin/end callbacks in child_job in order to
consider all block nodes related to the job, and removes the BlockBackend
callbacks, which are unnecessary now because the root of the job's main
BlockBackend is always referenced with a child_job, too.
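A sketch of the child_job wiring described above (close to, though not
necessarily identical with, the blockjob.c code of this period):

    static void child_job_drained_begin(BdrvChild *c)
    {
        BlockJob *job = c->opaque;
        block_job_pause(job);
    }

    static void child_job_drained_end(BdrvChild *c)
    {
        BlockJob *job = c->opaque;
        block_job_resume(job);
    }

    static const BdrvChildRole child_job = {
        /* ... existing callbacks ... */
        .drained_begin = child_job_drained_begin,
        .drained_end   = child_job_drained_end,
    };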
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is currently only working correctly for bdrv_drain(), not for
bdrv_drain_all(). Leave a comment for the drain_all case, we'll address
it later.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The existing test is for bdrv_drain_all_begin/end() only. Generalise the
test case so that it can be run for the other variants as well. At the
moment this is only bdrv_drain_begin/end(), but in a while, we'll add
another one.
Also, add a backing file to the test node to test whether the operations
work recursively.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_drained_begin() doesn't increase bs->quiesce_counter recursively
and also doesn't notify other parent nodes of children, which both means
that the child nodes are not actually drained, and bdrv_drained_begin()
is providing useful functionality only on a single node.
To keep things consistent, we also shouldn't call the block driver
callbacks recursively.
A proper recursive drain version that provides an actually working
drained section for child nodes will be introduced later.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Looks like we forgot to announce the deprecation of these options in
the corresponding chapter of the qemu-doc text, so let's do that now.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It's been marked as deprecated since QEMU v2.10.0, and so far nobody
complained that we should keep it, so let's remove this legacy option
now to simplify the code quite a bit.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It's not working anymore since QEMU v1.3.0 - time to remove it now.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Management tools create overlays of running guests with qemu-img:
$ qemu-img create -b /image/in/use.qcow2 -f qcow2 /overlay/image.qcow2
but this doesn't work anymore due to image locking:
qemu-img: /overlay/image.qcow2: Failed to get shared "write" lock
Is another process using the image?
Could not open backing image to determine size.
Use the force share option to allow this use case again.
Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add trace output for commands, errors, and undefined behavior.
Add guest error log output for undefined behavior.
Report invalid undefined accesses to MMIO.
Annotate unlikely error checks with unlikely.
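The reporting patterns look roughly like this (a sketch; the actual trace
events and macros in hw/block/nvme.c have their own names):

    /* Undefined guest behaviour: report it as a guest error and bail out. */
    if (unlikely(addr & 3)) {
        qemu_log_mask(LOG_GUEST_ERROR,
                      "nvme: misaligned 32-bit MMIO read at 0x%" PRIx64 "\n",
                      addr);
        return 0;
    }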
Signed-off-by: Doug Gale <doug16k@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Removing a quorum child node with x-blockdev-change results in a quorum
driver state that cannot be recreated with create options because it
would require a list with gaps. This causes trouble in at least
.bdrv_refresh_filename().
Document this problem so that we won't accidentally mark the command
stable without having addressed it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Since bdrv_co_preadv() does all necessary checks, including reads past the
end of the backing file, avoid duplicating the verification before the
bdrv_co_preadv() call.
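The simplified read path can be sketched like this (local variable names
assumed from qcow2 of this period; the point is that bdrv_co_preadv() itself
handles reads past the backing file's end):

    if (bs->backing) {
        /* No size check needed: bdrv_co_preadv() deals with reads
         * beyond the end of the backing file. */
        ret = bdrv_co_preadv(bs->backing, offset, cur_bytes, &hd_qiov, 0);
        if (ret < 0) {
            goto fail;
        }
    } else {
        /* No backing file: the unallocated region reads as zeroes. */
        qemu_iovec_memset(&hd_qiov, 0, 0, cur_bytes);
    }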
Signed-off-by: Edgar Kaziakhmedov <edgar.kaziakhmedov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Commit 15afd94a04 added code to acquire and release the AioContext in
qemuio_command(). This means that the lock is taken twice now in the
call path from hmp_qemu_io(). This causes BDRV_POLL_WHILE() to hang for
any requests issued to nodes in a non-mainloop AioContext.
Dropping the first locking from hmp_qemu_io() fixes the problem.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Drain requests are propagated to child nodes, parent nodes and directly
to the AioContext. The order in which this happened was different
between all combinations of drain/drain_all and begin/end.
The correct order is to keep children only drained when their parents
are also drained. This means that at the start of a drained section, the
AioContext needs to be drained first, the parents second and only then
the children. The correct order for the end of a drained section is the
opposite.
This patch changes the three other functions to follow the example of
bdrv_drained_begin(), which is the only one that got it right.
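The begin side then follows this shape (a sketch using names from this
series; the end side runs the same steps in reverse, finishing with
aio_enable_external()):

    /* Start of a drained section: AioContext first, then parents,
     * then the node itself and its children. */
    aio_disable_external(bdrv_get_aio_context(bs));
    bdrv_parent_drained_begin(bs);
    bdrv_drain_invoke(bs, true);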
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The device is drained, so there is no point in waiting for requests at
the end of the drained section. Remove the bdrv_drain_recurse() calls
there.
The bdrv_drain_recurse() calls were introduced in commit 481cad48e5
in order to call the .bdrv_co_drain_end() driver callback. This is now
done by a separate bdrv_drain_invoke() call.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Now that the bdrv_drain_invoke() calls are pulled up to the callers of
bdrv_drain_recurse(), the 'begin' parameter isn't needed any more.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
This adds a test case checking that the BlockDriver callbacks for drain are
called in bdrv_drain_all_begin/end(), and that both of them are called
exactly once.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
bdrv_drain_all_begin() used to call the .bdrv_co_drain_begin() driver
callback inside its polling loop. This means that the number of times it got
called for each node depended on how long the loop had to poll the event
loop. This is obviously not right and results in nodes that stay drained even
after bdrv_drain_all_end(), which calls .bdrv_co_drain_end() only once per
node.
Fix bdrv_drain_all_begin() to call the callback only once, too.
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
This change separates bdrv_drain_invoke(), which calls the BlockDriver
drain callbacks, from bdrv_drain_recurse(). Instead, the function
performs its own recursion now.
One reason for this is that bdrv_drain_recurse() can be called multiple
times by bdrv_drain_all_begin(), but the callbacks may only be called
once. The separation is necessary to fix this bug.
The other reason is that we intend to go to a model where we call all
driver callbacks first, and only then start polling. This is not fully
achieved yet with this patch, as bdrv_drain_invoke() contains a
BDRV_POLL_WHILE() loop for the block driver callbacks, which can still
call callbacks for any unrelated event. It's a step in this direction
anyway.
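Schematically, the separated function now looks like this (a simplified
sketch; as noted above, the real code runs the driver callback in a coroutine
and polls for its completion):

    static void bdrv_drain_invoke(BlockDriverState *bs, bool begin)
    {
        BdrvChild *child;

        if (bs->drv && (begin ? bs->drv->bdrv_co_drain_begin != NULL
                              : bs->drv->bdrv_co_drain_end != NULL)) {
            /* ... run the callback in a coroutine and wait for it
             * with BDRV_POLL_WHILE() ... */
        }

        /* Recurse on our own instead of relying on bdrv_drain_recurse(). */
        QLIST_FOREACH(child, &bs->children, next) {
            bdrv_drain_invoke(child->bs, begin);
        }
    }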
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
VPC has some difficulty creating geometries of a particular size. However, we
can indeed force it to use a literal one, so let's do that for the sake of
test 197, which is testing some specific offsets.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Lukáš Doktor <ldoktor@redhat.com>
Commit 1f4ad7d fixed 'qemu-img info' for raw images that are currently
in use as a mirror target. It is not enough for image formats, though,
as these still unconditionally request BLK_PERM_CONSISTENT_READ.
As this permission is geared towards whether the guest-visible data is
consistent, and has no impact on whether the metadata is sane, and
'qemu-img info' does not read guest-visible data (except for the raw
format), it makes sense to not require BLK_PERM_CONSISTENT_READ if there
is not going to be any guest I/O performed, regardless of image format.
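The permission logic then boils down to a check of this shape (a sketch;
BDRV_O_NO_IO and BLK_PERM_CONSISTENT_READ are the flags involved):

    /* Only require consistent guest-visible data if guest I/O is possible. */
    if (!(flags & BDRV_O_NO_IO)) {
        perm |= BLK_PERM_CONSISTENT_READ;
    }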
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Some VM_PANIC calls are actually caused by internal errors rather than
unsupported situations. Convert them to just abort().
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>