Currently gen_jmp_tb() assumes that if it is called then the jump it
is handling is the only reason that we might be trying to end the TB,
so it will use goto_tb if it can. This is usually the case: mostly
"we did something that means we must end the TB" happens on a
non-branch instruction. However, there are cases where we decide
early in handling an instruction that we need to end the TB and
return to the main loop, and then the insn is a complex one that
involves gen_jmp_tb(). For instance, for M-profile FP instructions,
in gen_preserve_fp_state() which is called from vfp_access_check() we
want to force an exit to the main loop if lazy state preservation is
active and we are in icount mode.
Make gen_jmp_tb() look at the current value of is_jmp, and only use
goto_tb if the previous is_jmp was DISAS_NEXT or DISAS_TOO_MANY.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210913095440.13462-2-peter.maydell@linaro.org
We can expose cycle counters on the PMU easily. To be as compatible as
possible, let's do so, but make sure we don't expose any other architectural
counters that we can not model yet.
This allows OSs to work that require PMU support.
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20210916155404.86958-10-agraf@csgraf.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Now that we have all logic in place that we need to handle Hypervisor.framework
on Apple Silicon systems, let's add CONFIG_HVF for aarch64 as well so that we
can build it.
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Tested-by: Roman Bolshakov <r.bolshakov@yadro.com> (x86 only)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210916155404.86958-9-agraf@csgraf.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We need to handle PSCI calls. Most of the TCG code works for us,
but we can simplify it to only handle aa64 mode and we need to
handle SUSPEND differently.
This patch takes the TCG code as template and duplicates it in HVF.
To tell the guest that we support PSCI 0.2 now, update the check in
arm_cpu_initfn() as well.
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20210916155404.86958-8-agraf@csgraf.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Now that we have working system register sync, we push more target CPU
properties into the virtual machine. That might be useful in some
situations, but is not the typical case that users want.
So let's add a -cpu host option that allows them to explicitly pass all
CPU capabilities of their host CPU into the guest.
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Acked-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20210916155404.86958-7-agraf@csgraf.de
[PMM: drop unnecessary #include line from .h file]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Sleep on WFI until the VTIMER is due but allow ourselves to be woken
up on IPI.
In this implementation IPI is blocked on the CPU thread at startup and
pselect() is used to atomically unblock the signal and begin sleeping.
The signal is sent unconditionally so there's no need to worry about
races between actually sleeping and the "we think we're sleeping"
state. It may lead to an extra wakeup but that's better than missing
it entirely.
Signed-off-by: Peter Collingbourne <pcc@google.com>
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Acked-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210916155404.86958-6-agraf@csgraf.de
[agraf: Remove unused 'set' variable, always advance PC on WFX trap,
support vm stop / continue operations and cntv offsets]
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Acked-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
With Apple Silicon available to the masses, it's a good time to add support
for driving its virtualization extensions from QEMU.
This patch adds all necessary architecture specific code to get basic VMs
working, including save/restore.
Known limitations:
- WFI handling is missing (follows in later patch)
- No watchpoint/breakpoint support
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20210916155404.86958-5-agraf@csgraf.de
[PMM: added missing #include]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We will need to install a migration helper for the ARM hvf backend.
Let's introduce an arch callback for the overall hvf init chain to
do so.
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20210916155404.86958-4-agraf@csgraf.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Hvf's permission bitmap during and after dirty logging does not include
the HV_MEMORY_EXEC permission. At least on Apple Silicon, this leads to
instruction faults once dirty logging was enabled.
Add the bit to make it work properly.
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20210916155404.86958-3-agraf@csgraf.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We will need PMC register definitions in accel specific code later.
Move all constant definitions to common arm headers so we can reuse
them.
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20210916155404.86958-2-agraf@csgraf.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
During sbsa acs level 3 testing, it is seen that the GIC maintenance
interrupts are not triggered and the related test cases fail. This
is because we were incorrectly passing the value of the MISR register
(from maintenance_interrupt_state()) to qemu_set_irq() as the level
argument, whereas the device on the other end of this irq line
expects a 0/1 value.
Fix the logic to pass a 0/1 level indication, rather than a
0/not-0 value.
Fixes: c5fc89b36c ("hw/intc/arm_gicv3: Implement gicv3_cpuif_virt_update()")
Signed-off-by: Shashi Mallela <shashi.mallela@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20210915205809.59068-1-shashi.mallela@linaro.org
[PMM: tweaked commit message; collapsed nested if()s into one]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Move an ifndef CONFIG_USER_ONLY code block up in arm_cpu_reset() so
it can be merged with another earlier one.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210914120725.24992-4-peter.maydell@linaro.org
There's no particular reason why the exclusive monitor should
be only cleared on reset in system emulation mode. It doesn't
hurt if it isn't cleared in user mode, but we might as well
reduce the amount of code we have that's inside an ifdef.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210914120725.24992-3-peter.maydell@linaro.org
Currently all of the M-profile specific code in arm_cpu_reset() is
inside a !defined(CONFIG_USER_ONLY) ifdef block. This is
unintentional: it happened because originally the only
M-profile-specific handling was the setup of the initial SP and PC
from the vector table, which is system-emulation only. But then we
added a lot of other M-profile setup to the same "if (ARM_FEATURE_M)"
code block without noticing that it was all inside a not-user-mode
ifdef. This has generally been harmless, but with the addition of
v8.1M low-overhead-loop support we ran into a problem: the reset of
FPSCR.LTPSIZE to 4 was only being done for system emulation mode, so
if a user-mode guest tried to execute the LE instruction it would
incorrectly take a UsageFault.
Adjust the ifdefs so only the really system-emulation specific parts
are covered. Because this means we now run some reset code that sets
up initial values in the FPCCR and similar FPU related registers,
explicitly set up the registers controlling FPU context handling in
user-emulation mode so that the FPU works by design and not by
chance.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/613
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210914120725.24992-2-peter.maydell@linaro.org
Coverity points out that if the PDB file we're trying to read
has a header specifying a block_size of zero then we will
end up trying to divide by zero in pdb_ds_read_file().
Check for this and fail cleanly instead.
Fixes: Coverity CID 1458869
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Viktor Prutyanov <viktor.prutyanov@phystech.edu>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Viktor Prutyanov <viktor.prutyanov@phystech.edu>
Message-id: 20210910170656.366592-3-philmd@redhat.com
Message-Id: <20210901143910.17112-3-peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Coverity points out that we aren't checking the return value
from curl_easy_setopt().
Fixes: Coverity CID 1458895
Inspired-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Viktor Prutyanov <viktor.prutyanov@phystech.edu>
Tested-by: Viktor Prutyanov <viktor.prutyanov@phystech.edu>
Message-id: 20210910170656.366592-2-philmd@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Two minor fixes; one for performance, the other seccomp
on s390x.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEERfXHG0oMt/uXep+pBRYzHrxb/ecFAmFDS+oACgkQBRYzHrxb
/edhFg//Wldr2jMom7TlbgkhsbjPQbdbsbdHx7rWXviq3bqMq2cB8UzcHvhUFkW4
j15ElKDO2ZFsjtjaFQLtQmuO4zoMN/V4u8c/0V93b/vaAh4IgMYiDXaZ18fpuYi9
Y02tiTcTUyUj4fv5OUoUeynNUkkzgxGrL8Q5oZK3KSHn5uwFOWgnZiAycbECKzJH
wNljEpXjyMcZoIUJvoJ9oO246be3Flo82eA8UVOj+O0MMb2/tl18DL5IGOnglBYB
U/9CkFoa5qHfiIQ63/OsndHBMXSTZiOqnv+S9RSbAdoZRTjVP1BjFYvNjnzz9/Nv
czHfM+ecjLxNzG7WSHOrdm8D+L3E4O42Xuf6umcib1KfV6l/giQ3V9WfqfEX1JA4
V6XpZ5C25rs6AqFwbbh/Eo+wvb2syck30sXbwh7C1oDK/nfwHDgegd1EPE9VUaxi
c0yoqV/ScHfo2fbK0QVkIt2mwi8lcjH3cg2gSKfZ0YiHcYBqC7RB9IonndEkIoqd
mdvAdGafD27dDEsm1OkSxBDItmE+BZ7C2+7OF5x+a/rnt7L+yQszTG/A0DNH23kD
ktvTjEBLx2jzhmR+4My5YnTYoK0rE3zs0Nh2zxg7Kx9Sh3LlMtdN5o2GS0USq0wm
YCT4EfNAhPumPi+59mOCybgV6tayxz/ihgjAymQONAcRrTwddyo=
=GMdE
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgilbert-gitlab/tags/pull-virtiofs-20210916' into staging
virtiofsd pull 2021-08-16
Two minor fixes; one for performance, the other seccomp
on s390x.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
# gpg: Signature made Thu 16 Sep 2021 14:51:38 BST
# gpg: using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full]
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7
* remotes/dgilbert-gitlab/tags/pull-virtiofs-20210916:
virtiofsd: Reverse req_list before processing it
tools/virtiofsd: Add fstatfs64 syscall to the seccomp allowlist
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
In do_setsockopt(), the code path for the options which take a struct
ip_mreq_source (IP_BLOCK_SOURCE, IP_UNBLOCK_SOURCE,
IP_ADD_SOURCE_MEMBERSHIP and IP_DROP_SOURCE_MEMBERSHIP) fails to
check the return value from lock_user(). Handle this in the usual
way by returning -TARGET_EFAULT.
(In practice this was probably harmless because we'd pass a NULL
pointer to setsockopt() and the kernel would then return EFAULT.)
Fixes: Coverity CID 1459987
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20210809155424.30968-1-peter.maydell@linaro.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
With the thread pool disabled, we add the requests in the queue to a
GList, processing by iterating over there afterwards.
For adding them, we're using "g_list_prepend()", which is more
efficient but causes the requests to be processed in reverse order,
breaking the read-ahead and request-merging optimizations in the host
for sequential operations.
According to the documentation, if you need to process the request
in-order, using "g_list_prepend()" and then reversing the list with
"g_list_reverse()" is more efficient than using "g_list_append()", so
let's do it that way.
Testing on a spinning disk (to boost the increase of read-ahead and
request-merging) shows a 4x improvement on sequential write fio test:
Test:
fio --directory=/mnt/virtio-fs --filename=fio-file1 --runtime=20
--iodepth=16 --size=4G --direct=1 --blocksize=4K --ioengine libaio
--rw write --name seqwrite-libaio
Without "g_list_reverse()":
...
Jobs: 1 (f=1): [W(1)][100.0%][w=22.4MiB/s][w=5735 IOPS][eta 00m:00s]
seqwrite-libaio: (groupid=0, jobs=1): err= 0: pid=710: Tue Aug 24 12:58:16 2021
write: IOPS=5709, BW=22.3MiB/s (23.4MB/s)(446MiB/20002msec); 0 zone resets
...
With "g_list_reverse()":
...
Jobs: 1 (f=1): [W(1)][100.0%][w=84.0MiB/s][w=21.5k IOPS][eta 00m:00s]
seqwrite-libaio: (groupid=0, jobs=1): err= 0: pid=716: Tue Aug 24 13:00:15 2021
write: IOPS=21.3k, BW=83.1MiB/s (87.2MB/s)(1663MiB/20001msec); 0 zone resets
...
Signed-off-by: Sergio Lopez <slp@redhat.com>
Message-Id: <20210824131158.39970-1-slp@redhat.com>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
The virtiofsd currently crashes on s390x when doing something like
this in the guest:
mkdir -p /mnt/myfs
mount -t virtiofs myfs /mnt/myfs
touch /mnt/myfs/foo.txt
stat -f /mnt/myfs/foo.txt
The problem is that the fstatfs64 syscall is called in this case
from the virtiofsd. We have to put it on the seccomp allowlist to
avoid that the daemon gets killed in this case.
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2001728
Suggested-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210914123214.181885-1-thuth@redhat.com>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
The sparc_cpu_dump_state() function is only called within
the same file. Make it static.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20210916084002.1918445-1-f4bug@amsat.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
../target/avr/translate.c: In function ‘gen_jmp_ez’:
../target/avr/translate.c:1012:22: error: implicit conversion from ‘enum <anonymous>’ to ‘DisasJumpType’ [-Werror=enum-conversion]
1012 | ctx->base.is_jmp = DISAS_LOOKUP;
| ^
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Michael Rolnik <mrolnik@gmail.com>
Message-Id: <20210706180936.249912-1-sw@weilnetz.de>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Although we have long supported 'qemu-img convert -o
backing_file=foo,backing_fmt=bar', the fact that we have a shortcut -B
for backing_file but none for backing_fmt has made it more likely that
users accidentally run into:
qemu-img: warning: Deprecated use of backing file without explicit backing format
when using -B instead of -o. For similarity with other qemu-img
commands, such as create and compare, add '-F $fmt' as the shorthand
for '-o backing_fmt=$fmt'. Update iotest 122 for coverage of both
spellings.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210913131735.1948339-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Split checking for reserved bits out of aligned offset check.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210914122454.141075-11-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
- use g_autofree for l1_table
- better name for size in bytes variable
- reduce code blocks nesting
- whitespaces, braces, newlines
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210914122454.141075-9-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Check subcluster bitmap of the l2 entry for different types of
clusters:
- for compressed it must be zero
- for allocated check consistency of two parts of the bitmap
- for unallocated all subclusters should be unallocated
(or zero-plain)
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Tested-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Message-Id: <20210914122454.141075-7-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
We'll reuse the function to fix wrong L2 entry bitmap. Support it now.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210914122454.141075-6-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Split fix_l2_entry_by_zero() out of check_refcounts_l2() to be
reused in further patch.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210914122454.141075-5-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Add helper to parse compressed l2_entry and use it everywhere instead
of open-coding.
Note, that in most places we move to precise coffset/csize instead of
sector-aligned. Still it should work good enough for updating
refcounts.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210914122454.141075-4-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Let's pass the whole L2 entry and not bother with
L2E_COMPRESSED_OFFSET_SIZE_MASK.
It also helps further refactoring that adds generic
qcow2_parse_compressed_l2_entry() helper.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210914122454.141075-3-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
- don't use same name for size in bytes and in entries
- use g_autofree for l2_table
- add whitespace
- fix block comment style
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210914122454.141075-2-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
If a gitlab CI job is marked as manual-only but is not marked
as allow_failure, then gitlab considers that the pipeline is
"blocked" until the job has been manually triggered. We need
to mark these manual-only jobs as also allow_failure: true
so that gitlab doesn't insist that they have run before it
will consider the pipeline to be complete.
Fixes: 4c9af1ea14
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-id: 20210915123412.8232-1-peter.maydell@linaro.org
Acked-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Willian Rampazzo <willianr@redhat.com>
We cannot write to images opened with O_DIRECT unless we allow them to
be resized so they are aligned to the sector size: Since 9c60a5d197,
bdrv_node_refresh_perm() ensures that for nodes whose length is not
aligned to the request alignment and where someone has taken a WRITE
permission, the RESIZE permission is taken, too).
Let qemu-img convert pass the BDRV_O_RESIZE flag (which causes
blk_new_open() to take the RESIZE permission) when using cache=none for
the target, so that when writing to it, it can be aligned to the target
sector size.
Without this patch, an error is returned:
$ qemu-img convert -f raw -O raw -t none foo.img /mnt/tmp/foo.img
qemu-img: Could not open '/mnt/tmp/foo.img': Cannot get 'write'
permission without 'resize': Image size is not a multiple of request
alignment
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1994266
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210819101200.64235-1-hreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
There is no conflict and no dependency if we have parallel writes to
different subclusters of one cluster when the cluster itself is already
allocated. So, relax extra dependency.
Measure performance:
First, prepare build/qemu-img-old and build/qemu-img-new images.
cd scripts/simplebench
./img_bench_templater.py
Paste the following to stdin of running script:
qemu_img=../../build/qemu-img-{old|new}
$qemu_img create -f qcow2 -o extended_l2=on /ssd/x.qcow2 1G
$qemu_img bench -c 100000 -d 8 [-s 2K|-s 2K -o 512|-s $((1024*2+512))] \
-w -t none -n /ssd/x.qcow2
The result:
All results are in seconds
------------------ --------- ---------
old new
-s 2K 6.7 ± 15% 6.2 ± 12%
-7%
-s 2K -o 512 13 ± 3% 11 ± 5%
-16%
-s $((1024*2+512)) 9.5 ± 4% 8.4
-12%
------------------ --------- ---------
So small writes are more independent now and that helps to keep deeper
io queue which improves performance.
271 iotest output becomes racy for three allocation in one cluster.
Second and third writes may finish in different order. Second and
third requests don't depend on each other any more. Still they both
depend on first request anyway. Filter out second and third write
offsets to cover both possible outputs.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210824101517.59802-4-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
[hreitz: s/ an / and /]
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
No logic change, just prepare for the following commit. While being
here do also small grammar fix in a comment.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210824101517.59802-3-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Add simple grammar-parsing template benchmark. New tool consume test
template written in bash with some special grammar injections and
produces multiple tests, run them and finally print a performance
comparison table of different tests produced from one template.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210824101517.59802-2-vsementsov@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
We must not inactivate child when parent has write permissions on
it.
Calling .bdrv_inactivate() doesn't help: actually only qcow2 has this
handler and it is used to flush caches, not for permission
manipulations.
So, let's simply check cumulative parent permissions before
inactivating the node.
This commit fixes a crash when we do migration during backup: prior to
the commit nothing prevents all nodes inactivation at migration finish
and following backup write to the target crashes on assertion
"assert(!(bs->open_flags & BDRV_O_INACTIVE));" in
bdrv_co_write_req_prepare().
After the commit, we rely on the fact that copy-before-write filter
keeps write permission on target node to be able to write to it. So
inactivation fails and migration fails as expected.
Corresponding test now passes, so, enable it.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210911120027.8063-3-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Add a simple test which tries to run migration during backup.
bdrv_inactivate_all() should fail. But due to bug (see next commit with
fix) it doesn't, nodes are inactivated and continued backup crashes
on assertion "assert(!(bs->open_flags & BDRV_O_INACTIVE));" in
bdrv_co_write_req_prepare().
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210911120027.8063-2-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
In mirror_iteration() we call mirror_wait_on_conflicts() with
`self` parameter set to NULL.
Starting from commit d44dae1a7c we dereference `self` pointer in
mirror_wait_on_conflicts() without checks if it is not NULL.
Backtrace:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 mirror_wait_on_conflicts (self=0x0, s=<optimized out>, offset=<optimized out>, bytes=<optimized out>)
at ../block/mirror.c:172
172 self->waiting_for_op = op;
[Current thread is 1 (Thread 0x7f0908931ec0 (LWP 380249))]
(gdb) bt
#0 mirror_wait_on_conflicts (self=0x0, s=<optimized out>, offset=<optimized out>, bytes=<optimized out>)
at ../block/mirror.c:172
#1 0x00005610c5d9d631 in mirror_run (job=0x5610c76a2c00, errp=<optimized out>) at ../block/mirror.c:491
#2 0x00005610c5d58726 in job_co_entry (opaque=0x5610c76a2c00) at ../job.c:917
#3 0x00005610c5f046c6 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>)
at ../util/coroutine-ucontext.c:173
#4 0x00007f0909975820 in ?? () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91
from /usr/lib64/libc.so.6
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2001404
Fixes: d44dae1a7c ("block/mirror: fix active mirror dead-lock in mirror_wait_on_conflicts")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20210910124533.288318-1-sgarzare@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
297 so far does not check the named tests, which reside in the tests/
directory (i.e. full path tests/qemu-iotests/tests). Fix it.
Thanks to the previous two commits, all named tests pass its scrutiny,
so we do not have to add anything to SKIP_FILES.
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Willian Rampazzo <willianr@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210902094017.32902-6-hreitz@redhat.com>
The AbnormalShutdown exception class is not in qemu.machine, but in
qemu.machine.machine. (qemu.machine.AbnormalShutdown was enough for
Python to find it in order to run this test, but pylint complains about
it.)
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210902094017.32902-5-hreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>