In job_finish_sync job_enter should be enough for a job to make some
progress and draining is a wrong tool for it. So use job_enter directly
here and drop job_drain with all related staff not used more.
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Tested-by: John Snow <jsnow@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add handlers supporting qiov_offset parameter:
bdrv_co_preadv_part
bdrv_co_pwritev_part
bdrv_co_pwritev_compressed_part
This is used to reduce need of defining local_qiovs and hd_qiovs in all
corners of block layer code. The following patches will increase usage
of this new API part by part.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190604161514.262241-5-vsementsov@virtuozzo.com
Message-Id: <20190604161514.262241-5-vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
write flags are constant, let's store it in BackupBlockJob instead of
recalculating. It also makes two boolean fields to be unused, so,
drop them.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190730163251.755248-4-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
We have detect_zeroes option, so at least for blockdev-backup user
should define it if zero-detection is needed. For drive-backup leave
detection enabled by default but do it through existing option instead
of open-coding.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190730163251.755248-2-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
Accept bitmaps and sync policies for the other backup modes.
This allows us to do things like create a bitmap synced to a full backup
without a transaction, or start a resumable backup process.
Some combinations don't make sense, though:
- NEVER policy combined with any non-BITMAP mode doesn't do anything,
because the bitmap isn't used for input or output.
It's harmless, but is almost certainly never what the user wanted.
- sync=NONE is more questionable. It can't use on-success because this
job never completes with success anyway, and the resulting artifact
of 'always' is suspect: because we start with a full bitmap and only
copy out segments that get written to, the final output bitmap will
always be ... a fully set bitmap.
Maybe there's contexts in which bitmaps make sense for sync=none,
but not without more severe changes to the current job, and omitting
it here doesn't prevent us from adding it later.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190716000117.25219-11-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
Presently, If sync=TOP is selected, we mark the entire bitmap as dirty.
In the write notifier handler, we dutifully copy out such regions.
Fix this in three parts:
1. Mark the bitmap as being initialized before the first yield.
2. After the first yield but before the backup loop, interrogate the
allocation status asynchronously and initialize the bitmap.
3. Teach the write notifier to interrogate allocation status if it is
invoked during bitmap initialization.
As an effect of this patch, the job progress for TOP backups
now behaves like this:
- total progress starts at bdrv_length.
- As allocation status is interrogated, total progress decreases.
- As blocks are copied, current progress increases.
Taken together, the floor and ceiling move to meet each other.
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 20190716000117.25219-10-jsnow@redhat.com
[Remove ret = -ECANCELED change. --js]
[Squash in conflict resolution based on Max's patch --js]
Message-id: c8b0ab36-79c8-0b4b-3193-4e12ed8c848b@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Modify bdrv_is_unallocated_range to utilize the pnum return from
bdrv_is_allocated, and in the process change the semantics from
"is unallocated" to "is allocated."
Optionally returns a full number of clusters that share the same
allocation status.
This will be used to carefully toggle bits in the bitmap for sync=top
initialization in the following commits.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190716000117.25219-9-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
Just a few housekeeping changes that keeps the following commit easier
to read; perform the initial copy_bitmap initialization in one place.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190716000117.25219-8-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
When making backups based on bitmaps, the work estimate can be more
accurate. Update iotests to reflect the new strategy.
TOP work estimates are broken, but do not get worse with this commit.
That issue is addressed in the following commits instead.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190716000117.25219-7-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
This is nicer to do in the unified QMP interface that we have now,
because it lets us use the right terminology back at the user.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190716000117.25219-5-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
With the "never" sync policy, we actually can utilize readonly bitmaps
now. Loosen the check at the QMP level, and tighten it based on
provided arguments down at the job creation level instead.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190709232550.10724-19-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
This adds an "always" policy for bitmap synchronization. Regardless of if
the job succeeds or fails, the bitmap is *always* synchronized. This means
that for backups that fail part-way through, the bitmap retains a record of
which sectors need to be copied out to accomplish a new backup using the
old, partial result.
In effect, this allows us to "resume" a failed backup; however the new backup
will be from the new point in time, so it isn't a "resume" as much as it is
an "incremental retry." This can be useful in the case of extremely large
backups that fail considerably through the operation and we'd like to not waste
the work that was already performed.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190709232550.10724-13-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
This simplifies some interface matters; namely the initialization and
(later) merging the manifest back into the sync_bitmap if it was
provided.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190709232550.10724-12-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
This adds a "never" policy for bitmap synchronization. Regardless of if
the job succeeds or fails, we never update the bitmap. This can be used
to perform differential backups, or simply to avoid the job modifying a
bitmap.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190709232550.10724-7-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
We don't need or want a new sync mode for simple differences in
semantics. Create a new mode simply named "BITMAP" that is designed to
make use of the new Bitmap Sync Mode field.
Because the only bitmap sync mode is 'on-success', this adds no new
functionality to the backup job (yet). The old incremental backup mode
is maintained as a syntactic sugar for sync=bitmap, mode=on-success.
Add all of the plumbing necessary to support this new instruction.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190709232550.10724-6-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
This fixes devices like IDE that can still start new requests from I/O
handlers in the CPU thread while the block backend is drained.
The basic assumption is that in a drain section, no new requests should
be allowed through a BlockBackend (blk_drained_begin/end don't exist,
we get drain sections only on the node level). However, there are two
special cases where requests should not be queued:
1. Block jobs: We already make sure that block jobs are paused in a
drain section, so they won't start new requests. However, if the
drain_begin is called on the job's BlockBackend first, it can happen
that we deadlock because the job stays busy until it reaches a pause
point - which it can't if its requests aren't processed any more.
The proper solution here would be to make all requests through the
job's filter node instead of using a BlockBackend. For now, just
disabling request queuing on the job BlockBackend is simpler.
2. In test cases where making requests through bdrv_* would be
cumbersome because we'd need a BdrvChild. As we already got the
functionality to disable request queuing from 1., use it in tests,
too, for convenience.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Enabled by default copy_range ignores compress option. It's definitely
unexpected for user.
It's broken since introduction of copy_range usage in backup in
9ded4a0114.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190730163251.755248-3-vsementsov@virtuozzo.com
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
The backup job must only copy areas that the copy_bitmap reports as
dirty. This is always the case when using traditional non-offloading
backup, because it copies each cluster separately. When offloading the
copy operation, we sometimes copy more than one cluster at a time, but
we only check whether the first one is dirty.
Therefore, whenever copy offloading is possible, the backup job
currently produces wrong output when the guest writes to an area of
which an inner part has already been backed up, because that inner part
will be re-copied.
Fixes: 9ded4a0114
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190801173900.23851-2-mreitz@redhat.com
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
This adds a new parameter to blk_new() which requires its callers to
declare from which AioContext this BlockBackend is going to be used (or
the locks of which AioContext need to be taken anyway).
The given context is only stored and kept up to date when changing
AioContexts. Actually applying the stored AioContext to the root node
is saved for another commit.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Split out cluster_size calculation. Move copy-bitmap creation above
block-job creation, as we are going to share it with upcoming
backup-top filter, which also should be created before actual block job
creation.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190429090842.57910-6-vsementsov@virtuozzo.com
[mreitz: Dropped a paragraph from the commit message that was left over
from a previous version]
Signed-off-by: Max Reitz <mreitz@redhat.com>
Do full, top and incremental mode copying all in one place. This
unifies the code path and helps further improvements.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190429090842.57910-5-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Split allocation checking to separate function and reduce nesting.
Consider bdrv_is_allocated() fail as allocated area, as copying more
than needed is not wrong (and we do it anyway) and seems better than
fail the whole job. And, most probably we will fail on the next read,
if there are real problem with source.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190429090842.57910-4-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
We are going to share this bitmap between backup and backup-top filter
driver, so let's share something more meaningful. It also simplifies
some calculations.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190429090842.57910-3-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Simplify backup_incremental_init_copy_bitmap using the function
bdrv_dirty_bitmap_next_dirty_area.
Note: move to job->len instead of bitmap size: it should not matter but
less code.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190429090842.57910-2-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Block jobs require that all of the nodes the job is using are in the
same AioContext. Therefore all BdrvChild objects of the job propagate
.(can_)set_aio_context to all other job nodes, so that the switch is
checked and performed consistently even if both nodes are in different
subtrees.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Use new qemu_iovec_init_buf() instead of
qemu_iovec_init_external( ... , 1), which simplifies the code.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190218140926.333779-5-vsementsov@virtuozzo.com
Message-Id: <20190218140926.333779-5-vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This reverts commit a33fbb4f8b.
The functionality is unused.
Note: in addition to automatic revert, drop second parameter in
hbitmap_iter_next() call from hbitmap_next_dirty_area() too.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Rename opaque_job to job to be consistent with other job implementations.
Rename 'job', the BackupBlockJob object, to 's' to also be consistent.
Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180830015734.19765-8-jsnow@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Utilize the job_exit shim by not calling job_defer_to_main_loop, and
where applicable, converting the deferred callback into the job_exit
callback.
This converts backup, stream, create, and the unit tests all at once.
Most of these jobs do not see any changes to the order in which they
clean up their resources, except the test-blockjob-txn test, which
now puts down its bs before job_completed is called.
This is safe for the same reason the reordering in the mirror job is
safe, because job_completed no longer runs under two locks, making
the unref safe even if it causes a flush.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180830015734.19765-7-jsnow@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Jobs presently use both an Error object in the case of the create job,
and char strings in the case of generic errors elsewhere.
Unify the two paths as just j->err, and remove the extra argument from
job_completed. The integer error code for job_completed is kept for now,
to be removed shortly in a separate patch.
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 20180830015734.19765-3-jsnow@redhat.com
[mreitz: Dropped a superfluous g_strdup()]
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Presently we codify the entry point for a job as the "start" callback,
but a more apt name would be "run" to clarify the idea that when this
function returns we consider the job to have "finished," except for
any cleanup which occurs in separate callbacks later.
As part of this clarification, change the signature to include an error
object and a return code. The error ptr is not yet used, and the return
code while captured, will be overwritten by actions in the job_completed
function.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180830015734.19765-2-jsnow@redhat.com
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Fleecing scheme works as follows: we want a kind of temporary snapshot
of active drive A. We create temporary image B, with B->backing = A.
Then we start backup(sync=none) from A to B. From this point, B reads
as point-in-time snapshot of A (A continues to be active drive,
accepting guest IO).
This scheme needs some additional synchronization between reads from B
and backup COW operations, otherwise, the following situation is
theoretically possible:
(assume B is qcow2, client is NBD client, reading from B)
1. client starts reading and take qcow2 mutex in qcow2_co_preadv, and
goes up to l2 table loading (assume cache miss)
2) guest write => backup COW => qcow2 write =>
try to take qcow2 mutex => waiting
3. l2 table loaded, we see that cluster is UNALLOCATED, go to
"case QCOW2_CLUSTER_UNALLOCATED" and unlock mutex before
bdrv_co_preadv(bs->backing, ...)
4) aha, mutex unlocked, backup COW continues, and we finally finish
guest write and change cluster in our active disk A
5. actually, do bdrv_co_preadv(bs->backing, ...) and read
_new updated_ data.
To avoid this, let's make backup writes serializing, to not intersect
with reads from B.
Note: we expand range of handled cases from (sync=none and
B->backing = A) to just (A in backing chain of B), to finally allow
safe reading from B during backup for all cases when A in backing chain
of B, i.e. B formally looks like point-in-time snapshot of A.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Pass read flags and write flags separately. This is needed to handle
coming BDRV_REQ_NO_SERIALISING clearly in following patches.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The implementation is similar to the 'qemu-img convert'. In the
beginning of the job, offloaded copy is attempted. If it fails, further
I/O will go through the existing bounce buffer code path.
Then, as Kevin pointed out, both this and qemu-img convert can benefit
from a local check if one request fails because of, for example, the
offset is beyond EOF, but another may well be accepted by the protocol
layer. This will be implemented separately.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 20180703023758.14422-4-famz@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
This new parameter allows the caller to just query the next dirty
position without moving the iterator.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20180613181823.13618-8-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
So far we relied on job->ret and strerror() to produce an error message
for failed jobs. Not surprisingly, this tends to result in completely
useless messages.
This adds a Job.error field that can contain an error string for a
failing job, and a parameter to job_completed() that sets the field. As
a default, if NULL is passed, we continue to use strerror(job->ret).
All existing callers are changed to pass NULL. They can be improved in
separate patches.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
BlockJob has fields .offset and .len, which are actually misnomers today
because they are no longer tied to block device sizes, but just progress
counters. As such they make a lot of sense in generic Jobs.
This patch moves the fields to Job and renames them to .progress_current
and .progress_total to describe their function better.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
This doesn't actually move any transaction code to Job yet, but it
renames the type for transactions from BlockJobTxn to JobTxn and makes
them contain Jobs rather than BlockJobs
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
block_job_drain() contains a blk_drain() call which cannot be moved to
Job, so add a new JobDriver callback JobDriver.drain which has a common
implementation for all BlockJobs. In addition to this we keep the
existing BlockJobDriver.drain callback that is called by the common
drain implementation for all block jobs.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
This moves the finalisation of a single job from BlockJob to Job.
Some part of this code depends on job transactions, and job transactions
call this code, we introduce some temporary calls from Job functions to
BlockJob ones. This will be fixed once transactions move to Job, too.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
While we already moved the state related to job pausing to Job, the
functions to do were still BlockJob only. This commit moves them over to
Job.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
There is nothing block layer specific about block_job_sleep_ns(), so
move the function to Job.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
This commit moves some core functions for dealing with the job coroutine
from BlockJob to Job. This includes primarily entering the coroutine
(both for the first and reentering) and yielding explicitly and at pause
points.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Move the defer_to_main_loop functionality from BlockJob to Job.
The code can be simplified because we can use job->aio_context in
job_defer_to_main_loop_bh() now, instead of having to access the
BlockDriverState.
Probably taking the data->aio_context lock in addition was already
unnecessary in the old code because we didn't actually make use of
anything protected by the old AioContext except getting the new
AioContext, in case it changed between scheduling the BH and running it.
But it's certainly unnecessary now that the BDS isn't accessed at all
any more.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>