Support the follwoing arguments in the ctr parameter parser:
- add 'delta_disks', 'data_offset' taking int and sector respectively
- 'raid10_use_near_sets' bool argument to optionally select
near sets with supporting raid10 mappings
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Add new members to the dm-raid superblock and new raid types to support
takeover/reshape.
Add all necessary members needed to support takeover and reshape in one
go -- aiming to limit the amount of changes to the superblock layout.
This is a larger patch due to the new superblock members, their related
flags, validation of both and involved API additions/changes:
- add additional members to keep track of:
- state about forward/backward reshaping
- reshape position
- new level, layout, stripe size and delta disks
- data offset to current and new data for out-of-place reshapes
- failed devices bitfield extensions to keep track of max raid devices
- adjust super_validate() to cope with new superblock members
- adjust super_init_validation() to cope with new superblock members
- add definitions for ctr flags supporting delta disks etc.
- add new raid types (raid6_n_6 etc.)
- add new raid10 supporting function API (_is_raid10_*())
- adjust to changed raid10 supporting function API
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Make use if raid type rt_is_*() bool functions for simplification and
consistency reasons.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
- add _test_flags() function
- use it to simplify rs_check_for_invalid_flags()
- use _test_flag() throughout
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reject invalid flag combinations to avoid potential data corruption or
failing raid set construction:
- add definitions for constructor flag combinations and invalid flags
per level
- add bool test functions for the various raid types
(also will be used by future reshaping enhancements)
- introduce rs_check_for_invalid_flags() and _invalid_flags()
to perform the validity checks
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Provide necessary infrastructure to handle ctr flags and their names
and cleanup setting ti->error:
- comment constructor flags
- introduce constructor flag manipulation
- introduce ti_error_*() functions to simplify
setting the error message (use in other targets?)
- introduce array to hold ctr flag <-> flag name mapping
- introduce argument name by flag functions for that array
- use those functions throughout the ctr call path
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
- use dm_arg_set API in ctr and its callees parse_raid_params() and dev_parms()
- introduce _in_range() function to check a value is in a [ min, max ] range;
this is to support more callers in parsing parameters etc. in the future
- correct comment on MAX_RAID_DEVICES
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Separate the op from the rq_flag_bits and have md
set/get the bio using bio_set_op_attrs/bio_op.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Given we don't yet support any feature flags in the dm-raid ondisk
metadata (see: 'features' member of 'struct dm_raid_superblock'),
add a check to ensure no flags are actually set, if any features are
set reject the activation of the RAID mapping.
This is to prevent possible data corruption in case of a kernel
downgrade when there'll potentially be feature flags set by a future
dm-raid target.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Commit 3a0f9aaee0 ("dm raid: round region_size to power of two")
intended to make sure that the default region size is a power of two.
However, the logic in that commit is incorrect and sets the variable
region_size to 0 or 1, depending on whether min_region_size is a power
of two.
Fix this logic, using roundup_pow_of_two(), so that region_size is
properly rounded up to the next power of two.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 3a0f9aaee0 ("dm raid: round region_size to power of two")
Cc: stable@vger.kernel.org # v3.8+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
As generic_make_request() is now able to handle arbitrarily sized bios,
it's no longer necessary for each individual block driver to define its
own ->merge_bvec_fn() callback. Remove every invocation completely.
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: drbd-user@lists.linbit.com
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Yehuda Sadeh <yehuda@inktank.com>
Cc: Sage Weil <sage@inktank.com>
Cc: Alex Elder <elder@kernel.org>
Cc: ceph-devel@vger.kernel.org
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: Neil Brown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Acked-by: NeilBrown <neilb@suse.de> (for the 'md' bits)
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
[dpark: also remove ->merge_bvec_fn() in dm-thin as well as
dm-era-target, and resolve merge conflicts]
Signed-off-by: Dongsu Park <dpark@posteo.net>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Add dm-raid access to the MD RAID0 personality to enable single zone
striping.
The following changes enable that access:
- add type definition to raid_types array
- make bitmap creation conditonal in super_validate(), because
bitmaps are not allowed in raid0
- set rdev->sectors to the data image size in super_validate()
to allow the raid0 personality to calculate the MD array
size properly
- use mdddev(un)lock() functions instead of direct mutex_(un)lock()
(wrapped in here because it's a trivial change)
- enhance raid_status() to always report full sync for raid0
so that userspace checks for 100% sync will succeed and allow
for resize (and takeover/reshape once added in future paches)
- enhance raid_resume() to not load bitmap in case of raid0
- add merge function to avoid data corruption (seen with readahead)
that resulted from bio payloads that grew too large. This problem
did not occur with the other raid levels because it either did not
apply without striping (raid1) or was avoided via stripe caching.
- raise version to 1.7.0 because of the raid0 API change
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
- ensure maximum device limit in superblock
- rename DMPF_* (print flags) to CTR_FLAG_* (constructor flags)
and their respective struct raid_set member
- use strcasecmp() in raid10_format_to_md_layout() as in the constructor
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Remove comment above parse_raid_params() that claims
"devices_handle_discard_safely" is a table line argument when it is
actually is a module parameter.
Also, backfill dm-raid target version 1.6.0 documentation.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
stacking ontop of blk-mq devices. This blk-mq support changes the
model request-based DM uses for cloning a request to relying on
calling blk_get_request() directly from the underlying blk-mq device.
Early consumer of this code is Intel's emerging NVMe hardware; thanks
to Keith Busch for working on, and pushing for, these changes.
- A few other small fixes and cleanups across other DM targets.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJU3NRnAAoJEMUj8QotnQNavG0H/3yogMcHvKg9H+w0WmUQdwhN
w99Wj3nkquAw2sm9yahKlAMBNY53iu/LHmC6/PaTpJetgdH7y1foTrRa0qjyeB2D
DgNr8mOzxSxzX6CX9V8JMwqzky9XoG2IOt/7FeQQOpMqp4T1M2zgvbZtpl0lK/f3
lNaNBFpl+47NbGssD/WbtfI4Yy3hX0u406yGmQN5DxRyGTWD2AFqpA76g2mp8vrp
wmw259gPr4oLhj3pDc0GkuiVn59ZR2Zp+2gs0jD5uKlDL84VP/nE+WNB+ny1Mnmt
cOg8Q+W6/OosL66MKBHNsF0QS6DXNo5UvsN9fHGa5IUJw7Tsa11ZEPKHZGEbQw4=
=RiN2
-----END PGP SIGNATURE-----
Merge tag 'dm-3.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper changes from Mike Snitzer:
- The most significant change this cycle is request-based DM now
supports stacking ontop of blk-mq devices. This blk-mq support
changes the model request-based DM uses for cloning a request to
relying on calling blk_get_request() directly from the underlying
blk-mq device.
An early consumer of this code is Intel's emerging NVMe hardware;
thanks to Keith Busch for working on, and pushing for, these changes.
- A few other small fixes and cleanups across other DM targets.
* tag 'dm-3.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: inherit QUEUE_FLAG_SG_GAPS flags from underlying queues
dm snapshot: remove unnecessary NULL checks before vfree() calls
dm mpath: simplify failure path of dm_multipath_init()
dm thin metadata: remove unused dm_pool_get_data_block_size()
dm ioctl: fix stale comment above dm_get_inactive_table()
dm crypt: update url in CONFIG_DM_CRYPT help text
dm bufio: fix time comparison to use time_after_eq()
dm: use time_in_range() and time_after()
dm raid: fix a couple integer overflows
dm table: train hybrid target type detection to select blk-mq if appropriate
dm: allocate requests in target when stacking on blk-mq devices
dm: prepare for allocating blk-mq clone requests in target
dm: submit stacked requests in irq enabled context
dm: split request structure out from dm_rq_target_io structure
dm: remove exports for request-based interfaces without external callers
My static checker complains that if "num_raid_params" is UINT_MAX then
the "if (num_raid_params + 1 > argc) {" check doesn't work as intended.
The other change is that I moved the "if (argc != (num_raid_devs * 2))"
condition forward a few lines so it was before the call to
context_alloc(). If we had an integer overflow inside that function
then it would lead to an immediate crash.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
There is currently no locking around calls to the 'congested'
bdi function. If called at an awkward time while an array is
being converted from one level (or personality) to another, there
is a tiny chance of running code in an unreferenced module etc.
So add a 'congested' function to the md_personality operations
structure, and call it with appropriate locking from a central
'mddev_congested'.
When the array personality is changing the array will be 'suspended'
so no IO is processed.
If mddev_congested detects this, it simply reports that the
array is congested, which is a safe guess.
As mddev_suspend calls synchronize_rcu(), mddev_congested can
avoid races by included the whole call inside an rcu_read_lock()
region.
This require that the congested functions for all subordinate devices
can be run under rcu_lock. Fortunately this is the case.
Signed-off-by: NeilBrown <neilb@suse.de>
Commit 48cf06bc5f ("dm raid: add discard support for RAID levels 4, 5
and 6") did not properly handle missing metadata device(s). A failing
read of the superblock causes the metadata and data devices to be
removed from the dev array in struct raid_set, setting references to
both devices to NULL. configure_discard_support() nonetheless tries to
access the data dev unconditionally causing an oops.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The dm-raid superblock (struct dm_raid_superblock) is padded to 512
bytes and that size is being used to read it in from the metadata
device into one preallocated page.
Reading or writing this on a 512-byte sector device works fine but on
a 4096-byte sector device this fails.
Set the dm-raid superblock's size to the logical block size of the
metadata device, because IO at that size is guaranteed too work. Also
add a size check to avoid silent partial metadata loss in case the
superblock should ever grow past the logical block size or PAGE_SIZE.
[includes pointer math fix from Dan Carpenter]
Reported-by: "Liuhua Wang" <lwang@suse.com>
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
In case of RAID levels 4, 5 and 6 we have to verify each RAID members'
ability to zero data on discards to avoid stripe data corruption -- if
discard_zeroes_data is not set for each RAID member discard support must
be disabled. But given the uncertainty of whether or not a RAID member
properly supports zeroing data on discard we require the user to
explicitly allow discard support on RAID levels 4, 5, and 6 by setting
a dm-raid module paramter, e.g.: dm-raid.devices_handle_discard_safely=Y
Otherwise, discards could cause data corruption on RAID4/5/6.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Discard support is not enabled for RAID levels 4, 5, and 6 at this time
due to concerns about unreliable discard_zeroes_data support on some
hardware. Otherwise, discards could cause stripe data corruption
(classic example of bad apples spoiling the bunch).
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
MD: Remember the last sync operation that was performed
This patch adds a field to the mddev structure to track the last
sync operation that was performed. This is especially useful when
it comes to what is recorded in mismatch_cnt in sysfs. If the
last operation was "data-check", then it reports the number of
descrepancies found by the user-initiated check. If it was a
"repair" operation, then it is reporting the number of
descrepancies repaired. etc.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
The usage of strict_strtoul() is not preferred, because
strict_strtoul() is obsolete. Thus, kstrtoul() should be
used.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: NeilBrown <neilb@suse.de>
This doesn't really need to be initialised, but it doesn't hurt,
silences the compiler, and as it is a counter it makes sense for it to
start at zero.
Signed-off-by: NeilBrown <neilb@suse.de>
DM RAID: Fix raid_resume not reviving failed devices in all cases
When a device fails in a RAID array, it is marked as Faulty. Later,
md_check_recovery is called which (through the call chain) calls
'hot_remove_disk' in order to have the personalities remove the device
from use in the array.
Sometimes, it is possible for the array to be suspended before the
personalities get their chance to perform 'hot_remove_disk'. This is
normally not an issue. If the array is deactivated, then the failed
device will be noticed when the array is reinstantiated. If the
array is resumed and the disk is still missing, md_check_recovery will
be called upon resume and 'hot_remove_disk' will be called at that
time. However, (for dm-raid) if the device has been restored,
a resume on the array would cause it to attempt to revive the device
by calling 'hot_add_disk'. If 'hot_remove_disk' had not been called,
a situation is then created where the device is thought to concurrently
be the replacement and the device to be replaced. Thus, the device
is first sync'ed with the rest of the array (because it is the replacement
device) and then marked Faulty and removed from the array (because
it is also the device being replaced).
The solution is to check and see if the device had properly been removed
before the array was suspended. This is done by seeing whether the
device's 'raid_disk' field is -1 - a condition that implies that
'md_check_recovery -> remove_and_add_spares (where raid_disk is set to -1)
-> hot_remove_disk' has been called. If 'raid_disk' is not -1, then
'hot_remove_disk' must be called to complete the removal of the previously
faulty device before it can be revived via 'hot_add_disk'.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
DM RAID: Break-up untidy function
Clean-up excessive indentation by moving some code in raid_resume()
into its own function.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
DM RAID: Add ability to restore transiently failed devices on resume
This patch adds code to the resume function to check over the devices
in the RAID array. If any are found to be marked as failed and their
superblocks can be read, an attempt is made to reintegrate them into
the array. This allows the user to refresh the array with a simple
suspend and resume of the array - rather than having to load a
completely new table, allocate and initialize all the structures and
throw away the old instantiation.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
DM RAID: Add message/status support for changing sync action
This patch adds a message interface to dm-raid to allow the user to more
finely control the sync actions being performed by the MD driver. This
gives the user the ability to initiate "check" and "repair" (i.e. scrubbing).
Two additional fields have been appended to the status output to provide more
information about the type of sync action occurring and the results of those
actions, specifically: <sync_action> and <mismatch_cnt>. These new fields
will always be populated. This is essentially the device-mapper way of doing
what MD controls through the 'sync_action' sysfs file and shows through the
'mismatch_cnt' sysfs file.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
mostly little bugfixes.
Only "feature" is a new RAID10 layout which slightly
improves the number of sets of devices that can concurrently
fail, without data loss.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIVAwUAUTPm+znsnt1WYoG5AQLLsw/+PMqr8roC4twgxTWV1NRbU8NtOcRi9Rj9
uvBS63uYAaLdi/D3UBKFYczmNCu9knuXbcp9SgFDxH7LlthQsWN/GYnif06pPo3w
9Agu5M8c062TJEG1vrnX6FhPO6pNgrWFr3h+CKkTiD3179i9DoQpP8LXQToeyMtI
YRMQf/zCkxYtDvWAP0iwsEWtw8cf+q9I/uGPhQ1L+DnZapXYdbtnqWBRz9q6mrDt
orcGrP41aZHvnOHUaTbwmaorCKkf/Ys4SMaGenrSFpnpQMypt7VgNuwHC59LxvJT
5eiFG/26zIsv7Wk0jv/TvFP5qzUPo0/PFkd5ug0ArvbVRiXS2cMJDwQvMdO1toxD
i5Bb+P9DptadvoWhOTgIpxnG77yRH45wJvyJOk+ZfS1/IO87nCRa3d0yiNOU5e2/
o0VdXPZRr72sdKKTK6kQuYfwCPb+Z2Pz6Q8BJdk6GxlmTXyP6sKhIgwUX86534fE
LrOxfK8qV+GetVu3X02RoX2CyJJRQHXyXmbHuSzXuo/JiOYtDigAydwNZChvf+tf
OoMY9K8vgNbhnGsUG6la7XPvZ+6dZMjdnxp2HB99Ml5A3PWZd75i5T6IHHxIQFbD
C3z9PWTWP+hK4k15DEyjlELtsE9WduGTXG4kUcf328xJ/7lj4VIImVugdCz+1B6z
+HlI6BiLwzY=
=YdVD
-----END PGP SIGNATURE-----
Merge tag 'md-3.9' of git://neil.brown.name/md
Pull md updates from NeilBrown:
"Mostly little bugfixes.
Only "feature" is a new RAID10 layout which slightly improves the
number of sets of devices that can concurrently fail, without data
loss."
* tag 'md-3.9' of git://neil.brown.name/md:
md: expedite metadata update when switching read-auto -> active
md: remove CONFIG_MULTICORE_RAID456
md/raid1,raid10: fix deadlock with freeze_array()
md/raid0: improve error message when converting RAID4-with-spares to RAID0
md: raid0: fix error return from create_stripe_zones.
md: fix two bugs when attempting to resize RAID0 array.
DM RAID: Add support for MD's RAID10 "far" and "offset" algorithms
MD RAID10: Improve redundancy for 'far' and 'offset' algorithms (part 2)
MD RAID10: Improve redundancy for 'far' and 'offset' algorithms (part 1)
MD RAID10: Minor non-functional code changes
md: raid1,10: Handle REQ_WRITE_SAME flag in write bios
md: protect against crash upon fsync on ro array
Use 'bio' in the name of variables and functions that deal with
bios rather than 'request' to avoid confusion with the normal
block layer use of 'request'.
No functional changes.
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Avoid returning a truncated table or status string instead of setting
the DM_BUFFER_FULL_FLAG when the last target of a table fills the
buffer.
When processing a table or status request, the function retrieve_status
calls ti->type->status. If ti->type->status returns non-zero,
retrieve_status assumes that the buffer overflowed and sets
DM_BUFFER_FULL_FLAG.
However, targets don't return non-zero values from their status method
on overflow. Most targets returns always zero.
If a buffer overflow happens in a target that is not the last in the
table, it gets noticed during the next iteration of the loop in
retrieve_status; but if a buffer overflow happens in the last target, it
goes unnoticed and erroneously truncated data is returned.
In the current code, the targets behave in the following way:
* dm-crypt returns -ENOMEM if there is not enough space to store the
key, but it returns 0 on all other overflows.
* dm-thin returns errors from the status method if a disk error happened.
This is incorrect because retrieve_status doesn't check the error
code, it assumes that all non-zero values mean buffer overflow.
* all the other targets always return 0.
This patch changes the ti->type->status function to return void (because
most targets don't use the return code). Overflow is detected in
retrieve_status: if the status method fills up the remaining space
completely, it is assumed that buffer overflow happened.
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
DM RAID: Add support for MD's RAID10 "far" and "offset" algorithms
Until now, dm-raid.c only supported the "near" algorthm of MD's RAID10
implementation. This patch adds support for the "far" and "offset"
algorithms, but only with the improved redundancy that is brought with
the introduction of the 'use_far_sets' bit, which shifts copied stripes
according to smaller sets vs the entire array. That is, the 17th bit
of the 'layout' variable that defines the RAID10 implementation will
always be set. (More information on how the 'layout' variable selects
the RAID10 algorithm can be found in the opening comments of
drivers/md/raid10.c.)
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Before attempting to activate a RAID array, it is checked for sufficient
redundancy. That is, we make sure that there are not too many failed
devices - or devices specified for rebuild - to undermine our ability to
activate the array. The current code performs this check twice - once to
ensure there were not too many devices specified for rebuild by the user
('validate_rebuild_devices') and again after possibly experiencing a failure
to read the superblock ('analyse_superblocks'). Neither of these checks are
sufficient. The first check is done properly but with insufficient
information about the possible failure state of the devices to make a good
determination if the array can be activated. The second check is simply
done wrong in the case of RAID10 because it doesn't account for the
independence of the stripes (i.e. mirror sets). The solution is to use the
properly written check ('validate_rebuild_devices'), but perform the check
after the superblocks have been read and we know which devices have failed.
This gives us one check instead of two and performs it in a location where
it can be done right.
Only RAID10 was affected and it was affected in the following ways:
- the code did not properly catch the condition where a user specified
a device for rebuild that already had a failed device in the same mirror
set. (This condition would, however, be caught at a deeper level in MD.)
- the code triggers a false positive and denies activation when devices in
independent mirror sets have failed - counting the failures as though they
were all in the same set.
The most likely place this error was introduced (or this patch should have
been included) is in commit 4ec1e369 - first introduced in v3.7-rc1.
Consequently this fix should also go in v3.7.y, however there is a
small conflict on the .version in raid_target, so I'll submit a
separate patch to -stable.
Cc: stable@vger.kernel.org
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
This patch removes map_info from bio-based device mapper targets.
map_info is still used for request-based targets.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
If the user does not supply a bitmap region_size to the dm raid target,
a reasonable size is computed automatically. If this is not a power of 2,
the md code will report an error later.
This patch catches the problem early and rounds the region_size to the
next power of two.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
There are two table arguments that can be given to a DM RAID target
that control whether the array is forced to (re)synchronize or skip
initialization: "sync" and "nosync". When "sync" is given, we set
mddev->recovery_cp to 0 in order to cause the device to resynchronize.
This is insufficient if there is a bitmap in use, because the array
will simply look at the bitmap and see that there is no recovery
necessary.
The fix is to skip over the loading of the superblocks when "sync" is
given, causing new superblocks to be written that will force the array
to go through initialization (i.e. synchronization).
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
DM RAID: Fix comparison of index and quantity for "rebuild" parameter
The "rebuild" parameter takes an index argument that starts counting from
zero. The conditional used to validate the index was using '>' rather than
'>=', leaving the door open for an index value that would be 1 too large.
Reported-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
DM RAID: Add code to validate replacement slots for RAID10 arrays
RAID10 can handle 'copies - 1' failures for each mirror group. This code
ensures the user has provided a valid array - one whose devices specified for
rebuild do not exceed the amount of redundancy available.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
DM RAID: Move chunk of code to it's own function
The code that checks whether device replacements/rebuilds are possible given
a specific RAID type is moved to it's own function. It will further expand
when the code to check RAID10 is added. A separate function makes it easier
to read.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Pull md updates from NeilBrown.
* 'for-next' of git://neil.brown.name/md:
DM RAID: Add support for MD RAID10
md/RAID1: Add missing case for attempting to repair known bad blocks.
md/raid5: For odirect-write performance, do not set STRIPE_PREREAD_ACTIVE.
md/raid1: don't abort a resync on the first badblock.
md: remove duplicated test on ->openers when calling do_md_stop()
raid5: Add R5_ReadNoMerge flag which prevent bio from merging at block layer
md/raid1: prevent merging too large request
md/raid1: read balance chooses idlest disk for SSD
md/raid1: make sequential read detection per disk based
MD RAID10: Export md_raid10_congested
MD: Move macros from raid1*.h to raid1*.c
MD RAID1: rename mirror_info structure
MD RAID10: rename mirror_info structure
MD RAID10: Fix compiler warning.
raid5: add a per-stripe lock
raid5: remove unnecessary bitmap write optimization
raid5: lockless access raid5 overrided bi_phys_segments
raid5: reduce chance release_stripe() taking device_lock
Commit outstanding metadata before returning the status for a dm thin
pool so that the numbers reported are as up-to-date as possible.
The commit is not performed if the device is suspended or if
the DM_NOFLUSH_FLAG is supplied by userspace and passed to the target
through a new 'status_flags' parameter in the target's dm_status_fn.
The userspace dmsetup tool will support the --noflush flag with the
'dmsetup status' and 'dmsetup wait' commands from version 1.02.76
onwards.
Tested-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
In preparation for RAID10 inclusion in dm-raid, we move the sectors_per_dev
calculation later in the device creation process. This is because we won't
know up-front how many stripes vs how many mirrors there are which will
change the calculation.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
In preparation for RAID10 addition to dm-raid, we change an 'if' conditional
to a 'switch' conditional to make it easier to see what is being checked for
each RAID type.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Remove the restriction that limits a target's specified maximum incoming
I/O size to be a power of 2.
Rename this setting from 'split_io' to the less-ambiguous 'max_io_len'.
Change it from sector_t to uint32_t, which is plenty big enough, and
introduce a wrapper function dm_set_target_max_io_len() to set it.
Use sector_div() to process it now that it is not necessarily a power of 2.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
When encountering an error while reading the superblock, call md_error.
We are currently setting the 'Faulty' bit on one of the array devices when an
error is encountered while reading the superblock of a dm-raid array. We should
be calling md_error(), as it handles the error more completely.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Missing dm-raid devices should be recorded in the superblock
When specifying the devices that compose a DM RAID array, it is possible to denote
failed or missing devices with '-'s. When this occurs, we must record this in the
superblock. We do this by checking if the array position's data device is missing
and then forcing MD to record the superblock by setting 'MD_CHANGE_DEVS' in
'raid_resume'. If we do not cause the superblock to be rewritten by the resume
function, it is possible for a stale superblock to be written by an out-going
in-active table (during 'raid_dtr').
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Properly initialize MD recovery flags when resuming device-mapper devices.
When a device-mapper device is suspended, all I/O must stop. This is done by
calling 'md_stop_writes' and 'mddev_suspend'. These calls in-turn manipulate
the recovery flags - including setting 'MD_RECOVERY_FROZEN'. The DM device
may have been suspended while recovery was not yet complete, so the process
needs to pick-up where it left off. Since 'mddev_resume' does not unset
'MD_RECOVERY_FROZEN' and set 'MD_RECOVERY_NEEDED', we must do it ourselves.
'MD_RECOVERY_NEEDED' can safely be set in 'mddev_resume', but 'MD_RECOVERY_FROZEN'
must be set outside of 'mddev_resume' due to how MD handles RAID reshaping.
(e.g. It is possible for a user to delay reshaping a RAID5->RAID6 by purposefully
setting 'MD_RECOVERY_FROZEN'. Clearing it in 'mddev_resume' would override the
desired behavior.)
Because 'mddev_resume' already unconditionally calls 'md_wakeup_thread(mddev->thread)'
there is no need to make this call from 'raid_resume' since it calls 'mddev_resume'.
Also clean up where level_store calls mddev_resume() - it current
duplicates some of the funcitons of that call. - NB
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
dm-raid currently open-codes the freeing of some members of
and rdev. It is more maintainable to have it call common code
from md.c which does this for all call-sites.
So remove free_disk_sb to md_rdev_clear, export it, and use it in
dm-raid.c
Signed-off-by: NeilBrown <neilb@suse.de>