To support merging snapshots back into their origin, we need
to trigger exceptions in the other snapshots (those not being merged)
without any incoming bio on the origin device. The bio parameter to
__origin_write() therefore becomes optional, and the sector must be
supplied separately.
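The interface change can be pictured with a small userspace sketch. The
struct layouts and the name origin_write_sketch() are stand-ins invented
for illustration and are not the dm-snap code; only the idea that the bio
pointer may be NULL while the sector is passed on its own comes from the
description above.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sector_t;        /* stand-in for the kernel type */

struct bio_sketch {               /* hypothetical, heavily reduced bio */
	sector_t sector;
	int queued;               /* 1 once held back behind a copy */
};

struct snap_sketch {              /* hypothetical snapshot state */
	uint64_t exceptions;      /* exceptions triggered so far */
	sector_t last_sector;     /* sector of the latest exception */
};

/*
 * Trigger an exception for 'sector' in each of the 'n' snapshots.
 * 'bio' is optional: pass NULL when no I/O arrived on the origin,
 * e.g. when a merge needs the other snapshots to copy data first.
 * Returns 1 if a bio was queued behind the copy, 0 otherwise.
 */
static int origin_write_sketch(struct snap_sketch *snaps, size_t n,
			       sector_t sector, struct bio_sketch *bio)
{
	for (size_t i = 0; i < n; i++) {
		snaps[i].exceptions++;
		snaps[i].last_sector = sector;
	}

	if (bio && n > 0) {
		bio->queued = 1;
		return 1;
	}
	return 0;
}
```

A caller with no incoming I/O would pass NULL for the bio and still get
the exceptions triggered for the given sector.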
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
This patch moves the setting of the DMF_SUSPENDED flag to before postsuspend.
No one should care about the ordering, because both the flag setting and
the postsuspend are protected by a single lock, md->suspend_lock,
and all strict flag-checkers take the lock.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Mike Anderson <andmike@linux.vnet.ibm.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
The default plain IV is 32-bit only.
This plain64 IV provides a compatible mode for encrypted devices bigger
than 4TB.
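The difference between the two IV modes can be illustrated with a
userspace sketch. It assumes a 16-byte IV and little-endian storage of
the sector number; the names iv_plain(), iv_plain64() and put_le() are
invented for this example and are not the dm-crypt functions.

```c
#include <stdint.h>
#include <string.h>

#define IV_SIZE 16  /* assumed IV length in bytes */

/* Store the low 'nbytes' bytes of 'v' in little-endian order. */
static void put_le(uint8_t *dst, uint64_t v, size_t nbytes)
{
	for (size_t i = 0; i < nbytes; i++)
		dst[i] = (uint8_t)(v >> (8 * i));
}

/* plain: only the low 32 bits of the sector number reach the IV. */
static void iv_plain(uint8_t iv[IV_SIZE], uint64_t sector)
{
	memset(iv, 0, IV_SIZE);
	put_le(iv, sector & 0xffffffffu, 4);
}

/* plain64: the full 64-bit sector number is used. */
static void iv_plain64(uint8_t iv[IV_SIZE], uint64_t sector)
{
	memset(iv, 0, IV_SIZE);
	put_le(iv, sector, 8);
}
```

For sectors below 2^32 both functions produce the same bytes, which is
why plain64 stays compatible with existing plain devices. Above that,
iv_plain() repeats earlier IVs while iv_plain64() does not.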
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Permit in-use snapshot exception data to be 'handed over' from one
snapshot instance to another. This is a prerequisite for patches
that allow the changes made in a snapshot device to be merged back into
its origin device, and it also allows device resizing.
The basic call sequence is:
  dmsetup load new_snapshot (referencing the existing in-use cow device)
     - the ctr code detects that the cow is already in use and allows the
       two snapshot target instances to be linked together
  dmsetup suspend original_snapshot
  dmsetup resume new_snapshot
     - the new_snapshot becomes live, and if anything now tries to access
       the original one it will receive -EIO
  dmsetup remove original_snapshot
(There can only be two snapshot targets referencing the same cow device
simultaneously.)
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
When swapping a new table into place, retain the old table until
its replacement is in place.
An old check for an empty table is removed because this is enforced
in populate_table().
__unbind() becomes redundant when followed by __bind().
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
When replacing a mapped device's table during a 'resume', delay the
destruction of the old table until the new one is successfully in place.
This will make it easier for a later patch to transfer internal state
information from the old table to the new one (something we do not currently
support) while giving us more options for reversion if a later part
of the operation fails.
Devices are always in the suspended state during dm_swap_table().
This patch reinforces the requirement that all I/O must have been
flushed from the table targets while in this state (including any in
workqueues). In the case of 'noflush' suspending, unprocessed
I/O should have been 'pushed back' to the dm core prior to this point,
for resubmission after the new table is in place.
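The ownership rule described above (the old table outlives the swap and
is destroyed only by the caller, after the new one is in place) can be
sketched as follows. The types and swap_table_sketch() are hypothetical
userspace stand-ins, and returning -EINVAL for a non-suspended device is
an assumption of this example.

```c
#include <errno.h>
#include <stddef.h>

struct table_sketch {
	int id;
};

struct md_swap_sketch {
	struct table_sketch *map;   /* currently bound table */
	int suspended;              /* non-zero while suspended */
};

/*
 * Bind 'new_map' and hand the previous table back through 'old_map'
 * instead of destroying it here.  On failure nothing changes, so the
 * caller can still fall back to the old table.
 */
static int swap_table_sketch(struct md_swap_sketch *md,
			     struct table_sketch *new_map,
			     struct table_sketch **old_map)
{
	if (!md->suspended)
		return -EINVAL;     /* swapping requires a suspended device */

	*old_map = md->map;
	md->map = new_map;
	return 0;
}
```

Because the old table is still alive when the function returns, a later
step could copy state out of it, or rebind it if something else fails,
before finally destroying it.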
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Add the flag DM_QUERY_INACTIVE_TABLE_FLAG to the ioctls to return
information about the loaded-but-not-yet-active table instead of the live
table. Prior to this patch it was impossible to obtain this information
until the device had been 'resumed'.
Userspace dmsetup and libdevmapper support the flag as of version 1.02.40.
e.g. dmsetup info --inactive vg1-lv1
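How a query might choose between the two tables can be sketched in a
few lines. This is a userspace illustration only: the struct names and
pick_table_sketch() are invented, and the bit position given to
DM_QUERY_INACTIVE_TABLE_FLAG here is an assumption made so that the
example compiles on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed bit position, for illustration only. */
#define DM_QUERY_INACTIVE_TABLE_FLAG (1u << 12)

struct qtable_sketch {
	const char *name;
};

struct cell_sketch {
	struct qtable_sketch *live;       /* table currently in use */
	struct qtable_sketch *inactive;   /* loaded but not yet resumed */
};

/* Return the table a status query should report on (may be NULL). */
static struct qtable_sketch *pick_table_sketch(const struct cell_sketch *hc,
					       uint32_t flags)
{
	if (flags & DM_QUERY_INACTIVE_TABLE_FLAG)
		return hc->inactive;
	return hc->live;
}
```

With the flag clear the behaviour is unchanged: the live table is
reported, as before this patch.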
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Accept empty barriers in dm-io.
dm-io will process empty write barrier requests just like the other
read/write requests.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Add a mutex to allow possible creators of new work to synchronize with
flushing work queues.
Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
Acked-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Once we begin deleting a device, prevent any further messages being sent
to targets of its table (to avoid races).
Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Add dm_deleting_md to check whether or not a given mapped
device is currently being deleted.
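A check of this kind, and its use to refuse target messages once
deletion has begun, might look like the sketch below. It is a userspace
stand-in: the bit number, the struct, both function names and the
-ENXIO return value are assumptions for illustration, not taken from
the patch.

```c
#include <errno.h>

#define DMF_DELETING_SKETCH 4   /* bit number chosen for illustration */

struct md_flag_sketch {
	unsigned long flags;
};

/* Non-zero if the device is currently being deleted. */
static int deleting_md_sketch(const struct md_flag_sketch *md)
{
	return (int)((md->flags >> DMF_DELETING_SKETCH) & 1UL);
}

/* Refuse to deliver a message once deletion has started. */
static int target_message_sketch(const struct md_flag_sketch *md)
{
	if (deleting_md_sketch(md))
		return -ENXIO;   /* assumed error code */

	return 0;                /* the message would reach the target here */
}
```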
Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
This patch stops the remaining dm-mpath activity during the suspend
sequence by flushing workqueues in the postsuspend function.
The current dm-mpath target may not be quiet even after suspend completes
because some workqueues (e.g. device_handler's work, event handling)
are not flushed during the suspend sequence, even though suspended
devices/targets are supposed to be quiet in this state.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
This patch adds barrier support for request-based dm.
CORE DESIGN
The design is basically the same as that of bio-based dm, which emulates
a barrier by mapping empty barrier bios before and after a barrier I/O.
But request-based dm uses struct request_queue for I/O
queueing, so the block-layer's barrier mechanism can be used.
o Summary of the block-layer's behavior (on which dm-core depends)
Request-based dm uses the QUEUE_ORDERED_DRAIN_FLUSH ordered mode for
I/O barriers. This means that when an I/O requiring a barrier is found
in the request_queue, the block-layer makes a pre-flush request and
a post-flush request just before and just after the I/O respectively.
After the ordered sequence starts, the block-layer waits for all
in-flight I/Os to complete, then gives drivers the pre-flush request,
the barrier I/O and the post-flush request one by one.
This means that the request_queue is stopped automatically by
the block-layer until drivers complete each sequence.
o dm-core
For the barrier I/O, dm-core treats it as a normal I/O, so no additional
code is needed.
For the pre/post-flush request, dm-core flushes caches as follows:
1. Make the number of empty barrier requests required by the target's
num_flush_requests, and map them (dm_rq_barrier()).
2. Wait for the mapped barriers to complete (dm_rq_barrier()).
If an error has occurred, save the error value to md->barrier_error
(dm_end_request()).
(*) Basically, the first reported error is taken.
But -EOPNOTSUPP supersedes any other error, and DM_ENDIO_REQUEUE
supersedes any error except -EOPNOTSUPP.
3. Requeue the pre/post-flush request if the error value is
DM_ENDIO_REQUEUE. Otherwise, complete it with the error value
(dm_rq_barrier_work()).
The pre/post-flush work above is done in the kernel thread (kdmflush)
context, since memory allocation which might sleep is needed in
dm_rq_barrier() but sleep is not allowed in dm_request_fn(), which is
an irq-disabled context.
Also, clones of the pre/post-flush request share an original, so
such clones can't be completed using the softirq context.
Instead, complete them in the context of underlying device drivers.
It should be safe since there is no I/O dispatching during
the completion of such clones.
For suspend, the workqueue of kdmflush needs to be flushed after
the request_queue has been stopped. Otherwise, the next flush work
can be kicked even after the suspend completes.
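The error-precedence rule noted under step 2 of the dm-core description
can be sketched in plain C. This is a userspace illustration, not the
dm code: the name store_barrier_error_sketch() is invented here, and the
numeric value given to DM_ENDIO_REQUEUE is an assumption made so that
the example compiles on its own.

```c
#include <errno.h>

#define DM_ENDIO_REQUEUE 2   /* assumed value, for illustration */

/*
 * Record the result of one completed barrier clone.
 *   - The first reported error is normally kept.
 *   - -EOPNOTSUPP replaces whatever was stored.
 *   - DM_ENDIO_REQUEUE replaces anything except -EOPNOTSUPP.
 */
static void store_barrier_error_sketch(int *barrier_error, int error)
{
	if (!*barrier_error ||
	    error == -EOPNOTSUPP ||
	    (*barrier_error != -EOPNOTSUPP && error == DM_ENDIO_REQUEUE))
		*barrier_error = error;
}
```

With this ordering, a flush that any underlying path reports as
unsupported is reported as unsupported overall, and a requeue request
is not lost behind an ordinary I/O error from another clone.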
TARGET INTERFACE
No new interface is added.
Just use the existing num_flush_requests in struct dm_target,
as bio-based dm does.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
This patch moves dm_end_request() to make the next patch more readable.
No functional change.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
This patch factors out the clone completion code, dm_done(),
from dm_softirq_done() in preparation for a subsequent patch.
No functional change.
dm_done() will be used in barrier completion, which can't use and
doesn't need softirq. The softirq_done callback needs to get a clone
from an original request, but it can't do so for a barrier, where
an original request is shared by multiple clones. On the other hand,
the completion of barrier clones doesn't involve re-submitting requests,
which was the primary reason softirq was needed.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
This patch changes the counter for the number of in-flight I/Os
from q->in_flight to md->pending in preparation for a later patch.
No functional change.
Request-based dm used q->in_flight to count the number of in-flight
clones, assuming that the counter is always incremented for an in-flight
original request and that original:clone is a 1:1 relationship.
However, this is no longer true for barrier requests.
So use md->pending to count the number of in-flight clones.
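Why a per-device counter is needed once one original can have several
clones is easy to see in a sketch. The struct and function names below
are hypothetical, and C11 atomics stand in for the kernel's atomic
type.

```c
#include <stdatomic.h>

struct md_pending_sketch {
	atomic_int pending;   /* number of in-flight clones */
};

/* Account for one clone being dispatched. */
static void clone_start_sketch(struct md_pending_sketch *md)
{
	atomic_fetch_add(&md->pending, 1);
}

/* Returns 1 when the last in-flight clone has just completed. */
static int clone_end_sketch(struct md_pending_sketch *md)
{
	return atomic_fetch_sub(&md->pending, 1) == 1;
}

static int in_flight_sketch(struct md_pending_sketch *md)
{
	return atomic_load(&md->pending);
}
```

A single barrier original that fans out into three clones raises the
counter to three, whereas a per-original counter would show only one
request in flight.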
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
The semantics of bio-based dm were changed recently in the case of
suspend with "--nolockfs" but without "--noflush".
Before 2.6.30, I/Os submitted before the suspend invocation were always
flushed. From 2.6.30 onwards, I/Os submitted before the suspend
invocation might not be flushed. (For details, see
http://marc.info/?t=123994433400003&r=1&w=2)
This patch brings the behaviour of request-based dm into line with
bio-based dm, simplifying the code and preparing for a subsequent patch
that will wait for all in-flight I/Os to complete without stopping
the request_queue, using dm_wait_for_completion() to do so.
This change in semantics simplifies the suspend code as follows:
o Suspend is implemented as stopping request_queue
in request-based dm, and all I/Os are queued in the request_queue
even after suspend is invoked.
o In the old semantics, we had to track whether I/Os were
queued before or after the suspend invocation, so a special
barrier-like request called 'suspend marker' was introduced.
o With the new semantics, we don't need to flush any I/O
so we can remove the marker and the code related to the marker
handling and I/O flushing.
After removing this code, the suspend sequence is now:
1. Flush all I/Os by lock_fs() if needed.
2. Stop dispatching any I/O by stopping the request_queue.
3. Wait for all in-flight I/Os to be completed or requeued.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
This patch factors out the request cloning code in dm_prep_fn()
as clone_rq(). No functional change.
This patch is a preparation for a later patch in this series which needs to
make clones from an original barrier request.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
This patch adds the gfp_mask argument to alloc_rq_tio().
No functional change.
This patch prepares for a later patch in this series, which needs to
allocate a tio (for barrier I/O) with a different allocation flag (GFP_NOIO)
from the one used in the normal I/O code path.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>